
March 20, 2006

A nod's as good as a wink to a blind bat!

The facial expressions of avatars at the Black Sun nightclub, the social hotspot in Neal Stephenson's Snow Crash (1992), were so natural and nuanced that it was "just as good as a face-to-face." Unfortunately, today's metaverses (i.e., MMOGs) have a loooooong way to go before they approach that kind of sophistication.

There does not currently seem to be any consensus among game developers regarding the best way to implement facial expression. Different games employ different approaches. Yet each approach seems to grapple with the same basic problem: avatars' facial expressions are usually hard to see. (Another basic problem is how to control them easily, but I'll save that for another post.) The reason for this is that although avatar features and proportions tend to reflect those of real human bodies, the player's visual perspective usually does not. Players tend to play from a perspective that is at least several feet behind and slightly above their avatar's head. In other words, the player's viewpoint is disembodied. This unnatural distance makes it hard to see the expressions on avatars with realistically proportioned heads and faces.

Now one solution might simply be to force first-person perspective, at least in certain circumstances. Nick's data suggests that some players, especially women, actually prefer first-person perspective. However this solution creates different interactional problems. First, in first-person perspective, it's often difficult to notice other players and mobs unless you're looking right at them. The reason for this is that standard computer monitors don't allow for peripheral vision. A player's view of the game world is much narrower than it is in the physical world (see Hindmarsh et al. 2001). So the "disembodied," and usually zoomable, viewpoint in most games helps mitigate this lack of peripheral vision by increasing the player's field of view.

A second problem with first-person perspective is that when in it, you don't really know what your avatar is doing. If you can't see your avatar, you can't be sure, for example, that it waved, bowed, shrugged, pointed, etc. as you intended. The reason for this is that games don't simulate proprioception, the sense of the position of one's limbs that is independent of sight and touch. Indeed we are all like the unfortunate "Disembodied Lady" documented by Oliver Sacks (1986) when it comes to controlling our avatars. So forcing first-person perspective could hinder players' ability to use avatar gesture in sophisticated ways (e.g., for RP).

So given the low visibility of avatar faces in third-person (or first-person-zoomed) perspective, how are game developers handling facial expression? Here are a few approaches:


No facial animation - some games, like World of Warcraft and EverQuest Online Adventures (PS2), don't animate facial expressions at all. WoW has slash commands for things like /smile, /frown, and /wink, but they merely produce text emotes, e.g., "Bob smiles at Eric." EQoA doesn't even provide text emotes. So orcs always look pissed off, even when they're happy.

Facial animation + text emote - other games, like Star Wars Galaxies and Second Life, provide some nice facial animations that are accompanied by corresponding text emotes. So for example in SWG, when you type /smile your avatar smiles and you get the message, "Bob smiles at Eric." Typing /wink makes your avatar wink and you also get the message, "Bob winks suggestively at Eric" (apparently you can't wink non-suggestively). Similarly in SL, a simple text emoticon, such as :-) :-D :-( ;-) or :-P, accompanies the facial animation. Now although facial animations in SWG and SL are rather detailed, from my own experience, I find that I usually orient to the text emotes instead. Again, when my viewpoint is zoomed out, avatar faces just are hard to see.

Body animation + text emote - now in EverQuest II, the devs took a creative approach. They gave up on trying to animate the face (in most cases) and instead tried to translate common facial expressions into gross bodily gestures. So for example, /smile produces the text emote, "Bob smiles at Eric," plus an odd swing of a curled arm with a fist, but no smile. Or /wink produces, "Bob winks at Eric," plus a couple of nudges with the elbow, but no wink. Now these body animations are indeed much more noticeable than facial animations, and I tend to notice them rather than their corresponding text emotes. However, while I applaud the devs' effort, the animations in EQ2 just don't feel like an adequate substitute for facial expressions. Perhaps if they used more appropriate body animations, ones that people actually do when they smile or wink in an exaggerated way (and also included the facial animations), it might work better.

Automatic positioning of avatars - another creative approach to the problem of seeing facial expressions can be found in There. There solves the problem by automatically rearranging avatars in the supposedly optimal configuration: a semi-circle. When an avatar steps into a chat group, all the avatars are automatically shuffled to make room in the semi-circle. With this configuration, the player can see every avatar easily, including faces. (In addition, the text commands - 'wink, 'smile, 'frown - appear in the chat bubbles in case you miss the animations themselves.) Another benefit of this approach is that you can see your avatar from the front and thus can see your own avatar's facial animations. None of the other approaches enable this. Now while this is indeed a clever and effective solution, I personally don't like the system pushing my avatar around. If I had to choose between control over my avatar's position and orientation and a clear view of facial expressions, I might pick the former.
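To make the semi-circle idea concrete, here is a minimal sketch of how a chat group might be laid out. This is only my guess at the geometry, not There's actual algorithm; the function name `semicircle_positions` and the `radius` parameter are made up for illustration.

```python
import math

def semicircle_positions(n, radius=2.0):
    """Return (x, y, facing_degrees) for n avatars spread over a semi-circle.

    Hypothetical layout: avatars stand on the upper half of a circle centred
    at the origin and face the centre, so a camera on the open side can see
    every face.
    """
    if n < 1:
        return []
    positions = []
    for i in range(n):
        # Spread angles evenly over 0..180 degrees; a lone avatar goes in the middle.
        angle = math.pi / 2 if n == 1 else math.pi * i / (n - 1)
        x = radius * math.cos(angle)
        y = radius * math.sin(angle)
        # Turn each avatar to look back toward the centre of the circle.
        facing = (math.degrees(angle) + 180.0) % 360.0
        positions.append((round(x, 3), round(y, 3), round(facing, 1)))
    return positions

# When someone joins, recompute with n + 1 and slide everyone to the new spots.
for spot in semicircle_positions(4):
    print(spot)
```

Recomputing the whole layout on every join is exactly the "system pushing my avatar around" behavior I complained about, so a real implementation would presumably want to move avatars as little as possible.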

Other possible approaches...

Close-up view - one simple, although perhaps not so elegant solution, would be to create a small close-up window of the face of your target. That way you could easily see the facial animations of the avatar you're looking at but also the gross embodied gestures of your own and others' avatars in the main view. While two disconnected views of the other's avatar may not be ideal, it might be worth experimenting with.

Amplified animations - another approach might be to amplify avatars' facial expressions by making them bigger. I once heard Raph Koster suggest borrowing the style from anime in which characters are drawn with disproportionately large heads and eyes. This would certainly make facial expressions more noticeable from a distance. I think another interesting technique in anime is that of super deformed ("super-D") emotes. When characters express an extreme emotion, their faces and bodies become radically morphed for a few moments. For example, in Teen Titans, when characters get angry, their heads grow huge and menacing for a moment; when scared, they shrink to a tiny, baby-like form; or when overtaken by love, their heads balloon and their eyes turn into hearts. This use of super-D really feels a lot like an emote in an MMORPG to me since it is abrupt and lasts for only a few moments. I can easily imagine super-D versions of /rofl, /OMG, /cheer, /mad, /scared, or /goggleeyes. However, I'm not sure how well it would work for the more subtle, basic expressions like winks, smiles, or frowns. Also, while these kinds of techniques would feel "natural" in an anime-themed world, I'm not sure how well they would work in Norrath or a galaxy far far away (they might work better in the ever-humorous Azeroth).

So which approaches did I leave out? Which is the best?

Post by Bob

Posted at March 20, 2006 03:34 PM

Trackback Pings


Listed below are links to weblogs that reference A nod's as good as a wink to a blind bat!:

» If looks could kill... from Bashers
...then there would be little wrong with the current generation of online games. After all, most avatars are rather emotionless. Sociologist Bob Moore takes a closer look on PlayOn at facial expressions in massively multiplayer online games. He discusses a... [Read More]

Tracked on March 24, 2006 06:12 AM

Comments

Great analysis.

After some thinking on the topic, I have to wonder how much of the migration away from facial emotes to body gesture is about making communication more visible, and how much is about streamlining art production.

Many of the recent release games seem to have very little in the form of facial animation. City of Heroes, like WoW, has a largely unchanging face, and although I'm a big user of emotes, I have to confess that I hardly notice it.

By making faces static, how many "bones" are eliminated from an animated form? How many production hours are saved? If I substitute facial emotes with body gestures, as EQ2 did, I'm making use of points of articulation that already exist in the game for other purposes, so the level of effort isn't as significant.

If the issue IS about the development effort, and not about the difficulty in visibility, then I see EQ2's solution (albeit with less exaggerated animations) as the most likely path of future development. I enjoy the super-deformed emotions shown in anime and Teen Titans, but they'd still mean additional development time, perhaps more than traditional facial animations, as the morphing variations would need testing for all points of character customization.

---

Another possible element lies in the context-based animations. Rather than just animating on command, the text parser scans for keywords in dialogue and runs animations.

In SWG, if I mentioned the term "cold" in my text, the character shivered. There were many other context-based triggers that I can't remember, and they weren't always accurate, but they were better than the static "neutral" stance.

I had high hopes for the "groundbreaking" animations I saw in SWG. It demonstrated a potential that I assumed would be carried on to other games, with more articulation, more context-animation, and more realistic body gestures. Instead, developers seem to have moved away from this path.

Imagine a rich text parser that animated something when you started typing (SL has the "typing animation"... I'd prefer something more immersion-setting, not immersion breaking). When you hit "enter" and your text balloon appeared, it could do such things as:

- animate body based on your /mood or different /say options (like /angry /sad etc)
- animate the face, doing a quick syllable parsing and associating a lip movement similar to the sound...
- animate based on context, or on specific commands inserted in the text, at the appropriate time (example ":polite: please refrain from such conduct :threatening: you wouldn't like to see me angry")
- animate eye contact realistically, with characters meeting eye contact, looking around, etc., rather than the extremes of "ignoring entirely" to "focused gaze"
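
The inline-command idea from the list above could be sketched roughly like this. It is only an illustration under my own assumptions: the `:mood:` tag syntax is taken from the example, but the function name `split_by_mood` and the "neutral" default are invented, and no game I know of exposes anything like it.

```python
import re

# A tag is a lowercase word wrapped in colons, e.g. ":polite:".
# Emoticons such as ":-)" do not match, so they pass through as plain text.
TAG = re.compile(r":([a-z]+):")

def split_by_mood(message, default_mood="neutral"):
    """Split a chat line into (mood, text) segments at each :mood: tag."""
    segments = []
    mood = default_mood
    pos = 0
    for match in TAG.finditer(message):
        text = message[pos:match.start()].strip()
        if text:
            segments.append((mood, text))
        # Everything after this tag is spoken in the new mood.
        mood = match.group(1)
        pos = match.end()
    tail = message[pos:].strip()
    if tail:
        segments.append((mood, tail))
    return segments

line = ":polite: please refrain from such conduct :threatening: you wouldn't like to see me angry"
for mood, text in split_by_mood(line):
    print(mood, "->", text)
```

Each segment could then drive a body or face animation while its text is shown, which is what gives the "at the appropriate time" effect.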

Heck, I was dreaming of the far-future day where I could define my own context-based emotes (perhaps even assign a % likelihood of it happening, so "cold" doesn't trigger the same emote every time), much like players can customize their appearance.
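
A player-defined trigger table with a % likelihood might look something like the sketch below. The table contents, the name `pick_animations`, and the animation names are all hypothetical; this is not how SWG's parser worked, just one way the idea could go.

```python
import random
import re

# Hypothetical player-defined table: keyword -> (animation name, chance of firing).
TRIGGERS = {
    "cold": ("shiver", 0.5),
    "hello": ("wave", 0.9),
}

def pick_animations(message, triggers=TRIGGERS, rng=random):
    """Return the animations to play for one chat message."""
    # Match whole words only, so "scolded" does not trigger "cold".
    words = re.findall(r"[a-z']+", message.lower())
    chosen = []
    for word in words:
        if word in triggers:
            animation, chance = triggers[word]
            # Roll once per occurrence so the same word does not always animate.
            if rng.random() < chance and animation not in chosen:
                chosen.append(animation)
    return chosen

print(pick_animations("Hello! It is really cold out here."))
```

Because of the dice roll, the printed list varies from run to run, which is the whole point.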

-------
One more note:

City of Heroes offers animations for "lecture" or "argue" that gave your character entertaining arm gestures... there are probably about 3 or 4 different animation sets available. I've hotkeyed them and use them while chatting to add a little body language, but they have to be used sparingly.

Posted by: Chas at March 24, 2006 08:17 AM

At least for WoW, the solution seems simple: bring back the animated portraits of targeted characters that appeared in Warcraft III to allow for a close-up of facial animations. Throw in a little 3-D imaging and you could even rotate the targeted character to automatically reflect your avatar's viewing angle. I imagine it's only a matter of processing power/memory that led them to change back to static portraits for WoW.

Posted by: Dan at March 27, 2006 09:43 AM

I think Dan has a good point. I feel like it might also work to have multiple (albeit smaller) portraits persisting on the screen. Maybe as soon as you type someone's name in an emote, or they type your name, their portrait pops up on your screen to signify that you are "watching" them. It could be a static image of their expression that changes depending on their emotes. You could then, I don't know, walk away to stop listening, or click an X in the corner of the portrait.

Posted by: Max at April 6, 2006 11:26 AM

I think a player window that pops up and shows the facial expressions of others is a good idea. The only downside might be that if you are in a group of 5-40 or more, there could be a lot of emotes coming your way, and that could prove difficult to work out. Maybe a way to deal with the computer power issue would be to limit the portraits to people on your friends list, or to some other list of people whose emotes you care about.

I would also like the idea of exaggerated emotions, but the difficulty there again is when you start dealing with larger numbers of people (this is all based on playing WoW, sorry, I haven't tried the others). Maybe a combination of the two would work: a window that appears, and to grab your attention when someone types /love, their eyes become hearts.

BTW: 1st post here, just found the site recently. I really like the work you all are doing.

Posted by: Kyle at April 7, 2006 08:58 AM

Thanks, Kyle and welcome. Yeah, 5-40 close-ups of character faces is not going to work. You could of course just have close-ups for group members (5-6), but this does not support communication with players outside your group.

I was thinking just one close-up for the character you have targeted. If you want to really see someone's facial expressions, you select him or her. This would probably work decently most of the time and would be very manageable. However, it would miss reactions from other characters whom you don't have targeted.

I've been thinking more and more that 1st-Person Perspective (1PP) might be the way to go. If you want to see characters' facial expressions, you just stand close enough to see them like in real life. Now to solve the problem with lack of proprioception, you could simply create a close-up view of your own avatar (i.e., a "paperdoll" view in the corner of your screen). That way you can see when your avatar waves, shrugs, bows, etc. You could also, depending on the size of the close-up, see your own facial expressions. With this approach, you only need ONE close-up view, your own character, instead of multiple. Lack of peripheral vision would still be a problem, but perhaps it would not be such a problem in the context of a conversation. When you start to travel or do combat, you could pop back into 3rd-Person Perspective.

Posted by: Bob at April 14, 2006 04:03 PM

I've often done the "paperdoll" view via EQ2's "inventory" page to preview the emotes and see what others see. It's a bit big on the screen, but serves the purpose Bob mentions.

In SWG, I could comfortably remain in 3rd person view with my character centered, zoomed out far enough to see my feet, and still make out most facial expressions.

Even the "only seeing the target" or "only seeing the group" portraits don't seem too bad. How often do you, as a person, notice ALL the nonverbal expressions going on around you? You notice your friends (group), the focus of your attention (target), and maybe those immediately around your target (missed, in this case).

One method of handling the "portrait mode" would be to use the software to prioritize gestures: first groupmates, then targets, then within x of target, within x of character, etc. A max of 3-5 portraits would be shown at once; the others would be missed (or would only appear in the chat log).
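
That ordering could be sketched as a simple sort. Everything here is an assumption for illustration: the `Emote` fields, the `near` distance standing in for "x", and the default of 4 portraits are all made up.

```python
from dataclasses import dataclass

@dataclass
class Emote:
    actor: str
    in_group: bool = False
    is_target: bool = False
    dist_to_target: float = float("inf")
    dist_to_player: float = float("inf")

def priority(emote, near=10.0):
    """Rank an emote; a lower number means more important."""
    if emote.in_group:
        return 0
    if emote.is_target:
        return 1
    if emote.dist_to_target <= near:
        return 2
    if emote.dist_to_player <= near:
        return 3
    return 4

def choose_portraits(emotes, max_portraits=4, near=10.0):
    """Return (shown, chat_log_only) with the most relevant emotes shown first."""
    # Ties within a priority level go to whoever is closer to the player.
    ranked = sorted(emotes, key=lambda e: (priority(e, near), e.dist_to_player))
    return ranked[:max_portraits], ranked[max_portraits:]

emotes = [
    Emote("stranger", dist_to_target=50, dist_to_player=60),
    Emote("target", is_target=True, dist_to_target=0, dist_to_player=5),
    Emote("groupmate", in_group=True, dist_to_player=30),
]
shown, logged = choose_portraits(emotes, max_portraits=2)
print([e.actor for e in shown], [e.actor for e in logged])
```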

Heck, you could even make it hyperlinked in the chat log, so if someone wants to go back and see it again, they could...

Posted by: Chas York at April 18, 2006 12:24 PM

very interesting conversation going here... it seems the approaches could be plotted on a spectrum with realism at one end and gestural on the other. On one side: using animations, povs, and close-ups to get a tighter view of the face, in other words, higher precision/realism. On the other: developing gestural languages and implementing ways of invoking gestural cues that would communicate not through facial expression but through culturally coded gestures. Both are valid from the perspective of symbolic interaction/performance/speech analysis.
It would be interesting to have a coded version that could capture and analyse game play based on the instances and frequency of gestures: friendly to unfriendly, responsiveness of others, directionality of gesture (to a person or group), etc.

Posted by: adrian Chan at May 20, 2006 05:24 PM

Related idea: Silent films used a combination of exaggerated facial gestures, feature enhancing make-up and "cut aways with text" to overcome the limitations of their medium. Perhaps similar strategies would be applicable here?

Posted by: Mike Wilson at June 8, 2006 06:23 AM

So far, the only solution I've found to limited field of view in first-person is some form of letterboxing (possible in EVE, possible with viewport in WoW). By resorting to a shorter and wider frame you get more of the stuff off to the side, but at the cost of losing screen real estate as well as any extended below/above periphery.
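
For a rough sense of how much a wider frame buys you, here is the standard perspective-projection relation between vertical and horizontal field of view. It assumes the game keeps the vertical FoV fixed as the frame gets wider, which is not true of every engine; the function name is just for illustration.

```python
import math

def horizontal_fov(vertical_fov_deg, aspect_ratio):
    """Horizontal FoV in degrees for a given vertical FoV and width/height ratio.

    Uses hfov = 2 * atan(tan(vfov / 2) * aspect), the usual relation for a
    perspective projection.
    """
    half_vertical = math.radians(vertical_fov_deg) / 2.0
    return math.degrees(2.0 * math.atan(math.tan(half_vertical) * aspect_ratio))

# Same vertical FoV, progressively wider (more letterboxed) frames.
for name, aspect in [("4:3", 4 / 3), ("16:9", 16 / 9), ("2.35:1", 2.35)]:
    print(name, round(horizontal_fov(60, aspect), 1))
```

The horizontal angle grows with the aspect ratio, but more and more slowly, and the edges of the picture get stretched as it does.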

Posted by: Edward at June 8, 2006 08:28 PM

I suppose all of that "blank" space in the letterbox view could be used to hold UI clutter. But in fact I don't think you will still get enough peripheral vision without a weird fisheye effect. What is the field of view on a human? About 120 degrees or so? (I am asking, I have no idea.) How much of a FoV would you need to get over that feeling that you need to constantly turn your head to see things you should be able to see?

With respect to that, I remember a couple of keyboard shortcuts from the old mac-only Marathon games that let you quickly glance to the left and right (Bungie referred to that action as a "vid"). That really helped, as the glance was lightning-quick and did not require you to change the direction you were running/facing/fighting. Perhaps something like that could help to eliminate the FoV problem (once people learned to vid regularly). Or it could just be even more cumbersome... I seem to recall it felt pretty natural after a while.

Posted by: Dave at June 12, 2006 08:12 AM

Eww, I loved Marathon. It was the first FPS I ever played. I think side-glancing is pretty standard in FPSs and other genres (e.g., Grand Theft Auto has it). I've never used it much. I think in most cases, the view just switches abruptly, which I find pretty disorienting. I would think quick panning would be better. Do any games do that?

The silent-film approach is also very interesting, and I think very applicable to virtual worlds. It reminds me of certain tricks in CG movies. I once heard someone from Pixar say that while the human face has 30 points of articulation, for the Incredibles they built the characters' faces with 300! That way they could create facial expressions that were much more exaggerated than real ones. Such displays of emotion nonetheless FEEL very real to the viewer, although obviously they are unrealistic in interesting ways. The fact that the animators even know how many points of articulation are in a human face shows that they've done their homework and that they are deviating from reality in very deliberate ways.

Posted by: Bob at June 20, 2006 01:23 PM

"What is the field of view on a human? About 120 degrees or so?"

Approx 180 degrees. In game a 75-90 degree FOV is often used.

Posted by: Hirebrand at July 2, 2006 08:52 PM
