November 04, 2005

10 Things About Conversation in Virtual Worlds...

Here's a synopsis of the talk I gave recently at the Austin Game Conference on avatar-to-avatar communication...

Bob Moore, Nic Ducheneaut & Eric Nickell. "10 Things About Conversation in Virtual Worlds that Remind Me that I'm NOT in the Real World: Improving Interactional Realism in Massively Multiplayer Persistent Worlds." Austin Game Conference, Austin, TX, October 28, 2005.

Although massively multiplayer virtual worlds have made great strides in achieving visual realism (i.e., through detailed 3D models, lighting and physics simulation, motion capture, etc.), they are much less sophisticated in terms of interactional realism, or the simulation of face-to-face interaction. Developers of MMOs are starting to grapple with fundamental questions of how ordinary conversation works as a system and how it should be modeled.

[Figure: Human bodies doing ordinary activities]

[Figure: Which activity is this player doing? (EQ2)]

As a player of MMORPGs and virtual worlds, I routinely experience a state of immersion and connection when interacting with other players. However, there are many occasions on which this immersion is broken when the system seems to do the wrong thing. There is some slippage or awkwardness in the interaction that draws attention to the limitations of the system and reminds me that I'm not in a real-life conversation. The following are 10 features of avatar interaction systems that reduce interactional realism, plus 10 tips for increasing it.

Avatars...
1. Stand and do nothing
2. Don't speak in real time
3. Use telepathy
4. Look the wrong way
5. Stare at each other
6. Hide the player's gaze
7. Lack free gesticulation
8. Gesture for fixed durations
9. Don't tightly coordinate gestures and talk
10. Lack usable facial expressions

Avatars could...
1. Display embodied actions
2. Speak in real time
3. Give IM busy signals
4. Look at the speaker
5. Look away when speaking
6. Reveal player's gaze
7. Gesticulate freely
8. Hold gestures
9. Tightly coordinate gestures and talk
10. Have visible facial expressions

Each of these points is elaborated below.

Avatars...
1. Stand and do nothing: Many ordinary activities--looking through a bag, consulting a map, reading a book, trading items, talking with a friend remotely--are hidden from the public eye. This makes avatars appear lifeless even when the player is quite active. It also makes it difficult for players to manage these private activities with joint activities (e.g., looking through a bag and leaving the scene together with another player).
2. Don't speak in real time: Text-chat systems in virtual worlds, with the exception of There, hide the composition of a turn from the public eye. As a result, players cannot predictably achieve one-speaker-at-a-time, one-topic-at-a-time, or tight coordination (minimal gap and overlap between turns).
3. Use telepathy: Players can chat with anyone in the world at any time. At times a player can be bombarded with multiple messages at once ("tell hell"). There's no way for a remote "caller" to know if a recipient is already engaged in one or more conversations.
4. Look the wrong way: Some interaction systems don't enable avatars to turn their heads semi-independently of their shoulders. Consequently avatars cannot be made to use eye contact in a multiparty conversation in a natural way.
5. Stare at each other: In the better eye gaze systems (e.g., EverQuest II), avatars tend to make eye contact at the right times, but they also tend to stare at each other. (In real life, people stare at each other in order to either threaten or flirt.)
6. Hide the player's gaze: Most avatar systems enable the player to decouple her view from the avatar's. Players can zoom out and pan 360 degrees. While this helps mitigate problems with the lack of peripheral vision, it also means that you never know what another player can see or where she is looking. This can make the coordination of gestures difficult.
7. Lack free gesticulation: All avatar systems I've seen in games implement gesture by giving players a list (short or long) of pre-defined gestures from which to choose. As a result, some forms of gesture are not possible, such as those used to describe objects by simulating their shape, spatial relationships, and motion ("iconics"). Also, long lists of gestures are hard for players to learn.
8. Gesture for fixed durations: All avatar systems I've seen in games limit the duration of the pre-defined gestures to a fixed period. This makes it difficult for players to coordinate gestures with other players. They cannot "hold" a gesture until they can see that the recipient has seen it and has understood.
9. Don't tightly coordinate gestures and talk: In current avatar gesture systems, most gestures and text chat must be done as separate turns. As a result, gestures cannot be precisely timed to coincide with particular keywords in the chat. While this is not a problem for gestures that can perform an action on their own ("emblems" such as waves, nods, and shrugs), it makes gestures that depend on talk for their meaning difficult to perform. These include gestures used for referring or pointing ("deictics"), emphasizing ("beats"), and describing ("iconics").
10. Lack usable facial expressions: Some avatar systems implement no facial expression at all. Others offer a wide array of facial animations; however, these are often too difficult to see because players tend to zoom out their view. Yet zooming out itself is critical since it is the only way to really know what your avatar is doing.

Interactional realism in current MMOs could be increased by having avatars...
1. Display embodied actions: player opens bag, avatar looks through a bag; player opens map, avatar studies a map...
2. Speak in real time: post chat on a word-by-word or character-by-character basis (There is the model)
3. Give IM busy signals: when player is in a conversation, private messages from new speakers receive an automatic "busy" message
4. Look at the speaker: player clicks on other avatar to establish eye contact (as in Star Wars Galaxies or EverQuest II)
5. Look away when speaking: when typing, avatar looks at recipient(s) only intermittently
6. Reveal player's gaze: "not looking" indicator appears when player's view is too divergent from avatar's
7. Gesticulate freely: real-time motion capture using a camera enables players to use their own bodies to gesticulate freely
8. Hold gestures: player can 'hold' a pre-defined gesture by holding down the enter-key upon executing the gesture (user-controlled duration)
9. Tightly coordinate gestures and talk: player can tie a gesture to a particular word in the chat
10. Have visible facial expressions: a close-up view of an avatar's face appears when selected
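
As a rough illustration of tips 2 and 9 together, here is a minimal sketch in Python. It is not how any existing game does it: the `[gesture]word` markup, the `parse_turn` and `play_turn` functions, and the action tuples are all hypothetical. The player marks a word with a gesture, and the client posts the turn word by word, firing the gesture just as the marked word appears.

```python
import re

# Hypothetical markup: "[point]there" means "play the 'point' gesture
# as the word 'there' is posted". No real chat client is assumed.
MARKER = re.compile(r"^\[(\w+)\](.+)$")

def parse_turn(text):
    """Split a chat turn into (word, gesture_or_None) pairs."""
    events = []
    for token in text.split():
        match = MARKER.match(token)
        if match:
            gesture, word = match.groups()
            events.append((word, gesture))
        else:
            events.append((token, None))
    return events

def play_turn(text):
    """Return the ordered actions a client would perform for one turn."""
    actions = []
    for word, gesture in parse_turn(text):
        if gesture is not None:
            # The gesture starts immediately before its word is posted.
            actions.append(("gesture", gesture))
        actions.append(("say", word))
    return actions

print(play_turn("put it over [point]there please"))
```

Because the gesture is attached to a word rather than sent as its own turn, a pointing gesture lands on "there" instead of arriving before or after the whole sentence.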

For more on the organization of talk, gesture, eye gaze, and facial expression in real-life face-to-face interaction, see the following scholars: Paul Ekman, Charles Goodwin, Gail Jefferson, Adam Kendon, David McNeill, Harvey Sacks, and Emanuel Schegloff.

Posted by Bob

Posted at November 4, 2005 05:10 PM

Trackback Pings

Listed below are links to weblogs that reference 10 Things About Conversation in Virtual Worlds...:

» Avatar realism from N=1: Population of One
Thinking of implementing emotions in virtual reality has me looking at how MMOGs have done it. So [Read More]

Tracked on November 8, 2005 11:21 AM

» 10 Things About Conversation in Virtual Worlds that Remind Me that I'm NOT in the Real World from Guardian Unlimited: Gamesblog
What are the avatar foibles that keep players from fully immersing in virtual worlds, and what does it mean for them when they're implemented? [Read More]

Tracked on November 11, 2005 01:58 AM

» PlayOn: 10 Things About Conversation in Virtual Worlds… from The Average Gamer
Over at PlayOn, Bob said that conversations between game avatars are still too limiting to be truly immersive. Most of them are great ideas but I think he focuses too much on replicating RL conversation and ignores the limitations of the player screen... [Read More]

Tracked on November 11, 2005 02:52 AM

» Conference Postcards from The Ant Nest
Two snippets from recent games conferences that caught my eye: How unexpressive are in-game avatars, compared to real people? I’m sure this could be put to use outside of an MMO context too - I can’t remember the last time an NPC complaine... [Read More]

Tracked on November 12, 2005 11:49 AM

» AGC: Bob Moore’s talk on avatars from Raph's Website
This was my favorite talk at AGC, and Bob has posted it on the PlayOn blog. ... [Read More]

Tracked on November 14, 2005 12:50 PM

Comments

Good thoughts here. Communication in VWs is often spoken of as if it's as rich as communication in real life, but that's just not the case, as you illustrate. Except in There.com, it's not a synchronous medium, and loses a lot of the power that synchronous communication has. One way players in some games have overcome part of this problem, of course, is with TeamSpeak and other VoIP tools. These have become indispensable in a game like EVE, where large numbers of players have to be tightly coordinated, but that's a slightly different story; that's specifically military-oriented communication, not the kind of "enhanced chat room" communication you're talking about here.

That said, it should be pointed out that two things on your second list (8. Hold gestures, and 9. Tightly coordinate gestures and talk) are already possible and have been implemented in Second Life. Residents can design or buy custom animations and poses that have adjustable durations, and can tie those things to words appearing in their chat.

Posted by: Mark Wallace at November 5, 2005 07:17 AM

Mark, thanks for your comments. Second Life is indeed an interesting place since players can create features that the devs didn't think of/implement. For example, I've seen player-created cell phones in SL that come complete with the animation for holding it to the avatar's ear and a busy message that says "Sorry I'm currently in an IM" when you go into Busy or Away mode. This is nearly what I want except I want the phone to be automatically activated when I open an IM window.

In terms of the durations of avatar animations in SL, I think you're referring to the fact that users can 'edit' an animation and adjust the duration (correct me if I'm wrong). I'm talking about something different. By "holding" the gesture, I mean adjusting the duration on the fly, rather than in advance, in response to what the recipient of the gesture is doing at that moment. If it appears that you're not noticing my pointing, I just want to be able to hold down a key until you do notice. I won't have time to open the animation editor and increase the duration of my pointing.
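
A minimal sketch of this kind of on-the-fly holding, in Python. The `HeldGesture` class, its method names, and the one-second minimum are assumptions made for illustration, not part of any game's API: the gesture lasts as long as the key is down, with a floor so that a quick tap still reads as a gesture.

```python
class HeldGesture:
    """Tracks a gesture whose duration is set by how long a key is held."""

    def __init__(self, name, min_seconds=1.0):
        self.name = name
        # Assumed floor so that a quick tap is still visible to others.
        self.min_seconds = min_seconds
        self.started_at = None

    def key_down(self, now):
        """Start the gesture at time `now` (seconds)."""
        self.started_at = now

    def key_up(self, now):
        """End the gesture and return how long it should have been shown."""
        if self.started_at is None:
            raise ValueError("key_up without key_down")
        held = now - self.started_at
        self.started_at = None
        return max(held, self.min_seconds)

point = HeldGesture("point")
point.key_down(10.0)
print(point.key_up(14.5))  # held until the recipient noticed
```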

Posted by: Bob at November 5, 2005 08:47 AM

Ooh, that's a great idea for a feature, Bob, and you're right, I didn't get that on my first read.

It is possible to do something like this in SL, but it involves a "start/end" dialogue box rather than just a key-press, which would obviously be an easier way to work it. Also, you can turn animations on/off through slash commands entered on the chat line (that don't show up in chat). But you're right, there's no seamless way to do what you describe at this point, and it would be a great feature.

Posted by: Mark Wallace at November 6, 2005 08:00 AM

Hey Bob,

I certainly think your ideas are worthy of consideration, and are of much interest in the HCI community. Indeed, gaze and gesture are important for non-2D interfaces. Items 1, 2, 4, 5, 6 and 8 really ought to be standard in virtual environments, and it's a shame that so many good VEs lack such features.

But I think that some of these ideas are actually counter-intuitive for working within a virtual environment. For example, tracking multiple conversations in real life is an intensive task, but can be made exceptionally simple in a VE through the use of multiple chat panes, chat histories, and filters. A "busy" signal is almost irrelevant, because tracking when a user is involved in a conversation is no longer a binary operation, and best left to the user to handle.

Another issue is one of detail. Screen interfaces are unbelievably limited for the human eye to track details of expressional nuance whilst still trying to make sense of a wider visual scene. The decoupling of the user's view from the avatar's is a natural extension of this limited resolution.

With large surround screens capable of imparting more information, the ability to display facial expressions will take on much greater meaning; until then, the only real method will be either to constrain the visual field to an avatar with which the user is interacting, or to provide a close-up in another pane. Of course, either of these solutions is only of use for two-way conversation, and would break down in multi-party interaction.

It's worth noting that not a single player of EQ2 I know has found the facial expressions of any use, mostly because they normally employ a wider visual field that doesn't allow such details to be resolved. It's a neat gimmick that everyone comments on the first time they zoom in upon another player, but it quickly fades from notice.

Real-time motion capture is a desirable method of human-computer interaction, but within the context of a real-time virtual environment requires such processing and network overheads that it would destroy the interactivity of any simulation. I'd point to Second Life as a perfect example of how even pre-recorded motion capture is transmitted so poorly that it destroys any immersive aspects of a VE. Transmitting real-time motion capture to every client within visible range will be a difficult technical challenge to overcome, and will lag far behind the expressive capacity of pre-recorded capture.

Posted by: Seb Potter at November 7, 2005 10:53 PM

There's a pertinent article at Cognitive Daily on why we look away during a conversation.

A couple of interesting points from the article:

  • People (well, kids actually in the experiment) look away more often in face-to-face conversations than in a real-time video link; so maybe avatars staring at each other isn't too bad.
  • It seems people look away mostly because they're self-conscious or embarrassed OR because they're trying to concentrate on a difficult question.

Not sure how we could simulate this in avatars, since it's pretty difficult to judge when the person is self-conscious or trying to concentrate.

Posted by: Sylvie Noel at November 9, 2005 07:30 AM

Look at the speaker

Back in WoW's beta the characters looked toward their targets, as happens in the other examples.

I still wonder why they removed that feature. It was really cool.

On other minor details: WoW did already do something with the chat bubbles and some standard graphical emotes when speaking in public or yelling.

If I were a dev I'd focus more on the physical perception of the body (better controls, movement, consistency/solidity, etc.) rather than the communication.

Posted by: Abalieno at November 14, 2005 03:21 PM

What I'd really like is for vendor NPCs to actually have tables, cabinets, and other assorted paraphernalia associated with a shop. Nothing breaks immersion like a guy standing around with a few metric tons of iron ore strapped to his belt :)

Posted by: Darniaq at November 14, 2005 06:41 PM

(Sylvie, sorry for the long delay in posting your comment. Our filter thought your post looked too much like a comment bot. :P)

Posted by: Eric Nickell at November 16, 2005 04:21 PM

Sylvie, thanks for the link! That's an interesting study. My main point is simply that having avatars *lock* eye gaze when each is selected (e.g., SWG or EQII) doesn't look and feel like face-to-face conversation. It would look more "natural" if the avatars automatically looked away briefly and periodically. And studies suggest that the *speakers* should look away more often than the listeners. Thanks.
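
A minimal sketch of such automatic look-aways, in Python. The `gaze_schedule` function and the away fractions are illustrative assumptions, not values taken from any study; the only property carried over from the point above is that the speaker looks away more than the listener.

```python
import random

# Assumed fractions of time spent looking away. Illustrative values only.
AWAY_FRACTION = {"speaker": 0.5, "listener": 0.25}

def gaze_schedule(role, total_seconds, glance_seconds=1.0, seed=0):
    """Return (start, end, 'at' | 'away') intervals covering a turn."""
    rng = random.Random(seed)
    schedule = []
    t = 0.0
    while t < total_seconds:
        end = min(t + glance_seconds, total_seconds)
        state = "away" if rng.random() < AWAY_FRACTION[role] else "at"
        if schedule and schedule[-1][2] == state:
            # Extend the previous interval instead of adding a duplicate state.
            schedule[-1] = (schedule[-1][0], end, state)
        else:
            schedule.append((t, end, state))
        t = end
    return schedule

def away_time(schedule):
    """Total seconds spent looking away."""
    return sum(end - start for start, end, state in schedule if state == "away")

speaker = gaze_schedule("speaker", 30.0, seed=1)
listener = gaze_schedule("listener", 30.0, seed=1)
print(away_time(speaker), away_time(listener))
```

A client could feed these intervals to the head and eye animation so that selected avatars break eye contact briefly instead of locking gaze.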


Posted by: Bob at November 21, 2005 10:04 AM

Hi Seb, thank you for all your comments. Let me clarify a few points.

People in HCI and CSCW working on "awareness" are trying to tackle the problem of managing "interruptions" from IM. That is, developing ways of telling you that the person you're about to send an IM is currently busy with some activity. (You can still send the IM if you choose, but at least you know better what to expect in terms of a response.) I'm suggesting that such "awareness" information is just as useful in virtual worlds, and it is somewhat easier to implement because in many cases the system already knows what the other player is busy doing.

Take those situations in which a friend of yours can see that you are logged on and that you are not replying to his IMs. If the system tells him that you're "currently in another conversation," it gives him more clues as to why you're ignoring him. (Note, the system could still let his message through, but it does the work of telling him "hang on a sec" for you.)
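
A minimal sketch of that behavior, in Python. The `Player` class, the `send_tell` function, and the wording of the notice are hypothetical: the message is always delivered, and the sender gets an automatic notice only when the recipient is already talking with someone else.

```python
class Player:
    def __init__(self, name):
        self.name = name
        # Name of the player currently being chatted with, if any.
        self.active_partner = None
        self.inbox = []

def send_tell(sender, recipient, text):
    """Deliver a private message; return an automatic notice or None."""
    # The message always goes through; the notice is only extra awareness.
    recipient.inbox.append((sender.name, text))
    busy = (recipient.active_partner is not None
            and recipient.active_partner != sender.name)
    if busy:
        return f"{recipient.name} is currently in another conversation."
    return None

alice, ben, cara = Player("Alice"), Player("Ben"), Player("Cara")
alice.active_partner = "Ben"
print(send_tell(cara, alice, "Got a minute?"))
print(send_tell(ben, alice, "As I was saying..."))
```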

I totally agree that with today's virtual worlds, players *should* be given the power to decouple their view from the avatar and in many cases "fix" their view when important things are occluded. My point is simply to provide an indicator to tell other players when I'm "not looking" so they won't bother to do a gesture or something in front of my avatar that I won't see.
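
One way such an indicator could be computed, sketched in Python. The function names and the 60-degree threshold are assumptions, and only horizontal heading (yaw) is compared; a fuller version might also consider how far the camera is zoomed out.

```python
def angle_between(yaw_a, yaw_b):
    """Smallest absolute difference between two headings, in degrees (0 to 180)."""
    diff = abs(yaw_a - yaw_b) % 360.0
    return min(diff, 360.0 - diff)

def not_looking(avatar_yaw, camera_yaw, threshold_degrees=60.0):
    """True when the player's view diverges too far from the avatar's facing."""
    return angle_between(avatar_yaw, camera_yaw) > threshold_degrees

print(not_looking(avatar_yaw=0.0, camera_yaw=180.0))  # camera behind the avatar's back
print(not_looking(avatar_yaw=350.0, camera_yaw=10.0))  # only 20 degrees apart
```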

Do you mean SWG? EQII replaces facial expressions with gross bodily gestures. For example, "/smile" makes your avatar swing its arm in a "ho hum" sort of way. In SWG, where there are lots of facial animations, I find that people tend to use the text emotes (e.g., "Bob smiles at Seb") that go with the facial expression, but they generally can't see the animations themselves.

Finally, I don't know how much real-time mo-cap would slow down an MMORPG today, but I hear the technology is coming in the near future. I simply want to point out that *if* you ever want to enable players to gesture *freely,* you'll probably need something like real-time mo-cap. The other alternative is complex controls for manipulating every part of the avatar which is probably too hard for users.

Thanks for your feedback!

Posted by: Bob at November 21, 2005 10:37 AM

Seb,
One more thing: separate "chat panes" do indeed make it easier for players to manage multiple conversations simultaneously, as you point out. The problem is, most MMORPGs don't use them. (I've only seen them in Second Life and There.) For example, WoW, SWG and EQII mix up all of the private "tells" in the same chat pane.

Also, the fact that MMORPG lingo includes the term "tell hell" to refer to occasions on which you are overwhelmed with managing multiple private conversations strongly suggests that players at times find this annoying.

Posted by: Bob at November 21, 2005 12:26 PM

In my experience, There avatars support 8 out of 10 of the points above. Only points 6 and 8 are not supported. My experience chatting using voice in There has been way more immersive than any other avatar-based interaction I have had in any other virtual world.

Posted by: Adam Siegel at December 13, 2005 05:08 PM

Voice recognition software is good enough nowadays -- why not incorporate that into the "tells"?

Posted by: Matt at January 17, 2006 02:00 PM
