This blog is an excerpt of a November 2015 article published on O’Reilly Radar.
In the first post in this series, I mentioned that we’re getting used to talking to technology. We talk to our cell phones, our cars; some of us talk to our TVs, and a lot of us talk to customer support systems. The field has yet to settle into a state of equilibrium, but I thought I would take a stab at defining some categories of conversational interfaces.
There is, of course, quite a range of intelligent assistants, but I want to focus specifically on the different types of conversational interactions we have with technology. You might have an intelligent agent that can arrange meetings, for example, by figuring out attendees’ availability and even sending meeting requests. That’s certainly a useful and intelligent agent, but working with it doesn’t necessarily require any conversational interaction.
Classifying conversational interfaces
As usual with these kinds of things, the boundaries can be fuzzy. So, a particular piece of technology can have aspects of multiple categories, but here’s what I propose.
Voice interfaces: Understand a few set phrases
The most basic speech interactions are simple voice interfaces that let you control devices or software by speaking commands. Generally, these systems have a fixed set of actions. Saying a word or phrase is akin to using a menu system, but instead of clicking on menu items, you speak them. You find these in cars with voice commands and Bluetooth interfaces for making phone calls or playing music. The same kind of system is at work when you call into a phone tree that routes you to a particular department or person. Some of these systems allow for variations in how you say something, but for the most part, they only understand words or phrases from a predefined list.
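To make the menu analogy concrete, here is a minimal Python sketch of such a fixed-phrase interface. It assumes speech has already been transcribed to text (the recognizer itself isn’t shown), and the phrases and action functions are hypothetical stand-ins for whatever a real car or phone tree would do.

```python
from typing import Callable, Optional

# Hypothetical actions; a real system would dial a phone or start a media player here.
def call_contact() -> str:
    return "Dialing..."

def play_music() -> str:
    return "Playing music"

# The entire vocabulary: each accepted phrase (including a few allowed
# variants) maps to exactly one action, much like items in a menu.
COMMANDS: dict[str, Callable[[], str]] = {
    "call": call_contact,
    "make a call": call_contact,
    "play music": play_music,
    "play some music": play_music,
}

def handle_utterance(transcript: str) -> Optional[str]:
    """Run the action for a recognized phrase, or return None if it isn't on the list."""
    # Normalize case, surrounding whitespace, and trailing punctuation before lookup.
    key = transcript.strip().lower().rstrip(".!?")
    action = COMMANDS.get(key)
    return action() if action else None
```

Saying “Play music” works, but “Could you put on a song?” returns None, even though a person would understand it instantly. Everything outside the predefined list simply falls through.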
Chatbots: Make us think we are really talking
Chatbots have an illustrious history, going back to early artificial intelligence programs like ELIZA, which was developed in the 1960s. Chatbots primarily try to engage people in conversation, usually for no purpose beyond the conversation itself. You may run across chatbots on Twitter, in chat rooms, and on certain websites. For the most part, they use tricks to simulate conversation without incorporating much, if any, machine intelligence. ELIZA, for example, worked by recognizing certain patterns in what you typed that it could rephrase and turn around into a reasonable response. It could also recognize a handful of topics, so it seemed to know what you were talking about if you happened to mention one of them. Chatbots sometimes succeed in maintaining the illusion of a conversation over several turns, but eventually the exchange degrades into something that seems less like conversation and more like nonsense.
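The pattern-and-rephrase trick is easy to sketch. The Python below is a simplified, hypothetical illustration in the spirit of ELIZA, not Weizenbaum’s original script: a few regular expressions capture part of the user’s sentence, first- and second-person words are swapped, and the fragment is slotted into a canned template. A keyword rule stands in for ELIZA’s topic recognition.

```python
import random
import re

# Swap first- and second-person words so a captured fragment can be echoed back.
REFLECTIONS = {
    "i": "you", "me": "you", "my": "your", "am": "are",
    "you": "me", "your": "my", "myself": "yourself", "yourself": "myself",
}

# (pattern, response templates) pairs, tried in order. {0} is filled with the
# reflected text captured by the pattern's group, if it has one.
RULES = [
    (re.compile(r"i need (.*)", re.I),
     ["Why do you need {0}?", "Would it really help you to get {0}?"]),
    (re.compile(r"i am (.*)", re.I),
     ["How long have you been {0}?", "Why do you say you are {0}?"]),
    # A "topic" keyword: any mention of mother triggers a family prompt.
    (re.compile(r".*\bmother\b.*", re.I), ["Tell me more about your family."]),
]
# Fallbacks used when nothing matches, which is where the illusion starts to wear thin.
DEFAULTS = ["Please go on.", "I see. Tell me more."]

def reflect(fragment: str) -> str:
    """Turn 'worried about my job' into 'worried about your job'."""
    return " ".join(REFLECTIONS.get(word, word) for word in fragment.lower().split())

def respond(statement: str) -> str:
    """Return a canned response built from the first rule that matches."""
    text = statement.strip().rstrip(".!?")
    for pattern, templates in RULES:
        match = pattern.fullmatch(text)
        if match:
            fragments = [reflect(group) for group in match.groups()]
            return random.choice(templates).format(*fragments)
    return random.choice(DEFAULTS)
```

Typing “I am worried about my job” yields one of the templates, such as “How long have you been worried about your job?” Nothing in the code knows what a job or worry is, which is why longer exchanges with this kind of program eventually stop making sense.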
Personal assistants: Anticipate our every need
We find more modern technology in personal assistants like Siri, Google Now, and Cortana. These applications aim to increase your productivity and keep your life organized. Personal assistants try to know a lot about you. They have access to information like your contacts and calendar, and they’ll also try to answer arbitrary questions, like “What’s the capital of Wisconsin?” (Siri tells me it’s Madison.) Personal assistants are still a bit unreliable and sometimes awkward to work with, but their ambitions are clear: each of us gets a capable and efficient executive assistant who can anticipate our every need. It’s also worth noting that current personal assistants don’t enable anything you can’t already do on your phone. However, they do provide a single point of entry for several functions, and we can expect their capabilities to expand as conversational user interfaces come into their own.
Virtual agents: Taking care of business
Virtual agents are similar to personal assistants, but they tend to be more focused on accomplishing a specific task. Enterprises deploy them to automate customer interactions or to give their customers a self-help tool. One example is a system that lets you make a hotel reservation. Enterprise virtual agents know about, and have access to, an organization’s knowledge and processes, and they are increasingly used for customer care. Another important distinction is that whereas personal assistants try to cover a broad, general body of knowledge, virtual agents draw on a more targeted knowledge base.
Persuasive agents: Encourage us to act
In all of my examples so far, the agents are there to do our bidding, but we can expect that to change as the technologies, and companies’ use of them, become more sophisticated. Another type of agent I’m sure will be added to the list is the persuasive agent. Don’t be surprised if, after a call center agent helps you solve a problem, it tries to upsell you on the company’s latest offerings. But persuasive agents need not be purely profit driven. Health insurance companies might use them to nudge us toward healthy behaviors like exercising more. We might even want our own personal agents to help us stick to a diet.
Like I said, there aren’t necessarily clearly defined lines between any of these, and systems can span boundaries. The Amazon Echo, with its Alexa assistant, for example, provides basic voice control to play music or set a timer, but it’s also happy to answer questions like a personal assistant. Naturally, there are other ways to categorize agents. You might group them by communication mode: voice, text only, or animated avatars. But I prefer to draw the distinctions according to high-level goals, discourse and language capabilities, and the knowledge sources the technology has access to. The knowledge an agent possesses determines the breadth of its intelligence; however, there is a trade-off between the breadth of that knowledge and the scope of its language capability. Narrowing the domain can allow an agent to sound more natural within its field of expertise, even if it does limit the possible subject matter.
It’s possible that the lines will blur even more as the technology matures and systems become more capable, but my guess is that we’ll continue to see distinct delineations. Airlines have a strong motivation to provide an efficient, easy-to-use reservation system that can scale at low cost: a perfect use case for a virtual agent. The carriers want technology that can provide information from their knowledge bases and access their reservation systems, and they will obviously want to control that technology themselves.
On the other hand, a particular individual may or may not care about having an automatic reservation agent.
Which conversations should overlap?
But the bigger question is which technology should have access to what information. I might let my own personal assistant automatically add a contact or schedule an appointment for me, but I’m not likely to give the companies I do business with the same control. The more my personal assistant can do for me, the more access and control over my information it needs.
But here’s the cool thing: even if these agents are completely separate, they can talk to each other. If my personal assistant already knows all about my plans to attend a conference, I can hand the tedious exercise of finding the right flight over to it. It can figure out the arrival time I need at my destination, track my loyalty points, and select my meal preferences. I already trust it with all this information. The airlines don’t need to open their reservation systems to every personal assistant in the world. They already have a virtual agent that understands their range of options and can book the reservation. I just tell my personal assistant to make the arrangements. It talks to the airline’s virtual agent, and together they work out my best option. It makes the reservations and later reminds me to start packing my bag.
Can we trust our stuff to talk for us?
This, of course, does not solve all the problems of privacy and trust. I might want my personal assistant to share my salary information when it’s filling out a loan application on my behalf, but not when it’s ordering my favorite takeout meal. How will it know the difference? That’s an excellent question, and it shows how much work remains to be done.
By Kyle Dent, Research Manager for the Conversational Human-Agent Technology research area at PARC