Hey Google, Fix My Marriage

There’s no denying that Google Assistant is useful for simple, everyday needs that keep users from having to reach for a phone. But what if it could provide more value-added experiences, becoming more intuitive and human-like in the process? Are we that far away from the kind of assistant depicted in Spike Jonze’s Her?

One of the greatest inhibitors of voice adoption is that natural language isn’t ubiquitous, and functionality is typically limited to quick shortcuts. According to Forrester, 46% of adults currently use smart speakers to control their home, and 52% use them to stream audio. Neither of these use cases is a necessity, nor is either particularly unique. Looking beyond shortcuts and entertainment value, we sought to experiment with Google Assistant to highlight a real-world utility that offers more human-like interactions. Think less in terms of “Hey Google, turn on the kitchen lights,” and more in terms of “Hey Google, fix my marriage.”

That’s not a joke. By providing a shoulder to cry on or a mediator who can resolve conflicts while keeping a level head, our internal R&D team, MediaMonks Labs, wanted to push the limits of Google Assistant to see what kind of experiences it could provide to improve users’ lives and interpersonal relationships.

Who would have thought that a better quality of conversation with a machine might help you better speak to other humans? “Most of the stuff on the Assistant is very functional,” says Sander van der Vegte. “It’s almost like an audible button, or something for entertainment. The marriage counselor is neither, but could be implemented as a step before you look for an actual counselor.”

Why Google Assistant?

Google Assistant is an exciting platform for voice thanks to its ability to be called up anytime, anywhere through its close integration with mobile. “Google Assistant is very much an assistant, available to help at any moment of time,” says Joe Mango, Creative Technologist at MediaMonks.

But still, the team felt the platform could go even further in providing experiences that are unique to the voice interface. “Right now, considering the briefings we get, most of the stuff on the assistant is very functional,” says Sander van der Vegte, Head of MediaMonks Labs. “It’s designed to be a shortcut to do something on your phone, like an audible button. This marriage counselor has a completely different function to it.”

The Labs team took note when Amazon challenged developers to design Alexa skills that could hold meaningful conversations with users for 20 minutes, through a program called the Alexa Prize. It offered an excellent opportunity to turn the tables and challenge the Google Assistant platform to see how well it could sustain a social conversation with users, resulting in a unique action that requires the assistant to use active listening and an empathetic approach to help two users see eye to eye.

Breaking the Conversation Mold

As you might imagine, offering this kind of experience required a bit of hacking. To listen and respond to two different people in a conversation, the assistant had to free itself from the typical, transactional exchange that voice assistant dialogue models are designed for. “We had to break all the rules,” says Mango—but all’s fair in love and war, at least for a virtual assistant.

A big example of this is a novel use of the fallback intent. By design, the fallback intent is a response the assistant provides to users when they make a query that isn’t programmed to a response—usually something as simple as asking the user to try to state their request in another way.

But the marriage counselor uses this step to pass the query along to sentiment analysis through a Google Cloud API. There, the statement is scored on how positive or negative it is. By combining this score with a scan of the conversation history for applicable details, the assistant can pull a personalized response. This allows both users to speak freely in an open-ended discussion without being interrupted by errors.
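The flow described above can be sketched roughly as follows. This is a minimal illustration, not the team's actual implementation: the word lists, `score_sentiment`, `Session`, and `handle_fallback` are all hypothetical names, and the tiny lexicon scorer is only a local stand-in for the cloud sentiment service so the example runs on its own.

```python
from dataclasses import dataclass, field

# Hypothetical stand-in for the cloud sentiment service: a tiny word lexicon.
POSITIVE = {"love", "close", "happy", "grateful", "good"}
NEGATIVE = {"distant", "angry", "never", "hate", "tired"}


def score_sentiment(text: str) -> float:
    """Return a score in [-1, 1]; below zero means negative sentiment."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    hits = [1 for w in words if w in POSITIVE] + [-1 for w in words if w in NEGATIVE]
    return sum(hits) / len(hits) if hits else 0.0


@dataclass
class Session:
    # Each entry is (speaker, text, score) for one earlier turn.
    history: list = field(default_factory=list)


def handle_fallback(session: Session, speaker: str, text: str) -> str:
    """Called when no scripted intent matches what the user said."""
    score = score_sentiment(text)
    # Scan the conversation history for something positive this speaker said.
    earlier_positive = [t for s, t, sc in session.history if s == speaker and sc > 0]
    session.history.append((speaker, text, score))
    if score < 0 and earlier_positive:
        return f'That sounds hard. Earlier you said: "{earlier_positive[-1]}". Can you build on that?'
    if score < 0:
        return "That sounds hard. Can you tell your partner how it makes you feel?"
    if score > 0:
        return "It's good to hear that. What makes that work well?"
    return "Go on, I'm listening."
```

The point of the design is that an unmatched statement never produces an error prompt: every utterance is scored, stored, and answered, so the two speakers can keep talking.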

What does such an interaction look like? When a couple tested the marriage counselor action, one user mentioned his relationship with his brothers: some of them were close, but the user felt that he was becoming distant from one of them. In response, the assistant chimed in to remind the user that it was good that he had a series of close relationships to confide in. Its ability to provide a healthy perspective in response to a one-off comment—a comment not even about the user’s romantic relationship, but still relevant to his emotional well-being—was surprising.

The inventive use of the platform allows the assistant to better respond to a user’s perceived emotional state. Google is particularly interesting to experiment with thanks to its advanced voice recognition models, and it built the sentiment analysis framework used within the marriage counseling action. Google’s announcement of Project Euphonia earlier this year, which aims to make voice recognition easier for those with speech impairments, was also a welcome sight for those seeking to make digital experiences more inclusive. “At MediaMonks, we’re finding ways to creatively execute on these frameworks and push them forward,” says Mango.

Giving Digital Assistants the Human Touch

But the marriage counselor action is focused more on listening than speaking, allowing two users to hash it out and doling out advice or prompts when needed. A big part of this process is emotional intelligence. Humans know that the same sentence can have multiple meanings depending on the tone used, as with sarcasm. Another example might be the statement “Only you would think of that,” which could be taken as patronizing or as a compliment depending on tone and context.

While the assistant currently can’t understand tone of voice, a stopgap solution was to enable it to parse meaning through vocabulary and conversational context, helping the assistant understand that it’s not just what you say, but how you say it. This is something that humans pick up on naturally, so Mango drew on linguistics to give the assistant the illusion of emotional intelligence.

“If the assistant moves in this direction, you’ll get a far more valuable user experience,” says van der Vegte. One example of how emotional intelligence can better support the user outside of a counseling context would be if the user asks for directions somewhere in a way that indicates they’re stressed. Realizing that a stressed user who’s in a hurry probably doesn’t want to spend time wrangling with route options, the action could make the choice to provide the fastest route.
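As a rough illustration of the directions example, the sketch below picks the fastest route when the wording of a request suggests the user is in a hurry, and otherwise offers options. Everything here is assumed for illustration: the `STRESS_CUES` word list, the `Route` type, and the `respond` function are hypothetical, and keyword matching is only a crude proxy for real emotion detection.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical vocabulary cues that suggest the user is stressed or rushed.
STRESS_CUES = {"hurry", "late", "asap", "quick", "quickly", "now", "urgent"}


@dataclass
class Route:
    name: str
    minutes: int


def sounds_stressed(request: str) -> bool:
    """True if any word in the request matches a stress cue."""
    words = {w.strip(".,!?").lower() for w in request.split()}
    return bool(words & STRESS_CUES)


def respond(request: str, routes: List[Route]) -> str:
    """Skip the route menu for a stressed user and choose the fastest option."""
    if sounds_stressed(request):
        fastest = min(routes, key=lambda r: r.minutes)
        return f"Taking the fastest route: {fastest.name}, {fastest.minutes} minutes."
    options = ", ".join(f"{r.name} ({r.minutes} min)" for r in routes)
    return f"Here are your options: {options}. Which would you like?"
```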

Next Stop: More Proactive, Responsive Assistants

“There’s always improvements to be made,” says Mango, who sees two ways that Google Assistant could provide even more lifelike and dynamic social conversations. First, he would like to see the assistant support emotion detection in more ways than examining vocabulary. Second, he’d like to make the conversation flow even more responsive and dynamic.

“Right now the conversation is very linear in its series of questions,” he says. But in a best-case scenario, the assistant could provide alternative paths based on user response, customizing each conversation to respond to different underlying issues that the marriage counselor might identify as affecting the relationship.

But for now, the team is excited to tinker and push the envelope on what platforms can achieve, inspired by a sense of technical curiosity and the types of experiences they’d like to see in the world. “It speaks a lot to the mission of what we do at Labs,” said Mango. “We always want to push the limitation of the frameworks out there to provide for new experiences with added value.”

As assistants become better equipped to listen and respond with emotional intelligence, their capabilities will expand to provide better and more engaging user experiences. In a best-case scenario, an assistant might identify user sentiment and use that knowledge to recommend a relevant service, like prompting a tired-sounding user to take a rest. Such an advancement would allow brands to forge a deeper connection to users by providing the right service at the right place and time. While Westworld-level AI is still far off in the distance, we’ll continue chatting and tinkering away at teaching our own bots the fine art of conversation, and we can’t wait to see what they’ll say next.


Transitioning Voice Bots from ‘Book Smart’ to ‘Street Smart’

Interest in voice platforms has grown significantly over the years, and while they have proved life-changing for the visually impaired or those with limited mobility, for many of us the technology’s primary convenience is in saving us the effort of reaching for a phone. Yet we anticipate a future in which voice platforms can provide more natural experiences to users beyond calling up quick bits of information. This ambition has prompted us to look for new ways to provide added value to conversations, making smart use of the tools made readily available by the organizations leading the charge in consumer-facing voice assistant platforms.

The primary challenge in unlocking truly human-like exchanges with virtual assistants is that their dialogue models are best fit for transactional exchanges: you say something, the assistant responds with a prompt for another response, and so on. But we’ve found that brands that are keen on taking advantage of the platform are looking for more than a rigid experience. “There are plenty of requests from clients about assistants, who are under the impression that the user can say whatever,” says Sander van der Vegte, Head of MediaMonks Labs. “What you expect from a human assistant is to speak open-ended and get a response, so it’s natural to assume a digital assistant will react similarly.” But this conversation structure goes against the grain of how these platforms typically work, which means we must find new approaches that better accommodate the experiences that brands seek to provide their users.

Giving Digital Assistants the Human Touch

One way to make conversations with voice assistants more human-like is to empower them with a distinctly human trait: emotional intelligence. MediaMonks Labs is experimenting with this by developing a Google Assistant action that serves as a marriage counselor that uses sentiment analysis to draw out the intent and meaning behind user statements.

To better understand what this looks like, consider how two humans effectively resolve a conflict. Rather than accuse someone of acting a certain way, for example, it’s preferable to use “I messages” about how others’ actions make you feel, so the other party doesn’t feel attacked. So whether you begin a statement with “you” (accusatory) or “I” (garnering empathy) can have a profound impact on how others invested in a conflict will respond. Likewise, our marriage counseling action analyzes the vocabulary and inflection in two users’ statements to dole out relationship advice to them. Responses are focused not just on what they say but how they say it.
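The “I” versus “you” distinction is simple enough to sketch in a few lines. The example below is only an illustration of the idea, not the action's actual logic: `classify_opening` and `coaching_prompt` are hypothetical helpers, and looking at just the first word is a deliberate simplification of real vocabulary analysis.

```python
import re

I_OPENERS = {"i", "i'm", "i've", "i'd", "i'll"}
YOU_OPENERS = {"you", "you're", "you've", "you'd", "you'll"}


def classify_opening(statement: str) -> str:
    """Label a statement by its first word: 'i_message', 'you_statement', or 'other'."""
    match = re.match(r"\s*([A-Za-z']+)", statement)
    if not match:
        return "other"
    first = match.group(1).lower()
    if first in I_OPENERS:
        return "i_message"
    if first in YOU_OPENERS:
        return "you_statement"
    return "other"


def coaching_prompt(statement: str) -> str:
    """Suggest a reframing when a statement opens in an accusatory way."""
    kind = classify_opening(statement)
    if kind == "you_statement":
        return "Try starting with 'I feel...' so your partner doesn't feel attacked."
    if kind == "i_message":
        return "Thank you for sharing how that makes you feel."
    return "Can you say more about that?"
```

For instance, `coaching_prompt("You never listen to me")` returns the reframing suggestion, while a statement that opens with “I feel” is acknowledged as is.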

“We can learn to speak more effectively to an AI, just like how AI learns to speak to us,” says Joe Mango, Creative Technologist at MediaMonks. According to him, users have been conditioned to speak to bots in, well, robotic ways through their experience with them. “When we had someone from our team test the action by simply speaking to it, he wasn’t sure what to say at first.”

Speaking a New Language

The action takes a large departure from the standard conversational setup with a voice bot. Rather than have a back-and-forth chat with a single user, the action listens attentively as two users speak to one another. Allowing Google Assistant to pull off such a feat gets at the heart of why so few actions provide such rich conversational experiences: the inherent limitations of the natural language processing platforms that power them. For example, the Google Assistant breaks conversation down into a “you say this, I say that”-style structure that limits the amount of time it opens the microphone to listen to a user response.

Conventional wisdom surrounding conversational design shies away from “wide-focus” questions, encouraging developers to be as pointed and specific as possible so users can answer in just a word or two. But we think breaking out of this structure is not only feasible, but capable of providing the next big step in richer, more genuine interactions between people and brands. “It speaks a lot to the mission of what we do at Labs,” said Mango. “We always want to push the limitation of the frameworks out there to provide for new experiences with added value.”

Next Stop: More Proactive, Responsive Assistants

While the action is effective, “it’s just the first step down an ongoing path to support more dynamic sentence structures and deeper, richer conversation,” says Mango. While the focus right now is on inflection and vocabulary, future iterations of the action could draw on users’ tone of voice to glean their sentiment even more accurately. From there, findings from this experiment could help give other voice apps a level of emotional intelligence that helps organizations engage with their audience in even more human-like ways.
