For Creatives and AI, It Takes Two to Tango

4 min read

Written by
Labs.Monks

Chances are, you’ve seen the meme before: “I forced a bot to watch over 1,000 hours of [TV show] and then asked it to write an episode of its own. Here is the first page,” followed by a nonsensical script. These memes are funny and quirky for their surreal and unintelligible output, but in the past couple of years, AI has improved enough to produce some genuinely impressive work, like OpenAI’s language model that can write coherent text and answer reading comprehension questions.

AI has picked up a handful of creative talents: making original music in the style of famous artists or turning your selfie into a classical portrait, to name a few. While these experiments are very impressive, they’re often toy examples designed to demonstrate how well (or poorly) an artificial intelligence stacks up to human creativity. They’re fun, but not very practical for day-to-day use by creatives. This led our R&D team, MediaMonks Labs, to consider how tools like these would actually function within a MediaMonks project.

This question fueled two years of experimentation and neural network training for the Labs team, who built a series of machine learning-enhanced music video animations that demonstrate true creative symbiosis between humans and machines, in which a 3D human figure performs a dance developed entirely by (or in collaboration with) artificial intelligence.

The Simulation Series was built out of a desire to let humans take a more active approach to working creatively with AI, controlling the output either by stitching together AI-created dance moves or by shooting and editing the digital performance as they see fit. This means you don’t have to be a pro at animation (or choreography) to make an impressive video; simply let the machine render a series of dance clips based on an audio track and edit the output to your liking.
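
To make that workflow concrete, here is a minimal sketch of what the generation step could look like in Python. The model file, audio features and output format are all assumptions made for illustration; the actual Labs pipeline isn’t public.

```python
# Minimal sketch: generate candidate dance clips from an audio track.
# Assumptions (not from the original project): a pretrained Keras model
# "dance_model.h5" that maps per-frame audio features to joint rotations,
# and a downstream Unity importer that reads the saved .npy clips.
import os
import numpy as np
import librosa
import tensorflow as tf

def audio_features(path, fps=30):
    """Extract per-frame MFCC features aligned to the animation frame rate."""
    audio, sr = librosa.load(path, sr=22050)
    hop = sr // fps  # one feature column per animation frame
    mfcc = librosa.feature.mfcc(y=audio, sr=sr, n_mfcc=20, hop_length=hop)
    return mfcc.T  # shape: (n_frames, 20)

def generate_clips(track, n_clips=20, out_dir="clips"):
    os.makedirs(out_dir, exist_ok=True)
    model = tf.keras.models.load_model("dance_model.h5")  # hypothetical model
    feats = audio_features(track)
    for i in range(n_clips):
        noise = np.random.normal(scale=0.1, size=feats.shape)  # vary each take
        poses = model.predict(feats + noise)  # per-frame joint rotations
        np.save(f"{out_dir}/dance_{i:02d}.npy", poses)

generate_clips("source_song.wav")
```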

“Once I had the animations I liked, I could put it in Unity and could shoot them from the camera angles that I wanted, or rapidly change the entire art direction,” says Samuel Snider-Held. A Creative Technologist at MediaMonks, he led the development of the machine learning agent. “That was when I felt like all these ideas were coming together, that you can use the machine learning agent to try out a lot of different dances over and over and then have a lot of control over the final output.” Snider-Held says that it takes about an hour for the agent to generate 20 different dances—far outpacing the amount of time that it would take for a human to design and render the same volume.

Snider-Held isn’t an animator, but his tool gives anyone the opportunity to organically create, shoot and edit their own unique video with nothing but a source song and Unity. He jokes when he says: “I spent two years researching the best machine learning approaches geared towards animation. If I spent two years to learn animation instead, would I be at the same level?” It’s tough to say, though Snider-Held and the Labs team have accomplished much over those two years of exhaustive, iterative development—from filling virtual landscapes with AI-designed vegetation to more rudimentary forms of AI-generated dances in pursuit of human-machine collaboration.

Enhancing Creative Intelligence with Artificial Intelligence

Even though the tool fulfills the role of an animator, the AI isn’t meant to replace anyone—rather, it aims to augment creatives’ abilities and enable them to do their work even better, much like how Adobe Creative Cloud eases the creative process of design and image editing. Creative machines help us think and explore vast creative possibilities in shorter amounts of time.

It’s within this process of developing the nuts and bolts that AI can be most helpful, laying a groundwork that provides creatives a series of options to refine and perfect. “We want to focus on the intermediate step where the neural network isn’t doing the whole thing in one go,” Snider-Held says. “We want the composition and blocking, and then we can stylize it how we want.”

Monk Thoughts: “The tool’s glitchy aesthetic sells the ‘otherness’ to it. It doesn’t just enhance your productivity, it can enhance the limits of your imagination.” —Samuel Snider-Held

It’s easy to see how AI’s ability to generate a high volume of work could help a team take on projects that otherwise didn’t seem feasible at cost and scale—like generating a massive amount of hand-drawn illustrations in a short turnaround. But when it comes to neural network-enhanced creativity, Snider-Held is more excited about exploring an entirely new creative genre that perhaps couldn’t exist without machines.

“It’s like a reverse Turing test,” he says, referencing the famous test by computer scientist Alan Turing in which an interrogator must guess whether their conversation partner is human or machine. “The tool’s glitchy aesthetic sells the ‘otherness’ to it. It doesn’t just enhance your productivity, it can enhance the limits of your imagination. With AI, we can create new aesthetics that you couldn’t create otherwise, and paired with a really experimental client, we can do amazing things.”

Google’s NSynth Super is a good example of how machine learning can be used to offer something creatively unprecedented: the synthesizer combines source sounds into entirely new ones that humans have never heard before. Likewise, an artificial intelligence tool that automatically renders an AI-choreographed dance can unlock surreal new creative possibilities that a traditional director or animator likely wouldn’t have envisioned.

In the spirit of collaboration, it will be interesting to see what humans and machines create together in the near and distant future—and how it will further transform the ways that creative teams will function. But for now, we’ll enjoy seeing humans and their AI collaborators dance virtually in simpatico.


When Speed is Key, MediaMonks Labs Enables Swift, Proactive AI Prototyping

4 min read

Written by
Labs.Monks

As the COVID-19 pandemic spreads throughout the world and people retreat into their homes to practice social distancing, ingenuity and the need to digitally transform have become more apparent than ever. Always looking for ways to jump-start innovation, the MediaMonks Labs team has experimented with ways to speed up the development of machine learning-based solutions from prototype to end product, cutting out unnecessary hours of coding to iterate at speed.

“Mental fortitude and being used to curveballs are skills and ways of working that come to the foreground now,” says Geert Eichhorn, Innovation Director at MediaMonks. “We see those eager to adapt come out on top.” Proactively aiming to solve the challenges faced by brands and their everyday audiences, the team recently experimented with a faster way to build and iterate artificial intelligence-driven products and services.

Fun Experiments Can Lead to Proactive Value

The idea behind one such experiment, the Canteen Counter, may seem silly on the surface: determine when the office canteen is less busy, helping the team find the optimal time to go and grab a seat. But the technology behind it provides some learnings for those who aim to solve challenges quickly with off-the-shelf tools.

Here’s how it works. The Canteen Counter’s camera was pointed at the salad bar, capturing the walkway from the entrance to the dishwashers—the most crowded spot in the canteen. The machine learning model detects people in the frame and keeps a count of how many are there to determine when it’s busy and when it isn’t—much like how business listings on Google Maps predict peak versus off-peak hours.
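
As a rough illustration of that counting logic (not the team’s actual code, which ran an optimized TensorFlow model on Coral hardware, described later in this piece), a webcam loop using OpenCV’s stock pedestrian detector might look like this:

```python
# Illustrative sketch only: count people in a camera frame and log how busy
# the space is over time. The real Canteen Counter used a TensorFlow model
# on a Coral Edge TPU; here we substitute OpenCV's built-in HOG person
# detector so the sketch runs on any laptop webcam.
import time
import cv2

hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())

BUSY_THRESHOLD = 5  # assumed cutoff for "busy"; tune for the actual space

cap = cv2.VideoCapture(0)  # camera pointed at the walkway
try:
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        boxes, _ = hog.detectMultiScale(frame, winStride=(8, 8))
        count = len(boxes)
        status = "busy" if count >= BUSY_THRESHOLD else "quiet"
        print(f"{time.strftime('%H:%M:%S')}  people={count}  canteen is {status}")
        time.sleep(5)  # sample every few seconds to build an occupancy history
finally:
    cap.release()
```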


Of course, now that the team is working from home, there’s little need to keep an eye on the canteen. But one could imagine a similar tool to determine in real time which spaces are safe for social distancing, measured from afar. Is the local park empty enough for some fresh air and exercise? Is the grocery store packed? Ask the AI before you leave!

“I would like to make something that is helpful to people being affected by COVID-19 next,” says Luis Guajardo, Creative Technologist at MediaMonks. “I think that would be an interesting spinoff of this project.” The sentiment shows how such experiments, when executed at speed, can provide necessary solutions to new problems soon after they arise.

Off-the-Shelf Tools Help Teams Plug In, Play and Apply New Learnings

Our Canteen Counter is powered by Google’s Coral, a board that runs optimized TensorFlow Lite models using an Edge TPU chip. To get the jargon out of the way, it essentially lets you employ machine learning offline—a process that typically connects to a cloud, which is why you need a data connection to interact with most digital assistants. The TPU (tensor processing unit) chip is built to run neural network inference directly on the hardware.

This not only allows for faster processing, but also increased privacy, because data isn’t shared with anyone. Developers can simply take an existing, off-the-shelf machine learning model and quickly optimize it for the hardware and the goals of a project. While the steps behind this process are simpler than training a model of your own, there’s still some expertise required in discovering which model best suits your needs—a point made clear with another tool built by Labs that compares computer vision models and the differences between them.
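
For a sense of what plugging in an off-the-shelf model looks like, here is a minimal sketch using the tflite_runtime package with Coral’s Edge TPU delegate; the model file and output layout are placeholders typical of an SSD detector, not the Labs project’s actual files.

```python
# Minimal sketch: run an off-the-shelf, Edge TPU-compiled TensorFlow Lite
# model entirely on-device. Paths are placeholders, not the Labs project files.
import numpy as np
from tflite_runtime.interpreter import Interpreter, load_delegate

interpreter = Interpreter(
    model_path="ssd_mobilenet_edgetpu.tflite",                  # placeholder model
    experimental_delegates=[load_delegate("libedgetpu.so.1")],  # Coral's Edge TPU
)
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()[0]
height, width = input_details["shape"][1:3]

# Feed a dummy frame; in practice this would be a resized camera frame.
frame = np.zeros((1, height, width, 3), dtype=np.uint8)
interpreter.set_tensor(input_details["index"], frame)
interpreter.invoke()

# Typical SSD-style outputs: boxes, classes, scores, count.
scores = interpreter.get_tensor(interpreter.get_output_details()[2]["index"])
print("detections above 0.5:", int((scores > 0.5).sum()))
```

Because the same compiled model file runs on the dev board and on the standalone TPU modules, a prototype written this way can move toward production hardware with little change.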

Monk Thoughts: “What is a canteen counter today could become a camera that tells you something about your posture tomorrow. Anything goes, and it changes by the day.” —Geert Eichhorn

What the team really likes about Coral is how flexible it is thanks to the TPU chip, which comes in several different boards and modules to easily plug and play. “That means you could use the Coral Board to build initial product prototypes, test models and peripherals, then move into production using only the TPU modules based on your own product specs and electronics and create a robust hardware AI solution,” says Guajardo.

Quicken the Pace of Development to Stay Ahead of Challenges

For the Labs team, tools like Coral have quickened the pace of experimentation and developing new solutions. “The off-the-shelf ML models combined with the Coral board and some creativity can let you build practical solutions in a matter of days,” says Eichhorn. “If it’s not a viable solution you’ll find out as soon as possible, which prevents you from wasting any valuable time and resources.” Eichhorn compares this process to X (formerly Google X), where ideas are broken down as fast as possible to stress test viability.

“At Labs, we jump on new technologies and apply them in new creative ways to solve problems we didn’t know we had, so any project or platform that has as much flexibility as the Canteen Counter is very much up Labs’ alley,” says Eichhorn. “What is a canteen counter today could become a camera that tells you something about your posture tomorrow. Anything goes, and it changes by the day.” He notes that more is in the works behind the scenes as the team considers the trend toward livestreaming and the need for solidarity, play and interaction while working from home.

It’s worth reflecting on how dramatically the world has changed since we settled on the idea to keep an eye on our workplace canteen through a fun, machine learning experiment. But Eichhorn cautions that in a rush for much-needed solutions, “innovation” can often begin to feel like a buzzword. “What we do differently is that we can actually build, be practical, execute, and make it work.”

Extraordinary times call for extraordinary solutions.


MM Labs Uncovers the Biases of Image Recognition

4 min read

Written by
Labs.Monks

Do you see what I see? The game “I spy” is an excellent exercise in perception, where players take turns guessing an object that someone in the group has noticed. And much like how one player’s focus might be on an object totally unnoticed by another, artificial intelligences can also notice entirely different things in a single photo. Hoping to see through the eyes of AI, MediaMonks Labs developed a tool that pits leading image recognition services against one another to compare what they each see in the same image—try it here.

Image recognition is when an AI is trained to identify or draw conclusions about what an image depicts. Some image recognition software tries to identify everything in a photo, like when a phone automatically organizes photos without the user having to tag them manually. Others are more specialized, like facial recognition software trained to recognize not just a face, but perhaps even the person’s identity.

This sort of technology gives your brand eyes, enabling it to react contextually to the environment around the user. Whether it be identifying possible health issues before a doctor’s visit or identifying different plant species, image recognition is a powerful tool that further blurs the boundary between user and machine. “In the market, convenience is important,” says Geert Eichhorn, Innovation Director at MediaMonks. “If it’s easier, people are willing to pick it up and try it. This has the potential to be that simple, because you only need to point your phone and press a button.”

Monk Thoughts: “With image recognition, your product on the store shelf or in the world can become triggers for compelling experiences.” —Geert Eichhorn

You could even transform any branded object into a scavenger hunt. “What Pokemon Go did for GPS locations, this can do for any object,” says Eichhorn. “Your product on the store shelf or in the world can become triggers for compelling experiences.”

Uncovering the Bias in AI

For a technology that’s so simple to use, it’s easy to forget the mechanics of image recognition and how it works. Unfortunately, this leads to an unequal experience among users that can have very serious implications: most facial recognition algorithms still struggle to recognize the faces of Black people compared to those of white people, for example.

Why does this happen? Image recognition models can only identify what they’re trained to see. How should an AI know the difference between dog breeds if they were never identified to it? Just like how humans draw conclusions based on their experiences, image recognition models will each interpret the same image in different ways based on their data set. The concern around this kind of bias is two-fold.

First, there’s the aforementioned concern that it can provide an unequal experience for users, particularly when it comes to facial recognition. Developers must ensure they power their experience with a model capable of recognizing a diverse audience.

[Image: the comparison tool’s output for an event photo, showing the labels returned by Google Cloud Vision and Amazon Rekognition side by side]

As we see in the image above, Google is looking for contextual things in the event photo, while Amazon is very sure that there is a person there.

Second, brands and developers must carefully consider which model best supports their use case; an app that provides a dish’s calorie count by snapping a photo won’t be very useful if it can’t differentiate between different types of food. “If we have an idea or our client wants to detect something, we have to look at which technology to use—is one service better at detecting this, or do we make our own?” says Eichhorn.

Seeing Where AI Doesn’t See Eye-to-Eye

Machine learning technology functions within a black box, and it’s anyone’s guess which model is best at detecting what’s in an image. As technologists, our MediaMonks Labs team isn’t content to make assumptions, so they built a tool that offers a glimpse at what several of the major image recognition services see when they view the same image, side-by-side. “The goal for this is discovering bias in image recognition services and to understand them better,” says Eichhorn. “It also shows the potential of what you could achieve, given the amount of data you can extract from an image.”

Here’s how it works. The tool lists out the objects and actions detected by Google Cloud Vision, Amazon Rekognition and Baidu AI, along with each AI’s confidence in what it sees. By toying around with the tool, users may observe differences in what each model responds to—or doesn’t. For example, Google Cloud Vision might focus more on contextual details, like what’s happening in a photo, where Amazon Rekognition is focused more on people and things.
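
A stripped-down sketch of that side-by-side comparison might look like the following, assuming Google Cloud and AWS credentials are already configured; the Baidu AI call and the tool’s interface are omitted, and the image path is a placeholder.

```python
# Sketch of a side-by-side label comparison for one image, assuming
# Google Cloud and AWS credentials are already configured in the environment.
# (The Labs tool also queries Baidu AI, which is omitted here.)
import boto3
from google.cloud import vision

def google_labels(image_bytes):
    client = vision.ImageAnnotatorClient()
    response = client.label_detection(image=vision.Image(content=image_bytes))
    return [(l.description, l.score) for l in response.label_annotations]

def amazon_labels(image_bytes):
    client = boto3.client("rekognition")
    response = client.detect_labels(Image={"Bytes": image_bytes}, MaxLabels=10)
    return [(l["Name"], l["Confidence"] / 100) for l in response["Labels"]]

with open("event_photo.jpg", "rb") as f:   # placeholder image path
    data = f.read()

print("Google Cloud Vision:")
for name, score in google_labels(data):
    print(f"  {name:<25} {score:.2f}")

print("Amazon Rekognition:")
for name, score in amazon_labels(data):
    print(f"  {name:<25} {score:.2f}")
```

Running both calls on the same photo makes the differences in each service’s vocabulary and confidence immediately visible.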

Monk Thoughts: “With this tool, we want to pull back the curtain to show people how this technology works.” —Geert Eichhorn

This also showcases the variety of things the software can recognize, each with exciting creative implications: the color content of a user’s surroundings, for example, might function as a mood trigger. We collaborated with DDB and airline Lufthansa to build a Cloud Vision-powered web app that recommends a travel destination based on the user’s photographed surroundings: a photo of a burger might return a recommendation to try healthier food at one of Bangkok’s floating markets.

The Lufthansa project is interesting to think about in the context of this tool, because expanding it to the Chinese market required switching the image recognition from Cloud Vision to something else, as Google products aren’t available in the country. This gave the team the opportunity to look into other services like Baidu and Aliyun, prompting them to test each for accuracy and response time. It showcases in very real terms why and how a brand would make use of such a comparison tool.

“Not everyone can be like Google or Apple, who can train their systems based on the volume of photos users upload to their services every day,” says Eichhorn. “With this tool, we want to pull back the curtain to show people how this technology works.” With a better understanding of how machine learning is trained, brands can better envision the innovative new experiences they aim to bring to life with image recognition.


Hey Google, Fix My Marriage

5 min read

Written by
Monks

There’s no denying that Google Assistant is useful for simple, everyday needs that keep users from having to reach for a phone. But what if it could provide more value-added experiences, becoming more intuitive and human-like in the process? Are we that far away from the kind of assistant depicted in Spike Jonze’s Her?

One of the greatest inhibitors of voice adoption is that natural language understanding isn’t ubiquitous, and functionality is typically limited to quick shortcuts. According to Forrester, 46% of adults currently use smart speakers to control their home, and 52% use them to stream audio. Neither of these use cases is a necessity, nor is either particularly distinctive. Looking beyond shortcuts and entertainment value, we sought to experiment with Google Assistant to highlight a real-world utility that offers more human-like interactions. Think less in terms of “Hey Google, turn on the kitchen lights,” and instead something more like “Hey Google, fix my marriage.”

That’s not a joke; by providing a shoulder to cry on or a mediator who can resolve conflicts while keeping a level head, our internal R&D team MediaMonks Labs wanted to push the limits of Google Assistant to see what kind of experiences it could provide to better users’ lives and interpersonal relationships.

Who would have thought that a better quality of conversation with a machine might help you better speak to other humans? “Most of the stuff on the Assistant is very functional,” says Sander van der Vegte. “It’s almost like an audible button, or something for entertainment. The marriage counselor is neither, but could be implemented as a step before you look for an actual counselor.”

Why Google Assistant?

Google Assistant is an exciting platform for voice thanks to its ability to be called up anytime, anywhere through its close integration with mobile. “Google Assistant is very much an assistant, available to help at any moment of time,” says Joe Mango, Creative Technologist at MediaMonks.


But still, the team felt the platform could go even further in providing experiences that are unique to the voice interface. “Right now, considering the briefings we get, most of the stuff on the assistant is very functional,” says Sander van der Vegte, Head of MediaMonks Labs. “It’s designed to be a shortcut to do something on your phone, like an audible button. This marriage counselor has a completely different function to it.”

The Labs team took note when Amazon challenged developers to design Alexa skills that could hold meaningful conversations with users for 20 minutes, through a program called the Alexa Prize. It offered an excellent opportunity to turn the tables and challenge the Google Assistant platform to see how well it could sustain a social conversation with users, resulting in a unique action that requires the assistant to use active listening and an empathetic approach to help two users see eye to eye.

Breaking the Conversation Mold

As you might imagine, offering this kind of experience required a bit of hacking. To listen and respond to two different people in a conversation, the assistant had to free itself from the typical, transactional exchange that voice assistant dialogue models are designed for. “We had to break all the rules,” says Mango—but all’s fair in love and war, at least for a virtual assistant.

A big example of this is a novel use of the fallback intent. By design, the fallback intent is the response the assistant provides when a user makes a query that isn’t mapped to a programmed response—usually something as simple as asking the user to state their request in another way.

But the marriage counselor uses this step to pass the query along to sentiment analysis via the Google Cloud Natural Language API. There, the statement is scored on how positive or negative it is. By tying this score to a scan of the conversation history for applicable details, the assistant can pull a personalized response. This allows both users to speak freely through an open-ended discussion without being interrupted by errors.
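
To make the flow concrete, here is a minimal sketch of the idea as a Dialogflow-style fallback webhook that calls the Cloud Natural Language API for a sentiment score. The handler structure, thresholds and canned replies are illustrative assumptions, not the production action’s code.

```python
# Illustrative webhook sketch: route fallback-intent queries to sentiment
# analysis and pick a response based on the score. Handler structure and the
# history lookup are assumptions, not the production marriage-counselor code.
from flask import Flask, request, jsonify
from google.cloud import language_v1

app = Flask(__name__)
nl_client = language_v1.LanguageServiceClient()
conversation_history = []  # in production this would be per-session storage

def sentiment_score(text):
    doc = language_v1.Document(
        content=text, type_=language_v1.Document.Type.PLAIN_TEXT
    )
    return nl_client.analyze_sentiment(request={"document": doc}).document_sentiment.score

@app.route("/fallback", methods=["POST"])
def fallback():
    query = request.get_json()["queryResult"]["queryText"]
    score = sentiment_score(query)  # roughly -1.0 (negative) to 1.0 (positive)
    conversation_history.append((query, score))

    if score < -0.25:
        reply = "That sounds frustrating. Can you tell your partner how that made you feel?"
    elif score > 0.25:
        reply = "It's great to hear you say that. What else has been going well?"
    else:
        reply = "I hear you. Could you say a bit more about that?"
    return jsonify({"fulfillmentText": reply})
```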


What does such an interaction look like? When a couple tested the marriage counselor action, one user mentioned his relationship with his brothers: some of them were close, but the user felt that he was becoming distant from one of them. In response, the assistant chimed in to remind the user that it was good that he had a series of close relationships to confide in. Its ability to provide a healthy perspective in response to a one-off comment—a comment not even about the user’s romantic relationship, but still relevant to his emotional well-being—was surprising.

The inventive use of the platform allows the assistant to better respond to a user’s perceived emotional state. Google is particularly interesting to experiment with thanks to its advanced voice recognition models, and it built the sentiment analysis framework used within the marriage counseling action. Google’s announcement of Project Euphonia earlier this year, which makes voice recognition easier for people with speech impairments, was also a welcome sight for those seeking to make digital experiences more inclusive. “At MediaMonks, we’re finding ways to creatively execute on these frameworks and push them forward,” said Mango.

Giving Digital Assistants the Human Touch

The marriage counselor action, however, is focused more on listening than speaking, letting two users hash it out and doling out advice or prompts when needed. A big part of this process is emotional intelligence. Humans know that the same sentence can have multiple meanings depending on the tone used—for example, sarcasm. Another example is the statement “Only you would think of that,” which could be read as patronizing or as a compliment depending on tone and context.

Monk Thoughts: “At MediaMonks, we’re finding ways to creatively execute on these frameworks and push them forward.” —Joe Mango

While the assistant currently can’t understand tone of voice, a stopgap solution was to enable it to parse meaning through vocabulary and conversational context—helping the assistant understand that it’s not just what you say, but how you say it. This is something that humans pick up on naturally, but Mango drew on linguistics to provide the illusion of emotional intelligence.

“If the assistant moves in this direction, you’ll get a far more valuable user experience,” says van der Vegte. One example of how emotional intelligence can better support the user outside of a counseling context would be if the user asks for directions somewhere in a way that indicates they’re stressed. Realizing that a stressed user who’s in a hurry probably doesn’t want to spend time wrangling with route options, the action could make the choice to provide the fastest route.

Next Stop: More Proactive, Responsive Assistants

“There’s always improvements to be made,” says Mango, who recognizes two ways that Google Assistant could provide even more lifelike and dynamic social conversations. First, he would like to see the assistant support emotion detection in more ways than examining vocabulary. Second, he’d like to make the conversation flow even more responsive and dynamic.


“Right now the conversation is very linear in its series of questions,” he says. But in a best-case scenario, the assistant could provide alternative paths based on user response, customizing each conversation to respond to the different underlying issues that the marriage counselor identifies as affecting the relationship.

But for now, the team is excited to tinker and push the envelope on what platforms can achieve, inspired by a sense of technical curiosity and the types of experiences they’d like to see in the world. “It speaks a lot to the mission of what we do at Labs,” said Mango. “We always want to push the limitation of the frameworks out there to provide for new experiences with added value.”

As assistants become better equipped to listen and respond with emotional intelligence, their capabilities will expand to provide better and more engaging user experiences. In a best-case scenario, an assistant might identify user sentiment and use that knowledge to recommend a relevant service, like prompting a tired-sounding user to take a rest. Such an advancement would allow brands to forge a deeper connection to users by providing the right service at the right place in time. While Westworld-level AI is still far off in the distance, we’ll continue chatting and tinkering away at teaching our own bots the fine art of conversation—and we can’t wait to see what they’ll say next.


Transitioning Voice Bots from ‘Book Smart’ to ‘Street Smart’

5 min read

Written by
Labs.Monks

Interest in voice platforms has grown significantly over the years, and while they have proved life-changing for the visually impaired or those with limited mobility, for many of us the technology’s primary convenience is in saving us the effort of reaching for a phone. Yet we anticipate a future in which voice platforms can provide more natural experiences to users beyond calling up quick bits of information. This ambition has prompted us to look for new ways to provide added value to conversations, making smart use of the tools made readily available by the organizations leading the charge in consumer-facing voice assistant platforms.

The primary challenge in unlocking truly human-like exchanges with virtual assistants is that their dialogue models are best fit for transactional exchanges: you say something, the assistant responds with a prompt for another response, and so on. But we’ve found that brands keen on taking advantage of the platform are looking for more than a rigid experience. “There are plenty of requests from clients about assistants, who are under the impression that the user can say whatever,” says Sander van der Vegte, Head of MediaMonks Labs. “What you expect from a human assistant is to speak open-ended and get a response, so it’s natural to assume a digital assistant will react similarly.” But this conversation structure goes against the grain of how these platforms typically work, which means we must find new approaches that better accommodate the experiences brands seek to provide their users.

Giving Digital Assistants the Human Touch

One way to make conversations with voice assistants more human-like is to empower them with a distinctly human trait: emotional intelligence. MediaMonks Labs is experimenting with this by developing a Google Assistant action that serves as a marriage counselor, using sentiment analysis to draw out the intent and meaning behind user statements.

Monk Thoughts: “This is the first step down an ongoing path for deeper, richer conversation.” —Joe Mango

“If the assistant moves in this direction, you’ll get a far more valuable user experience,” says van der Vegte. One example of how emotional intelligence can better support the user outside of a counseling context would be if the user asks for directions somewhere in a way that indicates they’re stressed. Realizing that a stressed user who’s in a hurry probably doesn’t want to spend time wrangling with route options, the action could make the choice to provide the fastest route.

As assistants become better equipped to listen and respond with emotional intelligence, their capabilities will expand to provide better and more engaging user experiences. In a best-case scenario, an assistant might identify user sentiment and use that knowledge to recommend a relevant service, like prompting a tired-sounding user to take a rest. Such an advancement would allow brands to forge a deeper connection to users by providing the right service at the right place in time. While Westworld-level AI is still far off in the distance, we’ll continue chatting and tinkering away at teaching our own bots the fine art of conversation—and we can’t wait to see what they’ll say next.

Monk Thoughts: “We can learn to speak more effectively to an AI, just like how AI learns to speak to us.” —Joe Mango

To better understand what this looks like, consider how two humans effectively resolve a conflict. Rather than accuse someone of acting a certain way, for example, it’s preferable to use “I messages” about how others’ actions make you feel, so the other party doesn’t feel attacked. So whether you begin a statement with “you” (accusatory) or “I” (garnering empathy) can have a profound impact on how others invested in a conflict will respond. Likewise, our marriage counseling action analyzes the vocabulary and inflection in two users’ statements to dole out relationship advice to them. Responses are focused not just on what they say but how they say it.
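
As a toy illustration of that “I” versus “you” heuristic, and as a deliberately simplified stand-in for whatever the production action actually does, each statement could be scored roughly like this:

```python
# Toy heuristic, not the production logic: classify a statement as accusatory
# or empathetic from its opening pronoun, then pair that with a sentiment
# score to decide how the counselor should respond.
def classify_statement(text, sentiment):
    """sentiment is a score in [-1, 1], e.g. from a sentiment analysis API."""
    words = text.strip().split()
    first_word = words[0].lower().rstrip(",.!?") if words else ""
    if first_word in ("you", "you're", "you've"):
        framing = "accusatory"
    elif first_word in ("i", "i'm", "i've"):
        framing = "empathetic"
    else:
        framing = "neutral"

    if framing == "accusatory" and sentiment < 0:
        return "Try rephrasing that as an 'I' statement about how it made you feel."
    if framing == "empathetic":
        return "Thank you for sharing how you feel. How does your partner see it?"
    return "Could you tell me more about that?"

print(classify_statement("You never listen to me!", -0.7))
print(classify_statement("I feel ignored when plans change.", -0.4))
```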

“We can learn to speak more effectively to an AI, just like how AI learns to speak to us,” says Joe Mango, Creative Technologist at MediaMonks. According to him, users have been conditioned to speak to bots in, well, robotic ways through their experience with them. “When we had someone from our team test the action by simply speaking to it, he wasn’t sure what to say at first.”


Speaking a New Language

The action is a major departure from the standard conversational setup with a voice bot. Rather than have a back-and-forth chat with a single user, the action listens attentively as two users speak to one another. Getting Google Assistant to pull off such a feat gets at the heart of why so few actions provide rich conversational experiences: the inherent limitations of the natural language processing platforms that power them. For example, Google Assistant breaks conversation down into a “you say this, I say that”-style structure that limits the amount of time it opens the microphone to listen to a user response.

Monk Thoughts: “We always want to push the limitation of the frameworks to provide new experiences and added value.” —Joe Mango

Conventional wisdom surrounding conversational design shies away from “wide-focus” questions, encouraging developers to be as pointed and specific as possible so users can answer in just a word or two. But we think breaking out of this structure is not only feasible, but capable of providing the next big step in richer, more genuine interactions between people and brands. “It speaks a lot to the mission of what we do at Labs,” said Mango. “We always want to push the limitation of the frameworks out there to provide for new experiences with added value.”

What does such an interaction look like? When a couple tested the marriage counselor action, one user mentioned his relationship with his brothers: some of them were close, but the user felt that he was becoming distant from one of them. In response, the assistant chimed in to remind the user that it was good that he had a series of close relationships to confide in. Its ability to provide a healthy perspective in response to a one-off comment—a comment not even about the user’s romantic relationship, but still relevant to his emotional well-being—was surprising.


Next Stop: More Proactive, Responsive Assistants

While the action is effective, “It’s just the first step down an ongoing path to support more dynamic sentence structures and deeper, richer conversation,” says Mango. The focus right now is on inflection and vocabulary, but future iterations of the action could draw on users’ tone of voice to glean their sentiment even more accurately. From there, findings from this experiment can help give other voice apps a level of emotional intelligence that lets organizations engage with their audiences in even more human-like ways.

