Repairing interrupted questions makes voice agents more accessible

Everyone has had the experience of pausing in the middle of a sentence while trying to conjure up a forgotten word. These pauses can be so pronounced that today’s voice assistants mistake them for the ends of users’ sentences. When this happens, the whole exchange has to be repeated.

This is frustrating for all users, but some user groups are affected more than others – often the groups that could benefit most from voice assistants. People with dementia, for example, pause during conversation more often and for longer durations than others.

At Alexa AI, we experimented with several speech-processing pipelines in an attempt to solve this problem. Our most successful approach involved a model that learned to “understand” incomplete sentences. To train this model, we adapted two existing data sets, shortening their sentences and pairing each sentence with a graph-based semantic representation.

One of the truncated data sets, which we presented at the ACM Conference on Conversational User Interfaces (CUI) earlier this year, contains only questions; the second, which we will present next week at Interspeech, contains more general sentences.

The graphs in our data sets capture the semantics of each word in each sentence and the relationships between words. As we truncated the original sentences, we also removed the sections of the graphs contributed by the removed words.

A color-coded chart of a sentence and its corresponding graph representation. The colors indicate which sections of the graph each word contributes.
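
As a rough illustration of this data-construction step, the sketch below truncates a sentence after a given word and prunes the graph material aligned to the removed words. The Node and Graph classes, the aligned_tokens field, and the toy example are hypothetical stand-ins for whatever format and word-to-node alignment the real data sets use.

```python
# Minimal sketch of building (incomplete sentence, partial graph) training pairs.
# The graph format and the word-to-node alignments here are illustrative only.

from dataclasses import dataclass


@dataclass
class Node:
    node_id: str
    label: str
    aligned_tokens: set[int]  # indices of the words this node expresses


@dataclass
class Graph:
    nodes: list[Node]
    edges: list[tuple[str, str, str]]  # (source_id, relation, target_id)


def truncate_pair(sentence: str, graph: Graph, keep: int) -> tuple[str, Graph]:
    """Keep the first `keep` words and prune graph material aligned to the rest."""
    words = sentence.split()
    prefix = " ".join(words[:keep])

    # Keep only nodes whose aligned words all survive the truncation ...
    kept_nodes = [n for n in graph.nodes if all(i < keep for i in n.aligned_tokens)]
    kept_ids = {n.node_id for n in kept_nodes}

    # ... and only edges whose endpoints both survive.
    kept_edges = [(s, r, t) for (s, r, t) in graph.edges if s in kept_ids and t in kept_ids]

    return prefix, Graph(kept_nodes, kept_edges)


# Toy example: truncating "who was the father of Prince Harry" after five words
# drops the "Prince Harry" node and the edge that attached it.
g = Graph(
    nodes=[
        Node("q", "who", {0}),
        Node("f", "father-of", {3, 4}),
        Node("h", "Prince Harry", {5, 6}),
    ],
    edges=[("q", "answer", "f"), ("f", "arg", "h")],
)
print(truncate_pair("who was the father of Prince Harry", g, keep=5))
```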

We used these data sets to train a model that takes an incomplete phrase as input and emits the corresponding incomplete semantic graph. The partial graphs are, in turn, fed to a model that completes the graph, and its output is converted to text for downstream processing.

The clarification architecture.
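
In rough outline, that two-stage flow looks like the sketch below, which reuses the toy Graph class from the earlier sketch. The three component interfaces are our own assumptions for illustration, not the actual model APIs.

```python
# Sketch of the two-stage pipeline described above: incomplete phrase -> partial
# graph -> completed graph -> text. The components are placeholders; their
# interfaces are assumptions made for this illustration.

from typing import Protocol


class SemanticParser(Protocol):
    def parse(self, text: str) -> "Graph":
        """Map a (possibly incomplete) phrase to a partial semantic graph."""


class GraphCompleter(Protocol):
    def complete(self, partial: "Graph") -> "Graph":
        """Predict the missing portion of a partial semantic graph."""


class GraphToText(Protocol):
    def generate(self, graph: "Graph") -> str:
        """Render a semantic graph as text for downstream processing."""


def handle_utterance(text: str,
                     parser: SemanticParser,
                     completer: GraphCompleter,
                     generator: GraphToText) -> str:
    partial_graph = parser.parse(text)                    # incomplete phrase -> partial graph
    completed_graph = completer.complete(partial_graph)   # fill in the missing material
    return generator.generate(completed_graph)            # graph -> text for downstream processing
```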

In semantic-parsing tests, we compared the results of using our truncated utterances as inputs with those of using the full, uninterrupted sentences. Ideally, the outputs would be the same for both sets of inputs.

On the question-answering corpus, the model that received our truncated questions correctly answered only 0.77% fewer questions than the model that saw the full questions. On the more general corpus, we lost only 1.6% in graph-equality F-score, which factors in both the false-positive and false-negative rates.
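
For readers unfamiliar with the metric, a graph-equality F-score can be computed along the following lines. This is an illustration of the general idea, again using the toy Graph class from the first sketch, not the exact scorer used in our experiments.

```python
# Illustrative precision/recall/F1 over graph elements (nodes and labeled edges),
# reusing the toy Graph class from the first sketch. A real scorer must also
# handle node alignment and repeated labels, and the papers' exact scoring
# details may differ. The point is that false positives lower precision, false
# negatives lower recall, and both lower the F-score.

def graph_elements(graph: Graph) -> set:
    label = {n.node_id: n.label for n in graph.nodes}
    nodes = {("node", n.label) for n in graph.nodes}
    edges = {("edge", label[s], r, label[t]) for (s, r, t) in graph.edges}
    return nodes | edges


def graph_f_score(predicted: Graph, reference: Graph) -> float:
    pred, ref = graph_elements(predicted), graph_elements(reference)
    true_positives = len(pred & ref)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(pred)  # penalized by false positives
    recall = true_positives / len(ref)      # penalized by false negatives
    return 2 * precision * recall / (precision + recall)
```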

More natural conversation

This work is part of a broader effort to make interactions with Alexa more natural and human-like. To get a sense of the problem we are trying to address, read the following sentence fragment slowly, focusing on how each added word increases your understanding:

Yesterday Susan ate some biscuits with …

Maybe Susan ate the biscuits with cheese, with a fork, or with her older … The ending doesn’t matter. You don’t have to read the end of the sentence to understand that some biscuits were eaten by Susan yesterday, and you build this understanding word by word.

In conversation, when an utterance is left incomplete, people typically ask a clarification question, as in this example:

Susan: “Who was the father of …”
Amit: “Sorry, who?”
Susan: “Prince Harry”
Amit: “Oh, King Charles III”
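
One simple way such a clarification could be produced, offered purely as a hypothetical illustration rather than a mechanism from our papers, is to map the kind of semantic material missing from the partial graph to a short wh-question template:

```python
# Hypothetical illustration only: turn the type of argument detected as missing
# from the partial semantic graph into a short clarification question. The role
# names and templates below are invented for this example.

CLARIFICATION_TEMPLATES = {
    "person": "Sorry, who?",
    "place": "Sorry, where?",
    "time": "Sorry, when?",
    "thing": "Sorry, what?",
}


def clarify(missing_role: str) -> str:
    """Pick a clarification question for a missing argument of the given type."""
    return CLARIFICATION_TEMPLATES.get(missing_role, "Sorry, could you finish that?")


# "Who was the father of ..." is missing a person argument:
print(clarify("person"))  # -> "Sorry, who?"
```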

Our two papers show that computer systems can successfully understand incomplete sentences, which means that natural interactions like this should be possible.

These findings are of the utmost importance for making Alexa more accessible. People who have dementia find Alexa incredibly useful: they can set reminders, get involved in family meals by choosing recipes, and access music more easily. If future systems can recover seamlessly when speech breaks off, people with dementia will be able to enjoy these benefits with minimal frustration.

Our work also confirms that it is possible to correct speech recognition errors through natural interaction. We all mispronounce words (like when we ask about the weather in Llanfairpwllgwyngyll), but mispronunciations are especially common among people with speech impediments, muscular dystrophy, early-stage motor neuron disease, and even hearing impairments.

Similarly, it is difficult to hear a word in the middle of a sentence when a dog barks. We show that future voice assistants could identify and clarify unclear words through natural interaction, improving the user experience for people with nonstandard speech. This would also improve voice agents’ robustness to noisy environments, such as family homes and public spaces.

We hope that releasing our corpora will inspire other researchers to work on this problem as well, improving the natural interactivity and accessibility of future voice assistants.
