A brief analysis of intelligent voice interaction technology

Advances in machine learning and natural language processing have made voice interaction with artificial intelligence possible. People can obtain information and interact with machines through dialogue, so such machines no longer exist only in science fiction. Voice interaction is a future direction of development, and the smart speaker is the first generation of product to bring it to market.

Take the intelligent telephone robots popular on the market as an example. Their AI module consists mainly of four parts: automatic speech recognition (ASR), natural language understanding (NLU), natural language generation (NLG), and text to speech (TTS).
Taking the voice interaction flow chart of an intelligent robot as an example, let us look at the main path of conversational AI technology:
As the chart shows, the user's interaction with the device is handled mainly by voice.

To complete a full voice interaction, we must understand the process, which is the key to making a telephone robot genuinely easy to use. A successful voice interaction involves the following four stages, which together form a chain.

Intent
An intent represents the operation a user performs when using the application (for example, asking a question or issuing a command). Intents represent the core functionality of the application. If the application successfully identifies the user's intent, it must carry out the corresponding action and return the result to the user.

Intent recognition - semantic analysis
The speech recognition result is analyzed and understood. Put simply, the user's voice input is mapped to machine instructions. One can define a set of grammatical structures containing specified words or phrases; when the user's utterance matches such a structure, the corresponding intent is invoked.
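The idea of matching an utterance against predefined grammatical structures can be sketched as follows. This is only a minimal illustration using regular expressions; the intent names, patterns, and slot names are hypothetical and do not come from any particular product.

```python
import re
from typing import Optional

# Hypothetical intent grammars: each intent lists patterns with named slots.
INTENT_PATTERNS = {
    "check_weather": [
        re.compile(r"what(?:'s| is) the weather (?:like )?in (?P<city>[a-z ]+)"),
        re.compile(r"weather (?:forecast )?for (?P<city>[a-z ]+)"),
    ],
    "set_alarm": [
        re.compile(
            r"(?:set|create) an alarm (?:for|at) "
            r"(?P<time>\d{1,2}(?::\d{2})? ?(?:am|pm)?)"
        ),
    ],
}


def recognize_intent(utterance: str) -> Optional[dict]:
    """Return the first intent whose grammar matches the utterance, else None."""
    # Normalize case and drop trailing punctuation before matching.
    text = utterance.lower().strip().rstrip("?.!")
    for intent, patterns in INTENT_PATTERNS.items():
        for pattern in patterns:
            match = pattern.search(text)
            if match:
                slots = {k: v.strip() for k, v in match.groupdict().items()}
                return {"intent": intent, "slots": slots}
    return None
```

Production systems usually replace hand-written patterns with trained classifiers, but the input and output have the same shape: text in, an intent plus slot values out.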

Intent processing - cloud interaction
The invoked intent is packaged as a structured request and sent to the server, which processes it and returns a response. In plain terms, this stage handles user requests and answers user questions.
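A structured request and its server-side handling might look roughly like the sketch below. The `IntentRequest` fields, the handler table, and the canned weather data are all assumptions made for illustration; a real deployment would send the JSON payload over a network and call an actual backend.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class IntentRequest:
    """Hypothetical structured request sent from the device to the server."""
    intent: str
    slots: dict = field(default_factory=dict)
    session_id: str = "demo-session"


def handle_check_weather(slots: dict) -> dict:
    # A real service would query a weather backend; this returns canned data.
    return {"city": slots.get("city", "unknown"), "forecast": "sunny"}


# Dispatch table mapping intent names to handler functions.
HANDLERS = {"check_weather": handle_check_weather}


def process_request(payload: str) -> str:
    """Decode the request, dispatch on the intent, and encode a response."""
    request = IntentRequest(**json.loads(payload))
    handler = HANDLERS.get(request.intent)
    if handler is None:
        response = {"status": "unknown_intent", "result": None}
    else:
        response = {"status": "ok", "result": handler(request.slots)}
    return json.dumps(response)


# Usage: serialize a request on the device side and process it on the server side.
payload = json.dumps(asdict(IntentRequest("check_weather", {"city": "paris"})))
reply = json.loads(process_request(payload))
```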

Speech synthesis module - language organization
Based on the internal representation produced by the analysis module, a natural language sentence is generated under the control of the dialogue manager (DM). The generated sentence is then converted into speech output. (The machine's answer is converted into spoken language.)
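The sentence generation step can be illustrated with simple template filling, shown below as a sketch only. The templates and field names are hypothetical, and the final conversion of the sentence to audio is left out because it depends on the TTS engine in use.

```python
# Hypothetical response templates keyed by intent name.
TEMPLATES = {
    "check_weather": "The weather in {city} is {forecast}.",
    "set_alarm": "Your alarm is set for {time}.",
}
FALLBACK = "Sorry, I did not understand that."


def generate_reply(intent: str, result: dict) -> str:
    """Fill the template for the intent; fall back when data is missing."""
    template = TEMPLATES.get(intent)
    if template is None:
        return FALLBACK
    try:
        return template.format(**result)
    except KeyError:
        # The result lacks a field the template needs.
        return FALLBACK
```

The returned string is what would be handed to the TTS module for speech output.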

The whole process may seem simple, but it depends on mastering several key natural language processing technologies along the way.

Lexical analysis
Lexical analysis covers morphology and vocabulary. In general, morphology is reflected mainly in the analysis of word prefixes and suffixes, while vocabulary is reflected in control over the lexicon of the whole system. In Chinese full-text retrieval systems, lexical analysis takes the form of automatic Chinese word segmentation. With this technique, the system can correctly parse the information the user enters and carry out the search correctly. It is an important development direction for Chinese full-text retrieval technology.
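One classic dictionary-based approach to Chinese word segmentation is forward maximum matching, sketched below. The article does not say which algorithm its system uses, so this is purely an illustration, and the tiny vocabulary is made up for the example.

```python
def forward_max_match(text: str, vocabulary: set, max_len: int = 4) -> list:
    """Segment text greedily, always taking the longest dictionary word."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until a word is found.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            # A single character is always accepted so the loop advances.
            if size == 1 or candidate in vocabulary:
                tokens.append(candidate)
                i += size
                break
    return tokens


# Hypothetical vocabulary for the demonstration.
VOCAB = {"中文", "全文", "检索", "技术", "全文检索"}
segments = forward_max_match("中文全文检索技术", VOCAB)
```

Here `segments` is `["中文", "全文检索", "技术"]`: the longer entry "全文检索" wins over the shorter "全文". Greedy matching can mis-segment ambiguous strings, which is why practical segmenters add statistical models.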

Syntactic analysis (parsing)
Syntactic analysis examines the words and phrases in the user's natural language input. Its purpose is to identify the syntactic structure of a sentence automatically. The basic methods include chart parsing, phrase structure analysis, full parsing, partial parsing, and dependency parsing.
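To make the output of dependency parsing concrete, the sketch below stores a hand-annotated parse and reads a subject-verb-object triple from it. No parser is run here; the sentence, the annotation, and the helper names are assumptions, and the relation labels follow the Universal Dependencies naming style.

```python
from dataclasses import dataclass


@dataclass
class Token:
    index: int     # 1-based position in the sentence
    text: str
    head: int      # index of the governing token, 0 for the root
    relation: str  # dependency label linking this token to its head


# Hand-annotated parse of "Alice calls the robot" (not produced by a parser).
PARSE = [
    Token(1, "Alice", 2, "nsubj"),
    Token(2, "calls", 0, "root"),
    Token(3, "the", 4, "det"),
    Token(4, "robot", 2, "obj"),
]


def dependents(tokens, head_index, relation):
    """Return the words attached to a head by the given relation."""
    return [t.text for t in tokens
            if t.head == head_index and t.relation == relation]


def extract_triple(tokens):
    """Read (subject, verb, object) off the dependency structure."""
    root = next(t for t in tokens if t.head == 0)
    subjects = dependents(tokens, root.index, "nsubj")
    objects = dependents(tokens, root.index, "obj")
    return (subjects[0] if subjects else None,
            root.text,
            objects[0] if objects else None)
```

Calling `extract_triple(PARSE)` yields `("Alice", "calls", "robot")`, which is the kind of structure later stages can map to an intent and its arguments.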

Semantic analysis
Semantic analysis is a method of analysis based on the semantic information of natural language. It goes beyond the grammatical level of lexical and syntactic analysis to address the meaning carried by words, phrases, sentences, and paragraphs. Its purpose is to represent the structure of an utterance in terms of the semantic structure of its sentences. Chinese semantic analysis commonly uses methods based on semantic networks. A semantic network is a structured, flexible, clear, and concise form of representation.
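A semantic network can be pictured as a set of labeled edges between concepts. The sketch below stores a few made-up facts as triples and shows how properties are inherited along `is_a` links; the concepts and relation names are illustrative assumptions only.

```python
# Hypothetical semantic network stored as (subject, relation, object) triples.
TRIPLES = [
    ("telephone robot", "is_a", "robot"),
    ("robot", "is_a", "machine"),
    ("robot", "can", "speak"),
    ("machine", "needs", "power"),
]


def ancestors(concept: str) -> list:
    """Follow is_a edges upward, from the nearest parent to the farthest."""
    chain = []
    current = concept
    while True:
        parents = [o for s, r, o in TRIPLES if s == current and r == "is_a"]
        # Stop at the top of the hierarchy or if a cycle is detected.
        if not parents or parents[0] in chain:
            break
        current = parents[0]
        chain.append(current)
    return chain


def properties(concept: str) -> list:
    """Collect facts stated for the concept and for everything it inherits from."""
    facts = []
    for node in [concept] + ancestors(concept):
        facts.extend((r, o) for s, r, o in TRIPLES if s == node and r != "is_a")
    return facts
```

For example, `properties("telephone robot")` returns `[("can", "speak"), ("needs", "power")]`, even though neither fact is stated about the telephone robot directly.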

Pragmatic analysis
Pragmatic analysis is a higher level of linguistic analysis than semantic analysis. It adds analysis of context, linguistic background, and environment, and extracts additional information such as imagery and relationships from the structure of the text. It ties the content of a statement to the details of real life, forming a dynamic structure of meaning.

Contextual analysis
Contextual analysis refers to analyzing the many "gaps" that lie outside the original query text in order to interpret the query more accurately. These gaps include general knowledge, knowledge of specific domains, and knowledge of the user's needs. It links natural language with both the objective physical world and the subjective mental world, compensating for the shortcomings of lexical, semantic, and pragmatic analysis.
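One simple way to picture gap filling is a dialogue context that supplies slot values the current utterance leaves out, drawing first on earlier turns and then on stored background knowledge. The sketch below is only an assumption about how this might be organized; the class, slot names, and defaults are hypothetical.

```python
# Hypothetical list of slots each intent needs before it can be executed.
REQUIRED_SLOTS = {"check_weather": ["city", "date"]}


class DialogueContext:
    """Fills gaps in a query from earlier turns and background knowledge."""

    def __init__(self, defaults=None):
        self.defaults = dict(defaults or {})  # background or user knowledge
        self.history = {}                     # slot values from earlier turns

    def resolve(self, intent: str, slots: dict):
        """Return (filled_slots, missing_slot_names) for the current turn."""
        required = REQUIRED_SLOTS.get(intent, [])
        filled = dict(slots)
        for name in required:
            if name in filled:
                continue
            # Prefer what the user said earlier, then fall back to defaults.
            if name in self.history:
                filled[name] = self.history[name]
            elif name in self.defaults:
                filled[name] = self.defaults[name]
        missing = [name for name in required if name not in filled]
        self.history.update(filled)
        return filled, missing
```

With `ctx = DialogueContext(defaults={"date": "today"})`, a first turn that mentions only a city is completed with the default date, and a follow-up that mentions only "tomorrow" reuses the city from the first turn. Slots that remain in `missing` are what the system would need to ask the user about.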

Current problems of intelligent telephone conversation robots
During a conversation between a person and a machine, the user will inevitably misspeak, which causes the machine's understanding to deviate from what the user meant. In this case an error correction mechanism is essential. Without it, the user has to spend a long time explaining their intent, and the resulting experience is poor. On the other hand, a machine may recognize the speech well yet fail to understand the purpose of the dialogue, so its interpretation of the meaning is skewed. The problem a voice dialogue process must therefore solve is eliminating ambiguity and handling unknown linguistic phenomena.
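A very basic form of error correction is to map a misheard or misspoken phrase to the closest known command. The sketch below uses fuzzy string matching from Python's standard `difflib` module; the command list and the similarity cutoff are hypothetical, and real systems would also use acoustic and language-model evidence.

```python
import difflib
from typing import Optional

# Hypothetical set of commands the telephone robot understands.
KNOWN_COMMANDS = ["check balance", "transfer money", "talk to an agent"]


def correct_command(heard: str, cutoff: float = 0.6) -> Optional[str]:
    """Return the known command most similar to the input, or None."""
    matches = difflib.get_close_matches(
        heard.lower(), KNOWN_COMMANDS, n=1, cutoff=cutoff
    )
    return matches[0] if matches else None
```

For instance, `correct_command("chek balanse")` resolves to `"check balance"`, while an input with no sufficiently similar command returns `None`, at which point the robot could ask the user to repeat or rephrase.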

The service ecosystem behind intelligent voice assistants
For a good intelligent telephone robot to take root and flourish, simple speech recognition is not enough. It also needs a complete set of supporting capabilities, including service integration, a Chinese-language ecosystem, content, and services. In other words, it is an ecosystem covering many basic capabilities.
In the future, semantics-based voice interaction skills will need to number in the tens of thousands, hundreds of thousands, or even millions in order to bring voice-interactive operating systems to maturity, and voice interaction products will become increasingly varied in form and style.

Origin blog.51cto.com/14387331/2411108