Zong Chengqing: An Outlook on Human Language Technology

Author: Zong Chengqing, Institute of Automation, Chinese Academy of Sciences; CAAI Fellow, member of the International Committee on Computational Linguistics (ICCL), and President of the Asian Federation of Natural Language Processing (AFNLP). His research focuses on natural language processing and machine translation. He has led more than ten national projects, served as chief scientist of a key special project under the National Key R&D Program, and is an associate editor of ACM TALLIP and of the journal Acta Automatica Sinica. He served as program committee chair of the world-class conferences ACL 2015 and COLING 2020, and as an area chair for IJCAI and AAAI. He received the second prize of the National Science and Technology Progress Award and the first prize of the Qian Weichang Chinese Information Processing Science and Technology Award, and has been honored as a Beijing Outstanding Teacher, a Baosteel Outstanding Teacher, and an Outstanding Mentor of the Chinese Academy of Sciences.

 

Abstract

Machine translation emerged alongside the birth of the world's first computer and subsequently became one of the most challenging areas of artificial intelligence research. Over more than 70 years, applications of human language technology, represented by machine translation, human-machine dialogue systems, text classification, automatic summarization, and information extraction, have traversed a tortuous course of development, reflecting from different angles the rises and falls of the field of artificial intelligence. Based on a brief review of the development of human language technology, this article surveys the current state of research, highlights the main challenges the technology faces, and offers an outlook on future development trends.

 

 

0 Introduction

Since the concept of artificial intelligence (Artificial Intelligence, AI) was put forward in 1956, natural language understanding (Natural Language Understanding, NLU) has been one of the core research issues of the field. Although the concept of computational linguistics (Computational Linguistics, CL), proposed in the 1960s, and the derived concept of natural language processing (Natural Language Processing, NLP), which appeared in the 1970s, carry somewhat different connotations, viewing language from the perspectives of mathematical modeling and of language engineering respectively, there is no essential difference among NLU, CL and NLP: the scientific problems these three terms face are the same, and their practical application goals are identical. Therefore, where no confusion arises, the term "human language technology" (Human Language Technology, HLT) is usually used to refer to this interdisciplinary field, which integrates linguistics, computer science and cognitive science.

 

Looking back over the 70-year history of human language technology, its technical methods can be roughly divided into three stages: ① from the field's infancy to the late 1980s and early 1990s, the stage of rule-based methods built on symbolic logic and templates, belonging to the rationalist approach; ② from the early 1990s to around 2013, the period in which empirical methods based on statistical machine learning were mainstream; ③ from 2013 onward, the connectionist era, in which multi-layer neural network methods became mainstream. Figure 1 shows the overall trend over these roughly 70 years.

 

In the historical stage dominated by the rationalist approach, the main research work was to build high-quality dictionaries, rules for analyzing natural language sentences, and reasoning algorithms, implementing conversion and generation through symbolic reasoning and logical operations; the representative theory was Noam Chomsky's theory of syntactic structure.

 

In the historical stage in which empirical methods were mainstream, the main research work was to obtain large-scale training samples, to study the construction of high-quality annotation schemes and automatic annotation algorithms, and to build computational models and algorithms on statistical methods, achieving inference and prediction for natural language processing tasks through the debugging and optimization of model parameters; the main theoretical bases were probability theory and information theory. In this stage a series of statistical learning methods were born, including the n-gram model, the hidden Markov model (Hidden Markov Model, HMM), the support vector machine (Support Vector Machine, SVM), maximum entropy (Maximum Entropy, ME) and conditional random fields (Conditional Random Fields, CRFs), and they were widely applied to natural language processing tasks. Statistical machine translation (Statistical Machine Translation, SMT) systems were born, a group of open-source tools was made publicly available, and companies such as Google, Microsoft and Baidu developed and deployed online statistical machine translation systems, promoting the rapid development of this technology.
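To make the statistical stage concrete, here is a minimal sketch (not from the original article) of the simplest of the models listed above, a bigram (n-gram with n = 2) language model with add-one smoothing; the toy corpus and function names are illustrative assumptions.

```python
from collections import Counter

def train_bigram(sentences):
    """Count unigrams and bigrams over a tokenized corpus.
    <s> and </s> mark sentence boundaries."""
    uni, bi = Counter(), Counter()
    for sent in sentences:
        toks = ["<s>"] + sent + ["</s>"]
        uni.update(toks)
        bi.update(zip(toks, toks[1:]))
    return uni, bi

def bigram_prob(uni, bi, w_prev, w, vocab_size):
    """P(w | w_prev) with add-one (Laplace) smoothing."""
    return (bi[(w_prev, w)] + 1) / (uni[w_prev] + vocab_size)

# Toy corpus: two tokenized sentences (illustrative only)
corpus = [["the", "cat", "sat"], ["the", "dog", "sat"]]
uni, bi = train_bigram(corpus)
V = len(uni)  # vocabulary size, including boundary symbols
print(bigram_prob(uni, bi, "the", "cat", V))  # → 0.25
```

Smoothing is what gives every unseen bigram a small nonzero probability, a simple precursor of the robustness concerns discussed later in the article.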

Figure 1 Historical stages of the technological development of HLT

(The markers on the curve indicate the years in which the corresponding events occurred; for reasons of space they are not enumerated here)

 

 

Following Geoffrey Hinton and colleagues' successful application of multi-layer neural network methods to image recognition in 2006, Microsoft in 2009 implemented a speech recognition system based on multi-layer neural networks; its recognition error rate dropped sharply, and deep learning methods entered large-scale application. In 2014, Kyunghyun Cho of New York University, Yoshua Bengio of the University of Montreal in Canada, and their colleagues proposed the attention-based encoder-decoder framework (encoder-decoder); through structural innovation and further development of neural networks, machine translation systems based on neural networks, known as neural machine translation (Neural Machine Translation, NMT) systems, were established. On this basis, Google proposed the Transformer model, based entirely on the attention mechanism, in 2017; many Chinese companies, by tracking and improving on it, implemented their own neural translation engines to provide machine translation services to ordinary users. Many small and medium-sized enterprises used open-source platforms and open data on the Internet to quickly build machine translation systems of acceptable performance, or directly used the translation services provided by companies such as Google and Baidu, so that the field blossomed everywhere in a thriving scene. Google's release of the bidirectional pre-trained model BERT (Bidirectional Encoder Representations from Transformers) in 2018 pushed this technology to a climax.
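As a concrete illustration of the attention mechanism at the heart of the encoder-decoder framework and the Transformer, here is a minimal NumPy sketch of scaled dot-product attention; the shapes and random values are illustrative assumptions, not code from any of the systems mentioned.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # similarity of each query to each key
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over the keys
    return weights @ V                              # weighted sum of the values

# Illustrative toy example: 2 queries, 3 key/value pairs, d_k = 4
rng = np.random.default_rng(0)
Q = rng.normal(size=(2, 4))
K = rng.normal(size=(3, 4))
V = rng.normal(size=(3, 4))
out = scaled_dot_product_attention(Q, K, V)
print(out.shape)  # each query yields one d_k-dimensional mixture of the values
```

In the Transformer, this same computation is applied in parallel across multiple heads and stacked layers; the sketch shows only the core formula.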

 

1 Current Status

Language technology involves many fields and branches; branches in different directions have relative independence, and their starting points and speeds of development differ. Whether in theoretical foundations and key technologies, or in resource construction and applied system development, the situation differs from level to level, so it is difficult to generalize about the state of research. The following briefly summarizes only part of the performance status of application systems, in the hope of glimpsing the whole from a few pieces of the jigsaw.

 

As the most challenging research topic in natural language processing, machine translation produces translations whose quality largely represents the overall level of natural language processing technology. In recent years, especially since the neural machine translation model was proposed in 2014, translation quality has improved significantly. For spoken-language translation between resource-rich language pairs (such as English-Chinese, Japanese-Chinese, and English-French), when the scenario is not very complicated, the accent is basically standard, the speaking rate is normal, and the vocabulary and sentence patterns are not very uncommon, the performance of everyday spoken-language translation basically meets the needs of communication. For text translation in professional domains with sufficient training corpora, translation accuracy can reach about 75%. Translation accuracy in the information domain varies widely; overall, the accuracy of translating ordinary news text is around 70%. For translation tasks demanding high quality, such as speeches or written works, literary classics, and the speeches and dialogues of leaders in serious settings (including leaders answering reporters' questions, or serious talks and accented dialogue), machine translation systems are not up to the task. In the foreseeable future there is no prospect of machine translation systems replacing human translators. As for translation between Chinese and resource-scarce minority languages (such as Urdu and Persian), current machine translation systems can serve only the purpose of quickly acquiring information, helping people roughly understand the theme and content of the original text.

 

Human-computer dialogue systems have long been a focus of attention and a highly representative research task in natural language processing. Dialogue systems are usually divided into two categories: task-oriented dialogue systems (task-oriented dialog system) and open-domain dialogue systems (open-domain dialog system). The former are also called task-based dialogue systems, such as airline ticket booking systems; the latter are also called chat-oriented dialogue systems, such as chatbots. At present, academic research on dialogue systems has basically adopted data-driven approaches; especially after end-to-end neural network models were proposed, these have almost become the unified framework for implementing such tasks. The performance of such systems depends largely on the size and quality of the training samples. Interestingly, commercially deployed task-based dialogue systems are currently still implemented essentially with rule-based methods. For dialogue systems in specific domains and for specific tasks, task completion accuracy can reach 75%; for domains or industries in which large numbers of service staff perform highly repetitive tasks, this is already enough to substantially reduce staffing and improve work efficiency.

 

Overall, natural language processing has achieved fruitful results: new models and methods are constantly being proposed and successfully applied, and many applications have been widely deployed, directly serving all aspects of social life. However, natural language processing still faces a number of challenges, and is far from understanding language to the extent that humans do. The main problems currently faced can be summarized in the following five points:

 

(1) Lack of effective tools for knowledge representation and use

Knowledge here includes common-sense knowledge, domain knowledge, expert experience, and linguistic knowledge. Most domain knowledge and linguistic knowledge can, to a certain extent, be learned from large-scale training samples, but much common-sense and expert knowledge often lies "outside the range of the training samples". For example, "Premier Li" once, over a very long period, referred to the former Premier Li Peng, but today it should refer to the current Premier Li Keqiang; the word "transformer" can refer to a reformer in the political sphere, to a voltage transformer in a power system, to the Transformers toys among children, and to the Transformer model in natural language processing. What it specifically refers to must therefore be identified from the context and the domain background. Again, in solving the classic "chickens and rabbits in the same cage" problem, the critical knowledge is that a chicken has two legs and a rabbit has four; without this knowledge, the problem cannot be solved. For people, such knowledge is common sense; for machines, it is difficult to learn inductively from samples (especially limited, small-scale samples).
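The "chickens and rabbits in one cage" puzzle mentioned above becomes one line of algebra once the decisive knowledge is supplied: with h heads and l legs, rabbits = (l − 2h) / 2 and chickens = h − rabbits. A sketch:

```python
def chickens_and_rabbits(heads, legs):
    """Solve the classic cage puzzle.
    The decisive *knowledge* is that a chicken has 2 legs and a rabbit has 4;
    from chickens + rabbits = heads and 2*chickens + 4*rabbits = legs:
        rabbits = (legs - 2 * heads) / 2
    """
    rabbits = (legs - 2 * heads) // 2
    chickens = heads - rabbits
    if rabbits < 0 or chickens < 0 or 2 * chickens + 4 * rabbits != legs:
        raise ValueError("no valid solution")
    return chickens, rabbits

print(chickens_and_rabbits(35, 94))  # → (23, 12), the classic instance
```

The point of the example is the article's own: the two leg-count facts appear nowhere in the puzzle statement, so a learner restricted to the given sample cannot derive them.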

 

(2) Lack of the ability to handle unknown language phenomena

Any natural language processing system will always encounter unknown words, unknown language structures and unknown semantics. "Unknown" here means not having appeared in the training samples or the dictionaries. Every language in the world changes and evolves dynamically with the development of society; new words, new meanings and new sentence structures keep emerging, and these phenomena of non-standard expression are particularly prominent in microblogs, chats and conversations. For example, the internet coinage "李菊福" is homophonic shorthand for "理据服" ("well-reasoned and convincing"), and "内牛满面" stands for "泪流满面" ("face streaming with tears"); and so on. Moreover, if the system's front-end input is the result of speech recognition or OCR, it is also very common for the input to contain a great deal of noise. Therefore, a practical natural language processing system must have a good ability to handle unknown language phenomena and noise, namely robustness (robustness).
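One standard mitigation for unknown words (used in BERT-style systems, though not elaborated in this article) is subword tokenization, which decomposes an out-of-vocabulary word into known pieces instead of collapsing it to a single unknown symbol. A minimal greedy longest-match sketch, with an invented toy vocabulary:

```python
def wordpiece(word, vocab, unk="[UNK]"):
    """Greedy longest-match subword split; '##' marks word-internal pieces."""
    pieces, start = [], 0
    while start < len(word):
        end, cur = len(word), None
        while start < end:
            sub = word[start:end]
            if start > 0:
                sub = "##" + sub
            if sub in vocab:
                cur = sub
                break
            end -= 1
        if cur is None:          # no piece matches: give up on the whole word
            return [unk]
        pieces.append(cur)
        start = end
    return pieces

# Toy vocabulary (illustrative): the unseen word "unhappiness" decomposes
# into known pieces instead of becoming a single unknown token.
vocab = {"un", "##happi", "##ness", "happy"}
print(wordpiece("unhappiness", vocab))  # → ['un', '##happi', '##ness']
print(wordpiece("zzz", vocab))          # → ['[UNK]']
```

Subwords soften the open-vocabulary problem but do not solve it: a decomposed coinage like the slang examples above still needs context for its meaning.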

 

(3) Lack of model interpretability and of the ability to "learn by analogy"

Although machine learning methods, including neural networks, have already played an important role in various natural language processing applications and in key technology research and development, these are, after all, "gambling-style" approaches that take probability computation as their basic means. Their performance depends heavily on the quality and size of the training samples; when the test samples differ substantially from the training samples, model performance declines sharply, to say nothing of "learning by analogy". From the standpoint of genuine natural language understanding, the capability of current models is still very limited, and in particular they lack reasonable interpretability. For a given input, what causes the model's errors, and what data are lost in the "black box" transformation process? What does each layer's transformation mean? How reliable is the final result? None of these questions has a reasonable answer.

 

(4) Lack of interactive learning and autonomous evolution capabilities

In actual use, a natural language processing system continuously receives feedback from users, including corrections of system outputs, new vocabulary and explanations added to the system, and newly added annotated data. The traditional machine learning approach is to add the user feedback to the training data and repeat the "train-test" cycle to continuously optimize the model. However, this method usually requires a long iteration cycle and has difficulty exploiting real-time feedback effectively. By analogy with human interactive learning, an intelligent system should have the ability to learn interactively online, that is, to learn from users' interaction with the system, supplementing and amending existing knowledge so as to achieve self-evolution of the model; and this learning and evolution should be lifelong (life-long learning).
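The contrast drawn above, batch retraining versus learning directly from each piece of user feedback, can be sketched with a simple online perceptron-style update; the feature vectors and labels are invented toy data, not a claim about how any deployed system learns.

```python
def predict(w, x):
    """Linear classifier: sign of the dot product."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) >= 0 else -1

def online_update(w, x, y, lr=1.0):
    """Perceptron-style correction applied immediately when the user
    flags a wrong prediction -- no full retraining cycle needed."""
    if predict(w, x) != y:
        w = [wi + lr * y * xi for wi, xi in zip(w, x)]
    return w

# Toy feedback stream: (feature vector, correct label) pairs
feedback = [([1.0, 0.0], 1), ([0.0, 1.0], -1), ([1.0, 1.0], 1)]
w = [0.0, 0.0]
for x, y in feedback:            # the model evolves as feedback arrives
    w = online_update(w, x, y)
print(w)  # → [1.0, 0.0]
```

Each correction costs one vector update, in contrast to rerunning the whole "train-test" cycle; lifelong learning adds the further requirements of retaining old knowledge while absorbing new feedback.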

 

(5) Limitations of single-modality information processing

Current natural language processing research generally takes text as its processing object; models generally do not involve other modalities of information, such as speech, image and video. In some scenarios speech recognition or OCR is used as front-end preprocessing, but the modules are independent of one another, and the processing of text, speech, image and video information is disconnected, which seriously violates the basic premise of "human-like intelligence". For a person, the eyes and ears take in information together: what is heard, said, written and seen is normally consistent, and the information from the different sensory organs is complementary and mutually corroborating. Just think: the same sentence spoken with different intonations, accents and gestures may express completely different meanings. Thus, the joint utilization and coordinated processing of multi-modal information is imperative.

 

In addition, when discussing the overall status of human language technology, full recognition and praise must be given to our country's rapid rise in this field. Over the past 10 years, natural language processing in China has developed rapidly; whether measured by the number of papers published at top international conferences (ACL, EMNLP, COLING, AAAI, IJCAI, WWW, etc.) and in journals, or by the important positions held by Chinese scholars in the relevant international academic organizations, the evidence indisputably marks China's pivotal position in this field and its unstoppable momentum. Regrettably, however, within the country this field has not received the status and voice it deserves.

 

 

2 Outlook

As an important branch of artificial intelligence research, human language technology research involves not only the characteristics and laws of morphology (form), syntax, semantics and discourse within linguistics itself, and the fundamental key problems that must be solved there, but also, for practical applications, the construction of mathematical models and methods for particular tasks such as machine translation, automatic summarization, sentiment analysis, and dialogue systems. I believe that, to ultimately solve the problem of human language understanding, so that the performance of the relevant application systems reaches a higher level, meets the individual needs of users, and even truly understands language as humans do, the following three areas will be important directions for future development.

 

(1) Integrate closely with neuroscience, explore the neural basis of language understanding in the human brain, and build more accurate, interpretable and computable semantic representations and computational methods

How the human brain represents and processes textual semantics remains a mystery. Compared with the visual and auditory nervous systems, our current understanding of the human brain's language system is still very preliminary. In recent years, data-driven natural language processing methods have in many respects effectively compensated for the deficiencies of conventional methods; however, as described earlier, data-driven approaches have many inherent drawbacks, including the dependence of performance on training samples and the problems of model interpretability and of knowledge representation, acquisition and use. The human brain's abilities to induce, abstract and draw inferences from small samples of data are precisely what current deep learning methods lack. How to discover and simulate the brain's mechanisms of language understanding, and to build brain-inspired models of language understanding, is a challenging problem before us.

 

(2) Build high-quality basic resources and technology platforms

Whether for rationalist methods based on symbolic logic and rule operations, or for data-driven empirical methods, high-quality basic resources are essential. Basic resources here include high-quality, large-scale knowledge bases, bilingual dictionaries and parallel sentence pairs, annotated corpora for specific tasks, and so on. Although knowledge graphs have become a focus of current research, and several large-scale knowledge graphs have been built, there is still no standardized knowledge graph representation. For general domains, how large should a knowledge graph ultimately be? At what granularity should knowledge representation be divided? How should common sense be used and represented? For specific applications, how should domain-specific knowledge graphs be built? Many such questions lie before us. For many languages, especially minority languages, the available data resources are very poor; for many there are not even bilingual dictionaries with Chinese, as with Persian-Chinese, Urdu-Chinese and Dari-Chinese, let alone large-scale bilingual parallel corpora. High-quality key-technology tools, such as named entity recognition tools and various language analysis tools, are likewise indispensable for subsequent application tasks.

 

(3) Break down the barriers between different modalities of information processing and build models and methods for fused multi-modal information processing

As mentioned above, existing research on speech, language, image and video processing essentially "keeps to itself", each going its own way, whereas application tasks in real scenarios often require the integrated use of multi-modal information; and from the perspective of simulating how the human brain understands language, the comprehensive utilization of information perceived from all kinds of sources also makes sense.

 

In summary, current human language technology has been widely applied, but its performance basically remains at the level of "processing", still far from the level of "understanding"; the tasks ahead are arduous and challenging. At the same time, it must be said that Chinese has its own unique patterns and research hotspots; from whatever standpoint one speaks, research and development of core natural language processing technology for Chinese should not become a neglected blind spot.

 

 

Origin blog.csdn.net/sinat_26811377/article/details/104560850