Natural Language Processing (Teacher Zhao's course) study notes, part one: Overview

First, the object of natural language processing: text (sources: images, speech, and text).

Second, prerequisites:

  1. Mathematical Analysis
  2. Probability and Mathematical Statistics
  3. Linear Algebra
  4. Analytic Geometry
  5. Fundamentals of Data Structures and Algorithms
  6. Programming languages: C/C++, Python
  7. Fundamentals of Machine Learning

Third, the concept of natural language

  1. Natural language is human language, such as English, Chinese, and so on.
  2. Unlike computer programming languages, natural language is a means of communication: speaking and writing.
  3. Natural language takes two forms: written and spoken. The natural language processing discussed here deals with written language.

Fourth, several terms and concepts:

  • Natural Language Processing (NLP): the discipline devoted to processing language by algorithmic, statistical, or common-sense methods.
  • Natural Language Understanding (NLU): genuine understanding of text written in some natural language.
  • Computational Linguistics: linguistic analysis and natural language processing in which a machine or computer attempts to simulate human language ability. At present, computational linguistics and natural language processing pursue the same direction, and the two can be regarded as different names for the same thing.

 

Fifth, the relationship between natural language processing and other disciplines (the figure is not preserved here).

Sixth, the technical challenges of natural language processing

1. From the knowledge perspective (knowledge is the core issue of artificial intelligence), natural language processing has to deal with two types of knowledge:

    • Common-sense knowledge: knowledge about the entities that natural language refers to.
    • Linguistic knowledge: the parts of speech, syntax, and formal semantics involved in natural language processing.

Modern natural language processing is the branch of language research and engineering that deals with these two types of knowledge.

2. Compared with computer programming languages, whose definitions are precise, unique, and unambiguous, the mapping between natural-language forms and their semantics can be:

    • One-to-one, many-to-one, or one-to-many.
    • A one-to-many mapping requires a large amount of outside knowledge in order to make the right choice among the target representations.
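A one-to-many mapping can be sketched in a few lines of Python. The sense inventory and cue words below are hypothetical and chosen only for illustration; the point is that the word form alone does not determine the meaning, so some outside knowledge (here, a crude list of context cues) is needed to choose.

```python
# Hypothetical toy sense inventory: one surface form maps to several meanings.
SENSES = {
    "bank": {
        "financial_institution": {"money", "loan", "account", "deposit"},
        "river_side": {"river", "water", "fishing", "shore"},
    },
}

def choose_sense(word, context_words):
    """Pick the sense whose cue words overlap most with the context.

    Returns None when the word is unknown or no cue word matches,
    i.e. when the outside knowledge available here is not enough.
    """
    candidates = SENSES.get(word, {})
    context = set(context_words)
    best_sense, best_overlap = None, 0
    for sense, cues in candidates.items():
        overlap = len(cues & context)
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense

print(choose_sense("bank", ["i", "opened", "an", "account", "at", "the", "bank"]))
print(choose_sense("bank", ["we", "sat", "on", "the", "river", "bank"]))
```

Real systems replace the hand-written cue sets with learned models, but the shape of the problem is the same: several candidate meanings, and knowledge from outside the word itself to pick one.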

3. Examples of difficulties:

    • Modifier attachment problem: in the following sentence, does "making you crazy" modify "job" or "problems"?

                 Show me all problems in your job making you crazy.

    • Quantifier scoping problem: for example, English quantifiers such as "each" and "all", which are usually represented by the universal (∀) or existential (∃) quantifier, can be understood with several different scopes.
    • Elliptical utterances: the interpretation of an abbreviated or elliptical sentence may depend on the preceding question and its interpretation, as with A's question "Where?" below.

                 For example: A: Aren't the Olympics being held next year? B: Yes. A: Where? B: Tokyo, Japan
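The quantifier scoping problem can be made concrete with a worked example. The sentence "Every student read a book" is an assumption introduced here for illustration (it does not come from the notes), as are the predicate names. It has two logical readings that differ only in which quantifier takes wide scope:

```latex
% Reading 1 (universal takes wide scope):
% each student read some book, possibly a different one per student
\forall x\,\bigl(\mathrm{student}(x) \rightarrow \exists y\,(\mathrm{book}(y) \wedge \mathrm{read}(x, y))\bigr)

% Reading 2 (existential takes wide scope):
% there is one particular book that every student read
\exists y\,\bigl(\mathrm{book}(y) \wedge \forall x\,(\mathrm{student}(x) \rightarrow \mathrm{read}(x, y))\bigr)
```

Reading 2 implies reading 1 but not the other way round, so a system that must answer questions about the sentence has to decide which one was meant.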

 

Seven, the history of machine translation research

1. In 1949 Warren Weaver proposed that computers might be useful for solving the worldwide translation problem. Today, 70 years later, translation quality is still unsatisfactory: the output conveys only the rough meaning and is far from suitable for documents intended for formal occasions. This has made people realize that human translation is a complex cognitive and processing ability that draws on different types of knowledge:

    • Sentence structure
    • Word meaning
    • A model of the listener (user model)
    • Rules of conversation (for dialogue translation)
    • A broad body of shared information about the world

2. The ALPAC (Automatic Language Processing Advisory Committee) report, produced by the committee that John R. Pierce chaired from 1964 and published in 1966, concluded that meaningful machine translation was unlikely in the short term. After that, machine translation entered a low period of roughly 30 years.

3. In the late 1980s and early 1990s, the IBM models were proposed, opening the era of statistical machine translation (SMT), and machine translation began to recover from its trough.

4. In the early twenty-first century, minimum error rate training (MERT), together with the automatic translation quality metric BLEU, brought statistical machine translation into its heyday. In particular, neural machine translation (NMT), proposed around 2014 by researchers at Google among others, has brought machine translation into a new era.

 

Eight, levels of language processing

(A) Research objectives of natural language processing:

    • Develop practical and effective systems for language processing and analysis
    • Better understand language and the nature of intelligence

(B) The levels of linguistic analysis proposed by James Allen

1. Morphological analysis (lexical analysis): identifying the stem of a word from its full written form. Morphological analysis may also include identifying the syntactic category of the word, i.e. part-of-speech analysis. For example, the English word cowardly = coward (stem) + ly (suffix); the suffix -ly turns the noun into an adjective.
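The stem-plus-suffix decomposition can be sketched as naive suffix stripping. The suffix table and the `min_stem_len` parameter are assumptions made for this illustration; real morphological analyzers rely on a lexicon and handle many exceptions that this sketch ignores.

```python
# Hypothetical suffix table: suffix -> category of the derived word.
SUFFIXES = {
    "ly": "adjective (derived from a noun)",
    "ness": "noun",
    "ing": "verb (progressive)",
}

def split_suffix(word, min_stem_len=3):
    """Return (stem, suffix, category); suffix and category are None if no rule applies."""
    # Try longer suffixes first so the most specific rule wins.
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        stem = word[: -len(suffix)]
        # Require a minimum stem length so short words like "fly" are left alone.
        if word.endswith(suffix) and len(stem) >= min_stem_len:
            return stem, suffix, SUFFIXES[suffix]
    return word, None, None

print(split_suffix("cowardly"))
print(split_suffix("fly"))
```

The length check is a crude stand-in for a dictionary lookup: it stops "fly" from being split into "f" + "ly", but it would still mis-split many real words.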

  Morphological or lexical analysis for Chinese, and for most East Asian languages, differs from that for English. Chinese sentences are written without spaces between words, so words have to be parsed out of the sentence (that is, out of a sequence of characters). This is called Chinese word segmentation.
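One classic baseline for word segmentation is forward maximum matching: scan left to right and always take the longest dictionary word that fits. The notes do not name a specific algorithm, so this is just one possible sketch; the vocabulary and the `max_word_len` parameter are hypothetical.

```python
def forward_max_match(sentence, vocabulary, max_word_len=4):
    """Greedy left-to-right segmentation: always take the longest dictionary word."""
    words = []
    i = 0
    while i < len(sentence):
        # Try the longest candidate first, shrinking until there is a dictionary hit.
        for length in range(min(max_word_len, len(sentence) - i), 0, -1):
            candidate = sentence[i : i + length]
            # A single character is always accepted as a fallback.
            if length == 1 or candidate in vocabulary:
                words.append(candidate)
                i += length
                break
    return words

# Hypothetical toy dictionary, for illustration only.
VOCAB = {"自然", "语言", "自然语言", "处理"}
print(forward_max_match("我爱自然语言处理", VOCAB))
```

Because the method is greedy, it picks "自然语言" over the shorter "自然" whenever both are in the dictionary; it cannot resolve genuine segmentation ambiguities, which is why statistical and neural segmenters are used in practice.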

  Most natural language analysis systems first need to divide the text into linguistically meaningful units. Broadly speaking, this covers a large class of lexical processing tasks: word segmentation, lemmatization, part-of-speech tagging, and named entity / phrase recognition.

2. Syntactic analysis (Syntax): deep and shallow parsing

  Syntax and semantics are two related levels of language. Syntax is sometimes loosely called grammar. Strictly speaking, grammar = syntax + semantics.

  Syntax defines the relative positional relationships among the formal components within a sentence. In general, syntax = lexicon + rules. The goal of syntactic analysis is to assign a syntactic category label to each component of the sentence and to determine the syntactic relationships among the components.
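The idea that syntax = lexicon + rules can be illustrated with a very small recognizer. The lexicon, the category names, and the three binary rules below are all hypothetical; the chart-filling procedure is the standard CYK algorithm, used here only as one convenient way to apply such rules.

```python
from itertools import product

# Hypothetical toy lexicon: word -> set of syntactic categories.
LEXICON = {
    "grey": {"Adj"}, "long": {"Adj"}, "white": {"Adj"},
    "elephants": {"N"}, "noses": {"N"}, "clouds": {"N"},
    "have": {"V"},
}

# Hypothetical binary rules: (left child, right child) -> parent category.
RULES = {
    ("Adj", "N"): "NP",
    ("V", "NP"): "VP",
    ("NP", "VP"): "S",
}

def parse_categories(words):
    """CYK chart: chart[(i, j)] holds the categories that span words[i:j]."""
    n = len(words)
    chart = {}
    for i, word in enumerate(words):
        chart[(i, i + 1)] = set(LEXICON.get(word, set()))
    for span in range(2, n + 1):
        for start in range(n - span + 1):
            end = start + span
            cell = set()
            for mid in range(start + 1, end):
                for left, right in product(chart[(start, mid)], chart[(mid, end)]):
                    parent = RULES.get((left, right))
                    if parent:
                        cell.add(parent)
            chart[(start, end)] = cell
    return chart

def is_sentence(text):
    """True if the whitespace-separated words can be labeled S by the toy grammar."""
    words = text.lower().split()
    if not words:
        return False
    return "S" in parse_categories(words)[(0, len(words))]

print(is_sentence("grey elephants have long noses"))
print(is_sentence("long have white clouds noses"))
```

Each chart cell records the category labels assigned to a component, and the rule that produced a label records how the components relate, which are the two goals of syntactic analysis named above. Note that the grammar also accepts "white clouds have long noses": syntax alone does not rule out semantically odd sentences.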

3. Semantic analysis (Semantics)

   The purpose of semantic analysis is to assign meaning to words and utterances, including word senses and the way senses combine; this meaning is context-free.

  Context-dependent semantic analysis includes:

    • Sentence-level semantic role labeling: identifying the predicate-argument structure of a given predicate within the sentence.
    • Word sense disambiguation (WSD)
    • Anaphora resolution

4. Pragmatic analysis (Pragmatics)

Pragmatics concerns the relationship between the text or symbols and the producer/user in a conversation, that is, how different situations and background context significantly affect the interpretation of an utterance. This work is difficult, and there has been no breakthrough so far.

5. Discourse analysis (text analysis): examines the overall structure of a text and analyzes the relationships between its sentences.

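6. World knowledge analysis: world knowledge means unrestricted common-sense knowledge. This task is responsible for inferring the general world knowledge that every language user must have, for example the user's goals and values in a dialogue.

From the lexicon and syntax up to world knowledge, each lower level is the foundation of the level above it. When a lower level is not expressed properly, the level above it cannot be expressed correctly either.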

(C) Examples of level analysis

    1. Grey elephants have long noses. [ syntax √  semantics √ ]

    2. White cloud have long noses. [ syntax √  semantics × ]

    3. Long have white cloud noses. [ syntax ×  semantics × ]
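The three verdicts above can be reproduced with a two-stage check in which semantics is only examined once syntax has passed. Everything in the sketch is a hypothetical toy: the part-of-speech table, the single accepted sentence pattern, and the list of subjects that can have noses. Number agreement is deliberately not checked, so "White cloud have" still counts as syntactically acceptable, as it does in the list.

```python
# Hypothetical toy knowledge, for illustration only.
POS = {
    "grey": "ADJ", "white": "ADJ", "long": "ADJ",
    "elephants": "NOUN", "cloud": "NOUN", "noses": "NOUN",
    "have": "VERB",
}
VALID_PATTERN = ["ADJ", "NOUN", "VERB", "ADJ", "NOUN"]  # the only sentence shape accepted
HAS_BODY_PARTS = {"elephants"}                         # subjects that can "have noses"

def analyze(sentence):
    """Return (syntax_ok, semantics_ok); semantics is only checked if syntax passes."""
    words = sentence.lower().rstrip(".").split()
    tags = [POS.get(word) for word in words]
    if tags != VALID_PATTERN:
        # The lower level failed, so the upper level cannot succeed either.
        return False, False
    subject, obj = words[1], words[4]
    semantics_ok = obj != "noses" or subject in HAS_BODY_PARTS
    return True, semantics_ok

for s in ["Grey elephants have long noses.",
          "White cloud have long noses.",
          "Long have white cloud noses."]:
    print(s, analyze(s))
```

Returning `(False, False)` as soon as the syntax check fails mirrors the layering principle: a level cannot be right when the level beneath it is wrong.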

 

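Nine, two applications of natural language processing systems

Two human-computer dialogue systems from history show how natural language processing uses linguistic knowledge to handle practical application scenarios involving general knowledge.

The first is ELIZA [Weizenbaum, 1966], completed at MIT in 1966 and the best-known pattern-matching natural language processing system. It plays the role of a psychotherapist in the conversation, using pattern matching to process the input and transform it into an appropriate output. It can be tried at: https://www.masswerk.at/elizabot/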
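The pattern-matching approach can be sketched with regular expressions. The rules and reply templates below are hypothetical examples written in the spirit of ELIZA; they are not taken from the original 1966 script, which also used keyword ranking and pronoun reflection that this sketch leaves out.

```python
import re

# Hypothetical rules in the spirit of ELIZA; not the original script.
RULES = [
    (re.compile(r"\bi need (.+)", re.IGNORECASE), "Why do you need {0}?"),
    (re.compile(r"\bi am (.+)", re.IGNORECASE), "How long have you been {0}?"),
    (re.compile(r"\bmy (mother|father)\b", re.IGNORECASE), "Tell me more about your {0}."),
]
DEFAULT_REPLY = "Please go on."

def respond(utterance):
    """Return the reply of the first matching rule, or a default prompt."""
    text = utterance.strip().rstrip(".!?")
    for pattern, template in RULES:
        match = pattern.search(text)
        if match:
            # Copy the captured fragment of the input into the reply template.
            return template.format(*match.groups())
    return DEFAULT_REPLY

print(respond("I am feeling sad."))
print(respond("My mother hates me"))
print(respond("Hello"))
```

The program has no model of what the user means; it only rearranges fragments of the input, which is why such systems appear fluent in short exchanges and break down quickly outside their patterns.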

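The second comes from the 1970s, when expert systems established the consensus that knowledge is the core issue of AI and marked the point at which people regained confidence in AI. LUNAR [William Woods, 1973] works by using an ATN parser and a rule-driven semantic interpretation procedure to translate questions posed in English into expressions of a formal query language, helping geologists access, compare, and evaluate chemical data on lunar rocks and soil composition.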

 

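Ten, research topics and major academic organizations and conferences

1. Comparing the topic lists in the ACL 2020 and ACL 2010 calls for papers shows that the topics changed little over the ten years. This seems at odds with how deeply deep learning has penetrated NLP, but it is reasonable: the call-for-papers topics list tasks, not methods, whereas machine learning, including deep learning, is a method.

2. ACL, the main international academic organization for natural language processing, and its conference

    • ACL stands for both of the following:
      • Annual Meeting of the Association for Computational Linguistics
      • the Association for Computational Linguistics
    • ACL's online anthology: https://www.aclweb.org/anthology/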

3. Natural language processing conferences in China

1). The China National Conference on Computational Linguistics (CCL)

    • First held in 1991
    • Chinese Information Processing Society (CIPS)
    • Technical Committee of Computational Linguistics
    • Website: http://www.cips-cl.org/static/CCL2019/index.html

2). The CCF International Conference on Natural Language Processing and Chinese Computing (NLPCC)

    • First held in 2012
    • China Computer Federation (CCF)
    • Technical Committee of NLP and Chinese Computation
    • Website: http://tcci.ccf.org.cn/conference/2019/

 

 

Origin www.cnblogs.com/markkang/p/12107506.html