Introduction to NLP (Natural Language Processing)

1. What is NLP

NLP (Natural Language Processing) is an important field within computer science and artificial intelligence. It studies the theories and methods that enable effective communication between humans and computers in natural language (excerpted from Baidu Encyclopedia).

Speakers of different languages cannot communicate directly. For example, humans cannot understand a dog's barking, and even people who speak different human languages need translation to understand one another.

NLP plays the same role between humans and computers: it is a bridge between machine language and human language that makes human-computer communication possible.

NLP consists of the following two parts:

  • NLU (Natural Language Understanding)
  • NLG (Natural Language Generation)

2. NLU (Natural Language Understanding)

NLU (Natural Language Understanding) is a general term for the methods, models, and tasks that help machines understand text, including word segmentation, part-of-speech tagging, syntactic analysis, text classification and clustering, information extraction, and automatic summarization. Simply put, the goal is for computers to understand language the way humans normally do.

Take "booking a flight ticket" as an example. The same request can be expressed in many ways:

  • Is there a flight to Shanghai?
  • Book a plane ticket to Shanghai and leave next Tuesday.
  • I will be on a business trip to Shanghai next Tuesday, please check the air ticket for me.
  • I want to take the nearest plane to Shanghai.
  • ……

Natural language offers effectively unlimited ways to express "booking a flight ticket", which is a huge challenge for computers. Before artificial intelligence techniques were introduced, computers could identify intent only with rules. For example, if "booking air tickets" is the keyword, a request that does not contain it cannot be recognized. Conversely, any text that contains the keyword is matched, so "I want to cancel the flight ticket" is also handled as a request to book a ticket.
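
The two failure modes described above can be reproduced with a few lines of code. This is only a minimal sketch of keyword matching; the keyword list and the function name `keyword_intent` are made up for illustration and do not come from any particular system.

```python
# Hypothetical keyword rule: any text containing one of these phrases
# is treated as a "book_flight" request.
BOOKING_KEYWORDS = ("book a plane ticket", "book a flight", "flight ticket")


def keyword_intent(text: str) -> str:
    """Return "book_flight" if any keyword occurs in the text, else "unknown"."""
    lowered = text.lower()
    if any(keyword in lowered for keyword in BOOKING_KEYWORDS):
        return "book_flight"
    return "unknown"


utterances = [
    "Book a plane ticket to Shanghai and leave next Tuesday.",  # matched correctly
    "Is there a flight to Shanghai?",                           # missed: no keyword
    "I want to cancel the flight ticket.",                      # false positive
]
for text in utterances:
    print(f"{keyword_intent(text):12} <- {text}")
```

The second utterance is missed because it contains no keyword, and the third is wrongly treated as a booking because it does. Adding more keywords shifts these errors around but does not remove them.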

The purpose of natural language understanding is to accurately identify the user's intent.

Like artificial intelligence as a whole, natural language understanding has gone through three iterations:

  1. Rule-based methods: judge the intent of an utterance using hand-written rules. Common methods include CFG (Context-Free Grammar), JSGF (JSpeech Grammar Format), etc.
  2. Statistics-based methods: perform statistical analysis on language data and extract semantic features from it. Common methods include SVM (Support Vector Machine), HMM (Hidden Markov Model), MEMM (Maximum Entropy Markov Model), CRF (Conditional Random Field), etc.
  3. Deep-learning-based methods: CNN (Convolutional Neural Network), RNN (Recurrent Neural Network), LSTM (Long Short-Term Memory), Transformer, etc.
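
To give a feel for the statistics-based methods, here is a minimal sketch of Viterbi decoding for an HMM part-of-speech tagger. The two-tag tag set and every probability in it are invented for illustration; a real model would estimate them from an annotated corpus.

```python
# Toy HMM for part-of-speech tagging. All probabilities are made up.
STATES = ("NOUN", "VERB")
START_P = {"NOUN": 0.6, "VERB": 0.4}
TRANS_P = {
    "NOUN": {"NOUN": 0.3, "VERB": 0.7},
    "VERB": {"NOUN": 0.8, "VERB": 0.2},
}
EMIT_P = {
    "NOUN": {"flights": 0.5, "book": 0.2, "tickets": 0.3},
    "VERB": {"flights": 0.05, "book": 0.9, "tickets": 0.05},
}
UNSEEN_P = 1e-6  # small fallback probability for words missing from EMIT_P


def viterbi(words):
    """Return the most probable tag sequence for `words` under the toy HMM."""
    if not words:
        return []
    # best[t][s] = (probability of the best path ending in state s at step t,
    #               previous state on that path)
    best = [{s: (START_P[s] * EMIT_P[s].get(words[0], UNSEEN_P), None) for s in STATES}]
    for word in words[1:]:
        column = {}
        for s in STATES:
            column[s] = max(
                (best[-1][p][0] * TRANS_P[p][s] * EMIT_P[s].get(word, UNSEEN_P), p)
                for p in STATES
            )
        best.append(column)
    # Backtrack from the most probable final state.
    state = max(STATES, key=lambda s: best[-1][s][0])
    path = [state]
    for column in reversed(best[1:]):
        state = column[state][1]
        path.append(state)
    return list(reversed(path))


tags = viterbi(["book", "tickets"])
print(tags)
```

With these toy numbers, "book" is tagged as a verb and "tickets" as a noun, because a verb followed by a noun is the most probable path overall, not because of any rule about either word.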

3. NLG (Natural Language Generation)

NLG (Natural Language Generation) is a software process that automatically converts structured data into human-readable text.

6 Steps to NLG

Step 1: Content Determination

As a first step, the NLG system needs to decide which information to include in the text being constructed and which to leave out. The data often contains more information than is ultimately conveyed.

Step 2: Text Structuring

After determining what information needs to be conveyed, the NLG system needs to put it in a sensible order. For example, a report on a basketball game first states when and where it took place and which two teams played, then describes how the game went, and finally says how it ended.

Step 3: Sentence Aggregation

Not every piece of information needs its own sentence. Combining several pieces of information into one sentence can make the text more fluent and easier to read.

Step 4: Lexicalisation

Once the content of each sentence is determined, the information can be put into natural language. This step adds connecting words between the pieces of information so that the result reads like a complete sentence.

Step 5: Referring Expression Generation (REG)

This step is very similar to lexicalisation in that words and phrases are selected to form a complete sentence. The essential difference is that REG needs to identify the domain of the content and then use the vocabulary of that domain (rather than of other domains).

Step 6: Linguistic Realization

Finally, when all relevant words and phrases have been identified, they need to be combined to form a well-structured complete sentence.
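
The six steps can be sketched as a small template-based pipeline. Everything here is hypothetical: the game record, the function names, and the fixed sentence templates are made up for illustration, and real NLG systems are considerably more elaborate. Steps 4 and 5 share one function because both come down to choosing words.

```python
# Hypothetical structured record for one basketball game.
game = {
    "date": "Saturday",
    "venue": "City Arena",
    "home": "Lions",
    "away": "Tigers",
    "home_score": 98,
    "away_score": 91,
    "attendance": 12873,    # present in the data but not reported
    "referee": "J. Smith",  # present in the data but not reported
}


def determine_content(record):
    """Step 1: keep only the fields worth reporting."""
    wanted = ("date", "venue", "home", "away", "home_score", "away_score")
    return {key: record[key] for key in wanted}


def structure_text(content):
    """Step 2: order the messages: when and where, then who, then the result."""
    return [
        ("setting", content["date"], content["venue"]),
        ("teams", content["home"], content["away"]),
        ("result", content["home"], content["home_score"],
         content["away"], content["away_score"]),
    ]


def aggregate(messages):
    """Step 3: merge the setting and teams messages into one sentence plan."""
    setting, teams, result = messages
    return [("opening", *setting[1:], *teams[1:]), result]


def lexicalise(plan):
    """Steps 4 and 5: choose words, including how to refer to the teams."""
    if plan[0] == "opening":
        _, date, venue, home, away = plan
        return f"on {date} the {home} hosted the {away} at {venue}"
    _, _home, home_score, _away, away_score = plan
    # The teams were already named, so refer to the winner with sports
    # vocabulary. Assumes no ties, as in basketball.
    subject = "the home side" if home_score > away_score else "the visitors"
    high, low = max(home_score, away_score), min(home_score, away_score)
    return f"{subject} won {high}-{low}"


def realise(clause):
    """Step 6: apply surface conventions (capital letter, full stop)."""
    return clause[0].upper() + clause[1:] + "."


def generate(record):
    plans = aggregate(structure_text(determine_content(record)))
    return " ".join(realise(lexicalise(plan)) for plan in plans)


report = generate(game)
print(report)
```

The attendance and referee fields never reach the output, which is the point of content determination: the data holds more than the text conveys.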

However NLG is applied, it usually serves the following three purposes:

  1. Generate personalized content at scale
  2. Help people gain insight into data and make the data easier to understand
  3. Accelerate content production

4. Three levels of analysis in NLP

The first level: lexical analysis

Lexical analysis includes word segmentation (needed for languages such as Chinese, which do not separate words with spaces) and part-of-speech tagging.

  • Word segmentation: split the input text into individual words
  • Part-of-speech tagging: assign a category to each word. Categories include nouns, verbs, adjectives, etc.; words with the same part of speech play similar syntactic roles.
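
As an illustration of word segmentation, here is a minimal sketch of forward maximum matching, a simple dictionary-based approach. The tiny dictionary is made up for this example; practical segmenters rely on large lexicons or trained statistical models.

```python
# Hypothetical mini-dictionary for illustration only.
DICTIONARY = {"我", "想", "订", "机票", "飞机", "上海", "去", "的"}
MAX_WORD_LEN = max(len(word) for word in DICTIONARY)


def forward_max_match(text: str) -> list:
    """Segment `text` greedily, always taking the longest dictionary word."""
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shrink. A single character
        # is always accepted so that unknown characters do not stall the loop.
        for length in range(min(MAX_WORD_LEN, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if length == 1 or candidate in DICTIONARY:
                words.append(candidate)
                i += length
                break
    return words


# "I want to book a plane ticket to Shanghai"
tokens = forward_max_match("我想订去上海的机票")
print(tokens)
```

Greedy matching is fast but cannot resolve ambiguous boundaries, which is one reason statistical and neural segmenters replaced it in practice.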

The second level: syntactic analysis

Syntactic analysis processes the input text sentence by sentence to obtain the syntactic structure of each sentence.

Three mainstream approaches to syntactic analysis:

  • Phrase structure parsing identifies the phrases in a sentence and the hierarchical syntactic relationships between them. In depth of analysis it sits between dependency parsing and deep grammar parsing.
  • Dependency parsing (a form of shallow syntactic analysis) identifies the dependency relations between the words in a sentence. It is relatively simple to implement and well suited to multilingual settings, but it provides relatively little information.
  • Deep grammar parsing uses a deep grammar to perform deep syntactic and semantic analysis of a sentence. Lexicalized Tree-Adjoining Grammar (LTAG) and Combinatory Categorial Grammar (CCG) are examples of deep grammars. This approach provides rich syntactic and semantic information, but the grammars are complex and the parsers are computationally expensive, so it is poorly suited to processing large-scale data.
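
The difference between the first two approaches is easiest to see in how their output is stored. The sketch below writes out both structures by hand for one sentence; the tuple layouts are one possible encoding chosen for illustration, and the relation names (`nsubj`, `obj`, `det`) follow Universal Dependencies conventions as an assumption, not something the text above prescribes.

```python
# Phrase structure: nested (label, children...) tuples; leaves are (tag, word).
phrase_tree = (
    "S",
    ("NP", ("PRON", "I")),
    ("VP", ("VERB", "booked"),
           ("NP", ("DET", "a"), ("NOUN", "ticket"))),
)

# Dependency structure: one (word, head_index, relation) triple per word.
# head_index is the 0-based position of the governing word; -1 marks the root.
dependencies = [
    ("I", 1, "nsubj"),
    ("booked", -1, "root"),
    ("a", 3, "det"),
    ("ticket", 1, "obj"),
]


def leaves(tree):
    """Collect the words of a phrase-structure tree from left to right."""
    if isinstance(tree[1], str):  # leaf: (tag, word)
        return [tree[1]]
    return [word for child in tree[1:] for word in leaves(child)]


def dependents(deps, head_index):
    """Return (relation, word) pairs for the words governed by `head_index`."""
    return [(rel, word) for word, head, rel in deps if head == head_index]


print(leaves(phrase_tree))
print(dependents(dependencies, 1))
```

The phrase tree records nested constituents such as the noun phrase "a ticket", while the dependency list records only word-to-word links, which is why it is simpler to produce and carries less information.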

The third level: semantic analysis

The ultimate goal of semantic analysis is to understand the true meaning expressed by a sentence. There is not yet a unified approach to semantic representation.

1. Semantic role labeling is a relatively mature shallow semantic analysis technique.
Semantic role labeling is generally performed on top of syntactic analysis, and the quality of the syntactic structure is crucial to its performance. Usually a cascaded approach is used, in which the model for each stage is trained separately:

  • Word segmentation
  • Part-of-speech tagging
  • Syntactic analysis
  • Semantic analysis
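
To show why semantic role labeling depends on the syntactic stage, the sketch below derives roles directly from a dependency parse. The parse is written by hand, and the relation-to-role table is a deliberately naive assumption (it only holds for simple active-voice sentences); real systems learn this mapping. The role names follow PropBank-style conventions.

```python
# Output of the earlier cascade stages, written by hand for illustration:
# (word, part_of_speech, head_index, dependency_relation); head -1 is the root.
parsed = [
    ("She", "PRON", 1, "nsubj"),
    ("booked", "VERB", -1, "root"),
    ("a", "DET", 3, "det"),
    ("flight", "NOUN", 1, "obj"),
    ("yesterday", "NOUN", 1, "obl:tmod"),
]

# Hypothetical mapping from dependency relations to PropBank-style roles.
RELATION_TO_ROLE = {"nsubj": "A0", "obj": "A1", "obl:tmod": "AM-TMP"}


def label_roles(tokens):
    """Assign a semantic role to each direct dependent of every verb."""
    frames = {}
    for index, (word, pos, _head, _rel) in enumerate(tokens):
        if pos != "VERB":
            continue
        roles = {}
        for dep_word, _pos, head, rel in tokens:
            if head == index and rel in RELATION_TO_ROLE:
                roles[RELATION_TO_ROLE[rel]] = dep_word
        frames[word] = roles
    return frames


frames = label_roles(parsed)
print(frames)
```

Only the head word of each argument is labeled here. If the parser attached "flight" to the wrong head, the role would be lost as well, which is how errors propagate through a cascade.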

2. The joint model (a more recently developed method) learns and decodes multiple tasks together. A joint model can usually improve analysis quality significantly, but it is more complex and slower.

  • Joint word segmentation and part-of-speech tagging
  • Joint part-of-speech tagging and syntactic analysis
  • Joint word segmentation, part-of-speech tagging, and syntactic analysis
  • Joint syntactic and semantic analysis, etc.


Origin blog.csdn.net/qq_37771475/article/details/126765564