Natural language processing (NLP) study notes: concepts and models

  • Foreword

First, look at some demos to get an intuitive feel for the field.

Natural Language Processing:

Chinese word segmentation, part-of-speech analysis, and text summarization, in preparation for the knowledge graph material below.

http://xiaosi.trs.cn/demo/rs/demo

Knowledge graph:

https://www.sogou.com/tupu/person.html?q=Andy Lau

Here are two practical application examples to deepen your understanding of NLP:

Jiuge (Nine Songs) poetry-writing robot:

https://jiuge.thunlp.cn/

Microsoft couplet robot:

http://duilian.msra.cn/

 

 

  • NLP Overview:

Natural language processing is the study of how to use computer technology to process and analyze language text (sentences, paragraphs, discourse, etc.).

One of the key ideas of NLP is to convert words into numerical vectors and then feed those vectors into machine learning models for prediction, as sketched below.
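Here is a minimal sketch of that pipeline, assuming scikit-learn is available; the toy review texts, the labels, and the sentiment task itself are invented for illustration.

```python
# A minimal words -> vectors -> model sketch with scikit-learn.
# The tiny "sentiment" dataset below is invented for illustration.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

texts = ["great movie", "terrible movie", "great acting", "terrible plot"]
labels = [1, 0, 1, 0]  # 1 = positive, 0 = negative

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(texts)      # words converted to numerical count vectors
model = MultinomialNB().fit(X, labels)   # vectors fed into a machine learning model

print(model.predict(vectorizer.transform(["great plot"])))  # [1]
```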

Deep learning is one of the key technologies of NLP.

Natural language processing technology is the basis for building knowledge graphs.

 

The main challenges facing the field:

New words appear in large numbers. Example: 佛系 (literally "Buddha-like", slang for a laid-back attitude)

Ambiguity is everywhere. Example: "like the city's young", a phrase that admits more than one reading

Metaphor. Example: "diving" (潜水, i.e. silently lurking) in one's WeChat Moments (朋友圈)

Different languages carve up concepts differently: translation requires understanding between them

 

The corresponding core tasks:

-> semantic analysis

-> pragmatic analysis (analysis of the usage context)

 

  • NLP Application examples:

One: Chinese word segmentation

1. Maximum matching (rule- or template-based; a minimal sketch follows this list)

2. n-gram-based segmentation (the traditional statistical method, formula-driven)

3. Neural-network-based segmentation (currently the most mainstream method)
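As referenced in item 1, here is a minimal sketch of forward maximum matching in plain Python; the tiny dictionary and example sentence are invented for illustration.

```python
# Forward maximum matching (FMM): greedily take the longest dictionary
# word at each position. Dictionary and input are toy examples.

def fmm_segment(text, dictionary, max_len=4):
    words, i = [], 0
    while i < len(text):
        # Try the longest candidate first, shrinking until a match is found;
        # a single character always matches as a fallback.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                words.append(candidate)
                i += size
                break
    return words

dictionary = {"自然", "语言", "自然语言", "处理"}
print(fmm_segment("自然语言处理", dictionary))  # ['自然语言', '处理']
```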

 

Two: Machine translation

1. Template/rule-based methods

2. Corpus-based (statistical) methods

3. Neural machine translation

 

Three: Simultaneous interpretation

Current applications:

1. various input methods (IMEs);

2. speech

 

  • Common NLP methods:

1. Rule-based methods

2. Statistical learning methods

---------------------------------------

Basics of the statistics-based approach / its framework:

Framework: learning system -> model -> prediction system

Principle: train on labeled data (samples); then, given a new input, predict the output.

For example, predicting insurance purchases: "buy" and "not buy" are the labels, and records of past purchases are the samples. The emphasis is on training a good model; a minimal sketch of the framework follows.
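A minimal sketch of that framework with scikit-learn; the toy insurance features (age, income) and labels are invented for illustration.

```python
# learning system -> model -> prediction system, on invented insurance data.
from sklearn.linear_model import LogisticRegression

# Samples: (age, annual income); labels: 1 = bought insurance, 0 = did not.
X_train = [[25, 8], [40, 20], [35, 15], [22, 5], [50, 30], [28, 7]]
y_train = [0, 1, 1, 0, 1, 0]

model = LogisticRegression().fit(X_train, y_train)  # learning system -> model
print(model.predict([[45, 25]]))                    # prediction system -> [1]
```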

 

Commonly used statistical models:

There are six or seven main kinds of models, each with its own emphasis; choose the model to fit the task.

1. Statistical models:

1) Language model (LM): relatively complex

2) Hidden Markov Model (HMM): multi-class classification (more than 2 classes)

3) K-nearest neighbors (KNN): an option when data is scarce

4) Naive Bayes (NB)

5) Decision tree (DT)

6) Maximum entropy (ME)

------- binary classification -------

7) Support Vector Machine (SVM)

8) Perceptron

------- sequence labeling -------

9) Conditional Random Field (CRF)

  

A core algorithm for the statistics-based approach is the Viterbi algorithm: at each step, keep only the best path (a minimal sketch follows).
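A minimal sketch of Viterbi decoding for a hidden Markov model; the weather/activity states and probabilities are a classic textbook toy example, not from the original notes.

```python
# Viterbi: at each time step, keep only the best path into each state.

def viterbi(obs, states, start_p, trans_p, emit_p):
    # V[t][s] = (probability of the best path ending in state s, that path)
    V = [{s: (start_p[s] * emit_p[s][obs[0]], [s]) for s in states}]
    for t in range(1, len(obs)):
        V.append({})
        for s in states:
            V[t][s] = max(
                (V[t - 1][prev][0] * trans_p[prev][s] * emit_p[s][obs[t]],
                 V[t - 1][prev][1] + [s])
                for prev in states
            )
    return max(V[-1].values())

states = ("Rainy", "Sunny")
start_p = {"Rainy": 0.6, "Sunny": 0.4}
trans_p = {"Rainy": {"Rainy": 0.7, "Sunny": 0.3},
           "Sunny": {"Rainy": 0.4, "Sunny": 0.6}}
emit_p = {"Rainy": {"walk": 0.1, "shop": 0.4, "clean": 0.5},
          "Sunny": {"walk": 0.6, "shop": 0.3, "clean": 0.1}}
prob, path = viterbi(("walk", "shop", "clean"), states, start_p, trans_p, emit_p)
print(path, prob)  # ['Sunny', 'Rainy', 'Rainy'] 0.01344
```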

 

There are many open-source tools; they are not listed here, so look them up yourself.

 

Basics of the learning-based approach:

The hottest area is deep learning with artificial neural networks.

Deep learning achieved a major breakthrough around 2009, improving recognition rates by roughly ten percentage points.

 

Neural language model:

A conditional probability model: it first counts occurrence statistics, then predicts by maximizing the probability (a bigram sketch follows below).

The computer cannot truly understand human language (its intent); it only picks out the output with the highest probability of occurrence, which is why it needs a large number of training samples.
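A minimal sketch of such a model as a bigram counter in plain Python; the tiny corpus is invented for illustration.

```python
# Count co-occurrence statistics first, then predict the continuation
# with the highest conditional probability.
from collections import Counter, defaultdict

corpus = ["I like NLP", "I like deep learning", "I love NLP"]

bigrams = defaultdict(Counter)
for sentence in corpus:
    words = sentence.split()
    for prev, cur in zip(words, words[1:]):
        bigrams[prev][cur] += 1  # occurrence statistics

def predict_next(word):
    counts = bigrams[word]
    best, n = counts.most_common(1)[0]  # maximize P(next | word)
    return best, n / sum(counts.values())

print(predict_next("I"))  # ('like', 0.666...): most probable, not "understood"
```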

 

Learning methods divide into two kinds:

Shallow learning: LR, SVM, Bayes, boosting

Deep learning: CNN, RNN, DBM, AutoEncoder

 

CNN (convolutional neural network): two core operations: 1. convolution, which is really just a weighted sum; 2. max pooling (a minimal sketch of both follows).

RNN (recurrent neural network): loops over the input, so it places few constraints on it. CNN's constraints on input dimensionality are more severe: what you train on is what you must predict on.
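A minimal sketch of both core operations in plain Python; the signal and kernel values are invented for illustration.

```python
# Convolution as a sliding weighted sum, plus max pooling.

def conv1d(signal, kernel):
    k = len(kernel)
    # Each output value is a weighted sum of a window of the input.
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]

def max_pool(values, size=2):
    # Keep only the maximum of each non-overlapping window.
    return [max(values[i:i + size]) for i in range(0, len(values), size)]

feature_map = conv1d([1, 3, 2, 5, 4, 1], [0.5, 1.0, 0.5])
print(feature_map)            # [4.5, 6.0, 8.0, 7.0]
print(max_pool(feature_map))  # [6.0, 8.0]
```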

 

LSTM (long short-term memory network, a type of RNN), core process:

Three gates, each corresponding to a function; each gate outputs a value between 0 and 1 (a minimal sketch of one cell step follows the list).

Forget gate: selects what to forget

Input gate: decides what to take in

Output gate: decides what to output
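A minimal sketch of one LSTM cell step with NumPy; the dimensions and random weights are invented for illustration.

```python
# One LSTM time step: three sigmoid gates controlling the cell state.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, b):
    # W maps the concatenated [h_prev, x] to four stacked vectors.
    z = W @ np.concatenate([h_prev, x]) + b
    f, i, o, g = np.split(z, 4)
    f = sigmoid(f)                    # forget gate: what to erase
    i = sigmoid(i)                    # input gate: what to write
    o = sigmoid(o)                    # output gate: what to expose
    c = f * c_prev + i * np.tanh(g)   # updated cell state
    h = o * np.tanh(c)                # new hidden state
    return h, c

rng = np.random.default_rng(0)
n, d = 4, 3  # hidden size, input size (arbitrary toy values)
W, b = rng.normal(size=(4 * n, n + d)) * 0.1, np.zeros(4 * n)
h, c = lstm_step(rng.normal(size=d), np.zeros(n), np.zeros(n), W, b)
print(h.shape, c.shape)  # (4,) (4,)
```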

 

--------------------------------------------------------- 

Notes:

1. There are a large number of mathematical formulas; studying this in depth requires learning the math.

2. Ordinary practitioners do not need to research the models themselves.

 

 

  • Knowledge Graphs

 

Knowledge graphs and deep learning are like two parallel roads; the future direction will be deep learning.

Deep learning can be used for prediction; knowledge graphs cannot. A knowledge graph is more like exhaustively collecting all knowledge in one place, so that you only need to search it for the knowledge you want.

Knowledge graphs can supply knowledge to deep learning, and deep learning models provide tools for knowledge graph construction.

 

Knowledge graphs developed out of the Semantic Web.

Semantic Web: a web whose elements are connected by defined relationships (a minimal triple sketch follows).
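A minimal sketch of a knowledge graph stored as (subject, relation, object) triples; the triples echo the Sogou demo above but are invented for illustration.

```python
# A knowledge graph as triples: searching it is just looking facts up.
triples = [
    ("Andy Lau", "occupation", "actor"),
    ("Andy Lau", "occupation", "singer"),
    ("Andy Lau", "birthplace", "Hong Kong"),
]

def search(subject, relation):
    return [o for s, r, o in triples if s == subject and r == relation]

print(search("Andy Lau", "occupation"))  # ['actor', 'singer']
```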

 

Knowledge graph examples:

http://kw.fudan.edu.cn

http://zhishi.me

 

  • Text Mining

TF-IDF weighting: a term is weighted by its frequency within a document (TF) times the inverse of its frequency across documents (IDF), so words that are frequent in one document but rare overall score highest (a minimal sketch follows).
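A minimal sketch of TF-IDF computed directly from that definition; the toy corpus is invented for illustration.

```python
# TF-IDF: term frequency in a document times inverse document frequency.
import math

docs = [["nlp", "is", "fun"], ["nlp", "models"], ["deep", "learning", "models"]]

def tf_idf(term, doc, docs):
    tf = doc.count(term) / len(doc)            # frequency within this document
    df = sum(1 for d in docs if term in d)     # documents containing the term
    return tf * math.log(len(docs) / df)       # rare-overall terms score higher

print(tf_idf("nlp", docs[0], docs))   # ~0.135: "nlp" is in 2 of 3 documents
```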

 

A bunch of concepts and terms:

NLU Natural Language Understanding

NLP Natural Language Processing 

MT Machine Translation

HLT Human Language Technology, encompassing NLU, CL, and MT

DL Deep Learning

NN Neural Networks

CNN Convolutional Neural Networks

RNN Recurrent Neural Networks

LSTM Long Short-Term Memory networks

n-gram: given input words, outputs the probability of the sentence

Word embedding: words are mapped into a vector space, with each word represented by a vector.

Word2vec: a word-vector representation that maps similar words to nearby regions of the vector space, i.e., it learns the relationships between words.

Word2Vec has two training methods, skip-gram and CBOW (a minimal sketch follows this list).

skip-gram: input a word, then estimate the probability of other words appearing near it.

CBOW (continuous bag of words): predicts a word from its surrounding context words.
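A minimal sketch of training both variants with gensim (assuming gensim 4.x or newer is installed); the toy sentences and all parameter values are invented for illustration.

```python
# Train Word2Vec on a toy corpus; sg=1 selects skip-gram, sg=0 selects CBOW.
from gensim.models import Word2Vec

sentences = [["i", "like", "nlp"],
             ["i", "like", "deep", "learning"],
             ["nlp", "uses", "word", "vectors"]]

model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=1)

print(model.wv["nlp"].shape)         # each word is now a 50-dimensional vector
print(model.wv.most_similar("nlp"))  # words nearby in the learned vector space
```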

 


Source: www.cnblogs.com/xiaoer/p/11059069.html