Study Notes CB006: Dependency syntax, LTP, n-gram models, the N-shortest-path segmentation method, character-based word segmentation, graph theory, probability theory

Dependency syntax analysis was proposed by the French linguist Lucien Tesnière in 1959. Syntax: the rules of sentences, how sentences are composed. Dependency syntax: the dependency relations among sentence components. A dependency means that without component A, component B does not stand on its own. Semantics: the meaning of the sentence.

Dependency syntax emphasizes how prepositions and auxiliary words divide the sentence, while semantic dependency emphasizes the logical relations between content words. Dependency syntax changes with the literal wording; semantic dependency does not: different wordings can express the same meaning, and sentences with different syntactic structures can share the same semantic relations. By combining syntactic analysis with semantic analysis, the computer understands the meaning of a sentence, matches the most appropriate answer through confidence scoring, and produces the chat reply.

Dependency syntax analysis determines the syntactic structure (phrase structure) of a sentence or the dependency relations among its words. In a dependency parse tree, child nodes depend on their parent nodes. In the dependency projection tree, solid lines represent dependency relations, lower components depend on higher components, and dashed lines are projection lines. The five axioms of dependency: 1. A sentence has only one independent component. 2. All other components depend directly on some component. 3. No component may depend on two or more components. 4. If component A depends directly on component B and component C lies between A and B in the sentence, then C depends directly on B or on some component between A and B. 5. The components to the left and right of the central component have no relation to each other.

LTP dependency relation labels:
SBV subject-verb relation
VOB verb-object relation (direct object)
IOB indirect-object relation
FOB preposed-object relation (fronting object)
DBL concurrent relation (double)
ATT attribute-head relation (attribute)
ADV adverbial-head relation (adverbial)
CMP verb-complement relation (complement)
COO coordinate relation
POB preposition-object relation
LAD left-adjunct relation
RAD right-adjunct relation
IS independent structure
HED head (core relation)

Computing dependencies combines machine learning with manual annotation: machine learning relies on manually annotated part-of-speech tags, dependency treebanks, and semantic roles, and the trained model then analyzes the dependency syntax of new sentences.

LTP cloud platform. Registered users get 20 GB of free data per month. Register an account at http://www.ltp-cloud.com/ , log in at http://www.ltp-cloud.com/dashboard/ to view the api_key and traffic usage; the documentation is at http://www.ltp-cloud.com/document . Example call: curl -i "http://api.ltp-cloud.com/analysis/?api_key=ApiKey&text=I am Chinese.&pattern=dp&format=plain" . Other analysis patterns: word segmentation (pattern=ws), part-of-speech tagging (pattern=pos), named entity recognition (pattern=ner), semantic dependency parsing (pattern=sdp), semantic role labeling (pattern=srl).
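
For reference, a minimal Python sketch of the same call using the requests library; YOUR_API_KEY is a placeholder, and the endpoint and parameters are those shown in the curl example above:

import requests

# Minimal sketch of the LTP cloud analysis call described above.
params = {
    "api_key": "YOUR_API_KEY",  # placeholder: use the api_key from your dashboard
    "text": "I am Chinese.",
    "pattern": "dp",            # ws / pos / ner / dp / sdp / srl, as listed above
    "format": "plain",
}
resp = requests.get("http://api.ltp-cloud.com/analysis/", params=params)
print(resp.status_code, resp.text)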

The mathematical connection to natural language is the language model. A mathematical model uses methods of mathematical logic and mathematical language to build a scientific or engineering model, explaining facts mathematically. Mathematical modeling is the whole process of establishing a mathematical model, computing results to explain practical problems, and subjecting it to real-world testing. A language model is an abstract mathematical model built on the objective facts of language: it explains the facts of natural language with a mathematical model.

The language models recognized as effective in industry are the n-gram models, a kind of Markov model. The appearance of the next word in an utterance depends on the previous n-1 words. With n=1, the current word depends only on itself, independent of the preceding words: unigram grammar. With n=2, the current word depends on the word immediately before it: bigram grammar, a first-order Markov chain. In engineering, n=3 is the most common; the larger n is, the more constraining information the model carries, and the smaller n is, the more reliable the statistics. There are two major research directions in natural language processing, rule-based and statistics-based; the n-gram model is statistics-based. Maximum likelihood, "most like history", estimates the probability from the frequency of historical occurrences.
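
As a toy illustration of maximum likelihood estimation (not from the original notes), a bigram probability can be estimated as count(w1 w2) / count(w1):

from collections import Counter

# Toy maximum-likelihood bigram estimation: P(w2 | w1) = count(w1 w2) / count(w1).
corpus = ["i", "am", "chinese", "i", "am", "here"]  # hypothetical tokenized corpus
unigram_counts = Counter(corpus)
bigram_counts = Counter(zip(corpus, corpus[1:]))

def bigram_prob(w1, w2):
    # Unseen histories get probability 0 here; smoothing (see below) addresses that.
    return bigram_counts[(w1, w2)] / unigram_counts[w1] if unigram_counts[w1] else 0.0

print(bigram_prob("i", "am"))  # 1.0 in this toy corpus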

The endless variety of natural language leads to the zero-probability problem: a finite corpus can hardly exhaust all language phenomena, so some sentences receive probability 0 under the n-gram model. Data smoothing techniques mathematically make the probability of every sentence greater than 0. There is also the problem that domain-specific words have high probability within their own field; a cache model raises the probability that a word which has just appeared will appear again later. A single language model has drawbacks: differences between corpora make a single model inaccurate, so several language models can be mixed in one calculation, or computed separately with the maximum-entropy result chosen at the end. Neural network language models, with their own special smoothing behavior, obtain more accurate probabilities through deep learning.
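
A minimal sketch of one common data smoothing technique, add-one (Laplace) smoothing, with made-up toy counts; it is only one of many possible smoothing methods:

def laplace_bigram_prob(w1, w2, bigram_counts, unigram_counts, vocab_size):
    # Add-one (Laplace) smoothing: every bigram receives a small non-zero
    # probability, so no sentence is assigned probability 0.
    return (bigram_counts.get((w1, w2), 0) + 1) / (unigram_counts.get(w1, 0) + vocab_size)

# Hypothetical counts, just for illustration:
unigram_counts = {"i": 2, "am": 2, "chinese": 1, "here": 1}
bigram_counts = {("i", "am"): 2, ("am", "chinese"): 1, ("am", "here"): 1}
print(laplace_bigram_prob("i", "chinese", bigram_counts, unigram_counts, len(unigram_counts)))
# 1/6 instead of 0 for the unseen bigram ("i", "chinese")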

Applications of language models: Chinese word segmentation, machine translation, spelling correction, speech recognition, phonetic-to-character conversion, automatic summarization, question answering systems, OCR.

In the last century, Chinese automatic word segmentation looked every sentence up in a Chinese vocabulary, using methods such as forward maximum matching, reverse maximum matching, bidirectional scanning, and auxiliary-word traversal. The two hardest problems in Chinese automatic word segmentation are: 1) ambiguity resolution; 2) recognition of unregistered (out-of-vocabulary) words.
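
As an illustration of the dictionary-matching idea, a minimal sketch of forward maximum matching with a hypothetical dictionary (not any particular tool's implementation):

def forward_max_match(sentence, dictionary, max_len=4):
    # Forward maximum matching: from the current position, try the longest
    # dictionary word first and fall back to shorter candidates.
    words, i = [], 0
    while i < len(sentence):
        for size in range(min(max_len, len(sentence) - i), 0, -1):
            piece = sentence[i:i + size]
            if size == 1 or piece in dictionary:
                words.append(piece)
                i += size
                break
    return words

# Hypothetical dictionary and input:
print(forward_max_match("我是中国人", {"中国", "中国人", "我"}))  # ['我', '是', '中国人']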

The N-shortest-path segmentation method uses a unigram grammar model: every word is a unigram that exists independently, and its probability of occurrence is estimated from a large corpus. For a sentence, all candidate segmentations are listed against the vocabulary; since the words can be combined in many ways there are multiple candidate results, and for each candidate the probabilities of its words are multiplied to give the final score. The segmentation method based on the n-gram grammar model extends the unigram model to an n-gram model on top of the N-shortest-path method: the statistic is no longer a single word's probability but its conditional probability given the previous n-1 words.
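
A toy sketch of scoring candidate segmentations under a unigram model; the candidate lists and word probabilities below are made up for illustration:

import math

# Hypothetical unigram probabilities estimated from a corpus.
unigram_prob = {"中国": 0.01, "中国人": 0.005, "人": 0.02, "我": 0.03, "是": 0.04}

def score(segmentation):
    # Multiply the unigram probabilities (summing logs for numerical stability).
    return sum(math.log(unigram_prob.get(w, 1e-8)) for w in segmentation)

candidates = [["我", "是", "中国人"], ["我", "是", "中国", "人"]]
print(max(candidates, key=score))  # the highest-probability candidate wins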

The character-based (word-formation) segmentation method. Each character occupies a word-formation position within a word: at the beginning of a word, in the middle of a word, at the end of a word, or forming a word on its own. Features are designed according to these positions, for example the previous word, the two previous words, the length of the previous word, the first character of the previous word, the last character of the previous word, and whether the last character of the previous word forms a word with the current character. On a large corpus, an averaged perceptron classifier scores these features and the weight coefficients are trained; the trained model is then used for segmentation. As the sentence is extended one character at a time from the right, the model computes the weighted score of these features, and the highest-scoring analysis is taken as the correct segmentation.
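
A rough sketch of extracting features like those listed above for one scoring step; the feature names and the exact template are illustrative, not the original implementation:

def extract_features(prev_word, prev_prev_word, current_char):
    # Illustrative features following the list above; a real averaged-perceptron
    # segmenter would use a richer, carefully tuned feature template.
    return [
        "prev=" + prev_word,
        "prev2=" + prev_prev_word + "_" + prev_word,
        "prev_len=" + str(len(prev_word)),
        "prev_first=" + prev_word[:1],
        "prev_last=" + prev_word[-1:],
        "prev_last+cur=" + prev_word[-1:] + current_char,
    ]

def weighted_score(features, weights):
    # Weighted score: sum of the learned weight of each firing feature.
    return sum(weights.get(f, 0.0) for f in features)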

The n-gram grammar model method segments words that are already in the vocabulary; the character-based word-formation method is what makes recognition of unregistered words possible.

Jieba Chinese word segmentation: based on a word-graph scan over a prefix dictionary, it builds a directed acyclic graph (DAG) of all possible word formations of the characters in the sentence, uses dynamic programming to find the maximum-probability path, and finds the best segmentation combination based on word frequency. For unregistered words, it adopts an HMM based on the ability of Chinese characters to form words and decodes it with the Viterbi algorithm. It thus combines the vocabulary-based and character-based approaches.
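
A minimal usage sketch, assuming the jieba package is installed:

import jieba

# Cut a sentence; jieba's HMM for unregistered words is enabled by default.
print("/".join(jieba.cut("我是中国人")))
print("/".join(jieba.cut("我是中国人", HMM=False)))  # dictionary + DAG path only, no HMM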

The ik tokenizer is based on a shortest-path search over its vocabulary.

LTP cloud platform word segmentation is based on a machine learning framework, partly combined with a vocabulary-based method.

Other word segmentation tools can be judged in similar ways. Most online comparisons of word segmentation tools are based on feature comparisons; personally, I recommend judging which segmentation tool is best by its underlying principle.

Graph theory. A graph connects isolated points with lines, and any pair of points may be connected. It differs from a tree: a tree has parent-child relations, a graph does not. Graphs express the relations between things and the transformations between them: edges express the degree of association and the possibility of transformation.

Probability theory: the probability that a coin flip lands heads is 1/2; conditional probability P(B|A); joint probability P(A,B); Bayes' formula P(B|A) = P(A|B)P(B)/P(A).
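
A tiny worked example of Bayes' formula with made-up numbers:

# Bayes' formula: P(B|A) = P(A|B) * P(B) / P(A); the numbers are illustrative only.
p_a_given_b, p_b, p_a = 0.9, 0.1, 0.2
p_b_given_a = p_a_given_b * p_b / p_a
print(p_b_given_a)  # 0.45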

Bayes is based on the conditional probability P(B|A). Markov refers to a chain structure or process in which the previous n values determine the current value, so the current value is related only to the previous n values. Entropy is a thermodynamic term for the degree of disorder of a material system; it was extended into mathematics to express uncertainty, and further into information theory, the basic theory of information transmission in computer networks, with the uncertainty function f(p) = -log p and the information entropy H(p) = -∑ p log p. Shannon is the founder of information theory. A field is a domain, a value space. A random field is an assignment of random variables over the whole space.
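
A minimal sketch of computing the information entropy H(p) = -∑ p log p for a discrete distribution:

import math

def entropy(probs):
    # Information entropy H(p) = -sum(p * log p); terms with p == 0 contribute 0.
    return -sum(p * math.log(p, 2) for p in probs if p > 0)

print(entropy([0.5, 0.5]))  # 1.0 bit: a fair coin is maximally uncertain
print(entropy([0.9, 0.1]))  # ~0.47 bits: a biased coin is less uncertain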

Probabilistic graphical models: they are illustrated with graphs and computed with probabilities. Directed and undirected graph models differ in whether the edges of the graph have directions. Directed edges express a derivation relation, B appearing on the premise of A, which corresponds to generative models; undirected edges express a "these hold together" relation, A and B holding at the same time, which corresponds to discriminative models. Generative models compute with joint probabilities, discriminative models with conditional probabilities. Generative models: n-gram models, hidden Markov models, naive Bayes models. Discriminative models: maximum entropy models, support vector machines, conditional random fields, perceptron models.

Bayesian networks: conditional probability, generative models, directed graph models. For example, the probability that x6 is true given that x1 is false is P(x6=T|x1=F) = P(x6=T, x1=F) / P(x1=F); continuing the derivation, the answer is finally obtained from the probability data at each node. A Bayesian network estimates the probability at each node from training samples and then predicts the answers to various questions. Bayesian networks reason under known but limited, incomplete, and uncertain information, and are widely used in fault diagnosis, maintenance decision-making, automatic Chinese word segmentation, word-sense disambiguation, and similar problems.
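
A toy sketch of the conditional-probability calculation above, using a made-up joint distribution over (x1, x6) only:

# Hypothetical joint distribution P(x1, x6), just to show the formula in use.
joint = {(False, True): 0.12, (False, False): 0.28, (True, True): 0.30, (True, False): 0.30}
p_x1_false = sum(p for (x1, _), p in joint.items() if not x1)
p_x6_true_given_x1_false = joint[(False, True)] / p_x1_false
print(p_x6_true_given_x1_false)  # 0.3 with these made-up numbers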

Markov models and hidden Markov models. A value depends on the previous n values: conditional probability, generative model, directed graph model. A Markov model describes the state-transition process over time t, a stochastic finite state machine; the probability of a state sequence is the product of the probabilities on the transition arcs between all the states that form the sequence. Each probability value is estimated from training samples, and the trained model predicts the next state from the preceding ones. In a hidden Markov model the information at one level is unknown, much information is missing, and the model's algorithms are more complex. Hidden Markov models are widely used in part-of-speech tagging and Chinese word segmentation. At first you do not know where to cut; only after the preceding words are segmented do you know where the next boundary lies, and after the following word is segmented you must check whether the previous segmentation was correct, so there is a dependency in both directions. Such uncertain intermediate states are exactly what the hidden Markov model describes best.
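
A compact sketch of Viterbi decoding over B/M/E/S character tags, with made-up toy parameters (log probabilities); real parameters would be trained from an annotated corpus:

STATES = "BMES"  # Begin / Middle / End / Single-character word

def viterbi(obs, start_logp, trans_logp, emit_logp):
    # best[t][s]: best log-probability of any tag path ending in state s at position t.
    best = [{s: start_logp[s] + emit_logp[s].get(obs[0], -10.0) for s in STATES}]
    back = [{}]
    for t in range(1, len(obs)):
        best.append({})
        back.append({})
        for s in STATES:
            prev, score = max(
                ((p, best[t - 1][p] + trans_logp[p].get(s, -10.0) + emit_logp[s].get(obs[t], -10.0))
                 for p in STATES),
                key=lambda pair: pair[1])
            best[t][s], back[t][s] = score, prev
    # Trace back from the best final state to recover the tag sequence.
    state = max(STATES, key=lambda s: best[-1][s])
    tags = [state]
    for t in range(len(obs) - 1, 0, -1):
        state = back[t][state]
        tags.append(state)
    return list(reversed(tags))

# Toy log-probability tables, purely illustrative.
start = {"B": -0.5, "M": -10.0, "E": -10.0, "S": -1.0}
trans = {"B": {"M": -1.5, "E": -0.5}, "M": {"M": -1.5, "E": -0.5},
         "E": {"B": -0.7, "S": -1.2}, "S": {"B": -0.7, "S": -1.2}}
emit = {"B": {"中": -0.2, "人": -3.0}, "M": {"国": -2.0},
        "E": {"国": -1.0, "人": -0.5}, "S": {"人": -1.5}}
print(viterbi("中国人", start, trans, emit))  # ['B', 'E', 'S'] here, i.e. 中国 / 人

The decoded B/M/E/S tags determine the word boundaries: a word starts at each B or S and ends at the following E or at that S.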

The maximum entropy model, H(p) = -∑ p log p. Given some information condition B, find the candidate result A with the largest probability, that is, the candidate maximizing the conditional probability P(A|B). Maximum entropy means maximum uncertainty, and finding the maximum conditional probability is made equivalent to finding the maximum entropy. The entropy here is H(p) = H(A|B) = -∑ p(b) p(a|b) log p(a|b). p(a|b) is estimated from training data through features such as f_i(a,b), and training the model means fitting the λ parameters in ∑ λ f(a,b), much like linear regression in machine learning. So the maximum entropy model uses the entropy principle and the entropy formula to describe reality with probabilistic laws.
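
For reference, the standard exponential form that the λ parameters and features f(a,b) belong to (a textbook formula, not quoted from the original notes):

P_\lambda(a \mid b) = \frac{1}{Z_\lambda(b)} \exp\Big(\sum_i \lambda_i f_i(a, b)\Big),
\qquad Z_\lambda(b) = \sum_a \exp\Big(\sum_i \lambda_i f_i(a, b)\Big)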

Conditional random field: "field" denotes a value range, and a random field is the value space of a set of random variables, each taking a definite value; "conditional" means the values of the random variables are determined by a conditional probability, where the condition comes from the observed values. A conditional random field is an undirected graph model: the probability of a particular label sequence Y given an observation sequence X is proportional to an exponential function exp(∑λt + ∑μs), where t is a transition feature function, s is a state feature function, and λ and μ are weights to be trained. Conditional random fields are used for labeling and segmenting sequential data, in natural language processing, bioinformatics, machine vision, and web intelligence.
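
Written out more fully, the standard linear-chain form behind exp(∑λt + ∑μs) is (again a textbook formula, not quoted from the original notes):

P(Y \mid X) = \frac{1}{Z(X)} \exp\Big(\sum_{i,k} \lambda_k\, t_k(y_{i-1}, y_i, X, i) + \sum_{i,l} \mu_l\, s_l(y_i, X, i)\Big)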


Welcome to recommend machine learning job opportunities in Shanghai, my WeChat: qingxingfengzi
