[NLP] NER (Named Entity Recognition) - The CRF Method in Detail

 

Named entity tagging

For a given sequence X of length m, assume the tagging result is [y1, ..., ym], where each yi is one of PER / LOC / ORG / O. Named entity tagging can then be stated as: given the known sequence X, find the tag sequence [y1, ..., ym] whose conditional probability P(y1, ..., ym | X) is highest.

We model this problem as a linear-chain conditional random field:
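One standard way to write this, with E the energy (score) function defined next:

P(y_1, \dots, y_m \mid X) = \frac{\exp\big(E(y_1, \dots, y_m; X)\big)}{\sum_{y'_1, \dots, y'_m} \exp\big(E(y'_1, \dots, y'_m; X)\big)}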

Combining the general form of the model above, we define an energy function for this problem as follows:
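Writing the per-word (emission) term as s and the transition term as t (names chosen here for exposition):

E(y_1, \dots, y_m; X) = b[y_1] + \sum_{i=1}^{m} s[y_i, X, i] + \sum_{i=1}^{m-1} t[y_i, y_{i+1}] + e[y_m]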

Here b[y_1] is the score for tag y_1 appearing first in the sequence; e[y_m] is the score for tag y_m appearing last; s[y_i, X, i] is the score of the i-th word taking tag y_i; and t[y_i, y_{i+1}] is the transition score between adjacent tags.

These four terms spell out the factors considered when assigning a tag: the current word and its position, and the tags appearing before and after it.

The optimal tag sequence satisfies the following condition:
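[y_1, \dots, y_m]^* = \arg\max_{y_1, \dots, y_m} P(y_1, \dots, y_m \mid X)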

We can use the Viterbi algorithm (dynamic programming) to find the optimal tag sequence.
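To make the decoding step concrete, here is a minimal Viterbi decoder in Python. It is a sketch under our own conventions: the scores b, e, s, t from the energy function above are assumed to be given as NumPy arrays, with K the number of tags.

import numpy as np

def viterbi_decode(s, t, b, e):
    # s: (m, K) word scores, s[i, y] = score of word i taking tag y
    # t: (K, K) transition scores, t[y, y2] = score of tag y followed by y2
    # b: (K,)   scores for each tag starting the sequence
    # e: (K,)   scores for each tag ending the sequence
    m, K = s.shape
    score = b + s[0]                       # best path score ending in each tag
    back = np.zeros((m, K), dtype=int)     # back-pointers
    for i in range(1, m):
        # cand[y, y2]: best path ending in tag y at i-1, then tag y2 at i
        cand = score[:, None] + t + s[i][None, :]
        back[i] = cand.argmax(axis=0)
        score = cand.max(axis=0)
    score = score + e
    # recover the best sequence by following the back-pointers
    y = [int(score.argmax())]
    for i in range(m - 1, 0, -1):
        y.append(int(back[i][y[-1]]))
    return y[::-1]

Because the best path to each tag at position i depends only on the best paths at position i-1, the search runs in O(m K^2) time instead of enumerating all K^m candidate sequences.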

 

CRF (Conditional Random Fields)

To understand conditional random fields, we first need to explain a few concepts: probabilistic graphical models and Markov random fields.

Probabilistic graphical models (Graphical Models): a graph consists of nodes and the edges connecting them. A node is denoted v and an edge e; the sets of nodes and edges are denoted V and E, so a graph is written G = (V, E). An undirected graph is a graph whose edges have no direction. A probabilistic graphical model is a probability distribution represented by a graph. Let P(Y) be a joint probability distribution, where Y is a set of random variables, represented by an undirected graph G = (V, E): in graph G, a node represents a random variable, and an edge represents a probabilistic dependency between random variables.

Pairwise Markov property (Pairwise Markov): let u and v be any two nodes of the undirected graph G not connected by an edge, corresponding to the random variables Y_u and Y_v, and let O be the set of all other nodes, with random variable group Y_O. The pairwise Markov property says that, given the group of random variables Y_O, the random variables Y_u and Y_v are conditionally independent, i.e.:
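P(Y_u, Y_v \mid Y_O) = P(Y_u \mid Y_O)\, P(Y_v \mid Y_O)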

Figure 4: Pairwise Markov property

Local Markov property (Local Markov): let v be any node of the undirected graph G, let W be the set of all nodes connected to v by an edge, and let O be the set of all nodes other than v and W. Y_v denotes the random variable of v, Y_W the group of random variables of W, and Y_O the group of random variables of O. The local Markov property says that, given the group of random variables Y_W, the random variable Y_v is conditionally independent of the group of random variables Y_O, i.e.:
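P(Y_v, Y_O \mid Y_W) = P(Y_v \mid Y_W)\, P(Y_O \mid Y_W)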

Figure 5: Local Markov property

Global Markov property (Global Markov): let A and B be any node sets of the graph G that are separated by a node set C, and let Y_A, Y_B, and Y_C be the groups of random variables corresponding to A, B, and C. The global Markov property says that, given the group of random variables Y_C, the groups of random variables Y_A and Y_B are conditionally independent, i.e.:
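P(Y_A, Y_B \mid Y_C) = P(Y_A \mid Y_C)\, P(Y_B \mid Y_C)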

Figure 6: Global Markov property

The global, local, and pairwise Markov properties can be shown to be equivalent.

Markov random field (Markov Random Field / MRF): a joint probability distribution P(Y) represented by an undirected graph G = (V, E), in which the nodes represent random variables and the edges represent dependencies between random variables. If the joint probability distribution P(Y) satisfies the pairwise, local, or global Markov property, it is called a Markov random field.

Clique: a subset of nodes of the undirected graph G in which every two nodes are connected by an edge is called a clique. If C is a clique of the undirected graph G and no node of G can be added to make it a larger clique, then C is called a maximal clique.

Cliques matter because the joint probability distribution of a probabilistic undirected graphical model can be factorized into a product of functions defined on its cliques.

Hammersley-Clifford theorem: the joint probability distribution P(Y) of a probabilistic undirected graphical model can be expressed as follows:
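P(Y) = \frac{1}{Z} \prod_C \Psi_C(Y_C), \qquad Z = \sum_Y \prod_C \Psi_C(Y_C)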

Here C is a clique of the undirected graph, Y_C is the group of random variables corresponding to the nodes of C, and Ψ_C(Y_C) is a strictly positive function defined on C (also called a potential function). The product runs over all cliques of the undirected graph (together these cliques cover all the nodes of the graph), and Z is the normalization factor.

We usually write the potential function in the following form:
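\Psi_C(Y_C) = \exp\big(-E(Y_C)\big)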

Here E(Y_C) is called the energy function of the clique C; the form is inspired by the Boltzmann distribution in thermodynamics, where the greater the energy, the smaller the probability of the state.

Further, we model the energy function E(Y_C) of the clique C as a linear combination of a series of functions f_k(Y_C) of the random variables in Y_C:
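E(Y_C) = \sum_k \lambda_k f_k(Y_C)

where the λ_k are the combination weights.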

The functions f_k(Y_C) are called feature functions.
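For example, in NER tagging on a chain clique C = {Y_{i-1}, Y_i}, a feature function might be an indicator such as (an illustrative feature of our own, not from the original):

f_k(y_{i-1}, y_i) = \begin{cases} 1 & \text{if } y_{i-1} = \text{B-PER and } y_i = \text{I-PER} \\ 0 & \text{otherwise} \end{cases}

with its weight λ_k learned from data.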

A conditional random field (Conditional Random Field) is a model of the conditional probability distribution of one set of output random variables given another set of input random variables, characterized by the assumption that the output random variables form a Markov random field.

In each of the expressions above we can simply replace Y with Y | X, so we have:
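P(Y \mid X) = \frac{1}{Z(X)} \prod_C \Psi_C(Y_C, X), \qquad \Psi_C(Y_C, X) = \exp\big(-E(Y_C, X)\big) = \exp\Big(-\sum_k \lambda_k f_k(Y_C, X)\Big)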

CRFs can be used for many different prediction problems; this article only discusses their use in tagging problems, so it focuses on linear-chain conditional random fields (Linear Chain CRFs). Figure 8 shows the structure, with the corresponding probability:
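In a chain, the maximal cliques are the pairs of adjacent output variables {Y_{i-1}, Y_i}, so:

P(Y \mid X) = \frac{1}{Z(X)} \prod_{i=2}^{m} \Psi_i(Y_{i-1}, Y_i, X)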

Figure 8: Linear-chain conditional random field

The white nodes represent the output random variables Y, and the gray nodes represent the input random variables X. In a linear-chain conditional random field, dependencies exist only between adjacent output variables and between each output variable and the input variables X. In this case, the general CRF model simplifies to:
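In the form standard in the literature, with transition features t_k and state (emission) features s_l:

P(y \mid x) = \frac{1}{Z(x)} \exp\Big( \sum_{i,k} \lambda_k\, t_k(y_{i-1}, y_i, x, i) + \sum_{i,l} \mu_l\, s_l(y_i, x, i) \Big)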

Sequence labeling problem

The sequence labeling problem here is to mark the different kinds of named entities (person names, place names, organization names) that appear in a sequence, for example:

John ( B-PER ) lives ( O ) in ( O ) New ( B-LOC ) York ( I-LOC ) and ( O ) works ( O ) for ( O ) the ( O ) European ( B-ORG ) Union ( I-ORG ) .

The tags in brackets are the labels: PER denotes a person name, LOC a place name, ORG an organization name, and O a non-entity token. B-X marks the first word of a named entity (B-PER marks the first word of a person name), and I-X marks the second and subsequent words of a named entity.
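As a concrete illustration of the scheme, here is a small Python sketch that collects entity spans from a BIO-tagged sequence; the helper name bio_to_spans and its simplification of ignoring stray I-X tags are our own.

def bio_to_spans(tokens, tags):
    # Collect (entity_type, text) spans from parallel token/tag lists.
    spans, current, etype = [], [], None
    for tok, tag in zip(tokens, tags):
        if tag.startswith("B-"):               # a new entity starts
            if current:
                spans.append((etype, " ".join(current)))
            current, etype = [tok], tag[2:]
        elif tag.startswith("I-") and etype == tag[2:]:
            current.append(tok)                # the entity continues
        else:                                  # O, or an I-X that continues nothing
            if current:
                spans.append((etype, " ".join(current)))
            current, etype = [], None
    if current:
        spans.append((etype, " ".join(current)))
    return spans

For the example sentence, bio_to_spans(["John", "lives", "in", "New", "York"], ["B-PER", "O", "O", "B-LOC", "I-LOC"]) returns [("PER", "John"), ("LOC", "New York")].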

A simple way to solve named entity tagging is to put all named entities into a pre-built list and then match every subsequence that appears against the list. The biggest problem with this approach is that named entities not in the list cannot be recognized. Consider how a human does NER: besides using prior knowledge (New York is a place name), we infer words we have no prior knowledge of from their context. In the example above, even if we do not know what the European Union is, we can judge from the capitalization that it is a proper noun, and then, combined with the preceding "works for", guess that it may be an organization name.

 

Reprinted from: https://blog.csdn.net/u010159842/article/details/82222527
