Sequence labeling summary

Sequence labeling can generally be divided into two categories:

1. Raw labeling: every element is assigned its own label.

2. Joint segmentation and labeling: all elements of a segment are assigned the same label.

      Named entity recognition (NER) is a sub-task of information extraction. It requires locating and classifying elements such as person names, organization names, locations, times, quantities, and so on.

      NER is an example of joint segmentation and labeling. Take the sentence "Yesterday, George Bush gave a speech", which contains one named entity: George Bush. We want the label "person name" to mark the entire phrase "George Bush", rather than labeling the two words separately. This is joint segmentation and labeling.

 

BIO label

      The simplest way to solve a joint segmentation and labeling problem is to convert it into a raw labeling problem. The standard practice is to use BIO labels.

      BIO labels: each element is labeled "B-X", "I-X", or "O". "B-X" means that the element belongs to a segment of type X and is at the beginning of that segment. "I-X" means that the element belongs to a segment of type X and is in the middle of that segment. "O" means that the element does not belong to any type.

      For example, if X is NP (noun phrase), the three BIO labels are:

(1) B-NP: the beginning of a noun phrase

(2) I-NP: the middle of a noun phrase

(3) O: not part of a noun phrase

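      Thus a passage can be divided as follows: each noun phrase starts with one B-NP word and continues with zero or more I-NP words, and every other word is labeled O.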
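      As a minimal sketch of this conversion, the Python snippet below turns a list of segments into one BIO label per word. The function name spans_to_bio, the (start, end, type) segment format, and the example sentence are all assumptions made for illustration; they do not come from the original post.

```python
def spans_to_bio(num_tokens, spans):
    """Convert (start, end, type) segments into one BIO label per token.

    `start` is inclusive and `end` is exclusive; both index into the token list.
    Segments are assumed not to overlap.
    """
    labels = ["O"] * num_tokens  # tokens outside every segment stay "O"
    for start, end, seg_type in spans:
        labels[start] = f"B-{seg_type}"  # first token of the segment
        for i in range(start + 1, end):
            labels[i] = f"I-{seg_type}"  # remaining tokens of the segment
    return labels


# Hypothetical example sentence and noun-phrase segments (illustration only).
tokens = ["The", "little", "dog", "barked", "at", "the", "cat"]
np_spans = [(0, 3, "NP"), (5, 7, "NP")]

labels = spans_to_bio(len(tokens), np_spans)
for token, label in zip(tokens, labels):
    print(f"{token}\t{label}")
```

      Once the segments are encoded this way, any model that assigns one label per element can be trained on the result, which is the point of reducing the joint problem to raw labeling.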

 

 

 

      We can further apply BIO labels to NER by defining a type for every kind of named entity (person name, organization name, location, time, etc.). This yields many B and I classes, such as B-PERS, I-PERS, B-ORG, and I-ORG. A sentence is then labeled in the same way, with one label per word.
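      To illustrate, the sketch below labels the George Bush sentence from earlier with BIO labels and then groups the labels back into entities. The tokenization, the label sequence, and the function name bio_to_entities are assumptions made for this example, not output from a real NER system.

```python
def bio_to_entities(tokens, labels):
    """Group BIO labels back into (entity_type, text) pairs."""
    entities = []
    current_type, current_tokens = None, []

    def flush():
        # Close the segment currently being built, if any.
        nonlocal current_type, current_tokens
        if current_type is not None:
            entities.append((current_type, " ".join(current_tokens)))
        current_type, current_tokens = None, []

    for token, label in zip(tokens, labels):
        if label.startswith("B-"):
            flush()  # a B- label always starts a new segment
            current_type, current_tokens = label[2:], [token]
        elif label.startswith("I-") and current_type == label[2:]:
            current_tokens.append(token)  # continue the open segment
        else:
            # "O", or an I- label that does not continue the open segment.
            flush()
    flush()
    return entities


# Assumed tokenization and labels for the example sentence.
tokens = ["Yesterday", ",", "George", "Bush", "gave", "a", "speech"]
labels = ["O", "O", "B-PERS", "I-PERS", "O", "O", "O"]

entities = bio_to_entities(tokens, labels)
print(entities)
```

      Because "George" carries B-PERS and "Bush" carries I-PERS, the two words are recovered as a single person-name entity rather than two separate ones.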

 

 


Origin www.cnblogs.com/shona/p/12121473.html