Sequence labeling can generally be divided into two categories:
1. Raw labeling: each element is assigned its own label.
2. Joint segmentation and labeling: the sequence is divided into segments, and every element of a segment shares that segment's single label.
Named entity recognition (NER) is a subtask of information extraction in which we locate and classify elements such as person names, organization names, locations, times, quantities, and so on.
NER is an example of joint labeling. The sentence "Yesterday, George Bush gave a speech" contains one named entity: George Bush. We want a single "name" label to mark the entire phrase "George Bush", rather than labeling the two words separately. This is joint labeling.
BIO label
The simplest way to solve a joint labeling problem is to convert it into a raw labeling problem. The standard practice is to use BIO labels.
BIO label: each element is labeled "B-X", "I-X", or "O". Here "B-X" means the element belongs to a segment of type X and sits at the beginning of that segment, "I-X" means the element belongs to a segment of type X and sits inside that segment (not at its beginning), and "O" means the element does not belong to a segment of any type.
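The conversion from segments to per-element BIO labels can be sketched in a few lines of Python. This is only a minimal illustration: the function name spans_to_bio, the (start, end, type) span format with an exclusive end index, the tokenization, and the type name "PERS" (standing for a person's name) are all assumptions made for the example.

```python
def spans_to_bio(n_tokens, spans):
    """Convert labeled segments into one BIO label per token.

    spans is a list of (start, end, seg_type) tuples, with end exclusive.
    Segments are assumed not to overlap.
    """
    tags = ["O"] * n_tokens                  # default: outside any segment
    for start, end, seg_type in spans:
        tags[start] = f"B-{seg_type}"        # first token of the segment
        for i in range(start + 1, end):
            tags[i] = f"I-{seg_type}"        # remaining tokens of the segment
    return tags


# Assumed tokenization of the example sentence from the text.
tokens = ["Yesterday", ",", "George", "Bush", "gave", "a", "speech"]
spans = [(2, 4, "PERS")]                     # "George Bush" as one segment

tags = spans_to_bio(len(tokens), spans)
for token, tag in zip(tokens, tags):
    print(token, tag)
```

After this conversion, the joint problem is an ordinary raw labeling problem: any model that predicts one label per token can be trained on the resulting tags.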
For example, taking X to be NP (noun phrase), the three BIO labels are:
(1) B-NP: the beginning of a noun phrase
(2) I-NP: the inside of a noun phrase
(3) O: not part of a noun phrase
A passage can then be labeled as follows:
BIO labels carry over directly to NER: we define a type for each kind of named entity (person name, organization name, location, time, and so on), which gives many B and I classes, such as B-PERS, I-PERS, B-ORG, and I-ORG. This gives results like the following:
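To make the NER case concrete, here is a minimal Python sketch that reads a BIO-labeled sentence and recovers the entity segments. The sentence, its tags, and the function name bio_to_spans are illustrative assumptions, not data from a real corpus or the API of any library. The sketch is deliberately lenient: an I-X label with no preceding B-X or I-X of the same type is treated as the start of a new segment.

```python
def bio_to_spans(tags):
    """Recover (start, end, seg_type) segments, end exclusive, from BIO labels."""
    spans = []
    start, seg_type = None, None
    for i, tag in enumerate(tags):
        # The open segment continues only on an I- label of the same type.
        continues = tag.startswith("I-") and seg_type == tag[2:]
        if not continues:
            if seg_type is not None:
                spans.append((start, i, seg_type))   # close the open segment
                start, seg_type = None, None
            if tag != "O":
                # B-X, or a stray I-X handled leniently as a segment start.
                start, seg_type = i, tag[2:]
    if seg_type is not None:
        spans.append((start, len(tags), seg_type))   # segment runs to the end
    return spans


# Hypothetical labeled sentence with two entity types.
tokens = ["George", "Bush", "spoke", "at", "the", "United", "Nations"]
tags = ["B-PERS", "I-PERS", "O", "O", "O", "B-ORG", "I-ORG"]

entities = [(" ".join(tokens[s:e]), t) for s, e, t in bio_to_spans(tags)]
print(entities)
```

Because a new B label always closes the previous segment, two adjacent entities of the same type are still recovered as separate segments, which is the reason the scheme distinguishes B from I.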