How does the Viterbi algorithm work in CRF (Conditional Random Fields)?

Previously we introduced BERT + CRF for named entity recognition and explained the concepts and respective roles of BERT and CRF. When discussing how CRF computes the optimal label sequence, we only mentioned the Viterbi algorithm without further explanation. This article gives an accessible explanation of the Viterbi algorithm, so that you can better understand how CRF finds the optimal tag sequence.

By reading this article you will be able to answer the following questions:

  • What is the Viterbi algorithm?
  • Why is the Viterbi algorithm a dynamic programming algorithm?
  • How is the Viterbi algorithm actually implemented?

First, let's briefly review the respective roles of BERT and CRF in named entity recognition:
In named entity recognition, BERT is responsible for learning the mapping from each word or symbol in the input sentence to its corresponding entity label, while CRF is responsible for learning the transition rules between adjacent entity labels. For more information, please refer to the article "How does CRF work in NER?". In that article we gave a straightforward presentation of CRF, which involved using the loss function to compute the optimal path: the CRF loss is based on the ratio of the score of the true path to the total score of all possible paths, and our goal is to maximize this ratio. This is where computing the optimal path comes in. In named entity recognition, the final output is a label sequence over the words or symbols of a sentence. Different orderings of labels form different paths. The CRF's job is to find the most correct label sequence path, i.e., the path whose probability is the largest among all paths. We could exhaustively enumerate all possible label paths, compute the probability of each, and then pick the largest, but the cost of doing so would be too great, so CRF uses an algorithm called the Viterbi algorithm to solve this kind of problem.
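To see why exhaustive enumeration is infeasible, here is a minimal sketch: with K states and a sentence of length T there are K**T candidate label paths, which grows exponentially. (The state names and sentence length below are made-up illustration values, not from any real model.)

```python
from itertools import product

# Toy setup: 3 states, sentence of length 5.
states = ["B-P", "I-P", "O"]
T = 5

# Every possible label path is one element of the Cartesian product.
paths = list(product(states, repeat=T))
print(len(paths))  # 3**5 = 243 paths
```

For a realistic sentence of length 50 this would already be 3**50 (about 7e23) paths, which is why a dynamic programming algorithm like Viterbi is needed.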

The Viterbi algorithm is a dynamic programming algorithm. It is used to find the most likely sequence of hidden states, called the Viterbi path, that results in a given sequence of observed events.

Consider the following named entity recognition example:

The figure has 5 layers in total (the length of the observation sequence), with 3 nodes in each layer (the number of states). Our goal is to find the optimal path from the first layer to the fifth layer.
First, we compute the probabilities of the connections entering the red, yellow, and blue nodes. Take the red node as an example: suppose the red node lies on the optimal path; then among the three connections entering it, the one with the largest probability must also lie on the optimal path. In the same way, assuming the yellow and blue nodes each lie on the optimal path, we can find the connection with the largest probability for each of them, which gives the following picture:

Then we continue with the same idea and find the three optimal connections into the next layer:

Suppose the optimal connections found are as follows:

Then we apply the same method to the remaining layers:


At this point, looking at the last picture above, we have 3 candidate optimal paths: brown, green, and purple. Expressed as label sequences, they are as follows:

So which one is the optimal path?
The path whose total probability is the largest is the optimal path.
However, in an actual implementation, when computing the best candidate connections at each layer, we record the probability accumulated so far for each connection, along with the index of the corresponding previous state node (recording already-computed results for later use is exactly why the Viterbi algorithm is called a dynamic programming algorithm). Thus, at the last layer, the candidate connection with the largest probability is the one on the optimal path; backtracking from this connection through the recorded indices recovers the full optimal path.
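The procedure described above can be sketched in plain Python. This is a minimal illustration, not the implementation from any particular CRF library; the data-structure choices (dicts keyed by state names) are assumptions for readability.

```python
def viterbi(pi, A, emit, states):
    """Find the most probable state path.

    pi    : dict, initial probability of each state
    A     : dict, A[(prev, cur)] = transition probability prev -> cur
    emit  : list of dicts, emit[t][s] = observation probability at step t in state s
    states: list of state names
    """
    T = len(emit)
    # score[t][s]: probability of the best path ending in state s at step t
    score = [{s: pi[s] * emit[0][s] for s in states}]
    # back[t][s]: the previous state on that best path (recorded for backtracking)
    back = [{}]
    for t in range(1, T):
        score.append({})
        back.append({})
        for s in states:
            prev, p = max(
                ((q, score[t - 1][q] * A[(q, s)] * emit[t][s]) for q in states),
                key=lambda x: x[1],
            )
            score[t][s] = p
            back[t][s] = prev
    # Backtrack from the highest-scoring state in the last layer.
    last = max(states, key=lambda s: score[-1][s])
    path = [last]
    for t in range(T - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1], score[-1][last]
```

Note that each layer only looks back one step and reuses `score[t - 1]`, so the cost is O(T * K^2) instead of the O(K^T) of exhaustive enumeration.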

We keep saying "the path with the largest probability", but what exactly does this probability refer to?
Recall from the article introducing conditional random fields (CRF) that a CRF is essentially a Markov random field conditioned on the observation sequence. The first-order Markov model defines the following three concepts:

  • State set Q, which in the above example is:
    {B-P, I-P, O}
  • Initial state probability vector Π, which in the above example is:
    {B-P: 0.3, I-P: 0.2, O: 0.5}
    The probability values here are made up, purely for convenience of illustration.
  • State transition probability matrix A:

In a CRF the observation sequence is given as a prior condition; in the above example this corresponds to:

The probability values here are likewise made up, for convenience of illustration.

The probability of the red node in the figure below (which can be viewed as the probability of the connection from a virtual start node to that node) is computed as:
Π(B-P), the initial probability of state B-P, multiplied by the node's observation probability P(小|B-P)

The probabilities of the three connections into the red node in the figure below are computed as:
(probability of the corresponding node in the previous layer) × (transition probability from that node to this node) × (this node's observation probability P(明|B-P))

The connection probabilities between nodes in the other layers are computed in the same way, and the optimal path can then be found using the Viterbi procedure described above.
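As a sketch, the two computations above look like the following in code. The initial probabilities match the made-up Π from the example; the transition and observation probabilities are additional made-up numbers, since the article's figures do not specify them.

```python
# Initial state probabilities Π (made-up values from the example above).
pi = {"B-P": 0.3, "I-P": 0.2, "O": 0.5}

# Assumed observation probability P(小 | B-P) for the first layer.
emit_xiao_bp = 0.4

# Probability of the red node in the first layer: Π(B-P) * P(小|B-P)
p_red = pi["B-P"] * emit_xiao_bp  # = 0.12

# One connection into a red node in the next layer:
prev_prob = p_red   # probability of the corresponding previous-layer node
trans = 0.5         # assumed transition probability A[B-P -> B-P]
emit_ming_bp = 0.3  # assumed observation probability P(明 | B-P)
p_edge = prev_prob * trans * emit_ming_bp  # = 0.018
print(p_red, p_edge)
```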

OK, that's all for this article. Thanks for reading! O(∩_∩)O


Origin www.cnblogs.com/anai/p/11938089.html