Speech and Language Processing: Part-of-Speech Tagging

Tagging is a disambiguation task: words are ambiguous, having more than one possible part of speech, and the goal is to find the correct tag for the context. For example, book can be a verb (book that flight) or a noun (hand me that book). That can be a determiner (Does that flight serve dinner) or a complementizer (I thought that your flight was earlier). The goal of POS tagging is to resolve these ambiguities by choosing the appropriate tag for the context. How common is tag ambiguity?

1. HMM Algorithm
In this section we introduce the use of the Hidden Markov Model (HMM) for part-of-speech tagging. The HMM is a sequence model. A sequence model or sequence classifier is a model whose job is to assign a label or class to each unit in a sequence, thus mapping a sequence of observations to a sequence of labels. The HMM is a probabilistic sequence model: given a sequence of units (words, letters, morphemes, sentences, and so on), it computes a probability distribution over possible sequences of labels and chooses the best label sequence.

  An HMM is based on augmenting the Markov chain. A Markov chain is a model that tells us about the probabilities of sequences of random variables (states), each of which can take on values from some set. These sets can be words, or tags, or symbols representing anything, for example the weather. A Markov chain makes a very strong assumption: if we want to predict the future of the sequence, all that matters is the current state. The states before the current state have no impact on the future except through the current state. It is as if, to predict tomorrow's weather, you could examine today's weather but were not allowed to look at yesterday's weather.

More formally, consider a sequence of state variables $q_1, q_2, \ldots, q_i$. A Markov model embodies the Markov assumption about the probabilities of this sequence: when predicting the future, the past does not matter, only the present: $P(q_i = a \mid q_1 \ldots q_{i-1}) = P(q_i = a \mid q_{i-1})$.
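As a concrete illustration, here is a minimal Markov-chain sketch in Python; the weather states, initial distribution, and transition probabilities are invented for the example, not taken from the text.

```python
import numpy as np

# Minimal first-order Markov chain sketch; states and probabilities are
# illustrative only, not estimated from any data.
states = ["HOT", "COLD"]
pi = np.array([0.6, 0.4])        # initial state distribution
A = np.array([[0.7, 0.3],        # P(next state | current = HOT)
              [0.4, 0.6]])       # P(next state | current = COLD)

def sequence_probability(state_ids):
    """P(q_1, ..., q_T) under the Markov assumption: each state
    depends only on the state immediately before it."""
    p = pi[state_ids[0]]
    for prev, cur in zip(state_ids[:-1], state_ids[1:]):
        p *= A[prev, cur]
    return p

# P(HOT, HOT, COLD) = 0.6 * 0.7 * 0.3 = 0.126
print(sequence_probability([0, 0, 1]))
```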

   A Markov chain is useful when we need to compute the probability of a sequence of observable events. In many cases, however, the events we are interested in are hidden: we do not observe them directly. For example, we do not normally observe part-of-speech tags in a text. Instead, we see words and must infer the tags from the word sequence. We call the tags hidden because they are not observed.

The Hidden Markov Model (HMM) is a commonly used sequence modeling tool in natural language processing, speech recognition, bioinformatics, and other fields. It is a statistical model that describes a Markov process whose states are hidden (not directly observed).

An HMM is usually described in terms of three parts: a state sequence, an observation sequence, and the model parameters (a minimal code sketch follows the list below).

  • State sequence $S = \{s_1, s_2, \ldots, s_T\}$: the sequence of hidden states.

  • Observation sequence $O = \{o_1, o_2, \ldots, o_T\}$: the observations generated by the state sequence.

  • Model parameters $\lambda = (\pi, A, B)$, which include:

    • Initial state probability vector $\pi = (\pi_1, \pi_2, \ldots, \pi_N)$, where $\pi_i$ is the probability of starting in state $i$.
    • State transition matrix $A = [a_{ij}]_{N \times N}$, where $a_{ij}$ is the probability of transitioning from state $i$ to state $j$.
    • Emission matrix $B = [b_j(o_t)]_{N \times M}$, where $b_j(o_t)$ is the probability of generating observation $o_t$ in state $j$.
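As a minimal sketch, the three parameter groups can be stored as NumPy arrays; the two tags, the toy vocabulary, and all probabilities below are invented for illustration.

```python
import numpy as np

# Toy HMM parameters lambda = (pi, A, B) for a two-tag, three-word tagger.
# Tags, vocabulary, and probabilities are made up for the example.
tags = ["NOUN", "VERB"]                 # hidden states, N = 2
vocab = ["book", "that", "flight"]      # observation symbols, M = 3

pi = np.array([0.7, 0.3])               # pi_i: P(first tag = i)
A = np.array([[0.4, 0.6],               # a_ij: P(tag j | previous tag i)
              [0.8, 0.2]])
B = np.array([[0.5, 0.1, 0.4],          # b_i(o): P(word o | tag i)
              [0.6, 0.1, 0.3]])

# Every row of A and B is a probability distribution and must sum to 1.
assert np.allclose(A.sum(axis=1), 1.0) and np.allclose(B.sum(axis=1), 1.0)
```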

In the HMM model, there are two basic problems to be solved:

  1. Given a model and an observation sequence $O$, how do we compute the probability $P(O \mid \lambda)$ of the observation sequence?
  2. Given a model and an observation sequence $O$, how do we find the most probable state sequence $S$?

Forward-Backward Algorithm for HMM Models

The forward algorithm can be used to solve the first problem: given a model $\lambda$ and an observation sequence $O$, compute the probability $P(O \mid \lambda)$ of the observation sequence.

Let $\alpha_t(i)$ denote the total probability of all paths that are in state $i$ at time $t$ and have produced the observations $o_1, o_2, \ldots, o_t$, that is:

$$\alpha_t(i) = P(o_1, o_2, \ldots, o_t, q_t = s_i \mid \lambda)$$

Using dynamic programming, $\alpha_t(i)$ can be computed recursively:

$$\alpha_t(i) = \sum_{j=1}^{N} \alpha_{t-1}(j)\, a_{ji}\, b_i(o_t)$$

where $\alpha_1(i) = \pi_i\, b_i(o_1)$.

The probability of the observation sequence, $P(O \mid \lambda)$, is the sum of $\alpha_T(i)$ over all states $i$:

$$P(O \mid \lambda) = \sum_{i=1}^{N} \alpha_T(i)$$
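The recursion above translates directly into a few lines of NumPy. This is a sketch under the assumption that `pi`, `A`, and `B` are arrays laid out as in the earlier toy-parameter example and that `obs` is a list of observation indices into the columns of `B`.

```python
import numpy as np

def forward(obs, pi, A, B):
    """Forward algorithm: returns the alpha table and P(O | lambda)."""
    N, T = A.shape[0], len(obs)
    alpha = np.zeros((T, N))
    alpha[0] = pi * B[:, obs[0]]                   # alpha_1(i) = pi_i * b_i(o_1)
    for t in range(1, T):
        # alpha_t(i) = sum_j alpha_{t-1}(j) * a_ji * b_i(o_t)
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
    return alpha, alpha[-1].sum()                  # P(O | lambda) = sum_i alpha_T(i)
```

With the toy parameters above, `forward([0, 1, 2], pi, A, B)` would return the alpha table for the word sequence "book that flight" together with its total probability.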

The backward algorithm is the mirror image of the forward pass. Combined with the forward probabilities, it gives the probability of being in each state at each time step given the whole observation sequence; the second problem, finding the most probable state sequence $S$, is then solved by the Viterbi algorithm described later.

Let $\beta_t(i)$ denote the total probability of generating the remaining observations $o_{t+1}, o_{t+2}, \ldots, o_T$ given that the state at time $t$ is $i$, that is:

$$\beta_t(i) = P(o_{t+1}, o_{t+2}, \ldots, o_T \mid q_t = s_i, \lambda)$$

Again using dynamic programming, $\beta_t(i)$ can be computed recursively:

$$\beta_t(i) = \sum_{j=1}^{N} a_{ij}\, b_j(o_{t+1})\, \beta_{t+1}(j)$$

where $\beta_T(i) = 1$.
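A matching sketch of the backward pass, under the same array-layout assumptions as the forward sketch:

```python
import numpy as np

def backward(obs, A, B):
    """Backward algorithm: returns the beta table."""
    N, T = A.shape[0], len(obs)
    beta = np.zeros((T, N))
    beta[-1] = 1.0                                 # beta_T(i) = 1
    for t in range(T - 2, -1, -1):
        # beta_t(i) = sum_j a_ij * b_j(o_{t+1}) * beta_{t+1}(j)
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
    return beta
```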

Given an observation sequence $O$, the forward-backward algorithm can then be used to compute, for each time $t$, the probability $P(q_t = s_i \mid O, \lambda)$ that the model is in state $s_i$ at time $t$. This probability combines the results of the forward and backward passes:

$$P(q_t = s_i \mid O, \lambda) = \frac{\alpha_t(i)\, \beta_t(i)}{P(O \mid \lambda)}$$
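Combining the two sketches gives these posterior state probabilities; the observation sequence `obs` below indexes the toy vocabulary from the parameter sketch and is only an example.

```python
# gamma[t, i] = P(q_t = s_i | O, lambda), using the forward/backward sketches above.
obs = [0, 1, 2]                      # e.g. "book that flight" in the toy vocabulary
alpha, prob = forward(obs, pi, A, B)
beta = backward(obs, A, B)
gamma = alpha * beta / prob          # each row sums to 1
```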

Next, the most probable state sequence $S$ can be found with the Viterbi algorithm.

Viterbi Algorithm for HMM Models

The Viterbi algorithm is a dynamic programming algorithm for finding the most probable state sequence $S$. Specifically, let $\delta_t(i)$ denote the probability of the single most probable path that is in state $i$ at time $t$ and has produced the observations $o_1, o_2, \ldots, o_t$, that is:

$$\delta_t(i) = \max_{s_1, s_2, \ldots, s_{t-1}} P(s_1, s_2, \ldots, s_{t-1}, q_t = s_i, o_1, o_2, \ldots, o_t \mid \lambda)$$

Using dynamic programming, $\delta_t(i)$ can be computed recursively:

$$\delta_t(i) = \max_{1 \le j \le N} \{\delta_{t-1}(j)\, a_{ji}\}\, b_i(o_t)$$

where $\delta_1(i) = \pi_i\, b_i(o_1)$.

To recover the most probable state sequence $S$, we also maintain an array $\psi_t(i)$ during the recursion, recording the previous state on the most probable path ending in state $i$ at time $t$. Specifically:

$$\psi_t(i) = \arg\max_{1 \le j \le N} \{\delta_{t-1}(j)\, a_{ji}\}$$

From the $\delta_t(i)$ and $\psi_t(i)$ arrays computed by the recursion, the most probable state sequence $S$ is obtained as follows (a code sketch follows the list):

  1. At time $T$, choose the state $i$ that maximizes $\delta_T(i)$ as the final state $s_T$.
  2. For $t = T-1, T-2, \ldots, 1$, set $s_t = \psi_{t+1}(s_{t+1})$, i.e. follow the back pointers recorded during the recursion.
  3. The resulting sequence $S = \{s_1, s_2, \ldots, s_T\}$ is the most probable state sequence.
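Here is a compact sketch of the recursion and backtracking steps above, again assuming the array layout of the earlier examples; state and observation indices are 0-based in the code.

```python
import numpy as np

def viterbi(obs, pi, A, B):
    """Returns the most probable state sequence (as state indices) for obs."""
    N, T = A.shape[0], len(obs)
    delta = np.zeros((T, N))
    psi = np.zeros((T, N), dtype=int)
    delta[0] = pi * B[:, obs[0]]                      # delta_1(i) = pi_i * b_i(o_1)
    for t in range(1, T):
        scores = delta[t - 1][:, None] * A            # scores[j, i] = delta_{t-1}(j) * a_ji
        psi[t] = scores.argmax(axis=0)                # best previous state for each state i
        delta[t] = scores.max(axis=0) * B[:, obs[t]]  # delta_t(i)
    # Backtracking: start from the best final state and follow the back pointers.
    path = [int(delta[-1].argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(psi[t, path[-1]]))
    return path[::-1]
```

With the toy tagger parameters, `viterbi([0, 1, 2], pi, A, B)` would return the most probable tag indices for "book that flight".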

In conclusion, the forward-backward algorithm and the Viterbi algorithm solve the two basic problems of the HMM: the former computes the probability of the observation sequence, and the latter finds the most probable state sequence.
