Syntactic Parsing in Detail

Before getting into syntactic parsing, let's consider a question: how do people understand a sentence?

1. From a linguistic point of view (subject, verb, object)

2. From intuition, a feel for the language (the language-model view)

Put simply, syntactic parsing analyzes the relationships between the words of a sentence from the linguistic point of view: which word is the subject, which word is the predicate, and so on. The result is represented as a tree.
Suppose we have the sentence:

Beijing belongs to China

After parsing, the result looks like the following figure:
Here Insert Picture Description
Leaving aside for now how such a syntax tree is built, let's look at its properties:

The leaf nodes are the words, and the other nodes are the corresponding parts of speech and phrase categories. So what is a syntax tree useful for once it is built?

It is used in feature engineering. Once we have a syntax tree for each sample, we can consider features such as:
1. The shortest path between two words in the tree

2. The symbols (node labels) between the two words
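
The two features above can be sketched in a few lines of Python. This is only an illustration: the tree below is an assumed parse of "Beijing belongs to China" (the original figure is not available), and the helper names `find_path`, `labels_along` and `path_between` are hypothetical.

```python
# A tree node is (label, child, child, ...); a leaf is a plain string (the word).
# Assumed parse of the example sentence, not taken from the original figure.
TREE = ("S",
        ("NP", ("N", "Beijing")),
        ("VP", ("V", "belongs"),
               ("PP", ("P", "to"), ("NP", ("N", "China")))))


def find_path(tree, word, prefix=()):
    """Return the child-index path from the root to the node directly above `word`."""
    for i, child in enumerate(tree[1:]):
        if child == word:
            return prefix
        if isinstance(child, tuple):
            found = find_path(child, word, prefix + (i,))
            if found is not None:
                return found
    return None


def labels_along(tree, path):
    """Collect the node labels met while following an index path from the root."""
    labels = [tree[0]]
    for i in path:
        tree = tree[1 + i]
        labels.append(tree[0])
    return labels


def path_between(tree, w1, w2):
    """Labels on the shortest tree path from the node above w1 to the node above w2."""
    p1, p2 = find_path(tree, w1), find_path(tree, w2)
    k = 0  # number of shared steps, i.e. depth of the lowest common ancestor
    while k < min(len(p1), len(p2)) and p1[k] == p2[k]:
        k += 1
    l1, l2 = labels_along(tree, p1), labels_along(tree, p2)
    # climb from w1 up to the common ancestor, then descend to w2
    return l1[:k:-1] + [l1[k]] + l2[k + 1:]


print(path_between(TREE, "Beijing", "China"))
```

The length of the returned list can serve as the shortest-path feature, and the labels themselves as the "symbols between the two words" feature.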

Building the syntax tree

So how is a syntax tree built? First we need a grammar, such as the following:
Here Insert Picture Description
The grammar is shown on the right of the figure. For example, a sentence (S) can be divided into a noun phrase (NP) and a verb phrase (VP). Using the grammar rules that match the sentence, we can build the syntax tree step by step.

So where does the grammar come from? It is written by linguistics experts based on years of experience. For our purposes it is enough to know how to use it.

Syntax trees can also be used for translation.

Here Insert Picture Description
As in the figure above, to translate Chinese into English we first parse the Chinese text into a Chinese syntax tree, then convert the Chinese syntax tree into an English syntax tree, and finally generate the English text from the English syntax tree.

For this approach we need to know:

Chinese grammar, English grammar, and the grammar for converting Chinese to English.

This takes a lot of manual labor, so nowadays a seq2seq model is generally used instead.

Here Insert Picture Description

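This kind of grammar is generally called a CFG (context-free grammar). In practice, however, we usually use a PCFG (probabilistic context-free grammar), in which each rule is annotated with the probability of using that rule: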
Here Insert Picture Description
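Why use a PCFG? Because under the grammar a single sentence can yield many different trees, and we need to find the best one. A PCFG lets us compute a score for each tree, and we use that score to select the best tree.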
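
As a rough illustration of how a tree is scored, the sketch below multiplies the probabilities of all rules used in a tree. The rule probabilities and the tree are made-up assumptions (the values in the original figure are not available), and `tree_score` is a hypothetical helper name.

```python
# Hypothetical rule probabilities: (left-hand side, right-hand side) -> probability.
RULE_PROB = {
    ("S", ("NP", "VP")): 1.0,
    ("NP", ("N",)): 1.0,
    ("VP", ("V", "PP")): 0.6,
    ("VP", ("V", "NP")): 0.4,
    ("PP", ("P", "NP")): 1.0,
    ("N", ("Beijing",)): 0.5,
    ("N", ("China",)): 0.5,
    ("V", ("belongs",)): 1.0,
    ("P", ("to",)): 1.0,
}

# A tree node is (label, child, ...); a leaf is a plain string (the word).
TREE = ("S",
        ("NP", ("N", "Beijing")),
        ("VP", ("V", "belongs"),
               ("PP", ("P", "to"), ("NP", ("N", "China")))))


def tree_score(tree):
    """Multiply the probabilities of every rule used to build the tree."""
    label, *children = tree
    # the rule applied at this node: label -> labels (or words) of its children
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    score = RULE_PROB[(label, rhs)]
    for child in children:
        if not isinstance(child, str):
            score *= tree_score(child)
    return score


print(tree_score(TREE))
```

With several candidate trees for the same sentence, the one with the largest `tree_score` would be chosen.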

The probability of each rule is estimated from a batch of training data:
Here Insert Picture Description
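$P(NP\rightarrow V,NP)=\dfrac{\text{count}(NP\rightarrow V,NP)}{\text{count}(NP)}$

that is, the number of times $NP$ is expanded as $V,NP$ in the training data, divided by the total number of $NP$ expansions.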
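
A minimal sketch of this relative-frequency estimate, assuming the training data has already been reduced to a list of observed rule applications. The function name `estimate_rule_probs` and the toy counts are illustrative, not from the original article.

```python
from collections import Counter


def estimate_rule_probs(observed_rules):
    """observed_rules: list of (lhs, rhs) pairs, one per rule application in the data."""
    rule_counts = Counter(observed_rules)
    lhs_counts = Counter(lhs for lhs, _ in observed_rules)
    # P(lhs -> rhs) = count(lhs -> rhs) / count(lhs)
    return {rule: n / lhs_counts[rule[0]] for rule, n in rule_counts.items()}


# Toy data: NP is expanded four times, once as "V, NP" and three times as "N".
observed = [
    ("NP", ("V", "NP")),
    ("NP", ("N",)),
    ("NP", ("N",)),
    ("NP", ("N",)),
]
probs = estimate_rule_probs(observed)
print(probs[("NP", ("V", "NP"))])
```

The probabilities of all rules sharing the same left-hand side sum to 1 by construction.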

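So how do we find the best syntax tree?

1. Enumerate all possible syntax trees, compute the score of each, and pick the tree with the largest score.

Drawback: the computation is too expensive, because the number of trees grows exponentially with sentence length.

2. Use the CKY algorithm.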

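The CKY algorithm breaks a large problem down into subproblems, which is the idea of dynamic programming (DP).

That is, computing the parse of $(w_1,w_2,w_3...w_n)$ is turned into computing the parses of two parts, for example $((w_1),(w_2,w_3...w_n))$, and at each step the scores of the rules that combine the current part with the remaining part are compared.

Now that we know the core of the CKY algorithm is to turn a large problem into subproblems that are combined and compared pairwise, we also need to transform our grammar so that it fits the CKY algorithm.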

Here Insert Picture Description
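The CKY algorithm works by combining parts pairwise, so a rule may have at most two branches. A rule such as $VP\rightarrow NP,V,PP$ has to be converted into a two-branch form. This is a good place to mention CNF (Chomsky normal form).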

CNF

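CNF only allows rules whose right-hand side consists of exactly two nonterminals (apart from rules that rewrite a symbol directly into a word). No other form is allowed, so any rule of another form must be converted. The grammar in the figure above also contains rules of the form $NP\rightarrow e$ (an empty right-hand side) and $VP\rightarrow N$ (a single symbol on the right). Let's convert them: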

Step 1: remove $NP\rightarrow e$:
Here Insert Picture Description
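As in the figure above, removing $NP\rightarrow e$ amounts to substituting the empty string for $NP$ in the rules. This adds a few new rules (the existing rules with $NP$ dropped), as shown in the figure, and $NP\rightarrow e$ itself can then be removed.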
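
The step above can be sketched as follows. This is an illustrative implementation of the standard empty-rule elimination, assuming rules are stored as `(lhs, rhs)` pairs with `()` standing for the empty right-hand side $e$. The toy grammar is an assumption, since the grammar in the original figure is not available.

```python
from itertools import product


def remove_epsilon(rules):
    """rules: set of (lhs, rhs) pairs, rhs a tuple; () is the empty right-hand side."""
    # 1. Find every symbol that can derive the empty string.
    nullable = set()
    changed = True
    while changed:
        changed = False
        for lhs, rhs in rules:
            if lhs not in nullable and all(s in nullable for s in rhs):
                nullable.add(lhs)
                changed = True
    # 2. For each rule, add every variant with some nullable symbols dropped.
    new_rules = set()
    for lhs, rhs in rules:
        options = [((s,), ()) if s in nullable else ((s,),) for s in rhs]
        for choice in product(*options):
            new_rhs = tuple(s for part in choice for s in part)
            if new_rhs:  # never emit an empty right-hand side
                new_rules.add((lhs, new_rhs))
    return new_rules


RULES = {
    ("S", ("NP", "VP")),
    ("VP", ("V", "NP")),
    ("NP", ("N",)),
    ("NP", ()),          # NP -> e
}
for rule in sorted(remove_epsilon(RULES)):
    print(rule)
```

For this toy grammar the result keeps the three non-empty rules and adds `S -> VP` and `VP -> V`, which are exactly the single-symbol rules that still have to be removed afterwards.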

Next we remove rules such as $VP\rightarrow N$, whose right-hand side has only one symbol:
Here Insert Picture Description
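As in the figure above, to remove $S\rightarrow VP$ we add a few new rules: every rule with $VP$ on the left-hand side is copied with $VP$ replaced by $S$. After that, $S\rightarrow VP$ can be removed.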

There is one special case: $VP\rightarrow V$, where $V$ points directly to a word:
Here Insert Picture Description
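By the same token, as in the figure above, to remove $VP\rightarrow V$ we add new rules that copy each rule of $V$ with $V$ replaced by $VP$ on the left-hand side, so that $VP$ points to the word directly. Then $VP\rightarrow V$ can be removed.
In this way we delete, one after another, every rule whose right-hand side is a single nonterminal.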
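
A sketch of this single-symbol (unit) rule removal, under the assumption that rules are `(lhs, rhs)` pairs and that any symbol appearing on a left-hand side is a nonterminal. The function name and the toy grammar are illustrative only.

```python
def remove_unit_rules(rules):
    """Replace every rule A -> B (B a nonterminal) by copies of B's rules under A."""
    nonterminals = {lhs for lhs, _ in rules}

    def is_unit(rhs):
        return len(rhs) == 1 and rhs[0] in nonterminals

    # reach[A] = all B that A can be rewritten into using unit rules only
    reach = {a: {a} for a in nonterminals}
    changed = True
    while changed:
        changed = False
        for lhs, rhs in rules:
            if is_unit(rhs):
                for a in nonterminals:
                    if lhs in reach[a] and rhs[0] not in reach[a]:
                        reach[a].add(rhs[0])
                        changed = True
    # copy every non-unit rule of B to each A that reaches B
    return {(a, rhs)
            for a in nonterminals
            for b in reach[a]
            for lhs, rhs in rules
            if lhs == b and not is_unit(rhs)}


RULES = {
    ("S", ("NP", "VP")),
    ("S", ("VP",)),        # unit rule
    ("VP", ("V", "NP")),
    ("VP", ("V",)),        # unit rule, V points to a word
    ("V", ("belongs",)),
    ("NP", ("China",)),
}
for rule in sorted(remove_unit_rules(RULES)):
    print(rule)
```

Here `S -> VP` is replaced by `S -> V, NP` and `S -> belongs`, and `VP -> V` is replaced by `VP -> belongs`, matching the two cases described above.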

Finally, remove rules of the form $VP\rightarrow NP,V,PP$:
Here Insert Picture Description
In this case we group part of the right-hand side under a new symbol. For example, $VP\rightarrow NP,V,PP$ is transformed into $VP\rightarrow NP,@VP_p$ together with $@VP_p\rightarrow V,PP$.
Of course, the CKY algorithm is not that strict: it is enough that every rule has at most two branches.
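
The binarization step can be sketched like this. The naming scheme for the new helper symbols (`@VP_V_PP` rather than the article's $@VP_p$) is an arbitrary assumption, and `binarize` is a hypothetical function name.

```python
def binarize(rules):
    """Split every rule with more than two right-hand symbols into two-branch rules."""
    out = set()
    for lhs, rhs in rules:
        head, rest = lhs, list(rhs)
        while len(rest) > 2:
            # helper symbol named after the symbols it still has to cover
            helper = "@" + lhs + "_" + "_".join(rest[1:])
            out.add((head, (rest[0], helper)))
            head, rest = helper, rest[1:]
        out.add((head, tuple(rest)))
    return out


for rule in sorted(binarize({("VP", ("NP", "V", "PP"))})):
    print(rule)
```

Rules that already have one or two symbols on the right pass through unchanged, so the function can be applied to the whole grammar.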

CKY algorithm

Here Insert Picture Description
At each step we combine two adjacent parts pairwise and keep the best candidate. The score of a syntax tree is the running product of the probabilities of the rules used to build it.
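
To make the procedure concrete, here is a minimal sketch of probabilistic CKY for a grammar that is already in two-branch form. The grammar and its probabilities are made up for the example sentence and are not taken from the original figures. The names `cky` and `build_tree` are hypothetical.

```python
def cky(words, lexical, binary):
    """Fill best[(i, j)][A] = (probability, backpointer) for the span words[i:j]."""
    n = len(words)
    best = {}
    # spans of length 1: rules that rewrite a symbol directly into a word
    for i, w in enumerate(words):
        best[(i, i + 1)] = {a: (p, w) for (a, word), p in lexical.items() if word == w}
    # longer spans: try every split point and every two-branch rule
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            cell = {}
            for k in range(i + 1, j):
                for (a, b, c), p in binary.items():
                    if b in best[(i, k)] and c in best[(k, j)]:
                        score = p * best[(i, k)][b][0] * best[(k, j)][c][0]
                        if score > cell.get(a, (0.0, None))[0]:
                            cell[a] = (score, (k, b, c))  # keep only the best
            best[(i, j)] = cell
    return best


def build_tree(best, i, j, a):
    """Follow the backpointers to recover the best tree for symbol a over words[i:j]."""
    _, back = best[(i, j)][a]
    if isinstance(back, str):
        return (a, back)
    k, b, c = back
    return (a, build_tree(best, i, k, b), build_tree(best, k, j, c))


# Hypothetical grammar: (A, word) -> prob and (A, B, C) -> prob for A -> B, C.
LEXICAL = {("NP", "Beijing"): 0.5, ("NP", "China"): 0.5,
           ("V", "belongs"): 1.0, ("P", "to"): 1.0}
BINARY = {("S", "NP", "VP"): 1.0, ("VP", "V", "PP"): 0.7,
          ("VP", "V", "NP"): 0.3, ("PP", "P", "NP"): 1.0}

words = "Beijing belongs to China".split()
chart = cky(words, LEXICAL, BINARY)
print(chart[(0, len(words))]["S"][0])
print(build_tree(chart, 0, len(words), "S"))
```

Each cell stores only the best score per symbol, so the chart has on the order of $n^2$ cells and each cell tries at most $n$ split points, instead of enumerating every tree.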

Ze less handsome
