基于神经网络的依存句法分析总结及代码详解

上一篇文章CS224n之句法分析总结，介绍了句法分析以及具体的依存分析中的arc-standard算法。arc-standard系统是transition systems中最流行的一个系统之一。而本文将介绍一个基于神经网络的依存句法分析器，它基于arc-standard 系统，使用分类器根据从配置信息中提取的特征来预测正确的转换操作。尽管它的性能比基于搜索的解析器略差，但是它的计算效率非常高。本代码已经上传github

1、模型
模型如下图所示：

模型包括输入层，隐含层和softmax层。隐含层的激活函数比较特殊，使用cube激活函数也就是取3次方。arc-standard 算法介绍过，一个configuration包括Stack，Buffer，依存弧集合。如图最下方所示，这就是一个具体的configuration，包含了stack，buffer，依存弧集合的信息。模型根据configuration信息来提取出一个特征向量，这个特征向量由(words，POS tags，arc labels)三个向量拼接而成。这个向量因此包含了对应configuration的信息。模型的目标就是输入特征向量，然后预测出对应的转换类型，如LEFT-ARC，SHIFT等。预测出转换类型就进行相应的转换操作，这样就更新了配置信息，然后得到新的向量，再输入模型中预测，如此循环。最后就能得到依存弧集合找出句子中依存关系。

首先模型使用了 word embeddings，将one-hot编码转为词向量，不仅word，对应单词的词性和依存关系标签也被映射为向量，同样有自己的embeddings矩阵。那么如何根据配置信息提取出特征向量呢？根据论文所讲述的，系统会分别选择18、18、12个元素作为xw，xt，xlxw，xt，xl的值。
对于xwxw，栈和缓冲区的前3个单词：s1,s2,s3,b1,b2,b3；堆栈顶部两个单词的第一个和第二个最左/最右边的子项，也就是依赖于该单词的单词，如果没有就是NULL。lc1(si), rc1(si), lc2(si), rc2(si), i = 1, 2.；堆栈顶部两个单词的最左边或最左边节点的最左边或最左边节点（孩子的孩子）lc1(lc1(si)), rc1(rc1(si)), i = 1, 2. 最后将这18个单词输入经过word embeddings映射为词向量。
对于xtxt 选择xwxw中18个词的对应词性作为输入值。
对于xlxl 选择除了堆栈/缓冲区上的6个字之外的单词的相应弧标签。这些元素同样根据自己的embeddings映射为向量。
这样将所有的向量拼接起来就是input layer的输入。

2、代码详解
（1）数据集操作
首先语料库是这样的：

第一列是单词在句子中的序号，第二列是单词，第五列是POS tags，第七列是所依赖的序号，第八列是依赖关系也是arc labels。

先用read_conll方法分别读取训练集，验证集，测试集数据，将语料库中的word，pos，head，label存储在examples变量中，代码如下：

def read_conll(in_file, lowercase=False, max_example=None):
examples = []
with open(in_file) as f:
word, pos, head, label = [], [], [], []
for line in f.readlines():
sp = line.strip().split('\t')
if len(sp) == 10:
if '-' not in sp[0]:
word.append(sp[1].lower() if lowercase else sp[1])
pos.append(sp[4])
head.append(int(sp[6]))
label.append(sp[7])
elif len(word) > 0:
examples.append({'word': word, 'pos': pos, 'head': head, 'label': label})
word, pos, head, label = [], [], [], []
if (max_example is not None) and (len(examples) == max_example):
break
if len(word) > 0:
examples.append({'word': word, 'pos': pos, 'head': head, 'label': label})
return examples

接下来把获得的信息向量化，使用one-hot编码，代码如下：

def vectorize(self, examples):
vec_examples = []
for ex in examples:
word = [self.ROOT] + [self.tok2id[w] if w in self.tok2id
else self.UNK for w in ex['word']]
pos = [self.P_ROOT] + [self.tok2id[P_PREFIX + w] if P_PREFIX + w in self.tok2id
else self.P_UNK for w in ex['pos']]
head = [-1] + ex['head']
label = [-1] + [self.tok2id[L_PREFIX + w] if L_PREFIX + w in self.tok2id
else -1 for w in ex['label']]
vec_examples.append({'word': word, 'pos': pos,
'head': head, 'label': label})
return vec_examples

接下来使用create_instances方法产生训练集，先初始化stack，buffer，arc，将三个参数放入get_oracle方法中self.get_oracle(stack, buf, ex) ex为examples其中一句话的部分。这个方法会根据输入的参数的信息根据ex中head返回一个转换操作，用数字代替，0为left-arc，1为right-arc，2为SHIFT。代码如下：

def get_oracle(self, stack, buf, ex):
if len(stack) < 2:
return self.n_trans - 1

i0 = stack[-1]
i1 = stack[-2]
h0 = ex['head'][i0]
h1 = ex['head'][i1]
l0 = ex['label'][i0]
l1 = ex['label'][i1]

if self.unlabeled:
if (i1 > 0) and (h1 == i0):
return 0
elif (i1 >= 0) and (h0 == i1) and \
(not any([x for x in buf if ex['head'][x] == i0])):
return 1
else:
return None if len(buf) == 0 else 2
else:
if (i1 > 0) and (h1 == i0):
return l1 if (l1 >= 0) and (l1 < self.n_deprel) else None
elif (i1 >= 0) and (h0 == i1) and \
(not any([x for x in buf if ex['head'][x] == i0])):
return l0 + self.n_deprel if (l0 >= 0) and (l0 < self.n_deprel) else None
else:
return None if len(buf) == 0 else self.n_trans - 1

create_instances会根据stack, buf, arcs, ex信息提取特征向量，函数为extract_features(stack, buf, arcs, ex)，代码如下：

def extract_features(self, stack, buf, arcs, ex):
if stack[0] == "ROOT":
stack[0] = 0

def get_lc(k):
return sorted([arc[1] for arc in arcs if arc[0] == k and arc[1] < k])

def get_rc(k):
return sorted([arc[1] for arc in arcs if arc[0] == k and arc[1] > k],
reverse=True)

p_features = []
l_features = []
features = [self.NULL] * (3 - len(stack)) + [ex['word'][x] for x in stack[-3:]]
features += [ex['word'][x] for x in buf[:3]] + [self.NULL] * (3 - len(buf))
if self.use_pos:
p_features = [self.P_NULL] * (3 - len(stack)) + [ex['pos'][x] for x in stack[-3:]]
p_features += [ex['pos'][x] for x in buf[:3]] + [self.P_NULL] * (3 - len(buf))

for i in xrange(2):
if i < len(stack):
k = stack[-i-1]
lc = get_lc(k)
rc = get_rc(k)
llc = get_lc(lc[0]) if len(lc) > 0 else []
rrc = get_rc(rc[0]) if len(rc) > 0 else []

features.append(ex['word'][lc[0]] if len(lc) > 0 else self.NULL)
features.append(ex['word'][rc[0]] if len(rc) > 0 else self.NULL)
features.append(ex['word'][lc[1]] if len(lc) > 1 else self.NULL)
features.append(ex['word'][rc[1]] if len(rc) > 1 else self.NULL)
features.append(ex['word'][llc[0]] if len(llc) > 0 else self.NULL)
features.append(ex['word'][rrc[0]] if len(rrc) > 0 else self.NULL)

if self.use_pos:
p_features.append(ex['pos'][lc[0]] if len(lc) > 0 else self.P_NULL)
p_features.append(ex['pos'][rc[0]] if len(rc) > 0 else self.P_NULL)
p_features.append(ex['pos'][lc[1]] if len(lc) > 1 else self.P_NULL)
p_features.append(ex['pos'][rc[1]] if len(rc) > 1 else self.P_NULL)
p_features.append(ex['pos'][llc[0]] if len(llc) > 0 else self.P_NULL)
p_features.append(ex['pos'][rrc[0]] if len(rrc) > 0 else self.P_NULL)

if self.use_dep:
l_features.append(ex['label'][lc[0]] if len(lc) > 0 else self.L_NULL)
l_features.append(ex['label'][rc[0]] if len(rc) > 0 else self.L_NULL)
l_features.append(ex['label'][lc[1]] if len(lc) > 1 else self.L_NULL)
l_features.append(ex['label'][rc[1]] if len(rc) > 1 else self.L_NULL)
l_features.append(ex['label'][llc[0]] if len(llc) > 0 else self.L_NULL)
l_features.append(ex['label'][rrc[0]] if len(rrc) > 0 else self.L_NULL)
else:
features += [self.NULL] * 6
if self.use_pos:
p_features += [self.P_NULL] * 6
if self.use_dep:
l_features += [self.L_NULL] * 6

features += p_features + l_features
assert len(features) == self.n_features
return features

然后将返回的向量以及对应get_oracle返回的转换保存在instances数组中，分别作为训练数据和标签。接下来执行刚才得到的转换操作，更新stack，buffer，arc。再循环执行上面的操作。如果一句话有n个单词，则执行循环2×n次，因为每个单词都要进栈一次，出栈一次。这样模型的数据集就完成了。 create_instances函数的代码如下：

def create_instances(self, examples):
all_instances = []
succ = 0
for id, ex in enumerate(logged_loop(examples)):
n_words = len(ex['word']) - 1

# arcs = {(h, t, label)}
stack = [0]
buf = [i + 1 for i in xrange(n_words)]
arcs = []
instances = []
for i in xrange(n_words * 2):
gold_t = self.get_oracle(stack, buf, ex)
if gold_t is None:
break
legal_labels = self.legal_labels(stack, buf)
assert legal_labels[gold_t] == 1
instances.append((self.extract_features(stack, buf, arcs, ex),
legal_labels, gold_t))
if gold_t == self.n_trans - 1:
stack.append(buf[0])
buf = buf[1:]
elif gold_t < self.n_deprel:
arcs.append((stack[-1], stack[-2], gold_t))
stack = stack[:-2] + [stack[-1]]
else:
arcs.append((stack[-2], stack[-1], gold_t - self.n_deprel))
stack = stack[:-1]
else:
succ += 1
all_instances += instances

return all_instances

（2）模型训练
其实模型非常简单，就是普通的全连接神经网络加sotfmax分类。框架用的tensorflow。过程依次是：1. 添加占位符 2. 创建feed_dict 3. 创建embedding层，4、add_prediction_op 网络的前向传播操作 5、add_loss_op 定义loss 使用softmax_cross_entropy_with_logits损失。6、定义优化器 add_training_op
代码都在q2_parser_model.py 文件中。大家可以去看看。

（3）解析
系统在解析中执行贪心解码。在每一步，我们从当前配置c中提取所有相应的单词，POS和标签嵌入，计算隐藏层h(c)，并选择具有最高分数的转换，然后再执行转换。

总结
本篇文章跟上一篇文章介绍了NLP中的技术句法分析，以及讲解了基于神经网络的依存分析算法。这两篇文章只是笔者自己的总结概括，如果大家想深入了解句法分析可以去CS224n公开课上学习，对于基于神经网络的依存分析算法可以查看这篇论文：《A Fast and Accurate Dependency Parser using Neural Networks》，我的github上就有.

基于神经网络的依存句法分析总结及代码详解

猜你喜欢