Chinese Short-Text Classification Example 12: HAN (Hierarchical Attention Networks for Document Classification)

1. Overview

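HAN (Hierarchical Attention Networks for Document Classification) is a text classification model proposed by Zichao Yang et al. in 2016. It applies attention at two levels, the word (or character) level and the sentence level, to build the text representation. Treating text as a hierarchy (characters, words, sentences) matches human intuition well, and attention itself has been a key driver of the rapid progress of deep learning in vision and NLP in recent years.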

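In 2016, NLP belonged to RNN + Attention. I still remember Bi-LSTM+CRF bursting onto the scene around then; after that, NLP tasks seemed to be ruled by bidirectional LSTMs, at most with BiLSTM and Attention used together. Yet only a year later, the Self-Attention introduced in 2017 knocked CNNs and RNNs off their pedestal in NLP almost overnight, although at the time I was still muddling along, clinging to Bi-LSTM and oblivious to the change of era. 5G is about to arrive, and nobody knows what upheaval it will bring either. Just like web 1.0, web 2.0, and the mobile internet era that is now in its second half, perhaps before we notice it AI and deep learning will be entering middle age too, and you and I will have become history.

As the saying goes, "The future is already here, it just hasn't caught on yet!" However poetic that sounds, it simply means we failed to seize it. On the eve of 5G, let's all keep at it!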

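Back to the HAN model: intuitively it is not hard to understand. At first glance, isn't it just two identical (bidirectional RNN + Attention) blocks stacked on top of each other, with nothing special about it? Looking back at it today, there does not seem to be much more to it either. A Bi-GRU encodes the text as a sequence? word-(encode + attention)? sentence-(encode + attention)? Nothing exciting.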

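Still, judging from the analysis of results in the paper, one thing does seem to be demonstrated: features produced by attention are remarkably effective. There is, of course, also the higher-level combination of features.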

GitHub project:

https://github.com/yongzhuo/Keras-TextClassification/tree/master/keras_textclassification/m12_HAN

2. HAN model principles

2.1 HAN model diagram (HAN)
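
As a rough illustration of the attention step the HAN paper describes at each level, here is a minimal pure-Python sketch. Each hidden state h_t is projected to u_t = tanh(W h_t + b), scored against a learned context vector, the scores are normalized with a softmax, and the hidden states are summed with those weights. The function name attention_pool and the toy numbers are my own assumptions for illustration; this is not the AttentionSelf layer used in the project code, which may compute attention differently.

```python
import math


def attention_pool(hidden_states, W, b, context):
    """Pool T hidden vectors into one vector with additive attention.

    hidden_states: list of T vectors, each a list of d floats
    W: projection matrix as a list of d_a rows, each of length d
    b: bias vector of length d_a
    context: context vector of length d_a (hypothetical stand-in for a learned one)
    Returns (pooled_vector, attention_weights).
    """
    scores = []
    for h in hidden_states:
        # u_t = tanh(W h_t + b)
        u = [math.tanh(sum(w * x for w, x in zip(row, h)) + b_i)
             for row, b_i in zip(W, b)]
        # score_t = u_t . context
        scores.append(sum(u_i * c_i for u_i, c_i in zip(u, context)))

    # softmax over time steps (subtract the max for numerical stability)
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # weighted sum of the original hidden states
    dim = len(hidden_states[0])
    pooled = [sum(a * h[j] for a, h in zip(weights, hidden_states))
              for j in range(dim)]
    return pooled, weights


if __name__ == "__main__":
    # toy values, chosen only to make the behaviour easy to inspect
    H = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    W = [[1.0, 0.0], [0.0, 1.0]]
    vec, alphas = attention_pool(H, W, [0.0, 0.0], [1.0, 1.0])
    print(alphas, vec)
```

In the paper the same pooling is applied twice: once over the encoder states of the words in a sentence to get a sentence vector, and once over the encoder states of the sentences to get a document vector.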

3. HAN in practice

I tried the HAN implementation on GitHub and the results were quite good, but because it contains two Bi-GRUs, training is rather slow.

When I first implemented it I ran into a small problem: there was a bug where the word-attention output is connected to the sentence encoder, which I later fixed.

GitHub project:

https://github.com/yongzhuo/Keras-TextClassification/blob/master/keras_textclassification/m12_HAN/graph.py

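Core code (methods of the model class in graph.py; AttentionSelf is a custom layer defined in the project):

    # Imports needed by the methods below
    from keras import regularizers
    from keras.layers import Input, Dense, Dropout, Flatten, Bidirectional, GRU
    from keras.models import Model

    def create_model(self, hyper_parameters):
        """
            Build the neural network.
        :param hyper_parameters: json, hyper parameters of network
        :return: None, the built model is stored in self.model
        """
        super().create_model(hyper_parameters)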
        # char or word
        x_input_word = self.word_embedding.output
        x_word = self.word_level()(x_input_word)
        x_word_to_sen = Dropout(self.dropout)(x_word)

        # sentence or doc
        x_sen = self.sentence_level()(x_word_to_sen)
        x_sen = Dropout(self.dropout)(x_sen)

        x_sen = Flatten()(x_sen)
        # final classification layer (softmax)
        dense_layer = Dense(self.label, activation=self.activate_classify)(x_sen)
        output = [dense_layer]
        self.model = Model(self.word_embedding.input, output)
        self.model.summary(132)

    def word_level(self):
        x_input_word = Input(shape=(self.len_max, self.embed_size))
        # x = SpatialDropout1D(self.dropout_spatial)(x_input_word)
        x = Bidirectional(GRU(units=self.rnn_units,
                              return_sequences=True,
                              activation='relu',
                              kernel_regularizer=regularizers.l2(self.l2),
                              recurrent_regularizer=regularizers.l2(self.l2)))(x_input_word)
        out_sent = AttentionSelf(self.rnn_units*2)(x)
        model = Model(x_input_word, out_sent)
        return model

    def sentence_level(self):
        x_input_sen = Input(shape=(self.len_max, self.rnn_units*2))
        # x = SpatialDropout1D(self.dropout_spatial)(x_input_sen)
        output_doc = Bidirectional(GRU(units=self.rnn_units*2,
                              return_sequences=True,
                              activation='relu',
                              kernel_regularizer=regularizers.l2(self.l2),
                              recurrent_regularizer=regularizers.l2(self.l2)))(x_input_sen)
        output_doc_att = AttentionSelf(self.word_embedding.embed_size)(output_doc)
        model = Model(x_input_sen, output_doc_att)
        return model
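
To make the data flow in the code above easier to follow, here is a small sketch that traces the per-sample tensor shape through each stage. It rests on one assumption that the listing does not show directly: that AttentionSelf(d) turns a (T, anything) sequence into a (T, d) sequence. That reading is consistent with sentence_level expecting an input of shape (len_max, rnn_units*2) and with the Flatten before the Dense layer, but the layer itself is not shown here. The function name trace_shapes and the hyper-parameter values are hypothetical.

```python
def trace_shapes(len_max, embed_size, rnn_units, num_labels):
    """Return the per-sample output shape (batch axis omitted) of each stage.

    Assumption: AttentionSelf(d) maps a (T, *) sequence to a (T, d) sequence.
    """
    shapes = {}
    shapes["embedding"] = (len_max, embed_size)
    # a Bidirectional GRU concatenates forward and backward states by default
    shapes["word_bigru"] = (len_max, 2 * rnn_units)
    shapes["word_attention"] = (len_max, 2 * rnn_units)
    # the sentence-level GRU uses units = 2 * rnn_units per direction
    shapes["sentence_bigru"] = (len_max, 4 * rnn_units)
    shapes["sentence_attention"] = (len_max, embed_size)
    # Flatten merges the time and feature axes before classification
    shapes["flatten"] = (len_max * embed_size,)
    shapes["dense"] = (num_labels,)
    return shapes


if __name__ == "__main__":
    # hypothetical hyper-parameters, for illustration only
    for name, shape in trace_shapes(50, 300, 256, 17).items():
        print(name, shape)
```

Read this way, both levels in this implementation run over the same len_max axis rather than over a separate sentence axis, so the model stacks two encoder + attention blocks on one sequence.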

Hope this helps!

大漠帝国
