Using an LSTM to intelligently write poetry for New Year wishes

LSTM Introduction

Sequence data is data in which each sample depends on the samples that come before it, so that the samples form an ordered sequence. Deep learning has an important branch designed for exactly this kind of data: recurrent neural networks (RNNs). RNNs are widely used in natural language processing (NLP). Starting today we bring you a practical example built on an important improvement of the RNN: the LSTM. This article does not go deep into LSTM theory; to learn more, refer to the translated article "Understanding LSTM Networks". Instead, we focus on the example of automatically generating classical Chinese poetry, taking you step by step from data processing to building the model, then to training a poetry-generation model, and finally to automatically generating poems as New Year wishes.

Data Processing

We use 76,748 classical poems as the data set (data set download link). The raw poems are stored in the following form:
(Figure: sample of the raw poem text)
As you can see, the raw poems are text symbols that a machine learning model cannot use directly, so the first step is to convert the text into numeric form. This conversion is called word embedding, and here we use a common word embedding algorithm, Word2Vec, to encode the poems. Word2Vec is not explained in detail here; if you are interested, refer to "[NLP] Understand the essence of word vectors: Word2Vec". To keep the final number of classes from becoming too large during embedding, you can remove words that appear rarely, for example words that appear only once. Training Word2Vec produces a model file, which we then use to encode the poem text.
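As an illustration, the embedding step might look like the following minimal sketch built on gensim's Word2Vec (gensim 4.x API assumed); the file names match the data files listed below, but the hyperparameters are illustrative rather than the project's exact settings.

# A minimal character-level Word2Vec sketch (assumption: gensim 4.x API).
from gensim.models import Word2Vec

# poems_edge_split.txt: one poem per line, wrapped in '[' and ']'.
with open('poems_edge_split.txt', encoding='utf-8') as f:
    # Treat each character as a "word", so the embedding is character-level.
    corpus = [list(line.strip()) for line in f if line.strip()]

# min_count=2 drops characters that appear only once, keeping the class count small.
w2v = Word2Vec(corpus, vector_size=128, min_count=2, window=5, workers=4)
w2v.wv.save_word2vec_format('vectors_poem.bin', binary=True)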

With the first step done, the poems are converted into the numeric form a machine learning model requires. Because we use an LSTM to generate the poems, we need to build the mapping from input to output. For example:
take "[长河落日圆]" ("the long river, the setting sun is round") as train_data; the corresponding train_label is "长河落日圆]]", i.e. the input shifted left by one character. This gives the character pairs
"[" -> "长", "长" -> "河", "河" -> "落", "落" -> "日", "日" -> "圆", "圆" -> "]", "]" -> "]", a sequence of input-output pairs in which each character predicts the next one. Modeling this kind of sequential dependency is exactly what recurrent neural networks are designed for.
Here "[" and "]" are the start and end symbols, used to mark where a generated poem begins and ends.
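As a minimal sketch of how one training pair can be built (assuming a character-to-id dictionary word_to_id produced from the embedding step; the variable names are illustrative):

poem = '[长河落日圆]'
ids = [word_to_id[ch] for ch in poem]

train_data  = ids                  # "[长河落日圆]"
train_label = ids[1:] + ids[-1:]   # "长河落日圆]]" (shifted left by one, end symbol repeated)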

To summarize, the data processing steps are:

  • Read the raw poem text, collect all the distinct words, and encode them with the Word2Vec algorithm;
  • For each poem, convert every word and punctuation mark to its corresponding number via the dictionary, forming the neural network input data train_data;
  • Shift the input data left by one position to form the output labels train_label;

After data processing we get the following data files:

  • poems_edge_split.txt: the raw poem file, one poem per line;
  • vectors_poem.bin: the word-vector model trained with Word2Vec; the vocabulary is ordered by word frequency and low-frequency words are removed;
  • poem_ids.txt: the processed corpus with the input-output mapping applied;
  • rhyme_words.txt: the stored rhyming words, used for generating rhymed poems;

The four data files above are provided in the data folder of the source code; the data-processing code is in data_loader.py (source link).

Model Construction and Training

Here we use a 2-layer LSTM with 128 nodes in each hidden layer. We define the network structure with the tensorflow.nn module. Its basic unit, RNNCell, is an abstract class that implements an RNN; in practice we use its concrete subclasses such as BasicRNNCell, BasicLSTMCell, and GRUCell. If you need a multi-layer RNN, TensorFlow provides the tf.nn.rnn_cell.MultiRNNCell function to stack RNNCells. The first layer of the model is an embedding layer, which can be understood as transforming the input data into dense vectors; the data then passes through the two LSTM layers, followed by a softmax layer that outputs a probability distribution over the whole dictionary.
The network structure is as follows:
(Figure: diagram of the network structure)

The network model class is defined as follows:


import numpy as np
import tensorflow as tf


class CharRNNLM(object):
    # num_unrollings is the sequence length (number of time steps per sample).
    def __init__(self, is_training, batch_size, num_unrollings, vocab_size, w2v_model,
                 hidden_size, max_grad_norm, embedding_size, num_layers,
                 learning_rate, cell_type, dropout=0.0, input_dropout=0.0, infer=False):
        self.batch_size = batch_size
        self.num_unrollings = num_unrollings
        self.hidden_size = hidden_size
        self.vocab_size = vocab_size
        self.max_grad_norm = max_grad_norm
        self.num_layers = num_layers
        self.embedding_size = embedding_size
        self.cell_type = cell_type
        self.dropout = dropout
        self.input_dropout = input_dropout
        self.w2v_model = w2v_model

        if embedding_size <= 0:
            self.input_size = vocab_size
            self.input_dropout = 0.0
        else:
            self.input_size = embedding_size

        # Input and target placeholders
        self.input_data = tf.placeholder(tf.int64, [self.batch_size, self.num_unrollings], name='inputs')
        self.targets = tf.placeholder(tf.int64, [self.batch_size, self.num_unrollings], name='targets')

        # Choose the recurrent cell type according to the configuration
        if self.cell_type == 'rnn':
            cell_fn = tf.nn.rnn_cell.BasicRNNCell
        elif self.cell_type == 'lstm':
            cell_fn = tf.nn.rnn_cell.LSTMCell
        elif self.cell_type == 'gru':
            cell_fn = tf.nn.rnn_cell.GRUCell

        params = dict()
        if self.cell_type == 'lstm':
            params['forget_bias'] = 1.0
        cell = cell_fn(self.hidden_size, **params)

        cells = [cell]
        for i in range(self.num_layers-1):
            higher_layer_cell = cell_fn(self.hidden_size, **params)
            cells.append(higher_layer_cell)

        # Apply dropout during training
        if is_training and self.dropout > 0:
            cells = [tf.nn.rnn_cell.DropoutWrapper(cell, output_keep_prob=1.0-self.dropout) for cell in cells]

        # Stack the LSTM layers
        multi_cell = tf.nn.rnn_cell.MultiRNNCell(cells)

        # Define the initial state of the network
        with tf.name_scope('initial_state'):
            self.zero_state = multi_cell.zero_state(self.batch_size, tf.float32)
            if self.cell_type == 'rnn' or self.cell_type == 'gru':
                self.initial_state = tuple(
                        [tf.placeholder(tf.float32,
                            [self.batch_size, multi_cell.state_size[idx]],
                            'initial_state_'+str(idx+1)) for idx in range(self.num_layers)])
            elif self.cell_type == 'lstm':
                self.initial_state = tuple(
                        [tf.nn.rnn_cell.LSTMStateTuple(
                            tf.placeholder(tf.float32, [self.batch_size, multi_cell.state_size[idx][0]],
                                          'initial_lstm_state_'+str(idx+1)),
                            tf.placeholder(tf.float32, [self.batch_size, multi_cell.state_size[idx][1]],
                                           'initial_lstm_state_'+str(idx+1)))
                            for idx in range(self.num_layers)])

        # Define the embedding layer
        with tf.name_scope('embedding_layer'):
            if embedding_size > 0:
                # self.embedding = tf.get_variable('embedding', [self.vocab_size, self.embedding_size])
                self.embedding = tf.get_variable("word_embeddings",
                    initializer=self.w2v_model.vectors.astype(np.float32))
            else:
                self.embedding = tf.constant(np.eye(self.vocab_size), dtype=tf.float32)

            inputs = tf.nn.embedding_lookup(self.embedding, self.input_data)
            if is_training and self.input_dropout > 0:
                inputs = tf.nn.dropout(inputs, 1-self.input_dropout)

        # Slice the inputs into individual time steps
        with tf.name_scope('slice_inputs'):
            sliced_inputs = [tf.squeeze(input_, [1]) for input_ in tf.split(
                axis = 1, num_or_size_splits = self.num_unrollings, value = inputs)]

        outputs, final_state = tf.nn.static_rnn(
                cell = multi_cell,
                inputs = sliced_inputs,
                initial_state=self.initial_state)
        self.final_state = final_state

        # Flatten the RNN outputs to two dimensions for the softmax layer
        with tf.name_scope('flatten_outputs'):
            flat_outputs = tf.reshape(tf.concat(axis = 1, values = outputs), [-1, hidden_size])

        with tf.name_scope('flatten_targets'):
            flat_targets = tf.reshape(tf.concat(axis = 1, values = self.targets), [-1])

        # Define the softmax output layer
        with tf.variable_scope('softmax') as sm_vs:
            softmax_w = tf.get_variable('softmax_w', [hidden_size, vocab_size])
            softmax_b = tf.get_variable('softmax_b', [vocab_size])
            self.logits = tf.matmul(flat_outputs, softmax_w) + softmax_b
            self.probs = tf.nn.softmax(self.logits)

        # Define the loss function
        with tf.name_scope('loss'):
            loss = tf.nn.sparse_softmax_cross_entropy_with_logits(
                    logits = self.logits, labels = flat_targets)
            self.mean_loss = tf.reduce_mean(loss)

        # Track the loss for TensorBoard visualization
        with tf.name_scope('loss_monitor'):
            count = tf.Variable(1.0, name='count')
            sum_mean_loss = tf.Variable(1.0, name='sum_mean_loss')

            self.reset_loss_monitor = tf.group(sum_mean_loss.assign(0.0),
                                               count.assign(0.0), name='reset_loss_monitor')
            self.update_loss_monitor = tf.group(sum_mean_loss.assign(sum_mean_loss+self.mean_loss),
                                                count.assign(count+1), name='update_loss_monitor')

            with tf.control_dependencies([self.update_loss_monitor]):
                self.average_loss = sum_mean_loss / count
                self.ppl = tf.exp(self.average_loss)

            average_loss_summary = tf.summary.scalar(
                    name = 'average loss', tensor = self.average_loss)
            ppl_summary = tf.summary.scalar(
                    name = 'perplexity', tensor = self.ppl)

        self.summaries = tf.summary.merge(
                inputs = [average_loss_summary, ppl_summary], name='loss_monitor')

        self.global_step = tf.get_variable('global_step', [], initializer=tf.constant_initializer(0.0))
        self.learning_rate = tf.placeholder(tf.float32, [], name='learning_rate')

        if is_training:
            tvars = tf.trainable_variables()
            grads, _ = tf.clip_by_global_norm(tf.gradients(self.mean_loss, tvars), self.max_grad_norm)
            optimizer = tf.train.AdamOptimizer(self.learning_rate)
            self.train_op = optimizer.apply_gradients(zip(grads, tvars), global_step=self.global_step)

You can define the batch_size used for training and whether to apply dropout; to increase the diversity of the results, each generated character can be chosen from the top-K most probable characters in the softmax output layer. After training you can use TensorBoard to visualize the network structure and the training process. Here we recommend an online artificial intelligence modeling platform, momodel.cn, which comes with machine learning frameworks and a Python runtime environment and offers free GPUs that can be used for training; you can give it a try on this platform. For the training code and the trained model, see the link.
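As an example, picking an output character from the top-K most probable entries of the softmax output can be sketched as follows; the function name and the default k are illustrative, not part of the original code.

import numpy as np

def sample_top_k(probs, k=5):
    # Keep only the k most probable characters, renormalize, and sample one id.
    probs = np.squeeze(probs)
    top_ids = np.argsort(probs)[-k:]
    top_probs = probs[top_ids] / np.sum(probs[top_ids])
    return int(np.random.choice(top_ids, p=top_probs))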

Poetry Generation

By calling the trained model, we can build an application on top of the poetry generator. Here I use the Mo platform to implement automatic generation of acrostic (hidden-head) and hidden-character poems; the running results are as follows:
(Figures: examples of acrostic and hidden-character poems generated by the model)
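Conceptually, generation starts from the start symbol "[" and keeps feeding the last generated character back into the model until the end symbol "]" appears. A simplified sketch is given below; it assumes a restored TensorFlow session sess, a model built with batch_size = num_unrollings = 1, the word_to_id / id_to_word mappings, and the sample_top_k helper sketched earlier, and the handling of the recurrent state is simplified for illustration.

def generate_poem(sess, model, word_to_id, id_to_word, max_len=100):
    poem = ''
    char = '['                              # start symbol
    state = sess.run(model.zero_state)      # initial recurrent state
    for _ in range(max_len):
        x = np.array([[word_to_id[char]]])  # shape [batch_size=1, num_unrollings=1]
        probs, state = sess.run(
            [model.probs, model.final_state],
            feed_dict={model.input_data: x, model.initial_state: state})
        char = id_to_word[sample_top_k(probs)]
        if char == ']':                     # end symbol terminates the poem
            break
        poem += char
    return poem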

The New Year is coming; quickly use the algorithm to write poems and send your friends and family the "smartest" blessing!
View the complete code on PC.

Reference articles:
https://www.jianshu.com/p/9dc9f41f0b29
https://zhuanlan.zhihu.com/p/26306795
https://github.com/norybaby/poet

------------------------------------ Mo (URL: http://momodel.cn ) is a Python-based artificial intelligence modeling platform that can help you quickly develop, train, and deploy AI applications.
