[NLP] Implementing a text matching task with an LSTM in TensorFlow

In natural language processing (NLP), we sometimes need to compute the similarity between different texts. Each text is first encoded as a fixed-length sequence of token indices (see the tokenization sketch below), mapped to dense vectors by an Embedding layer, and then passed through an LSTM to produce a sentence representation; because two texts are compared, the model is defined with multiple input sources.

Sentence 1: I don't like to eat fish head with chopped pepper, but I like to eat fish head

Sentence 2: I love potatoes, but not sweet potatoes
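Before the sentences reach the network, they have to be tokenized and padded to a fixed length. The original post does not show this step; the following is a minimal sketch using Keras' Tokenizer and pad_sequences, with maxlen=30 chosen to match the model's input shape (the real model assumes a vocabulary of 252173 words, far larger than this toy example).

from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

texts_a = ["I don't like to eat fish head with chopped pepper, but I like to eat fish head"]
texts_b = ["I love potatoes, but not sweet potatoes"]

# Word-level tokenizer; builds a word-to-index vocabulary from both sentences.
tokenizer = Tokenizer()
tokenizer.fit_on_texts(texts_a + texts_b)

# Convert to index sequences and pad/truncate to length 30, giving shape (1, 30).
x = pad_sequences(tokenizer.texts_to_sequences(texts_a), maxlen=30)
y = pad_sequences(tokenizer.texts_to_sequences(texts_b), maxlen=30)
print(x.shape, y.shape)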

An LSTM network is used to abstract each sentence into a vector representation; computing the similarity between the two vectors then completes the text matching task. In practice, we usually take the hidden state of the last LSTM step as the sentence vector and measure the similarity of two sentences with a dot product or cosine similarity.
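Concretely (not stated in the post, but implied by the code below), the cosine similarity of two sentence vectors $x_1$ and $x_2$ is

$$\cos(x_1, x_2) = \frac{x_1 \cdot x_2}{\lVert x_1 \rVert \, \lVert x_2 \rVert},$$

which is what the cosine_distance helper in the code computes.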


The code is shown below:

"""
 * Created with PyCharm
 * 作者: 阿光
 * 日期: 2022/1/14
 * 时间: 18:55
 * 描述:
"""
import tensorflow as tf
from tensorflow.keras import Model
from tensorflow.keras.layers import Input, Embedding, LSTM, Dense


def get_model():
    # One input per sentence: a padded sequence of 30 token indices.
    x_input = Input(shape=(30,))
    y_input = Input(shape=(30,))

    # Map token indices to 256-dimensional dense vectors
    # (vocabulary size 252173).
    x_embedding = Embedding(input_dim=252173,
                            output_dim=256)(x_input)
    y_embedding = Embedding(input_dim=252173,
                            output_dim=256)(y_input)

    # Each LSTM returns its last hidden state: a 128-dimensional
    # vector summarizing the whole sentence.
    x_lstm = LSTM(128)(x_embedding)
    y_lstm = LSTM(128)(y_embedding)

    def cosine_distance(x1, x2):
        # Cosine similarity between the two sentence vectors:
        # dot product divided by the product of their L2 norms.
        x1_norm = tf.sqrt(tf.reduce_sum(tf.square(x1), axis=1))
        x2_norm = tf.sqrt(tf.reduce_sum(tf.square(x2), axis=1))
        x1_x2 = tf.reduce_sum(tf.multiply(x1, x2), axis=1)
        cosin = x1_x2 / (x1_norm * x2_norm + 1e-8)  # epsilon avoids division by zero
        return tf.reshape(cosin, shape=(-1, 1))

    # Similarity score for each sentence pair, shape (batch_size, 1).
    score = cosine_distance(x_lstm, y_lstm)

    # Map the score to a matching probability.
    output = Dense(1, activation='sigmoid')(score)

    model = Model([x_input, y_input], output)

    return model


model = get_model()
model.summary()
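
The post stops at model.summary(); as an illustrative sketch (not part of the original code), the model could be compiled and trained on pairs of padded index sequences with binary labels, where 1 marks a similar pair and 0 a dissimilar one. The random arrays below are placeholders for real preprocessed data.

import numpy as np

# Placeholder data: 1000 sentence pairs, each padded to length 30,
# plus a binary similarity label per pair (purely illustrative).
x1 = np.random.randint(0, 252173, size=(1000, 30))
x2 = np.random.randint(0, 252173, size=(1000, 30))
labels = np.random.randint(0, 2, size=(1000, 1))

model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=['accuracy'])

model.fit([x1, x2], labels, batch_size=32, epochs=3, validation_split=0.1)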



Original post: blog.csdn.net/m0_47256162/article/details/122500900