Deep learning: Sentiment analysis based on the long short-term memory (LSTM) network

Table of contents

1 Introduction to LSTM network

1.1 Overview of LSTM

1.2 LSTM network structure

1.3 LSTM gate mechanism

1.4 Bidirectional LSTM

2 PyTorch LSTM input and output

2.1 LSTM parameters

2.2 LSTM input

2.3 LSTM output

2.4 Hidden layer state initialization

3 Implement sentiment analysis based on LSTM

3.1 Introduction to sentiment analysis

3.2 Introduction to data sets

3.3 Code implementation based on PyTorch

3.3.1 Dataset loading

3.3.2 Model construction

3.3.3 Model training

3.3.4 Model verification

3.3.5 Model prediction

3.4 Running results

3.5 Complete code

4 Summary


1 Introduction to LSTM network

1.1 Overview of LSTM

Long Short-Term Memory (LSTM) is a special type of RNN. LSTM was proposed to address the vanishing and exploding gradient problems that RNNs suffer from when processing long sequences. It was first proposed by Hochreiter & Schmidhuber in 1997 and was later refined and popularized by many researchers. It is now widely used thanks to its excellent performance.

LSTM has wide applications in many fields. Typical application scenarios are as follows:

  • Natural language processing (NLP). In NLP, LSTM can be used for tasks such as text classification, sentiment analysis, and machine translation. By modeling text sequences, LSTM is able to capture long-term dependencies in text, thereby improving model accuracy.
  • Speech recognition. In speech recognition, LSTM can be used to build both acoustic models and language models. By jointly modeling the speech signal and the language, LSTM can improve recognition accuracy.
  • Stock trend prediction. LSTM can be applied to stock market prediction, helping investors predict future trends by analyzing historical data.

1.2 LSTM network structure

LSTM mainly relies on a "gate" mechanism to control how information propagates. Compared with an ordinary recurrent neural network, the LSTM network structure is considerably more complex: information flow is controlled by three gates, namely the forget gate, the input gate, and the output gate. The gate mechanism is a central concept in LSTM, so what exactly is a "gate", and how does this mechanism allow LSTM to handle long-distance dependencies?

An RNN is a chain of repeated neural network modules. In a standard RNN, each repeated module has a very simple structure, for example just a single tanh layer:

The standard RNN repeats a module containing only a single tanh layer. LSTM has a similar chain structure, but its repeating module is different: instead of one layer, it contains four neural network layers that interact in a special way. The LSTM diagram is as follows:

The symbols in the figure have the following meanings:

In the schematic, each line carries a complete vector from the output of one node to the input of another. Pink circles represent pointwise operations such as vector addition, while yellow boxes represent learned neural network layers. Lines that merge denote concatenation, and lines that fork denote that the information is copied and the copies are delivered to different locations.

The internal structure diagram of LSTM is as follows:

In the figure, f represents the forget gate, i the input gate, and o the output gate. C is the memory cell, which stores the memory information: c(t-1) is the memory information at the previous moment and c(t) is the memory information at the current moment. h is the output of the LSTM unit, and h(t-1) is the output at the previous moment.

The hidden layer of the original RNN has only one state, h, which is very sensitive to short-term input. If we add another state, c, and let it store the long-term state, we obtain the structure shown below:

The newly added state c is called the cell state. Expanding the figure above along the time dimension:

At each time step t, the LSTM has three inputs: the current input x(t), the LSTM output h(t-1) from the previous time step, and the cell state c(t-1) from the previous time step. It has two outputs: the current LSTM output h(t) and the current cell state c(t).

The key to LSTM is how to control the long-term state c. The idea of LSTM is to use three control switches:

  • The first switch is responsible for controlling the continued preservation of long-term state c;
  • The second switch is responsible for controlling the input of the immediate state to the long-term state c;
  • The third switch is responsible for controlling whether to use the long-term state c as the output of the current LSTM.

The functions of the three switches are as shown in the figure below:

1.3 LSTM gate mechanism

The "door" in reality is usually interpreted as an entrance and exit. The door in the LSTM network is also a kind of entrance and exit, but it is the entrance and exit of control information. There are usually three states of the door, namely fully open (the probability of information passing is 1), fully closed (the probability of information passing is 0), and half-open (the probability of information passing is between 0 and 1). Here, we find that the information in the three states of fully open, fully closed and half open can be represented by probability. In the neural network, the sigmoid function is also a representation between 0 and 1, which can be applied to LSTM The middle gate is being calculated.

In the algorithm, each switch is implemented as a gate. A gate is actually a fully connected layer: its input is a vector, and its output is a vector of real numbers between 0 and 1. Assuming W is the weight matrix of the gate and b is the bias term, the gate can be expressed as:
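With σ denoting the sigmoid function, this is the standard form

g(x) = \sigma(W \cdot x + b)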

To use a gate, we multiply its output vector element-wise with the vector we want to control. Because the gate's output is a vector of real numbers between 0 and 1, when the gate output is 0, multiplying any vector by it yields a zero vector, which is equivalent to letting nothing pass; when the output is 1, the multiplication changes nothing, which is equivalent to letting everything pass. Since the range of σ (the sigmoid function) is (0, 1), in practice the gate is always partially open.

LSTM has three gates in total. Two of them control the content of the cell state c, and the third controls the output:

  • Forget gate: determines how much of the previous cell state c(t-1) is retained in the current cell state c(t);
  • Input gate: determines how much of the current input x(t) is saved into the cell state c(t);
  • Output gate: controls how much of the cell state c(t) is emitted as the current output h(t) of the LSTM.

Forget gate: controls the cell state from the previous moment

The main function of the forget gate is to decide how much of the previous information should be discarded from the cell state. The forget gate is determined by the output h of the previous moment and the input x of the current moment. Its computation is as follows:
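In the standard LSTM formulation, the forget gate at time t is

f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f)

where [h_{t-1}, x_t] is the concatenation of the previous output and the current input, and W_f, b_f are the forget gate's weight matrix and bias.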

Input gate: controls the input at the current moment

The input gate is likewise determined by the output h of the previous moment and the input x of the current moment. Its computation is as follows:
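In the same notation, the input gate is

i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i)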

In addition to the forget gate and the input gate, there is also the candidate cell state at the current moment (a temporary value that describes the current input and is gated by the input gate). It, too, is determined by the output h of the previous moment and the input x of the current moment, and is calculated as follows:
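Using a tanh layer over the same inputs, the candidate cell state in the standard formulation is

\tilde{c}_t = \tanh(W_c \cdot [h_{t-1}, x_t] + b_c)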

The final cell state c at the current moment, which is passed on to the next time step, is determined jointly by the forget gate, the input gate, the candidate cell state describing the current input, and the cell state of the previous moment:
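In the standard formulation this update is

c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t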

The circle symbol ⊙ denotes element-wise (Hadamard) multiplication. When it acts on a vector and a matrix, the operation is broadcast.
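For example, for two vectors of the same length:

(1, 2, 3) \odot (4, 5, 6) = (1 \cdot 4,\ 2 \cdot 5,\ 3 \cdot 6) = (4, 10, 18)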

The cell state of the previous moment is weighted by the forget gate, and the candidate cell state of the current moment is weighted by the input gate. In this way the current memory and the long-term memory are combined to form the final cell state at the current moment.

Thanks to the control of the forget gate, the LSTM can preserve information from long ago; thanks to the control of the input gate, it can keep currently irrelevant content from entering the memory.

Output gate: controls the output at the current moment

The output of the LSTM at the current moment is controlled by the output gate, which determines how much the long-term memory influences the current output. It depends on the input x at the current moment and the output h at the previous moment, and is calculated as follows:
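In the standard formulation, the output gate is

o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o)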

The final output is determined jointly by the output gate and the cell state c at the current moment; the cell state contains not only the memory of this moment but also the memory of earlier moments. It is calculated as follows:
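In the standard formulation, the output is

h_t = o_t \odot \tanh(c_t)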

Putting the pieces together, the complete computation of the LSTM cell is as follows:
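As a compact reference, here is a minimal sketch of a single LSTM time step written directly from the equations above. The function name lstm_step and the weight shapes are illustrative only; this is not how torch.nn.LSTM stores its parameters internally.

import torch

def lstm_step(x_t, h_prev, c_prev, W_f, b_f, W_i, b_i, W_c, b_c, W_o, b_o):
    # Weights are assumed to have shape (hidden_dim + input_dim, hidden_dim),
    # so that z @ W corresponds to W · [h_{t-1}, x_t] in the equations above.
    z = torch.cat([h_prev, x_t], dim=-1)   # [h_{t-1}, x_t]
    f_t = torch.sigmoid(z @ W_f + b_f)     # forget gate
    i_t = torch.sigmoid(z @ W_i + b_i)     # input gate
    c_hat = torch.tanh(z @ W_c + b_c)      # candidate cell state
    c_t = f_t * c_prev + i_t * c_hat       # new cell state
    o_t = torch.sigmoid(z @ W_o + b_o)     # output gate
    h_t = o_t * torch.tanh(c_t)            # new hidden state (the LSTM output)
    return h_t, c_t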

1.4 Bidirectional LSTM

A bidirectional LSTM is composed of two ordinary LSTMs running in opposite directions, so it can use information from both past and future time steps to compute the representation at the current moment. This lets it exploit the full context of a sequence: the output at each time step is predicted by looking at both the preceding and the following text.

Taking natural language processing as an example, a bidirectional LSTM can produce a better language model than a unidirectional one. In NLP, bidirectional LSTMs are often used to capture lexical context and sentence structure information, thereby improving model performance. The network structure is as follows:

The multi-layer LSTM structure is as follows:
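Although the structure is easiest to see in a figure, the effect of adding a backward direction and stacking layers shows up directly in the tensor shapes. A small PyTorch check, with toy dimensions chosen only for illustration:

import torch
import torch.nn as nn

seq_len, batch_size, input_size, hidden_size = 5, 2, 10, 16
bilstm = nn.LSTM(input_size, hidden_size, num_layers=2, bidirectional=True)

x = torch.randn(seq_len, batch_size, input_size)
output, (h_n, c_n) = bilstm(x)

print(output.shape)  # torch.Size([5, 2, 32]) -> (seq_len, batch, num_directions * hidden_size)
print(h_n.shape)     # torch.Size([4, 2, 16]) -> (num_directions * num_layers, batch, hidden_size)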

2 PyTorch LSTM input and output

2.1 LSTM parameters

torch.nn.LSTM(*args, **kwargs)

Parameters:

  • input_size – The number of expected features in the input x

  • hidden_size – The number of features in the hidden state h

  • num_layers – Number of recurrent layers. E.g., setting num_layers=2 would mean stacking two LSTMs together to form a stacked LSTM, with the second LSTM taking in outputs of the first LSTM and computing the final results. Default: 1

  • bias – If False, then the layer does not use bias weights b_ih and b_hh. Default: True

  • batch_first – If True, then the input and output tensors are provided as (batch, seq, feature) instead of (seq, batch, feature). Note that this does not apply to hidden or cell states. See the Inputs/Outputs sections below for details. Default: False

  • dropout – If non-zero, introduces a Dropout layer on the outputs of each LSTM layer except the last layer, with dropout probability equal to dropout. Default: 0

  • bidirectional – If True, becomes a bidirectional LSTM. Default: False

  • proj_size – If > 0, will use LSTM with projections of corresponding size. Default: 0

The LSTM constructor takes the eight parameters listed above, of which only input_size and hidden_size are required; the rest have default values. Since batch data is usually built with PyTorch's DataLoader, batch_first is also important in practice. Two common application scenarios of LSTM are text processing and time-series prediction:

  • input_size: input feature dimension

(1) Text processing: a raw word cannot participate in computation directly, so we first embed it (e.g. with Word2Vec) and represent each word as a vector. In this case input_size = embedding_size. For example, if each sentence has five words and each word is represented by a 100-dimensional vector, then input_size = 100;

(2) Time-series prediction: when forecasting, say, electrical load, each load value is a single number that can participate in computation directly, so there is no need to represent it as a vector; in this case input_size = 1. But if we predict with multiple variables, e.g. using [load, wind speed, temperature, pressure, humidity, weather, holiday information] at each moment of the previous 24 hours to predict the load at the next moment, then input_size = 7.

  • hidden_size: the number of hidden layer nodes. It can be set according to the actual situation.
  • num_layers: number of stacked LSTM layers; defaults to 1 (nn.LSTMCell, by contrast, is a single cell and has no num_layers argument).
  • batch_first: Default is False.

LSTM defaults to batch_first=False, i.e. the batch_size dimension is the middle dimension and the input data has the format [seq_len, batch_size, input_size]. In this case:

lstm_out: [seq_len, batch_size, hidden_size * num_directions]
lstm_hn:  [num_directions * num_layers, batch_size, hidden_size]

When batch_first=True is set, the input data has the format [batch_size, seq_len, input_size]. In this case:

lstm_out: [batch_size, seq_len, hidden_size * num_directions]
lstm_hn:  [num_directions * num_layers, batch_size, hidden_size]
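These shapes can be verified with a short snippet (toy dimensions assumed for illustration):

import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=100, hidden_size=64, batch_first=True)
x = torch.randn(8, 7, 100)   # (batch_size=8, seq_len=7, input_size=100)
lstm_out, (h_n, c_n) = lstm(x)

print(lstm_out.shape)  # torch.Size([8, 7, 64]) -> (batch_size, seq_len, hidden_size * num_directions)
print(h_n.shape)       # torch.Size([1, 8, 64]) -> (num_directions * num_layers, batch_size, hidden_size)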

2.2 LSTM input

It can be seen that the input consists of two parts: input and (h_0, c_0), i.e. the initial hidden state and the initial cell state. The shape of input is:

input(seq_len, batch_size, input_size)

When setting batch_first=True:

input(batch_size, seq_len, input_size)
  • seq_len: In text processing, if a sentence has 7 words, then seq_len=7; in time series prediction, assuming we use the load of the previous 24 hours to predict the load of the next moment, then seq_len=24.
  • batch_size: The number of samples input into LSTM at one time. In text processing, many sentences can be input at one time; in time series prediction, many pieces of data can also be input at one time.
  • input_size: input feature dimension.

(h_0, c_0):

h_0(num_directions * num_layers, batch_size, hidden_size)
c_0(num_directions * num_layers, batch_size, hidden_size)

The shapes of h_0 and c_0 are consistent.

  • num_directions: If it is a bidirectional LSTM, then num_directions=2; otherwise num_directions=1.
  • num_layers: number of layers.
  • batch_size: The number of samples input into LSTM at one time.
  • hidden_size: the number of hidden layer nodes.

Summarized as follows:

Input data includes input,(h_0,c_0):

  • input is a tensor of shape (seq_len, batch_size, input_size); batch_first defaults to False.
  • h_0 is a tensor of shape (num_layers × num_directions, batch, hidden_size) containing the initial hidden state for each sequence in the current batch. num_layers is the number of LSTM layers; if bidirectional=True, num_directions = 2, otherwise it is 1 (only one direction).
  • c_0 has the same shape as h_0 and contains the initial cell state for each sequence in the current batch.
  • If h_0 and c_0 are not provided, they default to zeros.
  • batch_first=True puts the batch in the first dimension of the input and output.

2.3 LSTM output

The output also consists of two parts: output and (h_n, c_n), i.e. the hidden state and the cell state. The shape of output is:

output(seq_len, batch_size, num_directions * hidden_size)

When setting batch_first=True: 

output(batch_size, seq_len, num_directions * hidden_size)

The shapes of h_n and c_n remain unchanged (batch_first does not apply to the hidden or cell states).

Summarized as follows:

Output data includes output, (h_n, c_n):

  • The shape of output is (seq_len, batch_size, num_directions × hidden_size); it contains the output features (h_t) of the last LSTM layer for every time step t of each sentence in the batch. batch_first defaults to False.
  • h_n contains the final hidden state of every layer; its shape is (num_directions × num_layers, batch, hidden_size).
  • c_n has the same shape as h_n.
  • h_n holds the hidden state after the last word of each sentence, and c_n the cell state after the last word, so their shapes are independent of the sentence length seq_len.
  • output[-1] is equal to h_n (for the last layer of a unidirectional LSTM), because output[-1] is exactly the hidden state of the last word of each sentence in the batch; see the check below. Note that in an LSTM the hidden state is effectively the output, while the cell state is what stays hidden inside the LSTM and carries the memory.
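The relationship between output[-1] and h_n can be checked directly for a single-layer, unidirectional LSTM (toy dimensions assumed):

import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=10, hidden_size=20, num_layers=1)  # unidirectional, single layer
x = torch.randn(5, 3, 10)                                    # (seq_len, batch, input_size)
output, (h_n, c_n) = lstm(x)

# The last time step of output equals the final hidden state of the last layer
print(torch.allclose(output[-1], h_n[-1]))  # True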

2.4 Hidden layer state initialization

During training of an LSTM network, the hidden state is re-initialized for every batch:

for epoch in range(epochs):
    for index, (x_train, y_train) in enumerate(train_loader):
        cur_batch = len(x_train)
        h = model.init_hidden(cur_batch)   # initialize the hidden state for this batch

        x_train, y_train = x_train.to(device), y_train.to(device)
        step += 1                          # count training steps

h and c are states, not parameters; they are re-initialized to 0 at the start of every batch. The parameters of the LSTM are the weights W and biases b, and it is the parameters that are trained, not the states.

Each batch is propagated independently, with no relationship assumed between two batches, so the hidden state must be re-initialized for each one.
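torch.nn.LSTM follows the same convention: if (h_0, c_0) is not passed in, the states default to zeros, so explicitly zero-initializing them gives the same result. A small check with toy dimensions:

import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=10, hidden_size=20)
x = torch.randn(5, 3, 10)

h_0 = torch.zeros(1, 3, 20)
c_0 = torch.zeros(1, 3, 20)

out_explicit, _ = lstm(x, (h_0, c_0))  # explicit zero initial states
out_default, _ = lstm(x)               # states default to zeros when omitted
print(torch.allclose(out_explicit, out_default))  # True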

3 Implement sentiment analysis based on LSTM

3.1 Introduction to sentiment analysis

Text sentiment analysis (Sentiment Analysis) is a common application of natural language processing (NLP) and an interesting basic task, in particular classification aimed at extracting the emotional content of text. It is the process of analyzing, processing, summarizing, and reasoning about subjective, emotionally colored text.
This article focuses on sentiment polarity (tendency) analysis, i.e. judging whether a text is positive, negative, or neutral. In most application scenarios only two classes are used; for example, the words "love" and "disgust" express opposite sentiment tendencies.
This article describes in detail how to implement text sentiment analysis with an LSTM model.

3.2 Introduction to data sets

The corpus (corpus.csv) consists of reviews of a certain product on an e-commerce website, 4,310 reviews in total. Each text is labeled with one of two sentiment classes, "positive" (正面) or "negative" (负面): 1,908 reviews are positive and 2,375 are negative.

evaluation,label
用了一段时间,感觉还不错,可以,正面
电视非常好,已经是家里的第二台了。第一天下单,第二天就到本地了,可是物流的人说车坏了,一直催,客服也帮着催,到第三天下午5点才送过来。父母年纪大了,买个大电视画面清晰,趁着耳朵还好使,享受几年。,正面
电视比想象中的大好多,画面也很清晰,系统很智能,更多功能还在摸索中,正面
不错,正面
用了这么多天了,感觉还不错。夏普的牌子还是比较可靠。希望以后比较耐用,现在是考量质量的时候。,正面
物流速度很快,非常棒,今天就看了电视,非常清晰,非常流畅,一次非常完美的购物体验,正面
非常好,客服还特意打电话做回访,正面
物流小哥不错,辛苦了,东西还没用,正面
......
价格给力,买的时候有点贵了,现在便宜了。,负面
价格欺诈,先把价格抬很好,然后降价,刚买了没几天,电视还没怎么体验呢,又降了200。有点心塞。询问客服,客服都不搭理的。物流师傅倒是不错,只是物流公司不按约定时间送货,随意更改。,负面
留意微鲸好久了,一直等待活动入手。可惜越等越贵,今年电视全部都涨价了,6月果断入手。用着还不错,一到跟儿子一起打开,他很开心,可以看巧虎,挂架是找了楼下装空调的师傅打了墙孔,有钻头我就自己搞定了,负面
买的第三台微鲸电视了,效果很好,送货快,推荐哦(????????)哦,就是今年涨价了,去年618 55寸只要2600而且还是lg的屏,今年换成京东方了,负面
买完就降价两百,负面
去nmd京东,说好30天价保。5月24号下单买的3198,6月1日直接给我降到2698!当时安装师傅都说买贵了,我还没在意,现在给我来这一出。去nmlgb的京东,当别人都是傻子吗?!,负面
去年1299元购买一台微鲸W43F,觉得不错。打算再买一台,谁知涨价了。经反复比较觉得还是微鲸性价比高,最后决定升级55寸4K大屏。内存2G+16G,还有语音遥控。悲催的是遥控器可能是坏的,正在申请售后。希望不要出什么麻烦。,负面
去年双十一的时候买了微鲸55寸的电视,使用感觉还可以,后来电视一直涨价,好在这次京东618的价格还算可以,虽然还是相比去年的价格还是上涨了不少,但这个也没办法,显示屏,内存,闪存的那些上游厂家都涨价了,可惜的就是43寸的电视在使用中感觉性能上略有不足,不过普通家庭看电视也可以了,负面
速度快,价格一般速度快,价格一般,负面
微鲸的性价比和操作性一直是比较好的,从去年到现在,给自己和亲戚买了五六个了,虽然液晶面板涨价,但总体来说还是划算的。和市面上大多的智能电视相比,软件安装不受限是个最大的优点,而且蓝牙遥控也很好用,比红外好操作。界面也很友好。,负面
微鲸电视不是第一个人,已经是买第二台了,一直觉得不错。嗯,但是价格比之前买的贵了,希望,又不过趁着618还是很划算的。,负面
微鲸电视超棒(⊙o⊙)哦,就是比去年贵不少,负面

Download address of the data set: corpus.csv

3.3 Code implementation based on PyTorch

3.3.1 Dataset loading

# The snippets below assume the following imports
# (Tokenizer / pad_sequences are assumed to come from Keras preprocessing)
import jieba
import numpy as np
import pandas as pd
import torch
from sklearn.model_selection import train_test_split
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
from torch.utils.data import TensorDataset, DataLoader

data_path = 'data/corpus.csv'
df = pd.read_csv(data_path)

x = df['evaluation']   # review text
y = df['label']        # sentiment label (positive / negative)

# Segment each review into words with jieba
texts_cut = [jieba.lcut(one_text) for one_text in x]

# Collect the set of labels
label_set = set()
for label in y:
    label_set.add(label)
label_set = np.array(list(label_set))

# One-hot encode the labels
labels_one_hot = []
for label in y:
    label_zero = np.zeros(len(label_set))
    label_zero[np.in1d(label_set, label)] = 1
    labels_one_hot.append(label_zero)
labels = np.array(labels_one_hot)

# Build the vocabulary (keep at most 3000 words)
num_words = 3000
tokenizer = Tokenizer(num_words=num_words)
tokenizer.fit_on_texts(texts=texts_cut)
num_words = min(num_words, len(tokenizer.word_index) + 1)

# Convert texts to index sequences and pad/truncate them to a fixed length
sentence_len = 64
texts_seq = tokenizer.texts_to_sequences(texts=texts_cut)
texts_pad_seq = pad_sequences(texts_seq, maxlen=sentence_len, padding='post', truncating='post')

# Split into training and test sets
x_train, x_test, y_train, y_test = train_test_split(texts_pad_seq, labels, test_size=0.2, random_state=1)

train_dataset = TensorDataset(torch.from_numpy(x_train), torch.from_numpy(y_train))
test_dataset = TensorDataset(torch.from_numpy(x_test), torch.from_numpy(y_test))

batch_size = 32
train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=batch_size, shuffle=True)

3.3.2 Model construction

import torch.nn as nn
import torch


class SentimentNet(nn.Module):
    # Class-level device selection, used by init_hidden
    device = torch.device('cuda') if torch.cuda.is_available() else torch.device("cpu")

    def __init__(self, vocab_size, input_dim, hid_dim, layers, output_dim):
        super(SentimentNet, self).__init__()
        self.n_layers = layers            # number of stacked LSTM layers
        self.hidden_dim = hid_dim         # hidden state size
        self.embedding_dim = input_dim    # word embedding size
        self.output_dim = output_dim      # number of classes
        drop_prob = 0.5

        # Word embedding layer: maps word indices to dense vectors
        self.embedding = nn.Embedding(vocab_size, self.embedding_dim)

        # batch_first=True, so the LSTM expects input of shape (batch, seq_len, embedding_dim)
        self.lstm = nn.LSTM(self.embedding_dim, self.hidden_dim, self.n_layers,
                            dropout=drop_prob, batch_first=True)

        self.fc = nn.Linear(in_features=self.hidden_dim, out_features=self.output_dim)
        self.sigmoid = nn.Sigmoid()
        self.dropout = nn.Dropout(drop_prob)

    def forward(self, x, hidden):
        x = x.long()
        embeds = self.embedding(x)                    # (batch, seq_len, embedding_dim)

        lstm_out, hidden = self.lstm(embeds, hidden)  # (batch, seq_len, hidden_dim)
        out = self.dropout(lstm_out)
        out = self.fc(out)                            # (batch, seq_len, output_dim)
        out = self.sigmoid(out)
        out = out[:, -1, :]                           # keep only the last time step
        out = out.squeeze()
        out = out.contiguous().view(-1)               # flatten to match the one-hot targets
        return out, hidden

    def init_hidden(self, batch_size):
        # Zero-initialize (h_0, c_0) for a new batch
        hidden = (torch.zeros(self.n_layers, batch_size, self.hidden_dim).to(self.device),
                  torch.zeros(self.n_layers, batch_size, self.hidden_dim).to(self.device))
        return hidden

3.3.3 Model training

# vocab_size, embedding_dim, hidden_dim, num_layers, output_dim
model = SentimentNet(num_words, 256, 128, 8, 2)

lr = 0.0001
optimizer = torch.optim.Adam(model.parameters(), lr=lr)
criterion = nn.BCELoss()
device = torch.device('cuda') if torch.cuda.is_available() else torch.device("cpu")
model.to(device)

epochs = 32
step = 0
epoch_loss_list = []   # record the loss at the end of each epoch
model.train()          # switch to training mode

for epoch in range(epochs):
    for index, (x_train, y_train) in enumerate(train_loader):
        cur_batch = len(x_train)
        h = model.init_hidden(cur_batch)   # initialize the hidden state for this batch

        x_train, y_train = x_train.to(device), y_train.to(device)
        step += 1                          # count training steps

        x_input = x_train.to(device)
        model.zero_grad()

        output, h = model(x_input, h)

        # Compute the loss
        loss = criterion(output, y_train.float().view(-1))
        loss.backward()

        nn.utils.clip_grad_norm_(model.parameters(), max_norm=5)
        optimizer.step()

        if step % 32 == 0:
            print("Epoch: {}/{}...".format(epoch + 1, epochs),
                  "Step: {}...".format(step),
                  "Loss: {:.6f}...".format(loss.item()))

    epoch_loss_list.append(loss.item())

3.3.4 Model verification

from tqdm import tqdm

model.eval()
loss = 0
with torch.no_grad():
    # Evaluate on the held-out test set
    for data in tqdm(test_loader):
        x_train, y_train = data
        x_train, y_train = x_train.to(device), y_train.to(device)

        cur_batch = len(x_train)
        h = model.init_hidden(cur_batch)   # initialize the hidden state for this batch

        x_input = x_train.long()
        x_input = x_input.to(device)
        output, h = model(x_input, h)

        loss += criterion(output, y_train.float().view(-1)).item()

print("test Loss: {:.6f}...".format(loss))

3.3.5 Model prediction

# Two hand-written reviews: the first is positive ("the product quality is quite good, thumbs up"),
# the second is negative ("what a piece of junk, basically unusable")
test_text_cut = [jieba.lcut("商品质量相当不错,点赞"),
                 jieba.lcut("什么破东西,简直没法使用")]

test_seq = tokenizer.texts_to_sequences(texts=test_text_cut)
test_pad_seq = pad_sequences(test_seq, maxlen=sentence_len, padding='post', truncating='post')
h = model.init_hidden(len(test_pad_seq))

output, h = model(torch.tensor(test_pad_seq).to(device), h)
print(output.view(-1, 2))   # one row per review: scores for the two classes

3.4 Running results

Epoch: 1/32... Step: 108... Loss: 73.786831...
Epoch: 2/32... Step: 216... Loss: 64.638946...
Epoch: 3/32... Step: 324... Loss: 62.405033...
Epoch: 4/32... Step: 432... Loss: 55.636972...
Epoch: 5/32... Step: 540... Loss: 47.018067...
Epoch: 6/32... Step: 648... Loss: 42.855574...
Epoch: 7/32... Step: 756... Loss: 38.990385...
Epoch: 8/32... Step: 864... Loss: 36.197353...
Epoch: 9/32... Step: 972... Loss: 38.080607...
Epoch: 10/32... Step: 1080... Loss: 35.610960...
Epoch: 11/32... Step: 1188... Loss: 33.048675...
Epoch: 12/32... Step: 1296... Loss: 31.463715...
Epoch: 13/32... Step: 1404... Loss: 31.872352...
Epoch: 14/32... Step: 1512... Loss: 32.763812...
Epoch: 15/32... Step: 1620... Loss: 28.785963...
Epoch: 16/32... Step: 1728... Loss: 29.832949...
Epoch: 17/32... Step: 1836... Loss: 27.506568...
Epoch: 18/32... Step: 1944... Loss: 25.992162...
Epoch: 19/32... Step: 2052... Loss: 23.530551...
Epoch: 20/32... Step: 2160... Loss: 25.915338...
Epoch: 21/32... Step: 2268... Loss: 24.821649...
Epoch: 22/32... Step: 2376... Loss: 21.365095...
Epoch: 23/32... Step: 2484... Loss: 21.001188...
Epoch: 24/32... Step: 2592... Loss: 19.786633...
Epoch: 25/32... Step: 2700... Loss: 18.771839...
Epoch: 26/32... Step: 2808... Loss: 18.928787...
Epoch: 27/32... Step: 2916... Loss: 18.087029...
Epoch: 28/32... Step: 3024... Loss: 17.189056...
Epoch: 29/32... Step: 3132... Loss: 16.458333...
Epoch: 30/32... Step: 3240... Loss: 15.939349...
Epoch: 31/32... Step: 3348... Loss: 15.498337...
Epoch: 32/32... Step: 3456... Loss: 15.064007...
100%|██████████| 27/27 [00:01<00:00, 21.16it/s]
test Loss: 9.794393...
tensor([[0.9946, 0.0059],
        [0.0509, 0.9495]], grad_fn=<ViewBackward0>)

Loss value change curve:

import matplotlib.pyplot as plt

x = [epoch + 1 for epoch in range(epochs)]
plt.plot(x, epoch_loss_list)   # loss recorded at the end of each epoch

plt.xlim(0, 32)
plt.ylim(0, 100)
plt.show()

3.5 Complete code

Code: https://gitcode.net/ai-medical/lstm_sentiment_analyse

4 Summary

Through its internal gating mechanism, LSTM can effectively process long sequence data and capture long-term dependencies in the sequence, so it has broad application prospects in various sequence data modeling tasks.

Origin: blog.csdn.net/lsb2002/article/details/132835517