Implementing Transformer machine translation by hand with PyTorch



Foreword

Following on from the previous post, implementing the Transformer model from scratch felt a bit lifeless: the model never actually runs and cannot be applied, so it is just a pile of "dead code". After learning the model, I wanted to use the from-scratch implementation to experience the power of the Transformer, and that is how this post came about. The Transformer was originally proposed by the Google team to solve NLP problems such as machine translation, so it is a natural fit for the NLP field, and using it for machine translation is its most direct application (besides, I have always wanted to play with NLP projects).
Due to time constraints, I will first publish the environment configuration and the running results.


1. Environment configuration

The source project is github.com/SamLynnEvans/Transformer, but because of its age the code can run into various problems. I will publish the source code of my version of the project soon.
Next come two libraries that need special attention: a plain pip install is not the end of it, and the corresponding toolkits have to be installed separately.

1. torchtext

The installation of torchtext is the most noteworthy part!

Method1:

Install directly using pip install torchtext

pip install torchtext

If your PyTorch version is old, this command will automatically update PyTorch and install the CPU build. The old PyTorch gets uninstalled, and the newly installed version may not be compatible with your code. Use with caution!
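If you would rather keep your existing PyTorch untouched, one workaround is to pin a torchtext release and skip dependency resolution. This is only a sketch: the 0.6.0 pin below is an illustrative assumption, and you need to pick the release that actually matches your installed torch.

pip install torchtext==0.6.0 --no-deps   # assumption: replace 0.6.0 with the version matching your torch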

Method2:

Install using conda install -c pytorch torchtext

conda install -c pytorch torchtext

Method 2 is the one I recommend trying; what I actually did was use Method 1 directly inside an Anaconda virtual environment. Because this is a lab computer, I did not want to pollute the base environment, so I created a virtual environment to keep things easy to manage.
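For reference, a minimal sketch of setting up such a virtual environment; the environment name and Python version here are just illustrative choices, not requirements of the project.

conda create -n transformer-mt python=3.8
conda activate transformer-mt
pip install torchtext   # Method 1, but inside the isolated environment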

2. Spacy

Spacy is known as an industrial-strength Python natural language processing (NLP) package; it can perform part-of-speech tagging, named entity recognition, dependency parsing, and word-embedding computation and visualization on natural language text.

There is nothing special to note about installing Spacy; just install it directly with pip:

pip install spacy

However, this article needs the English (en) and French (fr) language packs, which have to be downloaded for Spacy separately. A faster method is to download the English and French packages directly from the official release pages and then pip install them into Spacy manually. The advantage of this approach is that it is faster and easier to control.
The download pages for the en and fr language packs are:
https://github.com/explosion/spacy-models/releases/tag/en_core_web_sm-3.5.0
https://github.com/explosion/spacy-models/releases/tag/fr_core_news_sm-3.5.0
Tip: GitHub may need a proxy/VPN to load smoothly. (These files will be included when the source code of this project is released, so readers who want to reproduce the results need not worry about missing resources.)

Take en package download as an example:

Enter the website https://github.com/explosion/spacy-models/releases/tag/en_core_web_sm-3.5.0
[Screenshot: the GitHub release page for en_core_web_sm-3.5.0, with the download link highlighted]
Click the highlighted link to download the .tar.gz package.

Manually installing the language packs into Spacy

The method is simple: go to the directory containing the downloaded language pack, open a terminal there (cmd on Windows), and run pip install on the .tar.gz file. The language pack is then installed into Spacy.
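A minimal sketch of the manual installation, assuming the archives are named as on the 3.5.0 release pages; adjust the file names to whatever you actually downloaded.

pip install en_core_web_sm-3.5.0.tar.gz
pip install fr_core_news_sm-3.5.0.tar.gz

Afterwards, a quick check in Python confirms that Spacy can load the models:

import spacy

nlp_en = spacy.load("en_core_web_sm")   # raises OSError if the package is not installed
nlp_fr = spacy.load("fr_core_news_sm")
print([tok.text for tok in nlp_en("This is a test sentence.")])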

3. NLTK

If you don't have nltk in your environment, first:

pip install nltk

The project's source code uses the wordnet corpus from nltk; if you do not have it, it needs to be downloaded.

Method1:

Interactive installation: open a Python console and enter

import nltk
nltk.download()

Not surprisingly, the following pop-up window will appear:
[Screenshot: the NLTK Downloader window]
Then click the Corpora tab as shown below, scroll down to find wordnet, and click "Download". If the network is good, you will see the red progress bar in the lower-right corner keep growing.
[Screenshot: the Corpora tab with wordnet selected and the download progress bar]
However, this method is usually slow, because the connection to the download server is often too poor for the install to finish quickly.

Method2:

Go directly to the official site, find the zip package, and download it yourself, which solves the problem at the root.
Enter the website http://www.nltk.org/nltk_data/
"Ctrl+F" search "id: wordnet" (please note that there is a space after the colon), there will be several search results, select the wordnet as shown in the figure below Download: If the file shows 10.3MB
insert image description here
when downloading , it proves that the download is correct! Then put the downloaded wordnet.zip in the nltk_data/corpora directory where the reader's computer is located. If you don't know where nltk_data is, you can enter the following command in python and all nltk_data paths will appear.

import nltk
nltk.download("wordnet")
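Alternatively, every directory that nltk searches for data can be printed directly; nltk.data.path is a standard nltk attribute rather than anything specific to this project.

import nltk

print(nltk.data.path)   # every directory nltk will search for corpora such as wordnet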


2. Project source code

1. Error correction

The answer to the question left open in the article Pytorch implements Transformer (from scratch) is that the code in the Multi-Head Attention part was not quite right. This project's source code is shown below; you can compare it with the original.

import torch
import torch.nn as nn

class MultiHeadAttention(nn.Module):
    def __init__(self, heads, d_model, dropout = 0.1):
        super().__init__()
        
        self.d_model = d_model
        self.d_k = d_model // heads
        self.h = heads
        
        """
		============================================
		问题所在之处
		"""
        self.q_linear = nn.Linear(d_model, d_model)
        self.v_linear = nn.Linear(d_model, d_model)
        self.k_linear = nn.Linear(d_model, d_model)
        """
		============================================
		"""
		
        self.dropout = nn.Dropout(dropout)
        self.out = nn.Linear(d_model, d_model)
    
    def forward(self, q, k, v, mask=None):
        
        bs = q.size(0)
        
        # perform linear operation and split into N heads
        k = self.k_linear(k).view(bs, -1, self.h, self.d_k)
        q = self.q_linear(q).view(bs, -1, self.h, self.d_k)
        v = self.v_linear(v).view(bs, -1, self.h, self.d_k)
        
        # transpose to get dimensions bs * h * sl * d_k
        k = k.transpose(1,2)
        q = q.transpose(1,2)
        v = v.transpose(1,2)
        

        # calculate attention using function we will define next
        scores = attention(q, k, v, self.d_k, mask, self.dropout)
        # concatenate heads and put through final linear layer
        concat = scores.transpose(1,2).contiguous()\
        .view(bs, -1, self.d_model)
        output = self.out(concat)
    
        return output
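The forward pass above calls an attention helper that the project defines elsewhere. For completeness, here is a minimal sketch of the standard scaled dot-product attention it corresponds to; the exact implementation in the repository may differ in small details such as the masking constant.

import math
import torch
import torch.nn.functional as F

def attention(q, k, v, d_k, mask=None, dropout=None):
    # q, k, v have shape (bs, h, seq_len, d_k)
    scores = torch.matmul(q, k.transpose(-2, -1)) / math.sqrt(d_k)
    if mask is not None:
        mask = mask.unsqueeze(1)                      # broadcast the mask over the head dimension
        scores = scores.masked_fill(mask == 0, -1e9)  # block attention to masked positions
    scores = F.softmax(scores, dim=-1)
    if dropout is not None:
        scores = dropout(scores)
    return torch.matmul(scores, v)

A quick shape check of the module (d_model = 512 and heads = 8 are illustrative values, not settings mandated by the project):

mha = MultiHeadAttention(heads=8, d_model=512)
x = torch.randn(2, 10, 512)              # (batch, seq_len, d_model)
print(mha(x, x, x).shape)                # expected: torch.Size([2, 10, 512])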

2. GitHub source code project

The runnable project was released to GitHub on 2023-04-08 at 1:00; the source code is at https://github.com/Regan-Zhang/Transformer-Translation
Since the trained model weights are about 300 MB, they are not uploaded to GitHub; if readers need them, please message me privately on QQ (the QQ number is in my profile). I trained for 110 epochs on an RTX 3060. Part of the training log is recorded below:

(graphCC) public@public-System-Product-Name:~/zhx_Regan/Transformer-master$ python train.py -src_data data/english.txt -trg_data data/french.txt -src_lang en_core_web_sm  -trg_lang fr_core_news_sm  -epochs 10
loading spacy tokenizers...
creating dataset and iterator...
The `device` argument should be set by using `torch.device` or passing a string as an argument. This behavior will be deprecated soon and currently defaults to cpu.
model weights will be saved every 1 minutes and at end of epoch to directory weights/
training model...
2m: epoch 1 [####################]  100%  loss = 3.40303
epoch 1 complete, loss = 3.403
4m: epoch 2 [####################]  100%  loss = 2.38484
epoch 2 complete, loss = 2.384
6m: epoch 3 [####################]  100%  loss = 1.86363
epoch 3 complete, loss = 1.863
9m: epoch 4 [####################]  100%  loss = 1.56969
epoch 4 complete, loss = 1.569
11m: epoch 5 [####################]  100%  loss = 1.44242
epoch 5 complete, loss = 1.442
13m: epoch 6 [####################]  100%  loss = 1.21919
epoch 6 complete, loss = 1.219
15m: epoch 7 [####################]  100%  loss = 1.19595
epoch 7 complete, loss = 1.195
18m: epoch 8 [####################]  100%  loss = 1.04545
epoch 8 complete, loss = 1.045
20m: epoch 9 [####################]  100%  loss = 1.01818
epoch 9 complete, loss = 1.018
22m: epoch 10 [####################]  100%  loss = 0.94545

"""......"""

204m: epoch 97 [####################]  100%  loss = 0.28888
epoch 97 complete, loss = 0.288
206m: epoch 98 [####################]  100%  loss = 0.27474
epoch 98 complete, loss = 0.274
208m: epoch 99 [####################]  100%  loss = 0.27373
epoch 99 complete, loss = 0.273
210m: epoch 100 [####################]  100%  loss = 0.28787
epoch 100 complete, loss = 0.287
training complete, save results? [y/n] : y
command not recognised, enter y or n : y
saving weights to weights/...
weights and field pickles saved to weights
train for more epochs? [y/n] : n
exiting program...

For the full record, see the log.txt file in the GitHub project.

3. Running results

1. Model training (train)

Open a terminal and enter the following command to train the Transformer on an English-to-French machine translation task.

python train.py -src_data data/english.txt -trg_data data/french.txt -src_lang en_core_web_sm -trg_lang fr_core_news_sm -epochs 10

That is, specify the english.txt and french.txt corpora (provided with the project code) and train for 10 epochs first. The original project was run on a K100 with 8 GB of video memory; this article uses an Nvidia 3060 (12 GB of video memory), so it handles the training comfortably. 10 epochs can be trained in under an hour.

2. Translation inference

After training the model, load it from the saved directory, here weights, which is the path specified during training.

python translate.py -load_weights weights -src_lang en_core_web_sm -trg_lang fr_core_news_sm

[Screenshot: an interactive translation session in the terminal]
In the screenshot, the red box marks the input English and the blue box marks the French translated by the model.
Although we may not be able to read French, we can check the output by handing the French to an off-the-shelf translator.

I am a post graduate student.
Why do you like reading science?
This is the school I studied for four years. —— "Youdao Translation"

As you can see, the translations are quite decent; even the attributive clause introduced by "where" is handled correctly, which shows that the Transformer is a workable model for machine translation.


Summary

For now, this blog only scratches the surface. The aim is to spark everyone's interest in learning about the Transformer and other deep learning models (and to keep up my own motivation to learn). After studying a model, building a runnable demo or application with your own implementation not only reinforces the memory but also, like a chain reaction, leads you to more knowledge, improving both the breadth of knowledge and the depth of understanding.

References

https://github.com/SamLynnEvans/Transformer
【Pytorch】Torchtext ultimate installation method and common problems
PYTHON -M SPACY DOWNLOAD EN fails
Installing the NLTK toolkit offline
