NLP-"Natural Language Processing Based on PyTorch"

 
Detailed notes on the new book "Natural Language Processing Based on PyTorch"!
These notes focus on understanding and working through the code in the book and on laying out the processing steps, in the hope of helping readers who are new to NLP.

 
The content of this blog follows the chapters of the book, so you can jump straight to the part you want to study.

Reprints are welcome; please cite the source: https://blog.csdn.net/qq_41709378/article/details/113354298



 

Chapter 1 Overview

This chapter introduces the basic concepts of tensors in PyTorch: tensor types and sizes, basic tensor operations, indexing, slicing and joining.

Here is the code for the exercises:

import torch

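# 1. Create a random 2D tensor, then insert a dimension of size 1 at dim 0
a = torch.rand(3, 3)
a = a.unsqueeze(0)  # unsqueeze returns a new tensor, so reassign the result
print(a)

# 2. Remove the dimension that was just added to the tensor
a = a.squeeze(0)  # squeeze is also out-of-place, so reassign again
print(a)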

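# 3. Create a random tensor of shape 5x3 with values in the interval [3, 7)
a = 3 + torch.rand(5, 3)*(7-3)
print(a)

# 4. Create a tensor with values drawn from a normal distribution (mean = 0, std = 1)
a = torch.rand(3,3)
a.normal_()  # in-place: overwrites a with samples from N(0, 1)
print(a)
print(a.normal_())  # samples again in place, so this prints different values

# 5. Find the indices of all non-zero elements in torch.Tensor([1,1,1,0,1])
a = torch.Tensor([1,1,1,0,1])
print(torch.nonzero(a))

# 6. Create a random tensor of size (3,1), then expand it horizontally into 4 copies
a = torch.rand(3,1)
print(a.expand(3, 4))

# 7. Return the batch product of two 3D matrices (a=torch.rand(3,4,5), b=torch.rand(3,5,4))
a = torch.rand(3,4,5)
b = torch.rand(3,5,4)
print(torch.bmm(a,b))

# 8. Return the product of a 3D matrix and a 2D matrix (a=torch.rand(3,4,5), b=torch.rand(5,4))
a = torch.rand(3,4,5)
b = torch.rand(5,4)
# Give b a batch dimension and expand it to match a's batch size
print(b.unsqueeze(0).expand(a.size(0), *b.size()))
print(torch.bmm(a, b.unsqueeze(0).expand(a.size(0), *b.size())))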
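The exercise answers above are easiest to verify by looking at shapes rather than raw values. The following is a minimal sketch, assuming only that torch is installed; the variable names are illustrative and not from the book. It checks that unsqueeze and squeeze return new tensors, that the scaled samples stay inside [3, 7), and that torch.matmul (which broadcasts the 2D matrix over the batch dimension) gives the same result as the expand + bmm approach in exercise 8.

```python
import torch

# unsqueeze/squeeze return new tensors; the original is left unchanged
a = torch.rand(3, 3)
b = a.unsqueeze(0)
c = b.squeeze(0)
print(a.shape, b.shape, c.shape)

# rand() samples from [0, 1), so 3 + rand * 4 lies in [3, 7)
u = 3 + torch.rand(5, 3) * (7 - 3)
print(u.min().item() >= 3, u.max().item() < 7)

# Exercise 8 two ways: explicit expand + bmm versus broadcasting matmul
x = torch.rand(3, 4, 5)
y = torch.rand(5, 4)
via_bmm = torch.bmm(x, y.unsqueeze(0).expand(x.size(0), *y.size()))
via_matmul = torch.matmul(x, y)
print(via_bmm.shape, torch.allclose(via_bmm, via_matmul))
```

The matmul form is shorter, while the expand + bmm form makes the batching explicit, which is what the exercise is practicing.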

 
 

Chapter 2 Natural Language Processing

 
When you first start using the spaCy package, the example loads the English model by the shortcut name "en":

import spacy
nlp = spacy.load("en")
text = "Mary, don't slap the green witch"
print([str(token) for token in nlp(text.lower())])

Running this raises the following error:


OSError: [E050] Can't find model 'en'.
It doesn't seem to be a shortcut link, a Python package or a valid path to a data directory.

 
Solution 1:

Refer to the blog: https://blog.csdn.net/mr_muli/article/details/111592360

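# The following raises the error above:
#     import spacy
#     spacy_en = spacy.load('en')
#     return lambda s: [tok.text for tok in spacy_en.tokenizer(s)]
# After the replacement. The snippet is the body of a function in the
# referenced post; build_tokenizer is a hypothetical name for that wrapper.
from spacy.lang.en import English

def build_tokenizer():
    spacy_en = English()  # tokenizer-only English pipeline, no model download needed
    return lambda s: [tok.text for tok in spacy_en.tokenizer(s)]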

 
Solution 2:

Reference (GitHub issue): https://github.com/hamelsmu/Seq2Seq_Tutorial/issues/1
Reference blog: https://www.cnblogs.com/zrdm/p/8667131.html (this is the solution that worked)
Download the en_core_web_sm package to get en_core_web_sm-2.2.0.tar.gz.

Once you have en_core_web_sm-2.2.0.tar.gz,
follow this blog: https://www.cnblogs.com/xiaolan-Lin/p/13286885.html
and run:

pip install en_core_web_sm-2.2.0.tar.gz

 
Finally, the code is as follows:

import spacy

nlp = spacy.load("en_core_web_sm")
# nlp = spacy.load('en')
text = "Mary, don't slap the green witch"
print([str(token) for token in nlp(text.lower())])
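The two solutions can also be combined so that a script keeps working whether or not the model package is installed. This is a minimal sketch, not the book's code: it assumes spaCy is installed and uses spacy.blank("en"), which builds a tokenizer-only English pipeline, as the fallback when spacy.load raises the OSError shown above.

```python
import spacy

try:
    # Full pipeline (tagger, parser, ...) if the model package is installed
    nlp = spacy.load("en_core_web_sm")
except OSError:
    # Fallback: blank English pipeline, enough for tokenization only
    nlp = spacy.blank("en")

text = "Mary, don't slap the green witch"
tokens = [str(token) for token in nlp(text.lower())]
print(tokens)
```

The fallback is only suitable for tokenization; lemmatization, part-of-speech tagging and noun chunking still need the full model.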

 
 
Here is the example code from Chapter 2:


"""
Chapter 2: Natural Language Processing
"""

# Example 2-1: Tokenizing text
"""
Alternative that needs no model download.

The following raises an error:
    import spacy
    spacy_en = spacy.load('en')
    return lambda s: [tok.text for tok in spacy_en.tokenizer(s)]

After the replacement:
    from spacy.lang.en import English
    spacy_en = English()
    return lambda s: [tok.text for tok in spacy_en.tokenizer(s)]
"""
import spacy
nlp = spacy.load("en_core_web_sm")
# nlp = spacy.load('en')
text = "Mary, don't slap the green witch"
print([str(token) for token in nlp(text.lower())], '\n')

from nltk.tokenize import TweetTokenizer
tweet = u"Snow White and the Seven Degrees MakeAMovieCold@midnight:-)"
tokenizer = TweetTokenizer()
print(tokenizer.tokenize(tweet.lower()), "\n")


# Example 2-2: Generating n-grams from text
def n_grams(text, n):
    # Slide a window of length n over the token list
    return [text[i:i+n] for i in range(len(text)-n+1)]

cleaned = ['mary', ',', "n't", 'slap', 'green', 'witch', '.']
print(n_grams(cleaned, 3), "\n")


# Example 2-3: Lemmatization: reducing words to their root forms
import spacy
nlp = spacy.load("en_core_web_sm")
doc = nlp("he was running late")
for token in doc:
    print('{} --> {}'.format(token, token.lemma_))
print("\n")


# Example 2-4: Part-of-speech tagging
import spacy
nlp = spacy.load("en_core_web_sm")
doc = nlp("Mary slapped the green witch.")
for token in doc:
    print('{} --> {}'.format(token, token.pos_))
print("\n")


# Example 2-5: Noun phrase (NP) chunking
import spacy
nlp = spacy.load("en_core_web_sm")
doc = nlp("Mary slapped the green witch.")
for chunk in doc.noun_chunks:
    print('{} --> {}'.format(chunk, chunk.label_))
print("\n")
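The n_grams helper from Example 2-2 is plain list slicing, so it works on any sequence, not only token lists. The sketch below restates the function with no dependencies and shows bigrams, character trigrams, and the case where n is longer than the input; the variable names bigrams and char_trigrams are illustrative.

```python
def n_grams(tokens, n):
    """Return every contiguous window of length n from a sequence."""
    return [tokens[i:i + n] for i in range(len(tokens) - n + 1)]

cleaned = ['mary', ',', "n't", 'slap', 'green', 'witch', '.']

bigrams = n_grams(cleaned, 2)
print(bigrams[:2])                # [['mary', ','], [',', "n't"]]
print(len(n_grams(cleaned, 3)))   # 5

# Strings are sequences too, so this yields character trigrams
char_trigrams = n_grams("witch", 3)
print(char_trigrams)              # ['wit', 'itc', 'tch']

# A window longer than the input produces no n-grams
print(n_grams(cleaned, 8))        # []
```

A sequence of length L has L - n + 1 windows of length n, which is why seven tokens give five trigrams.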



 
 

Chapter 3 Neural Network Basics

 
When working through Example 3-5, the following error occurs:

TypeError: prelu() missing 1 required positional arguments: "weight"

Solution:

torch.prelu needs the weight a passed in explicitly. Note that a must be a torch.FloatTensor.

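# Example 3-5: PReLU activation function
import torch
import matplotlib.pyplot as plt
import numpy as np

prelu = torch.nn.PReLU(num_parameters = 1)  # module form; not used below, torch.prelu is called directly
x = torch.arange(-5., 5., 0.1)  # torch.range is deprecated; arange excludes the endpoint 5.0
a = torch.tensor([0.25])  # a is a torch.FloatTensor
# a = torch.from_numpy(np.array([0.25])).float()  # from_numpy() followed by .float() also gives a FloatTensor
y = torch.prelu(x, a)  # functional form takes the weight as its second argument

plt.plot(x.numpy(), y.numpy())
plt.show()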
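As a cross-check on the fix, the sketch below compares three ways of computing PReLU on the same input: the nn.PReLU module (which creates its own learnable weight, initialized to 0.25 by default), the functional form with an explicit weight, and the definition written out by hand. It assumes only torch; the names y_module, y_functional and y_manual are illustrative. Plotting is left out so the block focuses on the values.

```python
import torch

x = torch.arange(-5., 5., 0.1)

# Module form: the weight is a learnable parameter held by the module
prelu = torch.nn.PReLU(num_parameters=1)
y_module = prelu(x)

# Functional form: the weight is passed explicitly as a float tensor
a = torch.tensor([0.25])
y_functional = torch.nn.functional.prelu(x, a)

# Definition: x where x >= 0, a * x otherwise
y_manual = torch.where(x >= 0, x, a * x)

# The module output tracks gradients, so detach it before comparing
print(torch.allclose(y_module.detach(), y_functional))
print(torch.allclose(y_functional, y_manual))
```

If the module form is used for plotting, y_module.detach().numpy() is needed, because calling .numpy() on a tensor that requires grad raises an error.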

 

Chapter 4 Feedforward Networks for Natural Language Processing

 

Chapter 5 Embedding Words and Types

Chapter 6 Sequence Modeling for Natural Language Processing

Chapter 7 Intermediate Sequence Modeling for Natural Language Processing

Chapter 8 Advanced Sequence Modeling for Natural Language Processing

Chapter 9 Classics, Frontiers and Next Developments

