PyTorch torch.nn function learning record

1. torch.nn.Embedding

Official website introduction

torch.nn.Embedding(num_embeddings: int, embedding_dim: int, padding_idx: Optional[int] = None, max_norm: Optional[float] = None, norm_type: float = 2.0, scale_grad_by_freq: bool = False, sparse: bool = False, _weight: Optional[torch.Tensor] = None)
Input: (*), LongTensor of arbitrary shape containing the indices to extract
Output: (*, H), where * is the input shape and H is embedding_dim

Parameter Description:

num_embeddings (int) – size of the dictionary of embeddings
embedding_dim (int) – the size of each embedding vector
padding_idx (int, optional) – If given, pads the output with the embedding vector at padding_idx (initialized to zeros) whenever it encounters the index.
max_norm (float, optional) – If given, each embedding vector with norm larger than max_norm is renormalized to have norm max_norm.
norm_type (float, optional) – The p of the p-norm to compute for the max_norm option. Default 2.
scale_grad_by_freq (boolean, optional) – If given, this will scale gradients by the inverse of frequency of the words in the mini-batch. Default False.
sparse (bool, optional) – If True, gradient w.r.t. weight matrix will be a sparse tensor. See Notes for more details regarding sparse gradients.

import torch

embed = torch.nn.Embedding(6, 5, padding_idx=3)  # vocabulary of 6 words, 5 dimensions each; index 3 maps to the zero vector
x = torch.LongTensor([[1, 2, 3], [1, 3, 5]])
print(embed(x))
tensor([[[-1.3751, -3.0215, -1.3973, -0.3610,  1.6760],
         [ 0.1496,  0.3810, -1.4765,  0.7070,  0.0221],
         [ 0.0000,  0.0000,  0.0000,  0.0000,  0.0000]],   <- index 3 (padding)
        [[-1.3751, -3.0215, -1.3973, -0.3610,  1.6760],
         [ 0.0000,  0.0000,  0.0000,  0.0000,  0.0000],    <- index 3 (padding)
         [ 1.2162, -0.5878,  0.2110, -0.3564, -1.6092]]],
       grad_fn=<EmbeddingBackward>)
print(embed(x).size())
torch.Size([2, 3, 5])

In an actual task, the general process can be summarized as follows:

First, convert the words into a dictionary that maps each word to an index. English words are usually separated by spaces, so the index structure can be built directly, similar to:
dic = {'i': 1, 'am': 2, 'a': 3, 'student': 4, 'like': 5, 'apple': 6}. For Chinese text, perform word segmentation first. A minimal sketch follows below.

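Here is a minimal sketch of building such a vocabulary (the tokenized sentences and the reservation of index 0 for padding are illustrative assumptions, not from the original post):

# Build a word-to-index dictionary from tokenized sentences.
# Index 0 is reserved here for the padding step described later (an assumption).
sentences = [['i', 'am', 'a', 'student'], ['i', 'like', 'apple']]

dic = {}
for sentence in sentences:
    for word in sentence:
        if word not in dic:
            dic[word] = len(dic) + 1  # start at 1 so that 0 stays free for padding

print(dic)  # {'i': 1, 'am': 2, 'a': 3, 'student': 4, 'like': 5, 'apple': 6}
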
Then treat each sentence as a list of tokens and build an index sequence for each one, giving a nested list like list[[sentence1], [sentence2]]. Looking each word up in the dictionary above, the final result is [[1, 2, 3, 4], [1, 5, 6]]; note that the sentences have varying lengths.
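
Converting the sentences to index sequences is then a straightforward lookup (continuing the sketch above):

# Map each tokenized sentence to its list of dictionary indices.
indexed = [[dic[word] for word in sentence] for sentence in sentences]
print(indexed)  # [[1, 2, 3, 4], [1, 5, 6]] -- varying lengths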

The next step is padding. Since a tensor requires rows of equal length, the sentences above must be padded before nn.Embedding can be used to initialize the word vectors. After padding, the structure might be [[1, 2, 3, 4], [1, 5, 6, 0]], where 0 is the padding value. (Note: because NMT tasks always involve padding, the padding_idx argument must be supplied when constructing the embedding, so that the padded positions map to zero vectors and carry no meaning.)
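
A minimal sketch of the padding and lookup steps (the embedding dimension and vocabulary size are illustrative assumptions, continuing the sketch above):

import torch

# Pad every sentence to the length of the longest one, using 0 as the pad index.
max_len = max(len(s) for s in indexed)
padded = [s + [0] * (max_len - len(s)) for s in indexed]
print(padded)  # [[1, 2, 3, 4], [1, 5, 6, 0]]

# num_embeddings must cover pad index 0 plus the 6 real words.
embed = torch.nn.Embedding(num_embeddings=7, embedding_dim=5, padding_idx=0)
vectors = embed(torch.LongTensor(padded))
print(vectors.size())  # torch.Size([2, 4, 5]); padded positions map to zero vectors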

2. torch.nn.Parameter

torch official website introduction

torch.nn.Parameter
A kind of Tensor that is to be considered a module parameter.
Parameters are Tensor subclasses, that have a very special property when used with Module s - when they’re assigned as Module attributes they are automatically added to the list of its parameters, and will appear e.g. in parameters() iterator. Assigning a Tensor doesn’t have such effect. This is because one might want to cache some temporary state, like last hidden state of the RNN, in the model. If there was no such class as Parameter, these temporaries would get registered too.

Parameters
data (Tensor) – parameter tensor
requires_grad (bool, optional) – if the parameter requires gradient. Default: True

The simple understanding is that torch.nn.Parameter is a subclass of torch.Tensor whose main purpose is to serve as a trainable parameter inside an nn.Module. The difference from torch.Tensor is that an nn.Parameter assigned as a module attribute is automatically registered as a trainable parameter of the module, i.e., added to the parameters() iterator.
Note that the requires_grad attribute of an nn.Parameter defaults to True, meaning it is trainable; this is the opposite of the default for a torch.Tensor.
Inside its built-in modules, PyTorch likewise uses nn.Parameter to register each module's parameters.
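
A minimal sketch of the difference (the module and attribute names are illustrative assumptions):

import torch
import torch.nn as nn

class MyModule(nn.Module):
    def __init__(self):
        super().__init__()
        # Registered automatically: appears in parameters() and gets optimized.
        self.weight = nn.Parameter(torch.randn(3, 3))
        # NOT registered: a plain tensor attribute is treated as temporary state.
        self.cache = torch.randn(3, 3)

m = MyModule()
print([name for name, _ in m.named_parameters()])  # ['weight']
print(m.weight.requires_grad)  # True by default for nn.Parameter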

More information:
https://www.jianshu.com/p/d8b77cc02410
https://blog.csdn.net/qq_28753373/article/details/104179354

3. torch.nn.Identity()

torch official website introduction

torch.nn.Identity
A placeholder identity operator that is argument-insensitive.

Parameters

  • args – any argument (unused)
  • kwargs – any keyword argument (unused)

This module does nothing: it returns its input unchanged. It is often used as a placeholder, for example in the input layer of a neural network.
Because it passes everything through intact, it acts like a container that keeps all of its inputs, which is handy when constructing networks.
For example, with an LSTM, where information from the previous step needs to be carried forward unchanged, nn.Identity() preserves that information well.

For example

>>> import torch
>>> import torch.nn as nn
>>> input = torch.randn(128, 20)
>>> m = nn.Identity(54, unused_argument=0.1, unused_argument2=False)  # all arguments are ignored
>>> output = m(input)
>>> output == input
tensor([[True, True, True,  ..., True, True, True],
        [True, True, True,  ..., True, True, True],
        [True, True, True,  ..., True, True, True],
        ...,
        [True, True, True,  ..., True, True, True],
        [True, True, True,  ..., True, True, True],
        [True, True, True,  ..., True, True, True]])
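
A common placeholder pattern (an illustrative sketch, not from the original post): swapping a layer for nn.Identity() so it can be disabled without changing the surrounding code.

import torch
import torch.nn as nn

class MLP(nn.Module):
    def __init__(self, use_dropout: bool = True):
        super().__init__()
        self.fc = nn.Linear(20, 10)
        # nn.Identity() stands in for dropout when it is disabled,
        # so forward() needs no conditional logic.
        self.drop = nn.Dropout(0.5) if use_dropout else nn.Identity()

    def forward(self, x):
        return self.drop(self.fc(x))

model = MLP(use_dropout=False)
out = model(torch.randn(128, 20))
print(out.size())  # torch.Size([128, 10])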

Origin blog.csdn.net/eight_Jessen/article/details/112432032