torch notes: the torch.nn module


Recurrent layers

class torch.nn.RNN(*args, **kwargs)

Parameters:
input_size – the number of features in the input x.
hidden_size – the number of features in the hidden state.
num_layers – the number of recurrent layers.
bidirectional – if True, the RNN becomes bidirectional. Default: False.


Inputs of the RNN: (input, h_0)
- input (seq_len, batch, input_size): tensor holding the features of the input sequence.
- h_0 (num_layers * num_directions, batch, hidden_size): tensor holding the initial hidden state.

Outputs of the RNN: (output, h_n)

- output (seq_len, batch, hidden_size * num_directions): tensor holding the output features of the last RNN layer for every time step.
- h_n (num_layers * num_directions, batch, hidden_size): tensor holding the hidden state for the last time step.


Example:


import torch
import torch.nn as nn

# the input x has 10 features, the hidden state has 20 features, and the RNN has 2 layers
rnn = nn.RNN(10, 20, 2)
# (seq_len, batch, input_size)
input = torch.randn(5, 3, 10)
# (num_layers * num_directions, batch, hidden_size)
h0 = torch.randn(2, 3, 20)
output, hn = rnn(input, h0)
print(output.shape)  # (seq_len, batch, hidden_size * num_directions)
print(hn.shape)      # (num_layers * num_directions, batch, hidden_size)

torch.Size([5, 3, 20])
torch.Size([2, 3, 20])
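
With bidirectional=True, num_directions becomes 2, which doubles the last dimension of output and the first dimension of h_n. A minimal sketch of the same example, assuming only bidirectional=True changes:

import torch
import torch.nn as nn

# same sizes as above, but bidirectional, so num_directions = 2
rnn = nn.RNN(10, 20, 2, bidirectional=True)
input = torch.randn(5, 3, 10)        # (seq_len, batch, input_size)
h0 = torch.randn(2 * 2, 3, 20)       # (num_layers * num_directions, batch, hidden_size)
output, hn = rnn(input, h0)
print(output.shape)  # torch.Size([5, 3, 40])  -> hidden_size * num_directions
print(hn.shape)      # torch.Size([4, 3, 20])  -> num_layers * num_directions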

Similarly:

class torch.nn.GRU(*args, **kwargs)

Another kind, the single-step cell:

class torch.nn.RNNCell(input_size, hidden_size, bias=True, nonlinearity='tanh')
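
Unlike nn.RNN, the cell variant computes a single time step, so the loop over the sequence is written by hand. A minimal sketch, with sizes chosen arbitrarily for illustration:

import torch
import torch.nn as nn

cell = nn.RNNCell(10, 20)          # input_size=10, hidden_size=20
input = torch.randn(5, 3, 10)      # (seq_len, batch, input_size)
hx = torch.zeros(3, 20)            # (batch, hidden_size), initial hidden state
outputs = []
for t in range(input.size(0)):     # step through the sequence manually
    hx = cell(input[t], hx)
    outputs.append(hx)
print(len(outputs), outputs[0].shape)  # 5 torch.Size([3, 20])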

Linear layers

class torch.nn.Linear(in_features, out_features, bias=True)
Applies a linear transformation to the incoming data: y=xA^T+b

Example:

# maps 3-dimensional features to 2-dimensional features
m = nn.Linear(3, 2)
input = torch.randn(10, 3)
output = m(input)
print(output.size())


torch.Size([10, 2])
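
The formula y=xA^T+b can be checked directly against the module's weight and bias; note that m.weight has shape (out_features, in_features), so A^T corresponds to m.weight.t(). A quick sanity-check sketch with the same sizes as above:

import torch
import torch.nn as nn

m = nn.Linear(3, 2)
input = torch.randn(10, 3)
output = m(input)
# m.weight has shape (out_features, in_features) = (2, 3)
manual = input @ m.weight.t() + m.bias
print(torch.allclose(output, manual))  # True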

Dropout layers

class torch.nn.Dropout(p=0.5, inplace=False)

Parameters:

p – probability of an element being zeroed. Default: 0.5
inplace – if set to True, the operation is performed in place. Default: False

Shape:

Input: any. The input can have any shape.
Output: same. The output has the same shape as the input.


Example:

m = nn.Dropout(p=0.5)
input = torch.randn(2, 2)
output = m(input)
output

tensor([[-0.0000, -2.9296],
        [ 0.0924,  0.0000]])
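
Note that in training mode the surviving elements are scaled by 1/(1-p) (a factor of 2 here), which is why the non-zero values above differ from the input; in eval mode dropout is disabled entirely. A short sketch contrasting the two modes:

import torch
import torch.nn as nn

m = nn.Dropout(p=0.5)
input = torch.randn(2, 2)

m.train()
print(m(input))   # roughly half the elements zeroed, the rest scaled by 1/(1-p) = 2

m.eval()
print(m(input))   # identical to input: dropout is a no-op in eval mode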

Sparse layers

class torch.nn.Embedding(num_embeddings, embedding_dim, padding_idx=None, max_norm=None, norm_type=2, scale_grad_by_freq=False, sparse=False, _weight=None)

Parameters:

num_embeddings (int) – size of the dictionary of embeddings
embedding_dim (int) – the size of each embedding vector
padding_idx (int, optional) – if given, the output is filled with zeros whenever this index is encountered
max_norm (float, optional) – if given, each embedding vector is renormalized so that its norm does not exceed this value
norm_type (float, optional) – the p of the p-norm used for the max_norm option
scale_grad_by_freq (boolean, optional) – if given, gradients are scaled by the inverse frequency of the words in the mini-batch

Variables:

weight (Tensor) – the learnable weights of the module, of shape (num_embeddings, embedding_dim)

Shape:

Input: LongTensor (N, W), N = mini-batch size, W = number of indices per mini-batch
Output: (N, W, embedding_dim)


Example:

import torch
import torch.nn as nn

# an Embedding module containing 10 embedding vectors of size 3
embedding = nn.Embedding(10, 3)
# a batch of 2 samples of 4 indices each
input = torch.LongTensor([[1,2,4,5],[5,4,2,1]])
embedding(input)

tensor([[[-0.4031,  1.8008,  1.4954],
         [ 0.3768, -0.2439,  0.9262],
         [ 0.8444, -0.1265,  2.0801],
         [ 1.0576, -0.9705, -0.1841]],

        [[ 1.0576, -0.9705, -0.1841],
         [ 0.8444, -0.1265,  2.0801],
         [ 0.3768, -0.2439,  0.9262],
         [-0.4031,  1.8008,  1.4954]]])
embedding.weight


Parameter containing:
tensor([[-0.6084,  0.0402, -1.5447],
        [-0.4031,  1.8008,  1.4954],
        [ 0.3768, -0.2439,  0.9262],
        [ 0.4351, -1.6146,  0.7603],
        [ 0.8444, -0.1265,  2.0801],
        [ 1.0576, -0.9705, -0.1841],
        [ 0.6502, -0.1189,  0.0794],
        [-0.9843, -0.1582, -0.0912],
        [ 0.1690, -0.0980, -0.1338],
        [-0.9448, -1.9642, -0.1723]])

Example with padding_idx:

# example with padding_idx
embedding = nn.Embedding(10, 3, padding_idx=1)
input = torch.LongTensor([[0,1,0,5]])
embedding(input)

tensor([[[-1.1790,  1.2073, -1.0174],
         [ 0.0000,  0.0000,  0.0000],
         [-1.1790,  1.2073, -1.0174],
         [-0.2278,  1.1332, -0.2259]]])
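
An Embedding can also be built from an existing weight matrix via nn.Embedding.from_pretrained (available in newer PyTorch versions); a minimal sketch using a made-up 3×2 weight matrix:

import torch
import torch.nn as nn

# hypothetical pretrained weights: 3 embeddings of dimension 2
pretrained = torch.tensor([[1.0, 2.0],
                           [3.0, 4.0],
                           [5.0, 6.0]])
embedding = nn.Embedding.from_pretrained(pretrained)  # freeze=True by default
input = torch.LongTensor([2, 0])
print(embedding(input))
# tensor([[5., 6.],
#         [1., 2.]])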
