Refer to this blog: https://blog.csdn.net/yizhen_nlp/article/details/108896988
Let's start with a few example sentences.
s1 = '佛棍真是不知羞耻。'
s2 = '他们最没有文化。'
s3 = '乌合之众,骗财骗色'
import jieba
s1_tokens = jieba.lcut( s1 )
s2_tokens = jieba.lcut( s2 )
s3_tokens = jieba.lcut( s3 )
s1_tokens
['佛棍', '真是', '不知羞耻', '。']
s2_tokens
['他们', '最', '没有', '文化', '。']
s3_tokens
['乌合之众', ',', '骗财骗色']
Convert the tokens to an ID representation:
s1_raw_ids = [1, 2, 3, 4]
s2_raw_ids = [5, 6, 7, 8, 4]
s3_raw_ids = [9, 10, 11]
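One way to obtain exactly these IDs is to build a small vocabulary from the tokens, reserving ID 0 for padding. The vocab dict and to_ids helper below are illustrative, not part of the original post:
vocab = {'<pad>': 0}                          # reserve ID 0 for padding
for tokens in (s1_tokens, s2_tokens, s3_tokens):
    for tok in tokens:
        vocab.setdefault(tok, len(vocab))     # first-seen order: '佛棍'=1, '真是'=2, ...

def to_ids(tokens):
    return [vocab[tok] for tok in tokens]

s1_raw_ids = to_ids(s1_tokens)                # [1, 2, 3, 4]
s2_raw_ids = to_ids(s2_tokens)                # [5, 6, 7, 8, 4]  ('。' reuses ID 4)
s3_raw_ids = to_ids(s3_tokens)                # [9, 10, 11]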
Pad all sequences to the same length; max_seq_len is 6:
s1_ids = [1, 2, 3, 4, 0, 0]
s2_ids = [5, 6, 7, 8, 4, 0]
s3_ids = [9, 10, 11, 0, 0, 0]
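A minimal padding helper could look like this (a sketch; the pad ID 0 matches the padding_idx used below). For tensors, nn.utils.rnn.pad_sequence does the same job:
max_seq_len = 6
def pad_to(ids, max_len, pad_id=0):
    return ids + [pad_id] * (max_len - len(ids))

s1_ids = pad_to(s1_raw_ids, max_seq_len)   # [1, 2, 3, 4, 0, 0]
s2_ids = pad_to(s2_raw_ids, max_seq_len)   # [5, 6, 7, 8, 4, 0]
s3_ids = pad_to(s3_raw_ids, max_seq_len)   # [9, 10, 11, 0, 0, 0]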
Treat these three sentences as a batch
import torch
import torch.nn as nn
x = torch.tensor([[1, 2, 3, 4, 0, 0],
                  [5, 6, 7, 8, 4, 0],
                  [9, 10, 11, 0, 0, 0]])
x.shape: (batch_size, max_seq_len) -> (3, 6)
Feed x into an Embedding layer to turn each ID into a word vector:
vocab_size = 20    # vocabulary size
hidden_size = 3    # dimensionality of each word vector
embedding = nn.Embedding(vocab_size, hidden_size, padding_idx=0)
x_embed = embedding( x )
x_embed
tensor([[[ 0.2686,  0.3582, -0.3638],
         [-1.0293, -0.0710,  0.1930],
         [-0.5207,  0.1989,  0.7751],
         [ 0.1505, -1.0983,  0.7293],
         [ 0.0000,  0.0000,  0.0000],
         [ 0.0000,  0.0000,  0.0000]],

        [[-0.4994, -1.2767, -0.3047],
         [ 0.0995, -1.5530,  2.1707],
         [ 1.4377, -1.1575,  0.9451],
         [-0.2708, -0.5492,  0.4358],
         [ 0.1505, -1.0983,  0.7293],
         [ 0.0000,  0.0000,  0.0000]],

        [[-1.0363, -0.8285, -2.6332],
         [ 1.7135, -0.9450,  1.1375],
         [-0.1081,  2.3576,  0.1208],
         [ 0.0000,  0.0000,  0.0000],
         [ 0.0000,  0.0000,  0.0000],
         [ 0.0000,  0.0000,  0.0000]]], grad_fn=<EmbeddingBackward>)
x_embed.shape: (batch_size, max_seq_len, hidden_size) -> (3, 6, 3)
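Because padding_idx=0, row 0 of the embedding table is the zero vector and never receives gradient updates, which is why every padded position above comes out as zeros. A quick sanity check (a sketch, not in the original post):
torch.all(embedding.weight[0] == 0)   # tensor(True): the padding row is all zeros
torch.all(x_embed[x == 0] == 0)       # tensor(True): every padded position maps to the zero vector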
# len(s1_raw_ids), len(s2_raw_ids), len(s3_raw_ids)
seq_lens = torch.tensor([4, 5, 3])
# enforce_sorted=False lets pack_padded_sequence sort the sequences by length automatically
x_packed = nn.utils.rnn.pack_padded_sequence(x_embed, seq_lens, batch_first=True, enforce_sorted=False)
x_packed
PackedSequence(data=tensor([[-0.4994, -1.2767, -0.3047],
        [ 0.2686,  0.3582, -0.3638],
        [-1.0363, -0.8285, -2.6332],
        [ 0.0995, -1.5530,  2.1707],
        [-1.0293, -0.0710,  0.1930],
        [ 1.7135, -0.9450,  1.1375],
        [ 1.4377, -1.1575,  0.9451],
        [-0.5207,  0.1989,  0.7751],
        [-0.1081,  2.3576,  0.1208],
        [-0.2708, -0.5492,  0.4358],
        [ 0.1505, -1.0983,  0.7293],
        [ 0.1505, -1.0983,  0.7293]], grad_fn=<PackPaddedSequenceBackward>),
    batch_sizes=tensor([3, 3, 3, 2, 1]),
    sorted_indices=tensor([1, 0, 2]), unsorted_indices=tensor([1, 0, 2]))
Packing first sorts the samples in x_embed by their effective lengths, 5 -> 4 -> 3: the sample at index 1 has the longest effective length, followed by the sample at index 0, and finally the sample at index 2. This is exactly sorted_indices=tensor([1, 0, 2]) in x_packed. After sorting, the batch looks like this:
[[-0.4994, -1.2767, -0.3047], [ 0.0995, -1.5530, 2.1707], [ 1.4377, -1.1575, 0.9451], [-0.2708, -0.5492, 0.4358], [ 0.1505, -1.0983, 0.7293], [ 0.0000, 0.0000, 0.0000]]
[[ 0.2686, 0.3582, -0.3638], [-1.0293, -0.0710, 0.1930], [-0.5207, 0.1989, 0.7751], [ 0.1505, -1.0983, 0.7293], [ 0.0000, 0.0000, 0.0000], [ 0.0000, 0.0000, 0.0000]]
[[-1.0363, -0.8285, -2.6332], [ 1.7135, -0.9450, 1.1375], [-0.1081, 2.3576, 0.1208], [ 0.0000, 0.0000, 0.0000], [ 0.0000, 0.0000, 0.0000], [ 0.0000, 0.0000, 0.0000]]
Then the sorted batch is read column by column (one time step at a time), discarding the zero padding. Counting the non-padded rows kept in each column gives batch_sizes=tensor([3, 3, 3, 2, 1]) in x_packed above.
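For concreteness, here is a small sketch (not from the original post) that rebuilds x_packed.data and batch_sizes by hand, reading the length-sorted batch one column (time step) at a time:
sorted_lens, sorted_idx = seq_lens.sort(descending=True)   # lengths 5, 4, 3; indices tensor([1, 0, 2])
x_sorted = x_embed[sorted_idx]                             # batch reordered longest-first, shape (3, 6, 3)
data_rows, batch_sizes = [], []
for t in range(int(sorted_lens[0])):                       # walk the 5 time steps (columns)
    active = int((sorted_lens > t).sum())                  # how many sequences are still running at step t
    data_rows.append(x_sorted[:active, t])                 # keep only the non-padded rows of this column
    batch_sizes.append(active)
data = torch.cat(data_rows, dim=0)                         # the same 12 rows as x_packed.data
batch_sizes                                                # [3, 3, 3, 2, 1], same as x_packed.batch_sizes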
Let's try to restore it with pad_packed_sequence:
x_recover_embed, _ = nn.utils.rnn.pad_packed_sequence(x_packed, batch_first=True, total_length=x_embed.size(1))
x_recover_embed
tensor([[[ 0.2686,  0.3582, -0.3638],
         [-1.0293, -0.0710,  0.1930],
         [-0.5207,  0.1989,  0.7751],
         [ 0.1505, -1.0983,  0.7293],
         [ 0.0000,  0.0000,  0.0000],
         [ 0.0000,  0.0000,  0.0000]],

        [[-0.4994, -1.2767, -0.3047],
         [ 0.0995, -1.5530,  2.1707],
         [ 1.4377, -1.1575,  0.9451],
         [-0.2708, -0.5492,  0.4358],
         [ 0.1505, -1.0983,  0.7293],
         [ 0.0000,  0.0000,  0.0000]],

        [[-1.0363, -0.8285, -2.6332],
         [ 1.7135, -0.9450,  1.1375],
         [-0.1081,  2.3576,  0.1208],
         [ 0.0000,  0.0000,  0.0000],
         [ 0.0000,  0.0000,  0.0000],
         [ 0.0000,  0.0000,  0.0000]]], grad_fn=<IndexSelectBackward>)
It is indeed restored to the original x_embed.
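A quick check (again a sketch) confirms the round trip is lossless here, since the padded slots of x_embed were already zero vectors:
torch.allclose(x_recover_embed, x_embed)   # True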
Next, feed x_packed into the RNN:
# nn.GRU(input_size, hidden_size, ...)
rnn = nn.GRU(3, 2, bidirectional=False, batch_first=True)
rnn.flatten_parameters()
output, hidden = rnn(x_packed)
hidden.size() # torch.Size([1, 3, 2]) (num_layers * num_directions, batch, hidden_size)
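Because the input was packed, hidden is the state after the last valid (non-padded) step of each sentence, and PyTorch reorders it back to the original batch order via unsorted_indices. For this single-layer, unidirectional GRU, hidden[-1] can therefore serve directly as a fixed-size sentence representation; the classifier head below is only a hypothetical illustration, not part of the original post:
sent_vec = hidden[-1]            # (batch, hidden_size) -> (3, 2), one vector per sentence
classifier = nn.Linear(2, 2)     # hypothetical 2-class head, just to show downstream use
logits = classifier(sent_vec)    # (3, 2)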
output is also a PackedSequence object; its data field holds the per-time-step outputs h_t of the last GRU layer for every non-padded position. Once unpacked, output has shape (batch, seq_len, num_directions * hidden_size).
PackedSequence(data=tensor([[-0.4909, -0.1294],
        [-0.1647, -0.1724],
        [ 0.0423, -0.4541],
        [-0.9293, -0.0407],
        [-0.4364,  0.0461],
        [-0.8112, -0.4613],
        [-0.9068, -0.2090],
        [-0.5980,  0.2338],
        [ 0.1795,  0.1355],
        [-0.7896, -0.2628],
        [-0.8343,  0.0887],
        [-0.8643, -0.3195]], grad_fn=<CatBackward>),
    batch_sizes=tensor([3, 3, 3, 2, 1]),
    sorted_indices=tensor([1, 0, 2]), unsorted_indices=tensor([1, 0, 2]))
Restore back with pad_packed_sequence
out, _ = nn.utils.rnn.pad_packed_sequence(output, batch_first=True, total_length=x_embed.size(1))
out
tensor([[[-0.1647, -0.1724],
         [-0.4364,  0.0461],
         [-0.5980,  0.2338],
         [-0.8343,  0.0887],
         [ 0.0000,  0.0000],
         [ 0.0000,  0.0000]],

        [[-0.4909, -0.1294],
         [-0.9293, -0.0407],
         [-0.9068, -0.2090],
         [-0.7896, -0.2628],
         [-0.8643, -0.3195],
         [ 0.0000,  0.0000]],

        [[ 0.0423, -0.4541],
         [-0.8112, -0.4613],
         [ 0.1795,  0.1355],
         [ 0.0000,  0.0000],
         [ 0.0000,  0.0000],
         [ 0.0000,  0.0000]]], grad_fn=<IndexSelectBackward>)
out.shape: (batch_size, max_seq_len, num_directions * hidden_size) -> (3, 6, 2)
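A common follow-up is to pick out the last valid (non-padded) output of each sequence from out; for this single-layer, unidirectional GRU it should match hidden[-1]. A sketch using the seq_lens defined above (not from the original post):
last_step = (seq_lens - 1).view(-1, 1, 1).expand(-1, 1, out.size(2))   # (3, 1, 2): index of each sequence's last step
last_out = out.gather(1, last_step).squeeze(1)                         # (3, 2): last valid output per sequence
torch.allclose(last_out, hidden[-1])                                   # True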