NLP_pack_padded_sequence memo

Refer to this blog: https://blog.csdn.net/yizhen_nlp/article/details/108896988

Let's start with an example.

import jieba

s1 = '佛棍真是不知羞耻。'
s2 = '他们最没有文化。'
s3 = '乌合之众,骗财骗色'

s1_tokens = jieba.lcut(s1)
s2_tokens = jieba.lcut(s2)
s3_tokens = jieba.lcut(s3)

s1_tokens
['佛棍', '真是', '不知羞耻', '。']

s2_tokens
['他们', '最', '没有', '文化', '。']

s3_tokens
['乌合之众', ',', '骗财骗色']

Convert the tokens to an id representation:

s1_raw_ids = [1, 2, 3, 4]
s2_raw_ids = [5, 6, 7, 8, 4]
s3_raw_ids = [9, 10, 11]

Pad all sequences to the same length, with max_seq_len = 6 and 0 as the padding id:

s1_ids = [1, 2, 3, 4, 0, 0]
s2_ids = [5, 6, 7, 8, 4, 0]
s3_ids = [9, 10, 11, 0, 0, 0]
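
This padding step can also be done with nn.utils.rnn.pad_sequence instead of by hand; a minimal sketch (note that pad_sequence pads to the batch maximum, 5, rather than a fixed max_seq_len of 6):

```python
import torch
import torch.nn as nn

s1_raw_ids = [1, 2, 3, 4]
s2_raw_ids = [5, 6, 7, 8, 4]
s3_raw_ids = [9, 10, 11]

# pad_sequence right-pads every sequence to the length of the longest one in the batch
batch = [torch.tensor(ids) for ids in (s1_raw_ids, s2_raw_ids, s3_raw_ids)]
x = nn.utils.rnn.pad_sequence(batch, batch_first=True, padding_value=0)
# x.shape -> (3, 5): padded to the batch max length, not to a fixed max_seq_len
```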

Treat these three sentences as one batch:

import torch

x = torch.tensor([[1, 2, 3, 4, 0, 0],
                  [5, 6, 7, 8, 4, 0],
                  [9, 10, 11, 0, 0, 0]])

x.shape: (batch_size,  max_seq_len) -> (3,  6)

 

Feed the ids into an Embedding layer to convert them into word vectors:

import torch.nn as nn

vocab_size = 20   # vocabulary size
hidden_size = 3   # dimension of each token's vector
embedding = nn.Embedding(vocab_size, hidden_size, padding_idx=0)
x_embed = embedding(x)
x_embed
tensor([[[ 0.2686,  0.3582, -0.3638],
         [-1.0293, -0.0710,  0.1930],
         [-0.5207,  0.1989,  0.7751],
         [ 0.1505, -1.0983,  0.7293],
         [ 0.0000,  0.0000,  0.0000],
         [ 0.0000,  0.0000,  0.0000]],

        [[-0.4994, -1.2767, -0.3047],
         [ 0.0995, -1.5530,  2.1707],
         [ 1.4377, -1.1575,  0.9451],
         [-0.2708, -0.5492,  0.4358],
         [ 0.1505, -1.0983,  0.7293],
         [ 0.0000,  0.0000,  0.0000]],

        [[-1.0363, -0.8285, -2.6332],
         [ 1.7135, -0.9450,  1.1375],
         [-0.1081,  2.3576,  0.1208],
         [ 0.0000,  0.0000,  0.0000],
         [ 0.0000,  0.0000,  0.0000],
         [ 0.0000,  0.0000,  0.0000]]], grad_fn=<EmbeddingBackward>)

x_embed.shape: (batch_size,  max_seq_len,  hidden_size) -> (3,  6,  3)
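
The padding_idx=0 argument is what makes every padded position come out as an all-zero vector, and it also keeps that row's embedding fixed at zero during training (its gradient is always zero); a quick check:

```python
import torch
import torch.nn as nn

embedding = nn.Embedding(20, 3, padding_idx=0)

# the row at padding_idx is initialized to zeros and never updated
assert torch.equal(embedding.weight[0].detach(), torch.zeros(3))

# looking up the pad id therefore returns a zero vector
pad_vec = embedding(torch.tensor([0]))
assert torch.equal(pad_vec.detach(), torch.zeros(1, 3))
```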

 

# effective lengths: len(s1_raw_ids), len(s2_raw_ids), len(s3_raw_ids)
seq_lens = torch.tensor([4, 5, 3])
# enforce_sorted=False lets pack_padded_sequence sort the sequences by length automatically
x_packed = nn.utils.rnn.pack_padded_sequence(x_embed, seq_lens, batch_first=True, enforce_sorted=False)

x_packed
PackedSequence(data=tensor([[-0.4994, -1.2767, -0.3047],
        [ 0.2686,  0.3582, -0.3638],
        [-1.0363, -0.8285, -2.6332],
        [ 0.0995, -1.5530,  2.1707],
        [-1.0293, -0.0710,  0.1930],
        [ 1.7135, -0.9450,  1.1375],
        [ 1.4377, -1.1575,  0.9451],
        [-0.5207,  0.1989,  0.7751],
        [-0.1081,  2.3576,  0.1208],
        [-0.2708, -0.5492,  0.4358],
        [ 0.1505, -1.0983,  0.7293],
        [ 0.1505, -1.0983,  0.7293]], 

grad_fn=<PackPaddedSequenceBackward>), 
batch_sizes=tensor([3, 3, 3, 2, 1]), 
sorted_indices=tensor([1, 0, 2]),
unsorted_indices=tensor([1, 0, 2]))

This process first sorts the samples of x_embed by effective sequence length, 5 -> 4 -> 3.

That is, the sample at original index 1 has the longest effective length, followed by the sample at index 0, and last the sample at index 2, which is exactly sorted_indices=tensor([1, 0, 2]) in x_packed. The sorted rows are:

[[-0.4994, -1.2767, -0.3047],  [ 0.0995, -1.5530, 2.1707],  [ 1.4377, -1.1575, 0.9451],  [-0.2708, -0.5492, 0.4358],  [ 0.1505, -1.0983, 0.7293], [ 0.0000, 0.0000, 0.0000]]

[[ 0.2686, 0.3582, -0.3638],  [-1.0293, -0.0710, 0.1930],  [-0.5207, 0.1989, 0.7751],  [ 0.1505, -1.0983, 0.7293],  [ 0.0000, 0.0000, 0.0000], [ 0.0000, 0.0000, 0.0000]]

[[-1.0363, -0.8285, -2.6332],  [ 1.7135, -0.9450, 1.1375],  [-0.1081, 2.3576, 0.1208],  [ 0.0000, 0.0000, 0.0000],  [ 0.0000, 0.0000, 0.0000], [ 0.0000, 0.0000, 0.0000]]
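
This descending length sort can be reproduced directly with torch.sort, which also explains where sorted_indices and unsorted_indices come from:

```python
import torch

seq_lens = torch.tensor([4, 5, 3])

# sort lengths in descending order, keeping the original indices
sorted_lens, sorted_indices = torch.sort(seq_lens, descending=True)
# sorted_lens -> tensor([5, 4, 3]), sorted_indices -> tensor([1, 0, 2])

# unsorted_indices is the inverse permutation, used to restore the original order
unsorted_indices = torch.argsort(sorted_indices)
# here the permutation happens to be its own inverse: tensor([1, 0, 2])
```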

It then packs the sorted batch column by column (one time step at a time), discarding the zero padding.

Counting the valid entries in each column gives batch_sizes=tensor([3, 3, 3, 2, 1]) in x_packed above.
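
batch_sizes[t] is simply the number of sequences still active at time step t; a sketch that recomputes it from seq_lens:

```python
import torch

seq_lens = torch.tensor([4, 5, 3])
max_len = int(seq_lens.max())

# at step t, count how many sequences have length > t (i.e. a valid token at column t)
batch_sizes = torch.tensor([int((seq_lens > t).sum()) for t in range(max_len)])
# batch_sizes -> tensor([3, 3, 3, 2, 1])
```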

 

Let's try to restore it with pad_packed_sequence:

x_recover_embed, _ = nn.utils.rnn.pad_packed_sequence(x_packed, batch_first=True, total_length=x_embed.size(1))

x_recover_embed 
tensor([[[ 0.2686,  0.3582, -0.3638],
         [-1.0293, -0.0710,  0.1930],
         [-0.5207,  0.1989,  0.7751],
         [ 0.1505, -1.0983,  0.7293],
         [ 0.0000,  0.0000,  0.0000],
         [ 0.0000,  0.0000,  0.0000]],

        [[-0.4994, -1.2767, -0.3047],
         [ 0.0995, -1.5530,  2.1707],
         [ 1.4377, -1.1575,  0.9451],
         [-0.2708, -0.5492,  0.4358],
         [ 0.1505, -1.0983,  0.7293],
         [ 0.0000,  0.0000,  0.0000]],

        [[-1.0363, -0.8285, -2.6332],
         [ 1.7135, -0.9450,  1.1375],
         [-0.1081,  2.3576,  0.1208],
         [ 0.0000,  0.0000,  0.0000],
         [ 0.0000,  0.0000,  0.0000],
         [ 0.0000,  0.0000,  0.0000]]], grad_fn=<IndexSelectBackward>)

It is indeed restored back to x_embed.
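
The round trip can also be verified programmatically; a self-contained sketch with the same shapes as above (random embeddings with the padded positions zeroed, so the comparison is exact):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x_embed = torch.randn(3, 6, 3)
seq_lens = torch.tensor([4, 5, 3])

# zero out the padded positions, as an Embedding with padding_idx=0 would
for i, l in enumerate(seq_lens.tolist()):
    x_embed[i, l:] = 0.0

packed = nn.utils.rnn.pack_padded_sequence(x_embed, seq_lens, batch_first=True, enforce_sorted=False)
recovered, lens = nn.utils.rnn.pad_packed_sequence(packed, batch_first=True, total_length=6)

assert torch.equal(recovered, x_embed)  # exact round trip
assert lens.tolist() == [4, 5, 3]       # pad_packed_sequence also returns the lengths
```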

 

Next, feed x_packed into an RNN:

# nn.GRU(input_size, hidden_size, ...)
rnn = nn.GRU(3, 2, bidirectional=False, batch_first=True)
rnn.flatten_parameters()
output, hidden = rnn(x_packed)
hidden.size()  # torch.Size([1, 3, 2]) -> (num_layers * num_directions, batch, hidden_size)

output is also a PackedSequence object.

Its data field holds the last-layer hidden state h_t at every valid time step; once unpacked, its shape is (batch, seq_len, num_directions * hidden_size).

PackedSequence(data=tensor([[-0.4909, -0.1294],
        [-0.1647, -0.1724],
        [ 0.0423, -0.4541],
        [-0.9293, -0.0407],
        [-0.4364,  0.0461],
        [-0.8112, -0.4613],
        [-0.9068, -0.2090],
        [-0.5980,  0.2338],
        [ 0.1795,  0.1355],
        [-0.7896, -0.2628],
        [-0.8343,  0.0887],
        [-0.8643, -0.3195]], 

grad_fn=<CatBackward>), 
batch_sizes=tensor([3, 3, 3, 2, 1]), 
sorted_indices=tensor([1, 0, 2]), 
unsorted_indices=tensor([1, 0, 2]))

Restore it with pad_packed_sequence:

out, _ = nn.utils.rnn.pad_packed_sequence(output, batch_first=True, total_length=x_embed.size(1))

out
tensor([[[-0.1647, -0.1724],
         [-0.4364,  0.0461],
         [-0.5980,  0.2338],
         [-0.8343,  0.0887],
         [ 0.0000,  0.0000],
         [ 0.0000,  0.0000]],

        [[-0.4909, -0.1294],
         [-0.9293, -0.0407],
         [-0.9068, -0.2090],
         [-0.7896, -0.2628],
         [-0.8643, -0.3195],
         [ 0.0000,  0.0000]],

        [[ 0.0423, -0.4541],
         [-0.8112, -0.4613],
         [ 0.1795,  0.1355],
         [ 0.0000,  0.0000],
         [ 0.0000,  0.0000],
         [ 0.0000,  0.0000]]], grad_fn=<IndexSelectBackward>)

out.shape: (batch_size, max_seq_len, num_directions * hidden_size) -> (3, 6, 2)
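
A common follow-up is extracting each sequence's last valid output from out: indexing with -1 may land on padding, so you gather at position seq_lens[i] - 1 instead. A sketch with the same shapes as above; for a single-layer unidirectional GRU this should match hidden[-1]:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
rnn = nn.GRU(3, 2, batch_first=True)
x_embed = torch.randn(3, 6, 3)
seq_lens = torch.tensor([4, 5, 3])

packed = nn.utils.rnn.pack_padded_sequence(x_embed, seq_lens, batch_first=True, enforce_sorted=False)
output, hidden = rnn(packed)
out, _ = nn.utils.rnn.pad_packed_sequence(output, batch_first=True, total_length=6)

# gather out[i, seq_lens[i] - 1] for every sequence i in the batch
idx = (seq_lens - 1).view(-1, 1, 1).expand(-1, 1, out.size(2))
last = out.gather(1, idx).squeeze(1)  # (batch, hidden_size)

# for a single-layer, unidirectional GRU this equals the final hidden state
assert torch.allclose(last, hidden[-1])
```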


Origin blog.csdn.net/sdaujz/article/details/113246099