Notes on tf.nn.embedding_lookup

 

In short, the role of tf.nn.embedding_lookup is to find the rows of the embedding tensor (params) at the indices given by ids and return the corresponding vectors.

tf.nn.embedding_lookup(params, ids, partition_strategy='mod', name=None, validate_indices=True, max_norm=None)

See the official documentation for the full argument list. The params tensor is typically produced in one of two ways:
1. tf.get_variable("item_emb_w", [self.item_count, self.embedding_size]), which creates a variable initialized (by default) from a uniform or normal distribution;
2. tf.convert_to_tensor, which wraps an existing array as a tensor.
embedding_lookup then gathers the entries of params at the positions given by ids.
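As a rough sketch of what this gathering means (a NumPy analogy, not the TensorFlow implementation itself), looking up ids in params is just selecting rows by index:

```python
import numpy as np

# A toy embedding matrix: 5 items, embedding size 3.
params = np.arange(15).reshape(5, 3)
ids = [3, 0, 3]

# tf.nn.embedding_lookup(params, ids) returns the rows of params
# at the given indices; NumPy expresses the same thing as fancy indexing.
rows = params[ids]
print(rows)
# [[ 9 10 11]
#  [ 0  1  2]
#  [ 9 10 11]]
```

Note that the same id may appear multiple times and simply selects the same row again.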

For example:

import numpy as np
import tensorflow as tf

data = np.array([[[2], [1]], [[3], [4]], [[6], [7]]])
data = tf.convert_to_tensor(data)
lk = [[0, 1], [1, 0], [0, 0]]
lookup_data = tf.nn.embedding_lookup(data, lk)

# Create a session so the sess.run(...) calls below work.
init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init)

Let us look at the shapes of the tensors involved:

In [76]: data.shape
Out[76]: (3, 2, 1)
In [77]: np.array(lk).shape
Out[77]: (3, 2)
In [78]: lookup_data
Out[78]: <tf.Tensor 'embedding_lookup_8:0' shape=(3, 2, 2, 1) dtype=int64>

How is this result produced? The key is the shape rule:

Each value in lk is used as an index into the first dimension of data, and the slices it selects are assembled into the output. The output shape is the shape of lk followed by the shape of data with its first dimension removed, i.e. lk.shape + data.shape[1:]. It follows that every value in lk must be at most data.shape[0] - 1.
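This shape rule can be checked without TensorFlow, since NumPy fancy indexing gathers along the first axis in the same way:

```python
import numpy as np

# Same data and lk as in the example above.
data = np.array([[[2], [1]], [[3], [4]], [[6], [7]]])   # shape (3, 2, 1)
lk = np.array([[0, 1], [1, 0], [0, 0]])                 # shape (3, 2)

# Indexing along the first axis mirrors embedding_lookup:
out = data[lk]

# Output shape = lk.shape + data.shape[1:] = (3, 2) + (2, 1)
print(out.shape)  # (3, 2, 2, 1)
```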

 

The above result is:

In [79]: data
Out[79]:
array([[[2],
        [1]],

       [[3],
        [4]],

       [[6],
        [7]]])

In [80]: lk
Out[80]: [[0, 1], [1, 0], [0, 0]]

# lk[0], i.e. [0, 1], selects rows 0 and 1 of data, so the first entry of
# sess.run(lookup_data) below is exactly [[2], [1]], [[3], [4]]

In [81]: sess.run(lookup_data)
Out[81]:
array([[[[2],
         [1]],

        [[3],
         [4]]],


       [[[3],
         [4]],

        [[2],
         [1]]],


       [[[2],
         [1]],

        [[2],
         [1]]]])

Finally, partition_strategy matters when params is a list of tensors with len(params) > 1. When the ids cannot be divided evenly among the partitions, the first (max_id + 1) % len(params) partitions each receive one more id than the rest.
With partition_strategy='mod', 13 ids are divided into 5 partitions as [[0, 5, 10], [1, 6, 11], [2, 7, 12], [3, 8], [4, 9]]: id i is mapped to partition i % 5, and the lookup is then performed partition by partition.
With partition_strategy='div', 13 ids are divided into 5 partitions as [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10], [11, 12]]: ids are assigned to partitions in contiguous, sorted blocks, and the lookup proceeds the same way.
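A minimal plain-Python sketch of the two strategies (the function name partition_ids is ours, not TensorFlow's) reproduces the partitions above:

```python
def partition_ids(num_ids, num_partitions, strategy="mod"):
    """Assign ids 0..num_ids-1 to partitions, mimicking the two strategies."""
    partitions = [[] for _ in range(num_partitions)]
    if strategy == "mod":
        # id i goes to partition i % num_partitions.
        for i in range(num_ids):
            partitions[i % num_partitions].append(i)
    else:
        # "div": contiguous blocks; the first (num_ids % num_partitions)
        # partitions each get one extra id.
        size, extra = divmod(num_ids, num_partitions)
        start = 0
        for p in range(num_partitions):
            end = start + size + (1 if p < extra else 0)
            partitions[p] = list(range(start, end))
            start = end
    return partitions

print(partition_ids(13, 5, "mod"))
# [[0, 5, 10], [1, 6, 11], [2, 7, 12], [3, 8], [4, 9]]
print(partition_ids(13, 5, "div"))
# [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10], [11, 12]]
```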

 

Original link: Notes on tf.nn.embedding_lookup - Jianshu
 


Origin blog.csdn.net/qq_24852439/article/details/90405892