In short, the role of tf.nn.embedding_lookup is to look up, in the embedding data (params), the rows at the positions given by ids.
tf.nn.embedding_lookup(params, ids, partition_strategy='mod', name=None, validate_indices=True, max_norm=None)
See the official documentation for details. Our params can be obtained in two ways:
1. tf.get_variable("item_emb_w", [self.item_count, self.embedding_size]), which creates a variable whose values come from an initializer, e.g. a uniform distribution on [0, 1] or a standard normal distribution;
2. tf.convert_to_tensor, which converts an existing array.
We then look up the entries of params at the positions given by ids.
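As a rough NumPy analogue of these two ways of obtaining params (a sketch only; item_count and embedding_size here are hypothetical stand-ins for self.item_count and self.embedding_size):

```python
import numpy as np

# Hypothetical sizes standing in for self.item_count and self.embedding_size.
item_count, embedding_size = 10, 4

# Path 1: random initialization, analogous to what a variable initializer
# (uniform or normal) would produce for item_emb_w.
params_random = np.random.uniform(0.0, 1.0, size=(item_count, embedding_size))

# Path 2: wrap an existing array, analogous to tf.convert_to_tensor.
params_existing = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])

print(params_random.shape)    # (10, 4)
print(params_existing.shape)  # (3, 2)
```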
For example:
import numpy as np
import tensorflow as tf

data = np.array([[[2], [1]], [[3], [4]], [[6], [7]]])
data = tf.convert_to_tensor(data)
lk = [[0, 1], [1, 0], [0, 0]]
lookup_data = tf.nn.embedding_lookup(data, lk)

init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init)  # needed before the sess.run(lookup_data) call below
Let's look at the shapes of the objects involved:
In [76]: data.shape
Out[76]: (3, 2, 1)
In [77]: np.array(lk).shape
Out[77]: (3, 2)
In [78]: lookup_data
Out[78]: <tf.Tensor 'embedding_lookup_8:0' shape=(3, 2, 2, 1) dtype=int64>
How does this work? The key point is this: each value in lk is an index into the first dimension of data, and in the result that index is replaced by the corresponding embedding row. So the output shape is the shape of lk concatenated with the shape of data minus its first dimension: (3, 2) + (2, 1) = (3, 2, 2, 1). It also follows that every value in lk must be at most data's first-dimension size minus one (here, at most 2).
Concretely, the values are:
In [79]: data
Out[79]:
array([[[2],
[1]],
[[3],
[4]],
[[6],
[7]]])
In [80]: lk
Out[80]: [[0, 1], [1, 0], [0, 0]]
# lk[0], i.e. [0, 1], corresponds to the first entry of sess.run(lookup_data)
# below, which is exactly data's rows [[2], [1]] and [[3], [4]]
In [81]: sess.run(lookup_data)
Out[81]:
array([[[[2],
[1]],
[[3],
[4]]],
[[[3],
[4]],
[[2],
[1]]],
[[[2],
[1]],
[[2],
[1]]]])
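For a single params tensor, tf.nn.embedding_lookup(data, lk) behaves like NumPy fancy indexing along the first axis, so the shape rule above can be checked without TensorFlow (a minimal sketch using the same data and lk as the example):

```python
import numpy as np

data = np.array([[[2], [1]], [[3], [4]], [[6], [7]]])
lk = np.array([[0, 1], [1, 0], [0, 0]])

# Fancy indexing along the first axis reproduces the lookup:
# each index in lk is replaced by the corresponding data[i] slice.
result = data[lk]

# Output shape = lk.shape + data.shape[1:]
print(result.shape)  # (3, 2, 2, 1)
print(result[0])     # data[0] and data[1], i.e. [[2], [1]] and [[3], [4]]
```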
Finally, partition_strategy matters when len(params) > 1, i.e. params is a list of tensors across which the ids are partitioned. When the ids cannot be divided evenly, the first (max_id + 1) % len(params) partitions receive one extra id.
With partition_strategy='mod', 13 ids split into 5 partitions as [[0, 5, 10], [1, 6, 11], [2, 7, 12], [3, 8], [4, 9]]: id i is assigned to partition i % 5, and the lookup is then performed within each partition.
With partition_strategy='div', 13 ids split into 5 partitions as [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10], [11, 12]]: ids are assigned to partitions in contiguous, ordered blocks, and the lookup is then performed within each partition.
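The two strategies can be sketched in plain Python (an illustration of the id-to-partition mapping described above, not TensorFlow's actual implementation; partition_ids is a hypothetical helper name):

```python
def partition_ids(num_ids, num_partitions, strategy):
    """Illustrative id-to-partition assignment for 'mod' and 'div'."""
    if strategy == "mod":
        # id i goes to partition i % num_partitions
        return [list(range(p, num_ids, num_partitions))
                for p in range(num_partitions)]
    if strategy == "div":
        # contiguous ordered blocks; the first (num_ids % num_partitions)
        # partitions each hold one extra id
        size, extra = divmod(num_ids, num_partitions)
        parts, start = [], 0
        for p in range(num_partitions):
            end = start + size + (1 if p < extra else 0)
            parts.append(list(range(start, end)))
            start = end
        return parts
    raise ValueError("unknown strategy: %s" % strategy)

print(partition_ids(13, 5, "mod"))
# [[0, 5, 10], [1, 6, 11], [2, 7, 12], [3, 8], [4, 9]]
print(partition_ids(13, 5, "div"))
# [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10], [11, 12]]
```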
Original link: tf.nn.embedding_lookup notes - Jianshu