TensorFlow/Keras Embedding layer

The Embedding layer is a trainable layer that learns a vector encoding for words or other index-labeled data.

Why is this layer needed? Plain one-hot encoding cannot express the correlation between two words, whereas a trainable Embedding layer learns a vector encoding for every word, so that related words end up with word vectors that are more strongly correlated (e.g. closer in cosine similarity).
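For intuition, a toy sketch (the 5-word vocabulary and the hand-written 4-dimensional "learned" embedding matrix below are made up purely for illustration): any two distinct one-hot vectors have cosine similarity 0, while embedding vectors learned for related words can end up close to each other.

import numpy as np

vocab_size = 5                    # toy vocabulary, word indices 0..4
one_hot = np.eye(vocab_size)      # one-hot encoding: one row per word

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Any two different one-hot vectors are orthogonal: similarity is always 0.
print(cosine(one_hot[1], one_hot[2]))       # 0.0

# A hypothetical learned embedding matrix of shape (vocab_size, 4):
# suppose the words with index 1 and index 2 are related.
embedding = np.array([[ 0.1,  0.9, -0.3,  0.0],
                      [ 0.8,  0.1,  0.5, -0.2],
                      [ 0.7,  0.2,  0.4, -0.1],
                      [-0.6,  0.3,  0.0,  0.9],
                      [ 0.0, -0.5,  0.8,  0.3]])
print(cosine(embedding[1], embedding[2]))   # ~0.99: related words are close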

------

Keras:

keras.layers.embeddings.Embedding(input_dim, output_dim,
                                  embeddings_initializer='uniform',
                                  embeddings_regularizer=None,
                                  activity_regularizer=None,
                                  embeddings_constraint=None,
                                  mask_zero=False, input_length=None)

 

Input shape:

2D tensor of shape (samples, sequence_length)

Output shape:

3D tensor of shape (samples, sequence_length, output_dim)

Official example (assuming import numpy as np, from keras.models import Sequential, and from keras.layers import Embedding):

【1】model = Sequential()

【2】model.add(Embedding(input_dim=1000, output_dim=64, input_length=10))

【3】input_array = np.random.randint(1000, size=(32, 10))

【4】model.compile('rmsprop', 'mse')

【5】output_array = model.predict(input_array)

【6】assert output_array.shape == (32, 10, 64)

Explanation:

【1】【2】Create a Sequential model and add an Embedding layer. input_dim=1000 is the vocabulary size, i.e. the largest index that can appear in the input data plus 1. The input data created in 【3】 has a maximum value of 999, so input_dim is 1000 here.

【3】Create the input array with shape=(32, 10); 32 can be taken as the batch_size and 10 as the sequence length.

【4】Compile the model.

【5】Run the model to obtain its predicted output.

【6】The output shape is (32, 10, 64): (32, 10) is the shape of the original input data, and 64 is the length of the embedding vector, i.e. each original input value is now represented by a vector of length 64.
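As a small follow-up sketch to the example above: the weights of the Embedding layer are simply a matrix of shape (input_dim, output_dim), which Keras exposes via get_weights(), and the prediction for each input value is just the corresponding row of that matrix.

# continues the official example above
weights = model.layers[0].get_weights()[0]   # the learned embedding matrix
assert weights.shape == (1000, 64)           # (input_dim, output_dim)

# The output for an input value is simply the matching row of the matrix:
assert np.allclose(output_array[0, 0], weights[input_array[0, 0]])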

------

TensorFlow:

tf.nn.embedding_lookup(params, ids) takes two arguments: params is the tensor to look up in, and ids are the indices into that tensor.

params can be a trainable variable or a fixed value, and ids are the index values of the input, for example the index assigned to each word.

Usage:

Example 1:

import numpy as np
import tensorflow as tf

c = np.random.random([100, 10])
b = tf.nn.embedding_lookup(c, [1, 3])

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(b))
    print(c)

Explanation: c is the encoding matrix; it holds at most 100 codes, each of length 10.

b takes the rows of c with index 1 and index 3, each a vector of length 10.
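To make the lookup concrete, a minimal sketch (TF 1.x, same c as above): the result of tf.nn.embedding_lookup(c, [1, 3]) equals plain NumPy row indexing c[[1, 3]].

import numpy as np
import tensorflow as tf

c = np.random.random([100, 10])
with tf.Session() as sess:
    b = sess.run(tf.nn.embedding_lookup(c, [1, 3]))
assert np.allclose(b, c[[1, 3]])   # the lookup just selects rows 1 and 3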

Example 2 (again assuming import numpy as np and import tensorflow as tf):

【1】input_ids = tf.placeholder(dtype=tf.int32, shape=[None])

【2】embedding = tf.Variable(np.identity(5, dtype=np.int32))

【3】input_embedding = tf.nn.embedding_lookup(embedding, input_ids)

【4】sess = tf.InteractiveSession()

【5】sess.run(tf.global_variables_initializer())

【6】print(embedding.eval())

【7】print(sess.run(input_embedding, feed_dict={input_ids:[1, 2, 3, 0, 3, 2, 1]}))

Explanation:

【1】A placeholder for the input indices; its length is variable. 【2】The embedding variable defines the coding matrix. Here it is an identity matrix: the values on the diagonal are 1 and all other entries are 0, so it can be viewed as a one-hot encoding:

embedding = [[1 0 0 0 0]
             [0 1 0 0 0]
             [0 0 1 0 0]
             [0 0 0 1 0]
             [0 0 0 0 1]]

【3】Look up the encodings for the indices given by input_ids.

【4】【5】Create the session and initialize the variables. 【6】Print the coding matrix.

【7】Feed the input values through feed_dict and run the op from 【3】; the output rows are the encoding vectors for the indices [1, 2, 3, 0, 3, 2, 1].
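For reference, since the coding matrix above is the identity, the op from 【3】 simply selects rows, so 【7】 prints:

[[0 1 0 0 0]
 [0 0 1 0 0]
 [0 0 0 1 0]
 [1 0 0 0 0]
 [0 0 0 1 0]
 [0 0 1 0 0]
 [0 1 0 0 0]]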

---------------------------------

Note:

The Embedding layer is in fact a trainable layer: the encoding vector for each index is learned during training.
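A minimal sketch of that idea in the same TF 1.x style as the examples above (the target and loss below are made up purely to show the update; note that only the rows that are looked up receive gradients):

import numpy as np
import tensorflow as tf

ids = tf.placeholder(tf.int32, shape=[None])
embedding = tf.Variable(tf.random_uniform([5, 3]))      # 5 codes of length 3
vectors = tf.nn.embedding_lookup(embedding, ids)

target = tf.ones_like(vectors)                          # dummy target
loss = tf.reduce_mean(tf.square(vectors - target))      # dummy loss
train_op = tf.train.GradientDescentOptimizer(0.1).minimize(loss)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    before = sess.run(embedding)
    for _ in range(10):
        sess.run(train_op, feed_dict={ids: [0, 1, 2]})
    after = sess.run(embedding)
    # only the looked-up rows (0, 1, 2) change; rows 3 and 4 stay the same
    print(np.abs(after - before).sum(axis=1))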


Reprinted from: blog.csdn.net/goddessblessme/article/details/79892613