How to Understand Embeddings?

References:
http://www.sohu.com/a/206922947_390227

https://www.jianshu.com/p/0bb00eed9c63

https://www.baidu.com/link?url=CwDMHi72fOR8BzSlKAR0_01oYq-Jn79tNdrWrISguElN1w4Ng9DBZhihxCNjrWUBavktHOALF41rzvar191r4SlbKHO_EgiY_dmSYpDoq5C&wd=&eqid=c0fe574e00063f08000000035d11e9c6

https://www.jianshu.com/p/2a76b7d3126b

https://www.baidu.com/link?url=XI4NojXLflTT49Am0pQmaWXoPfqvBqdB1K8nkt6sFX1LRqsVwDyedyyN9vOH76GXquBBTfW7b2DfzYumTwYjaRBl87APzOD0u_YCeu4zWGW&wd=&eqid=c0fe574e00063f08000000035d11e9c6

https://blog.csdn.net/k284213498/article/details/83474972

1. Embedding

For example, suppose you have an input variable with three levels (categories) that you want to represent as two-dimensional data. An embedding layer, backed by the underlying automatic differentiation engine (e.g. TensorFlow or PyTorch), reduces the three-level input to a two-dimensional representation.

Embedding the data

The input data are represented by indices: each category label is simply integer-encoded, and these indices are the input to your embedding layer.
Here is a simple example using a Keras embedding layer; click the link to view the details: https://github.com/krishnakalyan3/FastAI_Practice/blob/master/notebooks/RecSys.ipynb

The weights are randomly initialized and then optimized with stochastic gradient descent to obtain a good representation of the data in two-dimensional space. This becomes a very useful idea when, say, you have 100 levels and want a representation of the data in 50 dimensions.
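
The linked notebook is not reproduced here; the snippet below is only a minimal sketch of the same idea, with a made-up three-level variable, a 2-dimensional embedding, and a dummy regression target just so SGD has something to optimize.

```python
# Minimal sketch (not the notebook's exact code): a Keras embedding layer
# that maps a categorical variable with 3 levels to 2-dimensional vectors.
import numpy as np
import tensorflow as tf

num_levels = 3      # number of categories (levels) of the input variable
embedding_dim = 2   # dimensionality of the learned representation

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(input_dim=num_levels, output_dim=embedding_dim),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(1),        # dummy head so the weights can be trained
])
model.compile(optimizer="sgd", loss="mse")

# Integer-encoded categories are the input to the embedding layer.
x = np.array([[0], [1], [2], [1]])
y = np.array([0.0, 1.0, 2.0, 1.0])   # toy target, only to drive SGD
model.fit(x, y, epochs=5, verbose=0)

# The learned 3x2 weight matrix: one 2-d vector per category level.
print(model.layers[0].get_weights()[0].shape)   # (3, 2)
```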
Example (the figures for each stage are not shown here): original data → label-encoded data → one-hot encoding → embedded data.
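
As a stand-in for the original figures, here is a toy walk-through of those stages in NumPy; the color categories and the weight matrix are made up for illustration.

```python
# Illustrative only: from raw categories to label indices to one-hot vectors,
# and finally to an "embedded" representation via a (here random) weight matrix.
import numpy as np

original = ["red", "green", "blue", "green"]            # hypothetical raw data

# Label (integer) encoding
levels = sorted(set(original))                          # ['blue', 'green', 'red']
labels = np.array([levels.index(v) for v in original])  # [2, 1, 0, 1]

# One-hot encoding: one column per level
one_hot = np.eye(len(levels))[labels]                   # shape (4, 3)

# "Embedded data" is one_hot @ W for a learned weight matrix W (random here).
W = np.random.randn(len(levels), 2)
embedded = one_hot @ W                                  # shape (4, 2)
print(embedded.shape)
```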

2. Understanding the principle of embedding in machine learning and the related TensorFlow APIs

The embedding technique is mainly used to handle sparse features and is applied in NLP, recommendation, advertising, and so on. word2vec is just one application of the embedding idea, not the whole of it.

Original address: https://gshtime.github.io/2018/06/01/tensorflow-embedding-lookup-sparse/
Code address: git@github.com:gshtime/tensorflow-api.git

Embedding principle

Common feature dimensionality-reduction methods include PCA, SVD, and so on.
The main purpose of embedding is to reduce the dimensionality of (sparse) features. The reduction works like a fully connected layer with no activation function: the embedding layer's weight matrix maps the input down to the embedding dimension (the shapes are listed below and sketched in code after the list).
Assumptions:

  • feature_num: number of original features
  • embedding_size: number of features after embedding
  • weight matrix shape: [feature_num, embedding_size]
  • input matrix shape: [m, feature_num], where m is the number of samples
  • output matrix shape: [m, embedding_size], where m is the number of samples
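
A short sketch of this shape bookkeeping, with made-up sizes (m = 4, feature_num = 10, embedding_size = 3): multiplying the one-hot input matrix by the weight matrix both reduces the dimension and, because each row is one-hot, simply selects rows of the weight matrix.

```python
# Shape bookkeeping for the embedding-as-matmul view (sizes are made up).
import numpy as np

m, feature_num, embedding_size = 4, 10, 3   # 4 samples, 10 sparse features, 3-d embedding

X = np.zeros((m, feature_num))              # [m, feature_num] one-hot input matrix
ids = np.array([7, 2, 2, 9])                # active feature index for each sample
X[np.arange(m), ids] = 1.0

W = np.random.randn(feature_num, embedding_size)   # [feature_num, embedding_size] weights

out = X @ W                                 # [m, embedding_size], no activation function
assert out.shape == (m, embedding_size)

# Multiplying by a one-hot row just selects a row of W:
assert np.allclose(out, W[ids])
```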

Starting from an id (index), we find the corresponding one-hot encoding; the matching weights (highlighted in red in the original figure) then directly give the values of the output nodes (note that there is no activation function here), i.e. the corresponding embedding vector.
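
The TensorFlow API that performs exactly this row selection is tf.nn.embedding_lookup; a minimal sketch (the weight matrix below is random, standing in for learned weights):

```python
# tf.nn.embedding_lookup: the id (index) directly selects a row of the weight
# matrix, which equals the one-hot matmul with no activation function.
import numpy as np
import tensorflow as tf

feature_num, embedding_size = 10, 3
W = tf.constant(np.random.randn(feature_num, embedding_size), dtype=tf.float32)

ids = tf.constant([7, 2, 2, 9])
emb = tf.nn.embedding_lookup(W, ids)            # shape [4, embedding_size]

# Same result as multiplying the one-hot encoding of ids by W.
one_hot = tf.one_hot(ids, depth=feature_num)    # shape [4, feature_num]
print(np.allclose(emb.numpy(), (one_hot @ W).numpy()))   # True
```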

3. A brief introduction to the development and principle of Word Embedding
https://www.jianshu.com/p/2a76b7d3126b

Origin: www.cnblogs.com/Ann21/p/11084975.html