FNN - using matrix factorization to initialize the Embedding layer

FNN uses the hidden vectors of a trained FM model to initialize the Embedding layer (i.e., the FM model improves the Embedding layer's starting point).

Problems with the Embedding layer:

1. A huge number of parameters. The input layer and the Embedding neurons are fully connected, i.e., every edge carries a weight w. With one-hot encoded input, the number of weights can be enormous (see the sketch after this list).

2. Slow convergence. During stochastic gradient descent, only the Embedding weights connected to non-zero input features are updated at each step.
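A back-of-the-envelope sketch of both points (the feature counts below are illustrative assumptions, not figures from the article):

```python
# Illustrative sizes: 1M one-hot features, 32 embedding neurons per feature field.
n_features = 1_000_000
embed_dim = 32

# Problem 1: fully connected input -> Embedding means one weight per edge.
n_weights = n_features * embed_dim
print(f"total weights: {n_weights:,}")              # 32,000,000

# Problem 2: a sample activates only one feature per field, so SGD touches
# only those rows of the weight matrix on each step.
n_fields = 30                                       # one non-zero feature per field
updated = n_fields * embed_dim
print(f"weights updated per sample: {updated:,}")   # 960
```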

When the model is trained for the first time, w must be initialized. Instead of random initialization, the FNN algorithm uses a trained FM model to initialize the weights between the input layer and the Embedding layer, thereby introducing valuable prior information.
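A minimal PyTorch sketch of this initialization, assuming the FM has already been trained and its hidden vectors are available as a NumPy array `fm_latent` of shape (n_features, k); both names are placeholders:

```python
import numpy as np
import torch
import torch.nn as nn

n_features, k = 10_000, 3
# Stand-in for the latent matrix of a trained FM model (one k-d vector per feature).
fm_latent = np.random.randn(n_features, k).astype(np.float32)

# FNN-style initialization: copy the FM hidden vectors into the Embedding
# weights instead of drawing them at random; they keep training afterwards.
embedding = nn.Embedding.from_pretrained(torch.from_numpy(fm_latent), freeze=False)

# Random baseline the article compares against:
random_embedding = nn.Embedding(n_features, k)
```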

The FM model is an improvement on POLY2: the n×n pairwise weight matrix is factorized into an n×k matrix, so each feature can be represented by a k-dimensional hidden vector.
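In standard notation (these are the usual POLY2/FM definitions, not formulas copied from the article):

```latex
% POLY2: an explicit weight for every feature pair, O(n^2) parameters
\hat{y}_{\mathrm{POLY2}}(x) = w_0 + \sum_{i=1}^{n} w_i x_i
  + \sum_{i=1}^{n} \sum_{j=i+1}^{n} w_{ij}\, x_i x_j

% FM: each pair weight is factorized into an inner product of two
% k-dimensional hidden vectors, O(nk) parameters
\hat{y}_{\mathrm{FM}}(x) = w_0 + \sum_{i=1}^{n} w_i x_i
  + \sum_{i=1}^{n} \sum_{j=i+1}^{n} \langle \mathbf{v}_i, \mathbf{v}_j \rangle\, x_i x_j,
  \qquad \mathbf{v}_i \in \mathbb{R}^{k}
```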

The idea of the FNN model is to use this k-dimensional hidden vector as the weights connecting a feature to the embedding neurons of its feature field (a feature field is a group of related features; for example, the gender field contains the two features male and female).

Since the hidden vector serves as the initial weights, its dimension k must equal the number of embedding neurons in that field.

[Figure: FNN Embedding initialization - each feature's 3-dimensional FM hidden vector supplies the weights to its field's 3 embedding neurons]

As shown in the figure above, each feature in the field is represented by a 3-dimensional hidden vector, so the field has exactly 3 embedding neurons, and the weights on the fully connected edges are the element values of that 3-dimensional vector.
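A concrete sketch of the figure's setup for the gender field, with made-up hidden-vector values (k = 3):

```python
import torch
import torch.nn as nn

k = 3  # FM hidden-vector dimension = embedding neurons in the field
# Trained FM hidden vectors for the gender field's two features (values invented).
fm_vectors = torch.tensor([
    [0.12, -0.40, 0.33],   # "male"
    [-0.25, 0.08, 0.51],   # "female"
])

# The field's Embedding is a 2 x 3 weight matrix; row i is exactly the FM
# hidden vector of feature i, so each edge weight is one element of that vector.
gender_embedding = nn.Embedding.from_pretrained(fm_vectors, freeze=False)

print(gender_embedding(torch.tensor([0])))  # the "male" row: [0.12, -0.40, 0.33]
```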

Origin: blog.csdn.net/qq_42018521/article/details/124911396