Paper link: https://arxiv.org/pdf/1703.04247.pdf
FM background references:
Reading notes on the "Factorization Machines with libFM" paper: https://www.cnblogs.com/yaoyaohust/p/10225055.html
Derivations of GBDT, FM, and FFM: https://www.cnblogs.com/yaoyaohust/p/7865379.html
Categorical features are one-hot encoded; continuous features are either used directly as raw values or discretized into buckets and then one-hot encoded.
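A minimal sketch of these two encoding paths (the feature names, vocabulary, and bucket boundaries below are made up for illustration):

```python
import numpy as np

def one_hot(index, size):
    """Return a one-hot vector of the given size with a 1 at `index`."""
    v = np.zeros(size, dtype=np.float32)
    v[index] = 1.0
    return v

# Categorical feature: one-hot over a (hypothetical) vocabulary.
gender_vocab = {"male": 0, "female": 1, "unknown": 2}
x_gender = one_hot(gender_vocab["female"], len(gender_vocab))  # [0., 1., 0.]

# Continuous feature, option 1: use the raw value directly.
x_age_raw = np.array([37.0], dtype=np.float32)

# Continuous feature, option 2: discretize into buckets, then one-hot.
age_bins = [18, 30, 45, 60]                 # hypothetical bucket boundaries
bucket = int(np.digitize(37.0, age_bins))   # bucket 2, since 30 <= 37 < 45
x_age_binned = one_hot(bucket, len(age_bins) + 1)

# The final input is the concatenation of all encoded fields.
x = np.concatenate([x_gender, x_age_raw, x_age_binned])  # shape (9,)
```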
The core idea is to reuse the FM model's latent vectors for the pairwise interaction terms as embeddings; the FM and Deep components share these embeddings. As a result, no pre-training is needed (the whole model is trained end-to-end), no manual feature engineering is needed (FM learns the interactions), and both low-order (FM) and high-order (DNN) feature interactions are captured.
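The shared-embedding idea can be sketched in numpy as follows. This is a toy single-sample forward pass under simplifying assumptions (one-hot fields with value 1, a one-hidden-layer deep part, made-up dimensions); the FM interaction term uses the standard identity sum_{i<j} <v_i, v_j> = 0.5 * (||sum_i v_i||^2 - sum_i ||v_i||^2):

```python
import numpy as np

rng = np.random.default_rng(0)

n_features, k = 10, 4                      # toy vocabulary size, embedding dim
V = rng.normal(0, 0.1, (n_features, k))    # embeddings shared by FM and Deep
w = rng.normal(0, 0.1, n_features)         # first-order FM weights
b = 0.0                                    # global bias

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def deepfm_forward(active_idx, W1, b1, W2, b2):
    """Forward pass for one sample, given indices of its active (=1) features."""
    e = V[active_idx]                      # per-field embeddings, shape (fields, k)

    # FM part: bias + first-order term + pairwise interactions via the
    # 0.5 * ((sum e)^2 - sum e^2) identity.
    s = e.sum(axis=0)
    y_fm = b + w[active_idx].sum() + 0.5 * (s @ s - (e * e).sum())

    # Deep part: the same embeddings, concatenated, feed a small MLP.
    h = np.maximum(0.0, e.reshape(-1) @ W1 + b1)   # ReLU hidden layer
    y_dnn = h @ W2 + b2

    return sigmoid(y_fm + y_dnn)           # y = sigmoid(y_FM + y_DNN)

# Toy usage: one sample with 3 active one-hot features (3 fields).
fields = [1, 4, 7]
W1 = rng.normal(0, 0.1, (3 * k, 8)); b1 = np.zeros(8)
W2 = rng.normal(0, 0.1, 8); b2 = 0.0
p = deepfm_forward(fields, W1, b1, W2, b2)  # predicted CTR in (0, 1)
```

Note that `V` appears in both the FM term and the deep input, which is exactly why no separate pre-training or hand-crafted cross features are required.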
Evaluation metrics: AUC and LogLoss (cross entropy).
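Both metrics are easy to state directly; a small self-contained sketch with made-up labels and scores (equivalent to sklearn's `roc_auc_score` / `log_loss`):

```python
import numpy as np

def log_loss(y, p, eps=1e-15):
    """Binary cross entropy, the loss typically reported for CTR models."""
    p = np.clip(p, eps, 1 - eps)
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))

def auc(y, p):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    y, p = np.asarray(y), np.asarray(p)
    pos, neg = p[y == 1], p[y == 0]
    diff = pos[:, None] - neg[None, :]
    return float((np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / diff.size)

# Hypothetical labels and predicted probabilities.
y_true = np.array([0, 0, 1, 1, 1, 0])
y_score = np.array([0.1, 0.4, 0.35, 0.8, 0.7, 0.2])

auc_val = auc(y_true, y_score)       # 8 of 9 pairs ranked correctly -> ~0.889
ll_val = log_loss(y_true, y_score)
```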
Rapid training
Activation function: ReLU and tanh are more suitable than sigmoid, and ReLU outperforms tanh (likely because ReLU induces sparsity in the activations).
Dropout: a keep probability of 0.6-0.9 works best.
Neurons per layer: 200-400
Hidden layers: 3 is optimal.
Network shape: constant (equal width per layer) is empirically "quite satisfactory" compared to the other shapes tested.
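The hyperparameter findings above can be collected into one toy deep-component sketch (the input dimension, initialization scale, and batch size below are made-up illustration values; dropout is implemented as inverted dropout with a keep probability):

```python
import numpy as np

rng = np.random.default_rng(0)

# Deep-component settings reflecting the notes above (toy input dim).
config = {
    "hidden_layers": 3,      # 3 hidden layers found optimal
    "units": 400,            # 200-400 neurons per layer, constant width
    "dropout_keep": 0.8,     # keep probability in the 0.6-0.9 range
}

def build_mlp(in_dim, cfg):
    """Initialize weight/bias pairs for a constant-width ReLU MLP."""
    dims = [in_dim] + [cfg["units"]] * cfg["hidden_layers"]
    return [(rng.normal(0, 0.01, (d_in, d_out)), np.zeros(d_out))
            for d_in, d_out in zip(dims[:-1], dims[1:])]

def forward(x, layers, cfg, train=True):
    keep = cfg["dropout_keep"]
    for W, bias in layers:
        x = np.maximum(0.0, x @ W + bias)       # ReLU activation
        if train:                               # inverted dropout
            mask = rng.random(x.shape) < keep
            x = x * mask / keep
    return x

layers = build_mlp(in_dim=32, cfg=config)
h = forward(rng.normal(size=(5, 32)), layers, config)  # shape (5, 400)
```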