Proficient in Recommendation Algorithms 3: Feature Crossing Architecture in Fine Ranking (Systematic Summary, Essential for Interviews)

About the Author:

Tencent T11 algorithm researcher with a master's degree from the University of Chinese Academy of Sciences. He has worked at Alibaba and Tencent for many years and has extensive experience in search and recommendation algorithms. CSDN blog expert with 100 original articles; he has filed 15 patents, 6 of which have been granted.

1 Overall architecture

After years of development, the fine ranking model has largely moved from the era of linear models into the era of deep learning. Since AlexNet [1] won the ImageNet large-scale visual recognition challenge in 2012, deep learning has swept through all major business fields. With strong fitting ability, good generalization, and end-to-end learning, deep models were quickly adopted in recommendation algorithms and have achieved good business results in e-commerce, advertising, social networking, content feeds, and other fields. In recent years, deep recommendation models have iterated rapidly, with new methods emerging one after another in a flourishing, "hundred schools of thought contending" landscape.

The fine ranking model can be divided into four key directions: feature crossing, user behavior sequence modeling, embedding representation learning, and multi-task learning. This article focuses on feature crossing; the other directions will be covered separately later. Feature crossing helps generate new features and improves model accuracy. Using deep learning for feature crossing overcomes the shortcomings of manual crossing, such as its high barrier to entry, heavy workload, and impossibility of exhaustive enumeration, and greatly improves model expressiveness. Deep learning feature crossing falls into three main categories: DNN models, heterogeneous models, and sequence models.

This article first introduces the meaning, basic paradigms, and main difficulties of feature crossing. It then introduces three classic deep learning models: Deep Crossing, FNN, and PNN, followed by five classic heterogeneous models: Wide & Deep, DeepFM, DCN, NFM, and xDeepFM. Sequence models are covered in the article on user behavior sequence modeling. The knowledge framework of feature crossing is shown in Figure 1.

Figure 1 Knowledge framework of feature crossing

2 The significance of feature crossing

Feature crossing is of great significance and has always been a core optimization direction for recommendation systems. It improves model accuracy, generalization ability, and nonlinear capacity. Combined features can carry very strong signals: for example, an 18-year-old male user is quite likely to enjoy shooting games, while female users and older male users are much less likely to. Feature crossing helps capture such co-occurrence relationships between features, improving model accuracy. It can also generate new, even previously unseen, feature combinations, which benefits generalization. Finally, the real world is highly nonlinear, and crossing operations such as the inner product and outer product increase the model's nonlinear capacity.

Applying deep learning to feature crossing greatly improves the performance of recommendation systems. Compared with manually constructed cross features and FM's second-order automatic crossing, its main advantages are:

  1. Automatic feature crossing: a deep neural network (DNN) can automatically cross arbitrary features, freeing people from arduous manual crossing. Note, however, that for feature combinations with strong signals, cross features are still constructed manually, in case automatic crossing captures them poorly.
  2. High-order feature crossing: FM can only perform second-order crossing; third order and above causes a computational explosion. With the strong fitting ability of multi-layer neural networks, deep learning can achieve high-order crossing (the second-order FM case is sketched after this list). Note, however, that a fully connected layer is essentially a linear transformation plus an activation and does not directly construct explicit crossing operations such as inner products, so the high-order crossing of ordinary deep models is implicit. xDeepFM achieves explicit high-order crossing through a dedicated design.
  3. Mining hidden and unknown relationships between features: when domain experts mine effective feature combinations, much hidden information is easily overlooked, the classic "beer and diapers" pairing being one example. Manual crossing also struggles to discover feature combinations that have never appeared before. Deep learning feature crossing avoids these problems.
  4. Wide crossing coverage: manual crossing is labor-intensive and cannot exhaustively enumerate all feature combinations. FM can cross automatically but is limited to second order, crossing only two features at a time. Deep learning can in theory cross any features at any order, so its upper limit is higher.
  5. Diverse crossing methods: through heterogeneous networks, a deep model can be equipped with explicit low-order, implicit high-order, and explicit high-order crossing at the same time. In xDeepFM, the FM branch performs explicit low-order crossing, the DNN branch implicit high-order crossing, and the CIN branch explicit high-order crossing. Diversified crossing improves model expressiveness and effectiveness.
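To make point 2 concrete, here is a minimal PyTorch sketch of FM-style explicit second-order crossing (the module name, the one-embedding-per-field setup, and all dimensions are illustrative assumptions, not from a specific paper). It uses the standard FM identity: the sum of all pairwise inner products equals ½[(Σᵢeᵢ)² − Σᵢeᵢ²], which reduces the cost from O(n²k) pairwise products to O(nk) for n fields with k-dimensional embeddings.

```python
import torch
import torch.nn as nn

class FMSecondOrder(nn.Module):
    """Explicit second-order feature crossing, FM style.

    Sums the inner products of all pairs of field embeddings via the
    identity sum_{i<j} <e_i, e_j> = 0.5 * ((sum_i e_i)^2 - sum_i e_i^2),
    avoiding explicit enumeration of the O(n^2) pairs.
    """
    def forward(self, embs: torch.Tensor) -> torch.Tensor:
        # embs: (batch, num_fields, emb_dim)
        square_of_sum = embs.sum(dim=1).pow(2)   # (batch, emb_dim)
        sum_of_square = embs.pow(2).sum(dim=1)   # (batch, emb_dim)
        return 0.5 * (square_of_sum - sum_of_square).sum(dim=1, keepdim=True)

# Usage: 8 fields with 16-dimensional embeddings, batch of 4.
embs = torch.randn(4, 8, 16)
score = FMSecondOrder()(embs)    # (4, 1) second-order crossing term
```

Precisely because every pair must share this inner-product form, extending the same trick to third order and beyond is where the computational explosion described above sets in.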

3 Basic paradigms of feature crossing

Feature crossing falls into two categories: manual crossing and automatic crossing. In the early era of recommendation systems built on logistic regression, it was important to leverage expert experience to manually construct cross features at scale. Manual crossing is highly interpretable and, given enough manpower, can reach second or even third order. It usually requires strong business knowledge and data analysis skills to mine effective combinations. Even today, for feature combinations with strong signals, cross features are still constructed in advance and fed into the deep model (a minimal sketch of one common construction follows).
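Below is a minimal sketch of how a manual cross feature is often materialized in practice: concatenate the raw values and hash them into a fixed vocabulary. The hashing-trick formulation is a common industry pattern rather than something prescribed by the models in this article, and the bucket names and vocabulary size are illustrative assumptions.

```python
import hashlib

CROSS_VOCAB_SIZE = 1_000_000  # illustrative hash-bucket count

def cross_feature_id(*values: str) -> int:
    """Map a tuple of raw feature values, e.g. (age_bucket, gender),
    to a single categorical ID via the hashing trick."""
    key = "_x_".join(values)
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % CROSS_VOCAB_SIZE

# An "18-year-old male" becomes one crossed ID that an embedding
# table can then memorize directly.
print(cross_feature_id("age_18-24", "male"))
```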

Automatic crossing falls into two categories: second-order crossing, represented by FM, and deep learning crossing. FM achieves explicit second-order crossing through the inner product, but extending to third order and above runs into a computational explosion. Deep learning achieves high-order crossing with multi-layer fully connected networks, but because it lacks explicit crossing operations such as inner and outer products, the crossing is implicit (a minimal sketch of this implicit form follows).
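For contrast with the explicit FM layer sketched earlier, here is a minimal sketch of implicit deep learning crossing (the layer sizes are illustrative assumptions): field embeddings are simply concatenated and passed through fully connected layers, so any interaction must be approximated implicitly through additive transforms and nonlinearities.

```python
import torch
import torch.nn as nn

class ImplicitDNNCross(nn.Module):
    """Implicit high-order crossing: concatenate field embeddings and
    let stacked fully connected layers approximate the interactions."""
    def __init__(self, num_fields: int, emb_dim: int, hidden: int = 128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(num_fields * emb_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, embs: torch.Tensor) -> torch.Tensor:
        # embs: (batch, num_fields, emb_dim) -> flatten fields, then MLP
        return self.mlp(embs.flatten(start_dim=1))

logit = ImplicitDNNCross(num_fields=8, emb_dim=16)(torch.randn(4, 8, 16))
```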

Deep learning crossing has three typical architectures: DNN models, heterogeneous models, and sequence models. DNN models introduced multi-layer neural networks into recommendation systems and were the industry's first successful realization of automatic high-order crossing; implementation is straightforward, using multi-layer fully connected networks directly. Heterogeneous models usually have multiple branches; in DeepFM, for example, one branch is an FM and the other a DNN, with the branches merged by a final linear layer (a minimal two-branch sketch follows). Heterogeneous models combine the respective advantages of linear and deep models, giving them both low-order and high-order crossing, and both memorization and generalization. Sequence models are mainly used to model user behavior sequences and cross historical behaviors with candidate items, using methods such as attention pooling, GRU sequence models, and Transformers; they are explained in detail in the next chapter.
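Below is a minimal sketch of the heterogeneous two-branch pattern in the spirit of Wide & Deep and DeepFM. It is a simplified illustration, not either published architecture, and it reuses the FMSecondOrder and ImplicitDNNCross modules sketched above: an explicit low-order branch and an implicit high-order branch are merged by one learned linear combination.

```python
import torch
import torch.nn as nn

class TwoBranchRanker(nn.Module):
    """Heterogeneous model: explicit second-order branch (FM-style) plus
    implicit high-order branch (DNN), merged by a final linear layer."""
    def __init__(self, num_fields: int, emb_dim: int):
        super().__init__()
        self.fm = FMSecondOrder()                          # explicit low-order
        self.dnn = ImplicitDNNCross(num_fields, emb_dim)   # implicit high-order
        self.merge = nn.Linear(2, 1)                       # learned branch weights

    def forward(self, embs: torch.Tensor) -> torch.Tensor:
        branch_logits = torch.cat([self.fm(embs), self.dnn(embs)], dim=1)
        return torch.sigmoid(self.merge(branch_logits))    # CTR-style score

prob = TwoBranchRanker(num_fields=8, emb_dim=16)(torch.randn(4, 8, 16))
```

The final linear merge is what lets the model weigh memorization (the explicit branch) against generalization (the implicit branch) per task, which is the core appeal of the heterogeneous design.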

Feature crossing operators fall into two categories: full connection, and inner product (or outer product). A fully connected layer combines features additively, fitting an "OR" relationship between them, and is therefore considered implicit crossing. The inner and outer products combine features multiplicatively, fitting an "AND" relationship, and are considered explicit crossing. Generally, the "AND" relationship captures strong signals better and is more valuable. DeepFM and IPNN use inner-product crossing, while OPNN, DCN, and xDeepFM use outer-product crossing (both operators are sketched below).
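The two explicit operators look like this on a pair of field embeddings in PyTorch (a bare-bones illustration; the embedding dimension is an assumption): the inner product compresses a feature pair into one scalar interaction strength, while the outer product keeps the full k×k interaction matrix for later layers to consume.

```python
import torch

e_i = torch.randn(16)   # embedding of field i
e_j = torch.randn(16)   # embedding of field j

# Inner product: one scalar per feature pair, the "AND" strength of i and j.
inner = torch.dot(e_i, e_j)       # shape: ()

# Outer product: a 16x16 matrix of element-level interactions,
# preserving which embedding dimensions interact with which.
outer = torch.outer(e_i, e_j)     # shape: (16, 16)
```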

4 Difficulties in feature crossing

Feature crossing improves model accuracy, generalization ability, and nonlinear capacity, and is therefore of great significance. Its main difficulties are:

  1. Curse of dimensionality: too many features and too high a crossing order significantly increase computation, driving up training time and storage costs. As the crossing order grows linearly, the space of feature combinations grows combinatorially, on the order of the binomial coefficient C(n, k) (see the quick calculation after this list), so computation increases sharply. The curse of dimensionality has therefore always been the main constraint on the scale of feature crossing.
  2. Explicit high-order crossing: FM achieves explicit crossing via the inner product, but the curse of dimensionality keeps it at second order. DNNs achieve high-order crossing through multi-layer fully connected networks, but that crossing is implicit. Obtaining explicit high-order crossing has long been a hard problem. xDeepFM realizes it through its CIN network unit and outer-product operations, yet it too is limited by the curse of dimensionality, so its crossing generally reaches only about the fourth order in practice. Its relatively heavy computation also poses challenges for model training and deployment.
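To put the curse of dimensionality in concrete terms, the short calculation below counts k-order combinations of n features with the binomial coefficient C(n, k); the feature count of 1,000 is an illustrative assumption.

```python
from math import comb

n = 1_000  # number of raw features (illustrative)
for k in (2, 3, 4):
    print(f"order {k}: {comb(n, k):,} possible combinations")

# order 2:        499,500
# order 3:    166,167,000
# order 4: 41,417,124,750
```

Going from second to fourth order multiplies the combination space by roughly five orders of magnitude, which is why explicit crossing methods rarely venture far beyond second order.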

5 References

[1] Krizhevsky, A., Sutskever, I., Hinton, G. E.: ImageNet classification with deep convolutional neural networks. In: Advances in Neural Information Processing Systems, pp. 1097–1105 (2012)

Welcome to follow this series of articles.

Xie Yangyi: Proficient in Recommendation Algorithms 1: Why Recommendation Systems Are Needed (series, recommended to follow). https://zhuanlan.zhihu.com/p/675300656

Xie Yangyi: Recommendation Algorithm Architecture 1: Recall (series, recommended to follow). https://zhuanlan.zhihu.com/p/672909266

Xie Yangyi: Recommendation Algorithm Architecture 2: Coarse Ranking (systematic summary). https://zhuanlan.zhihu.com/p/673414952

Xie Yangyi: Recommendation Algorithm Architecture 3: Fine Ranking (10,000-word article). https://zhuanlan.zhihu.com/p/673932082

Xie Yangyi: Recommendation Algorithm Architecture 4: Re-ranking (essential for interviews). https://zhuanlan.zhihu.com/p/674870882

Source: blog.csdn.net/u013510838/article/details/135350439