Feature Engineering for Recommendation Systems

To construct good feature engineering, three problems need to be solved in sequence:

  • What basic principles should be followed when constructing feature engineering?
  • What are the commonly used feature categories?
  • How should the raw features be processed to generate the feature vectors used for training and inference in the recommendation system?

1. Principles for constructing feature engineering in a recommendation system

2. Common features in the recommendation system

2.1 User behavior data

2.2 User relationship data

2.3 Attribute and label data

2.4 Content data

2.5 Context data

2.6 Statistical data

2.7 Combined data

Combination features are new features generated by combining different existing features.
In early recommendation systems, the models (such as logistic regression) often lacked the ability to cross features on their own, so combined features had to be constructed and screened manually. In today's deep learning recommendation systems, combined features no longer have to be selected by manual combination or manual screening; the model can learn feature crosses automatically.
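For illustration, here is a minimal sketch of manual feature crossing; the field names `user_city` and `item_category` are hypothetical, and the combined value is then encoded like any other categorical feature.

```python
# Minimal sketch (hypothetical field names) of manually crossing two
# categorical fields into a single combined feature, which downstream
# encoding then treats as one ordinary category.
def cross_features(sample: dict, field_a: str, field_b: str) -> str:
    """Concatenate the values of two categorical fields into one combined key."""
    return f"{sample[field_a]}_x_{sample[field_b]}"

sample = {"user_city": "Beijing", "item_category": "sports"}
combined = cross_features(sample, "user_city", "item_category")
print(combined)  # Beijing_x_sports -> encoded like any other categorical value
```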

3. Commonly used feature processing methods

For recommendation systems, the input of the model is often a feature vector composed of numbers.

  • Continuous features:
    Numerical features such as user age, statistics-based features, article release time, and movie playback duration are continuous features. The most commonly used processing methods are normalization, discretization, and nonlinear transformation.

    • Normalization: unify the scale of the different features.
    • Discretization: divide the original continuous values into buckets (for example, by quantiles), producing discrete values.
    • Nonlinear transformation: transform the original feature with a nonlinear function, then feed both the original feature and the transformed feature into the model for training.
  • Categorical features
    The user's historical behavior data, attribute and tag data, and so on are mostly categorical features, usually expressed as a category or an id. The most common processing method is to convert them into numerical vectors with one-hot encoding; when more than one category can be active in the same feature field, multi-hot encoding can be used instead.
    These encodings make the feature vector very high-dimensional and sparse, which easily leads to underfitting, and the large number of weight parameters slows down model convergence. Embedding can first encode categorical features into dense Embedding vectors, which are then combined with the other features to form the final feature vector. A minimal sketch of these processing steps follows this list.
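The sketch below, using toy data and hypothetical values, illustrates the processing methods listed above: min-max normalization, quantile bucketing, and a log transform for continuous features; one-hot and multi-hot encoding for categorical features; and a randomly initialized embedding table standing in for the learned embeddings a real model would train.

```python
import numpy as np

# --- Continuous features -------------------------------------------------
ages = np.array([18.0, 25.0, 34.0, 52.0, 61.0])

# Normalization: rescale to [0, 1] so different feature scales are comparable.
age_norm = (ages - ages.min()) / (ages.max() - ages.min())

# Discretization: bucket by quantiles, yielding a discrete bucket id per value.
quantile_edges = np.quantile(ages, [0.25, 0.5, 0.75])
age_bucket = np.digitize(ages, quantile_edges)      # bucket ids in {0, 1, 2, 3}

# Nonlinear transformation: keep the original feature and add, e.g., its log.
age_log = np.log1p(ages)

# --- Categorical features ------------------------------------------------
categories = ["sports", "news", "music", "movies"]
cat_to_idx = {c: i for i, c in enumerate(categories)}

def one_hot(cat: str) -> np.ndarray:
    """One-hot: a single category active in the feature field."""
    vec = np.zeros(len(categories))
    vec[cat_to_idx[cat]] = 1.0
    return vec

def multi_hot(cats: list) -> np.ndarray:
    """Multi-hot: several categories active in the same feature field."""
    vec = np.zeros(len(categories))
    for c in cats:
        vec[cat_to_idx[c]] = 1.0
    return vec

# Embedding: map each category id to a dense, low-dimensional vector.
# In a real model this table is learned; here it is randomly initialized.
emb_dim = 4
embedding_table = np.random.randn(len(categories), emb_dim)
item_embedding = embedding_table[cat_to_idx["music"]]

# Final feature vector: concatenate processed continuous features with the
# dense embedding instead of the sparse one-hot/multi-hot vectors.
feature_vector = np.concatenate([[age_norm[0], age_log[0]], item_embedding])
print(feature_vector.shape)   # (6,)
```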

4. Feature Engineering and Business Understanding

Today, as recommendation models and feature engineering tend to merge, feature engineering itself has become part of the deep learning model.

Only by deeply understanding how the business operates, and the thinking patterns and behavioral motivations of users in the business scenario, can we accurately extract the most valuable features and build a successful deep learning model.

Reference: Deep Learning Recommendation System

Origin blog.csdn.net/weixin_44127327/article/details/112765408