Statistical features of feature processing

The following features have been used in previous Kaggle/Tianchi competitions and in the Tmall/JD.com ranking and recommendation business lines:

1. Difference from the average: for example, by how much the user's number of consecutive login days exceeds the average (indicating the user's stickiness to the product).
2. Quantile line: the quantile of the price distribution at which a commodity's price falls (for example, the 20% line: 20% of purchases are at or below this price).
3. Order type: the position at which an item ranks.
4. Proportional type: for example, the ratio of good/medium/bad reviews of a product on an e-commerce platform.

### Feature processing example:

##### Data description:

The data mainly includes two parts. The first part is the mobile behavior data (D) of 10 million users on the full set of products, including the following fields:

An example record is:
141278390, 282725298, 1, 95jnuqm, 5027, 2014-11-18 08
Among these fields, behavior_type and time carry the most information, while user_geohash is basically unusable because it has too many missing values.

The second part is a subset of goods (P), which contains the following fields:

An example record is:
117151719,96ulbnj,7350
The training data contains the mobile behavior data (D) of a certain number of sampled users over one month (11.18~12.18). The scoring data is these users' purchase data for the subset of items (P) on the day after that month (12.19). Competitors use the training data to build a recommendation model and output predictions of the users' purchase behavior on the subset of items for the next day.
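As a rough sketch of how such records might be loaded, the snippet below parses one line of D and separates a history window from a label day. The field order and the names user_id, item_id and item_category are assumptions (the text only names behavior_type, time and user_geohash), and holding out one day of the training month as a local label day is a common validation choice rather than something the article prescribes.

```python
from datetime import date, datetime
from typing import NamedTuple


class Behavior(NamedTuple):
    # Assumed field order; only behavior_type, user_geohash and time
    # are named in the text, the other names are hypothetical.
    user_id: str
    item_id: str
    behavior_type: int
    user_geohash: str
    item_category: str
    time: datetime


def parse_behavior(line: str) -> Behavior:
    """Parse one comma-separated record of D into a typed tuple."""
    user_id, item_id, behavior_type, geohash, category, time = (
        part.strip() for part in line.split(",")
    )
    return Behavior(
        user_id, item_id, int(behavior_type), geohash, category,
        datetime.strptime(time, "%Y-%m-%d %H"),  # date plus hour of day
    )


def split_by_label_day(rows, label_day):
    """Rows before label_day feed the features; rows on label_day give the labels."""
    history = [r for r in rows if r.time.date() < label_day]
    target = [r for r in rows if r.time.date() == label_day]
    return history, target


row = parse_behavior("141278390, 282725298, 1,95jnuqm, 5027, 2014-11-18 08")
history, target = split_by_label_day([row], date(2014, 12, 18))
print(row.behavior_type, row.time.hour, len(history), len(target))
```

Keeping the label day strictly after the history window avoids leaking next-day purchases into the features.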

Data processing example

(1) Items added to the shopping cart on the previous day are likely to be purchased the next day => Rule
(2) Eliminate users who have not bought anything in 30 days => Data cleaning
(3) A user who adds N items to the cart buys only one of them and will not buy the rest => Rule
(4) Purchase conversion rate of the shopping cart (some users buy whatever they add to the cart, so items they have added to the cart can be recommended to them directly) => User dimension statistical features
(5) Commodity popularity (hot items that the public likes; generally measured by sales, a continuous value) => Commodity dimension features
(6) Total number of clicks/collections/cart additions/purchases for each item (4 continuous values) => Product dimension statistical features
(7) Average number of clicks/collections/cart additions/purchases per user for each item (4 continuous values: how many times each user performed each behavior on the specific products they interacted with) => User dimension statistical features
(8) Trending brands/products (for a given item, the number of people who clicked today minus the number who clicked yesterday; if the difference grows, the item is becoming popular) => Product dimension statistical features (difference type)
(9) Ratio of the number of behaviors in the most recent 1/2/3/7 days to the average number of behaviors (some users click a lot but rarely buy; others click very little but basically buy whatever they click) => User dimension statistical features (proportional type)
(10) Rank of a product within its category (for example, the rank of the iPhone 8 among mobile phones by popularity: clicks, purchases, shelf time) => Product dimension statistical features (order type)
(11) Purchase conversion rate of a product (the ratio of purchases to impressions; some products are displayed many times but nobody buys them, while others are bought by many of the people who see them) => Product dimension statistical features (proportional type)
Open a time window for these statistics; it may be the previous week, up to a month, or a quarter.
(12) Time from the most recent interaction to the present => Time type
(13) Total number of days with interaction (how strongly the user is tied to the app) => Time type
(14) Time of the user's latest interaction on the previous day (reveals the user's habits, such as a user who likes shopping in the middle of the night) => Time type
(15) Time at which the user purchases products (the average, earliest, and latest time of the user's purchase behavior) => Time type
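Several of the features above reduce to simple counting over the behavior log. The following is a minimal sketch of items (4), (6), (8), (10) and (12), assuming a toy log of (user_id, item_id, category, behavior, day) tuples; the log contents, the behavior labels and all function names are hypothetical and chosen only for illustration.

```python
from collections import Counter, defaultdict
from datetime import date

# Toy log of (user_id, item_id, category, behavior, day); all values are made up.
LOG = [
    ("u1", "i1", "phone", "click", date(2014, 12, 16)),
    ("u1", "i1", "phone", "cart", date(2014, 12, 17)),
    ("u1", "i1", "phone", "buy", date(2014, 12, 18)),
    ("u1", "i2", "phone", "cart", date(2014, 12, 17)),
    ("u2", "i1", "phone", "click", date(2014, 12, 17)),
    ("u2", "i2", "phone", "click", date(2014, 12, 18)),
    ("u3", "i2", "phone", "click", date(2014, 12, 18)),
    ("u3", "i3", "phone", "click", date(2014, 12, 18)),
]


def cart_conversion(log):
    """(4) Per user: number of purchases divided by number of cart additions."""
    carts, buys = Counter(), Counter()
    for user, _, _, behavior, _ in log:
        if behavior == "cart":
            carts[user] += 1
        elif behavior == "buy":
            buys[user] += 1
    return {user: buys[user] / carts[user] for user in carts}


def item_counts(log):
    """(6) Per item: total count of each behavior type."""
    counts = defaultdict(Counter)
    for _, item, _, behavior, _ in log:
        counts[item][behavior] += 1
    return counts


def click_user_delta(log, today, yesterday):
    """(8) Per item: distinct clicking users today minus yesterday."""
    users = defaultdict(set)
    for user, item, _, behavior, day in log:
        if behavior == "click":
            users[(item, day)].add(user)
    items = {item for item, _ in users}
    return {
        item: len(users.get((item, today), ())) - len(users.get((item, yesterday), ()))
        for item in items
    }


def rank_in_category(log, behavior="click"):
    """(10) Rank of each item within its category by behavior count (1 = most)."""
    counts = defaultdict(Counter)
    for _, item, category, b, _ in log:
        if b == behavior:
            counts[category][item] += 1
    return {
        item: rank
        for per_item in counts.values()
        for rank, (item, _) in enumerate(per_item.most_common(), start=1)
    }


def days_since_last(log, now):
    """(12) Per user: days from the most recent interaction to `now`."""
    last = {}
    for user, _, _, _, day in log:
        last[user] = max(last.get(user, day), day)
    return {user: (now - day).days for user, day in last.items()}


print(cart_conversion(LOG))
print(click_user_delta(LOG, date(2014, 12, 18), date(2014, 12, 17)))
print(rank_in_category(LOG))
print(days_since_last(LOG, date(2014, 12, 19)))
```

In a real pipeline each function would be applied to the rows inside the chosen time window, so the same code yields 1/2/3/7-day variants by filtering the log first.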

Model Feature Combinations

  1. Splicing: simple combination features. For example, to mine a user's preference for a certain type, splice the user and the type together. The learned weight can be positive or negative, representing the user's like or dislike of that type.
      - user_id&&category: 10001&&Women's Skirt, 10002&&Men's Denim
      - user_id&&style: 10001&&Lace, 10002&&Cotton
  2. Model feature combination:
      - Use GBDT to generate feature combination paths
      - Put the combined features and the original features into LR for training
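Both kinds of combination can be illustrated in a few lines. The sketch below first splices user_id with category and style into one-hot crossed features, then shows how the leaf reached in each tree of a GBDT can be one-hot encoded and appended to the original features as LR input. The two trees are hand-written stand-ins for a trained GBDT, and their features, thresholds and all helper names are hypothetical; training the GBDT and the LR model is left out.

```python
def splice(sample, left, right):
    """Build one crossed feature name, e.g. user_id&&category=10001&&Women's Skirt."""
    return f"{left}&&{right}={sample[left]}&&{sample[right]}"


def one_hot(names, vocabulary):
    """Encode the active feature names as a 0/1 vector over a fixed vocabulary."""
    active = set(names)
    return [1 if name in active else 0 for name in vocabulary]


samples = [
    {"user_id": "10001", "category": "Women's Skirt", "style": "Lace"},
    {"user_id": "10002", "category": "Men's Denim", "style": "Cotton"},
]
crossed = [
    [splice(s, "user_id", "category"), splice(s, "user_id", "style")]
    for s in samples
]
vocabulary = sorted({name for names in crossed for name in names})
vectors = [one_hot(names, vocabulary) for names in crossed]

# Two hand-written trees standing in for a trained GBDT (hypothetical splits).
# A node is ("leaf", leaf_index) or (feature, threshold, left_subtree, right_subtree).
TREES = [
    ("clicks", 5, ("leaf", 0), ("carts", 1, ("leaf", 1), ("leaf", 2))),
    ("days_since_last", 3, ("leaf", 0), ("leaf", 1)),
]
LEAVES_PER_TREE = [3, 2]


def leaf_index(tree, x):
    """Follow the path x takes through one tree and return the leaf it lands in."""
    while tree[0] != "leaf":
        feature, threshold, left, right = tree
        tree = left if x[feature] < threshold else right
    return tree[1]


def leaf_features(x):
    """One-hot encode the leaf reached in every tree; each leaf is one combination path."""
    encoded = []
    for tree, n_leaves in zip(TREES, LEAVES_PER_TREE):
        hit = leaf_index(tree, x)
        encoded += [1 if i == hit else 0 for i in range(n_leaves)]
    return encoded


x = {"clicks": 8, "carts": 2, "days_since_last": 1}
lr_input = list(x.values()) + leaf_features(x)  # original + combined features
print(vectors)
print(lr_input)
```

Each leaf corresponds to a conjunction of split conditions along its path, which is why the leaf indicators act as learned feature combinations for the linear model.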

Reprinted from: https://blog.csdn.net/fisherming/article/details/79925574
