An Analysis of a Commodity Recommendation System


1. Overview

This article analyzes recommendation systems: it introduces what a recommendation system is, describes its basic framework, and briefly covers the methods and architecture involved in designing one. It is intended both for readers who are simply curious about recommendation systems and for those with some relevant background. My knowledge is limited, and corrections are welcome.

2. Commodity recommendation system

2.1 Definition of recommender system

The recommendation system essentially solves the problem of information overload: it helps users find items they are interested in and taps into their latent interests.

2.2 Recommendation Architecture

At its core, the pipeline of a recommendation system consists of just three stages: recall, ranking, and re-ranking.


Request Process

When a user opens a page, the front end sends the user's identity (pin or uuid, etc.) to the back-end interface (called indirectly through Color). On receiving the request, the back end first buckets the user and fetches the relevant strategy configuration by user ID (the A/B strategy); this configuration determines which interfaces of the recall, ranking, and re-ranking modules are called next. The recall module is typically split into multiple channels, each responsible for recalling a batch of products, while ranking and re-ranking are responsible for ordering those products. Finally, the system selects the appropriate products, supplements them with information such as price and images, and displays them to the user. The user then clicks, or not, depending on whether they are interested. These behaviors are reported to the data platform through logs, laying the groundwork for later effect analysis and for recommending products based on user behavior.


Beyond the request flow, there are a few questions worth discussing:

Why adopt the funnel hierarchy of recall, ranking, and re-ranking?


(1) In terms of performance

Scale of the funnel: from a commodity pool of millions, the system ultimately picks out the handful of items the user is interested in.

Online inference with a complex ranking model is time-consuming, so the number of products entering it must be strictly controlled; the selection process therefore has to be broken into stages.

(2) In terms of objectives

Recall module: quickly screens candidate items out of the massive item pool, with the goal of not missing anything the user might like. It usually runs multiple recall channels built on simplified features or lightweight models.

Ranking module: ranks the candidates produced by recall as accurately as possible, using the user's historical behavior, interests, preferences, and other information. It usually relies on complex models.

Re-ranking module: re-orders or adjusts the ranking module's results to further improve accuracy and personalization. It usually uses simple but effective algorithms.

What is an A/B experiment?

Reference: Overlapping Experiment Infrastructure: More, Better, Faster Experimentation (Google, 2010)

Only online experiments can truly evaluate whether one model is better than another. A/B experiments make it possible to verify an idea's effect quickly, iterate on models rapidly, and reduce the risk of launching new features.

A/B bucketing algorithm: Hash(uuid + experiment id + creation timestamp) % 100

Key properties: traffic diversion + orthogonality across experiment layers
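
As a concrete illustration, here is a minimal Python sketch of the bucketing rule above. The MD5 hash, the 50/50 split, and the experiment name are illustrative assumptions, not a description of any particular production system:

import hashlib

def ab_bucket(uuid, experiment_id, created_ts, num_buckets=100):
    # Hash(uuid + experiment id + creation timestamp) % 100, as described above.
    # MD5 is an illustrative choice; any well-mixed hash works.
    key = "{}:{}:{}".format(uuid, experiment_id, created_ts).encode("utf-8")
    return int(hashlib.md5(key).hexdigest(), 16) % num_buckets

# Illustrative split: buckets 0-49 -> control, 50-99 -> treatment
bucket = ab_bucket("uuid-123", "exp-homepage-rerank", "1690000000")
group = "treatment" if bucket >= 50 else "control"

Because the experiment id is part of the hash key, bucket assignments in different experiment layers are independent of one another, which is exactly the orthogonality property that lets the same traffic be reused across layers.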


2.3 Recall

The recall layer exists to do an initial screening, pulling a batch of promising products out of the vast product pool. To balance the tension between computation speed and recall rate (the fraction of all positive samples that get retrieved), a multi-way recall strategy is used, in which each channel considers only a single feature or strategy.

2.3.1 Advantages and disadvantages of multi-way recall

Multi-way recall: different strategies, features, or simple models each recall part of the candidate set, and the candidates are then merged and sent to ranking. Recall rate is high, speed is fast, and the channels complement one another.

In multi-way recall, the truncation count K of each channel is a hyperparameter that must be tuned by hand, at considerable cost; the channels also overlap, introducing redundancy (see the merge sketch below).

Is there a single recall method that could replace multi-way recall? Vector recall emerged to answer that question. For now, though, the usual setup is still vector recall as the backbone with other channels as supplements.
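
To make the truncation and redundancy points concrete, here is a minimal sketch of merging several recall channels, each cut off at its own K. Channel names, K values, and item IDs are made up for illustration:

def merge_recalls(channels, k_per_channel, default_k=100):
    # Truncate each channel to its K, then deduplicate across channels,
    # keeping the order in which channels are listed.
    seen, merged = set(), []
    for name, items in channels.items():
        for item_id in items[:k_per_channel.get(name, default_k)]:
            if item_id not in seen:
                seen.add(item_id)
                merged.append(item_id)
    return merged

candidates = merge_recalls(
    {"hot": ["a", "b", "c"], "new": ["c", "d"], "vector": ["b", "e"]},
    {"hot": 2, "new": 2, "vector": 2},
)
# -> ['a', 'b', 'c', 'd', 'e']; 'c' and 'b' appear in two channels (redundancy)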

2.3.2 Recall Classification

Recall is mainly divided into two categories: non-personalized and personalized. Non-personalized recall mainly pushes popular items; the Matthew effect is pronounced in recommendation, with 20% of products contributing 80% of clicks. Personalized recall focuses on discovering products each user is interested in, handling the differences between users, increasing product diversity, and maintaining user stickiness.

Non-personalized recall

(1) Popular recall: recall products with high clicks, likes, and sales over the past 7 days

(2) New-product recall: recall the most recently listed products

Personalized recall

(1) Label recall, regional recall

  • Label recall: recall by the categories, brands, and stores the user is interested in

  • Regional recall: recall high-quality products from the user's region.

(2) CF recall

Collaborative filtering mines users' behavioral preferences from behavior data and recommends items accordingly; it is built on the user-item behavior (co-occurrence) matrix. User behavior typically includes browsing, likes, add-to-cart, clicks, follows, shares, and so on.

Collaborative filtering falls into three categories: user-based (UCF), item-based (ICF), and model-based (the latent factor model). To decide whether to recommend an item to a user, the user and the item must first be connected; whether that connection runs through another item or another user determines which type of collaborative filtering it is. The latent factor model automatically clusters user behavior data to mine latent interest features, so that users and items are connected through those latent features.

Item-based collaborative filtering (ICF): to decide whether to recommend an item to a user, first infer the user's interest in it from the similarity between that item and the items in the user's behavioral history. The full process breaks into these steps: compute item-item similarity, compute the user's interest in each candidate item, then sort and truncate the results.


Commodity similarity calculation:

Similarity is mainly measured with cosine similarity or the Jaccard index. Because users and items can be represented in many ways, these computations are flexible: similarity can be computed directly from the user-item behavior matrix, or vector representations of users and items can be built from behavior, item attributes, and context, and similarity computed between those vectors.

Cosine similarity formula: cos(A, B) = (A · B) / (|A| × |B|)

Jaccard formula: J(A, B) = |A ∩ B| / |A ∪ B|


An example user-item behavior matrix (1 = the user interacted with the commodity):

          Commodity a   Commodity b   Commodity c   Commodity d
User A         1             0             0             1
User B         0             1             1             0
User C         1             0             1             1
User D         1             1             0             0

Using the cosine formula on the column vectors of commodities a and b:

Wab = (1×0 + 0×1 + 1×0 + 1×1) / (√(1²+0²+1²+1²) × √(0²+1²+0²+1²)) = 1 / (√3 × √2) = 1/√6 ≈ 0.41
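
The same computation in a few lines of numpy, using the behavior matrix above; both the cosine and Jaccard similarities between commodities a and b are verified:

import numpy as np

# User-item behavior matrix from the table above (rows: users A-D, columns: commodities a-d)
M = np.array([[1, 0, 0, 1],
              [0, 1, 1, 0],
              [1, 0, 1, 1],
              [1, 1, 0, 0]])

def cosine_sim(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

def jaccard(u, v):
    return np.logical_and(u, v).sum() / np.logical_or(u, v).sum()

a, b = M[:, 0], M[:, 1]        # column vectors of commodities a and b
print(cosine_sim(a, b))        # 0.4082... = 1/sqrt(6)
print(jaccard(a, b))           # 0.25: one shared user out of four involved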

A Spark implementation of ICF: https://zhuanlan.zhihu.com/p/413159725

Problems: the cold-start problem and the long-tail effect.

(3) Vector recall

Vector recall: by learning low-dimensional vector representations of users and items, recall becomes a nearest-neighbor search problem in vector space. This markedly improves the generalization and diversity of recall, and it is the core recall channel of a recommendation engine.

Vectors: everything can be vectorized. An embedding represents an object (a word, a commodity) with a low-dimensional dense vector; its main function is to convert a sparse vector into a dense one (a dimensionality-reduction effect). The representation carries meaning: it expresses some of the object's characteristics, and the distance between vectors reflects the similarity between the objects.

Vector recall has two steps: offline training to generate the vectors, then online vector retrieval.

1. Offline training to generate vectors

word2vec: the progenitor of word vectors. It is a three-layer neural network: input layer, hidden layer, output layer. The hidden layer has no activation function, and the output layer uses softmax to compute probabilities.

Objective function (skip-gram): maximize the average log probability

(1/T) Σ_{t=1..T} Σ_{−c ≤ j ≤ c, j ≠ 0} log p(w_{t+j} | w_t)

Network structure: input layer → projection (hidden) layer → softmax output layer.

In general: the input is a sequence of words, and after training each word has a corresponding vector. Applied to recommendation, the input is the user's click sequence, and training yields a vector for each product.

Pros and cons: simple and efficient, but it only considers the behavior sequence and ignores every other feature.
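
As a sketch of this idea, the snippet below trains word2vec on user click sequences with gensim, treating each sequence as a "sentence" of product IDs. The data and hyperparameters are illustrative:

from gensim.models import Word2Vec

# Each "sentence" is one user's click sequence of product IDs (made-up data)
click_sequences = [
    ["sku_1", "sku_5", "sku_3"],
    ["sku_2", "sku_5", "sku_1"],
    ["sku_3", "sku_1", "sku_4"],
]

# sg=1 selects the skip-gram variant; all parameters are illustrative
model = Word2Vec(sentences=click_sequences, vector_size=32, window=2,
                 min_count=1, sg=1, epochs=50, seed=42)

vec = model.wv["sku_1"]                        # the learned product vector
print(model.wv.most_similar("sku_1", topn=3))  # nearest products in vector space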

Two-Tower Model:

Network structure: a User Tower and an Item Tower. The User Tower takes user-side features as input, such as user ID, gender, age, third-level categories of interest, click sequence, and address; the Item Tower takes item-side features, such as product ID, category ID, price, and order volume over the last three days. Training data: (positive sample, 1) and (negative sample, 0), where positives are clicked products and negatives are products sampled at random from the whole pool (or products clicked by other users in the same batch).


Pros and cons: efficient and a natural fit for recall: an online request computes the user vector and retrieves nearby item vectors, with strong generalization. The limitation is that the user tower and item tower are separate and only interact at the very end.
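
A minimal PyTorch sketch of the two-tower structure follows. Feature choices, layer sizes, and the sampling scheme are illustrative simplifications; real towers would consume many more features per side:

import torch
import torch.nn as nn

class Tower(nn.Module):
    # A small MLP mapping side features (here just an ID embedding) to a vector
    def __init__(self, in_dim, out_dim=32):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU(),
                                 nn.Linear(64, out_dim))
    def forward(self, x):
        return self.net(x)

class TwoTower(nn.Module):
    def __init__(self, n_users, n_items, emb_dim=16):
        super().__init__()
        self.user_emb = nn.Embedding(n_users, emb_dim)
        self.item_emb = nn.Embedding(n_items, emb_dim)
        self.user_tower = Tower(emb_dim)
        self.item_tower = Tower(emb_dim)
    def forward(self, user_ids, item_ids):
        u = self.user_tower(self.user_emb(user_ids))  # user vector
        v = self.item_tower(self.item_emb(item_ids))  # item vector
        return (u * v).sum(-1)                        # towers interact only here

model = TwoTower(n_users=1000, n_items=5000)
users, items = torch.tensor([0, 1]), torch.tensor([10, 20])
labels = torch.tensor([1.0, 0.0])  # clicked positive vs. random negative
loss = nn.functional.binary_cross_entropy_with_logits(model(users, items), labels)
loss.backward()

Online, only the user tower runs per request; all item vectors are precomputed and indexed in an ANN service, which is what makes this structure such a good fit for recall.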

2. Online Vector Retrieval

Vector retrieval: an information-retrieval method based on the Vector Space Model, used to quickly find the vectors most similar to a query vector in a large collection. It is widely used in information retrieval, recommendation systems, and text classification.

Vector retrieval computes similarities between vectors and returns the top-K most similar ones. Similarity can be computed in several ways, including Euclidean distance, inner product, and cosine distance; after L2 normalization, the inner product is equivalent to cosine similarity.

The essence of vector retrieval is approximate nearest-neighbor search (ANNS): shrink the search range around the query vector as much as possible to improve query speed.
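
A quick numeric check of the normalization claim above, with random vectors:

import numpy as np

rng = np.random.default_rng(0)
q, x = rng.normal(size=8), rng.normal(size=8)

cosine = q @ x / (np.linalg.norm(q) * np.linalg.norm(x))
qn, xn = q / np.linalg.norm(q), x / np.linalg.norm(x)
inner = qn @ xn  # inner product of the L2-normalized vectors

assert np.isclose(cosine, inner)  # identical after normalization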

The vector retrieval algorithms currently used on a large scale in the industry can basically be divided into the following three categories:

  • Locality Sensitive Hashing (LSH)

  • Graph-based (HNSW)

  • Quantization-based (product quantization, PQ)

A brief introduction to LSH

The core idea of LSH: after two nearby points in the original data space pass through the same mapping or projection, the probability that they remain nearby in the new space is high, while the probability that two unrelated points are mapped into the same bucket is very small.

Compared with brute-force search over every point in the dataset, hashing first locates the bucket the query sample falls into. If the space has been partitioned according to the similarity measure we care about, the query's nearest neighbors will most likely land in the same bucket, so we only need to scan and compare within that bucket rather than over the entire dataset. When the number of hash functions H is too large, the chance that a query and its true nearest neighbor fall into the same bucket becomes very small; to fix this, the process is repeated L times with different hash functions, which raises the recall rate of the nearest neighbors.
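
Below is a minimal random-hyperplane LSH sketch for cosine similarity, illustrating the trade-off between H (hash functions per table) and L (tables) described above. All sizes are illustrative:

import numpy as np
from collections import defaultdict

class HyperplaneLSH:
    # H random hyperplanes per table give an H-bit bucket signature;
    # L independent tables raise the chance true neighbors share a bucket.
    def __init__(self, dim, n_planes=8, n_tables=4, seed=0):
        rng = np.random.default_rng(seed)
        self.planes = [rng.normal(size=(n_planes, dim)) for _ in range(n_tables)]
        self.tables = [defaultdict(list) for _ in range(n_tables)]

    def _key(self, planes, v):
        return tuple((planes @ v > 0).astype(int))  # side of each hyperplane

    def add(self, item_id, v):
        for planes, table in zip(self.planes, self.tables):
            table[self._key(planes, v)].append((item_id, v))

    def query(self, q, topk=5):
        # Gather candidates from the query's bucket in every table,
        # then rank only those candidates by exact cosine similarity.
        cands = {}
        for planes, table in zip(self.planes, self.tables):
            for item_id, v in table.get(self._key(planes, q), []):
                cands[item_id] = v
        scored = [(i, q @ v / (np.linalg.norm(q) * np.linalg.norm(v)))
                  for i, v in cands.items()]
        return sorted(scored, key=lambda t: -t[1])[:topk]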

Case: vector recall based on word2vec, combining the two steps above: train product vectors offline from click sequences, then retrieve nearest-neighbor products online.

2.4 Ranking

The crown jewel of the recommender system.

The ranking stage splits into pre-ranking (rough ranking) and fine ranking. Pre-ranking is generally used when the number of recalled candidates is still large.

Evolution: ranking models have evolved from traditional machine-learning models to deep-learning models.

A brief introduction to Wide&Deep

Background: manually engineered feature crosses memorize well, but the feature engineering is extremely labor-intensive, and combinations that never appeared in the data cannot be memorized, so the model cannot generalize to them.

Purpose: to give the model both memorization and generalization ability (effective use of historical information together with strong expressive power).

(1) Memorization is the model's ability to learn and exploit the co-occurrence frequency of items or features in the historical data, memorizing its distribution. A simple model can readily find the features or feature crosses that most influence the outcome and adjust their weights, achieving memorization of strong features.

(2) Generalization is the model's ability to transfer feature correlations and to uncover correlations between sparse or rare features and the final label. Even a very sparse feature vector yields a stable, smooth recommendation probability. Examples of techniques that improve generalization: matrix factorization, neural networks.


To take both memorization and generalization into account (accuracy on historical patterns plus scalability to unseen ones), the wide part focuses on memorization, quickly processing large numbers of historical behavior features, while the deep part focuses on generalization, exploring new territory: it transfers feature correlations and mines the relationship between sparse, even rare, features and the final label, with strong expressive power. The two parts are then combined into a single unified model.

The wide part is a basic linear model, y = W^T X + b, where the feature vector X contains both raw features and cross features. Cross features are essential on the wide side: they capture interactions between features and inject non-linearity.

The deep part is an embedding layer followed by a three-layer ReLU network. The feed-forward formula per layer is:

a^(l+1) = f(W^(l) a^(l) + b^(l))

where f is the activation function (ReLU) and a^(l), b^(l), W^(l) are the activations, bias, and weights at layer l.

Joint training: the wide and deep parts are trained together by summing their output logits and feeding the sum through a sigmoid:

P(Y = 1 | x) = σ( W_wide^T [x, φ(x)] + W_deep^T a^(l_f) + b )

where σ is the sigmoid function, φ(x) denotes the cross-product feature transformations, and a^(l_f) is the final activation of the deep part.
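
A compact PyTorch sketch of the structure and joint training described above; the feature sizes and the single ID feature on the deep side are illustrative simplifications:

import torch
import torch.nn as nn

class WideDeep(nn.Module):
    def __init__(self, n_wide_feats, n_ids, emb_dim=8):
        super().__init__()
        self.wide = nn.Linear(n_wide_feats, 1)  # y = W^T X + b over raw + cross features
        self.emb = nn.Embedding(n_ids, emb_dim)
        self.deep = nn.Sequential(              # embedding + three-layer ReLU MLP
            nn.Linear(emb_dim, 64), nn.ReLU(),
            nn.Linear(64, 32), nn.ReLU(),
            nn.Linear(32, 1),
        )
    def forward(self, wide_x, id_x):
        logit = self.wide(wide_x) + self.deep(self.emb(id_x))  # sum of both logits
        return torch.sigmoid(logit).squeeze(-1)                # P(Y=1|x)

model = WideDeep(n_wide_feats=10, n_ids=100)
wide_x = torch.randn(4, 10)            # raw features plus manual cross features
id_x = torch.randint(0, 100, (4,))     # a sparse ID feature for the deep side
p = model(wide_x, id_x)                # one prediction, both parts trained jointly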

Pros and cons: it laid an important foundation for recommendation, advertising, and search ranking, marking the milestone leap from traditional algorithms to deep learning. It balances memorization and generalization, but the wide side still requires manually crossed features.

Reference paper: Wide & Deep Learning for Recommender Systems

2.5 Re-ranking

Definition: fine-tune the ordering of the fine-ranking results, on one hand to approach a global optimum, on the other to satisfy business demands and improve user experience. Examples: diversity scattering, forced insertion, exposure boosting, and sensitive-content filtering.

MMR algorithm

Achieving product diversity

Purpose: to keep recommendations accurate while also making them diverse, balancing the relevance and the diversity of the results.

Algorithm principle (the MMR formula):

MMR = argmax_{Di ∈ R\S} [ λ · Sim1(Di, Q) − (1 − λ) · max_{Dj ∈ S} Sim2(Di, Dj) ]

D: the product set; Q: the user; S: the set of already selected products; R\S: the products in candidate set R not yet selected; λ: the relevance-diversity trade-off.

 
  
def MMR(itemScoreDict, similarityMatrix, lambdaConstant=0.5, topN=20):
    # s: selected (re-ranked) list; r: remaining candidates
    s, r = [], list(itemScoreDict.keys())
    while len(r) > 0:
        score = float("-inf")
        selectOne = None
        # scan all remaining candidates
        for i in r:
            firstPart = itemScoreDict[i]
            # max similarity between candidate i and the already-selected set
            secondPart = 0
            for j in s:
                sim2 = similarityMatrix[i][j]
                if sim2 > secondPart:
                    secondPart = sim2
            equationScore = lambdaConstant * firstPart - (1 - lambdaConstant) * secondPart
            if equationScore > score:
                score = equationScore
                selectOne = i
        # move the winner from the candidate list r to the selected list s
        r.remove(selectOne)
        s.append(selectOne)
    return s[:topN]

The idea: each round, select the item that is most relevant to the user and least similar to the items already selected. Time complexity is O(n²); it can be reduced by limiting the number of items selected.

Engineering implementation: the algorithm needs the user-item relevance and the item-item similarity as inputs. User-item relevance can simply reuse the ranking model's scores; item-item similarity can come from algorithms such as collaborative filtering, from the cosine distance between item vectors, or from something as simple as whether two items share the same third-level category or store.
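
A small usage sketch of the MMR function above, with made-up scores and similarities; relevance comes from the ranking model and similarity from item vectors or shared categories, as just described:

# Relevance scores, e.g. taken from the ranking model (made-up values)
itemScores = {"sku_1": 0.9, "sku_2": 0.85, "sku_3": 0.5}
# Item-item similarity, e.g. cosine similarity of item vectors (made-up values)
sims = {
    "sku_1": {"sku_2": 0.95, "sku_3": 0.1},
    "sku_2": {"sku_1": 0.95, "sku_3": 0.2},
    "sku_3": {"sku_1": 0.1, "sku_2": 0.2},
}
print(MMR(itemScores, sims, lambdaConstant=0.5, topN=3))
# -> ['sku_1', 'sku_3', 'sku_2']: sku_3 jumps ahead of sku_2
#    because sku_2 is nearly a duplicate of the already-selected sku_1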

3. Summary

That's all for this brief discussion. The goal was to give everyone a feel for recommendation systems by walking through the overall architecture and its modules. Given my limited knowledge, I did not cover every module in depth; I hope to keep studying this field in my work, dig deeper into the details, and produce better material for everyone.

-end-


Origin blog.csdn.net/jdcdev_/article/details/132440006