[Repost] An in-depth look at the evolution of JD.com's personalized recommendation system


This article is excerpted from "Decisive Battle 618: Exploring the Way to Win JD's Technology". The two authors were the heads and system architects of JD's recommendation system at the time.

In the field of e-commerce, the value of recommendation lies in mining users' potential purchase needs, shortening the distance between users and products, and improving users' shopping experience.

The evolution of JD.com's recommendation system has been eventful. Recommendation at JD.com started in 2012, when the recommendation products were still based on rule matching. The recommendation product line resembled a set of loose primitive tribes: the individual products shared neither engineering nor algorithms. In 2013 the big data era arrived in China. On the one hand, work unrelated to big data seemed behind the times; on the other hand, JD.com's business began to grow rapidly that year and the traditional approach could no longer keep up, so the recommendation team designed a new recommendation system.

With the rapid growth of the business and the arrival of the mobile Internet, multiple screens (JD.com App, JD.com PC mall, M site, WeChat, Mobile QQ, etc.) became interconnected, and recommendation expanded from traditional product recommendation to other types, such as activities, categories, coupons, floors, entrance images, articles, lists, and good products. Demand for personalized recommendation is strong: based on big data and personalized recommendation algorithms, different content is shown to different users.

To this end, the team upgraded the recommendation system again at the end of 2015. During the 618 promotion in 2016, personalized recommendation performed brilliantly, especially the "smart store" built by the team, which personalized the distribution of event venues. It not only brought a significant increase in GMV but also greatly reduced labor costs and greatly improved traffic efficiency and user experience, a win-win for merchants and users. The product won the group's Excellent Product award for 2016. To better support recommendation in various personalized scenarios, the recommendation system keeps being iterated, optimized, and upgraded, and will move toward "intelligent recommendation on all screens".

Recommendation Products

Across the whole journey from forming a purchase intention, to making a purchase decision, to placing the final order, recommendation products can help users decide to some extent at any node on the shopping path.

Development of recommendation products

Recommendation products have gone through several stages (Figure 1): from simple association recommendation, to personalized recommendation, and gradually to scenario-based intelligent recommendation. In other words, they moved from recommending related and similar products to comprehensive intelligent recommendation that uses multiple features, multiple dimensions, real-time user behavior, and the user's scenario.


Figure 1 Development of recommendation products

Multi-screen multi-type product form

Multi-type means that recommendation covers many types of content, such as products, activities, categories, coupons, floors, entrance images, articles, lists, and good products. In the mobile Internet era, multi-screen scenarios are very common, and integrating a user's information across screens makes personalized recommendation more accurate. Technically, multi-screen integration relies on front-end event tracking: user behavior triggers tracking events, and the clickstream system collects behavior information from every screen. A real-time stream computing platform uses this behavior data to compute the user's interests and preferences, and the recommendation results are reordered accordingly to achieve personalized recommendation. JD's multi-screen terminals are shown in Figure 2.


Figure 2 JD multi-screen terminal
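
The idea of merging behavior from several screens and reordering results by the resulting preferences can be sketched roughly as follows. This is a minimal illustration, not JD's implementation: the event fields (`user`, `screen`, `category`), the sample data, and the functions `category_preferences` and `rerank` are all assumptions made for this example, and a plain share of events stands in for the real interest computation.

```python
from collections import Counter

# Hypothetical behavior events collected from several screens for one user.
events = [
    {"user": "u1", "screen": "app", "category": "phones"},
    {"user": "u1", "screen": "pc", "category": "phones"},
    {"user": "u1", "screen": "wechat", "category": "books"},
]


def category_preferences(events, user):
    """Merge events from all screens and return each category's share."""
    counts = Counter(e["category"] for e in events if e["user"] == user)
    total = sum(counts.values())
    return {c: n / total for c, n in counts.items()} if total else {}


def rerank(candidates, prefs):
    """Stable sort: preferred categories first, original order breaks ties."""
    return sorted(candidates, key=lambda c: -prefs.get(c["category"], 0.0))


candidates = [
    {"sku": "s1", "category": "toys"},
    {"sku": "s2", "category": "books"},
    {"sku": "s3", "category": "phones"},
]
prefs = category_preferences(events, "u1")
ranked = rerank(candidates, prefs)
```

Because the events are keyed by user rather than by screen, behavior on the PC site and in the app contributes to the same preference profile.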

 

 

Recommendation System Architecture

Overall business structure

The goal of the recommendation system is to depict the user's purchase intention with comprehensive and accurate data, recommend the products the user is willing to buy, give the user the best experience, increase the order conversion rate, and strengthen user stickiness. The business architecture of the recommendation system is shown in Figure 3.


Figure 3 The business architecture of the recommendation system

  • System architecture. Provides a unified HTTP recommendation service externally and serves the recommendation business of all JD.com terminals.

  • Model services. A set of shared personalization services developed to improve the effect of personalization. The user dimension includes user behavior services and user portrait services, the product dimension includes product portraits, the regional dimension includes community portraits, and the feature dimension includes feature services. These basic services make personalized recommendation simpler and more accurate.

  • Machine learning. In the model training phase, a variety of machine learning models are tried, and offline evaluation is combined with online A/B testing to verify each model's effect in different scenarios and improve the conversion rate of recommendations.

  • Data platform. Data is the source of recommendation and covers data collection and data computation. Although data sits at the bottom of the overall recommendation architecture, it is very important, because it directly determines whether recommendation develops healthily and keeps improving.

Personalized Recommendation Architecture

In the initial stage, recommendation products were relatively simple, and each one was implemented as an independent service. The new recommendation system is a systems engineering effort that relies on the organic combination of data, architecture, algorithms, and human-computer interaction. Its goal is to turn "a thousand people with one face" into "a thousand people with a thousand faces" through personalized data mining, machine learning, and other technologies: to improve user loyalty and user experience, improve the quality and efficiency of users' shopping decisions, improve the site's cross-selling capability, shorten the shopping path, and raise the traffic conversion rate (CVR). At present, the new recommendation system supports multiple types of personalized recommendation, including products, stores, brands, events, coupons, and floors. The architecture of the new personalized recommendation system is shown in Figure 4.


Figure 4 Architecture of the new personalized recommendation system

Different colors in the architecture diagram represent different business processing scenarios. The data processing part (the green modules at the bottom) covers offline data preprocessing, machine learning model training, online real-time behavior ingestion, and real-time feature computation. The recommendation platform (blue modules) mainly shows how the service modules of the recommendation system interact when responding to a user request. The core modules of the recommendation system are:

  • Recommendation gateway. The entry point of the recommendation service. It validates recommendation requests, dispatches them, supports online debugging, and assembles the response.

  • Scheduling engine. Schedules recommendation services and distributes traffic according to policy, mainly the experiment configuration of each recommendation product in the configuration center. It supports splitting traffic by user, randomly, or by key parameters. It also supports custom event tracking for collecting real-time data, and emergency plans that handle incidents and take effect within seconds.

  • Recommendation engine. Implements the online recommendation algorithm logic, mainly recall, filtering, feature computation, ranking, and diversification.

  • Personalized basic services. At present, the main personalized basic services are user portrait, product portrait, user behavior, and prediction services. User portraits include users' long-term, short-term, and real-time interests, covering gender, brand preference, category preference, purchasing power level, preference for self-operated products, size and color preference, promotion sensitivity, family situation, and so on. Product portraits mainly include product words, modifiers, brand words, quality scores, price levels, gender, age, and tags. The user behavior service mainly returns users' recent behavior, including searches, clicks, follows, add-to-cart actions, and orders. The prediction service uses machine learning models trained on the user's historical behavior to adjust the weights of the recall candidate set.

  • Feature service platform. Provides feature data and feature computation for personalization services. It mainly handles the declaration and management of feature data so that feature resources can be shared, and it quickly supports declaring, launching, and testing features and comparing A/B experiment results.
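
As an illustration of the "distribution by user" mentioned for the scheduling engine, the sketch below assigns users to experiment buckets with a deterministic hash. The article does not describe the actual splitting method; the hashing scheme, the function names `assign_bucket` and `choose_variant`, and the default of 100 buckets are assumptions for this example.

```python
import hashlib


def assign_bucket(user_id: str, experiment: str, num_buckets: int = 100) -> int:
    """Deterministically map a user to a bucket in [0, num_buckets)."""
    # Salting with the experiment name keeps bucket assignments of
    # different experiments independent of each other.
    digest = hashlib.md5(f"{experiment}:{user_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) % num_buckets


def choose_variant(user_id: str, experiment: str, treatment_percent: int) -> str:
    """Send roughly treatment_percent of users to the new strategy."""
    if assign_bucket(user_id, experiment) < treatment_percent:
        return "treatment"
    return "control"
```

A user always lands in the same bucket for a given experiment, so the experience stays consistent across requests, and changing `treatment_percent` in a configuration center would move traffic without redeploying code.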

Personalization technology (orange modules): personalization mainly reorders results using features and models trained by algorithms, in order to recommend precisely. The feature service platform provides a large amount of multi-dimensional feature information. Recommendation scene playback feeds the user's real-time scene feature information back into recommendation ranking. Online learning and deep learning are both services for large-scale personalized feature computation.

The main advantages of the personalized recommendation system are that it supports multiple recommendation types and multi-screen product forms; supports fast iteration of algorithm models through A/B experiments; decouples the system architecture from the algorithms, storage resources from recommendation engine computation, and prediction and recall from recommendation engine computation; supports custom event tracking; and provides a recommendation feature data service platform with support for recommendation scene playback.

Data platform

JD.com has a huge user base, a full range of product categories, and a wide variety of promotions. It accumulates data from users' behavior on the JD.com platform, such as browsing, adding to cart, following, searching, purchasing, and commenting; attribute data of the products themselves, such as brand, category, description, and price; and data on resources such as activities and materials. This data is the basis for large-scale machine learning and a prerequisite for more precise personalized recommendation.

Data collection

User behavior data is collected roughly as follows. A user's operations on the JD platform (JD App, JD PC site, WeChat, Mobile QQ) trigger requests to the clickstream system (a platform dedicated to collecting behavior data). On receiving a request, the clickstream system sends a real-time message (consumed by real-time computing) and writes a local log (used for offline model computation), and behavior logs are automatically extracted to the big data platform at regular intervals. Algorithm engineers train models with machine learning in the data mart. These models are applied in recommendation services, which assist users' decisions and further influence their shopping behavior. That shopping behavior is in turn sent to the clickstream system, closing the data collection loop.

Offline computing

At present, the offline computing platform mainly computes offline models, offline features, user portraits, product portraits, and user behavior. Offline computation mainly runs as MapReduce on Hadoop, with part of it on Spark. The results are imported into the storage repository through a shared data import tool. Considering the wide variety of businesses and the complexity of data types and storage types, the team developed a plug-in-based import tool to reduce the cost of developing and maintaining offline data. The data offline computing architecture is shown in Figure 5.


Figure 5 Data offline computing architecture

Online computing

At present, online computing mainly covers real-time user behavior, real-time user portraits, real-time user feedback, and real-time interactive feature computation. Its purpose is to capture the user's interests and scene characteristics quickly, according to business needs, so that they feed back into the user's recommendation results and ranking in real time and give each user a personalized experience. The real-time messages for online computing mainly come from subscriptions to the Kafka cluster and to JMQ. They are consumed in real time by a Storm cluster or a Spark cluster, and the results are pushed to a Redis cluster and an HBase cluster for storage. The data online computing architecture is shown in Figure 6.


Figure 6 Data online computing architecture
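
To make the real-time interest computation more concrete, here is a small in-memory sketch. It is an assumption-laden illustration: a Python dictionary stands in for the Redis or HBase store, a plain method call stands in for consuming a Kafka or JMQ message, and the exponential decay with a one-hour half-life (`HALF_LIFE_SECONDS`) is a technique chosen for this example, not one stated in the article.

```python
import math
from collections import defaultdict

HALF_LIFE_SECONDS = 3600.0  # assumed: an interest signal halves after one hour


class RealtimeInterest:
    """Keeps a time-decayed interest score per user and category."""

    def __init__(self):
        # user -> category -> (score, timestamp of last update)
        self._store = defaultdict(dict)

    @staticmethod
    def _decayed(score, last_ts, now):
        return score * math.pow(0.5, (now - last_ts) / HALF_LIFE_SECONDS)

    def update(self, user, category, weight, ts):
        """Apply one behavior event; events are assumed to arrive in time order."""
        score, last_ts = self._store[user].get(category, (0.0, ts))
        self._store[user][category] = (self._decayed(score, last_ts, ts) + weight, ts)

    def top(self, user, now, k=3):
        """Return the k categories with the highest decayed score at time now."""
        scores = {c: self._decayed(s, t, now) for c, (s, t) in self._store[user].items()}
        return sorted(scores, key=scores.get, reverse=True)[:k]


interest = RealtimeInterest()
interest.update("u1", "phones", 1.0, ts=0.0)
interest.update("u1", "books", 1.0, ts=3600.0)
top_now = interest.top("u1", now=3600.0)
```

Storing only the score and the last update time per key keeps each update to a single read and write, which suits a key-value store.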

 

 

Key technology

The recommendation system involves many technical points. Considering the limited space, here we focus on the more important parts of personalized recommendation.

Recommendation engine

The core of the personalized recommendation system is the recommendation engine. Its general process is to recall a candidate set, apply rule filtering, score candidates with algorithm models, fuse and rank the model outputs, and present the recommendation results in varied ways. The main techniques are machine learning models combined with a knowledge graph that mines relationships between products. Based on the user's scenario, high-dimensional feature computation, massive recall, and large-scale ranking models deliver personalized recommendation, improve ranking quality, and give users the best possible shopping experience.

The processing logic of the recommendation engine mainly consists of distributing tasks, executing recommenders, and merging the recall results. A recommender handles candidate recall, business rule filtering, feature computation, and ranking. The technical architecture of the recommendation engine is shown in Figure 7.


Figure 7 Recommendation engine technical architecture

Distribute. Based on the recommendation scenario, the work is split into tasks by recall source. The key is to balance the load across the distributed tasks.

Recommender. The core execution component of the recommendation engine, which produces the personalized recommendation results. The implementation of the recommender is shown in Figure 8.


Figure 8 Recommender architecture

  • Recall stage. Obtains the candidate set, generally recalling by user portrait, user preference, region, and other dimensions. If too few items are recalled for a new user, the cold-start service is used for recall.

  • Rule filtering stage. Applies filters such as manual rules, one product sold by multiple merchants, parent-child codes, and postage-difference items.

  • Feature computation stage. Combines the user's real-time behavior, user portrait, knowledge graph, and feature service to compute feature vectors for the recalled candidates.

  • Ranking stage. Scores the recalled candidates with the algorithm model, then reorders them under a given strategy based on each candidate's recall source and score.
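
The four stages above can be sketched as a small pipeline. This is a toy illustration under stated assumptions: the function names, the dictionary-based candidates, the two hand-picked features, and the linear scoring model with fixed weights are all invented for the example and stand in for the real recall services and ranking models.

```python
def recall(sources):
    """Union the candidates of every recall source, remembering the first source."""
    seen, candidates = set(), []
    for name, skus in sources.items():
        for sku in skus:
            if sku not in seen:
                seen.add(sku)
                candidates.append({"sku": sku, "source": name})
    return candidates


def rule_filter(candidates, blocked):
    """Drop candidates that business rules exclude."""
    return [c for c in candidates if c["sku"] not in blocked]


def compute_features(user, candidates, item_category):
    """Attach a tiny feature vector: category preference and a real-time flag."""
    for c in candidates:
        category = item_category[c["sku"]]
        c["features"] = [
            user["category_pref"].get(category, 0.0),
            1.0 if c["source"] == "realtime" else 0.0,
        ]
    return candidates


def rank(candidates, weights):
    """Score with a linear model (a stand-in for the real model) and sort."""
    for c in candidates:
        c["score"] = sum(w * x for w, x in zip(weights, c["features"]))
    return sorted(candidates, key=lambda c: c["score"], reverse=True)


def recommend(user, sources, blocked, item_category, weights, k=2):
    candidates = rule_filter(recall(sources), blocked)
    candidates = compute_features(user, candidates, item_category)
    return rank(candidates, weights)[:k]


user = {"category_pref": {"phones": 0.6, "books": 0.3}}
sources = {"portrait": ["s1", "s2"], "realtime": ["s2", "s3"]}
item_category = {"s1": "toys", "s2": "books", "s3": "phones"}
result = recommend(user, sources, {"s1"}, item_category, weights=[1.0, 0.5])
```

Filtering before feature computation keeps the more expensive stages from running on candidates that would be discarded anyway.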

Merge. The recommendation results returned by multiple recommenders are merged and combined according to business rules, with some attention to diversity. For example, the implementation of "Guess You Like" on the homepage of the JD App is shown in Figure 9. First, different recall methods are selected according to the user's portrait, recent behavior, and related feedback, and business rules are applied as filters. For the candidate products that pass, user features, product features, and user-product cross features are extracted. The model then scores the candidate products from these features, and the products are sorted by score. Finally, recommendation reasons are attached and, for the sake of user experience, the sorted results receive minor adjustments such as diversified display.


Figure 9 Implementation process of "Guess You Like"
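
One simple way to merge several recommenders' results while keeping some diversity is to cap how many items of the same category may appear in a row. The article does not say which diversity rule is used, so the greedy rule, the `max_run` parameter, and the item fields below are assumptions for illustration only.

```python
def merge_with_diversity(result_lists, max_run=2):
    """Merge scored results, allowing at most max_run same-category items in a row."""
    # Deduplicate by SKU, keeping the highest score seen for each item.
    best = {}
    for results in result_lists:
        for item in results:
            current = best.get(item["sku"])
            if current is None or item["score"] > current["score"]:
                best[item["sku"]] = item
    pool = sorted(best.values(), key=lambda i: i["score"], reverse=True)

    merged = []
    while pool:
        recent = [i["category"] for i in merged[-max_run:]]
        run_full = len(recent) == max_run and len(set(recent)) == 1
        # Take the best item that does not extend a full run; if every
        # remaining item would extend it, fall back to the best one.
        pick = next(
            (i for i in pool if not (run_full and i["category"] == recent[0])),
            pool[0],
        )
        pool.remove(pick)
        merged.append(pick)
    return merged


list_a = [
    {"sku": "a", "category": "phones", "score": 0.9},
    {"sku": "b", "category": "phones", "score": 0.8},
    {"sku": "c", "category": "phones", "score": 0.7},
]
list_b = [
    {"sku": "d", "category": "books", "score": 0.6},
    {"sku": "a", "category": "phones", "score": 0.5},
]
merged = merge_with_diversity([list_a, list_b])
```

Here the third phone is pushed behind the book even though it scores higher, which corresponds to the "minor adjustment" of the sorted results described above.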

User portrait

What sets JD's big data apart from that of other companies is that JD has the longest value chain and accumulates data across the whole process. JD's data is very comprehensive. The data chain records every step of every user's activity: from login to search, browsing, product selection, page dwell time, review reading, whether the user follows promotions, adding to cart, placing an order, payment, delivery method, and finally whether there is after-sales service or repair. Complete data on the user's whole shopping behavior is recorded. By analyzing these behaviors and the related scenarios, JD.com builds user portraits, as shown in Figure 10.

These include not only the user's age, gender, and shopping habits, but also a large amount of data inferred from shopping behavior, such as whether the user is married, has children, or is sensitive to promotions. In addition, real-time user portraits can infer a user's purchase intention and real-time interests within seconds. The technical system of JD.com's recommendation user portrait is shown in Figure 11.

User portraits are used in recommendation products on all JD.com terminals, and the smart store launched for 618 is a typical application. Smart store products include Discover Good Products, personalized floors, flash sales (seckill), activities, coupons, categories, and tags. Taking flash sales as an example, the recommendation results are weighted by the portrait models (gender, age, promotion sensitivity, category preference, purchasing power) in the current user's portrait, so that the products the user is most interested in rank first.
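
A rough sketch of weighting flash-sale results by portrait dimensions might look like this. The formula is purely illustrative: the article names the dimensions but not how they are combined, so the multiplicative boosts, the 0.5 gender penalty, and every field name below are assumptions.

```python
def portrait_weight(portrait, product):
    """Combine a few portrait dimensions into one multiplicative weight."""
    # Category preference boosts products in categories the user likes.
    weight = 1.0 + portrait["category_pref"].get(product["category"], 0.0)
    # Promotion-sensitive users get a larger boost for deeper discounts.
    weight *= 1.0 + portrait["promotion_sensitivity"] * product["discount"]
    # Products targeted at another gender are down-weighted (assumed factor).
    if product.get("gender") not in (None, portrait["gender"]):
        weight *= 0.5
    # The further the price level is from the purchasing power, the lower the weight.
    weight /= 1 + abs(portrait["purchasing_power"] - product["price_level"])
    return weight


def rank_flash_sale(portrait, products):
    return sorted(
        products,
        key=lambda p: p["base_score"] * portrait_weight(portrait, p),
        reverse=True,
    )


portrait = {
    "gender": "f",
    "category_pref": {"beauty": 0.5},
    "promotion_sensitivity": 0.8,
    "purchasing_power": 2,
}
products = [
    {"sku": "p2", "category": "tools", "discount": 0.5, "gender": "m",
     "price_level": 4, "base_score": 1.2},
    {"sku": "p1", "category": "beauty", "discount": 0.5, "gender": "f",
     "price_level": 2, "base_score": 1.0},
]
ranked_products = rank_flash_sale(portrait, products)
```

In practice such weights would more likely be learned by a model than hand-set; the point here is only that the same product list is ordered differently for different portraits.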

User portraits are also the core basis of scenario recommendation. Taking Dongjia Xiaoyuan as an example, many scene tags are aggregated from users' historical behavior, and their order is adjusted according to the current user's portrait model. If the user selects the "Cure all diseases" tag, the recommended products are reordered according to portrait models such as gender, age, category, and promotion sensitivity.


Figure 10 Schematic diagram of user portrait


Figure 11 Technical system of JD.com's recommendation user portrait

Feature service platform

A feature is a description of an attribute, and features are the basis of personalized recommendation. Common features are divided into unilateral and bilateral features. A unilateral feature describes an attribute of an object itself, such as a product's color. A bilateral feature describes the degree of interaction between two objects, such as how well the brands a user browsed in the last hour match the brands in the candidate set. By how they are generated, features are divided into offline and real-time features: offline features are generated in advance by algorithm models, and real-time features are generated by real-time computation. Feature quality directly affects recommendation quality, and the performance of feature computation affects the processing capacity of personalized recommendation. In addition, sharing and reusing features speeds up algorithm iteration and saves labor costs.
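
The difference between the two kinds of feature can be shown with a short sketch. The function names and the particular definition of brand matching (the share of recently browsed brands equal to the candidate's brand) are assumptions chosen to mirror the example in the text.

```python
def unilateral_features(product):
    """Features that describe the product alone."""
    return {"color": product["color"], "price_level": product["price_level"]}


def brand_match(recent_brands, candidate_brand):
    """Bilateral feature: share of the user's recently browsed brands
    (for example, within the last hour) that equal the candidate's brand."""
    if not recent_brands:
        return 0.0
    return recent_brands.count(candidate_brand) / len(recent_brands)


product = {"sku": "s3", "brand": "A", "color": "black", "price_level": 3}
recent_brands = ["A", "A", "B", "C"]  # hypothetical browsing history
match = brand_match(recent_brands, product["brand"])
```

The unilateral features can be precomputed offline once per product, whereas the bilateral one depends on the current user and has to be computed at request time.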

The feature service platform mainly handles the declaration and management of feature data and feature computation, so that feature resources can be shared and reused. It can quickly meet the needs of declaring, launching, and testing different features and comparing their A/B experiment results, so that features are maintainable, explainable, and verifiable. Its main functions are: customized use of offline features, customized use of online features, generating new features from custom-defined ones, online declaration of some features and models, and fast A/B comparison of different features' effects. The architecture of the feature service platform is shown in Figure 12.


Figure 12 Architecture of feature service platform
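
The notion of declaring features once and deriving new features from existing ones can be sketched with a small registry. This is a hypothetical design, not the platform's API: the class `FeatureRegistry`, its `declare` and `compute` methods, and the `kind` label are invented for illustration.

```python
class FeatureRegistry:
    """Holds declared features so that different algorithms can reuse them."""

    def __init__(self):
        self._features = {}

    def declare(self, name, fn, kind="offline", depends_on=()):
        """Register a feature; derived features list the features they build on."""
        if name in self._features:
            raise ValueError(f"feature {name!r} is already declared")
        self._features[name] = (fn, kind, tuple(depends_on))

    def compute(self, name, context):
        """Compute a feature, first computing the features it depends on."""
        fn, _kind, deps = self._features[name]
        return fn(context, *[self.compute(d, context) for d in deps])


registry = FeatureRegistry()
registry.declare("price", lambda ctx: ctx["price"])
registry.declare("category_avg_price", lambda ctx: ctx["category_avg_price"])
# A new feature generated from two existing ones.
registry.declare(
    "price_ratio",
    lambda ctx, price, avg: price / avg,
    kind="online",
    depends_on=("price", "category_avg_price"),
)
ratio = registry.compute("price_ratio", {"price": 50.0, "category_avg_price": 100.0})
```

Keeping declarations in one place is what makes features explainable and reusable: an experiment can switch a feature on by name rather than re-implementing it.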

Scene Feature Playback Technology

The general processing logic of recommendation is to recall a batch of products for each request and then compute each product's features from the user's behavior data and user model. The algorithm model scores each product from its features, and the highest-scoring products are recommended to the user.

Computing features online is a one-off action and is not recorded. Therefore, to use those features when training the model offline, they have to be computed again on offline machines. Unfortunately, features computed offline are often not exactly the same as the online ones, which hurts model training. Scene feature playback addresses this, as shown in Figure 13. The recommendation service calls the recommendation engine. The recommendation engine records the scene features through the feature playback service and pushes them to the big data platform. Machine learning retrains the algorithm model on this scene feature data, which in turn affects the recommendation engine's ranking. This forms a closed loop for scenario recommendation and makes personalized recommendation more accurate.


Figure 13 Schematic diagram of scene feature playback

The technical architecture of scene feature playback is shown in Figure 14, and it is implemented as follows. Online features are generally a series of values. We assemble them into a string according to fixed rules and send the string to the server asynchronously via HTTP POST.


Figure 14 Technical architecture of scene feature playback
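
The client side of this step, assembling features into a string and posting it without blocking the ranking path, can be sketched with the Python standard library. The article does not give the string format or the endpoint, so the tab-separated layout, the `name:value` pairs, the URL `FEATURE_LOG_URL`, and the thread-pool approach are all assumptions for this example.

```python
import urllib.request
from concurrent.futures import ThreadPoolExecutor

# Hypothetical endpoint of the feature logging server.
FEATURE_LOG_URL = "http://feature-log.example.invalid/log"


def encode_features(request_id, sku, features):
    """Serialize one candidate's features as: request_id, sku, name:value pairs."""
    pairs = ",".join(f"{name}:{value}" for name, value in sorted(features.items()))
    return f"{request_id}\t{sku}\t{pairs}"


def build_request(line):
    """Wrap the serialized line in an HTTP POST request."""
    return urllib.request.Request(
        FEATURE_LOG_URL,
        data=line.encode("utf-8"),
        method="POST",
        headers={"Content-Type": "text/plain; charset=utf-8"},
    )


_executor = ThreadPoolExecutor(max_workers=2)


def send_async(line):
    """Submit the POST to a worker thread so ranking does not wait for it."""
    def _post():
        with urllib.request.urlopen(build_request(line), timeout=1) as resp:
            return resp.status
    return _executor.submit(_post)


line = encode_features("r1", "s3", {"brand_match": 0.5, "price_level": 3})
request = build_request(line)
```

Sorting the feature names gives every record the same field order, which makes the log easier to parse once it is loaded for offline training.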

The server uses OpenResty to receive these HTTP requests and writes the feature data in each request to a file on local disk. OpenResty is a high-performance web platform that can withstand high QPS and is highly stable. These two properties ensure the stability of the service.

The data extraction system extracts the data from the disks of the server cluster into a temporary warehouse.

The data extraction system then compresses and filters the data and pushes it into Hive tables. Different types of requests are placed in different partitions, which makes the data easier for algorithm engineers to use.

A personalized recommendation system is a systems engineering effort that relies on products, data, architecture, algorithms, human-computer interaction, and more to deliver scenario-based recommendation. This section has described JD.com's personalized recommendation system along these dimensions. The system has been upgraded continuously as the business developed and lifestyles changed: from the PC era to the mobile Internet era, from association recommendation to personalized recommendation, and from pure product recommendation to multi-type recommendation. It has achieved "a thousand people with a thousand faces". Admittedly, the effect of personalization still needs improvement, and some experience-related problems are gradually being addressed. Work in progress or still needed includes: on the algorithm side, richer knowledge graphs and wider application of deep learning; on the system side, better support for massive recall, high-dimensional feature computation, and online learning, so that recommendation becomes more real-time and more accurate; on the product side, continued progress toward "intelligent recommendation on all screens". Finally, we hope the personalized recommendation system can make shopping easier, more user-friendly, richer, and better.

