[Practice] Recommendation Algorithm PaaS Exploration and Practice | JD Cloud Technical Team

Author: JD Retail Cui Ning

1. Background

At present, the recommendation algorithm department supports 900+ recommendation scenarios across 20+ business lines, such as the main site, enterprise business, and omni-channel. By sorting out the common requirements of big-promotion operations and of the recommendation scenarios in each vertical business line, we consolidate existing recommendation algorithm capabilities and, through algorithm PaaS, build general recommendation capabilities that make it more efficient to bring recommendation to each business scenario and to meet business needs.

  • Why PaaS: First, we think PaaS is a better solution because it provides a changeable, extensible, and reusable basic framework for handling the complex business of a very large company. Within such a framework, repetitive work can be greatly reduced and business efficiency improved. Second, we have seen other players in the industry build PaaS on top of their own business platforms, use the capabilities it provides to keep incubating their own innovative projects with less manpower and lower cost, and launch many PaaS tools commercially to create greater social value. We therefore believe PaaS is the better choice for the problems we face at present;
  • How it helps improve recommendation business capabilities: By sorting out the common requirements in recommendation scenarios, within a changeable, extensible, and reusable basic framework, we classify business requirements, abstract capabilities, and provide tiered strategies. For general needs, we provide one-stop personalized recommendation capabilities so that businesses can onboard quickly; for customized needs, we build efficient and easy-to-use PaaS tools that reduce the investment of algorithm manpower on the one hand and shorten the delivery cycle of business requirements on the other;

2. Scheme design

While sorting out recommendation business requirements, we grouped the demands of business parties into the following two categories:

  • Requirements for new recommendation positions
  1. By recommendation scenario: onboarding of scenarios such as home page recommendation, My JD, store details, shopping cart, short video, live broadcast, and channels;

  2. By personalized recommendation capability: recommendation algorithm modules such as data access, recall, ranking, filtering/weight adjustment, diversity, and rendering, plus A/B experiments and data analysis capabilities;

  3. By operational demand: support capabilities such as weight boosting, fixed placement, unscheduled placement, and fixed slots;

  • Requirements for iteratively optimizing the recommendation strategy of existing recommendation positions

  1. Effect improvement: new product pools, new recall data sources, feeding business labels/feature factors into models, category support, data analysis, etc.;

  2. User experience: weight adjustment/filtering, negative feedback, diversity ranking, novelty, interleaving of multiple material types, etc.;

  3. Operations: operational capabilities such as traffic support for special products, horse-race mechanisms, weight boosting, fixed placement, and fixed slots;

To support the above business needs more efficiently, the recommendation algorithm PaaS is built around six directions (data, algorithm components, data analysis, operators, scenario templates, and services), with the goals of shortening the demand delivery cycle and tangibly improving the experience of using it.

2.1 Classification of Recommendation Algorithm PaaS Capabilities

As a provider of personalized recommendation capabilities, we hope that business-enabling technology and a PaaS-based recommendation algorithm will make the recommendation system transparent to everyone, and that a fresh understanding of the recommendation system will help us plan its future better. We divide the recommendation algorithm PaaS into data, algorithm components, data analysis, operators, scenario templates, and services: 6 first-level capabilities and 20 second-level capabilities in total, as follows:

Each first-level capability (with its definition, where given) is followed by its second-level capabilities (with their definitions, where given):

  • Data reuse (reuse of recommendation data at each stage of the pipeline)
  1. Direct reuse of recall data
  2. Reuse of recall data after simple processing
  3. Reuse of ranking model files

  • Code (algorithm component) reuse
  1. Non-model recall: cold-start recall, profile recall, similar/related recall, etc. (recall sources differ only in the computation script they call)
  2. KNN recall
  3. Fine-ranking model

  • Data analysis reuse
  1. Basic intermediate tables
  2. Project intermediate tables

  • Operator reuse (reuse of operator functions and of the data associated with them)
  1. Filter operator
  2. Weight-adjustment operator
  3. Top (pinning) operator
  4. Deduplication operator
  5. Rendering operator
  6. Diversity operator

  • Scenario template reuse
  1. Site-wide comprehensive product recommendation (shopping cart, live broadcast, ... product-seeding shows). Recommendation package: the "Recommended for You" function in the home page feed, a comprehensive recommendation based on user behavior preferences and similar/related products. Supports multiple material types (stores, rankings, image-text content, aggregated materials), configurable KNN recall, business-adjustable model objectives, and business-configured model features
  2. Main product recommendation package: product details, shopping cart, and other scenarios with an input product. Supports similar/related recall for the input product, free-shipping recommendation (price and weight filtering), same-store product recommendation, self-operated/non-self-operated filtering, add-to-cart pop-up recommendation for specified business categories, business-adjustable model objectives, business-configured model features, and LBS recommendation
  3. Commodity pool recommendation: marketing campaigns (New Year's Day, etc.), tab ranking, per-tab product recommendation, store + SKU recommendation, and O2O recommendation. Supports tab ranking, ranking based on tab input parameters, brand recommendation, new big-promotion scenarios, and recommendation functions such as flash sales: time-based recall, filtering, and creation of and recall from dedicated vocabularies (kv)

  • Service reuse
  1. Single-material services: single product, single store, short video, good things market... LBS
  2. Filter service (already available)
  3. Fine-ranking inference service (already available)

The above classification is based on our current understanding of business needs. As the recommendation algorithm PaaS advances, the definitions and classification will continue to evolve.

2.2 PaaS capability building of recommendation algorithm

2.2.1 Componentization of Recommendation Algorithms

Componentizing the recommendation algorithm is a prerequisite for platformization and configuration. Through componentization, we make algorithm capabilities visible, expose information that used to be buried in code, and turn algorithm capabilities into assets that can truly be passed on and that serve business needs efficiently. Specifically, we abstract and encapsulate algorithm capabilities and integrate them into runnable code packages. With the component introduction and usage instructions, users can apply them in their own business area in a "pluggable" way.

Building componentized algorithms involves two parts. One is that the builders of recommendation algorithm PaaS capabilities integrate the recommendation algorithm capabilities; the other is that the users of those capabilities can control the pace of demand delivery themselves.

Schematic diagram of componentization of recommendation algorithm
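
As a rough illustration of what "pluggable" could mean here, the sketch below packages two capabilities as named components with a short description, and lets a caller look one up and run it. The names `COMPONENTS`, `register_component`, `run_component`, `cold_start_recall` and `profile_recall` are hypothetical and are not taken from JD's code; only the idea of wrapping a capability together with its introduction comes from the text.

```python
from typing import Callable, Dict, List

# Hypothetical registry: component name -> description and callable.
COMPONENTS: Dict[str, dict] = {}


def register_component(name: str, description: str):
    """Decorator that packages a function as a named, documented component."""
    def wrap(func: Callable[..., List[str]]):
        COMPONENTS[name] = {"description": description, "run": func}
        return func
    return wrap


@register_component("cold_start_recall", "Recall the most popular items in a pool.")
def cold_start_recall(pool: Dict[str, int], k: int = 3) -> List[str]:
    # pool maps item id -> popularity count (illustrative data only)
    return sorted(pool, key=lambda item: (-pool[item], item))[:k]


@register_component("profile_recall", "Recall items in categories the user prefers.")
def profile_recall(items: Dict[str, str], preferred: List[str]) -> List[str]:
    # items maps item id -> category
    return [item for item, cate in items.items() if cate in preferred]


def run_component(name: str, **kwargs) -> List[str]:
    """Look a component up by name and run it with the caller's parameters."""
    return COMPONENTS[name]["run"](**kwargs)


print(run_component("cold_start_recall", pool={"a": 5, "b": 9, "c": 1}, k=2))  # ['b', 'a']
```

A registry like this also gives a natural place to list every component with its description, which is one way the "introduction and instructions for use" could be surfaced to users.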

2.2.2 Platformization of general algorithm capabilities

The main purpose of platformization is to make recommendation algorithm components simpler to use, so we require platform tools to be usable, visible, and modifiable. Platformization falls into two broad categories. The first is platformization of the full recommendation pipeline, which aims to quickly support business needs such as new recommendation positions. The second is platformization of individual recommendation algorithm modules, through which we hope to quickly support the iterative optimization of recommendation strategies for existing recommendation positions.

  • For platformization of the full recommendation pipeline, we work with the product, architecture, and platform teams to let businesses onboard quickly by building a rich set of recommendation scenario templates and providing general personalized distribution capabilities. Specifically, because business parties have different demands for different recommendation scenarios, the PaaS project team has built several types of general-purpose templates: site-wide comprehensive product recommendation, similar/related recommendation for a main SKU, flexible business recommendation, omni-channel store + product recommendation, assistant product recommendation, and so on. On these templates, the recommendation algorithm PaaS builds on changeable and reusable basic logic and offers a rich set of recommendation strategies for business parties to choose from, so that more requirements for new recommendation positions are covered;

Schematic diagram of scene template list

  • For platformization of recommendation algorithm modules, we plan to work with the platform team to build a batch of efficiency tools that improve the productivity of algorithm engineers and shorten the demand delivery cycle;

2.2.3 General Algorithm Strategy Configuration

To make algorithm engineers more efficient in supporting business needs, we work with the recommendation framework team, on top of the current recommendation system, to build a general operator library that includes commonly used operators such as data fetching, recall, ranking, filtering, and diversity. In the future, these general-purpose operators can go directly into small-traffic experiments to verify their effect, which lowers the cost of operator configuration, improves code reuse, and shortens the demand delivery cycle.

Process comparison before and after implementing general algorithm strategy configuration
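
To make the idea of strategy configuration over a general operator library more concrete, here is a minimal sketch in which a strategy is just an ordered list of operator names and parameters. The operator implementations, the `OPERATOR_LIBRARY` mapping, and the config layout are assumptions made for illustration; the text only names the kinds of operators (filtering, ranking, diversity, and so on).

```python
from typing import Callable, Dict, List

Item = Dict[str, object]  # e.g. {"sku": "A", "cate": "phone", "score": 0.9}


def filter_op(items: List[Item], blocked: List[str]) -> List[Item]:
    """Drop items whose sku is in the blocked list."""
    return [it for it in items if it["sku"] not in blocked]


def rank_op(items: List[Item]) -> List[Item]:
    """Order items by descending score."""
    return sorted(items, key=lambda it: it["score"], reverse=True)


def diversity_op(items: List[Item], max_per_cate: int) -> List[Item]:
    """Keep at most max_per_cate items per category, preserving order."""
    seen: Dict[object, int] = {}
    kept = []
    for it in items:
        seen[it["cate"]] = seen.get(it["cate"], 0) + 1
        if seen[it["cate"]] <= max_per_cate:
            kept.append(it)
    return kept


# Hypothetical operator library: name used in configuration -> implementation.
OPERATOR_LIBRARY: Dict[str, Callable] = {
    "filter": filter_op,
    "rank": rank_op,
    "diversity": diversity_op,
}


def run_strategy(config: List[dict], items: List[Item]) -> List[Item]:
    """Apply the operators named in config, in order, with their parameters."""
    for step in config:
        items = OPERATOR_LIBRARY[step["op"]](items, **step.get("params", {}))
    return items


config = [
    {"op": "filter", "params": {"blocked": ["B"]}},
    {"op": "rank"},
    {"op": "diversity", "params": {"max_per_cate": 1}},
]
candidates = [
    {"sku": "A", "cate": "phone", "score": 0.5},
    {"sku": "B", "cate": "phone", "score": 0.9},
    {"sku": "C", "cate": "book", "score": 0.7},
    {"sku": "D", "cate": "phone", "score": 0.8},
]
print([it["sku"] for it in run_strategy(config, candidates)])  # ['D', 'C']
```

With this shape, changing a strategy means editing the configuration rather than the operator code, which is the kind of reuse the paragraph above describes.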

2.2.4 Low-code development of customized algorithm strategies

While supporting business needs, we found that developing even a small operator takes algorithm engineers a lot of time, including but not limited to early-stage communication, strategy development, environment deployment, strategy verification, and operator launch. We hope to streamline the development process to improve efficiency. On this basis, we reached a consensus with the recommendation architecture and platform teams to build low-code development tools for professionals such as algorithm engineers, so that customized requirements can be developed and released quickly through a low-code path.

For the overall idea, refer to the big data Easy Studio system.

2.2.5 Construction of Recommendation Algorithm PaaS Tool

Here we mainly consider customized requirements, such as recalling from new data sources, sensitive product filtering, and case troubleshooting tools. For these, we hope to provide efficient and easy-to-use PaaS tools that free algorithm engineers from repetitive work on the one hand and shorten the delivery cycle of business requirements on the other.

3. Implementation

3.1 Case 1 Scenario template personalized recommendation capacity building

3.1.1 Scenario template development

Scenario templates are a tool for meeting the need for new recommendation positions and are open directly to business parties. For different recommendation scenarios we have built a rich set of templates for business parties to choose from, including comprehensive product recommendation, product details, shopping cart, live broadcast, short video, and so on. Each template comes with a basic recommendation distribution strategy, and the business side can choose which strategy to use according to its own needs. The following uses product aggregation tab recommendation as an example to show how a template's personalized recommendation capability is implemented.

First, early in template construction, we confirm with the product team how many similar needs there are, as a basis for deciding whether a template is worth building. The requirements that this type of demand places on algorithm capabilities are also largely similar. We therefore consider product aggregation tab recommendation a general and relatively frequent type of demand, for which a template is needed to handle requests efficiently.

Second, as algorithm engineers, we need to sort out the algorithm capabilities this type of demand requires. Based on the recommendation requirements of more than a dozen similar past requests, we can outline an algorithm solution with complete functionality and high coverage. Take aggregation tab recommendation as an example. For data access, in most requests the business party provides a pool that contains the commodity pool (did), the virtual category/brand (vcateid), and the real category/brand (cate_id). For recall, virtual and real categories/brands are usually recalled through two paths, cold start and user profile. Scoring in the rank stage is then done by a linear ranking model, and filtering, weight adjustment, and diversity strategies complete the recommendation distribution capability. As this description shows, if most requests follow the process above, we can design one complete algorithm solution that handles similar requests efficiently.

Then, once the algorithm solution has passed review, the architecture team develops the functionality and the platform team develops the front-end pages.

Finally, when similar business needs arise, we open the template capability to the business side, which can complete the request through a point-and-click page and control the pace of the process itself.

Scene Template Development Flowchart
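
The algorithm solution described for the aggregation tab template (two-path recall, linear ranking, then filtering and weight adjustment) can be sketched roughly as follows. Everything concrete here is an assumption for illustration: the pool rows, the `sales` and `ctr` features, the weights, and all function names. Only `cate_id` appears in the text, and the diversity step is left out to keep the sketch short.

```python
from typing import Dict, Iterable, List, Optional

# Illustrative pool rows; every field except cate_id is a made-up example.
POOL = [
    {"sku": "s1", "cate_id": "phone", "sales": 90, "ctr": 0.02},
    {"sku": "s2", "cate_id": "book", "sales": 10, "ctr": 0.08},
    {"sku": "s3", "cate_id": "phone", "sales": 50, "ctr": 0.05},
    {"sku": "s4", "cate_id": "toy", "sales": 70, "ctr": 0.01},
]


def cold_start_recall(pool: List[dict], k: int) -> List[dict]:
    """Cold-start path: top-k rows by sales, independent of the user."""
    return sorted(pool, key=lambda r: r["sales"], reverse=True)[:k]


def profile_recall(pool: List[dict], preferred_cates: Iterable[str]) -> List[dict]:
    """Profile path: rows in categories the user is known to prefer."""
    preferred = set(preferred_cates)
    return [r for r in pool if r["cate_id"] in preferred]


def merge(*paths: List[dict]) -> List[dict]:
    """Union of recall paths, de-duplicated by sku (first occurrence wins)."""
    seen, merged = set(), []
    for path in paths:
        for row in path:
            if row["sku"] not in seen:
                seen.add(row["sku"])
                merged.append(row)
    return merged


def linear_score(row: dict, weights: Dict[str, float]) -> float:
    """Linear ranking model: weighted sum of the row's features."""
    return sum(w * row[feature] for feature, w in weights.items())


def recommend(pool, preferred_cates, weights, blocked=(),
              boost: Optional[Dict[str, float]] = None, k: int = 2) -> List[str]:
    boost = boost or {}
    candidates = merge(cold_start_recall(pool, k), profile_recall(pool, preferred_cates))
    candidates = [r for r in candidates if r["sku"] not in blocked]  # filtering
    scored = [(linear_score(r, weights) * boost.get(r["sku"], 1.0), r["sku"])  # weight adjustment
              for r in candidates]
    return [sku for _, sku in sorted(scored, reverse=True)]


WEIGHTS = {"sales": 0.01, "ctr": 10.0}
print(recommend(POOL, {"book"}, WEIGHTS))  # ['s1', 's2', 's4']
```

Because every request of this type follows the same steps, a template only needs the pool, the user's preferences, and a handful of parameters such as `blocked` and `boost` to differ between business parties.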

3.1.2 Capacity building of automatic recall vocabulary/index library

While handling business needs, in most cases each business party has its own commodity pool. Faced with different commodity pools, we need to adjust the recall vocabulary or index library dynamically as the pool changes. To fully automate personalized distribution, we need a new set of tools for building recall vocabularies/index libraries. On this basis, we and the platform team jointly proposed a one-click recall vocabulary/index library creation solution. Specifically, algorithm engineers abstract and encapsulate all the recall vocabulary/index library production scripts that a template requires and reserve input and output parameters. The platform side obtains the specific creation command through the front-end interface and passes it as an input parameter to the code package prepared by the algorithm engineers. So that the task is refreshed every day, a BDP scheduling task is created automatically. The output parameters of the code package are sent back to the platform side through DUCC as the basis for subsequently creating the vocabulary/index library, which completes the fully automatic creation of the recall vocabulary or index library.

One-click recall vocabulary/index library creation: implementation
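
A minimal sketch of the "reserved input and output parameters" idea follows: one packaged function takes a creation command and the pool rows, builds a key-to-SKU recall vocabulary, and returns output parameters for the caller. The command fields (`pool_id`, `key_field`, `top_n`), the row layout, and the output fields are all hypothetical. BDP scheduling and the DUCC hand-off are internal systems and are deliberately not modeled here.

```python
from collections import defaultdict
from typing import Dict, List


def build_recall_vocabulary(command: dict, pool_rows: List[dict]) -> dict:
    """Build a key -> [sku] recall vocabulary for one commodity pool.

    command   : input parameters, e.g. as collected by a front-end form
                (hypothetical fields: pool_id, key_field, top_n)
    pool_rows : rows of all pools; only rows of command["pool_id"] are used
    returns   : output parameters describing what was produced
    """
    grouped: Dict[str, List[dict]] = defaultdict(list)
    for row in pool_rows:
        if row["pool_id"] == command["pool_id"]:
            grouped[row[command["key_field"]]].append(row)

    vocabulary = {}
    for key, rows in grouped.items():
        best = sorted(rows, key=lambda r: r["sales"], reverse=True)[: command["top_n"]]
        vocabulary[key] = [r["sku"] for r in best]

    return {
        "table_name": f"recall_{command['pool_id']}_{command['key_field']}",
        "num_keys": len(vocabulary),
        "vocabulary": vocabulary,
    }


rows = [
    {"pool_id": "p1", "sku": "a", "cate_id": "phone", "sales": 5},
    {"pool_id": "p1", "sku": "b", "cate_id": "phone", "sales": 9},
    {"pool_id": "p1", "sku": "c", "cate_id": "phone", "sales": 7},
    {"pool_id": "p1", "sku": "d", "cate_id": "book", "sales": 1},
    {"pool_id": "p2", "sku": "e", "cate_id": "book", "sales": 3},
]
out = build_recall_vocabulary({"pool_id": "p1", "key_field": "cate_id", "top_n": 2}, rows)
print(out["table_name"], out["vocabulary"])
```

Keeping the script behind a single function with explicit inputs and outputs is what would let a platform call it for any pool without an engineer editing the script by hand.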

3.1.3 Multi-business ranking model support

To cover more business needs, the ranking module mainly considers what different business modes require of ranking. For example, lower-tier market ("sinking") scenarios care more about improving UCVR, while some main-site businesses want to improve UCTR. To accommodate these needs, we selected three commonly used models: the main site's multi-domain ranking model, the special edition's sinking ranking model, and the ToB enterprise ranking model. We integrated all three into each template and provide an introduction and usage instructions for each, so the business side can choose according to its specific requirements.

Ranking model selection
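
One simple way to picture model selection inside a template is a small catalogue keyed by business mode. The sketch below is only an illustration: the keys, the `choose_model` helper, and the pairing of each model with a single metric are assumptions (the text links sinking scenarios with UCVR and some main-site needs with UCTR, and states no metric for the enterprise model).

```python
from typing import Optional

# Catalogue of the three ranking models named above; layout is hypothetical.
RANKING_MODELS = {
    "main_site": {"model": "multi-domain ranking model", "metric": "UCTR"},
    "special_edition": {"model": "sinking ranking model", "metric": "UCVR"},
    # No target metric is stated for the enterprise model, so it is left unset.
    "enterprise": {"model": "ToB enterprise ranking model", "metric": None},
}


def choose_model(business_mode: Optional[str] = None,
                 metric: Optional[str] = None) -> str:
    """Pick a model by business mode if given, otherwise by target metric."""
    if business_mode is not None:
        if business_mode not in RANKING_MODELS:
            raise KeyError(f"no ranking model registered for {business_mode!r}")
        return RANKING_MODELS[business_mode]["model"]
    if metric is not None:
        for entry in RANKING_MODELS.values():
            if entry["metric"] == metric:
                return entry["model"]
    raise LookupError("specify a known business mode or metric")


print(choose_model(metric="UCVR"))  # sinking ranking model
```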

3.2 Case 2 Create efficient and easy-to-use PaaS tools

Using tools well not only improves our efficiency but also makes our work easier. Here we take Woodpecker, which we built for the user experience module, as an example of how PaaS tools are applied in the business. (Woodpecker: a platform tool that supports offline filtering/release with self-service configuration.)

3.2.1 Demand sorting

In the user experience module, business needs often require filtering products, categories, sensitive words, and so on, or filtering them for a certain period and releasing them afterwards. Before Woodpecker, when we received such a request, we manually wrote the products, categories, or sensitive words into a text file and pushed it to a path on HDFS. When the next day's BDP scheduling task ran, the data table was updated and the filtering or release took effect. It is easy to see that manually editing the text file is error-prone: an accidental deletion or addition can make the next day's scheduling task fail, which makes the process unstable. In addition, training a newcomer to take over such requests is very costly: it takes several rounds of teaching before the work can be entrusted to them, and the operation itself is difficult.

To solve these problems, we planned an efficient and easy-to-use PaaS tool that provides stable create, delete, update, and query operations and is simple enough to understand at a glance. Based on this idea, we worked with the platform team to build Woodpecker.

3.2.2 Design and development of Woodpecker

Design ideas:

Through the jrec platform, all offline filtering/release is configured in a PaaS manner. The platform needs to have the following capabilities:

  1. The jrec platform provides Woodpecker's configuration entry for filtering and release;

  2. Long-term rules configured on the platform can be moved offline to reduce the use of online service resources;

  3. Offline filtering can be configured flexibly and supports offline release, reducing the cost of manual operations;

Design:

The overall design is shown in the figure below. After configuration through the platform's web interface, the data is passed to the offline computing task through DUCC. When the offline computing task finishes, the results are exported and cached in jimdb. With the filter service or the ps filter operator configured online, products, categories, or sensitive words can then be filtered and released.

Woodpecker Landing Implementation
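
The core behavior described for Woodpecker, validated rules that filter for a period and release afterwards, can be sketched in memory as below. This is not Woodpecker's implementation: `FilterRule`, `active_blocklist`, `is_filtered`, and the rule kinds are hypothetical names, and the DUCC transfer, the offline task, and the jimdb cache are not modeled.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Iterable, Optional, Set

VALID_KINDS = {"sku", "category", "keyword"}


@dataclass(frozen=True)
class FilterRule:
    kind: str                       # "sku", "category" or "keyword"
    value: str
    start: datetime
    end: Optional[datetime] = None  # None: filter until the rule is removed

    def __post_init__(self):
        # Validate on creation, so a malformed rule never reaches the daily job.
        if self.kind not in VALID_KINDS:
            raise ValueError(f"unknown rule kind: {self.kind}")
        if not self.value.strip():
            raise ValueError("rule value must not be empty")
        if self.end is not None and self.end <= self.start:
            raise ValueError("end must be later than start")

    def active_at(self, now: datetime) -> bool:
        return self.start <= now and (self.end is None or now < self.end)


def active_blocklist(rules: Iterable[FilterRule], now: datetime) -> Dict[str, Set[str]]:
    """What an offline job might compute: values to filter at time `now`, by kind."""
    result: Dict[str, Set[str]] = {kind: set() for kind in VALID_KINDS}
    for rule in rules:
        if rule.active_at(now):
            result[rule.kind].add(rule.value)
    return result


def is_filtered(item: dict, blocklist: Dict[str, Set[str]]) -> bool:
    """Online check against the precomputed blocklist."""
    return (item["sku"] in blocklist["sku"]
            or item["category"] in blocklist["category"]
            or any(word in item["title"] for word in blocklist["keyword"]))


rules = [
    FilterRule("sku", "s1", datetime(2023, 1, 1), datetime(2023, 1, 10)),
    FilterRule("keyword", "knife", datetime(2023, 1, 1)),
]
during = active_blocklist(rules, datetime(2023, 1, 5))
after = active_blocklist(rules, datetime(2023, 1, 11))
print(during["sku"], after["sku"])  # {'s1'} set()
```

Rejecting bad input when the rule is created addresses the failure mode of hand-edited text files described earlier, and the `end` field makes release automatic instead of a second manual edit.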

3.2.3 Woodpecker use

Woodpecker has been built and delivered to the relevant algorithm engineers, and we also provide a detailed user manual for newcomers.

4. Summary of practical experience

In exploring and practicing recommendation algorithm PaaS, we act as both capability providers and capability users. From the provider's perspective, we summarize and sort out the PaaS tools that need to be provided; from the user's perspective, we evaluate whether the tools are efficient and easy to use.

As a capability provider: by sorting out business requirements and drawing on the long-term business experience of the PaaS builders, we componentize recommendation algorithms on top of the existing recommendation system, gaining a new understanding of the system and re-planning its processes.

As a capability user: we move from passive to active, genuinely feel the efficiency gains from the tools, become good at using them, and complete complex business requirements with ease through PaaS tools. Whenever we want to move a request forward, we can control the pace of its delivery ourselves.

5. Outlook for the future of work

We hope that, under the compound interest of long-termism, the accumulated work on recommendation algorithm PaaS will add up to something remarkable. Based on our current understanding of business needs, we will keep investing in the following areas:

5.1 Scene Template Hierarchical Personalized Recommendation Capability Building

In the future, we will upgrade the personalization capabilities of templates. Building on the current basic version, we will provide an advanced version and a higher-tier version to meet more diverse business demands.

5.2 Create efficient and easy-to-use PaaS tools

5.2.1 Single material service capacity building

First, why build single-material service capabilities? An important reason is that scenario templates can only support new recommendation positions, and only when the requirements are not very complex. For complex new recommendation positions, or for iterative optimization of existing ones, scenario templates cannot help. On this basis, we propose the concept of service reuse. Specifically, we plan to build each single material type into its own service, and algorithm engineers will focus on optimizing these services across the board. Both new recommendation positions and the iterative optimization of existing ones are then supported through services, which reduces algorithm manpower and shortens the delivery cycle of business requirements.

5.2.2 Further upgrade of algorithm component platform

To improve the experience of using recommendation algorithm PaaS capabilities, we plan to platformize some common algorithm capabilities, eliminating the operations that still require algorithm engineers to copy things manually and achieving a truly point-and-click workflow. We will therefore work with the platform team to build these platform capabilities together and further free algorithm engineers from repetitive work.


Origin my.oschina.net/u/4090830/blog/10091394