Content Feed Ranking in Practice at Mushroom Street (Mogujie)

 

Overview: This article is compiled from the algorithm talk shared at MOGU DDay in August 2019. It covers the key iterations of Mushroom Street's content feed ranking algorithm over a whole year; our evolution also happens to follow the typical three-stage iteration path of both recall and ranking. After the team reviewed what information could be made public, we wrote it up here, and we look forward to more exchanges with you.

Introduction

 

Mushroom Street's home page content follows the goal of letting more people fall in love with fashion and with Mushroom Street, offering users content that is stylish, good-looking and worth buying. In the slide below, the first two screenshots show how our app moved from an earlier form that aggregated content around KOLs (Daren), three at a time, to a content-centric waterfall feed; this change had a big impact on our product, algorithms, architecture and engineering. The third screenshot shows the full-screen page opened by tapping a piece of content: the user can like, comment and share; if they like a single product in it, they can be "seeded" and buy it; and scrolling further down surfaces more related "content recommendations".

 

 

How do we translate "stylish, good-looking, worth buying" into algorithm metrics? For a content community, the basic answer to "stylish and good-looking" is PUGC operations: the content we show users is published by recruited KOLs and passes operational review against our content selection criteria. This differs from the way UGC communities such as Zhihu and Xiaohongshu (Little Red Book) generate content.

 

Under this model, the "stylish and good-looking" that the content ranking algorithm pursues can be understood as the content's appeal: users like to watch it, watch more of it, and watch it for a long time. Breaking this down further, the basic home page metrics are content click-through rate (CTR) and user dwell time. More concretely: home page exposure PV + home page click PV + dwell time on the second-level page. Of course, from a business perspective user retention is preferred and core user-level metrics are very important, but on the one hand these long-term metrics cannot be observed in short-term experiments, and on the other hand the short-term metrics CTR + dwell time are to some extent correlated with the long-term ones. We therefore agreed with the business side to anchor on the short-term metrics while continuing to observe the long-term ones.
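To make the short-term metrics concrete, here is a minimal Python sketch of how exposure PV, click PV, CTR and dwell time fall out of impression logs. The `FeedEvent` schema and the `feed_metrics` helper are hypothetical and only illustrate the arithmetic; they are not our production logging format.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass
class FeedEvent:
    """One home page impression (hypothetical log schema for illustration)."""
    content_id: str
    clicked: bool          # did the user open the second-level page?
    dwell_seconds: float   # time spent on the second-level page (0 if not clicked)


def feed_metrics(events: Iterable[FeedEvent]) -> Dict[str, float]:
    """Compute exposure PV, click PV, CTR and dwell time from impressions."""
    exposure_pv = 0
    click_pv = 0
    total_dwell = 0.0
    for e in events:
        exposure_pv += 1
        if e.clicked:
            click_pv += 1
            total_dwell += e.dwell_seconds
    return {
        "exposure_pv": exposure_pv,
        "click_pv": click_pv,
        "ctr": click_pv / exposure_pv if exposure_pv else 0.0,
        "avg_dwell_per_click": total_dwell / click_pv if click_pv else 0.0,
        "dwell_per_exposure": total_dwell / exposure_pv if exposure_pv else 0.0,
    }
```

With four impressions, two of them clicked with 30 s and 10 s of dwell, this gives a CTR of 0.5 and an average dwell of 20 s per click. Dwell per exposure (10 s here) is one possible way to fold both signals into one number; whether to normalize by clicks or by exposures is a design choice.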

 

As for "worth buying", users want content that helps them put together outfits and pick clothes, which we can understand as the content's product appeal: when ranking content we must also consider its purchase value. This can be translated into content CVR. Content that scores high on both content appeal and product appeal is necessarily high-value content, but at present part of the content has no linked items, the items linked to seeding content need time to accumulate data, and the CVR data path (home page exposure, then content click, then item click) differs from that of our earlier pure e-commerce scenes. In addition, to help KOLs make money, we launched a CPO (cost-per-order) commission plan for items, and the differing value of items makes the value of content even more diverse.
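The longer conversion path can be illustrated with stage-by-stage rates. In the hypothetical sketch below, `funnel_rates` takes counts for home page exposures, content clicks, item clicks and orders. The end-to-end rate equals the product of the stage rates, which shows why content CVR is much sparser than item CVR in a pure e-commerce scene, and why content with no linked items contributes nothing to it.

```python
from typing import Dict


def funnel_rates(exposures: int, content_clicks: int, item_clicks: int, orders: int) -> Dict[str, float]:
    """Conversion rates along the content -> item -> order path (illustrative counts)."""

    def safe_div(num: int, den: int) -> float:
        # Content without linked items has no item clicks; avoid dividing by zero.
        return num / den if den else 0.0

    return {
        "content_ctr": safe_div(content_clicks, exposures),  # exposure -> content click
        "item_ctr": safe_div(item_clicks, content_clicks),   # content click -> item click
        "item_cvr": safe_div(orders, item_clicks),           # item click -> order
        "end_to_end_cvr": safe_div(orders, exposures),       # exposure -> order
    }
```

For example, 1,000 exposures, 100 content clicks, 20 item clicks and 2 orders give stage rates of 10%, 20% and 10%, and an end-to-end rate of 0.2%.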

 

It is fair to say that the defining feature of our business is that content sits in a multi-entity ecosystem, and multi-objective optimization goes hand in hand with it. Here we briefly sketch the entities in the content ecosystem and their relationships. From a consumer's perspective, the visible entities include the resident KOLs (image-text KOLs, short-video KOLs and live-streaming KOLs) and the fashion content they produce on the platform (including the live rooms of streaming KOLs). When producing content, KOLs choose items from stores, each corresponding to a fashion brand, and they can optionally add tags to the content. Community operations review content for community tone and other aspects, and tag it. In addition, community operations regularly organize relevant tags and content into popular topics, presenting trending thematic content in aggregated form.

 

Overall, the entities that the home page algorithm must rank together include fashion content, live rooms, core brands and popular topics, and the ranking process should also consider how these entities relate to people (KOLs), items and tags.

 

Because our team is also responsible for product ranking, we inevitably asked ourselves: what are the similarities and differences between content feed ranking and product ranking?

 

The similarity is obvious: both are scheduling problems, so the overall architecture and solutions will be similar. We can adopt the standard two-step recipe of recommendation: Matching + Ranking. Since content ranking and product ranking fit into the same framework, we can reuse past experience and draw on it to make decisions at each iteration point along the common technical evolution path.

 

However, there is after all "No Free Lunch"! From the characteristics of our content and the composition of the community ecosystem described above, we can see that, compared with products, users probably have different expectations of content, and the cost users pay to consume content is also quite different. In other words, with a different business form, the metrics we care about differ, and so does the ranking evaluation system; in practice we indeed ran into "the same execution path yielding different results in product ranking and content ranking" many times.

 

Let us look at the industry's mainstream recommendation framework, Matching + Ranking. The figure borrows YouTube's main-flow diagram. When a user makes a request, matching first selects a smaller candidate set from the large material pool based on context, the user's behavior history and other scene information; ranking then scores and orders that candidate set using the combined data and user behavior.
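The two-step flow can be sketched in a few lines of Python. This is a toy illustration, not our serving code: each matcher is a hypothetical callable that returns candidate ids from the large pool given the user's history, and `score` stands in for whatever ranking model is in use.

```python
from typing import Callable, List, Sequence

Matcher = Callable[[Sequence[str]], List[str]]


def recommend(
    user_history: Sequence[str],
    matchers: Sequence[Matcher],
    score: Callable[[str], float],
    top_k: int,
) -> List[str]:
    """Toy Matching + Ranking pipeline over string content ids."""
    # Matching: every recall channel proposes a small candidate list.
    candidates: List[str] = []
    seen = set()
    for matcher in matchers:
        for item in matcher(user_history):
            if item not in seen:  # de-duplicate across channels
                seen.add(item)
                candidates.append(item)
    # Ranking: score only the candidate set and keep the top_k items.
    return sorted(candidates, key=score, reverse=True)[:top_k]
```

The key property is that the expensive `score` call runs only on the merged candidate set, never on the whole material pool.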

 

The mainstream evolution path of matching is as follows:

  1. The first generation is collaborative filtering and its variants, including Swing, SimRank and the like. These mainly use heuristic rules to define the similarity between two materials and compute offline the most similar materials for each one; when a user clicks material A, the materials similar to A are recalled;

     

  2. The second generation is embedding and its variants, including graph embedding, node2vec and the like. Instead of applying rules directly to material ids, these methods use a model to learn a low-dimensional representation for each material so that similar materials lie close together in that space, which makes it easier to compute the materials most similar to a given one;

     

  3. The third generation is deep recall models, including DSSM, YouTubeDNN, TDM, etc. Whereas the second class of methods still learns representations of materials, this third class learns to compute directly which materials should be recalled for a given user, without precomputing them, and offers stronger representational power and more expressive structure.
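As a reference point for the first generation, below is a minimal item-based CF sketch in Python: an offline pass builds an i2i table from click co-occurrence with cosine normalization, and the online step recalls the neighbors of the item a user just clicked. This is the plain item-CF baseline, not Swing or SimRank, whose similarity definitions differ; the input format (a user-to-clicked-items mapping) is a hypothetical simplification.

```python
import math
from collections import defaultdict
from typing import Dict, List, Set


def item_cf_similarity(user_clicks: Dict[str, Set[str]]) -> Dict[str, Dict[str, float]]:
    """Offline i2i table: cosine similarity between the user sets of each item pair."""
    item_users: Dict[str, Set[str]] = defaultdict(set)
    for user, items in user_clicks.items():
        for item in items:
            item_users[item].add(user)
    # Count how many users clicked both a and b.
    co_counts: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for items in user_clicks.values():
        for a in items:
            for b in items:
                if a != b:
                    co_counts[a][b] += 1
    # Normalize co-occurrence by the popularity of both items.
    return {
        a: {b: c / math.sqrt(len(item_users[a]) * len(item_users[b])) for b, c in nbrs.items()}
        for a, nbrs in co_counts.items()
    }


def recall_similar(sims: Dict[str, Dict[str, float]], clicked_item: str, k: int) -> List[str]:
    """Online step: after a click on clicked_item, recall its top-k neighbors."""
    nbrs = sims.get(clicked_item, {})
    # Sort by similarity descending, breaking ties by id for determinism.
    return sorted(nbrs, key=lambda b: (-nbrs[b], b))[:k]
```

In the second generation, the co-occurrence table is replaced by nearest neighbors in a learned embedding space, while the online lookup keeps the same shape.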

 

For Ranking, this first version lazily borrows the overview compiled by @Wang Zhe, which mainly covers the model variants built on Wide & Deep. Our own path matches it fairly closely, but for the later parts, Transformer and lifelong modeling, we have not yet obtained online results, so they are left out of the table for now. On the whole, our ranking models went through the same three generations as the industry and closely track the evolution of CTR prediction models. Many people on Zhihu have already written far more detailed surveys of this area, so we will not repeat them here.

 

 

Our home page content framework has three parts: on top of matching and ranking there is a presentation layer, which handles business orchestration, personalized scattering, weight adjustment and other personalized functions to meet the home page's adjustment goals across many content types, businesses and entities. In the matching layer, we combined parallel strategy chains with traffic-pool recall and vector recall to efficiently meet goals and requirements such as personalization, diversity, content freshness and promotional support. In ranking, we went from LR to Wide LR, tried Wide + Deep models and experimented with multi-objective learning to rank, continuously improving matching efficiency so that users are matched with the most suitable KOLs, content, topics and so on.
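Personalized scattering in the presentation layer can be illustrated with a greedy re-rank that keeps the same KOL out of consecutive slots. The sketch below is an assumption about one simple way to do it, not our production rule: `window` (hypothetical) is the number of preceding slots checked, and when every remaining item would violate the rule, the highest-ranked one is taken anyway.

```python
from typing import List, Tuple


def scatter(ranked: List[Tuple[str, str]], window: int) -> List[Tuple[str, str]]:
    """Greedy re-rank so an author does not repeat within `window` preceding slots.

    ranked: (content_id, author_id) pairs, best first.
    """
    remaining = list(ranked)
    result: List[Tuple[str, str]] = []
    while remaining:
        # Authors shown in the last `window` slots (none if window is 0).
        recent = {author for _, author in result[-window:]} if window > 0 else set()
        # First remaining item whose author is not recent; fall back to the best one.
        pick = next((i for i, (_, a) in enumerate(remaining) if a not in recent), 0)
        result.append(remaining.pop(pick))
    return result
```

With `window=1`, ranked authors k1, k1, k2, k1, k3 are reordered to k1, k2, k1, k3, k1, so no author appears twice in a row while the original order is otherwise preserved as far as possible.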

 

For recall, we first followed the ideas of product recommendation and deployed a matrix of parallel strategy chains, organized by the relationships between entities and the related dimensions and wired to the site's business data, opening up the data links across business scenarios. At this stage, algorithm work mainly consisted of adding new strategies and tuning the order and proportion of the recall strategies, solving business coverage problems in the simplest way by scaling out in parallel. Then, to further improve relevance and generalization, we tried replacing heuristic i2i methods such as SimRank with content embedding models, covering both offline and online embedding methods; launched one after the other, they brought improvements of 3% and 5%. Finally, to make sensible use of traffic and simplify the recall strategies, starting with video recommendation we moved to a traffic-pool recall scheme that balances personalization, freshness, diversity, promotional support and other business goals. It solved an earlier problem in which global up-weighting for business support, done to ensure freshness, dragged down overall online metrics, and it brought a 4% online improvement. On this basis, we went on to streamline the lengthy personalized strategy chains inside the traffic pools, using deep models to replace dozens of original recall strategies that could not account for every attribute dimension. The final DSSM + embedding chain performed on par with the original strategy chain, opening a new chapter for further optimization through recall models.
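The parallel strategy chain stage, where most of the tuning was about the order and proportion of recall strategies, can be sketched as a priority-ordered merge with per-channel quotas. The channel names, quotas and the `merge_channels` helper below are hypothetical; the sketch only shows how order and proportion control what reaches ranking.

```python
from typing import List, Sequence, Set, Tuple


def merge_channels(
    channels: Sequence[Tuple[str, List[str], int]],
    total: int,
) -> List[Tuple[str, str]]:
    """Merge parallel recall channels in priority order, each capped by its quota.

    channels: (channel_name, candidates best-first, quota), highest priority first.
    Returns (content_id, channel_name) pairs so later stages can trace the source.
    """
    seen: Set[str] = set()
    merged: List[Tuple[str, str]] = []
    for name, candidates, quota in channels:
        taken = 0
        for cid in candidates:
            if len(merged) >= total:
                return merged  # candidate budget for ranking is full
            if taken >= quota:
                break  # this channel has used up its share
            if cid in seen:
                continue  # already recalled by a higher-priority channel
            seen.add(cid)
            merged.append((cid, name))
            taken += 1
    return merged
```

Raising a channel's quota or moving it earlier in the list directly changes its share of the candidate set. This kind of knob becomes hard to manage once dozens of channels exist, which is part of why the chains were later simplified with traffic pools and DSSM + embedding recall.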

 

 

In ranking, we first adopted the classic product-ranking approach of dense features plus LR: a simple way to launch quickly, with features reusable from the recall pipeline. At the same time, we started by laying out the overall business goals for content and added a dwell-time model on top of CTR; at first, new features and models did lengthen dwell time, but the gain came from CTR growth. After identifying dwell time as a key metric, we added it to the model's optimization objective, introducing it into the objective function through sample reweighting on top of the Wide model, which increased dwell time by 7%. For model structure, following the practice in the Mall scenario, an LR with large-scale offline cross features over behavior sequences brought a 20% online improvement; extending further to the Wide + Deep model that had succeeded in the Mall brought only a 2% online improvement, which we still need to explore in light of the characteristics of the home page business.
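Dwell-time reweighting can be sketched as logistic regression trained on a sample-weighted log loss. Below is a minimal pure-Python version under stated assumptions: the weight function `dwell_weight` (1 + log(1 + dwell) for clicks, 1 for non-clicks), the feature layout and the hyperparameters are all hypothetical; our actual weighting scheme is not described here.

```python
import math
from typing import List, Sequence, Tuple


def dwell_weight(clicked: bool, dwell_seconds: float) -> float:
    """Hypothetical weighting: clicks with longer dwell count more; non-clicks weigh 1."""
    return 1.0 + math.log1p(dwell_seconds) if clicked else 1.0


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def predict(w: List[float], x: List[float]) -> float:
    """LR score; the last entry of w is the bias."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + w[-1])


def train_weighted_lr(
    samples: Sequence[Tuple[List[float], bool, float]],
    lr: float = 0.1,
    epochs: int = 200,
) -> List[float]:
    """Batch gradient descent on sample-weighted log loss.

    samples: (features, clicked, dwell_seconds). Returns weights with bias last.
    """
    dim = len(samples[0][0])
    w = [0.0] * (dim + 1)
    total_weight = sum(dwell_weight(c, d) for _, c, d in samples)
    for _ in range(epochs):
        grad = [0.0] * (dim + 1)
        for x, clicked, dwell in samples:
            # Gradient of weighted log loss w.r.t. the logit is weight * (p - y).
            err = dwell_weight(clicked, dwell) * (predict(w, x) - (1.0 if clicked else 0.0))
            for i, xi in enumerate(x):
                grad[i] += err * xi
            grad[-1] += err
        w = [wi - lr * g / total_weight for wi, g in zip(w, grad)]
    return w
```

In a toy set where two features each get one click and one non-click, an unweighted LR would score them equally; with this weighting, the feature whose click came with 60 seconds of dwell ends up scored above the one with 1 second.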

 

 

Here are some lessons from the whole practice to share with everyone. Compared with the details of the algorithm models, the more important points are:

 

The most important task is data and logging instrumentation. Proper, well-designed instrumentation is extremely important, and monitoring the instrumentation of real-time features for anomalies helps a lot. This is tedious, complex work involving product, client, back-end, QA, data warehouse, BI and algorithm teams.

 

The second priority is choosing samples and models based on the product's form and characteristics; after all, there is no free lunch. We fully appreciate the importance of product design and interaction and model according to product characteristics. User experience feedback must be handled promptly: value every piece of user feedback and every question raised by teammates.

 

The third priority is knowing where to aim our efforts. Effective communication and consensus require understanding the business problems and reaching agreement with business colleagues. In this process, when deciding which business problem to solve first, model interpretability is especially important. In short, we need a clear sense of our own position and value.

 

Finally, the work above is the joint effort of the Mushroom Street recommendation team. Looking back on a year of content ranking algorithm work, the team achieved improvements and met business goals at key milestones, and while supporting more than 30 high-intensity sprints it kept iterating on the technology and growing in many ways. Although there are some regrets and shortcomings, I believe they will push us to go further in the future!

Source: blog.csdn.net/sinat_26811377/article/details/104584523