Ali Fliggy Personalized Search Ranking: Exploration and Practice (2021-06-12)

Introduction: Transportation products (such as air, train, and bus tickets) are more standardized than physical e-commerce goods, and users weigh fewer factors when choosing among them. Yet most of the industry still ranks results with simple rules, such as time, price, or business-logic weighting, which struggle to satisfy users' personalized travel needs. Fliggy has been exploring personalized ranking for transportation search for some time. In this talk, Prime Number from Ali Fliggy shares in detail the practice and innovations behind personalized transportation search ranking, framed around intelligent transportation shopping guidance. The main contents include:

  • Background: Introduction to Transportation Business

  • Challenge: The particularity of transportation business

  • Solution: From Business Rules to Personalized Sorting Models

  • Effect: Model optimization iteration results

  • Summary: Further optimization direction

01 Background: Introduction to Transportation Business

1. The pain points of the transportation industry

From the perspective of the transportation industry itself, current ranking strategies are fairly simple, and most rely on simple rules. Ranking by a single rule cannot efficiently match user needs and struggles to meet users' diverse, personalized travel needs. This is a major industry pain point.

2. Limitations compared with physical e-commerce

Compared with physical e-commerce, the decision-relevant information for transportation products is very limited and the products are highly standardized. The key decision factors, such as travel time and price, are already displayed clearly on the search results list page. Physical e-commerce products, by contrast, carry much more decision-relevant information; the list page can show only part of it, so users must open the detail page to see the factors that really drive the decision. This is a significant difference between the transportation scenario and physical e-commerce.

3. Related theoretical research

Before 2018, the industry had seen some theoretical research, mostly concentrated in academia, with relatively few industrial applications. Most of this work applied traditional linear models and machine learning models to transportation search ranking. In 2018, Amadeus published the paper "Deep Choice Model Using Pointer Networks for Airline Itinerary Prediction", which used a deep sequence model to rank air ticket search results and achieved relatively large improvements on offline datasets.

02 Challenge: The particularity of transportation business

Challenge 1: User differences across the people-goods-scenario dimensions

In terms of people, user behavior is extremely sparse because users travel infrequently, perhaps three to five times a year; users also care more about the service experience; and although the decision factors are few, the decision process is relatively complicated and the decision cycle is very long. In terms of goods, products are dynamic in real time: inventory and prices change in real time, and transportation resources impose real-time constraints. Products are also highly standardized. In terms of scenario, user needs differ greatly from one scenario to another. For example, users on the three terminals, Alipay, the app, and Taobao, are very different in nature. Even across different entry points on the same terminal and across different transportation scenarios, users differ greatly. Delivering personalized ranking that accommodates these differences remains a major challenge.

Challenge 2: Comparison with physical e-commerce: information silos

Comparing search ranking in the transportation scenario with the physical e-commerce scenario from a technical point of view, the first obvious difference is in recall. Traditional physical e-commerce can easily build a user-query-item (UQI) network: because N items can be recalled under the same query, items are connected to one another through queries, which yields a network structure. In the transportation scenario this network is fragmented into isolated information silos. For example, in a search from Hangzhou to Singapore, with one origin and one destination, CA767 can be recalled, but VS251 cannot; VS251 can only be recalled through a route such as Shanghai to London. It is therefore difficult to associate the two products with the user through queries, which makes both product representation and user representation much harder.
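The contrast can be made concrete with a toy query-item graph. The sketch below is illustrative only: the e-commerce queries and item names are invented, the flight numbers and routes simply reuse the example above, and `connected_components` is a hypothetical helper. It counts connected components of the bipartite query-item graph; items that share a query fall into one component, while flights recalled under different origin-destination queries stay isolated.

```python
from collections import defaultdict


def connected_components(query_to_items):
    """Group queries and items into connected components of the bipartite graph."""
    # Build an undirected adjacency list; tag nodes so queries and items never collide.
    adj = defaultdict(set)
    for query, items in query_to_items.items():
        for item in items:
            adj[("q", query)].add(("i", item))
            adj[("i", item)].add(("q", query))

    seen, components = set(), []
    for start in adj:
        if start in seen:
            continue
        # Iterative depth-first search from each unvisited node.
        stack, component = [start], set()
        seen.add(start)
        while stack:
            node = stack.pop()
            component.add(node)
            for neighbor in adj[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        components.append(component)
    return components


# Hypothetical e-commerce recall: the same item appears under several queries.
ecommerce = {
    "running shoes": ["shoe_a", "shoe_b"],
    "sneakers": ["shoe_b", "shoe_c"],
    "trail shoes": ["shoe_c", "shoe_d"],
}

# Transportation recall: each flight appears only under its own route query.
flights = {
    "Hangzhou-Singapore": ["CA767"],
    "Shanghai-London": ["VS251"],
}

ecommerce_components = connected_components(ecommerce)
flight_components = connected_components(flights)
print(len(ecommerce_components), len(flight_components))
```

In the e-commerce toy data every item is reachable from every other item through shared queries, whereas the two flights end up in separate components, which is the silo effect described above.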

03 Solution: From Business Rules to Personalized Sorting Models

1. Ranking system architecture

How do we address these difficulties? The overall online system architecture works as follows. The bottom layer handles log collection and data preprocessing. Data samples are further processed on the Alibaba Cloud ODPS platform, a model is then built in TensorFlow, and the model is deployed online to the TPP environment, where it provides the ranking service.

2. Deep Listwise Model

Why use a Deep Listwise Model (DLM)? It has the following advantages:

  • Diversity of transportation ranking results

  • Simulation of the user's decision-making process

  • Low scoring latency in production

Amadeus reports in the paper that their method achieved good results, including a significant increase in top-N accuracy. The core idea is to score the flight sequence listwise using the Pointer Network structure. Pointer Networks was published at NIPS in 2015; in 2018, Amadeus applied its core idea to air ticket ranking.

3. DCM: Deep Choice Model

In the Encoder stage, an RNN-style sequence network is used; we experimented with three sub-networks, LSTM, biLSTM, and Transformer, and their offline results were close, with no obvious difference. The third stage is the Decoder stage, which plays the role of a decision maker and takes in all the inputs of the sequence above. The Decoder output can be regarded as the process of a user browsing all the flights and then deciding on one. The Attention stage essentially computes the similarity between the Decoder and Encoder vectors, which corresponds to the user choosing the more suitable flight. α is the Attention weight, that is, the score for each flight in the sequence. In online experiments, the overall conversion rate increased somewhat, but not by a large margin.
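The Attention step can be sketched as follows. This is a minimal illustration, not the production model: it assumes plain dot-product similarity followed by a softmax, whereas the actual DCM may use a learned scoring function, and the vectors and the function name `attention_scores` are made up for the example.

```python
import math


def attention_scores(decoder_state, encoder_states):
    """Return softmax attention weights (alpha) of one decoder state over flights."""
    # Similarity between the decoder vector and each encoded flight (dot product).
    logits = [
        sum(d * e for d, e in zip(decoder_state, state))
        for state in encoder_states
    ]
    # Numerically stable softmax: subtract the max logit before exponentiating.
    max_logit = max(logits)
    exps = [math.exp(x - max_logit) for x in logits]
    total = sum(exps)
    return [x / total for x in exps]


# Hypothetical encoder outputs, one vector per flight in the result list.
encoder_states = [
    [0.2, 0.1, 0.0],
    [0.9, 0.4, 0.3],
    [0.1, 0.8, 0.2],
]
# Hypothetical decoder state summarizing the whole flight sequence.
decoder_state = [1.0, 0.5, 0.0]

alpha = attention_scores(decoder_state, encoder_states)
# Rank flights by descending attention weight.
ranking = sorted(range(len(alpha)), key=lambda i: alpha[i], reverse=True)
print(alpha, ranking)
```

Because all flights in the list are scored in a single pass against the same decoder state, each score depends on the list as a whole, which is what makes the approach listwise.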

4. PFRN: Personalized Flight Ranking Network

The results of the first DCM version were encouraging. Building on that exploration, we further optimized the approach and proposed the PFRN model, which has been published at CIKM'20. The model is a classic two-tower structure: the left tower represents the flight sequence, and the right tower represents the user's behavior sequence. The upper layer applies attention between the two sequences to capture the user's preference for, or interest in, the flights in the sequence. We also propose the Listwise Feature Encoding (LFE) sequence encoding structure, which is a relatively big innovation.

① PFRN: Listwise Feature Encoding (LFE)

② PFRN: How to alleviate the sparsity of user behavior

The second problem the model must solve is how to alleviate the sparsity of user behavior. Our current approach is relatively simple: users are divided into six groups according to business rules, and each user is mapped to one of the groups. When representing user behavior, the model uses the group's behavior in addition to the individual user's behavior. For example, if the user has travel or business-trip intent, the purchasing behavior of that user group can be merged into the user's own behavior. Experiments show that this integration greatly improves the overall ranking effect.
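One way to picture this integration is as a blend of an individual user's behavior representation with that of their group. The sketch below is an assumption-laden illustration, not the PFRN implementation: it shows only two groups rather than six, and the group names, feature vectors, function name `blend_with_group`, and count-based blending weight with smoothing constant `k` are all hypothetical.

```python
def blend_with_group(user_vector, user_event_count, group_vector, k=5.0):
    """Blend a user's behavior vector with the group's vector.

    The weight on the individual vector grows with the number of observed
    events, so sparse users lean on the group and active users on themselves.
    """
    if len(user_vector) != len(group_vector):
        raise ValueError("user and group vectors must have the same length")
    weight = user_event_count / (user_event_count + k)
    return [
        weight * u + (1.0 - weight) * g
        for u, g in zip(user_vector, group_vector)
    ]


# Hypothetical group-level preference vectors (two illustrative features).
group_vectors = {
    "business": [0.2, 0.9],
    "leisure": [0.8, 0.3],
}

# A user with no recorded events falls back entirely to the group vector.
cold_user = blend_with_group([0.0, 0.0], 0, group_vectors["business"])
# A user with 15 events keeps 75% of their own signal when k = 5.
active_user = blend_with_group([1.0, 0.0], 15, group_vectors["business"])
print(cold_user, active_user)
```

In a neural model the same idea would more likely appear as an extra group behavior sequence fed to the user tower; the weighted average here is only the simplest form of borrowing signal from the group.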

04 Effect: Model optimization iteration results

Three types of models were compared. The first is the rule-based Cheapest baseline, which sorts by lowest price; the second is traditional machine learning models; the third is models from search-ranking papers of recent years. In online experiments, the overall conversion rate increased by nearly 4%.
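For reference, the rule-based Cheapest baseline and a top-N accuracy style metric can be sketched in a few lines. The data and helper names below (`rank_cheapest`, `top_n_accuracy`) are hypothetical and only illustrate how such a baseline might be evaluated offline; they are not the evaluation code behind the results reported here.

```python
def rank_cheapest(flights):
    """Rule-based baseline: order flight IDs by ascending price."""
    return [f["id"] for f in sorted(flights, key=lambda f: f["price"])]


def top_n_accuracy(sessions, ranker, n):
    """Fraction of sessions whose purchased flight is in the ranker's top n."""
    hits = 0
    for session in sessions:
        ranked = ranker(session["flights"])
        if session["purchased"] in ranked[:n]:
            hits += 1
    return hits / len(sessions)


# Hypothetical search sessions: candidate flights and the one actually bought.
sessions = [
    {
        "flights": [
            {"id": "A1", "price": 520},
            {"id": "A2", "price": 480},
            {"id": "A3", "price": 610},
        ],
        "purchased": "A2",
    },
    {
        "flights": [
            {"id": "B1", "price": 300},
            {"id": "B2", "price": 450},
            {"id": "B3", "price": 390},
        ],
        "purchased": "B2",
    },
]

top1 = top_n_accuracy(sessions, rank_cheapest, n=1)
top3 = top_n_accuracy(sessions, rank_cheapest, n=3)
print(top1, top3)
```

Any other ranker, including a learned model wrapped in a function with the same signature, can be passed in place of `rank_cheapest` to compare against the baseline on the same sessions.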

05 Summary: Further optimization direction

Based on the work so far, we have built an initial transportation search ranking system and achieved some business results. For user representation, in addition to users' long-term and short-term behaviors, we introduce user group behavior as data augmentation to alleviate the sparsity of individual behavior.

Future work may cover three areas:

  • Deeper understanding of travel intent: for example, the division of group user behavior and the user grouping itself should be further refined;

  • Modeling the sparsity of user behavior, which has a large impact on the overall ranking effect;

  • Overall ranking strategy: the number of naturally recalled products is limited. Adding more recommendation slots, introducing rich transportation-related content through content operations, and handling new product formats and mixed ranking of multi-source information are all relatively big challenges for us.


Speaker:

Prime number

Ali Fliggy | Shopping Guide Algorithm Team

He joined the Fliggy Technology Department in 2016 and was deeply involved in user intent prediction, query recommendation, and personalized ranking for Fliggy's global search project. He currently focuses on building the shopping guide algorithm system for intelligent transportation.

Reprinted: Ali Fliggy's Personalized Search Ranking Exploration Practice

Origin blog.csdn.net/yangbindxj/article/details/123911936