How to integrate Embeddings into a traditional machine learning framework? (Reprinted for learning)

Reprinted for learning

Original link: https://blog.csdn.net/xixiaoyaoww/article/details/111412968

References: https://zhuanlan.zhihu.com/p/162163054 and https://zhuanlan.zhihu.com/p/320196402 (a recent article by Shitashi)

LR is itself a classic CTR model, widely used in recommendation and advertising systems, where most input features are discrete IDs or their crosses. So, assuming deep learning models (DNNs) are off the table, how can Embedding technology be integrated into the LR framework? Let's look at how Dr. Shitashi from Tsinghua University answered this question.

The practical meaning of the question

In fact, this question can be extended a bit: how do you use Embedding information in traditional machine learning algorithms such as LR and GBDT?

This question is not contrived; it has real practical value. DNNs remain as popular as ever and have basically become the standard algorithm for recommendation and search systems, while traditional machine learning algorithms such as LR and GBDT have been left out in the cold. As for why DNNs have risen to dominance, see my article "Making Something Out of Nothing: On the Embedding Idea in Recommendation Algorithms" [1].

However, DNNs have a fatal weakness: they are hard to put into production. During training, modelers pile on every fancy structure they can (attention, transformers, capsules, anything that fits), happily watching the offline metrics climb, while completely ignoring the back-end engineer next to them gritting his teeth. The more complex the model, the offline and online metrics may not improve, but the online latency will certainly grow. This strains the relationship between the algorithm team and their back-end colleagues (why make life hard for fellow workers?), and a model with perfect offline metrics may never get the chance to go online at all. Although online serving frameworks such as TF Serving exist, they are not usable out of the box and need a round of performance tuning before they can meet online real-time requirements.

Therefore, if you work in a small team whose back-end engineering muscle is limited, putting a DNN online becomes a problem. This is where traditional LR and GBDT have their advantage. If all features are ID features (real-valued features are bucketed into ID features as well), then online LR inference reduces to "look up the weight in a table, then accumulate"; even the multiplications are saved, so real-time performance is naturally guaranteed.
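To make this concrete, here is a minimal sketch of what "look up the table to take the weight, then accumulate" means for serving LR over pure ID features. The feature names and weights are made up for illustration; they are not from any real system:

```python
import math

# With one-hot ID features, the dot product w·x reduces to looking up the
# weight of each active feature and summing them -- no multiplications needed.
weights = {                     # feature_id -> learned weight (illustrative values)
    "user_gender=f": 0.31,
    "item_cat=shoes": -0.12,
    "age_bucket=25-34": 0.08,
    "bias": -1.5,
}

def lr_score(active_features):
    """CTR estimate = sigmoid of the summed weights of the active ID features."""
    z = weights["bias"] + sum(weights.get(f, 0.0) for f in active_features)
    return 1.0 / (1.0 + math.exp(-z))

print(lr_score(["user_gender=f", "item_cat=shoes", "age_bucket=25-34"]))
```

Note that adding a dense 64-dimensional vector to this would reintroduce 64 multiply-adds per request and break the sparse-lookup pattern, which is exactly the objection raised below.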

However, what if you want to have your cake and eat it too: keep a simple traditional machine learning algorithm, while also using Embedding to gain generalization ability? Apologies for the long preamble; the point was to explain the practical significance of this question and draw attention to it.

Using the Embedding directly is not recommended

First of all, if your main framework is a traditional machine learning algorithm, then the Embedding cannot be learned end-to-end; it has to be learned offline by a separate algorithm. For example, you can first run DeepWalk over users' purchase sequences to learn good item embeddings offline.
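As a simplified sketch of that offline step: DeepWalk proper would first run random walks over an item graph, but the item2vec flavor of the same idea, feeding purchase sequences straight into skip-gram Word2Vec, conveys how the embeddings are trained outside the LR/GBDT framework. The sequences below are made-up placeholders:

```python
from gensim.models import Word2Vec

# Made-up purchase sequences, one per user; each item id acts as a "word".
purchase_sequences = [
    ["item_12", "item_7", "item_99"],
    ["item_7", "item_34", "item_12", "item_5"],
    ["item_99", "item_5", "item_34"],
]

model = Word2Vec(
    sentences=purchase_sequences,
    vector_size=64,   # embedding dimension
    window=5,
    min_count=1,
    sg=1,             # skip-gram, as used by DeepWalk
    epochs=20,
)

item_embedding = model.wv["item_12"]   # 64-dim vector, learned offline
```

The key point is that this training happens in a separate offline job; LR/GBDT only ever sees the resulting vectors, or, as argued below, indicators derived from them.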

The second question is how traditional machine learning should use these embeddings. The simplest method is, of course, to use them directly: taking a 64-dimensional vector as input is equivalent to adding 64 real-valued features to LR. However, I do not recommend this method:

  • The reason for using LR online in the first place is its ability to handle high-dimensional, sparse ID features, with online computation reduced to the fast and convenient "look up the table and accumulate the weights". Once you feed it dense features such as vectors, this advantage of LR disappears. Worse, some embeddings, such as image embeddings, can have thousands of dimensions; they destroy the sparsity, and storing and computing them online becomes very difficult.
  • An Embedding fed to LR is computed offline, is a black box, and has poor interpretability, whereas the whole point of using LR is that it is interpretable and easy to debug.
  • In addition, Embeddings are not stable: the offline program that computes them may need to be upgraded, and once it is, all previously accumulated training samples are invalidated, because the old and new Embeddings are certainly not in the same coordinate system and cannot be mixed.

Indicators derived from the Embedding are recommended instead

Therefore, I do not recommend feeding the Embedding into LR directly. In my view, the correct approach is to start from the offline-generated Embedding, derive a series of indicators that measure the relevance of a <user, item> pair, and then feed these derived indicators to LR. This approach is not my invention; it has a provenance and has been tested in practice: Airbnb's "Real-time Personalization using Embeddings for Search Ranking at Airbnb" [2] uses exactly this method to feed offline-computed Embeddings into their GBDT ranking model.

For the detailed algorithm, please read Section 4.4 of the Airbnb paper. Briefly, Airbnb's approach is as follows:

1. As a premise, Airbnb has already computed the listing (house) embeddings offline.

2. Collect the user's history from multiple angles:

  1. For example, Hc is the set of listings the user clicked in the past 2 weeks,
  2. Hs is the set of listings shown to the user but skipped,
  3. Hw is the set of listings the user wishlisted,
  4. Hb is the set of listings the user booked, ...

3. Average the embeddings of all listings in one of the above sets and treat the result as the user's embedding under that behavior (click, skip, wishlist, book, ...); a sketch of this and the next two steps follows the list.

4. Compute the cosine similarity between the user embedding under a given behavior and the embedding of the candidate listing being ranked, and treat it as the user's propensity to perform that action (click, skip, wishlist, book, ...) on the candidate listing.

5. Use each such "propensity to perform an action" score as a real-valued feature and feed it to GBDT to train the ranking model.

6. In addition to the user's long-term interests above (the H* sets are collected over a window of weeks), Airbnb also computes the similarity between the candidate listing's embedding and the embedding of the listing the user clicked most recently, to characterize the user's short-term interest.
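Below is a rough sketch of steps 3-5 in Python. The stand-in embeddings are random and the behavior sets are made up; it only illustrates the shape of the computation, not Airbnb's actual pipeline:

```python
import numpy as np

# Stand-in listing embeddings; in practice these come from the offline step.
listing_emb = {lid: np.random.rand(32) for lid in range(100)}

def user_embedding(history_ids):
    """User embedding under one behavior = mean of those listings' embeddings."""
    return np.mean([listing_emb[i] for i in history_ids], axis=0)

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Behavior sets collected from the user's history (step 2), ids are placeholders.
H = {
    "clicked": [3, 17, 42],    # Hc
    "skipped": [8, 9],         # Hs
    "wishlisted": [42, 55],    # Hw
    "booked": [17],            # Hb
}

candidate = 77  # listing currently being ranked

# One real-valued feature per behavior: the user's propensity to perform
# that action on the candidate listing. These feed the GBDT ranker.
features = {
    f"sim_{name}": cosine(user_embedding(ids), listing_emb[candidate])
    for name, ids in H.items()
}
print(features)
```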

All the indicators Airbnb derives from the listing embeddings are listed in Table 6 of the paper (see the original blog).

To sum up

  • Using Embedding in traditional machine learning is a question with real practical significance, especially when you want to avoid the complicated process of putting a DNN model online but still want the generalization gains that Embedding brings.
  • When using Embedding in a traditional machine learning model, I do not recommend feeding the Embedding in directly; instead, use indicators derived from the Embedding.
