Paper Reading: Wide & Deep Learning for Recommender Systems

This article is a reading note on the paper Wide & Deep Learning for Recommender Systems, published by Google in 2016.

ABSTRACT

For regression and classification problems, there are two broad families of models: wide and deep. A wide model is usually a linear model over many input features, with cross-features that introduce nonlinearity (hence "wide"). A deep model is based on a neural network.

Each has pros and cons. What are the benefits of the wide model? How the features interact with each other is clear at a glance, i.e. interpretability is good. The downside? Feature engineering is laborious, and patterns absent from the historical data cannot be learned. What are the benefits of the deep model? It is more general: because it works on embeddings of the query and item, it can learn feature combinations never seen in training. The downside? It may over-generalize and recommend irrelevant items.

The model proposed in this paper fuses the wide and deep models together, so that the two constrain each other and the combined model takes the advantages of both.

How are they combined? How are they jointly trained? Why is the result better than a linear model or deep model alone? These are the most interesting points of this paper.

This paper also describes deployment from an engineering perspective, which is also worth learning from.

INTRODUCTION

In this part, the author further explains several points mentioned in the abstract.

One challenge in recommender systems is to achieve both memorization and generalization.

For memorization and generalization, there are explanations in the paper:

Memorization can be loosely defined as learning the frequent co-occurrence of items or features and exploiting the correlation available in the historical data.

Generalization, on the other hand, is based on transitivity of correlation and explores new feature combinations that have never or rarely occurred in the past.

My understanding: memorization summarizes the past, while generalization discovers the unknown.

This paper presents the Wide & Deep learning framework to achieve both memorization and generalization in one model, by jointly training a linear model component and a neural network component.

RECOMMENDER SYSTEM OVERVIEW

Here the author gives a concise and comprehensive overview of the recommendation system.

The first stage is retrieval. Because there are far too many candidates to compute a score for each one, the system first filters and narrows down the candidate set. This step is usually done with simple models or rules.
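To make the retrieval idea concrete, here is a toy sketch: cheap rules narrow the full corpus to a small candidate set before any expensive scoring. The category-matching rule and the item layout are made up for illustration; the paper does not specify the actual rules.

```python
# Toy retrieval step: filter the corpus down to a small candidate set
# with a cheap rule (here: category membership) before ranking.
def retrieve(corpus, user_categories, limit=100):
    candidates = [item for item in corpus if item["category"] in user_categories]
    return candidates[:limit]

corpus = [{"id": 1, "category": "games"},
          {"id": 2, "category": "news"},
          {"id": 3, "category": "games"}]
print([item["id"] for item in retrieve(corpus, {"games"})])  # [1, 3]
```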

Next is ranking, which assigns each candidate a score.

WIDE & DEEP LEARNING

Next, we will introduce how this model is constructed.


1. The wide component

A linear model that generates cross-features via cross-product transformations.
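A minimal sketch of the cross-product transformation: a cross-feature fires (value 1) only when all of its constituent binary features fire, which is what lets the linear model memorize specific feature co-occurrences. The feature names below are made-up stand-ins, not the paper's actual features.

```python
def cross_product(features, pairs):
    """Return cross-features for the given (name_a, name_b) pairs:
    a cross-feature is 1 only when both constituent features are 1."""
    return {f"{a}_AND_{b}": features.get(a, 0) * features.get(b, 0)
            for a, b in pairs}

user = {"installed_netflix": 1, "impression_pandora": 1, "gender_male": 0}
crossed = cross_product(user, [("installed_netflix", "impression_pandora"),
                               ("installed_netflix", "gender_male")])
print(crossed)
# {'installed_netflix_AND_impression_pandora': 1, 'installed_netflix_AND_gender_male': 0}
```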

2. The deep component

A feed-forward neural network.

The figure in the paper shows this clearly: continuous features are fed in directly, while categorical features go through embeddings.
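That forward pass can be sketched with plain numpy: an embedding lookup for the categorical feature, concatenation with the continuous features, then a ReLU hidden layer. The vocabulary size, the number of continuous features, and the single hidden layer are illustrative assumptions; only the 32-dimensional embedding size comes from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 1000 categorical IDs, 32-dim embeddings (as in the
# paper), 3 continuous features, one 64-unit ReLU hidden layer.
emb_table = rng.normal(size=(1000, 32))
W1 = rng.normal(size=(32 + 3, 64)); b1 = np.zeros(64)
W2 = rng.normal(size=64);           b2 = 0.0

def deep_forward(cat_id, continuous):
    # Embedding lookup for the categorical ID; continuous features pass through.
    x = np.concatenate([emb_table[cat_id], continuous])
    h = np.maximum(0.0, x @ W1 + b1)   # ReLU hidden layer
    return float(h @ W2 + b2)          # raw logit of the deep tower

logit = deep_forward(cat_id=42, continuous=np.array([0.5, 0.1, 0.9]))
```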

3. Joint training

The loss function is the log loss.

The wide part is optimized with FTRL with L1 regularization, and the deep part with AdaGrad.
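The key point of joint training is that both parts contribute to one logit and are updated from the same log-loss gradient. The sketch below shows that structure only; it uses plain SGD for both parts, not the paper's FTRL/AdaGrad choice, and treats the deep tower's output as a fixed feature vector for brevity.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)
w_wide = np.zeros(4)                  # weights over (crossed) wide features
w_deep = rng.normal(size=4) * 0.1     # stand-in for the deep output layer

def predict(x_wide, h_deep):
    # Joint model: the wide and deep logits are summed, then squashed.
    return sigmoid(x_wide @ w_wide + h_deep @ w_deep)

x_wide = np.array([1.0, 0.0, 1.0, 0.0])
h_deep = np.array([0.2, -0.1, 0.4, 0.3])
y = 1.0
for _ in range(100):
    p = predict(x_wide, h_deep)
    g = p - y                         # gradient of log-loss w.r.t. the logit
    w_wide -= 0.1 * g * x_wide        # both parts updated from the same error
    w_deep -= 0.1 * g * h_deep
```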

SYSTEM IMPLEMENTATION

This part covers the details of building and deploying the model; there are a few interesting points.

1. Data generation

Categorical features are mapped to IDs, and continuous features are normalized to [0, 1] via their cumulative distribution.
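A minimal sketch of that normalization, assuming it is implemented as the empirical CDF: a continuous value x maps to the fraction of training values less than or equal to x, which always lands in [0, 1]. (The paper uses quantile boundaries; this is the same idea at per-value granularity.)

```python
from bisect import bisect_right

def fit_cdf(train_values):
    """Fit an empirical CDF on training data; the returned function
    maps a value to the fraction of training values <= it."""
    ordered = sorted(train_values)
    def cdf(x):
        return bisect_right(ordered, x) / len(ordered)
    return cdf

cdf = fit_cdf([3, 1, 4, 1, 5, 9, 2, 6])
print(cdf(4))   # 0.625  (5 of the 8 training values are <= 4)
```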

2. Model training

Here, each categorical feature is mapped to a 32-dimensional embedding, then concatenated with the continuous features to form a vector of about 1200 dimensions.

It is worth noting that for new data, the paper adopts a warm-starting method: the embeddings and linear-model weights of the previous model are used as the initial parameters of the new model.
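The warm-start idea can be sketched as copying overlapping parameters from the old model into the new one, leaving genuinely new parameters at their fresh initialization. The dict-of-arrays layout and parameter names here are hypothetical, not the paper's actual storage format.

```python
import numpy as np

def warm_start(new_params, old_params):
    """Copy parameters that exist in both models (same name and shape)
    from the old model into the new one; leave the rest untouched."""
    for name, old_value in old_params.items():
        if name in new_params and new_params[name].shape == old_value.shape:
            new_params[name] = old_value.copy()
    return new_params

old = {"emb": np.ones((10, 32)), "wide_w": np.full(5, 0.3)}
new = {"emb": np.zeros((10, 32)), "wide_w": np.zeros(5), "extra": np.zeros(3)}
new = warm_start(new, old)   # "emb" and "wide_w" inherited, "extra" stays fresh
```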

3. Model serving

Serving latency is kept under 10 ms by scoring candidates in parallel across multiple threads.
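The parallel-scoring idea can be sketched with a thread pool: the candidate batch is split across workers so total latency stays low. The `score` function below is a trivial stand-in for the real model forward pass, and the pool size is an assumption.

```python
from concurrent.futures import ThreadPoolExecutor

def score(candidate):
    return candidate * 2   # placeholder for a real model forward pass

def rank(candidates, workers=4):
    # Score candidates in parallel, then sort highest-score first.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        scores = list(pool.map(score, candidates))
    return sorted(zip(candidates, scores), key=lambda t: -t[1])

ranked = rank([3, 1, 2])
print(ranked)   # [(3, 6), (2, 4), (1, 2)]
```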

EXPERIMENT RESULT

From the experiments, in offline AUC the deep model alone is not as good as the linear model, and wide & deep is the best; the online improvement is even more pronounced.
