Fifth check-in

Introduction to DIN Model

DIN stands for Deep Interest Network, a model proposed by Alibaba in 2018 to address a limitation of earlier deep learning models: they cannot express the diversity of a user's interests. DIN computes a representation vector of user interest by considering both the given candidate ad and the user's historical behavior. Specifically, it introduces a local activation unit that performs a soft search over the historical behaviors to attend to the parts relevant to the candidate, and uses a weighted sum to obtain a representation of the user's interest with respect to that candidate. Behaviors more relevant to the candidate ad receive higher activation weights and dominate the interest representation, so the representation vector differs across candidate ads, which greatly improves the expressive power of the model. This also makes the model well suited to news recommendation: here we estimate the user's interest in an article from the correlation between the current candidate article and the articles the user has clicked in the past. The structure of the model is as follows:
[Figure: DIN model architecture]
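The core idea described above, a local activation unit that scores each historical behavior against the candidate and a weighted sum that pools them, can be sketched with NumPy. This is a toy illustration, not DIN's actual implementation: the real activation unit is a small MLP over the concatenation of the two embeddings and their interactions, and DIN does not necessarily softmax-normalize the scores; here a dot product and softmax stand in for it.

```python
import numpy as np

# Toy dimensions: 4 historical clicks, embedding size 3 (illustrative values).
np.random.seed(0)
hist_emb = np.random.rand(4, 3)   # embeddings of the user's clicked articles
cand_emb = np.random.rand(3)      # embedding of the candidate article

# Simplified local activation unit: one relevance score per historical click.
scores = hist_emb @ cand_emb
weights = np.exp(scores) / np.exp(scores).sum()  # softmax normalization

# Weighted sum: behaviors more relevant to the candidate get larger weights
# and dominate the resulting interest vector, so the user representation
# changes with each candidate article.
user_interest = weights @ hist_emb
print(user_interest.shape)  # (3,)
```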
def DIN(dnn_feature_columns, history_feature_list, dnn_use_bn=False, dnn_hidden_units=(200, 80), dnn_activation='relu', att_hidden_size=(80, 40), att_activation='dice', att_weight_normalization=False, l2_reg_dnn=0, l2_reg_embedding=1e-6, dnn_dropout=0, seed=1024, task='binary'):

dnn_feature_columns: feature columns, a list containing all features of the data
history_feature_list: a list of the features that reflect the user's historical behavior
dnn_use_bn: whether to use BatchNormalization
dnn_hidden_units: the number of layers, and of neurons per layer, in the fully connected network, as a list or tuple
dnn_activation: activation type of the fully connected network
att_hidden_size: the number of layers, and of neurons per layer, in the fully connected network inside the attention layer
att_activation: activation type of the attention layer
att_weight_normalization: whether to normalize the attention scores
l2_reg_dnn: L2 regularization coefficient of the fully connected network
l2_reg_embedding: L2 regularization coefficient of the embedding vectors
dnn_dropout: dropout rate of the neurons in the fully connected network
task: the task, which can be either classification or regression

To use the model, we must pass in the feature columns and the historical behavior columns, but the feature columns need to be preprocessed first. The details are as follows:

First, we process the data set to obtain the data. Since we predict whether the user will click on the current article from the user's past behavior, we divide the feature columns into three parts: numerical features, discrete features, and historical behavior features. The DIN model handles each part differently.
For discrete features, that is, the categorical features in our data set such as user_id, we first pass each one through an embedding layer to obtain a low-dimensional dense representation. Since embedding is required, we need to build a vocabulary over the values of each categorical column and specify an embedding dimension. Therefore, when preparing data for deepctr's DIN model, we declare these categorical features with the SparseFeat function; its arguments are the column name, the number of unique values in the column (used to build the vocabulary), and the embedding dimension.
For the user historical behavior feature columns, such as article id and article category, we also go through embedding first, but unlike the above, after obtaining the embedding representation of each feature we pass it through an attention layer that computes the correlation between the user's historical behaviors and the current candidate article, yielding an embedding vector for the current user. This vector reflects the user's interest through the similarity between the candidate article and the articles the user has clicked in the past, and it varies with the user's click history, dynamically modeling how the user's interest evolves. For each user this feature is a historical behavior sequence, and its length differs from user to user: some users have clicked many articles, others few, so we still need to unify the length. When preparing data for the DIN model, we first declare these categorical features with SparseFeat, and then wrap them with the VarLenSparseFeat function, which pads the sequences so that every user's history has the same length; its parameters therefore include a maxlen that specifies the maximum sequence length.
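The length-unification step can be sketched in plain Python. This mirrors what `tf.keras.preprocessing.sequence.pad_sequences` does before feeding a VarLenSparseFeat; `pad_histories` is a hypothetical helper written here only for illustration.

```python
# Pad (or truncate) each user's click history to exactly maxlen entries,
# pre-filling with pad_value (0, which deepctr treats as the mask value).
def pad_histories(histories, maxlen, pad_value=0):
    padded = []
    for seq in histories:
        seq = list(seq)[-maxlen:]  # keep only the most recent maxlen clicks
        padded.append([pad_value] * (maxlen - len(seq)) + seq)
    return padded

clicks = [[3, 7], [5, 1, 9, 2, 8]]  # article ids clicked by two users
print(pad_histories(clicks, maxlen=4))
# [[0, 0, 3, 7], [1, 9, 2, 8]]
```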
For continuous feature columns, we only need the DenseFeat function, specifying the column name and the dimension.
After the feature columns are processed, we match the corresponding data to the columns to obtain the final model input.
Let's get a feel for the concrete code. The logic is as follows: first write a data-preparation function that follows the steps above to produce the data and the feature columns, then build and train the DIN model, and finally use the trained model for prediction.
