Deep learning recommendation system (7) NFM model and its application on Criteo dataset

1 NFM model principle and its implementation

1.1 Principle of NFM model

Both FM and its improved variant FFM are, in the end, models of second-order feature crossing. Because of the combinatorial explosion problem, it is practically impossible to extend FM to third-order or higher interactions, which inevitably limits the expressive power of FM-style models.

Scholars from the National University of Singapore used the nonlinear, highly expressive modeling capability of neural networks to improve FM, producing an enhanced version of the FM model: the NFM (Neural Factorization Machine) model.

As shown in the figure below, in mathematical form the main idea of the NFM model is to replace the inner product of the second-order latent vectors in the original FM with a more expressive function.

[Figure: NFM replaces FM's second-order inner-product term with a more expressive function f(x)]
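
For reference, in the standard FM / NFM notation the two prediction formulas are:

\hat{y}_{FM}(x) = w_0 + \sum_{i=1}^{n} w_i x_i + \sum_{i=1}^{n} \sum_{j=i+1}^{n} \langle v_i, v_j \rangle x_i x_j

\hat{y}_{NFM}(x) = w_0 + \sum_{i=1}^{n} w_i x_i + f(x)

That is, NFM keeps the first-order part and replaces the whole second-order inner-product term with a learned function f(x), implemented as a neural network on top of the Bi-Interaction pooling layer.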

This more expressive function is a neural network. Since a neural network can, in theory, fit arbitrarily complex functions, the authors implement f(x) as a neural network. It is not a plain DNN, though: the lower layers still model feature crossing explicitly, and a DNN is stacked on top of them; this combination is the NFM network.

1.1.1 Structure of the deep network part of NFM

  • The defining characteristic of the NFM architecture is that a feature cross pooling layer (Bi-Interaction Pooling Layer) is added between the Embedding layer and the multi-layer neural network.

  • The NFM architecture diagram shown here omits the first-order part. If the first-order part of NFM is regarded as a linear model, the NFM architecture can also be viewed as an evolution of the Wide & Deep model: compared with the original Wide & Deep model, NFM adds a feature cross pooling layer to the Deep part to strengthen feature crossing (a sketch of re-adding the omitted first-order part is given after the figure below).

[Figure: NFM deep network architecture: Embedding layer, Bi-Interaction Pooling layer, hidden layers, output]
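
The reference implementation in section 1.2 below also leaves out the first-order part. Purely as a rough sketch (the class name and wiring are my assumptions, not the original author's code), the first-order linear term could be combined with a deep part that outputs a single logit like this:

import torch
import torch.nn as nn

class NFMWithFirstOrder(nn.Module):
    """Sketch only: add a first-order (linear) term next to the deep part.

    Assumes `deep_part` maps the raw input x to one logit per sample
    (i.e. its final sigmoid, if any, has been removed).
    """
    def __init__(self, num_features, deep_part: nn.Module):
        super().__init__()
        # first-order term: w_0 + sum_i w_i * x_i
        self.linear = nn.Linear(num_features, 1)
        self.deep_part = deep_part

    def forward(self, x):
        first_order = self.linear(x)      # (batch_size, 1)
        higher_order = self.deep_part(x)  # (batch_size, 1)
        return torch.sigmoid(first_order + higher_order)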

1.1.2 Feature cross pooling layer

[Figure: feature cross pooling layer (Bi-Interaction Pooling)]

  • The element-wise product is taken for every pair of Embedding vectors, and the resulting cross-feature vectors are summed to obtain the output vector of the pooling layer (see the sketch after this list).

  • This vector is then fed into the upper multi-layer fully connected neural network (DNN) for further feature crossing.
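
As an illustration only (the function name and shapes are assumptions, not part of the original code), the pairwise form of this pooling over field embeddings of shape (batch_size, num_fields, embed_dim) could be written as:

import torch

def bi_interaction_pooling_naive(embeds: torch.Tensor) -> torch.Tensor:
    """Sum of element-wise products over all pairs of field embeddings.

    embeds: (batch_size, num_fields, embed_dim)
    returns: (batch_size, embed_dim)
    """
    num_fields = embeds.shape[1]
    out = torch.zeros_like(embeds[:, 0, :])
    for i in range(num_fields):
        for j in range(i + 1, num_fields):
            out = out + embeds[:, i, :] * embeds[:, j, :]  # element-wise product of one pair
    return out

The implementation in section 1.2 computes the same quantity with the simplified "square of sum minus sum of squares" formula, which avoids the explicit double loop.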

1.2 Implementation of NFM model

The core of the NFM implementation is the feature cross pooling layer; the original pooling formula is simplified as follows:

[Figure: simplified Bi-Interaction pooling formula]
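
The simplification is the standard "square of sum minus sum of squares" identity, written out here for reference:

f_{BI}(V_x) = \sum_{i=1}^{n} \sum_{j=i+1}^{n} x_i v_i \odot x_j v_j = \frac{1}{2} \left[ \left( \sum_{i=1}^{n} x_i v_i \right)^2 - \sum_{i=1}^{n} (x_i v_i)^2 \right]

where the squares are taken element-wise. The `embed_cross` computation in the code below implements exactly the right-hand side.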

import torch.nn as nn
import torch.nn.functional as F
import torch

class Dnn(nn.Module):
    """
    Dnn 网络
    """
    def __init__(self, hidden_units, dropout=0.):
        """
        hidden_units: 列表, 每个元素表示每一层的神经单元个数, 、
                      比如[256, 128, 64], 两层网络, 第一层神经单元128, 第二层64, 第一个维度是输入维度
        dropout: 失活率
        """
        super(Dnn, self).__init__()

        self.dnn_network = nn.ModuleList(
            [nn.Linear(layer[0], layer[1]) for layer in list(zip(hidden_units[:-1], hidden_units[1:]))])

        self.dropout = nn.Dropout(p=dropout)

    def forward(self, x):
        for linear in self.dnn_network:
            x = linear(x)
            x = F.relu(x)

        x = self.dropout(x)
        return x



class NFM(nn.Module):

    def __init__(self, feature_info, hidden_units, embed_dim=8):
        """
               DeepCrossing:
                   feature_info: 特征信息(数值特征, 类别特征, 类别特征embedding映射)
                   hidden_units: 列表, 隐藏单元
                   dropout: Dropout层的失活比例
                   embed_dim: embedding维度
               """
        super(NFM, self).__init__()

        self.dense_features, self.sparse_features, self.sparse_features_map = feature_info

        # Embedding layers: one Embedding per categorical feature, stored in a ModuleDict
        self.embed_layers = nn.ModuleDict(
            {
                'embed_' + str(key): nn.Embedding(num_embeddings=val, embedding_dim=embed_dim)
                for key, val in self.sparse_features_map.items()
            }
        )

        # Note: the DNN input dimension = number of dense features + embed_dim,
        # because the Bi-Interaction pooling collapses all sparse embeddings into a single embed_dim vector
        dim_sum = len(self.dense_features) + embed_dim
        hidden_units.insert(0, dim_sum)

        # batch normalization
        self.bn = nn.BatchNorm1d(dim_sum)

        # DNN network
        self.dnn_network = Dnn(hidden_units)

        # final linear layer after the DNN
        self.dnn_final_linear = nn.Linear(hidden_units[-1], 1)

    def forward(self, x):
        # 1. Split the input x into a dense part and a sparse part, since numerical
        #    and categorical features are processed differently
        dense_input, sparse_inputs = x[:, :len(self.dense_features)], x[:, len(self.dense_features):]
        # 2. Convert the sparse inputs to long (integer indices for the embedding lookup)
        sparse_inputs = sparse_inputs.long()

        # 3. Embed each categorical feature separately: a list of (batch_size, embed_dim) tensors
        sparse_embeds = [
            self.embed_layers['embed_' + key](sparse_inputs[:, i]) for key, i in
            zip(self.sparse_features_map.keys(), range(sparse_inputs.shape[1]))
        ]
        # 4. Stack the embeddings
        sparse_embeds = torch.stack(sparse_embeds)  # (num_sparse_features, batch_size, embed_dim)
        sparse_embeds = sparse_embeds.permute((1, 0, 2))  # (batch_size, num_sparse_features, embed_dim)

        # sparse_embeds now holds the embedding vectors with shape (batch_size, num_sparse_features, embed_dim).
        # Apply the feature cross pooling layer using the simplified formula. Note:
        # - the x_i * v_i terms in the formula are exactly the embedded vectors in sparse_embeds
        # - summing over dim=1 (the field dimension) adds up the same positions across fields,
        #   which is where the feature crossing happens
        embed_cross = 1 / 2 * (
                torch.pow(torch.sum(sparse_embeds, dim=1), 2) - torch.sum(torch.pow(sparse_embeds, 2), dim=1)
        )  # (batch_size, embed_dim)

        # 5. Concatenate the crossed sparse features with the dense features: (batch_size, embed_dim + num_dense_features)
        x = torch.cat([embed_cross, dense_input], dim=-1)

        x = self.bn(x)

        # DNN part, using all the features
        dnn_out = self.dnn_final_linear(self.dnn_network(x))

        # out
        outputs = torch.sigmoid(dnn_out)

        return outputs

if __name__ == '__main__':
    x = torch.rand(size=(2, 5), dtype=torch.float32)
    feature_info = [
        ['I1', 'I2'],  # dense (continuous) features
        ['C1', 'C2', 'C3'],  # sparse (categorical) features
        {
            'C1': 20,
            'C2': 20,
            'C3': 20
        }
    ]
    # build the model
    hidden_units = [128, 64, 32]

    net = NFM(feature_info, hidden_units)
    print(net)
    print(net(x))
NFM(
  (embed_layers): ModuleDict(
    (embed_C1): Embedding(20, 8)
    (embed_C2): Embedding(20, 8)
    (embed_C3): Embedding(20, 8)
  )
  (bn): BatchNorm1d(10, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (dnn_network): Dnn(
    (dnn_network): ModuleList(
      (0): Linear(in_features=10, out_features=128, bias=True)
      (1): Linear(in_features=128, out_features=64, bias=True)
      (2): Linear(in_features=64, out_features=32, bias=True)
    )
    (dropout): Dropout(p=0.0, inplace=False)
  )
  (dnn_final_linear): Linear(in_features=32, out_features=1, bias=True)
)
tensor([[0.4627],
        [0.4660]], grad_fn=<SigmoidBackward0>)

2 Application of the NFM model on the Criteo dataset

For data preprocessing, please refer to

Deep learning recommendation system (2) Deep Crossing and its application on the Criteo dataset

2.1 Prepare training data

import pandas as pd

import torch
from torch.utils.data import TensorDataset, Dataset, DataLoader

import torch.nn as nn
from sklearn.metrics import auc, roc_auc_score, roc_curve

import warnings
warnings.filterwarnings('ignore')
# wrap the data preparation in a function
def prepared_data(file_path):
    # load the training, validation and test sets
    train_set = pd.read_csv(file_path + 'train_set.csv')
    val_set = pd.read_csv(file_path + 'val_set.csv')
    test_set = pd.read_csv(file_path + 'test.csv')

    # Split the features into numerical (dense) and categorical (sparse) ones:
    # the sparse features need embedding later, while the dense features go straight
    # into the stacking layer, so they are handled differently
    data_df = pd.concat((train_set, val_set, test_set))

    # dense features go directly into the stacking layer
    dense_features = ['I' + str(i) for i in range(1, 14)]
    # sparse features need embedding
    sparse_features = ['C' + str(i) for i in range(1, 27)]

    # Build the embedding mapping for the sparse features: a dict {key: value} where
    # key is each sparse feature and value is the number of distinct values that
    # column takes in data_df, used as the embedding input dimension
    sparse_feas_map = {}
    for key in sparse_features:
        sparse_feas_map[key] = data_df[key].nunique()

    feature_info = [dense_features, sparse_features, sparse_feas_map]  # pack the feature info; passed in when building the model

    # wrap the data in data pipelines
    dl_train_dataset = TensorDataset(
        # features
        torch.tensor(train_set.drop(columns='Label').values).float(),
        # labels
        torch.tensor(train_set['Label'].values).float()
    )

    dl_val_dataset = TensorDataset(
        # features
        torch.tensor(val_set.drop(columns='Label').values).float(),
        # labels
        torch.tensor(val_set['Label'].values).float()
    )
    dl_train = DataLoader(dl_train_dataset, shuffle=True, batch_size=16)
    dl_vaild = DataLoader(dl_val_dataset, shuffle=True, batch_size=16)
    return feature_info, dl_train, dl_vaild, test_set
file_path = './preprocessed_data/'

feature_info,dl_train,dl_vaild,test_set = prepared_data(file_path)

2.2 Build the NFM model

from _01_nfm import NFM

hidden_units = [128, 64, 32]
net = NFM(feature_info, hidden_units)
# quick sanity check of the model
for feature, label in iter(dl_train):
    out = net(feature)
    print(feature.shape)
    print(out.shape)
    print(out)
    break

2.3 Model training

from AnimatorClass import Animator
from TimerClass import Timer


# model-related settings
def metric_func(y_pred, y_true):
    # move to CPU so sklearn's roc_auc_score can consume the tensors
    pred = y_pred.data.cpu()
    y = y_true.data.cpu()
    return roc_auc_score(y, pred)


def try_gpu(i=0):
    if torch.cuda.device_count() >= i + 1:
        return torch.device(f'cuda:{i}')
    return torch.device('cpu')


def train_ch(net, dl_train, dl_vaild, num_epochs, lr, device):
    """Train the model, on GPU if one is available."""
    print('training on', device)
    net.to(device)
    # binary cross-entropy loss
    loss_func = nn.BCELoss()
    optimizer = torch.optim.Adam(params=net.parameters(), lr=lr)

    animator = Animator(xlabel='epoch', xlim=[1, num_epochs],legend=['train loss', 'train auc', 'val loss', 'val auc']
                        ,figsize=(8.0, 6.0))
    timer, num_batches = Timer(), len(dl_train)
    log_step_freq = 10

    for epoch in range(1, num_epochs + 1):
        # training phase
        net.train()
        loss_sum = 0.0
        metric_sum = 0.0

        for step, (features, labels) in enumerate(dl_train, 1):
            timer.start()
            # move the batch to the same device as the model
            features, labels = features.to(device), labels.to(device)
            # zero the gradients
            optimizer.zero_grad()

            # forward pass
            predictions = net(features)
            loss = loss_func(predictions, labels.unsqueeze(1))
            try:          # if the current batch contains only one label class, skip the AUC for this batch
                metric = metric_func(predictions, labels)
            except ValueError:
                pass

            # backward pass and parameter update
            loss.backward()
            optimizer.step()
            timer.stop()

            # batch-level logging
            loss_sum += loss.item()
            metric_sum += metric.item()

            if step % log_step_freq == 0:
                animator.add(epoch + step / num_batches,(loss_sum/step, metric_sum/step, None, None))

        # validation phase
        net.eval()
        val_loss_sum = 0.0
        val_metric_sum = 0.0


        for val_step, (features, labels) in enumerate(dl_vaild, 1):
            with torch.no_grad():
                features, labels = features.to(device), labels.to(device)
                predictions = net(features)
                val_loss = loss_func(predictions, labels.unsqueeze(1))
                try:
                    val_metric = metric_func(predictions, labels)
                except ValueError:
                    pass

            val_loss_sum += val_loss.item()
            val_metric_sum += val_metric.item()

            if val_step % log_step_freq == 0:
                animator.add(epoch + val_step / num_batches, (None,None,val_loss_sum / val_step , val_metric_sum / val_step))

        print(f'final: loss {loss_sum/len(dl_train):.3f}, auc {metric_sum/len(dl_train):.3f},'
              f' val loss {val_loss_sum/len(dl_vaild):.3f}, val auc {val_metric_sum/len(dl_vaild):.3f}')
        print(f'{num_batches * num_epochs / timer.sum():.1f} examples/sec on {str(device)}')
lr, num_epochs = 0.001, 10
train_ch(net, dl_train, dl_vaild, num_epochs, lr, try_gpu())

[Figure: training and validation loss / AUC curves]

Source: blog.csdn.net/qq_44665283/article/details/132747687