1 The DeepFM Model

1.0 The Wide & Deep Recommendation Framework

Paper link: https://github.com/talentlei/PaperList
The paper proposes a framework that jointly trains a shallow (wide) model and a deep model, combining the memorization ability of the shallow model with the generalization ability of the deep model, so that a single model achieves both accuracy and generalization for a recommender system. The proposed W&D model is evaluated on two fronts, recommendation quality and serving performance:

  1. Quality: in an online A/B experiment on Google Play, the W&D model improved the app acquisition rate by +3.9% over a highly optimized wide-only model, and also showed a gain over the deep-only model.
  2. Performance: by splitting the batch of candidate apps each request must score into smaller batches and issuing them in parallel across multiple threads, the latency of a single request dropped from 31 ms to 14 ms.
  • Wide part

The wide part is the linear model $y = \mathbf{w}^{T}\mathbf{x} + b$, where the feature vector $\mathbf{x}$ contains both raw features and cross features. Cross features matter a lot in the wide part: they capture interactions between features and add nonlinearity. A cross feature is given by the cross-product transformation:

$$\phi_k(\mathbf{x}) = \prod_{i=1}^{d} x_i^{c_{ki}}, \qquad c_{ki} \in \{0, 1\}$$
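A cross feature is simply a product over the raw features selected by a binary mask. As a minimal sketch (the feature names and the mask below are made up for illustration, not taken from the paper):

```python
import numpy as np

def cross_product_transform(x, c):
    """Compute phi_k(x) = prod_i x_i^{c_ki} for one cross feature.

    x: 1-D array of (binary or real-valued) features.
    c: 1-D binary mask; c[i] = 1 iff feature i belongs to this cross feature.
    """
    return float(np.prod(np.power(x, c)))

# Hypothetical example: AND(gender=female, language=en) fires only when
# both underlying binary features are 1.
x = np.array([1.0, 1.0, 0.0])  # [gender=female, language=en, language=fr]
c = np.array([1.0, 1.0, 0.0])  # this cross feature uses the first two fields
print(cross_product_transform(x, c))  # -> 1.0
```

Because $x_i^0 = 1$, features outside the mask have no effect, so the cross feature acts as a logical AND over the selected binary features.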

  • Deep part

The deep part is a feed-forward network. Sparse features are first mapped to low-dimensional dense embedding vectors, whose dimensionality is typically on the order of $O(10)$ to $O(100)$. The embedding vectors are randomly initialized and the activation function is ReLU; each layer of the feed-forward network computes:

$$a^{(l+1)} = f\left(W^{(l)} a^{(l)} + b^{(l)}\right)$$
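The layer recurrence above can be sketched in a few lines of numpy (the layer widths are made up, and in the real model $W$ and $b$ are learned rather than random):

```python
import numpy as np

rng = np.random.default_rng(0)
relu = lambda z: np.maximum(z, 0.0)  # f = ReLU

a = rng.normal(size=24)   # a^(0): the concatenated field embeddings
dims = [24, 16, 8]        # hypothetical layer widths
for d_in, d_out in zip(dims[:-1], dims[1:]):
    W = 0.1 * rng.normal(size=(d_out, d_in))
    b = np.zeros(d_out)
    a = relu(W @ a + b)   # a^(l+1) = f(W^(l) a^(l) + b^(l))
print(a.shape)  # -> (8,)
```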

1.1 Introduction

Paper: DeepFM: A Factorization-Machine based Neural Network for CTR Prediction

  1. Feature interactions are crucial for CTR prediction;
  2. Linear models cannot learn feature interactions on their own; with manual feature engineering they can capture some of them;
  3. FM is typically used to learn second-order feature interactions;
  4. Neural network models are suited to learning high-order feature interactions: CNN-based models capture interactions between neighboring features, while RNN-based models capture interactions in data with sequential dependencies;
  5. FNN: pre-trains an FM model before the DNN;
  6. PNN: adds a product layer between the embedding layer and the fully connected layers;
  7. Wide & Deep: a linear model plus a deep model, whose two parts require differently constructed inputs.

Main contributions of the paper:

  1. DeepFM consists of an FM part and a deep part. The FM part learns low-order feature interactions, while the deep part learns high-order ones. Unlike Wide & Deep, DeepFM can be trained end to end without any feature engineering.
  2. DeepFM shares its input and embedding vectors between the two parts.

1.2 The DeepFM Model

(Figure: the overall architecture of DeepFM)

1.2.1 The FM Part

(Figure: the FM component of DeepFM)
The FM layer sums a first-order (linear) term and second-order (pairwise) interaction terms; its output later passes through a sigmoid together with the deep part's output. The FM model is:

$$y_{FM} = \langle w, x \rangle + \sum_{j_1=1}^{d} \sum_{j_2=j_1+1}^{d} \langle V_{j_1}, V_{j_2} \rangle\, x_{j_1} \cdot x_{j_2}$$
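The pairwise term above can be computed in $O(dk)$ rather than $O(d^2 k)$ time via the identity $2xy = (x+y)^2 - x^2 - y^2$, the same trick used in the implementation in section 1.2.4. A small numpy check with made-up sizes:

```python
import numpy as np

rng = np.random.default_rng(0)
d, k = 5, 3                    # d features, k-dimensional latent vectors
V = rng.normal(size=(d, k))    # latent vector v_i for each feature
x = rng.normal(size=d)

# Naive O(d^2 k) computation: sum_{i<j} <v_i, v_j> x_i x_j
naive = sum(V[i] @ V[j] * x[i] * x[j]
            for i in range(d) for j in range(i + 1, d))

# O(d k) computation: 0.5 * ((sum_i v_i x_i)^2 - sum_i (v_i x_i)^2)
vx = V * x[:, None]
fast = 0.5 * ((vx.sum(axis=0) ** 2).sum() - (vx ** 2).sum())

print(np.isclose(naive, fast))  # -> True
```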

1.2.2 The Deep Part

(Figure: the deep component of DeepFM)
Like the corresponding part of the Wide & Deep model, this part is a simple feed-forward network. Because the raw input features are mostly high-dimensional, highly sparse, field-grouped mixtures of continuous and categorical values, the paper designs a subnetwork that maps the raw sparse representation to dense feature vectors, so that the DNN can better exploit its ability to learn high-order interactions.

1.2.3 The Embedding Layer

The input to a neural network should be continuous and dense, whereas the raw data in CTR prediction is usually high-dimensional and highly sparse, so an embedding layer is inserted between the raw input and the first hidden layer to convert the sparse features into dense ones.
(Figure: the embedding layer)
Two properties of the embedding layer:

  1. The embedding vectors in this layer have the same dimension as the latent vectors in FM.

  2. The latent vectors of FM are used as the embedding vectors, and they are learned jointly with the rest of the network.
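Concretely, an embedding lookup is just a one-hot vector multiplied by the embedding matrix, which is why the FM latent-vector matrix $V$ can double as the embedding table. A tiny numpy sketch (sizes are made up):

```python
import numpy as np

vocab_size, k = 6, 4  # k matches the FM latent dimension
# Hypothetical FM latent vectors, one row v_i per feature value.
V = np.arange(vocab_size * k, dtype=float).reshape(vocab_size, k)

idx = 2                       # the single active (sparse) feature
one_hot = np.zeros(vocab_size)
one_hot[idx] = 1.0

dense = one_hot @ V           # embedding lookup as a one-hot matmul
print(np.array_equal(dense, V[idx]))  # -> True
```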

1.2.4 Code

  • PyTorch implementation

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

from time import time


class DeepFM(nn.Module):
    """
    A DeepFM network for CTR prediction, trained with binary
    cross-entropy (with-logits) loss.

    The architecture has two parts: an FM part for low-order feature
    interactions and a deep part for high-order ones. Batch norm and
    dropout are applied to all hidden layers, and Adam is the intended
    optimizer. More details are in the paper: DeepFM: A
    Factorization-Machine based Neural Network for CTR Prediction,
    Huifeng Guo, Ruiming Tang, Yunming Ye, Zhenguo Li, Xiuqiang He.
    """

    def __init__(self, feature_sizes, embedding_size=4,
                 hidden_dims=[32, 32], num_classes=10, dropout=[0.5, 0.5],
                 use_cuda=True, verbose=False):
        """
        Initialize a new network.

        Inputs:
        - feature_sizes: A list of integers giving the size of features for each field.
        - embedding_size: An integer giving the size of the feature embeddings.
        - hidden_dims: A list of integers giving the size of each hidden layer.
        - num_classes: An integer giving the number of classes to predict. For example,
          someone may rate a film 1, 2, 3, 4 or 5 stars.
        - use_cuda: Bool, whether to use CUDA.
        - verbose: Bool
        """
        super().__init__()
        self.field_size = len(feature_sizes)
        self.feature_sizes = feature_sizes
        self.embedding_size = embedding_size
        self.hidden_dims = hidden_dims
        self.num_classes = num_classes
        self.dtype = torch.float
        # Check whether CUDA can be used.
        if use_cuda and torch.cuda.is_available():
            self.device = torch.device('cuda')
        else:
            self.device = torch.device('cpu')
        # Init the FM part. The first 13 fields are continuous and go
        # through Linear layers; the remaining fields are categorical and
        # go through Embedding lookups (the Criteo layout: 13 + 26 fields).
        fm_first_order_Linears = nn.ModuleList(
            [nn.Linear(feature_size, self.embedding_size)
             for feature_size in self.feature_sizes[:13]])
        fm_first_order_embeddings = nn.ModuleList(
            [nn.Embedding(feature_size, self.embedding_size)
             for feature_size in self.feature_sizes[13:]])
        self.fm_first_order_models = fm_first_order_Linears.extend(fm_first_order_embeddings)
        fm_second_order_Linears = nn.ModuleList(
            [nn.Linear(feature_size, self.embedding_size)
             for feature_size in self.feature_sizes[:13]])
        fm_second_order_embeddings = nn.ModuleList(
            [nn.Embedding(feature_size, self.embedding_size)
             for feature_size in self.feature_sizes[13:]])
        self.fm_second_order_models = fm_second_order_Linears.extend(fm_second_order_embeddings)
        # Global bias, registered once here rather than re-created on
        # every forward pass.
        self.bias = nn.Parameter(torch.zeros(1))
        # Init the deep part.
        all_dims = [self.field_size * self.embedding_size] + \
            self.hidden_dims + [self.num_classes]
        for i in range(1, len(hidden_dims) + 1):
            setattr(self, 'linear_' + str(i),
                    nn.Linear(all_dims[i - 1], all_dims[i]))
            setattr(self, 'batchNorm_' + str(i),
                    nn.BatchNorm1d(all_dims[i]))
            setattr(self, 'dropout_' + str(i),
                    nn.Dropout(dropout[i - 1]))

    def forward(self, Xi, Xv):
        """
        Forward pass of the network.

        Inputs:
        - Xi: A tensor of input indices, of shape (N, field_size, 1)
        - Xv: A tensor of input values, of shape (N, field_size, 1)
        """
        # FM part: first-order terms.
        fm_first_order_emb_arr = []
        for i, emb in enumerate(self.fm_first_order_models):
            if i <= 12:
                # Continuous field: pass the raw value through a Linear layer.
                Xi_tem = Xi[:, i, :].to(device=self.device, dtype=torch.float)
                fm_first_order_emb_arr.append(
                    (torch.sum(emb(Xi_tem).unsqueeze(1), 1).t() * Xv[:, i]).t())
            else:
                # Categorical field: look up its embedding.
                Xi_tem = Xi[:, i, :].to(device=self.device, dtype=torch.long)
                fm_first_order_emb_arr.append(
                    (torch.sum(emb(Xi_tem), 1).t() * Xv[:, i]).t())
        fm_first_order = torch.cat(fm_first_order_emb_arr, 1)
        # FM part: second-order terms, using
        # 2xy = (x + y)^2 - x^2 - y^2 to reduce computation.
        fm_second_order_emb_arr = []
        for i, emb in enumerate(self.fm_second_order_models):
            if i <= 12:
                Xi_tem = Xi[:, i, :].to(device=self.device, dtype=torch.float)
                fm_second_order_emb_arr.append(
                    (torch.sum(emb(Xi_tem).unsqueeze(1), 1).t() * Xv[:, i]).t())
            else:
                Xi_tem = Xi[:, i, :].to(device=self.device, dtype=torch.long)
                fm_second_order_emb_arr.append(
                    (torch.sum(emb(Xi_tem), 1).t() * Xv[:, i]).t())
        fm_sum_second_order_emb = sum(fm_second_order_emb_arr)
        fm_sum_second_order_emb_square = fm_sum_second_order_emb * \
            fm_sum_second_order_emb  # (x+y)^2
        fm_second_order_emb_square = [
            item * item for item in fm_second_order_emb_arr]
        fm_second_order_emb_square_sum = sum(
            fm_second_order_emb_square)  # x^2+y^2
        fm_second_order = (fm_sum_second_order_emb_square -
                           fm_second_order_emb_square_sum) * 0.5
        # Deep part: an MLP over the concatenated second-order embeddings.
        deep_emb = torch.cat(fm_second_order_emb_arr, 1)
        deep_out = deep_emb
        for i in range(1, len(self.hidden_dims) + 1):
            deep_out = getattr(self, 'linear_' + str(i))(deep_out)
            deep_out = getattr(self, 'batchNorm_' + str(i))(deep_out)
            deep_out = getattr(self, 'dropout_' + str(i))(deep_out)
        # Sum the three parts plus the global bias.
        total_sum = torch.sum(fm_first_order, 1) + \
            torch.sum(fm_second_order, 1) + \
            torch.sum(deep_out, 1) + self.bias
        return total_sum

    def fit(self, loader_train, loader_val, optimizer, epochs=1,
            verbose=False, print_every=5):
        """
        Train the model and report validation accuracy.

        Inputs:
        - loader_train: DataLoader for the training set.
        - loader_val: DataLoader for the validation set.
        - optimizer: The optimizer used during training, e.g.
          torch.optim.Adam() or torch.optim.SGD().
        - epochs: Integer, number of epochs.
        - verbose: Bool, whether to print progress.
        - print_every: Integer, print after this many iterations.
        """
        model = self.train().to(device=self.device)
        criterion = F.binary_cross_entropy_with_logits
        for epoch in range(epochs):
            for t, (xi, xv, y) in enumerate(loader_train):
                xi = xi.to(device=self.device, dtype=self.dtype)
                xv = xv.to(device=self.device, dtype=torch.float)
                y = y.to(device=self.device, dtype=self.dtype)
                total = model(xi, xv)
                loss = criterion(total, y)
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()
                if verbose and t % print_every == 0:
                    print('Epoch %d Iteration %d, loss = %.4f' % (epoch, t, loss.item()))
                    self.check_accuracy(loader_val, model)
                    model.train()  # check_accuracy leaves the model in eval mode
                    print()

    def check_accuracy(self, loader, model):
        if loader.dataset.train:
            print('Checking accuracy on validation set')
        else:
            print('Checking accuracy on test set')
        num_correct = 0
        num_samples = 0
        model.eval()  # set model to evaluation mode
        with torch.no_grad():
            for xi, xv, y in loader:
                xi = xi.to(device=self.device, dtype=self.dtype)  # move to device, e.g. GPU
                xv = xv.to(device=self.device, dtype=self.dtype)
                y = y.to(device=self.device, dtype=self.dtype)
                total = model(xi, xv)
                preds = (torch.sigmoid(total) > 0.5).to(dtype=self.dtype)
                num_correct += (preds == y).sum()
                num_samples += preds.size(0)
            acc = float(num_correct) / num_samples
            print('Got %d / %d correct (%.2f%%)' % (num_correct, num_samples, 100 * acc))
```
  • torch-rechub implementation

```python
from torch_rechub.basic.layers import FM, MLP, LR, EmbeddingLayer
from tqdm import tqdm
import torch


class DeepFM(torch.nn.Module):
    def __init__(self, deep_features, fm_features, mlp_params):
        """
        The deep part and the FM part consume two different feature sets,
        deep_features and fm_features; mlp_params holds the parameters of
        the MLP (multi-layer perceptron).
        """
        super().__init__()
        self.deep_features = deep_features
        self.fm_features = fm_features
        self.deep_dims = sum([fea.embed_dim for fea in deep_features])
        self.fm_dims = sum([fea.embed_dim for fea in fm_features])
        # LR models the first-order feature interactions
        self.linear = LR(self.fm_dims)
        # FM models the second-order feature interactions
        self.fm = FM(reduce_sum=True)
        # Embedding layer producing dense representations of the features
        self.embedding = EmbeddingLayer(deep_features + fm_features)
        # The MLP for the deep part
        self.mlp = MLP(self.deep_dims, **mlp_params)

    def forward(self, x):
        # Dense embeddings
        input_deep = self.embedding(x, self.deep_features, squeeze_dim=True)
        input_fm = self.embedding(x, self.fm_features, squeeze_dim=False)
        y_linear = self.linear(input_fm.flatten(start_dim=1))
        y_fm = self.fm(input_fm)
        y_deep = self.mlp(input_deep)
        # The final prediction combines the first-order interactions,
        # the second-order interactions, and the deep part
        y = y_linear + y_fm + y_deep
        # Squash the score into the (0, 1) interval with a sigmoid
        return torch.sigmoid(y.squeeze(1))
```