1 The DeepFM Model

1.0 The Wide & Deep Recommendation Framework

Paper link: https://github.com/talentlei/PaperList
The paper proposes a framework that jointly trains a shallow (wide) model and a deep model, combining the memorization ability of the shallow model with the generalization ability of the deep model, so that a single model achieves both accuracy and generalization for a recommender system. The proposed W&D model is evaluated on two fronts, recommendation quality and serving performance:

  1. Quality: in an online A/B experiment on Google Play, the W&D model improved the app acquisition rate by +3.9% over a highly optimized wide-only model, and also showed a gain over the deep-only model.
  2. Performance: by splitting the batch of candidate apps each request must score into smaller batches and issuing them in parallel across multiple threads, the latency of a single request dropped from 31 ms to 14 ms.
  • Wide part

The wide part is the linear model $y = \mathbf{w}^{T}\mathbf{x} + b$, where the feature vector $\mathbf{x}$ contains both raw features and cross features. Cross features matter a lot in the wide part: they capture interactions between features and add nonlinearity. A cross feature is given by the cross-product transformation:

$$\phi_k(\mathbf{x}) = \prod_{i=1}^{d} x_i^{c_{ki}}, \qquad c_{ki} \in \{0, 1\}$$
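A cross feature is simply a product over the raw features selected by a binary mask. As a minimal sketch (the feature names and the mask below are made up for illustration, not taken from the paper):

```python
import numpy as np

def cross_product_transform(x, c):
    """Compute phi_k(x) = prod_i x_i^{c_ki} for one cross feature.

    x: 1-D array of (binary or real-valued) features.
    c: 1-D binary mask; c[i] = 1 iff feature i belongs to this cross feature.
    """
    return float(np.prod(np.power(x, c)))

# Hypothetical example: AND(gender=female, language=en) fires only when
# both underlying binary features are 1.
x = np.array([1.0, 1.0, 0.0])  # [gender=female, language=en, language=fr]
c = np.array([1.0, 1.0, 0.0])  # this cross feature uses the first two fields
print(cross_product_transform(x, c))  # -> 1.0
```

Because $x_i^0 = 1$, features outside the mask have no effect, so the cross feature acts as a logical AND over the selected binary features.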

  • Deep part

The deep part is a feed-forward network. Sparse features are first mapped to low-dimensional dense embedding vectors, whose dimensionality is typically on the order of $O(10)$ to $O(100)$. The embedding vectors are randomly initialized and the activation function is ReLU; each layer of the feed-forward network computes:

$$a^{(l+1)} = f\left(W^{(l)} a^{(l)} + b^{(l)}\right)$$
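The layer recurrence above can be sketched in a few lines of numpy (the layer widths are made up, and in the real model $W$ and $b$ are learned rather than random):

```python
import numpy as np

rng = np.random.default_rng(0)
relu = lambda z: np.maximum(z, 0.0)  # f = ReLU

a = rng.normal(size=24)   # a^(0): the concatenated field embeddings
dims = [24, 16, 8]        # hypothetical layer widths
for d_in, d_out in zip(dims[:-1], dims[1:]):
    W = 0.1 * rng.normal(size=(d_out, d_in))
    b = np.zeros(d_out)
    a = relu(W @ a + b)   # a^(l+1) = f(W^(l) a^(l) + b^(l))
print(a.shape)  # -> (8,)
```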

1.1 Introduction

Paper: DeepFM: A Factorization-Machine based Neural Network for CTR Prediction

  1. Feature interactions are crucial for CTR prediction;
  2. Linear models cannot learn feature interactions on their own; with manual feature engineering they can capture some of them;
  3. FM is typically used to learn second-order feature interactions;
  4. Neural network models are suited to learning high-order feature interactions: CNN-based models capture interactions between neighboring features, while RNN-based models capture interactions in data with sequential dependencies;
  5. FNN: pre-trains an FM model before the DNN;
  6. PNN: adds a product layer between the embedding layer and the fully connected layers;
  7. Wide & Deep: a linear model plus a deep model, whose two parts require differently constructed inputs.

Main contributions of the paper:

  1. DeepFM consists of an FM part and a deep part. The FM part learns low-order feature interactions, while the deep part learns high-order ones. Unlike Wide & Deep, DeepFM can be trained end to end without any feature engineering.
  2. DeepFM shares its input and embedding vectors between the two parts.

1.2 The DeepFM Model

(Figure: the overall architecture of DeepFM)

1.2.1 The FM Part

(Figure: the FM component of DeepFM)
The FM layer sums a first-order (linear) term and second-order (pairwise) interaction terms; its output later passes through a sigmoid together with the deep part's output. The FM model is:

$$y_{FM} = \langle w, x \rangle + \sum_{j_1=1}^{d} \sum_{j_2=j_1+1}^{d} \langle V_{j_1}, V_{j_2} \rangle\, x_{j_1} \cdot x_{j_2}$$
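The pairwise term above can be computed in $O(dk)$ rather than $O(d^2 k)$ time via the identity $2xy = (x+y)^2 - x^2 - y^2$, the same trick used in the implementation in section 1.2.4. A small numpy check with made-up sizes:

```python
import numpy as np

rng = np.random.default_rng(0)
d, k = 5, 3                    # d features, k-dimensional latent vectors
V = rng.normal(size=(d, k))    # latent vector v_i for each feature
x = rng.normal(size=d)

# Naive O(d^2 k) computation: sum_{i<j} <v_i, v_j> x_i x_j
naive = sum(V[i] @ V[j] * x[i] * x[j]
            for i in range(d) for j in range(i + 1, d))

# O(d k) computation: 0.5 * ((sum_i v_i x_i)^2 - sum_i (v_i x_i)^2)
vx = V * x[:, None]
fast = 0.5 * ((vx.sum(axis=0) ** 2).sum() - (vx ** 2).sum())

print(np.isclose(naive, fast))  # -> True
```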

1.2.2 The Deep Part

(Figure: the deep component of DeepFM)
Like the corresponding part of the Wide & Deep model, this part is a simple feed-forward network. Because the raw input features are mostly high-dimensional, highly sparse, field-grouped mixtures of continuous and categorical values, the paper designs a subnetwork that maps the raw sparse representation to dense feature vectors, so that the DNN can better exploit its ability to learn high-order interactions.

1.2.3 The Embedding Layer

The input to a neural network should be continuous and dense, whereas the raw data in CTR prediction is usually high-dimensional and highly sparse, so an embedding layer is inserted between the raw input and the first hidden layer to convert the sparse features into dense ones.
(Figure: the embedding layer)
Two properties of the embedding layer:

  1. The embedding vectors in this layer have the same dimension as the latent vectors in FM.

  2. The latent vectors of FM are used as the embedding vectors, and they are learned jointly with the rest of the network.
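Concretely, an embedding lookup is just a one-hot vector multiplied by the embedding matrix, which is why the FM latent-vector matrix $V$ can double as the embedding table. A tiny numpy sketch (sizes are made up):

```python
import numpy as np

vocab_size, k = 6, 4  # k matches the FM latent dimension
# Hypothetical FM latent vectors, one row v_i per feature value.
V = np.arange(vocab_size * k, dtype=float).reshape(vocab_size, k)

idx = 2                       # the single active (sparse) feature
one_hot = np.zeros(vocab_size)
one_hot[idx] = 1.0

dense = one_hot @ V           # embedding lookup as a one-hot matmul
print(np.array_equal(dense, V[idx]))  # -> True
```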

1.2.4 Code

  • PyTorch implementation

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

from time import time


class DeepFM(nn.Module):
    """
    A DeepFM network for CTR prediction, trained with binary
    cross-entropy (with-logits) loss.

    The architecture has two parts: an FM part for low-order feature
    interactions and a deep part for high-order ones. Batch norm and
    dropout are applied to all hidden layers, and Adam is the intended
    optimizer. More details are in the paper: DeepFM: A
    Factorization-Machine based Neural Network for CTR Prediction,
    Huifeng Guo, Ruiming Tang, Yunming Ye, Zhenguo Li, Xiuqiang He.
    """

    def __init__(self, feature_sizes, embedding_size=4,
                 hidden_dims=[32, 32], num_classes=10, dropout=[0.5, 0.5],
                 use_cuda=True, verbose=False):
        """
        Initialize a new network.

        Inputs:
        - feature_sizes: A list of integers giving the size of features for each field.
        - embedding_size: An integer giving the size of the feature embeddings.
        - hidden_dims: A list of integers giving the size of each hidden layer.
        - num_classes: An integer giving the number of classes to predict. For example,
          someone may rate a film 1, 2, 3, 4 or 5 stars.
        - use_cuda: Bool, whether to use CUDA.
        - verbose: Bool
        """
        super().__init__()
        self.field_size = len(feature_sizes)
        self.feature_sizes = feature_sizes
        self.embedding_size = embedding_size
        self.hidden_dims = hidden_dims
        self.num_classes = num_classes
        self.dtype = torch.float
        # Check whether CUDA can be used.
        if use_cuda and torch.cuda.is_available():
            self.device = torch.device('cuda')
        else:
            self.device = torch.device('cpu')
        # Init the FM part. The first 13 fields are continuous and go
        # through Linear layers; the remaining fields are categorical and
        # go through Embedding lookups (the Criteo layout: 13 + 26 fields).
        fm_first_order_Linears = nn.ModuleList(
            [nn.Linear(feature_size, self.embedding_size)
             for feature_size in self.feature_sizes[:13]])
        fm_first_order_embeddings = nn.ModuleList(
            [nn.Embedding(feature_size, self.embedding_size)
             for feature_size in self.feature_sizes[13:]])
        self.fm_first_order_models = fm_first_order_Linears.extend(fm_first_order_embeddings)
        fm_second_order_Linears = nn.ModuleList(
            [nn.Linear(feature_size, self.embedding_size)
             for feature_size in self.feature_sizes[:13]])
        fm_second_order_embeddings = nn.ModuleList(
            [nn.Embedding(feature_size, self.embedding_size)
             for feature_size in self.feature_sizes[13:]])
        self.fm_second_order_models = fm_second_order_Linears.extend(fm_second_order_embeddings)
        # Global bias, registered once here rather than re-created on
        # every forward pass.
        self.bias = nn.Parameter(torch.zeros(1))
        # Init the deep part.
        all_dims = [self.field_size * self.embedding_size] + \
            self.hidden_dims + [self.num_classes]
        for i in range(1, len(hidden_dims) + 1):
            setattr(self, 'linear_' + str(i),
                    nn.Linear(all_dims[i - 1], all_dims[i]))
            setattr(self, 'batchNorm_' + str(i),
                    nn.BatchNorm1d(all_dims[i]))
            setattr(self, 'dropout_' + str(i),
                    nn.Dropout(dropout[i - 1]))

    def forward(self, Xi, Xv):
        """
        Forward pass of the network.

        Inputs:
        - Xi: A tensor of input indices, of shape (N, field_size, 1)
        - Xv: A tensor of input values, of shape (N, field_size, 1)
        """
        # FM part: first-order terms.
        fm_first_order_emb_arr = []
        for i, emb in enumerate(self.fm_first_order_models):
            if i <= 12:
                # Continuous field: pass the raw value through a Linear layer.
                Xi_tem = Xi[:, i, :].to(device=self.device, dtype=torch.float)
                fm_first_order_emb_arr.append(
                    (torch.sum(emb(Xi_tem).unsqueeze(1), 1).t() * Xv[:, i]).t())
            else:
                # Categorical field: look up its embedding.
                Xi_tem = Xi[:, i, :].to(device=self.device, dtype=torch.long)
                fm_first_order_emb_arr.append(
                    (torch.sum(emb(Xi_tem), 1).t() * Xv[:, i]).t())
        fm_first_order = torch.cat(fm_first_order_emb_arr, 1)
        # FM part: second-order terms, using
        # 2xy = (x + y)^2 - x^2 - y^2 to reduce computation.
        fm_second_order_emb_arr = []
        for i, emb in enumerate(self.fm_second_order_models):
            if i <= 12:
                Xi_tem = Xi[:, i, :].to(device=self.device, dtype=torch.float)
                fm_second_order_emb_arr.append(
                    (torch.sum(emb(Xi_tem).unsqueeze(1), 1).t() * Xv[:, i]).t())
            else:
                Xi_tem = Xi[:, i, :].to(device=self.device, dtype=torch.long)
                fm_second_order_emb_arr.append(
                    (torch.sum(emb(Xi_tem), 1).t() * Xv[:, i]).t())
        fm_sum_second_order_emb = sum(fm_second_order_emb_arr)
        fm_sum_second_order_emb_square = fm_sum_second_order_emb * \
            fm_sum_second_order_emb  # (x+y)^2
        fm_second_order_emb_square = [
            item * item for item in fm_second_order_emb_arr]
        fm_second_order_emb_square_sum = sum(
            fm_second_order_emb_square)  # x^2+y^2
        fm_second_order = (fm_sum_second_order_emb_square -
                           fm_second_order_emb_square_sum) * 0.5
        # Deep part: an MLP over the concatenated second-order embeddings.
        deep_emb = torch.cat(fm_second_order_emb_arr, 1)
        deep_out = deep_emb
        for i in range(1, len(self.hidden_dims) + 1):
            deep_out = getattr(self, 'linear_' + str(i))(deep_out)
            deep_out = getattr(self, 'batchNorm_' + str(i))(deep_out)
            deep_out = getattr(self, 'dropout_' + str(i))(deep_out)
        # Sum the three parts plus the global bias.
        total_sum = torch.sum(fm_first_order, 1) + \
            torch.sum(fm_second_order, 1) + \
            torch.sum(deep_out, 1) + self.bias
        return total_sum

    def fit(self, loader_train, loader_val, optimizer, epochs=1,
            verbose=False, print_every=5):
        """
        Train the model and report validation accuracy.

        Inputs:
        - loader_train: DataLoader for the training set.
        - loader_val: DataLoader for the validation set.
        - optimizer: The optimizer used during training, e.g.
          torch.optim.Adam() or torch.optim.SGD().
        - epochs: Integer, number of epochs.
        - verbose: Bool, whether to print progress.
        - print_every: Integer, print after this many iterations.
        """
        model = self.train().to(device=self.device)
        criterion = F.binary_cross_entropy_with_logits
        for epoch in range(epochs):
            for t, (xi, xv, y) in enumerate(loader_train):
                xi = xi.to(device=self.device, dtype=self.dtype)
                xv = xv.to(device=self.device, dtype=torch.float)
                y = y.to(device=self.device, dtype=self.dtype)
                total = model(xi, xv)
                loss = criterion(total, y)
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()
                if verbose and t % print_every == 0:
                    print('Epoch %d Iteration %d, loss = %.4f' % (epoch, t, loss.item()))
                    self.check_accuracy(loader_val, model)
                    model.train()  # check_accuracy leaves the model in eval mode
                    print()

    def check_accuracy(self, loader, model):
        if loader.dataset.train:
            print('Checking accuracy on validation set')
        else:
            print('Checking accuracy on test set')
        num_correct = 0
        num_samples = 0
        model.eval()  # set model to evaluation mode
        with torch.no_grad():
            for xi, xv, y in loader:
                xi = xi.to(device=self.device, dtype=self.dtype)  # move to device, e.g. GPU
                xv = xv.to(device=self.device, dtype=self.dtype)
                y = y.to(device=self.device, dtype=self.dtype)
                total = model(xi, xv)
                preds = (torch.sigmoid(total) > 0.5).to(dtype=self.dtype)
                num_correct += (preds == y).sum()
                num_samples += preds.size(0)
            acc = float(num_correct) / num_samples
            print('Got %d / %d correct (%.2f%%)' % (num_correct, num_samples, 100 * acc))
```
  • torch-rechub implementation

```python
from torch_rechub.basic.layers import FM, MLP, LR, EmbeddingLayer
from tqdm import tqdm
import torch


class DeepFM(torch.nn.Module):
    def __init__(self, deep_features, fm_features, mlp_params):
        """
        The deep part and the FM part consume two different feature sets,
        deep_features and fm_features; mlp_params holds the parameters of
        the MLP (multi-layer perceptron).
        """
        super().__init__()
        self.deep_features = deep_features
        self.fm_features = fm_features
        self.deep_dims = sum([fea.embed_dim for fea in deep_features])
        self.fm_dims = sum([fea.embed_dim for fea in fm_features])
        # LR models the first-order feature interactions
        self.linear = LR(self.fm_dims)
        # FM models the second-order feature interactions
        self.fm = FM(reduce_sum=True)
        # Embedding layer producing dense representations of the features
        self.embedding = EmbeddingLayer(deep_features + fm_features)
        # The MLP for the deep part
        self.mlp = MLP(self.deep_dims, **mlp_params)

    def forward(self, x):
        # Dense embeddings
        input_deep = self.embedding(x, self.deep_features, squeeze_dim=True)
        input_fm = self.embedding(x, self.fm_features, squeeze_dim=False)
        y_linear = self.linear(input_fm.flatten(start_dim=1))
        y_fm = self.fm(input_fm)
        y_deep = self.mlp(input_deep)
        # The final prediction combines the first-order interactions,
        # the second-order interactions, and the deep part
        y = y_linear + y_fm + y_deep
        # Squash the score into the (0, 1) interval with a sigmoid
        return torch.sigmoid(y.squeeze(1))
```