Title

xDeepFM: Combining Explicit and Implicit Feature Interactionsfor Recommender Systems

Summary

Combine the explicit high-order interaction module (CIN) with implicit interaction module and traditional FM module.
CIN: aims to learn high-order feature interactions explicitly with two virtues:

  1. it can learn certain bounded-degree feature interactions effectively.
  2. it learns feature interactions at a vector-wise level

    Research Objective

    xDeepFM model with:
  • No manual feature engineering
  • Neural network based model learns feature interactions explicitly.
  • Taking into account both memorization and generalization.

    Problem Statement

  • Learning to interact features without manual engineering is a meaningful task, it is promising to exploit DNNs to learn sophisticated and selective feature interactions.

  • DNNs focus more on high-order feature interactions while capture little low-order interactions. Hybrid architectures such as Wide&Deep and DeepFM can overcome this with both memorization and generalization.
  • DNNs model high-order feature interactions in an implicit fashion.
    • But DNNs model high-order feature interactions in an implicit fashion, no theoretical conclusion on what the maximum degree of feature interactions is.
    • DNNs model feature interactions at the bit-wise level, which is different from the traditional FM framework which models feature interactions at the vector-wise level.

      Method(s)

      Combine the explicit high-order interaction module with implicit interaction module and traditional FM module, and name the joint model eXtreme Deep Factorization Machine (xDeepFM).

      CIN

      We formulate the output of field embedding as a matrix:
      xDeepFM - 图4
      where xDeepFM - 图5is row number, xDeepFM - 图6 is the dimension of the fieldembedding, the i-th row in xDeepFM - 图7 is the embedding vector of the i-th field: xDeepFM - 图8.
      The output of the k-th layer in CIN is also a matrix:
      xDeepFM - 图9
      where xDeepFM - 图10 denotes the number of(embedding) feature vectors in the k-th layer and we let xDeepFM - 图11.
      For each layer, xDeepFM - 图12 is calculated via:
      xDeepFM - 图13
      where xDeepFM - 图14 and xDeepFM - 图15 denotes Hadamard product.
      微信截图_20200507185633.png
      If we introduce an intermediate tensor xDeepFM - 图17, which is the outer products of hidden layer xDeepFM - 图18 and original feature matrix xDeepFM - 图19. Then xDeepFM - 图20 can be regarded as a special type of image and xDeepFM - 图21 is a filter.

      There are xDeepFM - 图22 parameters at the k-th layer

We slide the filter across xDeepFM - 图23 along the embedding dimension (D) as shown in Figure (b), and get an hidden vector xDeepFM - 图24, like a feature map. Therefor, xDeepFM - 图25 is a collection of xDeepFM - 图26 differect feature maps.
微信截图_20200507190339.png
Figure (c) provides an overview of the architecture of CIN. Let xDeepFM - 图28 denotes the depth of the network. Every hidden layer xDeepFM - 图29 has a connection with output units.
First apply sum pooling on each feature map of the hidden layer:
xDeepFM - 图30for xDeepFM - 图31
Thus, we have a pooling vector xDeepFM - 图32 with length xDeepFM - 图33 for the k-th hidden layer.
Then concat all this polling vectors from hidden layers:
xDeepFM - 图34
If we use CIN directly for binary classification, the output unit is a sigmoid node on xDeepFM - 图35:
xDeepFM - 图36

Combination with Implicit Networks

微信截图_20200507194917.png
Its resulting output unit becomes:
xDeepFM - 图38
where xDeepFM - 图39 is raw features, xDeepFM - 图40 and xDeepFM - 图41 are outputs of the plain DNN and CIN respectively.
For binary classification, the loss function is:
xDeepFM - 图42

Evaluation

CIN Analysis

  • Space complexity
    • CIN
      • xDeepFM - 图43
      • when necessary, can exploit a L-order decomposition and replace xDeepFM - 图44with two smaller matrices xDeepFM - 图45andxDeepFM - 图46 where xDeepFM - 图47, complexity becomexDeepFM - 图48
    • DNN
      • xDeepFM - 图49
  • Time complexity (major downside)

    • CIN
      • xDeepFM - 图50
    • DNN
      • xDeepFM - 图51

        Experiments

        Performance of individual models
        image.png
  • Indicate that for practical datasets, higher-order interactions over sparse features are necessary, because DNN, CrossNet and CIN outperform FM significantly on all the three datasets.

  • CIN is the best individual model, which demonstrates the effectiveness of CIN on modeling explicit high-order feature interactions.

    Overall performance of different models

    image.png

  • LR is far worse than all the rest models, which demonstrates that factorization-based models are essential for measuring sparse features.

  • Wide&Deep, DCN, DeepFM and xDeepFM are significantly better than DNN, which directly reflects that, despite their simplicity, incorporating hybrid components are important for boosting the accuracy of predictive systems.
  • xDeepFM achieves the best performance on all datasets, which demonstrates that combining explicit and implicit high-order feature interaction is necessary.

    Hyper-Parameter Study

    Depth of Networkimage.pngimage.png

    Model performance degrades when the depth of network is set greater than 3. It is caused by overfitting evidenced by that we notice that the loss of training data still keeps decreasing when we add more hidden layers.

Number of Neurons per Layer

image.pngimage.png

In this experi- ment we fix the depth of network at 3.

Activation Function

image.pngimage.png

Identify function is indeed the most suitable one for neurons in CIN

Conclusion

xDeepFM can auto- matically learn high-order feature interactions in both explicit and implicit fashions, which is of great significance to reducing manual feature engineering work. We conduct comprehensive experiments and the results demonstrate that our xDeepFM outperforms state-of-the-art models consistently on three real-world datasets.

Notes(optional)

Outer Porduct

微信截图_20200507192319.png

Reference(optional)

DeepFM

Huifeng Guo, Ruiming Tang, Yunming Ye, Zhenguo Li, and Xiuqiang He. 2017. Deepfm: A factorization-machine based neural network for CTR prediction. arXiv preprint arXiv:1703.04247 (2017).

DCN

Ruoxi Wang, Bin Fu, Gang Fu, and MingliangWang. 2017. Deep & Cross Network for Ad Click Predictions. arXiv preprint arXiv:1708.05123 (2017).