Title

Title

Wide & Deep Learning for Recommender Systems

Summary

Present Wide & Deep learning framework to combine the strengths of wide linear model’s memorization and deep neural networks’ generalization.

Research Objective

Present the Wide & Deep learning framework to achieve both memorization and generalization in one model, by jointly training a linear model component and a neural network component.

Problem Statement

liner models with cross-product feature transformation
- pros
  - can memorize “exception rules” with fewer parameters (generalize well and make less relevant recommendations)
- cons
  - do not generalize to query-item feature paris that have not appeared in the training data
  - need manual feature engineering
emdedding based models e.g., factorization machines, deep neural networks
- pros
  - can generalize to previously unseen query-item feature paris by learning a low-dimensional dense embedding vector for each query and item feature, with less burden of feature engineering
- cons
  - dense embeddings will lead to nonzero predictions for all query-item pairs, and thus can over-generalize and make less relevant recommendations

Method(s)

The Wide Component

where x is a vector of d features. One of the most important transformations is the cross-product transformation.

cross-product transformation where is a boolean variable that is if the i-th feature is part of the k-th transformation , and otherwise. For binary features, a cross-product transformation (e.g., “AND(gender=female, language=en)”) is if and only if the constituent features (“gender=female” and “language=en”) are all , and otherwise.
The Deep Component

High-dimensional categorical features are first converted into a low-dimensional and dense real-valued vector, usually on the order of Wide&Deep - 图14 to Wide&Deep - 图15 .

Joint Training

For a logistic regression, the model predictions is
Wide&Deep - 图16
where Wide&Deep - 图17 is the binary class label, Wide&Deep - 图18 is cross product transformations of the original features Wide&Deep - 图19 , Wide&Deep - 图20 is the neural network’s final activation.

Evaluation

Data Generation

Continues real-valued features are normalized to Wide&Deep - 图21 by mapping a feature value Wide&Deep - 图22 to its cumulative distribution function Wide&Deep - 图23 , divided into Wide&Deep - 图24 quantiles. The normalized value is Wide&Deep - 图25 for values in the i-th quantiles.

App Acquisition

Serving Performance

Conclusion

Memorization and generalization are both important for
recommender systems.
Wide linear models can effectively memorize sparse feature interactions using cross-product feature transformations, while deep neural networks can generalize to previously unseen feature interactions through low-dimensional embeddings.

深度学习

Wide&Deep