Data

| Dataset | Rows |
| ---- | ---- |
| local_train_splitByUser | 1,086,120 |
| local_test_splitByUser | 121,216 |
| item-info | 2,370,585 |
| reviews-info | 8,898,041 |

Example rows from local_train_splitByUser:

  #click user item cat itemList(^B-separated) catList(^B-separated)
  0 AZPJ9LUT0FEPY B00AMNNTIA Literature & Fiction 0307744434^B0062248391^B0470530707^B0978924622^B1590516400 Books^BBooks^BBooks^BBooks^BBooks
  1 AZPJ9LUT0FEPY 0800731603 Books 0307744434^B0062248391^B0470530707^B0978924622^B1590516400 Books^BBooks^BBooks^BBooks^BBooks
  0 A2NRV79GKAU726 B003NNV10O Russian 0814472869^B0071462074^B1583942300^B0812538366^BB007IXVSBM^B1930278462^B1482334291 Books^BBooks^BBooks^BBooks^BBaking^BBooks^BBooks

Example rows from local_test_splitByUser:

  #click user item cat itemList(^B-separated) catList(^B-separated)
  0 A3BI7R43VUZ1TY B00JNHU0T2 Literature & Fiction 0989464105^BB00B01691C^B1477809732^B1608442845 Books^BLiterature & Fiction^BBooks^BBooks
  1 A3BI7R43VUZ1TY 0989464121 Books 0989464105^BB00B01691C^B1477809732^B1608442845 Books^BLiterature & Fiction^BBooks^BBooks
  0 A2Z3AHJPXG3ZNP B0072YSPJ0 Literature & Fiction 1478310960^B1492231452^B1477603425^BB00FRKLA6Q Books^BBooks^BBooks^BUrban
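Both split files use the same header, so one parser can read either. A minimal sketch follows, assuming the columns are tab-separated and the history lists use the ^B control character (`\x02`); splitting on plain whitespace would not work because categories such as `Literature & Fiction` contain spaces. `Sample` and `parse_sample` are hypothetical names, not part of the project code.

```python
from typing import List, NamedTuple


class Sample(NamedTuple):
    label: int            # value of the #click column (1 = clicked, 0 = not clicked)
    user: str
    item: str
    cat: str
    item_hist: List[str]  # itemList, split on ^B
    cat_hist: List[str]   # catList, split on ^B


def parse_sample(line: str) -> Sample:
    """Parse one row of local_train_splitByUser / local_test_splitByUser."""
    # Assumption: the six columns are tab-separated (categories may contain spaces).
    label, user, item, cat, items, cats = line.rstrip("\n").split("\t")
    item_hist = items.split("\x02")  # ^B separates history entries
    cat_hist = cats.split("\x02")
    if len(item_hist) != len(cat_hist):
        raise ValueError("itemList and catList must have the same length")
    return Sample(int(label), user, item, cat, item_hist, cat_hist)
```

The length check mirrors what the sample rows show: every history item has exactly one matching category.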

Example rows from item-info:

  #item cat
  B00M029T4O Literature & Fiction
  B00LZ7WVJ0 Abuse
  B00M1336U0 Budgeting & Money Management
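Each item has a single category, so item-info fits naturally in a dict. A minimal sketch (the function name is hypothetical) that splits each line on the first run of whitespace only: item IDs contain no spaces, so this works whether the separator is a tab or a space, while multi-word categories such as `Budgeting & Money Management` stay intact.

```python
from typing import Dict, Iterable


def load_item_info(lines: Iterable[str]) -> Dict[str, str]:
    """Build an item -> category map from item-info lines."""
    item_to_cat: Dict[str, str] = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):  # skip blank and header lines
            continue
        item, cat = line.split(None, 1)  # split once; keep spaces inside the category
        item_to_cat[item] = cat
    return item_to_cat
```

With roughly 2.4 million rows this is a single pass and a single dict; `load_item_info(open(path))` would stream the file line by line.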

Example rows from reviews-info:

  #user item rating time
  A10000012B7CGYKOMPQ4L 000100039X 5.0 1355616000
  A2S166WSCFIFP5 000100039X 5.0 1071100800
  A1BM81XB4QHOA3 000100039X 5.0 1390003200
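The `time` column is a Unix timestamp in seconds, so reviews can be ordered chronologically before building behaviour sequences. A minimal sketch with hypothetical names, assuming four whitespace-separated fields per line (user and item IDs contain no spaces):

```python
from datetime import datetime, timezone
from typing import NamedTuple


class Review(NamedTuple):
    user: str
    item: str
    rating: float
    time: int  # Unix timestamp, seconds


def parse_review(line: str) -> Review:
    user, item, rating, ts = line.split()
    return Review(user, item, float(rating), int(ts))


def review_date(review: Review) -> str:
    """Render the timestamp as a UTC calendar date."""
    return datetime.fromtimestamp(review.time, tz=timezone.utc).strftime("%Y-%m-%d")


rows = [
    "A10000012B7CGYKOMPQ4L 000100039X 5.0 1355616000",
    "A2S166WSCFIFP5 000100039X 5.0 1071100800",
    "A1BM81XB4QHOA3 000100039X 5.0 1390003200",
]
# Oldest review first.
reviews = sorted((parse_review(r) for r in rows), key=lambda r: r.time)
for r in reviews:
    print(r.user, review_date(r))
```

For the three sample rows this prints 2003-12-11, 2012-12-16 and 2014-01-18, in that order.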

uid_voc.pkl: dict(user, userId)
mid_voc.pkl: dict(item, itemId)
cat_voc.pkl: dict(cat, catId)
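The three `.pkl` files are pickled Python dicts from raw string keys to integer ids. Below is a hypothetical sketch of how such a vocabulary could be built, saved and queried; ordering by frequency and reserving id 0 for unknown keys are assumptions of this sketch, not necessarily how the project's files were generated.

```python
import pickle
from collections import Counter
from typing import Dict, Iterable

UNKNOWN_ID = 0  # assumption of this sketch: id 0 is reserved for unseen keys


def build_vocab(keys: Iterable[str]) -> Dict[str, int]:
    """Assign ids 1..n by descending frequency (ties broken alphabetically)."""
    counts = Counter(keys)
    ordered = sorted(counts, key=lambda k: (-counts[k], k))
    return {key: idx for idx, key in enumerate(ordered, start=1)}


def save_vocab(vocab: Dict[str, int], path: str) -> None:
    with open(path, "wb") as f:
        pickle.dump(vocab, f)


def load_vocab(path: str) -> Dict[str, int]:
    with open(path, "rb") as f:
        return pickle.load(f)


def lookup(vocab: Dict[str, int], key: str) -> int:
    """Map a raw user/item/category string to its integer id."""
    return vocab.get(key, UNKNOWN_ID)
```

Reserving a dedicated id keeps unseen test users or items from silently colliding with the most frequent key.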

Results

DNN

Runs end to end. Results after 10 iterations:
[Screenshot of the results]
Training is slow: it uses only 1 CPU core, although 48 cores are available.
[Screenshot]

DIN

Runs end to end. Results after 10 iterations:
[Screenshot of the results]

DIEN

Does not run yet; investigating the cause.

Open questions

  • Data volume? Training time?

[Two screenshots]

  • How many feature dimensions?
  • How is the model deployed?
  • What does an input example look like?

Hyperparameter tuning

DNN

  • Learning rate and batch_size

| Directory | Parameters | Result |
| ---- | ---- | ---- |
| dnn2 | default | |
| dnn1 | batch_size=128*8 (=1024) | best test_auc=0.7174 |
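test_auc is the area under the ROC curve on the test split. As a hedged reference for sanity-checking reported values, here is a minimal rank-based (Mann-Whitney U) AUC in plain Python; it is an independent sketch, not the project's own evaluation code, and assumes binary 0/1 labels.

```python
from typing import Sequence


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC via the Mann-Whitney U statistic; tied scores get their average rank."""
    pairs = sorted(zip(scores, labels))  # ascending by score
    n = len(pairs)
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over a run of equal scores.
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie run
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1
    n_pos = sum(1 for _, y in pairs if y == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative label")
    pos_rank_sum = sum(r for r, (_, y) in zip(ranks, pairs) if y == 1)
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


print(auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

An AUC of 0.5 corresponds to random ranking, so test_auc=0.7174 means a random clicked sample outscores a random non-clicked one about 72% of the time.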

todo

  • Increase GPU utilization

batch_size=1024
[Screenshot]
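One common way to raise accelerator utilization is to prepare the next batch while the current training step runs. The sketch below assumes, without confirmation from the notes, that the single busy CPU core is the Python-side batch iterator. It wraps any batch iterator in a background thread with a bounded buffer; `prefetch` is a hypothetical helper, not part of the project code.

```python
import queue
import threading
from typing import Iterable, Iterator, Optional, TypeVar

T = TypeVar("T")


class _End:
    """Marker put on the queue when the producer finishes (or fails)."""

    def __init__(self, error: Optional[BaseException] = None) -> None:
        self.error = error


def prefetch(batches: Iterable[T], buffer_size: int = 8) -> Iterator[T]:
    """Yield items from `batches`, producing them ahead of time in a background thread."""
    q: "queue.Queue[object]" = queue.Queue(maxsize=buffer_size)

    def producer() -> None:
        try:
            for batch in batches:
                q.put(batch)  # blocks while the buffer is full
        except Exception as exc:  # hand the error over to the consumer
            q.put(_End(exc))
        else:
            q.put(_End())

    threading.Thread(target=producer, daemon=True).start()
    while True:
        item = q.get()
        if isinstance(item, _End):
            if item.error is not None:
                raise item.error
            return
        yield item  # type: ignore[misc]
```

In a training loop this would wrap the existing iterator, e.g. `for src, tgt in prefetch(train_data):`, where `train_data` stands for whatever yields batches. A thread only overlaps work while the training step runs outside the Python interpreter (as TensorFlow kernels generally do); if pure-Python parsing itself is the bottleneck, a process-based loader such as `multiprocessing` would be needed to use more of the 48 cores.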