Data
Dataset | Samples
---|---
local_train_splitByUser | 1,086,120
local_test_splitByUser | 121,216
item-info | 2,370,585
reviews-info | 8,898,041
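Sample rows from local_train_splitByUser (^B is the non-printing Ctrl-B character, 0x02, so the list entries appear run together):
#click user item cat itemList (^B-separated) catList (^B-separated)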
0 AZPJ9LUT0FEPY B00AMNNTIA Literature & Fiction 03077444340062248391047053070709789246221590516400 BooksBooksBooksBooksBooks
1 AZPJ9LUT0FEPY 0800731603 Books 03077444340062248391047053070709789246221590516400 BooksBooksBooksBooksBooks
0 A2NRV79GKAU726 B003NNV10O Russian 0814472869007146207415839423000812538366B007IXVSBM19302784621482334291 BooksBooksBooksBooksBakingBooksBooks
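The itemList and catList columns can be split back into Python lists along the ^B separator. The sketch below is one way to do this. It assumes the six top-level fields are tab-separated (the category names contain spaces, so a space cannot be the delimiter), and it rebuilds the first sample row by assuming each item ID is a 10-character ASIN. `Sample` and `parse_line` are illustrative names, not part of the original code.

```python
from typing import List, NamedTuple

SEP = "\x02"  # Ctrl-B, written ^B in the header line


class Sample(NamedTuple):
    click: int            # 1 = clicked, 0 = not clicked
    user: str
    item: str
    cat: str
    item_hist: List[str]  # itemList column
    cat_hist: List[str]   # catList column


def parse_line(line: str) -> Sample:
    # Assumption: the six top-level fields are separated by tabs.
    click, user, item, cat, items, cats = line.rstrip("\n").split("\t")
    return Sample(int(click), user, item, cat,
                  items.split(SEP), cats.split(SEP))


# First sample row, rebuilt with explicit separators
# (10-character item boundaries are an assumption).
row = "\t".join([
    "0", "AZPJ9LUT0FEPY", "B00AMNNTIA", "Literature & Fiction",
    SEP.join(["0307744434", "0062248391", "0470530707",
              "0978924622", "1590516400"]),
    SEP.join(["Books"] * 5),
])

sample = parse_line(row)
print(sample.click, sample.item, len(sample.item_hist))
```

The two lists should always have the same length, which makes a cheap sanity check when reading the files.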
Sample rows from local_test_splitByUser:
#click user item cat itemList (^B-separated) catList (^B-separated)
0 A3BI7R43VUZ1TY B00JNHU0T2 Literature & Fiction 0989464105B00B01691C14778097321608442845 BooksLiterature & FictionBooksBooks
1 A3BI7R43VUZ1TY 0989464121 Books 0989464105B00B01691C14778097321608442845 BooksLiterature & FictionBooksBooks
0 A2Z3AHJPXG3ZNP B0072YSPJ0 Literature & Fiction 147831096014922314521477603425B00FRKLA6Q BooksBooksBooksUrban
Sample rows from item-info:
#item cat
B00M029T4O Literature & Fiction
B00LZ7WVJ0 Abuse
B00M1336U0 Budgeting & Money Management
Sample rows from reviews-info:
#user item rating time
A10000012B7CGYKOMPQ4L 000100039X 5.0 1355616000
A2S166WSCFIFP5 000100039X 5.0 1071100800
A1BM81XB4QHOA3 000100039X 5.0 1390003200
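The time column in reviews-info is a Unix timestamp, which is presumably what lets each user's items be ordered into a behaviour sequence such as itemList. A minimal sketch of that grouping follows, assuming tab-separated fields. `build_histories` and `to_date` are hypothetical helpers, and the last entry in `rows` is invented purely to show the ordering.

```python
from collections import defaultdict
from datetime import datetime, timezone

rows = [
    "A10000012B7CGYKOMPQ4L\t000100039X\t5.0\t1355616000",
    "A2S166WSCFIFP5\t000100039X\t5.0\t1071100800",
    "A1BM81XB4QHOA3\t000100039X\t5.0\t1390003200",
    # Hypothetical extra review (not from the data set), earlier in time.
    "A2S166WSCFIFP5\tHYPOTHETICAL\t4.0\t1000000000",
]


def to_date(ts: int) -> str:
    """Render a Unix timestamp as a UTC calendar date."""
    return datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y-%m-%d")


def build_histories(lines):
    """Map each user to the items they reviewed, oldest first."""
    by_user = defaultdict(list)
    for line in lines:
        user, item, _rating, ts = line.split("\t")
        by_user[user].append((int(ts), item))
    return {user: [item for _, item in sorted(events)]
            for user, events in by_user.items()}


histories = build_histories(rows)
print(histories["A2S166WSCFIFP5"])
print(to_date(1355616000))
```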
uid_voc.pkl: dict(user, userId)
mid_voc.pkl: dict(item, itemId)
cat_voc.pkl: dict(cat, catId)
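Each of the three .pkl vocabularies maps a raw string to an integer ID. The sketch below shows how such a dictionary could be built, pickled, and used for lookups. The frequency-based ordering and the reserved ID 0 for unseen values are assumptions made for illustration, not confirmed details of how these files were produced; `build_vocab` and `UNKNOWN` are hypothetical names.

```python
import os
import pickle
import tempfile
from collections import Counter

UNKNOWN = "<unk>"  # assumption: ID 0 is reserved for values not in the vocabulary


def build_vocab(values):
    """Assign IDs 1..N in order of decreasing frequency."""
    vocab = {UNKNOWN: 0}
    for value, _count in Counter(values).most_common():
        vocab[value] = len(vocab)
    return vocab


cats = ["Literature & Fiction", "Abuse",
        "Budgeting & Money Management", "Literature & Fiction"]
cat_voc = build_vocab(cats)

# Round-trip through a pickle file in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "cat_voc.pkl")
    with open(path, "wb") as f:
        pickle.dump(cat_voc, f)
    with open(path, "rb") as f:
        loaded = pickle.load(f)

known_id = loaded.get("Literature & Fiction", loaded[UNKNOWN])
unseen_id = loaded.get("Russian", loaded[UNKNOWN])
print(known_id, unseen_id)
```

If the real files were written by Python 2, loading them under Python 3 may need `pickle.load(f, encoding="latin1")`.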
Results
DNN
Ran successfully. Results after 10 iterations:
Training is slow: only 1 CPU core is used, although 48 are available.
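One thing worth ruling out for the single-core symptom is that the process itself is restricted to one core, or that a thread-count environment variable caps it. The sketch below is a diagnostic using only the standard library; it is not a fix, and `usable_cpus` is a hypothetical helper. `os.sched_getaffinity` exists only on some platforms (for example Linux), hence the guard.

```python
import os


def usable_cpus() -> int:
    """Number of cores this process is allowed to run on."""
    if hasattr(os, "sched_getaffinity"):
        return len(os.sched_getaffinity(0))
    return os.cpu_count() or 1


total = os.cpu_count() or 1
allowed = usable_cpus()
print(f"cores on machine: {total}, usable by this process: {allowed}")

# Thread-count variables that numeric libraries commonly honour.
for var in ("OMP_NUM_THREADS", "MKL_NUM_THREADS"):
    print(var, "=", os.environ.get(var, "<unset>"))
```

If all cores are usable and neither variable is set to 1, the bottleneck is more likely inside the training script, for example a single-threaded Python input pipeline. Assuming the models run on TensorFlow 1.x, the `intra_op_parallelism_threads` and `inter_op_parallelism_threads` fields of `tf.ConfigProto` would be the next settings to inspect.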
DIN
Ran successfully. Results after 10 iterations:
DIEN
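Does not run yet; the cause still needs to be investigated.
Open questions
- How much data? How long does training take?
- How many feature dimensions?
- How is the model deployed?
- What does an input sample look like?
Hyperparameter tuning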
DNN
- Learning rate and batch_size

Directory | Parameters | Result
---|---|---
dnn2 | defaults |
dnn1 | batch_size=128*8 | best test_auc=0.7174
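To put the batch-size change in perspective, the arithmetic below counts how many mini-batches one pass over local_train_splitByUser takes. It assumes the default batch size is 128 (inferred from the `128*8` notation, not confirmed by these notes); `steps_per_epoch` is a hypothetical helper.

```python
TRAIN_SAMPLES = 1_086_120  # local_train_splitByUser, from the data table


def steps_per_epoch(n_samples: int, batch_size: int) -> int:
    """Mini-batches needed for one full pass (the last batch may be partial)."""
    return -(-n_samples // batch_size)  # integer ceiling division


small = steps_per_epoch(TRAIN_SAMPLES, 128)      # assumed default
large = steps_per_epoch(TRAIN_SAMPLES, 128 * 8)  # dnn1 setting
print(small, large)
```

An 8x larger batch means roughly 8x fewer parameter updates per pass, which is why the learning rate is usually tuned together with batch_size.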
TODO
- Improve GPU utilization
batch_size=1024