one-hot encoding
Example: source data
item_id | f_a | f_b |
---|---|---|
1 | a | o |
2 | b | p |
3 | c | q |
4 | d | r |
After the one-hot conversion, the mapping table (one global index per column/value pair) is:
No. | col_name | col_value | mapping |
---|---|---|---|
1 | f_a | a | 0 |
2 | f_a | b | 1 |
3 | f_a | c | 2 |
4 | f_a | d | 3 |
5 | f_b | o | 4 |
6 | f_b | p | 5 |
7 | f_b | q | 6 |
8 | f_b | r | 7 |
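
A minimal Python sketch of how such a mapping table could be built. The function name `build_onehot_mapping` is hypothetical, and the ordering (columns in the given order, values sorted within each column) is an assumption that happens to reproduce the table above.

```python
# Source data from the example: one dict per item
rows = [
    {"item_id": 1, "f_a": "a", "f_b": "o"},
    {"item_id": 2, "f_a": "b", "f_b": "p"},
    {"item_id": 3, "f_a": "c", "f_b": "q"},
    {"item_id": 4, "f_a": "d", "f_b": "r"},
]


def build_onehot_mapping(rows, cols):
    """Give every (column, value) pair one global index, 0..N-1.

    Assumption: columns are visited in the given order and the values
    inside each column in sorted order.
    """
    mapping = {}
    next_index = 0
    for col in cols:
        for value in sorted({row[col] for row in rows}):
            mapping[(col, value)] = next_index
            next_index += 1
    return mapping


mapping = build_onehot_mapping(rows, ["f_a", "f_b"])
# Print the table as: No., col_name, col_value, mapping
for seq, ((col, value), index) in enumerate(mapping.items(), start=1):
    print(seq, col, value, index)
```

Because the indices are global rather than per column, the values of different features never collide in the later kv and dense outputs.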
The kv result table (each row lists only its non-zero positions as index:value pairs) is:
item_id | f_a | f_b | kv |
---|---|---|---|
1 | a | o | 0:1,4:1 |
2 | b | p | 1:1,5:1 |
3 | c | q | 2:1,6:1 |
4 | d | r | 3:1,7:1 |
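
Given the mapping table, a row's kv string just lists the global indices of its values, each with weight 1. A sketch with the mapping hard-coded from the table above; the helper name `to_kv` is hypothetical.

```python
# Mapping table from the example: (column, value) -> global index
MAPPING = {
    ("f_a", "a"): 0, ("f_a", "b"): 1, ("f_a", "c"): 2, ("f_a", "d"): 3,
    ("f_b", "o"): 4, ("f_b", "p"): 5, ("f_b", "q"): 6, ("f_b", "r"): 7,
}

rows = [
    {"item_id": 1, "f_a": "a", "f_b": "o"},
    {"item_id": 2, "f_a": "b", "f_b": "p"},
    {"item_id": 3, "f_a": "c", "f_b": "q"},
    {"item_id": 4, "f_a": "d", "f_b": "r"},
]


def to_kv(row, cols, mapping):
    """Sparse one-hot: emit 'index:1' for each column's value, sorted by index."""
    indices = sorted(mapping[(col, row[col])] for col in cols)
    return ",".join(f"{i}:1" for i in indices)


for row in rows:
    print(row["item_id"], to_kv(row, ["f_a", "f_b"], MAPPING))
```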
The dense result table (one 0/1 column per mapping entry, named column_value_index) is:
item_id | f_a | f_b | f_a_a_0 | f_a_b_1 | f_a_c_2 | f_a_d_3 | f_b_o_4 | f_b_p_5 | f_b_q_6 | f_b_r_7 |
---|---|---|---|---|---|---|---|---|---|---|
1 | a | o | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
2 | b | p | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 |
3 | c | q | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 |
4 | d | r | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 |
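
The dense form expands the same mapping into one 0/1 column per entry. A sketch under the same assumptions (mapping hard-coded from the table above, hypothetical helper `to_dense`), reusing the `<column>_<value>_<index>` naming from the table header:

```python
# Mapping table from the example: (column, value) -> global index
MAPPING = {
    ("f_a", "a"): 0, ("f_a", "b"): 1, ("f_a", "c"): 2, ("f_a", "d"): 3,
    ("f_b", "o"): 4, ("f_b", "p"): 5, ("f_b", "q"): 6, ("f_b", "r"): 7,
}

rows = [
    {"item_id": 1, "f_a": "a", "f_b": "o"},
    {"item_id": 2, "f_a": "b", "f_b": "p"},
    {"item_id": 3, "f_a": "c", "f_b": "q"},
    {"item_id": 4, "f_a": "d", "f_b": "r"},
]


def to_dense(row, mapping):
    """Expand one row into 0/1 columns named <column>_<value>_<index>."""
    dense = {}
    # Emit the columns in global-index order, as in the table above
    for (col, value), index in sorted(mapping.items(), key=lambda item: item[1]):
        dense[f"{col}_{value}_{index}"] = 1 if row[col] == value else 0
    return dense


for row in rows:
    print(row["item_id"], list(to_dense(row, MAPPING).values()))
# first line printed: 1 [1, 0, 0, 0, 1, 0, 0, 0]
```

The dense form stores one column per distinct value (8 here), almost all zeros, while the kv form stores only one entry per original feature (2 here); the gap grows with the number of distinct values.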
label encoding
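Using the same data as above, first look at the encoding table (each column is encoded independently, its values mapped to the integers 0..3):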
f_a | f_b | f_a_index | f_b_index |
---|---|---|---|
a | o | 1 | 1 |
b | p | 2 | 0 |
c | q | 3 | 2 |
d | r | 0 | 3 |
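
Label encoding gives each distinct value of a column its own integer in 0..n-1, independently per column. The indices in the table above are not in alphabetical order, so the exact assignment order is an implementation detail of whatever tool produced it. The sketch below (hypothetical helper `build_label_encoding`) assumes sorted order, so it yields different but equally valid indices than the table.

```python
def build_label_encoding(values):
    """Map each distinct value to an integer in 0..n-1 (sorted order is an assumption)."""
    return {value: index for index, value in enumerate(sorted(set(values)))}


# Each column gets its own, independent encoding
f_a_index = build_label_encoding(["a", "b", "c", "d"])
f_b_index = build_label_encoding(["o", "p", "q", "r"])
print(f_a_index)  # {'a': 0, 'b': 1, 'c': 2, 'd': 3}
print(f_b_index)  # {'o': 0, 'p': 1, 'q': 2, 'r': 3}
```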
Then the result table:
item_id | f_a | f_b | f_a_index | f_b_index |
---|---|---|---|---|
1 | a | o | 1 | 1 |
2 | b | p | 2 | 0 |
3 | c | q | 3 | 2 |
4 | d | r | 0 | 3 |
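
Producing the result table is then a per-column lookup in the encoding table. A sketch with the encoding hard-coded from the table above; the helper `label_encode` and the `<column>_index` naming taken from the table header are for illustration only.

```python
# Encoding table from the example (each column encoded independently)
F_A_INDEX = {"a": 1, "b": 2, "c": 3, "d": 0}
F_B_INDEX = {"o": 1, "p": 0, "q": 2, "r": 3}

rows = [
    {"item_id": 1, "f_a": "a", "f_b": "o"},
    {"item_id": 2, "f_a": "b", "f_b": "p"},
    {"item_id": 3, "f_a": "c", "f_b": "q"},
    {"item_id": 4, "f_a": "d", "f_b": "r"},
]


def label_encode(rows, encoders):
    """Copy each row and add a '<column>_index' field for every encoded column."""
    encoded = []
    for row in rows:
        out = dict(row)
        for col, table in encoders.items():
            out[f"{col}_index"] = table[row[col]]
        encoded.append(out)
    return encoded


result = label_encode(rows, {"f_a": F_A_INDEX, "f_b": F_B_INDEX})
for r in result:
    print(r["item_id"], r["f_a"], r["f_b"], r["f_a_index"], r["f_b_index"])
```

Unlike one-hot, label encoding keeps one column per feature, but it also imposes an integer order on the values that the original categories do not have.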