one-hot encoding
Example: source data
| item_id | f_a | f_b |
|---|---|---|
| 1 | a | o |
| 2 | b | p |
| 3 | c | q |
| 4 | d | r |
After the one-hot transformation, the mapping table is:
| No. | col_name | col_value | mapping |
|---|---|---|---|
| 1 | f_a | a | 0 |
| 2 | f_a | b | 1 |
| 3 | f_a | c | 2 |
| 4 | f_a | d | 3 |
| 5 | f_b | o | 4 |
| 6 | f_b | p | 5 |
| 7 | f_b | q | 6 |
| 8 | f_b | r | 7 |
The kv (sparse) result table is:
| item_id | f_a | f_b | kv |
|---|---|---|---|
| 1 | a | o | 0:1,4:1 |
| 2 | b | p | 1:1,5:1 |
| 3 | c | q | 2:1,6:1 |
| 4 | d | r | 3:1,7:1 |
The dense result table is:
| item_id | f_a | f_b | f_a_a_0 | f_a_b_1 | f_a_c_2 | f_a_d_3 | f_b_o_4 | f_b_p_5 | f_b_q_6 | f_b_r_7 |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | a | o | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
| 2 | b | p | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 |
| 3 | c | q | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 |
| 4 | d | r | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 |
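
The three tables above can be reproduced with a short sketch in plain Python. This is only an illustration, not the tool that produced the tables: the helper names (`build_mapping`, `to_kv`, `to_dense`) are hypothetical, and the ordering rule (columns in the given order, values sorted within each column) is an assumption that happens to match the mapping table shown.

```python
# Source rows from the example above: (item_id, f_a, f_b)
rows = [
    (1, "a", "o"),
    (2, "b", "p"),
    (3, "c", "q"),
    (4, "d", "r"),
]
feature_cols = ["f_a", "f_b"]


def build_mapping(rows, feature_cols):
    """Assign one global index per (col_name, col_value) pair.

    Assumption: columns are handled in the given order and values are
    sorted within each column, which reproduces the mapping table above.
    """
    mapping = {}
    for col_pos, col_name in enumerate(feature_cols, start=1):
        values = sorted({row[col_pos] for row in rows})
        for value in values:
            mapping[(col_name, value)] = len(mapping)
    return mapping


def to_kv(row, feature_cols, mapping):
    """Sparse form: 'index:1' for every active position, comma-separated."""
    indices = [
        mapping[(col, row[pos])]
        for pos, col in enumerate(feature_cols, start=1)
    ]
    return ",".join(f"{idx}:1" for idx in sorted(indices))


def to_dense(row, feature_cols, mapping):
    """Dense form: one 0/1 entry per mapping index."""
    vector = [0] * len(mapping)
    for pos, col in enumerate(feature_cols, start=1):
        vector[mapping[(col, row[pos])]] = 1
    return vector


mapping = build_mapping(rows, feature_cols)
kv_rows = [to_kv(row, feature_cols, mapping) for row in rows]
dense_rows = [to_dense(row, feature_cols, mapping) for row in rows]

# Dense column names follow the pattern <col_name>_<col_value>_<mapping>.
dense_col_names = [
    f"{col}_{value}_{idx}" for (col, value), idx in mapping.items()
]
```

The kv and dense forms carry the same information: the kv string lists only the positions that are 1, while the dense row spells out every position in the mapping.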
label-encoding
Using the same data as above, first look at the encoding table:
| f_a | f_b | f_a_index | f_b_index |
|---|---|---|---|
| a | o | 1 | 1 |
| b | p | 2 | 0 |
| c | q | 3 | 2 |
| d | r | 0 | 3 |
Now look at the result table:
| item_id | f_a | f_b | f_a_index | f_b_index |
|---|---|---|---|---|
| 1 | a | o | 1 | 1 |
| 2 | b | p | 2 | 0 |
| 3 | c | q | 3 | 2 |
| 4 | d | r | 0 | 3 |
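
Applying the encoding table to the source rows can be sketched as follows. The source does not say how these particular indices were assigned, so the sketch takes the encoding table as given instead of deriving it; `label_encode` and `encoders` are hypothetical names used only for illustration.

```python
# Source rows from the example above: (item_id, f_a, f_b)
rows = [
    (1, "a", "o"),
    (2, "b", "p"),
    (3, "c", "q"),
    (4, "d", "r"),
]
feature_cols = ["f_a", "f_b"]

# Encoding table copied from the example above. The rule that produced
# these indices is not stated in the source, so it is hard-coded here.
encoders = {
    "f_a": {"a": 1, "b": 2, "c": 3, "d": 0},
    "f_b": {"o": 1, "p": 0, "q": 2, "r": 3},
}


def label_encode(rows, feature_cols, encoders):
    """Append one integer index per feature column to each row."""
    result = []
    for item_id, *values in rows:
        indices = [
            encoders[col][value]
            for col, value in zip(feature_cols, values)
        ]
        result.append((item_id, *values, *indices))
    return result


encoded_rows = label_encode(rows, feature_cols, encoders)
for row in encoded_rows:
    print(row)
```

Unlike the one-hot mapping, where every (column, value) pair gets a position in one shared index space, each column here has its own index range starting at 0, and the output adds a single integer column per feature.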
