Original article: https://segmentfault.com/a/1190000015283224
    When processing features, we often need to normalize or encode them first. Python makes this easy:

    from sklearn import preprocessing
    from scipy.stats import rankdata
    import numpy as np

    # One numeric feature as a column vector: 6 samples x 1 feature
    x = [[1], [3], [34], [21], [10], [12]]
    # Standardization: zero mean, unit variance
    std_x = preprocessing.StandardScaler().fit_transform(x)
    # Min-max scaling into [0, 1]
    norm_x = preprocessing.MinMaxScaler().fit_transform(x)
    # LabelEncoder expects a 1-D array, so flatten the column first
    norm_x2 = preprocessing.LabelEncoder().fit_transform(np.ravel(x))
    print('std_x=\n', std_x)
    print('norm_x=\n', norm_x)
    print('norm_x2=\n', norm_x2)
    # rankdata flattens its input by default, so each call returns a 1-D array of ranks
    print('original order =', rankdata(x))
    print('stand order    =', rankdata(std_x))
    print('normalize order=', rankdata(norm_x))

    Here, preprocessing.LabelEncoder().fit_transform(...) is what performs the "normalize encoding": each value is replaced by its index among the sorted distinct values. The program above prints:

    std_x=
     [[-1.1124854 ]
     [-0.93448773]
     [ 1.82447605]
     [ 0.66749124]
     [-0.31149591]
     [-0.13349825]]
    norm_x=
     [[0.        ]
     [0.06060606]
     [1.        ]
     [0.60606061]
     [0.27272727]
     [0.33333333]]
    norm_x2=
     [0 1 5 4 2 3]
    original order = [1. 2. 6. 5. 3. 4.]
    stand order    = [1. 2. 6. 5. 3. 4.]
    normalize order= [1. 2. 6. 5. 3. 4.]
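    To make explicit what each of the three transforms computes, here is a NumPy-only sketch that reproduces the numbers above by hand. It assumes StandardScaler divides by the population standard deviation (ddof=0) and MinMaxScaler uses its default [0, 1] range. `label_encode` is a hypothetical helper written to mimic LabelEncoder's behavior; it is not the library's own code.

```python
import numpy as np

# The same toy feature as in the article, flattened to 1-D
x = np.array([1, 3, 34, 21, 10, 12], dtype=float)

# Standardization (z-score): (x - mean) / std, using the population
# standard deviation (ddof=0), as StandardScaler does by default
std_x = (x - x.mean()) / x.std()

# Min-max scaling: (x - min) / (max - min), mapping the feature into [0, 1]
norm_x = (x - x.min()) / (x.max() - x.min())


def label_encode(values):
    """Hypothetical helper: replace each value by its index among the sorted distinct values."""
    classes = sorted(set(values))  # here: [1.0, 3.0, 10.0, 12.0, 21.0, 34.0]
    index_of = {v: i for i, v in enumerate(classes)}  # value -> position in sorted order
    return [index_of[v] for v in values]


codes = label_encode(x.tolist())

print('std_x  =', np.round(std_x, 8))
print('norm_x =', np.round(norm_x, 8))
print('codes  =', codes)  # [0, 1, 5, 4, 2, 3]

# Repeated values share a code, and only the relative order survives
print(label_encode([5, 1, 5, 3]))  # [2, 0, 2, 1]
```

    When all values are distinct, as here, each code equals the value's rank minus one, which is why the encoded result lines up with the order printed by rankdata. NumPy's `np.unique(x, return_inverse=True)` returns the same codes as its second element.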
    

    As you can see, after the normalize encoding the result is [0 1 5 4 2 3]. So what is the benefit of doing this?
    The images below are reposted from Zhihu (https://www.zhihu.com/questio…).