One-Hot Encoding

  import pandas as pd
  from sklearn.preprocessing import OneHotEncoder
  from sklearn.preprocessing import LabelBinarizer
  from sklearn.feature_extraction import DictVectorizer

  # Toy data: one categorical column (name) and two numeric columns (age, height)
  data = pd.DataFrame({'name': ['Tom', 'Andy', 'David'],
                       'age': [20, 21, 22],
                       'height': [175, 165, 180]})
  print(data)
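
OneHotEncoder is imported above but not used in these notes. The block below is a minimal sketch that encodes only the `name` column of the same toy frame. The choice of column is an assumption. The frame is rebuilt here so the block runs on its own:

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

data = pd.DataFrame({'name': ['Tom', 'Andy', 'David'],
                     'age': [20, 21, 22],
                     'height': [175, 165, 180]})

# OneHotEncoder expects 2-D input, so select the column with double brackets
enc = OneHotEncoder()
# fit_transform returns a SciPy sparse matrix by default; densify it for printing
encoded = enc.fit_transform(data[['name']]).toarray()

print(enc.categories_)  # learned categories, sorted: Andy, David, Tom
print(encoded)          # one 0/1 row per sample, one column per category
```

Each row has a single 1 in the column of its category, so Tom maps to `[0, 0, 1]` because the categories are sorted alphabetically.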

LabelBinarizer()
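
LabelBinarizer works on a 1-D sequence of labels and is intended mainly for target values. Below is a minimal sketch on the `name` values from the toy data. The `['yes', 'no', 'yes']` labels are made up to show the two-class case, where only a single column is returned:

```python
from sklearn.preprocessing import LabelBinarizer

names = ['Tom', 'Andy', 'David']

lb = LabelBinarizer()
onehot = lb.fit_transform(names)  # dense NumPy array of 0/1 values by default

print(lb.classes_)  # sorted labels: Andy, David, Tom
print(onehot)

# inverse_transform maps the 0/1 rows back to the original labels
print(lb.inverse_transform(onehot))

# With only two classes, LabelBinarizer returns one column (1 = the second sorted class)
lb2 = LabelBinarizer()
binary = lb2.fit_transform(['yes', 'no', 'yes'])
print(binary)
```
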

get_dummies()
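
`pandas.get_dummies` one-hot encodes directly on a DataFrame. The block below is a minimal sketch on the same toy data. Passing `dtype=int` is a choice made here because pandas 2.0 changed the default dummy dtype to `bool`:

```python
import pandas as pd

data = pd.DataFrame({'name': ['Tom', 'Andy', 'David'],
                     'age': [20, 21, 22],
                     'height': [175, 165, 180]})

# Only the listed columns are expanded; age and height are kept unchanged
dummies = pd.get_dummies(data, columns=['name'], dtype=int)
print(dummies)
```

The untouched columns come first, followed by one `name_<value>` column per category.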
DictVectorizer()
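The feature_extraction module provides the DictVectorizer class, which vectorizes dictionaries (one dict per sample). String-valued features are expanded into 0/1 indicator columns, while numeric values are kept as they are.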
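
The block below is a minimal sketch of DictVectorizer applied to the toy data, written out as a list of dicts. `sparse=False` is set only so that the result prints as a dense array:

```python
from sklearn.feature_extraction import DictVectorizer

records = [
    {'name': 'Tom', 'age': 20, 'height': 175},
    {'name': 'Andy', 'age': 21, 'height': 165},
    {'name': 'David', 'age': 22, 'height': 180},
]

dv = DictVectorizer(sparse=False)  # dense NumPy array instead of a SciPy sparse matrix
X = dv.fit_transform(records)

# Feature names are sorted; string features become "key=value" indicator columns
print(dv.feature_names_)
print(X)
```

Numeric keys such as `age` keep their values, and each `name=...` column is 1 only for the matching row.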

Tokenizing text with NLTK or jieba


References