One-Hot Encoding
import pandas as pd
from sklearn.preprocessing import OneHotEncoder
from sklearn.preprocessing import LabelBinarizer
from sklearn.feature_extraction import DictVectorizer
# Sample data: one categorical column (name) and two numeric columns (age, height)
data = pd.DataFrame({'name': ['Tom', 'Andy', 'David'],
                     'age': [20, 21, 22],
                     'height': [175, 165, 180]})
print(data)
LabelBinarizer()
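As a minimal sketch, assuming the goal is to one-hot encode the `name` column of the sample frame above, `LabelBinarizer` could be used as follows. The variable names `lb` and `name_onehot` are illustrative and not from the original notes.

```python
import pandas as pd
from sklearn.preprocessing import LabelBinarizer

# Same sample data as above, repeated so the snippet runs on its own.
data = pd.DataFrame({'name': ['Tom', 'Andy', 'David'],
                     'age': [20, 21, 22],
                     'height': [175, 165, 180]})

# Fit on the categorical column; classes are stored in sorted order.
lb = LabelBinarizer()
name_onehot = lb.fit_transform(data['name'])

print(lb.classes_)   # the learned categories, one per output column
print(name_onehot)   # one row per sample, a single 1 in the matching column
```

Note that with exactly two classes `LabelBinarizer` returns a single 0/1 column rather than two, so this layout holds only for three or more categories.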
get_dummies()
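A short sketch of the pandas route, assuming only the `name` column should be expanded and the numeric columns left alone. The `dtype=int` argument is a choice made here so that the indicator columns print as 0/1 regardless of the pandas version.

```python
import pandas as pd

# Same sample data as above, repeated so the snippet runs on its own.
data = pd.DataFrame({'name': ['Tom', 'Andy', 'David'],
                     'age': [20, 21, 22],
                     'height': [175, 165, 180]})

# Expand only the 'name' column into indicator columns named name_<value>.
dummies = pd.get_dummies(data, columns=['name'], dtype=int)
print(dummies)
```

The encoded columns are appended after the untouched ones, so `age` and `height` stay in place and `name` is replaced by one column per category.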
DictVectorizer()
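The feature_extraction module provides the DictVectorizer class, which vectorizes dictionaries: each string-valued feature is expanded into 0/1 binary indicator columns, while numeric values are kept as they are.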
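The following is a rough sketch of how this might look on the sample frame, assuming each row is first converted to a dictionary with `DataFrame.to_dict`. The names `records`, `dv`, and `X` are illustrative, and `sparse=False` is chosen here only to make the result easy to print.

```python
import pandas as pd
from sklearn.feature_extraction import DictVectorizer

# Same sample data as above, repeated so the snippet runs on its own.
data = pd.DataFrame({'name': ['Tom', 'Andy', 'David'],
                     'age': [20, 21, 22],
                     'height': [175, 165, 180]})

# DictVectorizer expects a list of {feature: value} dictionaries, one per row.
records = data.to_dict(orient='records')

# sparse=False returns a dense NumPy array instead of a SciPy sparse matrix.
dv = DictVectorizer(sparse=False)
X = dv.fit_transform(records)

print(dv.feature_names_)  # string features appear as 'name=<value>'
print(X)                  # numeric columns pass through; name is 0/1 encoded
```

Because the numeric features are passed through unchanged, this approach handles mixed numeric and categorical records in one step.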
Tokenizing with NLTK or jieba
- Omitted