Accuracy: the fraction of predicted labels (y_predict) that match the true labels (y_true)

    import numpy as np
    from matplotlib import pyplot as plt
    from sklearn import datasets
    from sklearn.metrics import accuracy_score
    from sklearn.model_selection import train_test_split
    from sklearn.neighbors import KNeighborsClassifier

    # Load the handwritten digits dataset: features and labels
    digits = datasets.load_digits()
    X = digits.data
    y = digits.target

    # Hold out 20% of the samples as a test set
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

    # Fit a 3-nearest-neighbors classifier and predict the test labels
    knn = KNeighborsClassifier(n_neighbors=3)
    knn.fit(X_train, y_train)
    y_predict = knn.predict(X_test)

    # Accuracy computed by hand: fraction of correct predictions
    np.sum(y_test == y_predict) / y_test.shape[0]

    # Accuracy via the classifier's score method
    knn.score(X_test, y_test)

    # Accuracy via the sklearn metrics function
    accuracy_score(y_test, y_predict)
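As a minimal sketch of what the three calls above compute, accuracy can be written as a small helper. The function name `accuracy` and the toy labels are illustrative assumptions, not part of sklearn.

```python
import numpy as np


def accuracy(y_true, y_pred):
    """Fraction of positions where the prediction equals the true label."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    if y_true.shape != y_pred.shape:
        raise ValueError("y_true and y_pred must have the same shape")
    # The mean of a boolean array is the share of True entries
    return float(np.mean(y_true == y_pred))


# Toy labels (made up for illustration): 4 of the 5 predictions are correct
toy_true = [0, 1, 2, 2, 1]
toy_pred = [0, 1, 2, 1, 1]
toy_acc = accuracy(toy_true, toy_pred)
print(toy_acc)
```

On real data, `knn.score` and `accuracy_score` should agree with this value, since they count the same matches.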

Hyperparameters

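What is a hyperparameter: a parameter that must be set before the machine learning algorithm runs, as opposed to a model parameter, which is learned during training.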
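For illustration, the hyperparameters of a sklearn estimator can be listed with `get_params()` before any data is seen. The variable names here are illustrative; `n_neighbors`, `weights` and `p` are the hyperparameters of `KNeighborsClassifier` tuned later in these notes.

```python
from sklearn.neighbors import KNeighborsClassifier

# Hyperparameters are fixed when the estimator is constructed, before fit()
example_knn = KNeighborsClassifier(n_neighbors=3, weights="distance")
params = example_knn.get_params()

# p was not passed, so it keeps its default value
print(params["n_neighbors"], params["weights"], params["p"])
```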

Finding the best k

    best_k = 1
    best_score = 0.0
    best_weights = None
    for k in range(1, 11):
        for weight in ['uniform', 'distance']:
            knn1 = KNeighborsClassifier(n_neighbors=k, weights=weight)
            knn1.fit(X_train, y_train)
            score = knn1.score(X_test, y_test)
            # Keep the best-scoring combination seen so far
            if score > best_score:
                best_weights = weight
                best_score = score
                best_k = k
    print('best_k:%s' % best_k)
    print('best_score:%s' % best_score)
    print('best_weights:%s' % best_weights)

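With weights='distance', the k-nearest-neighbors algorithm takes distance into account: each neighbor's vote is weighted by the inverse of its distance, so closer neighbors count for more.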
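The effect of inverse-distance weighting can be sketched with a hand-written vote. The helper `weighted_vote` and the toy neighbors are hypothetical, and the sketch assumes all distances are greater than zero; it is not how sklearn implements the vote internally.

```python
from collections import defaultdict


def weighted_vote(neighbors, weights="uniform"):
    """Pick a label from (distance, label) pairs for the k nearest points."""
    totals = defaultdict(float)
    for distance, label in neighbors:
        # 'uniform': every neighbor counts 1; 'distance': it counts 1 / distance
        totals[label] += 1.0 if weights == "uniform" else 1.0 / distance
    return max(totals, key=totals.get)


# Hypothetical neighbors: one very close "red", two farther "blue"
toy_neighbors = [(1.0, "red"), (3.0, "blue"), (4.0, "blue")]
uniform_winner = weighted_vote(toy_neighbors, "uniform")    # 2 votes vs 1
distance_winner = weighted_vote(toy_neighbors, "distance")  # 1.0 vs 1/3 + 1/4
print(uniform_winner, distance_winner)
```

With uniform weights the two blue neighbors win by count; with distance weights the single close red neighbor outweighs them.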

Grid search

    from sklearn.model_selection import GridSearchCV

    param_grid = [
        {
            'weights': ['uniform'],
            'n_neighbors': list(range(1, 11)),
        },
        {
            'weights': ['distance'],
            'n_neighbors': list(range(1, 11)),
            'p': list(range(1, 5)),  # Minkowski distance exponent
        },
    ]
    # Cross-validated search over every combination, using 4 parallel jobs
    grid = GridSearchCV(KNeighborsClassifier(), param_grid, n_jobs=4)
    grid.fit(X_train, y_train)
    # Best model found by the search
    grid.best_estimator_
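A fitted `GridSearchCV` also exposes the winning combination and its score. The sketch below uses a deliberately small grid, `cv=3` and `random_state=42`, all chosen here only to keep the run short and repeatable. Note that the search scores each combination by cross-validation on the training data, so the test set is only used for the final check.

```python
from sklearn import datasets
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier

digits = datasets.load_digits()
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.2, random_state=42
)

# Small illustrative grid (an assumption, not the grid from the notes)
small_grid = {"n_neighbors": [1, 3, 5], "weights": ["uniform", "distance"]}
search = GridSearchCV(KNeighborsClassifier(), small_grid, cv=3)
search.fit(X_train, y_train)

print(search.best_params_)  # the chosen hyperparameter combination
print(search.best_score_)   # mean cross-validated accuracy of that combination

# best_estimator_ is refit on the full training set; evaluate it on held-out data
test_score = search.best_estimator_.score(X_test, y_test)
print(test_score)
```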

Min-max normalization

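What is normalization: mapping the dataset onto the range 0 to 1. Why: when features are measured on different scales, the distance calculation is biased toward whichever feature has the largest scale. Min-max normalization suits data with clear bounds, such as exam scores, and is strongly affected by outliers.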

    x = np.random.randint(0, 100, size=100)
    # Map every value onto [0, 1]
    (x - np.min(x)) / (np.max(x) - np.min(x))
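The sensitivity to outliers can be seen in a small sketch. The helper `min_max_scale` and the score values are made up for illustration.

```python
def min_max_scale(values):
    """Map values linearly so the minimum becomes 0 and the maximum becomes 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical exam scores with clear bounds
scores = [50, 60, 70, 80, 90]
scaled = min_max_scale(scores)
print(scaled)

# A single outlier squeezes all the ordinary values toward 0
with_outlier = min_max_scale(scores + [1000])
print(with_outlier)
```

After adding the outlier, the five original scores all land below 0.05, so the differences between them nearly vanish.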

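Standardization (mean-variance normalization)

Rescale all the data so that it has mean 0 and variance 1. Unlike min-max normalization, the result has no fixed bounds, and a single outlier distorts it less.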

    # Subtract the mean, then divide by the standard deviation
    std_data = (x - np.mean(x)) / np.std(x)
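A quick check, sketched with a fixed seed chosen arbitrarily, confirms that the standardized data has mean 0 and standard deviation 1 up to floating-point error.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is repeatable
x = rng.integers(0, 100, size=100)

std_data = (x - np.mean(x)) / np.std(x)

print(np.mean(std_data))  # approximately 0
print(np.std(std_data))   # approximately 1
```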

Standardization and min-max normalization in sklearn

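    from sklearn.preprocessing import StandardScaler, MinMaxScaler

    # Scalers expect a 2-D array of shape (n_samples, n_features)
    std = StandardScaler()
    std.fit_transform(x.reshape(-1, 1))
    MinMaxScaler().fit_transform(x.reshape(-1, 1))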
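Because the scalers are objects with separate `fit` and `transform` steps, the statistics learned from one dataset can be reused on another. The sketch below, with toy single-feature arrays made up for illustration, fits on training values and applies the same scaling to an unseen value.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Toy single-feature data (made up for illustration)
train = np.array([[10.0], [20.0], [30.0]])
test = np.array([[40.0]])

std = StandardScaler().fit(train)  # learns mean_ and scale_ from train only
train_std = std.transform(train)
test_std = std.transform(test)     # reuses the training mean and deviation

mm = MinMaxScaler().fit(train)     # learns data_min_ and data_max_ from train
test_mm = mm.transform(test)       # unseen values can fall outside [0, 1]

print(std.mean_, test_std, test_mm)
```

Here the test value 40 lies above the training maximum of 30, so its min-max scaled value is 1.5 rather than a number in [0, 1].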