Similarity Measures

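  1. Euclidean distance: $d(x, y) = \sqrt{\sum_{i} (x_i - y_i)^2}$, the square root of the summed squared coordinate differences $(x_i - y_i)^2$.
  2. Cosine similarity: $\cos(x, y) = \frac{x \cdot y}{\|x\| \, \|y\|}$. The more similar two sample points are, the closer the similarity coefficient is to 1; the less similar they are, the closer it is to 0. (For vectors with negative components the value can fall as low as -1.)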
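
The two measures above can be sketched in plain Python. This is a minimal illustration, not a library implementation; the function names and the sample vectors `u`, `v`, `w` are hypothetical and chosen only to show the two extreme cases of cosine similarity.

```python
import math

def euclidean_distance(x, y):
    """Square root of the summed squared coordinate differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def cosine_similarity(x, y):
    """Dot product divided by the product of the two vector norms."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)

# Hypothetical sample vectors, chosen only for illustration.
u = [1.0, 2.0, 3.0]
v = [2.0, 4.0, 6.0]   # same direction as u, twice the length
w = [3.0, 0.0, -1.0]  # orthogonal to u (dot product is 0)

print("d(u, v)   =", euclidean_distance(u, v))
print("cos(u, v) =", cosine_similarity(u, v))
print("cos(u, w) =", cosine_similarity(u, w))
```

Note that `u` and `v` are far apart in Euclidean terms yet have a cosine similarity of about 1, because cosine similarity ignores vector length and compares direction only.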

Evaluation Metrics

Model evaluation (clustering evaluation)
  1. Silhouette coefficient: $s(i) = \frac{b(i) - a(i)}{\max\{a(i), b(i)\}}$, where $a(i)$ is the mean distance from sample $i$ to the other samples in its own cluster and $b(i)$ is the mean distance from sample $i$ to the samples in the nearest other cluster. The silhouette coefficient is one way to judge clustering quality. The best value is 1 and the worst is -1. Values near 0 indicate overlapping clusters. Negative values generally mean that a sample has been assigned to the wrong cluster, because a different cluster is more similar to it.
  2. Entropy: $Entropy(S) = Entropy(p_1, \ldots, p_n) = -\sum_{i=1}^{n} p_i \log_2(p_i)$. The lower the entropy, the purer the data; the higher the entropy, the more mixed the data. Note: _when the entropy is 0, all samples share the same value of the target attribute._
  3. Purity: $purity = \sum_{i=1}^{k} \frac{m_i}{m} P_i$, where $P_i = \max_j(P_{ij})$. Here $m_i$ is the number of samples in cluster $i$, $m$ is the total number of samples, and $P_{ij}$ is the fraction of cluster $i$ that belongs to class $j$.
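
    # Silhouette coefficient
    # `weight` (the feature matrix) and `KM_model` (a fitted clustering model)
    # are assumed to come from earlier steps that are not shown here.
    from sklearn.metrics import silhouette_score

    score = silhouette_score(weight, KM_model.labels_, metric='euclidean')
    print("Silhouette coefficient:", score)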
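
    # Purity
    # `result` is assumed to be the cluster-by-category count table (rows = clusters).
    true = result.max(axis=1)    # size of the dominant category in each cluster
    total = result.sum(axis=1)   # size of each cluster
    accuracy = true / total
    print("Purity of each cluster:\n", accuracy)
    # Overall purity weights each cluster by its size instead of averaging the per-cluster values.
    print("Overall purity: %.2f" % (true.sum() / total.sum()))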
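
    # Entropy
    import math
    import numpy as np

    result = np.array(result)  # count table: rows = clusters, columns = categories
    n_clusters, n_classes = result.shape
    e_i = [0.0] * n_clusters   # entropy of each cluster
    m = result.sum()           # total number of samples
    print("Entropy of each cluster:")
    for i in range(n_clusters):
        for j in range(n_classes):
            # The small constant avoids log(0) for empty cells.
            p_i_j = result[i][j] * 1.0 / result[i].sum() + 0.00000001
            e_i[i] += 0 - p_i_j * math.log(p_i_j, 2)
        print("Cluster", i, ":", e_i[i])
    # Overall entropy: per-cluster entropies weighted by cluster size.
    e = 0
    for i in range(n_clusters):
        e += result[i].sum() / m * e_i[i]
    print("Overall entropy: %.6f" % e)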
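
As a self-contained check of the purity and entropy formulas, here is a minimal sketch on a small made-up count table. The counts and the function names are hypothetical and are not taken from the news dataset. Unlike the loop above, this sketch skips empty cells (using the convention that 0·log 0 = 0) rather than adding a small constant.

```python
import math

# Hypothetical count table: rows are clusters, columns are true classes.
table = [
    [8, 1, 1],
    [0, 9, 1],
    [2, 0, 8],
]

def cluster_entropy(row):
    """Base-2 entropy of the class distribution inside one cluster."""
    total = sum(row)
    return -sum((c / total) * math.log2(c / total) for c in row if c > 0)

def overall_entropy(rows):
    """Per-cluster entropies weighted by cluster size m_i / m."""
    m = sum(sum(row) for row in rows)
    return sum(sum(row) / m * cluster_entropy(row) for row in rows)

def purity(rows):
    """Sum of (m_i / m) * P_i, which simplifies to sum(max count) / m."""
    m = sum(sum(row) for row in rows)
    return sum(max(row) for row in rows) / m

print("Overall purity: %.4f" % purity(table))   # 25 of 30 samples: 0.8333
print("Overall entropy: %.6f" % overall_entropy(table))
```

A cluster that contains a single class has entropy 0 and purity 1; a cluster split evenly between two classes has entropy 1 bit.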

1 Classification metric calculation examples

    ## Accuracy
    from sklearn.metrics import accuracy_score

    y_pred = [0, 1, 0, 1]
    y_true = [0, 1, 1, 1]
    print('ACC:', accuracy_score(y_true, y_pred))

    ## Precision, Recall, F1-score
    from sklearn import metrics

    y_pred = [0, 1, 0, 0]
    y_true = [0, 1, 0, 1]
    print('Precision:', metrics.precision_score(y_true, y_pred))
    print('Recall:', metrics.recall_score(y_true, y_pred))
    print('F1-score:', metrics.f1_score(y_true, y_pred))

    ## AUC
    import numpy as np
    from sklearn.metrics import roc_auc_score

    y_true = np.array([0, 0, 1, 1])
    y_scores = np.array([0.1, 0.4, 0.35, 0.8])
    print('AUC score:', roc_auc_score(y_true, y_scores))
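
To make clear what these library calls compute, the following sketch recomputes precision, recall, F1 and AUC by hand on the same toy inputs. It is an illustration under simplifying assumptions (binary labels, at least one positive prediction, at least one sample of each class); `pairwise_auc` is a hypothetical helper name, and it uses the ranking interpretation of AUC: the fraction of (positive, negative) pairs in which the positive sample gets the higher score.

```python
# Labels and predictions from the Precision/Recall example.
y_true = [0, 1, 0, 1]
y_pred = [0, 1, 0, 0]

pairs = list(zip(y_true, y_pred))
tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)
print("Precision:", precision, "Recall:", recall, "F1:", f1)

def pairwise_auc(labels, scores):
    """Share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for t, s in zip(labels, scores) if t == 1]
    neg = [s for t, s in zip(labels, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Labels and scores from the AUC example.
auc = pairwise_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print("AUC:", auc)
```

Here one of the two positives is found and nothing is wrongly flagged, so precision is 1.0, recall is 0.5 and F1 is 2/3. For AUC, three of the four positive/negative pairs are ordered correctly, giving 0.75.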

2 Regression metric calculation examples

    # coding=utf-8
    import numpy as np
    from sklearn import metrics

    # MAPE is implemented by hand here (undefined when y_true contains zeros)
    def mape(y_true, y_pred):
        return np.mean(np.abs((y_pred - y_true) / y_true))

    y_true = np.array([1.0, 5.0, 4.0, 3.0, 2.0, 5.0, -3.0])
    y_pred = np.array([1.0, 4.5, 3.8, 3.2, 3.0, 4.8, -2.2])
    # MSE
    print('MSE:', metrics.mean_squared_error(y_true, y_pred))
    # RMSE
    print('RMSE:', np.sqrt(metrics.mean_squared_error(y_true, y_pred)))
    # MAE (the metric used in this competition)
    print('MAE:', metrics.mean_absolute_error(y_true, y_pred))
    # MAPE
    print('MAPE:', mape(y_true, y_pred))

    ## R2-score
    from sklearn.metrics import r2_score

    y_true = [3, -0.5, 2, 7]
    y_pred = [2.5, 0.0, 2, 8]
    print('R2-score:', r2_score(y_true, y_pred))
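
The R2 score can also be worked out by hand as one minus the ratio of the residual sum of squares to the total sum of squares. The sketch below does this for the same toy values; the variable names `ss_res` and `ss_tot` are introduced here for illustration only.

```python
y_true = [3, -0.5, 2, 7]
y_pred = [2.5, 0.0, 2, 8]

mean_true = sum(y_true) / len(y_true)

# Residual sum of squares: squared error of the model's predictions.
ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
# Total sum of squares: squared error of always predicting the mean.
ss_tot = sum((t - mean_true) ** 2 for t in y_true)

r2 = 1 - ss_res / ss_tot
print("R2-score: %.4f" % r2)  # 0.9486
```

A score of 1 means perfect predictions, and 0 means the model does no better than predicting the mean of `y_true` for every sample.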

Loss Functions

Squared loss
$L(x) = \frac{1}{N} \sum_{i=1}^{N} (y_i - t_i)^2$
Cross-entropy loss
$H(p, q) = -\sum_{x} p(x) \log(q(x))$
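
Both loss functions are short enough to write out directly. The sketch below assumes that $y_i$ are predictions and $t_i$ are targets in the squared loss, that $p$ is the true distribution and $q$ the predicted one in the cross-entropy, and that the logarithm is natural; these readings, the function names, and the sample values are assumptions made for illustration.

```python
import math

def squared_loss(y, t):
    """Mean of the squared differences (y_i - t_i)^2 over N samples."""
    return sum((yi - ti) ** 2 for yi, ti in zip(y, t)) / len(y)

def cross_entropy(p, q):
    """-sum p(x) * log(q(x)); terms with p(x) = 0 contribute nothing."""
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical values: a one-hot target and a predicted distribution.
target = [1.0, 0.0, 0.0]
predicted = [0.7, 0.2, 0.1]

print("Squared loss:  %.4f" % squared_loss(predicted, target))
print("Cross-entropy: %.4f" % cross_entropy(target, predicted))
```

With a one-hot target the cross-entropy reduces to the negative log of the probability assigned to the correct class, so it grows quickly as that probability approaches 0.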