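Similarity measures
- Euclidean distance
Squared form: $(x - y)^2$; the distance itself is $d(x, y) = \sqrt{\sum_k (x_k - y_k)^2}$.
- Cosine similarity
The more similar two samples are, the closer the similarity coefficient is to 1; the less similar they are, the closer it is to 0. This holds for non-negative feature vectors (for example term weights); in general cosine similarity ranges over [-1, 1].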
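
As a minimal sketch of the two measures above, the following pure-Python functions compute the Euclidean distance and the cosine similarity $\cos(x, y) = \frac{x \cdot y}{\lVert x \rVert \lVert y \rVert}$. The function names and the vectors `u`, `v`, `w` are illustrative assumptions, not taken from the notes.

```python
import math

def euclidean_distance(x, y):
    """Square root of the summed squared coordinate differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def cosine_similarity(x, y):
    """Dot product divided by the product of the two vector norms."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)

# Hypothetical example vectors
u = [1.0, 2.0, 0.0]
v = [2.0, 4.0, 0.0]  # same direction as u, twice as long
w = [0.0, 0.0, 3.0]  # orthogonal to u

print("distance(u, v):", euclidean_distance(u, v))
print("cosine(u, v):", cosine_similarity(u, v))  # close to 1: same direction
print("cosine(u, w):", cosine_similarity(u, w))  # 0: nothing in common
```

Note that `u` and `v` are far apart in Euclidean terms but maximally similar by cosine, since cosine similarity ignores vector length.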
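Evaluation metrics
Model evaluation (clustering evaluation)
(1) Silhouette coefficient: $s(i) = \frac{b(i) - a(i)}{\max\{a(i), b(i)\}}$, where $a(i)$ is the mean distance from sample $i$ to the other samples in its own cluster and $b(i)$ is the mean distance from sample $i$ to the samples in the nearest other cluster. The silhouette coefficient is one way to judge how good a clustering is. The best value is 1 and the worst is -1. Values near 0 indicate overlapping clusters. Negative values usually mean a sample has been assigned to the wrong cluster, because a different cluster is more similar to it.

(2) Entropy: $Entropy(S) = Entropy(p_1, \ldots, p_n) = -\sum_{i=1}^{n} p_i \log_2(p_i)$. The lower the entropy, the purer the data; the higher the entropy, the more mixed the data. Note: _when the entropy is 0, all samples have the same value of the target attribute._

(3) Purity: $purity = \sum_{i} \frac{m_i}{m} P_i$, where $P_i = \max_j(P_{ij})$. Here $P_{ij}$ is the fraction of cluster $i$ that belongs to class $j$, $m_i$ is the size of cluster $i$, and $m$ is the total number of samples.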

    # Silhouette coefficient
    from sklearn.metrics import silhouette_score

    # weight: feature matrix; KM_model: fitted KMeans model
    score = silhouette_score(weight, KM_model.labels_, metric='euclidean')
    print("Silhouette coefficient:", score)

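
    # Purity
    # result: contingency table as a NumPy array (rows: clusters, columns: true categories)
    true = result.max(axis=1)    # size of the majority class in each cluster
    total = result.sum(axis=1)   # size of each cluster
    accuracy = true / total
    print("Purity of each cluster:\n", accuracy)
    # Unweighted mean over 3 clusters:
    # print("Overall purity: %.2f" % (accuracy.sum(axis=0) / 3))
    # Mean weighted by cluster size:
    print("Overall purity: %.2f" % (true.sum() / total.sum()))
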
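
    # Entropy
    import math
    import numpy as np

    n_classes = len(df_news['category'].unique())  # avoid shadowing the built-in len
    result = np.array(result)      # rows: clusters, columns: true categories
    n_clusters = result.shape[0]   # 27 in the original run
    e_i = [0.0] * n_clusters       # one entropy value per cluster
    m = result.sum()               # total number of samples
    print("Entropy of each cluster:")
    for i in range(n_clusters):
        for j in range(n_classes):
            # the small constant avoids log(0) for empty cells
            p_i_j = result[i][j] / result[i].sum() + 0.00000001
            e_i[i] -= p_i_j * math.log(p_i_j, 2)
        print("Cluster", i, ":", e_i[i])
    e = 0.0
    for i in range(n_clusters):
        # weight each cluster's entropy by its share of the samples
        e += result[i].sum() / m * e_i[i]
    print("Overall entropy: %.6f" % e)
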
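
The purity and entropy snippets above depend on a `result` table that is not shown in these notes. The following self-contained sketch assumes `result` is a contingency table (rows are clusters, columns are true classes) and uses made-up counts. Instead of adding a small constant to each probability, it skips empty cells, which is a substitute technique for avoiding `log(0)`.

```python
import math

# Hypothetical contingency table: rows are clusters, columns are true classes.
result = [
    [8, 2, 0],
    [1, 9, 0],
    [0, 0, 10],
]

def cluster_purity(row):
    """Share of the cluster taken by its majority class."""
    return max(row) / sum(row)

def cluster_entropy(row):
    """Entropy (base 2) of the class distribution inside one cluster."""
    total = sum(row)
    return -sum((c / total) * math.log2(c / total) for c in row if c > 0)

m = sum(sum(row) for row in result)  # total number of samples
purities = [cluster_purity(row) for row in result]
entropies = [cluster_entropy(row) for row in result]

# Both overall scores weight each cluster by its share of the samples.
total_purity = sum(max(row) for row in result) / m
total_entropy = sum(sum(row) / m * e for row, e in zip(result, entropies))

print("Purity per cluster:", purities)
print("Entropy per cluster:", entropies)
print("Overall purity: %.2f" % total_purity)
print("Overall entropy: %.6f" % total_entropy)
```

The third cluster contains a single class, so its purity is 1 and its entropy is 0, matching the note that entropy 0 means all samples share the same target value.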
1 Example: computing classification metrics

    ## accuracy
    import numpy as np
    from sklearn.metrics import accuracy_score

    y_pred = [0, 1, 0, 1]
    y_true = [0, 1, 1, 1]
    print('ACC:', accuracy_score(y_true, y_pred))


    ## Precision, Recall, F1-score
    from sklearn import metrics

    y_pred = [0, 1, 0, 0]
    y_true = [0, 1, 0, 1]
    print('Precision:', metrics.precision_score(y_true, y_pred))
    print('Recall:', metrics.recall_score(y_true, y_pred))
    print('F1-score:', metrics.f1_score(y_true, y_pred))

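
To make the three scores less of a black box, here is a sketch that recomputes them by hand for the same labels, assuming the standard binary definitions with class 1 as the positive class: precision = TP / (TP + FP), recall = TP / (TP + FN), and F1 as their harmonic mean. The variable names `tp`, `fp`, `fn` are illustrative.

```python
y_pred = [0, 1, 0, 0]
y_true = [0, 1, 0, 1]

pairs = list(zip(y_true, y_pred))
tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

print('TP, FP, FN:', tp, fp, fn)
print('Precision:', precision)
print('Recall:', recall)
print('F1-score:', f1)
```

Here one positive is found and one is missed, with no false alarms, so precision is 1.0 and recall is 0.5. The sketch does not guard against zero denominators, which can occur when no sample is predicted positive.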

    ## AUC
    import numpy as np
    from sklearn.metrics import roc_auc_score

    y_true = np.array([0, 0, 1, 1])
    y_scores = np.array([0.1, 0.4, 0.35, 0.8])
    print('AUC score:', roc_auc_score(y_true, y_scores))

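
AUC can also be read as the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one. The sketch below counts such pairs directly for the same data; it is an illustration of that interpretation, not how scikit-learn computes the value internally. The names `pos`, `neg`, and `wins` are illustrative.

```python
y_true = [0, 0, 1, 1]
y_scores = [0.1, 0.4, 0.35, 0.8]

pos = [s for t, s in zip(y_true, y_scores) if t == 1]  # scores of positives
neg = [s for t, s in zip(y_true, y_scores) if t == 0]  # scores of negatives

# A pair counts 1 if the positive outranks the negative, 0.5 on a tie.
wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
           for p in pos for n in neg)
auc = wins / (len(pos) * len(neg))

print('AUC (pairwise):', auc)  # 0.75: 3 of the 4 pairs are ranked correctly
```

The only misranked pair is the positive with score 0.35 against the negative with score 0.4.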
2 Example: computing regression metrics

    # coding=utf-8
    import numpy as np
    from sklearn import metrics

    # MAPE is implemented by hand here
    # (scikit-learn 0.24+ also provides metrics.mean_absolute_percentage_error)
    def mape(y_true, y_pred):
        return np.mean(np.abs((y_pred - y_true) / y_true))

    y_true = np.array([1.0, 5.0, 4.0, 3.0, 2.0, 5.0, -3.0])
    y_pred = np.array([1.0, 4.5, 3.8, 3.2, 3.0, 4.8, -2.2])

    # MSE
    print('MSE:', metrics.mean_squared_error(y_true, y_pred))
    # RMSE
    print('RMSE:', np.sqrt(metrics.mean_squared_error(y_true, y_pred)))
    # MAE (the metric used in this competition)
    print('MAE:', metrics.mean_absolute_error(y_true, y_pred))
    # MAPE (undefined if any y_true value is 0)
    print('MAPE:', mape(y_true, y_pred))


    ## R2-score
    from sklearn.metrics import r2_score

    y_true = [3, -0.5, 2, 7]
    y_pred = [2.5, 0.0, 2, 8]
    print('R2-score:', r2_score(y_true, y_pred))

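
For reference, the R2 score follows the usual definition $R^2 = 1 - \frac{SS_{res}}{SS_{tot}}$, where $SS_{res}$ is the sum of squared residuals and $SS_{tot}$ is the sum of squared deviations of the true values from their mean. A hand computation on the same data, with illustrative variable names, might look like this:

```python
y_true = [3, -0.5, 2, 7]
y_pred = [2.5, 0.0, 2, 8]

mean_true = sum(y_true) / len(y_true)

# Residual sum of squares: error left after the prediction
ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
# Total sum of squares: error of always predicting the mean
ss_tot = sum((t - mean_true) ** 2 for t in y_true)

r2 = 1 - ss_res / ss_tot
print('SS_res:', ss_res)
print('SS_tot:', ss_tot)
print('R2-score (manual): %.4f' % r2)
```

A value near 1 means the predictions explain most of the variance in `y_true`; a model that always predicts the mean would score 0.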
Loss functions
Squared loss: $L(x) = \frac{1}{N} \sum_{i=1}^{N} (y_i - t_i)^2$
Cross-entropy loss: $H(p, q) = -\sum_{x} p(x) \log(q(x))$
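
The two losses can be sketched in a few lines of pure Python. This assumes $y_i$ are predictions and $t_i$ are targets, that $p$ is the true distribution and $q$ the predicted one, and that the logarithm is natural; the function names and sample values are illustrative.

```python
import math

def squared_loss(y, t):
    """Mean of squared differences between predictions y and targets t."""
    return sum((yi - ti) ** 2 for yi, ti in zip(y, t)) / len(y)

def cross_entropy(p, q):
    """H(p, q) = -sum_x p(x) * log(q(x)); terms with p(x) = 0 contribute 0."""
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical regression example
y = [2.5, 0.0, 2.0, 8.0]
t = [3.0, -0.5, 2.0, 7.0]
print('Squared loss:', squared_loss(y, t))

# Hypothetical 3-class example: one-hot target, softmax-like prediction
p = [1.0, 0.0, 0.0]
q = [0.7, 0.2, 0.1]
print('Cross-entropy:', cross_entropy(p, q))
```

With a one-hot `p`, the cross-entropy reduces to the negative log of the probability assigned to the correct class, so it shrinks toward 0 as that probability approaches 1.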
