1. The Ridge regression loss function
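For reference, a common way to write the Ridge objective is sketched below. The symbols X (feature matrix), Y (output vector), θ (coefficient vector) and I (identity matrix) are notational assumptions; α is the regularization hyperparameter tuned later in this post. Scaling conventions differ between texts: scikit-learn documents the same objective without the 1/2 factors, which gives the same minimizer for a given α.

```latex
J(\theta) = \frac{1}{2}(X\theta - Y)^{T}(X\theta - Y) + \frac{1}{2}\alpha\lVert\theta\rVert_2^2

\hat{\theta} = (X^{T}X + \alpha I)^{-1}X^{T}Y
```

The second line is the closed-form minimizer obtained by setting the gradient of J to zero. It ignores the intercept, which is usually left unpenalized in practice.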
2. Data acquisition and preprocessing
Dataset description: http://archive.ics.uci.edu/ml/datasets/Combined+Cycle+Power+Plant
Download location: http://archive.ics.uci.edu/ml/machine-learning-databases/00294/
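The data come from a combined cycle power plant. There are 9568 samples, each with 5 columns: AT (ambient temperature), V (exhaust vacuum), AP (ambient pressure), RH (relative humidity), and PE (electrical energy output). We do not need to dwell on the exact physical meaning of each column.
Our task is to find a linear relationship in which PE is the output and AT, V, AP and RH are the four features. The goal is to fit a linear regression model, with the regularization strength α as the hyperparameter to tune, of the form:
$$PE = \theta_0 + \theta_1 \cdot AT + \theta_2 \cdot V + \theta_3 \cdot AP + \theta_4 \cdot RH$$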
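To make the role of α concrete, here is a minimal NumPy sketch that fits a linear model with four features and an unpenalized intercept using the closed-form Ridge solution. The data are synthetic, and the function name `ridge_closed_form` and the coefficient values are illustrative assumptions, not part of the CCPP dataset or of scikit-learn.

```python
import numpy as np

def ridge_closed_form(X, y, alpha):
    """Closed-form Ridge fit; the intercept is handled by centering and is not penalized."""
    x_mean = X.mean(axis=0)
    y_mean = y.mean()
    Xc = X - x_mean            # centered features
    yc = y - y_mean            # centered target
    n_features = X.shape[1]
    # Solve (Xc^T Xc + alpha * I) w = Xc^T yc
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(n_features), Xc.T @ yc)
    intercept = y_mean - x_mean @ w
    return w, intercept

# Synthetic stand-in for four features and one output (assumed values)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 4))
true_w = np.array([-2.0, -0.25, 0.05, -0.15])
y_demo = 450.0 + X_demo @ true_w + rng.normal(scale=0.5, size=200)

w_small, b_small = ridge_closed_form(X_demo, y_demo, alpha=0.01)
w_large, b_large = ridge_closed_form(X_demo, y_demo, alpha=1000.0)

print("alpha=0.01 :", w_small, b_small)
print("alpha=1000 :", w_large, b_large)
```

A larger α pulls the coefficient vector toward zero, which is the trade-off that the cross-validation step later in the post is meant to settle.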
3. Reading the data and splitting it into training and test sets
Import the libraries:
    import pandas as pd
    import matplotlib.pyplot as plt
    import numpy as np
    from sklearn import datasets, linear_model
Read the data with pandas:
    data = pd.read_csv("./data/CCPP/CCPP.csv")
    data.head()

We use the four columns AT, V, AP and RH as the features and PE as the output:
    X = data[["AT", "V", "AP", "RH"]]
    y = data[["PE"]]
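Split the data into a training set and a test set:
    # train_test_split lives in sklearn.model_selection;
    # the old sklearn.cross_validation module has been removed.
    from sklearn.model_selection import train_test_split
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)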
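The call above relies on two defaults that are easy to overlook: when neither `test_size` nor `train_size` is given, 25% of the rows go to the test set, and a fixed `random_state` makes the split reproducible. The sketch below illustrates both on a synthetic 100-row frame; the random values are placeholders and only the column names mirror the CCPP data.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic placeholder frame with the same column names as the CCPP data
rng = np.random.default_rng(0)
demo = pd.DataFrame(rng.normal(size=(100, 5)),
                    columns=["AT", "V", "AP", "RH", "PE"])
X_demo = demo[["AT", "V", "AP", "RH"]]
y_demo = demo[["PE"]]

# Default split: 75% train, 25% test
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, random_state=1)

# Same random_state, same rows in each subset
X_tr2, X_te2, y_tr2, y_te2 = train_test_split(X_demo, y_demo, random_state=1)

print(X_tr.shape, X_te.shape)
print("identical test rows:", X_te.index.equals(X_te2.index))
```

Passing `test_size` explicitly (for example `test_size=0.2`) is a reasonable choice when the split ratio matters for the comparison being made.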
4. Running Ridge regression with scikit-learn
Set α = 1 for now; cross-validation is used later to choose a better value.
    from sklearn.linear_model import Ridge
    ridge = Ridge(alpha=1)
    ridge.fit(X_train, y_train)
    print(ridge.coef_)
    print(ridge.intercept_)
    # Output:
    # [[-1.96862642 -0.23930532 0.05685793 -0.15860993]]
    # [460.04983649]
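The nested brackets in the printed `coef_` come from passing `y` as a one-column DataFrame, that is, a 2-D target. As a small sketch on synthetic data (the values are assumptions, not CCPP results), fitting the same model with a 1-D target yields a flat coefficient vector with identical numbers:

```python
import numpy as np
from sklearn.linear_model import Ridge

# Synthetic data with four features (assumed coefficients)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 4))
y_1d = 450.0 + X_demo @ np.array([-2.0, -0.25, 0.05, -0.15]) \
       + rng.normal(scale=0.5, size=200)
y_2d = y_1d.reshape(-1, 1)   # same target as a single column

ridge_2d = Ridge(alpha=1).fit(X_demo, y_2d)
ridge_1d = Ridge(alpha=1).fit(X_demo, y_1d)

print(ridge_2d.coef_.shape)  # one row per target
print(ridge_1d.coef_.shape)  # flat vector of feature weights
```

Either form works for prediction; the 1-D form is just more convenient when pairing coefficients with feature names.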
5. Choosing the Ridge hyperparameter α with scikit-learn
    from sklearn.linear_model import RidgeCV
    ridgeCV = RidgeCV(alphas=[0.01, 0.1, 0.5, 1, 3, 5, 6, 6.5, 7, 10, 20, 100])
    ridgeCV.fit(X_train, y_train)
    ridgeCV.alpha_
    # Output: 7.0
The output is 7.0, which means that among the candidate values we supplied, 7 is the best α.
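`RidgeCV` can only return one of the candidates it is given, so the result depends on the grid. A minimal sketch, assuming synthetic data and a log-spaced grid (both are illustrative choices, not the author's), shows one way to check whether the selected value sits on the edge of the grid, which would suggest widening the search range:

```python
import numpy as np
from sklearn.linear_model import RidgeCV

# Synthetic data with four features (assumed coefficients)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 4))
y_demo = 450.0 + X_demo @ np.array([-2.0, -0.25, 0.05, -0.15]) \
         + rng.normal(scale=0.5, size=200)

# Log-spaced candidates from 1e-3 to 1e3
alphas = np.logspace(-3, 3, 13)
model = RidgeCV(alphas=alphas).fit(X_demo, y_demo)

# If the winner is the smallest or largest candidate, the grid may be too narrow
best_at_edge = bool(np.isclose(model.alpha_, alphas[0])
                    or np.isclose(model.alpha_, alphas[-1]))

print("selected alpha:", model.alpha_)
print("at grid edge:", best_at_edge)
```

In the run reported above, 7 lies strictly inside the candidate list, with 6.5 and 10 as neighbors, so a finer grid around 7 would be the natural next refinement.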
6. Model evaluation
    y_pred = ridgeCV.predict(X_test)
    from sklearn import metrics
    # Compute the MSE with scikit-learn
    print("MSE:", metrics.mean_squared_error(y_test, y_pred))
    # Compute the RMSE with scikit-learn
    print("RMSE:", np.sqrt(metrics.mean_squared_error(y_test, y_pred)))
    # Output:
    # MSE: 20.83732091966483
    # RMSE: 4.5647914431729335
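As a quick check on what these two numbers mean, the sketch below computes the MSE by hand on three made-up values and compares it with `metrics.mean_squared_error`. The arrays are illustrative only. The RMSE is simply the square root of the MSE, which puts the error back in the units of the output.

```python
import numpy as np
from sklearn import metrics

# Tiny made-up example: errors are 1, 0 and -2
y_true = np.array([3.0, 5.0, 7.0])
y_hat = np.array([2.0, 5.0, 9.0])

mse = metrics.mean_squared_error(y_true, y_hat)
rmse = np.sqrt(mse)

# Same quantity computed directly: mean of squared errors = (1 + 0 + 4) / 3
mse_by_hand = np.mean((y_true - y_hat) ** 2)

# f-strings are one convenient way to control the printed precision
print(f"MSE: {mse:.4f}")
print(f"RMSE: {rmse:.4f}")
```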
7. Cross-validation
    from sklearn.model_selection import cross_val_predict
    predicted = cross_val_predict(ridgeCV, X, y, cv=10)
    # Compute the MSE with scikit-learn
    print("MSE:", metrics.mean_squared_error(y, predicted))
    # Compute the RMSE with scikit-learn
    print("RMSE:", np.sqrt(metrics.mean_squared_error(y, predicted)))
    # Output:
    # MSE: 20.793674972051814
    # RMSE: 4.560008220612307
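`cross_val_predict` returns, for every sample, a prediction made by a model that was fitted without that sample's fold. The sketch below reproduces this with an explicit loop on synthetic data, using a plain `Ridge(alpha=7)` and a shuffled `KFold` as illustrative assumptions. Note that when a `RidgeCV` instance is passed instead, as in the post, α is re-selected inside each training fold.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold, cross_val_predict

# Synthetic data with four features (assumed coefficients)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(120, 4))
y_demo = 450.0 + X_demo @ np.array([-2.0, -0.25, 0.05, -0.15]) \
         + rng.normal(scale=0.5, size=120)

# A seeded splitter yields the same folds every time it is iterated
cv = KFold(n_splits=10, shuffle=True, random_state=0)
predicted_demo = cross_val_predict(Ridge(alpha=7), X_demo, y_demo, cv=cv)

# Manual equivalent: fit on nine folds, predict the held-out fold
manual = np.empty_like(y_demo)
for train_idx, test_idx in cv.split(X_demo):
    fold_model = Ridge(alpha=7).fit(X_demo[train_idx], y_demo[train_idx])
    manual[test_idx] = fold_model.predict(X_demo[test_idx])

print("manual loop matches cross_val_predict:",
      np.allclose(manual, predicted_demo))
```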
8. Plotting the results
    fig, ax = plt.subplots()
    ax.scatter(y, predicted)
    ax.plot([y.min(), y.max()], [y.min(), y.max()], 'k--', lw=4)
    ax.set_xlabel('Measured')
    ax.set_ylabel('Predicted')
    plt.show()

