🐾 Learning sklearn - 🦀 Simple Linear Regression - 🧙🏼‍♂️ Python for Beginners

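1. Dataset: train_test_split
   - train_test_split parameters
2. Build the model and predict: linear_model.LinearRegression
   - Import the model (default parameters)
   - Train the model: LR.fit
   - Predict: LR.predict(X_test) returns continuous predicted values
   - Print the intercept
   - Print the model coefficient
   - Mean squared error
   - R^2 coefficient of determination; higher is better
   - `1 - ((y_test - LR.predict(X_test))**2).sum() / ((y_test - y_test.mean())**2).sum()`
   - LR.score (returns R^2, not accuracy)
3. Visualization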

For more background on basic statistics, see the [Statistics] column. Additions and corrections are welcome!

```python
# Simple LinearRegression
# Import packages
import numpy as np
from sklearn import datasets, linear_model
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
import matplotlib
import matplotlib.pyplot as plt
import pandas as pd
```

## 1. Dataset: train_test_split

```python
# Load the diabetes dataset
diabetes = datasets.load_diabetes()
# Use only feature column 2 (BMI). np.newaxis keeps X 2-D with shape (n_samples, 1),
# which scikit-learn estimators expect; diabetes.data[:, 2] alone would be 1-D
X = diabetes.data[:, np.newaxis, 2]
y = diabetes.target
# test_size: fraction of the samples held out for testing
# an integer random_state produces the same split on every run
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
print('\nShape of X_train :', X_train.shape)  # 2-D
print('Shape of y_train :', y_train.shape)    # 1-D
```

```
Shape of X_train : (353, 1)
Shape of y_train : (353,)
```
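The 353 training rows follow from how `train_test_split` sizes the split: with a float `test_size`, the test set gets `ceil(test_size * n_samples)` samples and the training set gets the rest, so the 442 diabetes samples become 89 test rows and 353 training rows. A minimal sketch, assuming the same BMI feature and `random_state=42` as above, that checks this arithmetic and that a fixed integer seed reproduces the split exactly:

```python
import math

import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split

diabetes = datasets.load_diabetes()
X = diabetes.data[:, np.newaxis, 2]  # BMI column, kept 2-D
y = diabetes.target

n_samples = X.shape[0]
expected_test = math.ceil(0.2 * n_samples)  # test set size is rounded up
expected_train = n_samples - expected_test  # training set gets the remainder

# Two splits with the same integer random_state
split_a = train_test_split(X, y, test_size=0.2, random_state=42)
split_b = train_test_split(X, y, test_size=0.2, random_state=42)
X_train, X_test, y_train, y_test = split_a

print(n_samples, expected_train, expected_test)  # 442 353 89
print(X_train.shape, X_test.shape)               # (353, 1) (89, 1)
# Same seed -> every returned array is identical
same_split = all(np.array_equal(a, b) for a, b in zip(split_a, split_b))
print(same_split)  # True
```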

### train_test_split parameters

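```python
X_train, X_test, y_train, y_test = sklearn.model_selection.train_test_split(
    train_data, train_target, test_size=0.4, random_state=0, stratify=train_target)
```

| Parameter | Meaning |
| --- | --- |
| `train_data` | Features (independent variables) to split |
| `train_target` | Target (dependent variable) to split |
| `test_size` | Fraction of the samples that goes to the test set |
| `random_state` | Random seed. With `None` (the default) the split changes on every run; any integer, including 0, makes the split identical on every run |
| `stratify` | An array of class labels, usually `y`. The split then preserves the class proportions: if the data contain classes A and B, the A:B ratio in each subset matches the original data |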
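To make the `stratify` and `random_state` rows concrete, here is a small sketch on hypothetical toy data (100 samples, 80 labeled "A" and 20 labeled "B"; these labels are invented for illustration and are not part of the diabetes example). With `stratify=y`, both subsets keep the original 4:1 ratio, and reusing the integer seed 0 reproduces the same split:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical toy data: 100 samples, 80 of class "A" and 20 of class "B"
X = np.arange(100).reshape(-1, 1)
y = np.array(['A'] * 80 + ['B'] * 20)

# stratify=y keeps the A:B ratio (4:1) in both the training and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y)

print((y_test == 'A').sum(), (y_test == 'B').sum())    # 16 4
print((y_train == 'A').sum(), (y_train == 'B').sum())  # 64 16

# random_state=0 is an integer too, so repeating the call reproduces the split
X_train2, X_test2, _, _ = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y)
print(np.array_equal(X_test, X_test2))  # True
```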

## 2. Build the model and predict: linear_model.LinearRegression

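Unlike R, scikit-learn does not print the whole fitted formula; the intercept and the coefficients have to be read separately from the fitted model.

```python
# Create the model with default parameters
LR = linear_model.LinearRegression()
# Train the model
LR.fit(X_train, y_train)
# LR.predict(X_test) returns continuous predicted values (regression, not classification)
y_pred = LR.predict(X_test)
# Print the intercept
print('intercept:%.3f' % LR.intercept_)
# Print the model coefficient (coef_ is an array; there is only one feature here)
print('coef:%.3f' % LR.coef_[0])
# Mean squared error, equivalent to ((y_test - y_pred)**2).mean()
print('Mean squared error: %.3f' % mean_squared_error(y_test, y_pred))
# R^2 coefficient of determination; higher is better
# equivalent to 1 - ((y_test - y_pred)**2).sum() / ((y_test - y_test.mean())**2).sum()
print('Variance score: %.3f' % r2_score(y_test, y_pred))
# LinearRegression.score also returns R^2 (not accuracy), so it matches r2_score
print('score: %.3f' % LR.score(X_test, y_test))
```

```
intercept:152.003
coef:998.578
Mean squared error: 4061.826
Variance score: 0.233
score: 0.233
```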
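Even without an R-style summary, a one-feature model is just `y = intercept + coef * x`, and both numbers can be rebuilt from the closed-form least-squares solution: slope = sum((x - x̄)(y - ȳ)) / sum((x - x̄)²) and intercept = ȳ - slope * x̄. A sketch, assuming the same BMI feature and split as above, that compares this with `LinearRegression` and assembles the fitted equation as a string:

```python
import numpy as np
from sklearn import datasets, linear_model
from sklearn.model_selection import train_test_split

diabetes = datasets.load_diabetes()
X = diabetes.data[:, np.newaxis, 2]
y = diabetes.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

LR = linear_model.LinearRegression().fit(X_train, y_train)

# Closed-form ordinary least squares for a single feature
x = X_train[:, 0]
slope = np.sum((x - x.mean()) * (y_train - y_train.mean())) / np.sum((x - x.mean()) ** 2)
intercept = y_train.mean() - slope * x.mean()

print(np.isclose(slope, LR.coef_[0]), np.isclose(intercept, LR.intercept_))  # True True
# Assemble the fitted equation manually
equation = 'y = %.3f + %.3f * x' % (LR.intercept_, LR.coef_[0])
print(equation)
```

With several features the same idea extends to `LR.intercept_` plus one term per entry of `LR.coef_`.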


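- R^2 is the main number for judging the model: the higher it is (at most 1), the better the line fits the data. Here it is only 0.233.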
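The two metrics can be recomputed by hand to see what they measure: MSE is the mean of the squared residuals, and R^2 = 1 - SS_res / SS_tot, where SS_res is the sum of squared residuals and SS_tot the sum of squared deviations of `y_test` from its mean. A sketch, assuming the same split as above, checking that the manual values agree with `mean_squared_error`, `r2_score`, and `LR.score`:

```python
import numpy as np
from sklearn import datasets, linear_model
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

diabetes = datasets.load_diabetes()
X = diabetes.data[:, np.newaxis, 2]
y = diabetes.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
LR = linear_model.LinearRegression().fit(X_train, y_train)
y_pred = LR.predict(X_test)

# MSE: mean of squared residuals
mse_manual = ((y_test - y_pred) ** 2).mean()
# R^2 = 1 - SS_res / SS_tot
ss_res = ((y_test - y_pred) ** 2).sum()
ss_tot = ((y_test - y_test.mean()) ** 2).sum()
r2_manual = 1 - ss_res / ss_tot

print(np.isclose(mse_manual, mean_squared_error(y_test, y_pred)))  # True
print(np.isclose(r2_manual, r2_score(y_test, y_pred)))             # True
print(np.isclose(r2_manual, LR.score(X_test, y_test)))             # True
```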
## 3. Visualization
```python
# Visualize the training set: red points are training samples, blue line is the fitted line
plt.scatter(X_train, y_train, color='red')
plt.plot(X_train, LR.predict(X_train), color='blue')
plt.title('Training set')
plt.xlabel('BMI (scaled)')
plt.ylabel('Disease progression')
plt.show()
# Visualize the test set: green points are test samples, red line is the fitted line drawn from LR.predict(X_test)
plt.scatter(X_test, y_test, color='green')
plt.plot(X_test, LR.predict(X_test), color='red', linewidth=3)
plt.title('Test set')
plt.show()
```

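The blue line (training plot) and the red line (test plot) are the fitted regression line. The points scatter widely around it, which shows the fit is poor.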
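The visual impression of a poor fit can be backed up with numbers. One quick check, sketched below under the same split as above, compares R^2 on the training data with R^2 on the test data and computes the test residuals (the vertical distances between the green points and the red line). Both R^2 values staying well below 1 means BMI alone leaves most of the variation unexplained:

```python
import numpy as np
from sklearn import datasets, linear_model
from sklearn.model_selection import train_test_split

diabetes = datasets.load_diabetes()
X = diabetes.data[:, np.newaxis, 2]
y = diabetes.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
LR = linear_model.LinearRegression().fit(X_train, y_train)

# R^2 on the data the line was fitted to, and on held-out data
r2_train = LR.score(X_train, y_train)
r2_test = LR.score(X_test, y_test)
# Residuals: vertical distances between test points and the fitted line
residuals = y_test - LR.predict(X_test)

print('train R^2: %.3f, test R^2: %.3f' % (r2_train, r2_test))
print('residual std on test set: %.1f' % residuals.std())
```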
