1. Ordinary Linear Regression

  • Error is measured with least squares; the model is optimized either by the normal equation (exact solution, slow, unsuitable for high dimensions) or by gradient descent (approximate solution, fast, suitable for high dimensions)
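A minimal sketch of the two solvers above on synthetic data (NumPy only; the data and learning rate are illustrative): the normal equation solves for the weights in one step, while gradient descent iterates toward the same answer.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w + 0.01 * rng.normal(size=100)

# Normal equation: w = (X^T X)^{-1} X^T y  (exact; cost grows cubically with features)
w_ne = np.linalg.solve(X.T @ X, X.T @ y)

# Batch gradient descent on the MSE loss (approximate; scales to high dimensions)
w_gd = np.zeros(3)
lr = 0.1
for _ in range(500):
    grad = 2 / len(y) * X.T @ (X @ w_gd - y)  # gradient of mean squared error
    w_gd -= lr * grad

print(w_ne)  # both results should be close to [2, -1, 0.5]
print(w_gd)
```

On a small, well-conditioned problem like this the two agree to several decimal places; gradient descent becomes the practical choice once the feature count makes the matrix inverse too expensive.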

2. Regularized Linear Regression

  1. Ridge Regression
    • L2 regularization: adds a squared penalty on the coefficients; the larger α is, the smaller the coefficients become
    • sklearn.linear_model.Ridge(alpha=1, fit_intercept=True, solver="auto", normalize=False)
      • larger alpha (typically 0-10) means stronger regularization and smaller weight coefficients
      • normalize: whether to standardize the features (deprecated in newer scikit-learn; use StandardScaler instead)
    • sklearn.linear_model.RidgeCV(_BaseRidgeCV, RegressorMixin)
      • supports built-in cross-validation
  2. Lasso Regression
    • L1 regularization: the absolute-value penalty is non-differentiable at zero, which drives some coefficients to exactly zero and yields a sparse solution
  3. Elastic Net
    • A combination of the two: with mixing ratio r=0 it reduces to Ridge regression, with r=1 to Lasso regression
  4. Early stopping
    • Stops training once the validation error stops improving past a tolerance threshold
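The four variants above can be compared on one synthetic dataset (a sketch; the alpha values are illustrative): Ridge only shrinks coefficients, Lasso zeroes out the irrelevant ones, Elastic Net mixes both via l1_ratio, and SGDRegressor's early_stopping option halts training when a held-out validation score stops improving.

```python
import numpy as np
from sklearn.linear_model import Ridge, Lasso, ElasticNet, SGDRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
# Only the first two features matter; the last three are pure noise
y = 3 * X[:, 0] - 2 * X[:, 1] + 0.1 * rng.normal(size=200)

ridge = Ridge(alpha=1.0).fit(X, y)                    # L2: shrinks all coefficients
lasso = Lasso(alpha=0.1).fit(X, y)                    # L1: irrelevant ones become exactly 0
enet = ElasticNet(alpha=0.1, l1_ratio=0.5).fit(X, y)  # mix of L1 and L2

print(ridge.coef_.round(3))  # all five nonzero, just shrunk
print(lasso.coef_.round(3))  # last three exactly 0 (sparse)
print(enet.coef_.round(3))

# Early stopping: hold out 10% as a validation set and stop
# once the validation score has not improved for n_iter_no_change epochs
sgd = SGDRegressor(early_stopping=True, validation_fraction=0.1,
                   n_iter_no_change=5, random_state=0).fit(X, y)
print(sgd.n_iter_)  # epochs actually run before stopping
```

Printing the three coefficient vectors side by side makes the difference concrete: sparsity comes from L1, pure shrinkage from L2.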

3. Code

```python
from sklearn.datasets import load_boston  # removed in scikit-learn >= 1.2
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression, SGDRegressor, Ridge, RidgeCV
from sklearn.metrics import mean_squared_error
import joblib
import os

# Load the data
boston = load_boston()

# Split the dataset
x_train, x_test, y_train, y_test = train_test_split(
    boston.data, boston.target, test_size=0.2, random_state=0)

# Standardize: fit the scaler on the training set only,
# then apply the same transform to the test set
transfer = StandardScaler()
x_train = transfer.fit_transform(x_train)
x_test = transfer.transform(x_test)

# Build the model (load a previously saved one if it exists)
if os.path.exists('./test.pkl'):
    estimator = joblib.load('test.pkl')
else:
    estimator = Ridge()
    estimator.fit(x_train, y_train)
print("Model intercept:", estimator.intercept_)  # 22.6212871287129

# Save the model
joblib.dump(estimator, 'test.pkl')

# Evaluate the model
y_pre = estimator.predict(x_test)
print("Predictions:", y_pre)
print("R^2 score:", estimator.score(x_test, y_test))  # 0.6057861591433175
print("Mean squared error:", mean_squared_error(y_test, y_pre))  # 32.10021771755563
```