Data description

Boston House Price Dataset (https://www.kaggle.com/c/boston-housing)

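The boston_housing dataset is a classic machine learning dataset for regression on house prices. The data describe house prices in the Boston area in the 1970s. The dataset is small, with 506 records in total.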

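Each record describes a house and its surroundings, including the town's crime rate, the nitric oxide concentration, the average number of rooms per dwelling, the weighted distance to employment centres, and the price of owner-occupied homes, among others.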

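  • CRIM: per capita crime rate by town.
  • ZN: proportion of residential land zoned for lots over 25,000 sq.ft.
  • INDUS: proportion of non-retail business acres per town.
  • CHAS: Charles River dummy variable (1 if the tract bounds the river; 0 otherwise).
  • NOX: nitric oxide concentration.
  • RM: average number of rooms per dwelling.
  • AGE: proportion of owner-occupied units built before 1940.
  • DIS: weighted distance to five Boston employment centres.
  • RAD: index of accessibility to radial highways.
  • TAX: full-value property tax rate per $10,000.
  • PTRATIO: pupil-teacher ratio by town.
  • B: 1000(Bk - 0.63)^2, where Bk is the proportion of Black residents by town.
  • LSTAT: percentage of the population with lower socioeconomic status.
  • MEDV: median value of owner-occupied homes, in thousands of dollars.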
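As a small illustration of how the feature list maps onto the arrays used in the code, the sketch below looks up a column by name. It assumes the 13 feature columns appear in the order listed above (MEDV is the target and is stored separately), and it uses random stand-in data instead of the real dataset. `FEATURE_NAMES` and `column` are hypothetical helper names, not part of Keras.

```python
import numpy as np

# Assumed column order: the 13 features as listed above (MEDV is the target).
FEATURE_NAMES = ["CRIM", "ZN", "INDUS", "CHAS", "NOX", "RM", "AGE",
                 "DIS", "RAD", "TAX", "PTRATIO", "B", "LSTAT"]


def column(x, name):
    """Return the feature column called `name` from an (n_samples, 13) array."""
    return x[:, FEATURE_NAMES.index(name)]


# Random stand-in for the real feature matrix (506 samples, 13 features).
rng = np.random.default_rng(0)
x_demo = rng.normal(size=(506, 13))

rm = column(x_demo, "RM")
print(rm.shape)
```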

Plain Keras implementation

    import numpy as np
    import matplotlib.pyplot as plt
    from keras.datasets import boston_housing
    from keras.models import Sequential
    from keras.layers import Dense
    from keras.callbacks import EarlyStopping
    from sklearn.preprocessing import StandardScaler


    def load_data():
        (x_train, y_train), (x_test, y_test) = boston_housing.load_data()
        # Merge the default split so the data can be re-split below.
        x = np.vstack((x_train, x_test))
        y = np.concatenate((y_train, y_test))
        # Standardize features and targets to zero mean and unit variance.
        # The scalers are fit on all 506 samples, test rows included.
        x = StandardScaler().fit_transform(x)
        y = StandardScaler().fit_transform(y.reshape(-1, 1))
        # First 401 samples for training, remaining 105 for testing.
        x_train = x[:401, :]
        x_test = x[401:, :]
        y_train = y[:401, :]
        y_test = y[401:, :]
        return (x_train, y_train), (x_test, y_test)


    def draw_train_history(history):
        # Plot training and validation loss per epoch.
        plt.plot(history.history['loss'])
        plt.plot(history.history['val_loss'])
        plt.title('model loss')
        plt.ylabel('loss')
        plt.xlabel('epoch')
        plt.legend(['train', 'validation'], loc='upper left')
        plt.show()


    def build_model():
        # Two hidden layers and a single linear output for regression.
        model = Sequential()
        model.add(Dense(64, activation='relu', input_shape=(13,)))
        model.add(Dense(64, activation='relu'))
        model.add(Dense(1, activation='linear'))
        model.compile(optimizer='adam',
                      loss='mean_squared_error')
        return model


    if __name__ == '__main__':
        (x_train, y_train), (x_test, y_test) = load_data()
        model = build_model()
        # Stop once the training loss has not improved for 10 epochs.
        early_stopping = EarlyStopping(monitor='loss', patience=10)
        history = model.fit(x_train, y_train,
                            epochs=500,
                            batch_size=64,
                            validation_split=0.2,
                            callbacks=[early_stopping])
        draw_train_history(history)
        loss = model.evaluate(x_test, y_test, batch_size=64)
        print("test loss: {}".format(loss))

Model structure

Regression-keras - Figure 1

Model output

    test loss: 0.20772670450664701
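Because the targets were standardized before training, the reported test loss is a mean squared error in units of standard deviations, not in thousands of dollars. The following is a minimal sketch of converting such a loss back to the original units. The prices and predictions are made up for illustration and stand in for the real data.

```python
import numpy as np

# Made-up prices (in $1000s) and predictions, standing in for real data.
y = np.array([24.0, 21.6, 34.7, 33.4, 36.2, 28.7])
pred = np.array([25.0, 20.0, 33.0, 35.0, 36.0, 27.0])

# Standardize with the mean and population std of y, as StandardScaler does.
mean, std = y.mean(), y.std()
y_scaled = (y - mean) / std
pred_scaled = (pred - mean) / std

mse_scaled = np.mean((y_scaled - pred_scaled) ** 2)

# Undo the scaling: MSE scales with the variance, RMSE with the std.
mse_original = mse_scaled * std ** 2
rmse_original = np.sqrt(mse_original)
print(mse_scaled, rmse_original)
```

Under the same assumption, a standardized loss L corresponds to an RMSE of sqrt(L) times the standard deviation of MEDV.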

Model loss curve

Regression-keras - Figure 2

Keras ensemble learning: averaging

    import numpy as np
    import matplotlib.pyplot as plt
    from keras.datasets import boston_housing
    from keras.models import Model
    from keras.layers import Input, Dense, average
    from keras.callbacks import EarlyStopping
    from sklearn.preprocessing import StandardScaler


    def load_data():
        (x_train, y_train), (x_test, y_test) = boston_housing.load_data()
        # Merge the default split so the data can be re-split below.
        x = np.vstack((x_train, x_test))
        y = np.concatenate((y_train, y_test))
        # Standardize features and targets to zero mean and unit variance.
        x = StandardScaler().fit_transform(x)
        y = StandardScaler().fit_transform(y.reshape(-1, 1))
        # First 401 samples for training, remaining 105 for testing.
        x_train = x[:401, :]
        x_test = x[401:, :]
        y_train = y[:401, :]
        y_test = y[401:, :]
        return (x_train, y_train), (x_test, y_test)


    def draw_train_history(history):
        # Plot training and validation loss per epoch.
        plt.plot(history.history['loss'])
        plt.plot(history.history['val_loss'])
        plt.title('model loss')
        plt.ylabel('loss')
        plt.xlabel('epoch')
        plt.legend(['train', 'validation'], loc='upper left')
        plt.show()


    def build_model():
        inputs = Input(shape=(13, ))
        # Three branches of different widths share the same input.
        model1_1 = Dense(64, activation='relu')(inputs)
        model2_1 = Dense(128, activation='relu')(inputs)
        model3_1 = Dense(32, activation='relu')(inputs)
        model1_2 = Dense(32, activation='relu')(model1_1)
        model2_2 = Dense(64, activation='relu')(model2_1)
        model3_2 = Dense(16, activation='relu')(model3_1)
        model1_out = Dense(1, activation='linear')(model1_2)
        model2_out = Dense(1, activation='linear')(model2_2)
        model3_out = Dense(1, activation='linear')(model3_2)
        # The ensemble output is the mean of the three branch outputs.
        out = average([model1_out, model2_out, model3_out])
        model = Model(inputs=inputs, outputs=out)
        model.compile(optimizer='adam',
                      loss='mean_squared_error')
        return model


    if __name__ == '__main__':
        (x_train, y_train), (x_test, y_test) = load_data()
        model = build_model()
        # Stop once the training loss has not improved for 10 epochs.
        early_stopping = EarlyStopping(monitor='loss', patience=10)
        history = model.fit(x_train, y_train,
                            epochs=500,
                            batch_size=64,
                            validation_split=0.2,
                            callbacks=[early_stopping])
        draw_train_history(history)
        model.save("regression-average-ensemble.h5")
        loss = model.evaluate(x_test, y_test, batch_size=64)
        print("test loss: {}".format(loss))

Model structure

Regression-keras - Figure 3

Model output

    test loss: 0.14132633038929532

Model loss curve

Regression-keras - Figure 4
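The `average` layer takes the element-wise mean of the three branch outputs. The NumPy sketch below mimics that step with made-up branch predictions (the true values plus independent noise, an assumption made only for illustration). It also checks a general property of averaging under squared error: the MSE of the averaged prediction is never worse than the mean of the individual MSEs.

```python
import numpy as np

rng = np.random.default_rng(0)
y_true = rng.normal(size=100)

# Made-up predictions from three branches: the truth plus independent noise.
preds = np.stack([y_true + rng.normal(scale=0.5, size=100) for _ in range(3)])

# Element-wise mean over the three branches, like the `average` layer.
ensemble = preds.mean(axis=0)

mse_each = ((preds - y_true) ** 2).mean(axis=1)
mse_ensemble = ((ensemble - y_true) ** 2).mean()
print(mse_each, mse_ensemble)
```

In the Keras model above, the three branches are trained jointly through the averaged output, so they are not independent models as in this sketch.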

Keras with scikit-learn ensemble learning

    import numpy as np
    import joblib
    from keras.datasets import boston_housing
    # keras.wrappers.scikit_learn is only available in older Keras releases.
    from keras.wrappers.scikit_learn import KerasRegressor
    from keras.models import Sequential
    from keras.layers import Dense
    from sklearn.preprocessing import StandardScaler
    from sklearn.ensemble import VotingRegressor


    def load_data():
        (x_train, y_train), (x_test, y_test) = boston_housing.load_data()
        # Merge the default split so the data can be re-split below.
        x = np.vstack((x_train, x_test))
        y = np.concatenate((y_train, y_test))
        # Standardize features and targets to zero mean and unit variance.
        x = StandardScaler().fit_transform(x)
        y = StandardScaler().fit_transform(y.reshape(-1, 1))
        # First 401 samples for training, remaining 105 for testing.
        x_train = x[:401, :]
        x_test = x[401:, :]
        y_train = y[:401, :]
        y_test = y[401:, :]
        return (x_train, y_train), (x_test, y_test)


    def build_model1():
        model = Sequential()
        model.add(Dense(128, activation='relu', input_shape=(13, )))
        model.add(Dense(64, activation='relu'))
        model.add(Dense(1, activation='linear'))
        model.compile(optimizer='adam',
                      loss='mean_squared_error')
        return model


    def build_model2():
        model = Sequential()
        model.add(Dense(64, activation='relu', input_shape=(13, )))
        model.add(Dense(32, activation='relu'))
        model.add(Dense(1, activation='linear'))
        model.compile(optimizer='adam',
                      loss='mean_squared_error')
        return model


    def build_model3():
        model = Sequential()
        model.add(Dense(32, activation='relu', input_shape=(13, )))
        model.add(Dense(16, activation='relu'))
        model.add(Dense(1, activation='linear'))
        model.compile(optimizer='adam',
                      loss='mean_squared_error')
        return model


    if __name__ == '__main__':
        (x_train, y_train), (x_test, y_test) = load_data()
        # Wrap each Keras model so scikit-learn can treat it as a regressor.
        model1 = KerasRegressor(build_fn=build_model1, epochs=100, batch_size=64)
        model1._estimator_type = "regressor"
        model2 = KerasRegressor(build_fn=build_model2, epochs=100, batch_size=64)
        model2._estimator_type = "regressor"
        model3 = KerasRegressor(build_fn=build_model3, epochs=100, batch_size=64)
        model3._estimator_type = "regressor"
        cls = VotingRegressor(estimators=[
            ('model1', model1),
            ('model2', model2),
            ('model3', model3)
        ])
        # scikit-learn expects 1-D targets, so flatten the (n, 1) arrays.
        cls.fit(x_train, y_train.ravel())
        joblib.dump(cls, "sklearn-regressor.h5")
        print("score: ", cls.score(x_test, y_test.ravel()))

Here we use scikit-learn's VotingRegressor as the ensembling tool. The scikit-learn documentation describes VotingRegressor as follows:

A voting regressor is an ensemble meta-estimator that fits several base regressors, each on the whole dataset. Then it averages the individual predictions to form a final prediction.

In other words, it averages the outputs of the base learners and uses that average as the final prediction.
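The averaging behaviour can be checked directly. The sketch below is an illustration only: it swaps the Keras models for two lightweight scikit-learn regressors and uses synthetic data, both of which are assumptions made to keep the example fast and self-contained.

```python
import numpy as np
from sklearn.ensemble import VotingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor

# Synthetic regression data with 13 features (not the Boston data).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 13))
y = X @ rng.normal(size=13) + rng.normal(scale=0.1, size=200)

voter = VotingRegressor(estimators=[
    ("linear", LinearRegression()),
    ("tree", DecisionTreeRegressor(max_depth=3, random_state=0)),
])
voter.fit(X, y)

# The fitted base regressors are stored in `estimators_`; average them by hand.
manual_mean = np.mean([est.predict(X) for est in voter.estimators_], axis=0)
print(np.allclose(voter.predict(X), manual_mean))
```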

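If we use StackingRegressor as the ensembling tool instead, it feeds the output of each base learner into a further regressor that makes the final prediction. This is the learning method that we implement below. The documentation describes it as follows: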

Stacked generalization consists in stacking the output of individual estimator and use a regressor to compute the final prediction. Stacking allows to use the strength of each individual estimator by using their output as input of a final estimator.
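A minimal sketch of stacking with scikit-learn follows. It assumes a scikit-learn version that provides `StackingRegressor` (0.22 or later), and it again uses lightweight base regressors and synthetic data in place of the Keras models and the Boston data. The choice of `Ridge` as the final estimator is arbitrary.

```python
import numpy as np
from sklearn.ensemble import StackingRegressor
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.tree import DecisionTreeRegressor

# Synthetic regression data with 13 features (not the Boston data).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 13))
y = X @ rng.normal(size=13) + rng.normal(scale=0.1, size=200)

stack = StackingRegressor(
    estimators=[
        ("linear", LinearRegression()),
        ("tree", DecisionTreeRegressor(max_depth=3, random_state=0)),
    ],
    # The final estimator is trained on the base regressors' predictions.
    final_estimator=Ridge(),
)
stack.fit(X, y)

# One learned weight per base regressor.
print(stack.final_estimator_.coef_)
```

Unlike plain averaging, the weights given to each base regressor are learned from data, so a weak base regressor can be down-weighted.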

Model output

    score: 0.8482947319781606
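For regressors, scikit-learn's `score` method returns the coefficient of determination R^2, where 1.0 is a perfect fit and 0.0 matches always predicting the mean. A sketch of computing it by hand, with made-up numbers (`r2_score_manual` is a hypothetical helper name):

```python
import numpy as np


def r2_score_manual(y_true, y_pred):
    """R^2 = 1 - (residual sum of squares) / (total sum of squares)."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


# Made-up targets and predictions.
y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.0, 8.0])

r2 = r2_score_manual(y_true, y_pred)
print(r2)
```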

Keras ensemble learning: learning method

    import numpy as np
    import matplotlib.pyplot as plt
    from keras.datasets import boston_housing
    from keras.models import Model
    from keras.layers import Input, Dense, concatenate
    from keras.callbacks import EarlyStopping
    from sklearn.preprocessing import StandardScaler


    def load_data():
        (x_train, y_train), (x_test, y_test) = boston_housing.load_data()
        # Merge the default split so the data can be re-split below.
        x = np.vstack((x_train, x_test))
        y = np.concatenate((y_train, y_test))
        # Standardize features and targets to zero mean and unit variance.
        x = StandardScaler().fit_transform(x)
        y = StandardScaler().fit_transform(y.reshape(-1, 1))
        # First 401 samples for training, remaining 105 for testing.
        x_train = x[:401, :]
        x_test = x[401:, :]
        y_train = y[:401, :]
        y_test = y[401:, :]
        return (x_train, y_train), (x_test, y_test)


    def draw_train_history(history):
        # Plot training and validation loss per epoch.
        plt.plot(history.history['loss'])
        plt.plot(history.history['val_loss'])
        plt.title('model loss')
        plt.ylabel('loss')
        plt.xlabel('epoch')
        plt.legend(['train', 'validation'], loc='upper left')
        plt.show()


    def build_model():
        inputs = Input(shape=(13, ))
        # Three branches of different widths share the same input.
        model1_1 = Dense(64, activation='relu')(inputs)
        model2_1 = Dense(128, activation='relu')(inputs)
        model3_1 = Dense(32, activation='relu')(inputs)
        model1_2 = Dense(32, activation='relu')(model1_1)
        model2_2 = Dense(64, activation='relu')(model2_1)
        model3_2 = Dense(16, activation='relu')(model3_1)
        model1_3 = Dense(1, activation='linear')(model1_2)
        model2_3 = Dense(1, activation='linear')(model2_2)
        model3_3 = Dense(1, activation='linear')(model3_2)
        # Concatenate the branch outputs and learn how to combine them.
        con = concatenate([model1_3, model2_3, model3_3])
        output = Dense(1, activation='linear')(con)
        model = Model(inputs=inputs, outputs=output)
        model.compile(optimizer='adam',
                      loss='mean_squared_error')
        return model


    if __name__ == '__main__':
        (x_train, y_train), (x_test, y_test) = load_data()
        model = build_model()
        # Stop once the training loss has not improved for 10 epochs.
        early_stopping = EarlyStopping(monitor='loss', patience=10)
        history = model.fit(x_train, y_train,
                            epochs=500,
                            batch_size=64,
                            validation_split=0.2,
                            callbacks=[early_stopping])
        draw_train_history(history)
        model.save("regression-ensemble.h5")
        loss = model.evaluate(x_test, y_test, batch_size=64)
        print("test loss: {}".format(loss))

Model structure

Regression-keras - Figure 5

Model output

    test loss: 0.16616634471075875

Model loss curve

Regression-keras - Figure 6
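The final `Dense(1, activation='linear')` layer over the concatenated branch outputs computes a weighted sum plus a bias, so averaging is the special case with weights of 1/3 and no bias. The NumPy sketch below illustrates this with made-up base predictions. It fits the weights by least squares, which is an assumption made for simplicity; the Keras model learns them with Adam, jointly with the branches.

```python
import numpy as np

rng = np.random.default_rng(0)
y_true = rng.normal(size=200)

# Made-up base predictions with different noise levels, shape (200, 3).
preds = np.column_stack([
    y_true + rng.normal(scale=s, size=200) for s in (0.2, 0.5, 1.0)
])

# Averaging: fixed weights of 1/3 and no bias.
avg = preds.mean(axis=1)

# Learned combination: least-squares weights plus a bias term.
design = np.column_stack([preds, np.ones(len(preds))])
coef, *_ = np.linalg.lstsq(design, y_true, rcond=None)
learned = design @ coef

mse_avg = np.mean((avg - y_true) ** 2)
mse_learned = np.mean((learned - y_true) ** 2)
print(coef, mse_avg, mse_learned)
```

On the data used for fitting, the learned combination can only match or beat plain averaging, because averaging is one of the candidate weightings. That guarantee does not carry over to unseen data.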

Code location

https://github.com/Knowledge-Precipitation-Tribe/Neural-network/tree/master/code/Ensemble-Learning