Can the model be used directly?

Is the trained model reliable? Will it fit data from the real environment well?

Solution

Dataset splitting

Data preparation


Splitting the dataset

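Split the dataset into two parts: a training set and a test set. The model is fitted on the training set only and then evaluated on the test set, which it has never seen, so the test result approximates how the model will behave on new data.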

Wrapping the split in a function

    # playML/model_selection.py
    import numpy as np


    def train_test_split(X, y, test_ratio=0.2, seed=None):
        """Split X and y into X_train, X_test, y_train, y_test according to test_ratio."""
        assert X.shape[0] == y.shape[0], \
            "the size of X must be equal to the size of y"
        assert 0.0 <= test_ratio <= 1.0, \
            "test_ratio must be valid"

        # Compare against None so that seed=0 is also honored.
        if seed is not None:
            np.random.seed(seed)

        # Shuffle the row indexes, not the data, so X and y stay aligned.
        shuffled_indexes = np.random.permutation(len(X))

        # The first test_size shuffled indexes form the test set, the rest the training set.
        test_size = int(len(X) * test_ratio)
        test_indexes = shuffled_indexes[:test_size]
        train_indexes = shuffled_indexes[test_size:]

        X_train = X[train_indexes]
        y_train = y[train_indexes]

        X_test = X[test_indexes]
        y_test = y[test_indexes]

        return X_train, X_test, y_train, y_test
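One design point in the function above is that `np.random.seed` changes NumPy's global random state, which can affect unrelated code that draws random numbers later. The sketch below is a variation, not the playML implementation: it follows the same shuffle-then-slice steps but uses a local `np.random.default_rng` generator. The name `split_with_local_rng` and the toy arrays are assumptions made up for illustration.

```python
import numpy as np


def split_with_local_rng(X, y, test_ratio=0.2, seed=None):
    """Shuffle-then-slice split using a local Generator (hypothetical helper)."""
    assert X.shape[0] == y.shape[0], "the size of X must be equal to the size of y"
    assert 0.0 <= test_ratio <= 1.0, "test_ratio must be valid"

    # A local generator leaves the global NumPy random state untouched.
    rng = np.random.default_rng(seed)
    shuffled_indexes = rng.permutation(len(X))

    test_size = int(len(X) * test_ratio)
    test_indexes = shuffled_indexes[:test_size]
    train_indexes = shuffled_indexes[test_size:]

    return X[train_indexes], X[test_indexes], y[train_indexes], y[test_indexes]


# Toy data (assumed): 10 samples with 2 features; row i is [2*i, 2*i + 1]
# and its label is i, so the pairing of X and y is easy to verify.
X = np.arange(20).reshape(10, 2)
y = np.arange(10)

X_train, X_test, y_train, y_test = split_with_local_rng(X, y, test_ratio=0.2, seed=666)
print(X_train.shape, X_test.shape)  # (8, 2) (2, 2)
```

With 10 samples and `test_ratio=0.2`, `int(10 * 0.2)` gives 2 test rows and 8 training rows. Because the same index arrays select from both `X` and `y`, every feature row keeps its own label.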


Model accuracy
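For a classifier, accuracy on the test set is usually the fraction of test samples whose predicted label equals the true label. The sketch below shows only that calculation. The helper name `accuracy` and the two label arrays are made up for illustration; in practice `y_predict` would come from a model fitted on the training set.

```python
import numpy as np


def accuracy(y_true, y_predict):
    """Fraction of predictions that match the true labels (hypothetical helper)."""
    assert y_true.shape[0] == y_predict.shape[0], \
        "the size of y_true must be equal to the size of y_predict"
    return np.sum(y_true == y_predict) / len(y_true)


# Assumed example labels: 4 of the 5 predictions are correct.
y_test = np.array([0, 1, 2, 1, 0])
y_predict = np.array([0, 1, 1, 1, 0])

print(accuracy(y_test, y_predict))  # 0.8
```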


The split function in sklearn


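    # Key code (X and y are the feature matrix and labels prepared earlier)
    from sklearn.model_selection import train_test_split

    # test_size plays the role of test_ratio above; random_state fixes the shuffle.
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=666)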