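Can the model be applied directly?
Is the trained model reliable, and will it fit data from the real environment well?
Solution
Splitting the dataset
Data preparation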

Split the dataset
The dataset is split into two parts: a training set and a test set.
Wrapping the split algorithm in a function
```python
# playML/model_selection.py
import numpy as np


def train_test_split(X, y, test_ratio=0.2, seed=None):
    """Split X and y into X_train, X_test, y_train, y_test according to test_ratio."""
    assert X.shape[0] == y.shape[0], \
        "the size of X must be equal to the size of y"
    assert 0.0 <= test_ratio <= 1.0, \
        "test_ratio must be valid"

    # Compare against None so that seed=0 is also honored
    if seed is not None:
        np.random.seed(seed)

    # Shuffle the row indexes, then use the first test_size of them as the test set
    shuffled_indexes = np.random.permutation(len(X))
    test_size = int(len(X) * test_ratio)
    test_indexes = shuffled_indexes[:test_size]
    train_indexes = shuffled_indexes[test_size:]

    X_train = X[train_indexes]
    y_train = y[train_indexes]
    X_test = X[test_indexes]
    y_test = y[test_indexes]

    return X_train, X_test, y_train, y_test
```
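The helper above seeds NumPy's global random state, which can affect other code that draws random numbers. As a minimal sketch of an alternative, the same shuffle-and-slice idea can use a local generator created with `np.random.default_rng`. The function name `split_with_local_rng` is hypothetical and not part of playML, and the toy arrays are made up for illustration.

```python
import numpy as np


def split_with_local_rng(X, y, test_ratio=0.2, seed=None):
    """Hypothetical variant: shuffle indexes with a local generator, then slice."""
    assert X.shape[0] == y.shape[0], "the size of X must be equal to the size of y"
    assert 0.0 <= test_ratio <= 1.0, "test_ratio must be valid"

    # Local generator: the global NumPy random state is left untouched
    rng = np.random.default_rng(seed)
    shuffled_indexes = rng.permutation(len(X))

    test_size = int(len(X) * test_ratio)
    test_indexes = shuffled_indexes[:test_size]
    train_indexes = shuffled_indexes[test_size:]

    return X[train_indexes], X[test_indexes], y[train_indexes], y[test_indexes]


# Toy data (made up for illustration): 10 samples with 2 features each
X = np.arange(20).reshape(10, 2)
y = np.arange(10)

X_train, X_test, y_train, y_test = split_with_local_rng(X, y, test_ratio=0.2, seed=666)
print(X_train.shape, X_test.shape)  # (8, 2) (2, 2)
```

Passing the same `seed` twice gives the same split, so an experiment can be repeated on identical training and test sets.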
Model accuracy
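After the split, the model is fitted on the training set only, and its predictions for the test set are compared with the true test labels. A minimal sketch, assuming accuracy means the fraction of test labels predicted correctly. The helper `accuracy_score` is defined locally for illustration (it only borrows the name sklearn uses), and both label arrays are made up rather than taken from a real model.

```python
import numpy as np


def accuracy_score(y_true, y_predict):
    """Fraction of predictions that match the true labels."""
    assert y_true.shape[0] == y_predict.shape[0], \
        "the size of y_true must be equal to the size of y_predict"
    return np.sum(y_true == y_predict) / len(y_true)


# Made-up labels: 4 of the 5 predictions are correct
y_test = np.array([0, 1, 1, 2, 2])
y_predict = np.array([0, 1, 2, 2, 2])

print(accuracy_score(y_test, y_predict))  # 0.8
```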
The split function in sklearn

```python
# Key code (X and y are the feature matrix and labels prepared earlier)
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=666)
```