I. Built-in datasets

Function Description
load_boston([return_X_y]) Load and return the Boston house-prices dataset (regression). Removed in scikit-learn 1.2.
load_iris([return_X_y]) Load and return the iris dataset (classification).
load_diabetes([return_X_y]) Load and return the diabetes dataset (regression).
load_digits([n_class, return_X_y]) Load and return the digits dataset (classification).
load_linnerud([return_X_y]) Load and return the linnerud dataset (multivariate regression).
load_wine([return_X_y]) Load and return the wine dataset (classification).
load_breast_cancer([return_X_y]) Load and return the breast cancer wisconsin dataset (classification).
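
    from sklearn.datasets import load_diabetes
    import pandas as pd
    # load_boston was removed in scikit-learn 1.2, so load_diabetes (also a regression dataset) is used here
    data, target = load_diabetes(return_X_y=True)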
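
Since pandas is imported above, one natural next step is to put a built-in dataset into a DataFrame. The following is a minimal sketch, assuming load_iris as the example dataset; the variable names `iris` and `df` are only illustrative.

```python
import pandas as pd
from sklearn.datasets import load_iris

# Load as a Bunch so the column names are available alongside the arrays.
iris = load_iris()

# Wrap the feature matrix in a DataFrame and attach the labels as a column.
df = pd.DataFrame(iris.data, columns=iris.feature_names)
df["target"] = iris.target

print(df.shape)
print(df.head())
```

Calling the loader without return_X_y=True keeps the metadata (feature names, target names, description) that the plain (data, target) tuple drops.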

II. Real-world datasets

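These datasets are relatively large and must be downloaded from the network. They are fetched on first use and then cached locally.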

Function Description
fetch_olivetti_faces([data_home, shuffle, …]) Load the Olivetti faces data-set from AT&T (classification).
fetch_20newsgroups([data_home, subset, …]) Load the filenames and data from the 20 newsgroups dataset (classification).
fetch_20newsgroups_vectorized([subset, …]) Load the 20 newsgroups dataset and vectorize it into token counts (classification).
fetch_lfw_people([data_home, funneled, …]) Load the Labeled Faces in the Wild (LFW) people dataset (classification).
fetch_lfw_pairs([subset, data_home, …]) Load the Labeled Faces in the Wild (LFW) pairs dataset (classification).
fetch_covtype([data_home, …]) Load the covertype dataset (classification).
fetch_rcv1([data_home, subset, …]) Load the RCV1 multilabel dataset (classification).
fetch_kddcup99([subset, data_home, shuffle, …]) Load the kddcup99 dataset (classification).
fetch_california_housing([data_home, …]) Load the California housing dataset (regression).

How to call the dataset loaders
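
As a rough illustration of the calling convention, the sketch below inspects the object a loader returns. It uses load_wine so that it runs without a network connection; the fetch_* functions listed above also return a Bunch by default, but download their data first. The variable names are illustrative.

```python
from sklearn.datasets import load_wine

# By default a loader returns a Bunch: a dict-like object with attribute access.
wine = load_wine()
X, y = wine.data, wine.target

print(X.shape, y.shape)          # feature matrix and label vector
print(wine.feature_names[:3])    # names of the first few columns
print(wine.target_names)         # class names
print(wine.DESCR[:200])          # start of the dataset description

# With return_X_y=True the same arrays come back as a plain (data, target) tuple.
X2, y2 = load_wine(return_X_y=True)
```

The fetch_* functions additionally accept a data_home argument that controls where downloaded files are cached.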

III. Simulated datasets

scikit-learn ships with many random generator functions that produce simulated datasets.

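    from sklearn.datasets import (make_blobs, make_gaussian_quantiles,
                                  make_hastie_10_2, make_multilabel_classification)

    # Each generator returns a feature array x and a label array y
    x, y = make_blobs(n_samples=100, n_features=2)
    x, y = make_gaussian_quantiles(n_samples=100, n_features=2, n_classes=3)
    x, y = make_hastie_10_2(n_samples=12000)
    x, y = make_multilabel_classification(n_samples=100, n_features=20, n_classes=5, n_labels=2)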
Function Description
make_biclusters(shape, n_clusters[, noise, …]) Generate an array with constant block diagonal structure for biclustering.
make_checkerboard(shape, n_clusters[, …]) Generate an array with block checkerboard structure for biclustering.
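
The two biclustering generators return three arrays rather than an (x, y) pair. Here is a small sketch of how they might be called; the shapes, noise level, and random_state are arbitrary choices made for illustration.

```python
from sklearn.datasets import make_biclusters, make_checkerboard

# Block-diagonal structure: 3 biclusters in a 30 x 20 matrix.
data, rows, cols = make_biclusters(
    shape=(30, 20), n_clusters=3, noise=0.5, random_state=0
)
print(data.shape)   # the generated matrix
print(rows.shape)   # boolean row-membership indicators, one row per bicluster
print(cols.shape)   # boolean column-membership indicators, one row per bicluster

# Checkerboard structure: 3 row clusters x 2 column clusters.
board, board_rows, board_cols = make_checkerboard(
    shape=(30, 20), n_clusters=(3, 2), noise=0.5, random_state=0
)
print(board.shape, board_rows.shape, board_cols.shape)
```

The indicator arrays say which rows and columns of the matrix belong to each bicluster, which is useful for checking the output of a biclustering algorithm.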

IV. Manifold learning generators

Function Description
make_s_curve([n_samples, noise, random_state]) Generate an S curve dataset.
make_swiss_roll([n_samples, noise, random_state]) Generate a swiss roll dataset.
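
A brief sketch of calling the two manifold generators follows. Both return 3-D points together with the position of each point along the manifold; the sample count, noise, and seed below are arbitrary illustrative values.

```python
from sklearn.datasets import make_s_curve, make_swiss_roll

# X holds 3-D coordinates; t is each sample's position along the main
# dimension of the manifold (often used to color a scatter plot).
X_s, t_s = make_s_curve(n_samples=500, noise=0.05, random_state=0)
X_r, t_r = make_swiss_roll(n_samples=500, noise=0.05, random_state=0)

print(X_s.shape, t_s.shape)
print(X_r.shape, t_r.shape)
```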

Many more datasets are available; see:
https://sklearn.apachecn.org/docs/master/47.html

V. Sample images

Function Description
load_sample_images() Load sample images for image manipulation.
load_sample_image(image_name) Load the numpy array of a single sample image.
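
The sample-image loaders might be used as in the sketch below. It assumes the Pillow package is installed, which recent scikit-learn versions need to decode the bundled JPEG files; "china.jpg" is one of the two images shipped with the library.

```python
from sklearn.datasets import load_sample_image, load_sample_images

# A single image comes back as a (height, width, 3) uint8 array.
china = load_sample_image("china.jpg")
print(china.shape, china.dtype)

# load_sample_images() returns a Bunch holding all bundled images.
bundle = load_sample_images()
print(len(bundle.images))
print(bundle.filenames)
```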

VI. Other datasets
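For openml.org, an open machine learning platform, scikit-learn provides a function for downloading its datasets (fetch_openml). Usage is as follows: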
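
A minimal sketch of one way to wrap fetch_openml is shown here. The helper name `fetch_openml_dataset` and its `cache_dir` parameter are hypothetical, introduced only for illustration; the dataset name and version in the commented call are just an example. The download is left inside a function so that nothing is fetched until it is called.

```python
from sklearn.datasets import fetch_openml


def fetch_openml_dataset(name, version, cache_dir=None):
    """Hypothetical helper: download an OpenML dataset and return (X, y).

    name and version identify the dataset on openml.org; cache_dir is
    passed through as data_home, the local directory used for caching.
    """
    bunch = fetch_openml(
        name=name, version=version, data_home=cache_dir, as_frame=True
    )
    return bunch.data, bunch.target


# Example call (requires network access the first time it runs):
# X, y = fetch_openml_dataset("miceprotein", version=4)
```

Pinning the version keeps results reproducible, because a dataset name on OpenML can have several versions.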