Sklearn - 001 - Loading Datasets - "Hands-On Deep Learning"

1. Built-in datasets
2. Real-world datasets
3. Generated (simulated) datasets
4. Manifold learning generators
5. Sample images

1. Built-in datasets

Call	Description
load_boston([return_X_y])	Load and return the boston house-prices dataset (regression). Removed in scikit-learn 1.2.
load_iris([return_X_y])	Load and return the iris dataset (classification).
load_diabetes([return_X_y])	Load and return the diabetes dataset (regression).
load_digits([n_class, return_X_y])	Load and return the digits dataset (classification).
load_linnerud([return_X_y])	Load and return the linnerud dataset (multivariate regression).
load_wine([return_X_y])	Load and return the wine dataset (classification).
load_breast_cancer([return_X_y])	Load and return the breast cancer wisconsin dataset (classification).

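from sklearn.datasets import *  # wildcard import brings every loader and generator into scope
import pandas as pd

# return_X_y=True returns the (data, target) arrays instead of a Bunch object.
# Note: load_boston was removed in scikit-learn 1.2, so this call needs an older release.
data, target = load_boston(return_X_y=True)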
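
The source imports pandas but does not show how it is used. As a minimal sketch, a built-in dataset can be moved into a DataFrame as below. It uses load_iris because that loader ships with current scikit-learn releases; the DataFrame layout and the column name "target" are assumptions made for illustration, not something the source prescribes.

```python
import pandas as pd
from sklearn.datasets import load_iris

# Without return_X_y, the loader returns a Bunch (a dict-like object).
iris = load_iris()

# One column per feature, named after the dataset's own feature names.
df = pd.DataFrame(iris.data, columns=iris.feature_names)

# Append the class labels as an extra column (column name chosen for this sketch).
df["target"] = iris.target

print(df.shape)
print(df.head())
```

The same pattern should work for the other load_* functions in the table, since they return the same kind of Bunch.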

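2. Real-world datasets

These datasets are relatively large, so they are not shipped with the package. They are downloaded from the network on the first call and cached locally (the data_home parameter controls where).

Call	Description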
fetch_olivetti_faces([data_home, shuffle, …])	Load the Olivetti faces data-set from AT&T (classification).
fetch_20newsgroups([data_home, subset, …])	Load the filenames and data from the 20 newsgroups dataset (classification).
fetch_20newsgroups_vectorized([subset, …])	Load the 20 newsgroups dataset and vectorize it into token counts (classification).
fetch_lfw_people([data_home, funneled, …])	Load the Labeled Faces in the Wild (LFW) people dataset (classification).
fetch_lfw_pairs([subset, data_home, …])	Load the Labeled Faces in the Wild (LFW) pairs dataset (classification).
fetch_covtype([data_home, …])	Load the covertype dataset (classification).
fetch_rcv1([data_home, subset, …])	Load the RCV1 multilabel dataset (classification).
fetch_kddcup99([subset, data_home, shuffle, …])	Load the kddcup99 dataset (classification).
fetch_california_housing([data_home, …])	Load the California housing dataset (regression).

How to call these datasets
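
The fetch_* functions are called much like the load_* functions and return the same kind of Bunch object. The sketch below wraps that call in a small helper; describe_dataset is a hypothetical name introduced here, not part of scikit-learn. Nothing is downloaded when the block is defined, only when a fetch_* function is actually passed in and called.

```python
from sklearn.datasets import fetch_california_housing, load_iris


def describe_dataset(loader, **kwargs):
    """Call a load_* or fetch_* function and summarize the returned Bunch.

    describe_dataset is a hypothetical helper written for this example.
    """
    bunch = loader(**kwargs)
    return {
        "data_shape": bunch.data.shape,
        "target_shape": bunch.target.shape,
        "keys": sorted(bunch.keys()),
    }


# Works offline with a built-in loader:
print(describe_dataset(load_iris))
```

Calling describe_dataset(fetch_california_housing) uses the same code path, but the first call needs network access to download the data.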

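3. Generated (simulated) datasets

scikit-learn includes many random generator functions that produce simulated datasets for different kinds of tasks.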

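from sklearn.datasets import (make_blobs, make_gaussian_quantiles,
                              make_hastie_10_2, make_multilabel_classification)

x, y = make_blobs(n_samples=100, n_features=2)  # isotropic Gaussian clusters
x, y = make_gaussian_quantiles(n_samples=100, n_features=2, n_classes=3)  # classes split by quantile
x, y = make_hastie_10_2(n_samples=12000)  # binary classification with 10 features
x, y = make_multilabel_classification(n_samples=100, n_features=20, n_classes=5, n_labels=2)  # multilabel targets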
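
To make clear what these generators return, the following sketch prints the shapes of a few outputs. The random_state=0 arguments and the variable names are additions for reproducibility and readability; they do not appear in the calls above.

```python
import numpy as np
from sklearn.datasets import (make_blobs, make_hastie_10_2,
                              make_multilabel_classification)

# Fixing random_state makes the generated data reproducible.
x_blobs, y_blobs = make_blobs(n_samples=100, n_features=2, random_state=0)
x_hastie, y_hastie = make_hastie_10_2(n_samples=12000, random_state=0)
x_multi, y_multi = make_multilabel_classification(
    n_samples=100, n_features=20, n_classes=5, n_labels=2, random_state=0)

print(x_blobs.shape, y_blobs.shape)    # features (100, 2), one cluster label per sample
print(x_hastie.shape, np.unique(y_hastie))  # 10 features; labels are -1 and +1
print(x_multi.shape, y_multi.shape)    # y has one indicator column per class
```

Note that make_multilabel_classification returns a two-dimensional label array, unlike the other generators shown here.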

Call	Description
make_biclusters(shape, n_clusters[, noise, …])	Generate an array with constant block diagonal structure for biclustering.
make_checkerboard(shape, n_clusters[, …])	Generate an array with block checkerboard structure for biclustering.
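
The biclustering generators return three arrays rather than an (x, y) pair. A small sketch with make_biclusters follows; the shape, cluster count, noise level, and random_state are arbitrary values picked for illustration.

```python
from sklearn.datasets import make_biclusters

# Returns the data matrix plus boolean indicator arrays for rows and columns.
data, rows, cols = make_biclusters(
    shape=(30, 20), n_clusters=3, noise=0.5, random_state=0)

print(data.shape)  # the generated matrix
print(rows.shape)  # one row of indicators per bicluster, one entry per data row
print(cols.shape)  # one row of indicators per bicluster, one entry per data column
```

rows[i] and cols[i] mark which rows and columns of data belong to bicluster i.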

4. Manifold learning generators

Call	Description
make_s_curve([n_samples, noise, random_state])	Generate an S curve dataset.
make_swiss_roll([n_samples, noise, random_state])	Generate a swiss roll dataset.
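
Both generators return a set of 3-D points together with the position of each point along the main dimension of the manifold. A brief sketch, with sample count, noise, and random_state chosen arbitrarily for illustration:

```python
from sklearn.datasets import make_s_curve, make_swiss_roll

# Each call returns (points, t): 3-D coordinates and a 1-D position along the manifold.
points_s, t_s = make_s_curve(n_samples=500, noise=0.05, random_state=0)
points_roll, t_roll = make_swiss_roll(n_samples=500, noise=0.05, random_state=0)

print(points_s.shape, t_s.shape)
print(points_roll.shape, t_roll.shape)
```

The second return value is often used to color the points when plotting the result of a manifold learning method.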

Many more datasets are available; see:
https://sklearn.apachecn.org/docs/master/47.html

5. Sample images

Call	Description
load_sample_images()	Load sample images for image manipulation.
load_sample_image(image_name)	Load the numpy array of a single sample image.
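
A short sketch of both calls follows. It assumes the Pillow imaging library is installed, which recent scikit-learn releases need in order to decode the bundled JPEG files; "china.jpg" is one of the sample images shipped with scikit-learn.

```python
from sklearn.datasets import load_sample_image, load_sample_images

# A single image comes back as a NumPy array of shape (height, width, channels).
china = load_sample_image("china.jpg")
print(china.shape, china.dtype)

# load_sample_images returns a Bunch holding all bundled images and their file names.
samples = load_sample_images()
print(len(samples.images), samples.filenames)
```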

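6. Other datasets
For openml.org, an open machine learning site, scikit-learn provides a function, fetch_openml, for downloading its datasets. It is used as follows: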
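
A minimal sketch of such a call, assuming the fetch_openml function from sklearn.datasets. The wrapper load_openml_dataset and its fetcher parameter are hypothetical, introduced here only so the example can be exercised without network access. Defining the function downloads nothing.

```python
from sklearn.datasets import fetch_openml


def load_openml_dataset(name, version=1, fetcher=fetch_openml):
    """Download a dataset from openml.org by name and return (X, y).

    Hypothetical wrapper for this example; `fetcher` defaults to
    scikit-learn's fetch_openml and can be swapped for a stub in tests.
    """
    # return_X_y=True asks for the (data, target) pair instead of a Bunch.
    X, y = fetcher(name=name, version=version, return_X_y=True)
    return X, y
```

For instance, load_openml_dataset("mnist_784") would request version 1 of the MNIST dataset; the first call needs network access, and later calls reuse the local cache.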