I. Built-in datasets
| Call | Description |
|---|---|
| load_boston([return_X_y]) | Load and return the boston house-prices dataset (regression). Deprecated in scikit-learn 1.0 and removed in 1.2. |
| load_iris([return_X_y]) | Load and return the iris dataset (classification). |
| load_diabetes([return_X_y]) | Load and return the diabetes dataset (regression). |
| load_digits([n_class, return_X_y]) | Load and return the digits dataset (classification). |
| load_linnerud([return_X_y]) | Load and return the linnerud dataset (multivariate regression). |
| load_wine([return_X_y]) | Load and return the wine dataset (classification). |
| load_breast_cancer([return_X_y]) | Load and return the breast cancer wisconsin dataset (classification). |
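
    # Note: load_boston was removed in scikit-learn 1.2; this call needs an older release.
    from sklearn.datasets import load_boston
    import pandas as pd

    # return_X_y=True returns the (data, target) arrays instead of a Bunch object.
    data, target = load_boston(return_X_y=True)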
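
As a rough sketch of what a built-in loader returns on a current scikit-learn install, the example below uses `load_iris`, which ships with the package, so nothing is downloaded. It assumes pandas is installed and that `as_frame=True` is available (scikit-learn 0.23 or later); the variable names are illustrative only.

```python
from sklearn.datasets import load_iris

# Default call: a Bunch (dict-like object) holding arrays and metadata.
iris = load_iris()
print(iris.data.shape)          # feature matrix
print(iris.target.shape)        # label vector
print(list(iris.target_names))  # class names

# return_X_y=True skips the Bunch and returns just (data, target).
X, y = load_iris(return_X_y=True)

# as_frame=True (assumed: scikit-learn >= 0.23) returns pandas objects instead.
X_df, y_ser = load_iris(return_X_y=True, as_frame=True)
print(X_df.head())
```

The other `load_*` functions in the table follow the same calling pattern.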
II. Real-world datasets
These datasets are larger and have to be downloaded from the network.
| Call | Description |
|---|---|
| fetch_olivetti_faces([data_home, shuffle, …]) | Load the Olivetti faces data-set from AT&T (classification). |
| fetch_20newsgroups([data_home, subset, …]) | Load the filenames and data from the 20 newsgroups dataset (classification). |
| fetch_20newsgroups_vectorized([subset, …]) | Load the 20 newsgroups dataset and vectorize it into token counts (classification). |
| fetch_lfw_people([data_home, funneled, …]) | Load the Labeled Faces in the Wild (LFW) people dataset (classification). |
| fetch_lfw_pairs([subset, data_home, …]) | Load the Labeled Faces in the Wild (LFW) pairs dataset (classification). |
| fetch_covtype([data_home, …]) | Load the covertype dataset (classification). |
| fetch_rcv1([data_home, subset, …]) | Load the RCV1 multilabel dataset (classification). |
| fetch_kddcup99([subset, data_home, shuffle, …]) | Load the kddcup99 dataset (classification). |
| fetch_california_housing([data_home, …]) | Load the California housing dataset (regression). |
How to call these datasets
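
A minimal sketch of calling one of the `fetch_*` functions, using `fetch_california_housing` from the table above. The helper name `load_housing` is hypothetical and exists only for illustration; the download is wrapped in a function so that nothing touches the network until it is called.

```python
from sklearn.datasets import fetch_california_housing


def load_housing(data_home=None):
    """Return (X, y) for the California housing data.

    Hypothetical helper for illustration. The first call downloads the
    files; later calls read the local cache. data_home=None means
    scikit-learn's default cache directory is used.
    """
    bunch = fetch_california_housing(data_home=data_home)
    return bunch.data, bunch.target
```

Calling `X, y = load_housing()` triggers the download on first use; passing a directory as `data_home` puts the cache in an explicit location.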
III. Simulated datasets
scikit-learn includes many random generator functions that produce simulated datasets.

    from sklearn.datasets import (make_blobs, make_gaussian_quantiles,
                                  make_hastie_10_2, make_multilabel_classification)

    x, y = make_blobs(n_samples=100, n_features=2)
    x, y = make_gaussian_quantiles(n_samples=100, n_features=2, n_classes=3)
    x, y = make_hastie_10_2(n_samples=12000)
    x, y = make_multilabel_classification(n_samples=100, n_features=20, n_classes=5, n_labels=2)

| Call | Description |
|---|---|
| make_biclusters(shape, n_clusters[, noise, …]) | Generate an array with constant block diagonal structure for biclustering. |
| make_checkerboard(shape, n_clusters[, …]) | Generate an array with block checkerboard structure for biclustering. |
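
The two biclustering generators return three values rather than an `(x, y)` pair, which is easy to miss. The sketch below shows the shapes involved; the matrix size, noise level and `random_state` are arbitrary choices made for illustration.

```python
from sklearn.datasets import make_biclusters, make_checkerboard

# 30 rows x 20 columns with 3 planted biclusters; random_state fixes the draw.
X, rows, cols = make_biclusters(shape=(30, 20), n_clusters=3,
                                noise=0.5, random_state=0)
print(X.shape)     # the data matrix
print(rows.shape)  # one boolean row-membership mask per bicluster
print(cols.shape)  # one boolean column-membership mask per bicluster

# Checkerboard variant: n_clusters may be a (row_clusters, col_clusters) pair.
Xc, rows_c, cols_c = make_checkerboard(shape=(30, 20), n_clusters=(3, 2),
                                       random_state=0)
print(Xc.shape, rows_c.shape, cols_c.shape)
```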
IV. Manifold learning generators
| Call | Description |
|---|---|
| make_s_curve([n_samples, noise, random_state]) | Generate an S curve dataset. |
| make_swiss_roll([n_samples, noise, random_state]) | Generate a swiss roll dataset. |
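
A short sketch of both manifold generators. Each returns the 3-D points together with the position of every sample along the main dimension of the manifold; the sample count and noise level here are arbitrary.

```python
from sklearn.datasets import make_s_curve, make_swiss_roll

# Each call returns (points, t): points are 3-D coordinates, t is the
# univariate position of each sample along the manifold.
X_s, t_s = make_s_curve(n_samples=500, noise=0.05, random_state=0)
X_r, t_r = make_swiss_roll(n_samples=500, noise=0.05, random_state=0)

print(X_s.shape, t_s.shape)
print(X_r.shape, t_r.shape)
```

The `t` values are convenient for coloring a 3-D scatter plot so that the unrolled structure is visible.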
Many more datasets are available; see:
https://sklearn.apachecn.org/docs/master/47.html
V. Sample images
| Call | Description |
|---|---|
| load_sample_images() | Load sample images for image manipulation. |
| load_sample_image(image_name) | Load the numpy array of a single sample image. |
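
A minimal sketch of both calls, assuming the Pillow package is installed (scikit-learn relies on it to decode the bundled JPEG files). `"china.jpg"` is one of the two images shipped with the library.

```python
from sklearn.datasets import load_sample_image, load_sample_images

# A single image comes back as a (height, width, channels) uint8 array.
china = load_sample_image("china.jpg")
print(china.shape, china.dtype)

# load_sample_images() returns a Bunch with every bundled image.
bundle = load_sample_images()
print(len(bundle.images))
print(bundle.filenames)
```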
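VI. Other datasets
For openml.org, an open-source machine learning website, scikit-learn provides a function, `fetch_openml`, for downloading its datasets.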
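
A minimal sketch of downloading from openml.org with `fetch_openml`. The helper name `load_openml_iris` and the choice of the `"iris"` dataset are assumptions made for illustration; the call is wrapped in a function so that nothing is downloaded until it is invoked.

```python
from sklearn.datasets import fetch_openml


def load_openml_iris(data_home=None):
    """Fetch the "iris" dataset (version 1) from openml.org as (X, y).

    Hypothetical helper for illustration; requires network access on
    the first call, after which the files are read from the local cache.
    """
    # Pinning the version keeps the result stable if newer uploads appear.
    return fetch_openml(name="iris", version=1, data_home=data_home,
                        return_X_y=True, as_frame=False)
```

A dataset can also be selected by its numeric OpenML id through the `data_id` parameter instead of `name` and `version`.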
