# 1. Introduction and installation

## (1) Overview

- Scikit-learn is an open source machine learning library that supports supervised and unsupervised learning. It also provides various tools for model fitting, data preprocessing, model selection, model evaluation, and many other utilities.
- These notes assume a very basic working knowledge of machine learning practices (model fitting, predicting, cross-validation, etc.).

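## (2) Installation

```shell
# jy: create and activate a python-3.9.0 virtual environment
#     (skip this step if a usable virtual environment already exists)
conda create --name jy-env_py39 python==3.9.0
conda activate jy-env_py39

# jy: -U / --upgrade upgrades the package to the latest version
pip install -U scikit-learn

# jy: show the installed package version
pip show scikit-learn

# jy: list the packages installed in the virtual environment
pip freeze

# jy: show the versions of sklearn and its dependencies
python -c "import sklearn; sklearn.show_versions()"
```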

<a name="PdC5Z"></a>
## (3) Dependencies
- Scikit-learn plotting capabilities (i.e., functions that start with `plot_` and classes that end with `Display`) require Matplotlib. The examples require Matplotlib, and some examples also require scikit-image, pandas, or seaborn. The minimum versions of Scikit-learn's dependencies are listed below, along with their purposes.
![image.png](https://cdn.nlark.com/yuque/0/2022/png/25833371/1650184635103-995af47f-ffac-4fcb-9af9-6739b1e7a79a.png#clientId=u32d36f31-a850-4&crop=0&crop=0&crop=1&crop=1&from=paste&height=487&id=uc73e9c36&margin=%5Bobject%20Object%5D&name=image.png&originHeight=973&originWidth=1145&originalType=binary&ratio=1&rotation=0&showTitle=false&size=146314&status=done&style=none&taskId=u971d22a0-513f-4146-9c8a-701d8b8b02d&title=&width=572.5)
- **Warning**
  - Scikit-learn 0.20 was the last version to support Python 2.7 and Python 3.4.
  - Scikit-learn 0.21 supported Python 3.5-3.7.
  - Scikit-learn 0.22 supported Python 3.5-3.8.
  - Scikit-learn 0.23 - 0.24 require Python 3.6 or newer.
  - Scikit-learn 1.0 and later require Python 3.7 or newer.
- **Note:** For installing on PyPy, PyPy3-v5.10+, NumPy 1.14.0+, and SciPy 1.1.0+ are required.
<a name="reA4C"></a>
# 2. Usage examples
- This section gives an overview of some of the main features of the library, including estimator fitting and predicting, pre-processing steps, pipelines, cross-validation tools, and automatic hyper-parameter searches.
<a name="FQs67"></a>
## (1) Fitting and predicting: estimator basics
- Scikit-learn provides dozens of built-in machine learning algorithms and models, called `estimators`. Each estimator can be fitted to some data using its `fit` method.
  - `estimators`: [https://scikit-learn.org/stable/glossary.html#term-estimators](https://scikit-learn.org/stable/glossary.html#term-estimators)
  - `fit`: [https://scikit-learn.org/stable/glossary.html#term-fit](https://scikit-learn.org/stable/glossary.html#term-fit)
- The `fit` method generally accepts 2 inputs:
  - The samples matrix (or design matrix) `X`. The size of `X` is typically `(n_samples, n_features)`, which means that samples are represented as rows and features are represented as columns.
  - The target values `y`, which are real numbers for regression tasks, or integers for classification (or any other discrete set of values). For unsupervised learning tasks, `y` does not need to be specified. `y` is usually a 1d array where the `i-th` entry corresponds to the target of the `i-th` sample (row) of `X`.
- Once the estimator is fitted, it can be used for predicting target values of new data. You don't need to re-train the estimator.
```python
from sklearn.ensemble import RandomForestClassifier

clf = RandomForestClassifier(random_state=0)
# jy: 2 samples, 3 features
X = [[1, 2, 3],
     [11, 12, 13]]
# jy: classes of each sample
y = [0, 1]
clf.fit(X, y)
# jy: predict classes of the training data
res_1 = clf.predict(X)
# jy: [0 1] (a NumPy array, so it prints without commas)
print(res_1)
# jy: predict classes of new data
res_2 = clf.predict([[4, 5, 6], [14, 15, 16]])
# jy: [0 1]
print(res_2)
```
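To complement the supervised example above, here is a minimal sketch of the unsupervised case, where `fit` receives only `X` and no `y`. The toy data and the choice of `KMeans` are illustrative assumptions, not part of the original notes.

```python
import numpy as np
from sklearn.cluster import KMeans

# Assumed toy data: 4 samples (rows), 2 features (columns)
X = np.array([[1.0, 2.0],
              [1.5, 1.8],
              [8.0, 8.0],
              [9.0, 9.5]])
print(X.shape)  # (n_samples, n_features) -> (4, 2)

# Unsupervised estimator: fit takes X only, there is no y
km = KMeans(n_clusters=2, n_init=10, random_state=0)
km.fit(X)

# One cluster label per sample; the label numbering itself is arbitrary
labels = km.labels_
print(labels.shape)  # (4,)
```

The two points near the origin should end up in one cluster and the two far points in the other; which cluster is called 0 or 1 is not meaningful.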

## (2) Transformers and pre-processors

- Machine learning workflows are often composed of different parts. A typical pipeline consists of a pre-processing step that transforms or imputes the data, and a final predictor that predicts target values.
- In scikit-learn, pre-processors and transformers follow the same API as the estimator objects (they all inherit from the same `BaseEstimator` class). The transformer objects don't have a `predict` method but rather a `transform` method that outputs a newly transformed sample matrix `X`.
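
The `fit` / `transform` API described above can be sketched with `StandardScaler`, which standardizes each column to zero mean and unit variance. The toy matrix is an assumption chosen so the result is easy to verify by hand.

```python
from sklearn.preprocessing import StandardScaler

# Assumed toy data: 2 samples, 2 features
X = [[0, 15],
     [1, -10]]

scaler = StandardScaler()
# fit learns the per-column mean and standard deviation;
# transform applies (x - mean) / std to each column
X_scaled = scaler.fit(X).transform(X)
print(X_scaled)
# [[-1.  1.]
#  [ 1. -1.]]
```

Column 0 has mean 0.5 and standard deviation 0.5, and column 1 has mean 2.5 and standard deviation 12.5, which gives the values shown. Note that `scaler` has `fit` and `transform` but no `predict`.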

- Sometimes, you want to apply different transformations to different features: the `ColumnTransformer` is designed for these use-cases.
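
As a minimal sketch of per-feature transformations, the snippet below scales one column and one-hot encodes another with `ColumnTransformer`. The toy data, the step names `"scale"` and `"onehot"`, and the choice of transformers are illustrative assumptions.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Assumed toy data: column 0 is numeric, column 1 is a category code
X = np.array([[0.0, 0],
              [1.0, 1],
              [2.0, 0],
              [3.0, 1]])

ct = ColumnTransformer([
    ("scale", StandardScaler(), [0]),   # standardize column 0
    ("onehot", OneHotEncoder(), [1]),   # one-hot encode column 1
])
X_t = ct.fit_transform(X)
print(X_t.shape)  # 1 scaled column + 2 one-hot columns -> (4, 3)
```

Each entry is a `(name, transformer, columns)` tuple; the outputs of the individual transformers are concatenated column-wise.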

## (4) Model evaluation
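
A minimal sketch of model evaluation with cross-validation, using `cross_validate`. The synthetic dataset from `make_regression`, the `LinearRegression` estimator, and the 5-fold setting are assumptions made for illustration, not taken from these notes.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_validate

# Assumed synthetic regression data (noise-free by default)
X, y = make_regression(n_samples=1000, random_state=0)
lr = LinearRegression()

# 5-fold cross-validation; the default score for a regressor is R^2
result = cross_validate(lr, X, y, cv=5)
print(result["test_score"])  # one R^2 score per fold
```

Because the synthetic target is an exact linear function of the features, the scores should be very close to 1 here; real data will usually score lower.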

## (5) Automatic parameter searches
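
A minimal sketch of an automatic hyper-parameter search with `RandomizedSearchCV`. The synthetic data, the `RandomForestRegressor` estimator, and the parameter ranges are all illustrative assumptions, kept small so the search runs quickly.

```python
from scipy.stats import randint
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RandomizedSearchCV, train_test_split

# Assumed synthetic data; sizes kept small so the search runs quickly
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Parameter space to sample from (ranges are illustrative assumptions)
param_distributions = {
    "n_estimators": randint(5, 20),
    "max_depth": randint(2, 8),
}

search = RandomizedSearchCV(
    estimator=RandomForestRegressor(random_state=0),
    param_distributions=param_distributions,
    n_iter=5,          # number of sampled parameter combinations
    cv=3,              # 3-fold cross-validation per combination
    random_state=0,
)
search.fit(X_train, y_train)

print(search.best_params_)           # best sampled combination
print(search.score(X_test, y_test))  # R^2 of the refitted best model
```

After fitting, `search` behaves like a regular estimator refitted with the best parameters found, so it can be scored or used for prediction directly.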

# 3. Reference links