https://github.com/fengdu78/lihang-code">

lihang-code

https://github.com/fengdu78/lihang-code

feature

https://feature-engine.readthedocs.io/en/1.1.x/index.html

https://towardsdatascience.com/what-is-feature-engineering-importance-tools-and-techniques-for-machine-learning-2080b0269f10
LazyPredict
AutoFeat

GEKKO

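GEKKO is a Python package for machine learning and for optimization of mixed-integer and differential algebraic equations. It is coupled with large-scale solvers for linear, quadratic, nonlinear, and mixed-integer programming (LP, QP, NLP, MILP, MINLP). Modes of operation include parameter regression, data reconciliation, real-time optimization, dynamic simulation, and nonlinear predictive control. GEKKO is an object-oriented Python library that facilitates local execution of APMonitor.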

EDA

bamboolib

https://docs.bamboolib.8080labs.com/documentation/how-tos/installation-and-setup/install-bamboolib

statsmodels

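statsmodels is a Python module that provides classes and functions for estimating many different statistical models, as well as for conducting statistical tests and exploring statistical data. An extensive list of result statistics is available for each estimator. The results are tested against existing statistical packages to ensure that they are correct.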

pps

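PPS (Predictive Power Score) is an asymmetric, data-type-agnostic score that can detect linear or non-linear relationships between two columns. The score ranges from 0 (no predictive power) to 1 (perfect predictive power). It can be used as an alternative to the correlation (matrix).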
https://github.com/8080labs/ppscore#api
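The asymmetry is easiest to see in a deliberately simplified, stdlib-only sketch. The sketch roughly follows the regression case described in the ppscore README: the score is 1 minus the ratio of the model's mean absolute error to that of a naive "always predict the median" baseline, floored at 0. However, it swaps ppscore's cross-validated decision tree for a hypothetical `lookup_predictor` that predicts the median target for each distinct feature value, and it skips cross-validation. Treat it as an illustration only, not a reimplementation.

```python
from statistics import median


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between targets and predictions."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def lookup_predictor(x, y):
    """Hypothetical stand-in model: predict the median y seen for each distinct x.

    ppscore itself uses a cross-validated decision tree; this lookup table has no
    cross-validation, so it overfits features with many unique values.
    """
    groups = {}
    for xi, yi in zip(x, y):
        groups.setdefault(xi, []).append(yi)
    table = {key: median(values) for key, values in groups.items()}
    return [table[xi] for xi in x]


def simple_pps(x, y):
    """Simplified predictive power score of feature x for a numeric target y."""
    baseline = median(y)  # naive model: always predict the target median
    naive_mae = mean_absolute_error(y, [baseline] * len(y))
    if naive_mae == 0:
        return 0.0  # constant target: nothing to predict
    model_mae = mean_absolute_error(y, lookup_predictor(x, y))
    return max(0.0, 1 - model_mae / naive_mae)  # negative scores are floored at 0


# y = x**2 on a symmetric range: x determines y, but y cannot recover the sign of x
x = [-3, -2, -1, 0, 1, 2, 3]
y = [v * v for v in x]
print(simple_pps(x, y))  # 1.0  (x -> y)
print(simple_pps(y, x))  # 0.0  (y -> x)
```

The Pearson correlation between these x and y is exactly 0, so a correlation matrix would report no relationship in either direction. A predictive score instead separates the two directions.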

data_dashboard

A dashboard for exploring data and creating baseline machine learning models.
https://github.com/maciek3000/data_dashboard

Snorkel

https://github.com/snorkel-team/snorkel
A system for quickly generating training data with weak supervision.
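The core idea of weak supervision can be sketched without Snorkel. You write several noisy labeling functions, apply them to unlabeled examples, and combine their votes. The sketch below combines votes with a plain majority vote, the simple baseline that Snorkel also provides (`MajorityLabelVoter`). Snorkel's `LabelModel` instead learns how accurate each labeling function is. The spam/ham labeling functions and label constants are hypothetical.

```python
from collections import Counter

ABSTAIN, HAM, SPAM = -1, 0, 1  # Snorkel also uses -1 to mean "abstain"


# Hypothetical labeling functions: each returns a label or ABSTAIN.
def lf_contains_link(text):
    return SPAM if "http" in text else ABSTAIN


def lf_mentions_prize(text):
    return SPAM if "prize" in text.lower() else ABSTAIN


def lf_short_reply(text):
    return HAM if len(text.split()) <= 3 else ABSTAIN


LABELING_FUNCTIONS = [lf_contains_link, lf_mentions_prize, lf_short_reply]


def majority_vote(text, lfs=LABELING_FUNCTIONS):
    """Combine the non-abstaining votes; ties and all-abstain cases stay ABSTAIN."""
    votes = [lf(text) for lf in lfs]
    counts = Counter(v for v in votes if v != ABSTAIN).most_common()
    if not counts or (len(counts) > 1 and counts[0][1] == counts[1][1]):
        return ABSTAIN
    return counts[0][0]


unlabeled = [
    "Claim your prize at http://example.com",
    "thanks, see you",
    "Meeting moved to 3pm tomorrow, please confirm",
]
print([majority_vote(t) for t in unlabeled])  # [1, 0, -1]
```

Examples that end up as ABSTAIN are usually dropped before training the downstream model. This is one reason coverage of the labeling functions matters as much as their accuracy.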

time

sktime

A machine learning framework for time series.
https://github.com/alan-turing-institute/sktime

tslearn

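tslearn is a Python package that provides machine learning tools for the analysis of time series. The package builds on (and hence depends on) the scikit-learn, numpy and scipy libraries.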
https://tslearn.readthedocs.io/en/stable/index.html#
PyFlux Guide – Python Library For Time Series Analysis And Prediction

statsmodels

pyflux

https://pyflux.readthedocs.io/en/latest/
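PyFlux is a library for time series analysis and prediction. Users can choose from a flexible range of modelling and inference options, and use the output for forecasting and retrospection. Users can build a full probabilistic model in which the data y and the latent variables (parameters) z are treated as random variables through a joint probability p(y, z). The advantage of a probabilistic approach is that it gives a more complete picture of uncertainty, which is important for time series tasks such as forecasting. Alternatively, when speed matters, users can simply use maximum likelihood estimation within the same unified API.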
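To make the maximum-likelihood option concrete, here is a minimal sketch (not PyFlux code) for a zero-mean Gaussian AR(1) model, y_t = phi * y_{t-1} + e_t with e_t ~ N(0, sigma2). Conditioning on the first observation turns the Gaussian log-likelihood into a least-squares problem. It is maximized by phi = sum(y_t * y_{t-1}) / sum(y_{t-1}^2), and the noise variance estimate is the mean squared residual. The full probabilistic route would instead put a prior on phi and report a posterior distribution rather than a point estimate.

```python
def ar1_conditional_mle(series):
    """Conditional MLE of (phi, sigma2) for y_t = phi * y_{t-1} + e_t, e_t ~ N(0, sigma2).

    Assumes a zero-mean series and conditions on the first value y_0.
    """
    lagged, current = series[:-1], series[1:]
    phi = sum(c * l for c, l in zip(current, lagged)) / sum(l * l for l in lagged)
    residuals = [c - phi * l for c, l in zip(current, lagged)]
    sigma2 = sum(r * r for r in residuals) / len(residuals)  # MLE divides by n, not n - 1
    return phi, sigma2


# A noise-free series that halves at every step
phi, sigma2 = ar1_conditional_mle([8.0, 4.0, 2.0, 1.0, 0.5])
print(phi, sigma2)  # 0.5 0.0
```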

blog

https://machinelearningmastery.com/time-series-forecasting-methods-in-python-cheat-sheet/
https://www.analyticsvidhya.com/blog/2016/02/time-series-forecasting-codes-python/
https://www.kaggle.com/kashnitsky/topic-9-part-1-time-series-analysis-in-python
https://www.machinelearningplus.com/time-series/arima-model-time-series-forecasting-python/

https://hackernoon.com/best-libraries-that-will-assist-you-in-eda-2021-edition-2l1734hl
https://www.kdnuggets.com/2021/03/11-essential-code-blocks-exploratory-data-analysis.html
https://cloud.google.com/blog/products/ai-machine-learning/building-ml-models-with-eda-feature-selection
https://www.analyticsvidhya.com/blog/2021/04/rapid-fire-eda-process-using-python-for-ml-implementation/
https://www.datacamp.com/community/tutorials/kaggle-machine-learning-eda
https://towardsdatascience.com/data-dashboard-python-package-for-eda-and-baseline-ml-model-creation-f10840a6cea0
https://www.mygreatlearning.com/blog/understanding-eda-in-python/

ML

thundergbm

https://github.com/Xtra-Computing/thundergbm

xgboost

https://github.com/dmlc/xgboost

catboost

https://catboost.ai/

lightgbm

https://lightgbm.readthedocs.io/en/latest/Advanced-Topics.html

scikit-learn

https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.ElasticNet.html#sklearn.linear_model.ElasticNet

Apache Spark MLlib

https://help.aliyun.com/document_detail/173152.html

ngboost

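[https://github.com/stanfordmlgroup/ngboost](https://github.com/stanfordmlgroup/ngboost)

ngboost is a Python library that implements Natural Gradient Boosting, as described in ["NGBoost: Natural Gradient Boosting for Probabilistic Prediction"](https://stanfordmlgroup.github.io/projects/ngboost/). It is built on top of [Scikit-Learn](https://scikit-learn.org/stable/) and is designed to be scalable and modular with respect to the choice of proper scoring rule, distribution, and base learner. A didactic introduction to the methodology underlying NGBoost is available in this [slide deck](https://drive.google.com/file/d/183BWFAdFms81MKy6hSku8qI97OwS_JH_/view?usp=sharing).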
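The "natural gradient" part can be worked through for a Normal distribution parameterized by (mu, log sigma), the setup used for regression in the NGBoost paper. With the negative log-likelihood as the scoring rule, the ordinary gradient is (-(y - mu)/sigma^2, 1 - (y - mu)^2/sigma^2). The Fisher information is diag(1/sigma^2, 2), so the natural gradient is (-(y - mu), (1 - (y - mu)^2/sigma^2) / 2). The loop below is a deliberate simplification. Real NGBoost fits a base learner (by default a shallow regression tree) to the per-example natural gradients at each stage. Here the "base learner" is just the average natural gradient, which reduces the procedure to fitting one marginal Normal.

```python
import math


def normal_natural_gradient(y, mu, log_sigma):
    """Natural gradient of the Normal NLL w.r.t. (mu, log_sigma) for one observation."""
    sigma2 = math.exp(2 * log_sigma)
    z2 = (y - mu) ** 2 / sigma2
    # Ordinary gradient (-(y - mu) / sigma2, 1 - z2) premultiplied by the
    # inverse Fisher information diag(sigma2, 1/2).
    return -(y - mu), (1 - z2) / 2


def fit_normal_natural_gradient(data, learning_rate=0.5, n_steps=300):
    """Simplified NGBoost-style loop: the 'base learner' is the mean natural gradient."""
    mu, log_sigma = 0.0, 0.0
    for _ in range(n_steps):
        grads = [normal_natural_gradient(y, mu, log_sigma) for y in data]
        mean_g_mu = sum(g[0] for g in grads) / len(grads)
        mean_g_log_sigma = sum(g[1] for g in grads) / len(grads)
        mu -= learning_rate * mean_g_mu
        log_sigma -= learning_rate * mean_g_log_sigma
    return mu, math.exp(log_sigma)


mu, sigma = fit_normal_natural_gradient([1.0, 2.0, 3.0, 4.0, 5.0])
print(round(mu, 6), round(sigma, 6))  # 3.0 1.414214 (sample mean, population std)
```

Premultiplying by the inverse Fisher information makes the step size independent of how the distribution is parameterized. This is the motivation NGBoost gives for using natural rather than ordinary gradients.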

decision tree

https://github.com/serengil/chefboost
https://github.com/benedekrozemberczki/awesome-decision-tree-papers

bayesian

https://github.com/krasserm/bayesian-machine-learning Bayesian machine learning notebooks

AutoML
FLAML
https://github.com/microsoft/FLAML
A fast library for AutoML and tuning.

autopytorch/autokeras

https://github.com/keras-team/autokeras

auto-sklearn

https://automl.github.io/auto-sklearn/master/
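auto-sklearn is an automated machine learning toolkit and a drop-in replacement for a scikit-learn estimator:
auto-sklearn frees a machine learning user from algorithm selection and hyperparameter tuning. It leverages recent advances in Bayesian optimization, meta-learning, and ensemble construction. See the authors' NIPS 2015 paper to learn more about the technology behind auto-sklearn.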
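The ensemble-construction step in auto-sklearn is based on ensemble selection (Caruana et al.). Starting from an empty ensemble, it greedily adds, with replacement, whichever candidate model most reduces the ensemble's validation loss. A minimal sketch under simple assumptions: squared error, equal-weight averaging of each member's validation predictions, and hypothetical model names and numbers. Models picked more than once effectively get larger weights.

```python
def mse(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def ensemble_selection(candidates, y_val, n_rounds=5):
    """Greedy forward selection with replacement over validation predictions.

    candidates: dict mapping model name -> list of validation predictions.
    Returns the chosen names in order; repeated names act as weights.
    """
    chosen = []
    running_sum = [0.0] * len(y_val)
    for _ in range(n_rounds):
        best_name, best_loss = None, float("inf")
        for name, preds in candidates.items():
            # Ensemble prediction if this model were added one more time
            trial = [(s + p) / (len(chosen) + 1) for s, p in zip(running_sum, preds)]
            loss = mse(y_val, trial)
            if loss < best_loss:
                best_name, best_loss = name, loss
        chosen.append(best_name)
        running_sum = [s + p for s, p in zip(running_sum, candidates[best_name])]
    return chosen


# Hypothetical validation predictions: model_a overshoots by 1, model_b undershoots by 2,
# so two parts model_a to one part model_b cancel exactly.
y_val = [0.0, 0.0]
candidates = {
    "model_a": [1.0, 1.0],
    "model_b": [-2.0, -2.0],
    "model_c": [2.0, -2.0],
}
print(ensemble_selection(candidates, y_val, n_rounds=3))  # ['model_a', 'model_b', 'model_a']
```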

PyCaret

An open-source, low-code machine learning library in Python.
https://github.com/pycaret/pycaret

LightAutoML

LAMA: an automatic model creation framework
https://github.com/sberbank-ai-lab/LightAutoML

modAL

A modular active learning framework for Python
https://github.com/modAL-python/modAL
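An active learning loop alternates between two steps: train on the labeled pool, then ask an oracle to label the unlabeled examples the model finds most informative. modAL's `ActiveLearner` defaults to uncertainty sampling for that query step. The stdlib sketch below shows only that step, as least-confidence selection over hypothetical predicted class probabilities. It illustrates the idea and is not modAL's API.

```python
def least_confidence_query(probabilities, n_instances=1):
    """Return indices of the n_instances pool examples the model is least sure about.

    probabilities: per-example class-probability lists (e.g. from predict_proba).
    Uncertainty is 1 - max class probability ("least confidence").
    """
    uncertainty = [1 - max(p) for p in probabilities]
    ranked = sorted(range(len(probabilities)), key=lambda i: uncertainty[i], reverse=True)
    return ranked[:n_instances]


# Hypothetical predict_proba output for four unlabeled examples (two classes)
pool_proba = [
    [0.95, 0.05],
    [0.55, 0.45],
    [0.20, 0.80],
    [0.50, 0.50],
]
print(least_confidence_query(pool_proba, n_instances=2))  # [3, 1]
```

After the oracle labels the returned examples, they move from the pool to the training set and the model is refit. The loop then repeats until the labeling budget runs out.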

MLBox

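[https://zhuanlan.zhihu.com/p/85832939](https://zhuanlan.zhihu.com/p/85832939)

**MLBox is a powerful automated machine learning Python library.** It provides the following features:

- Fast reading and distributed data preprocessing/cleaning/formatting.
- Highly robust feature selection and leak detection.
- Accurate hyperparameter optimization in high-dimensional space.
- State-of-the-art predictive models for classification and regression (Deep Learning, Stacking, LightGBM, ...).
- Prediction with model interpretation.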

TPOT
https://zhuanlan.zhihu.com/p/55076846
https://epistasislab.github.io/tpot/
A powerful tool for automated machine learning
https://github.com/EpistasisLab/tpot

auto_ml
https://github.com/ClimbsRocks/auto_ml
Automated machine learning for production and analytics

H2O (not free)

blog
https://www.automl.org/automl-blog/

nni

https://github.com/microsoft/nni

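NNI (Neural Network Intelligence) is a lightweight but powerful toolkit that helps users automate feature engineering, neural architecture search, hyperparameter tuning, and model compression.
The tool manages automated machine learning (AutoML) experiments. It dispatches and runs the trial jobs generated by tuning algorithms to search for the best neural architecture and/or hyperparameters in different training environments: local machines, remote servers, OpenPAI, Kubeflow, FrameworkController on K8S (AKS etc.), DLWorkspace (aka DLTS), AML (Azure Machine Learning), AdaptDL (aka ADL), other cloud options, and even hybrid mode.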

Vowpal Wabbit

Vowpal Wabbit https://github.com/VowpalWabbit/vowpal_wabbit

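Vowpal Wabbit is a machine learning system that pushes the frontier of machine learning with techniques such as online learning, hashing, allreduce, reductions, learning2search, active learning, and interactive learning. There is a specific focus on reinforcement learning: several contextual bandit algorithms are implemented, and the online nature of the system suits the problem well. Vowpal Wabbit is a destination for implementing and maturing state-of-the-art algorithms with performance in mind.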

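- Input format. The input format for the learning algorithm is substantially more flexible than might be expected. Examples can have features consisting of free-form text, which is interpreted in a bag-of-words way. There can even be multiple sets of free-form text in different namespaces.
- Speed. The learning algorithm is fast, similar to the few other online algorithm implementations out there. Several optimization algorithms are available, the baseline being sparse gradient descent (GD) on a loss function.
- Scalability. This is not the same as speed. The important characteristic here is that the program's memory footprint is bounded independently of the data. This means the training set is not loaded into main memory before learning starts. In addition, the hashing trick bounds the size of the feature set independently of the amount of training data.
- Feature interaction. Subsets of features can be internally paired so that the algorithm is linear in the cross-product of the subsets. This is useful for ranking problems. The alternative, explicitly expanding the features before feeding them into the learning algorithm, can be both computation- and space-intensive, depending on how it is handled.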
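Several of these ideas fit in a short stdlib sketch: namespaced text features, the hashing trick with a fixed-size weight table, crossed namespaces, and sparse online gradient descent. The sketch parses a simplified subset of VW's text format (`label |namespace word word:value ...`, assuming every namespace is named and ignoring importance weights and tags). It hashes every feature into a fixed-size table, optionally crosses two namespaces (similar in spirit to VW's `-q` option), and takes one squared-loss gradient step per example. The hash function (CRC32), namespace names, data, and learning rate are illustrative assumptions rather than VW internals (VW uses murmurhash).

```python
import zlib

NUM_BITS = 18  # table of 2**18 weights; 18 bits is also VW's default -b setting
MASK = (1 << NUM_BITS) - 1


def parse_line(line):
    """Parse a simplified VW-style line: 'label |ns feat feat:value |ns2 ...'."""
    label_part, *namespace_parts = line.split("|")
    label = float(label_part.strip())
    namespaces = {}
    for part in namespace_parts:
        tokens = part.split()
        if not tokens:
            continue
        name, feature_tokens = tokens[0], tokens[1:]
        features = namespaces.setdefault(name, {})
        for token in feature_tokens:
            feature, _, value = token.partition(":")
            features[feature] = features.get(feature, 0.0) + (float(value) if value else 1.0)
    return label, namespaces


def slot(namespace, feature):
    """Hashing trick: map a (namespace, feature) pair into the fixed-size table."""
    return zlib.crc32(f"{namespace}^{feature}".encode("utf-8")) & MASK


def featurize(namespaces, cross=None):
    """Sparse hashed features, plus crossed features for an optional (ns_a, ns_b) pair."""
    x = {}
    for ns, features in namespaces.items():
        for feature, value in features.items():
            i = slot(ns, feature)
            x[i] = x.get(i, 0.0) + value
    if cross is not None:
        ns_a, ns_b = cross
        for fa, va in namespaces.get(ns_a, {}).items():
            for fb, vb in namespaces.get(ns_b, {}).items():
                i = slot(ns_a + "*" + ns_b, fa + "*" + fb)
                x[i] = x.get(i, 0.0) + va * vb
    return x


def predict(weights, x):
    return sum(weights[i] * v for i, v in x.items())


def sgd_step(weights, x, label, learning_rate=0.1):
    """One online squared-loss gradient step; only the active weights are touched."""
    error = predict(weights, x) - label
    for i, v in x.items():
        weights[i] -= learning_rate * error * v


weights = [0.0] * (MASK + 1)  # memory is fixed up front, independent of the data size
stream = ["1 |user age_young |item sports", "0 |user age_old |item sports"]
for _ in range(50):  # several passes over a tiny stream, one example at a time
    for line in stream:
        label, namespaces = parse_line(line)
        sgd_step(weights, featurize(namespaces, cross=("user", "item")), label)

for line in stream:
    label, namespaces = parse_line(line)
    print(label, predict(weights, featurize(namespaces, cross=("user", "item"))))
```

Two unrelated features can land in the same slot. The hashing trick accepts these occasional collisions in exchange for bounded memory, and raising `NUM_BITS` makes them rarer.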
Tree ensemble example (XGBoost/LightGBM/CatBoost/scikit-learn/pyspark models)
https://github.com/slundberg/shap
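While SHAP can explain the output of any machine learning model, the SHAP authors have developed a high-speed exact algorithm for tree ensemble methods (see their Nature MI paper). Fast C++ implementations are supported for XGBoost, LightGBM, CatBoost, scikit-learn and pyspark tree models.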
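SHAP values are Shapley values of a "game" whose players are the input features. For a handful of features they can be computed exactly by brute-force enumeration of all feature coalitions. That enumeration is exponential in the number of features, which is why a polynomial-time exact algorithm for trees matters. As a simplifying assumption, the sketch below fills in "absent" features from a single background point (an interventional-style baseline). The model `f` and the numbers are hypothetical. The sketch also checks the additivity property: the attributions sum to f(x) - f(baseline).

```python
from itertools import combinations
from math import factorial


def shapley_values(f, x, baseline):
    """Exact Shapley values of f at x, using `baseline` values for absent features."""
    n = len(x)

    def value(coalition):
        # Features in the coalition take x's values; the rest come from the baseline
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return f(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def f(z):
    # Hypothetical model with an interaction between features 0 and 1
    return 2 * z[0] + 3 * z[1] + z[0] * z[1] - z[2]


x, baseline = [1.0, 2.0, 3.0], [0.0, 0.0, 0.0]
phi = shapley_values(f, x, baseline)
print([round(p, 10) for p in phi])             # [3.0, 7.0, -3.0]
print(round(sum(phi), 10), f(x) - f(baseline))  # 7.0 7.0
```

The interaction term z[0] * z[1] contributes 2 at x, and Shapley values split it evenly between features 0 and 1. That is why feature 0 gets 2 + 1 and feature 1 gets 6 + 1.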
interpret
https://github.com/interpretml/interpret
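EBM (Explainable Boosting Machine) is an interpretable model developed at Microsoft Research. It uses modern machine learning techniques such as bagging, gradient boosting, and automatic interaction detection to breathe new life into traditional GAMs (Generalized Additive Models). This makes EBMs as accurate as state-of-the-art techniques such as random forests and gradient boosted trees. However, unlike these black-box models, EBMs produce exact explanations and are editable by domain experts.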
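The GAM structure behind an EBM is: prediction = intercept + sum of per-feature shape functions, where each shape function is a lookup table over feature bins. A single prediction therefore decomposes exactly into per-feature contributions. EBMs learn these shape functions with gradient boosting restricted to one feature at a time, in round-robin order, with a low learning rate. The stdlib sketch below mimics that loop, using per-bin residual means as the "one-feature tree". It omits bagging, interaction detection, and real binning. `fit_additive_model`, `explain`, and the data are hypothetical, not interpret's API.

```python
def fit_additive_model(X, y, n_rounds=200, learning_rate=0.1):
    """Cyclic boosting of per-feature lookup tables on already-binned features.

    X: rows of hashable bin labels; y: numeric targets (squared loss).
    """
    n_features = len(X[0])
    intercept = sum(y) / len(y)
    shapes = [{} for _ in range(n_features)]  # one lookup table per feature

    def predict_row(row):
        return intercept + sum(shapes[j].get(row[j], 0.0) for j in range(n_features))

    for _ in range(n_rounds):
        for j in range(n_features):  # round-robin: boost one feature at a time
            residual_sums, counts = {}, {}
            for row, target in zip(X, y):
                r = target - predict_row(row)
                residual_sums[row[j]] = residual_sums.get(row[j], 0.0) + r
                counts[row[j]] = counts.get(row[j], 0) + 1
            for bin_value, total in residual_sums.items():
                step = learning_rate * total / counts[bin_value]
                shapes[j][bin_value] = shapes[j].get(bin_value, 0.0) + step
    return intercept, shapes


def explain(intercept, shapes, row):
    """Per-feature contributions; with the intercept they sum to the prediction."""
    return intercept, [shapes[j].get(row[j], 0.0) for j in range(len(row))]


X = [("low", "a"), ("low", "b"), ("high", "a"), ("high", "b")]
y = [0.0, 1.0, 2.0, 3.0]  # exactly additive: feature 0 adds 2 for "high", feature 1 adds 1 for "b"
intercept, shapes = fit_additive_model(X, y)
base, contributions = explain(intercept, shapes, ("high", "b"))
print(base, [round(c, 3) for c in contributions])  # 1.5 [1.0, 0.5]
```

Because each shape function is a plain table, a domain expert can inspect it or overwrite individual bins. That is what makes this model family "editable".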

hyperparameter-optimization

optuna

https://github.com/optuna/optuna-examples
https://medium.com/optuna/lightgbm-tuner-new-optuna-integration-for-hyperparameter-optimization-8b7095e99258
https://github.com/optuna/optuna

tune-sklearn

https://github.com/ray-project/tune-sklearn

| Algorithm | `search_optimization` value | Summary | Website | pip install |
|---|---|---|---|---|
| (Random Search) | "random" | Randomized Search | | built-in |
| SkoptSearch | "bayesian" | Bayesian Optimization | Scikit-Optimize | scikit-optimize |
| HyperOptSearch | "hyperopt" | Tree-Parzen Estimators | HyperOpt | hyperopt |
| TuneBOHB | "bohb" | Bayesian Opt/HyperBand | BOHB | hpbandster ConfigSpace |
| Optuna | "optuna" | Tree-Parzen Estimators | Optuna | optuna |

hyperopt

https://github.com/search?q=hyperopt

scikit-optimize

https://github.com/scikit-optimize/scikit-optimize

hyperas

https://github.com/maxpumperla/hyperas

Gradient-Free-Optimizers

https://github.com/SimonBlanke/Gradient-Free-Optimizers

hpbandster

https://automl.github.io/HpBandSter/build/html/best_practices.html
HpBandSter (HyperBand on STERoids) implements recently published methods for optimizing the hyperparameters of machine learning algorithms.
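HyperBand is built on successive halving. It samples many configurations, trains each on a small budget, keeps the best 1/eta fraction, and repeats with eta times the budget until one configuration remains. HyperBand runs several such brackets with different trade-offs between the number of configurations and the starting budget. BOHB replaces the random sampling with a model-based sampler. The sketch below is one bracket only, with a hypothetical search space and loss function. It is not HpBandSter's API.

```python
import math
import random


def successive_halving(sample_config, evaluate, n_configs=27, min_budget=1, eta=3):
    """One successive-halving bracket (the building block of HyperBand).

    sample_config(): returns a random configuration.
    evaluate(config, budget): returns a loss; lower is better.
    """
    configs = [sample_config() for _ in range(n_configs)]
    budget = min_budget
    while len(configs) > 1:
        losses = [evaluate(config, budget) for config in configs]
        order = sorted(range(len(configs)), key=lambda i: losses[i])
        keep = max(1, len(configs) // eta)        # keep the best 1/eta
        configs = [configs[i] for i in order[:keep]]
        budget *= eta                             # survivors get eta times the budget
    return configs[0]


rng = random.Random(0)
sampled = []


def sample_config():
    """Hypothetical search space: log-uniform learning rate in [1e-4, 1]."""
    config = {"lr": 10 ** rng.uniform(-4, 0)}
    sampled.append(config)
    return config


def evaluate(config, budget):
    """Hypothetical loss: best at lr = 0.01; the 1/budget term makes short runs score worse."""
    return (math.log10(config["lr"]) + 2) ** 2 + 1.0 / budget


best = successive_halving(sample_config, evaluate)
print(best)
```

With 27 configurations and eta = 3, the rungs evaluate 27, 9 and 3 configurations at budgets 1, 3 and 9. In this toy loss the ranking never changes between rungs. With real, noisy low-budget results it can change, which is why HyperBand hedges across several brackets.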


Machine Learning Frameworks

https://github.com/fengdu78/lihang-code

feature

GEKKO

EDA

bamboolib

statsmodels

pps

data_dashboard

Snorkel

time

sktime

tslearn

statsmodels

pyflux

blog

ML

thundergbm

xgboost

catboost

lightgbm

scikit-learn

Apache Spark MLlib

ngboost

decision tree

bayesian

AutoML

FLAML

autopytorch/autokeras

auto-sklearn

PyCaret

LightAutoML

modAL

MLBox

TPOT

auto_ml

H2O (not free)

blog

nni

Vowpal Wabbit

Tree ensemble example (XGBoost/LightGBM/CatBoost/scikit-learn/pyspark models) https://github.com/slundberg/shap

interpret

hyperparameter-optimization

optuna

tune-sklearn

hyperopt

scikit-optimize

hyperas

Gradient-Free-Optimizers

hpbandster
