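As a framework-agnostic tool for machine learning, the MLflow Python API provides developer APIs for writing plugins that integrate with different ML frameworks and backends.
Plugins provide a powerful mechanism for customizing the behavior of the MLflow Python client and integrating third-party tools, allowing you to: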

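  • Integrate with third-party storage solutions for experiment data, artifacts, and models
  • Integrate with third-party authentication providers, e.g. read HTTP authentication credentials from a special file
  • Use the MLflow client to communicate with other REST APIs, e.g. your organization's existing experiment-tracking APIs
  • Automatically capture additional metadata as run tags, e.g. the git repository associated with a run
  • Add new backends for executing MLflow Project entry points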

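The MLflow Python API supports several types of plugins:

  • Tracking Store: override tracking backend logic, e.g. to log to a third-party storage solution
  • ArtifactRepository: override artifact logging logic, e.g. to log to a third-party storage solution
  • Run context providers: specify context tags to be set on runs created via the mlflow.start_run() fluent API
  • Model Registry Store: override model registry backend logic, e.g. to log to a third-party storage solution
  • MLflow Project backend: override the local execution backend to execute a project on your own cluster (Databricks, Kubernetes, etc.)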

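Using an MLflow Plugin

MLflow plugins are Python packages that you can install using PyPI or conda. This example installs a Tracking Store plugin from source and uses it within an example script.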

Install the Plugin

To get started, clone MLflow and install this example plugin:
    git clone https://github.com/mlflow/mlflow
    cd mlflow
    pip install -e tests/resources/mlflow-test-plugin
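After installation, you can confirm that the plugin's entry points are visible to Python. The sketch below uses only the standard library's importlib.metadata; the helper names entry_points_in_group and list_mlflow_plugins are our own, and the group list is simply copied from the example package's setup.py, so treat it as an illustration rather than part of MLflow's API.

```python
import sys
from importlib import metadata

# Entry-point groups declared by the example plugin's setup.py (assumed list).
MLFLOW_PLUGIN_GROUPS = [
    "mlflow.tracking_store",
    "mlflow.artifact_repository",
    "mlflow.run_context_provider",
    "mlflow.model_registry_store",
    "mlflow.project_backend",
    "mlflow.deployments",
]


def entry_points_in_group(group):
    """Return the installed entry points registered under `group`."""
    if sys.version_info >= (3, 10):
        return list(metadata.entry_points(group=group))
    # Python 3.8/3.9: entry_points() returns a dict keyed by group name.
    return list(metadata.entry_points().get(group, []))


def list_mlflow_plugins():
    """Map each plugin group to {entry-point name: entry-point value}."""
    return {
        group: {ep.name: ep.value for ep in entry_points_in_group(group)}
        for group in MLFLOW_PLUGIN_GROUPS
    }


if __name__ == "__main__":
    for group, plugins in list_mlflow_plugins().items():
        print(group, plugins)
```

With the example plugin installed, the mlflow.tracking_store group should list a file-plugin entry whose value matches the one declared in the package's setup.py; an empty dict for a group just means no installed package registers that plugin type.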

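Run Code Using the Plugin

This plugin defines a custom Tracking Store for tracking URIs with the file-plugin scheme. The plugin implementation delegates to MLflow's built-in file-based run storage. To use the plugin, you can run any code that uses MLflow, setting the tracking URI to one with a file-plugin:// scheme:
    MLFLOW_TRACKING_URI=file-plugin:$(pwd)/mlruns python examples/quickstart/mlflow_tracking.py
Launch the MLflow UI:
    cd ..
    mlflow server --backend-store-uri ./mlflow/mlruns
View the results at http://localhost:5000. You should see a newly created run with a param named "param1" and a metric named "foo":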

MLflow Plugins - Figure 1
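What routes calls to the plugin is the scheme of the tracking URI: the store registered under the entry-point name file-plugin is chosen, and it receives the full URI. The standard-library sketch below models that lookup. SchemeRegistry and DemoFileStore are hypothetical names; this is a simplified model for illustration, not MLflow's internal registry code.

```python
from urllib.parse import urlparse


class SchemeRegistry:
    """Hypothetical, simplified model of scheme-based store lookup."""

    def __init__(self):
        self._builders = {}

    def register(self, scheme, builder):
        # In a real plugin the scheme comes from the entry-point name
        # (e.g. "file-plugin") and the builder from the entry-point value.
        self._builders[scheme] = builder

    def get_store(self, uri):
        scheme = urlparse(uri).scheme  # "file-plugin:/tmp/x" -> "file-plugin"
        try:
            builder = self._builders[scheme]
        except KeyError:
            raise ValueError(f"No store registered for scheme {scheme!r} in {uri!r}") from None
        # The full URI, not just the path, is handed to the store's constructor.
        return builder(uri)


class DemoFileStore:
    """Hypothetical store that only remembers the URI it was built with."""

    def __init__(self, store_uri):
        self.store_uri = store_uri


registry = SchemeRegistry()
registry.register("file-plugin", DemoFileStore)
store = registry.get_store("file-plugin:/tmp/mlruns")
print(type(store).__name__, store.store_uri)  # DemoFileStore file-plugin:/tmp/mlruns
```

Keying on the scheme alone is what lets several plugins coexist: each package claims its own scheme, and a URI with an unregistered scheme fails loudly instead of silently falling back to another store.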

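Writing Your Own MLflow Plugins

Defining a Plugin

You define an MLflow plugin as a standalone Python package that can be distributed for installation via PyPI or conda. See https://github.com/mlflow/mlflow/tree/master/tests/resources/mlflow-test-plugin for an example package that implements all available plugin types.
The example package contains a setup.py that declares a number of entry points: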

    from setuptools import setup

    setup(
        name="mlflow-test-plugin",
        # Require MLflow as a dependency of the plugin, so that plugin users can simply install
        # the plugin and then immediately use it with MLflow
        install_requires=["mlflow"],
        # ... (other setup() arguments elided)
        entry_points={
            # Define a Tracking Store plugin for tracking URIs with scheme 'file-plugin'
            "mlflow.tracking_store": "file-plugin=mlflow_test_plugin.file_store:PluginFileStore",
            # Define an ArtifactRepository plugin for artifact URIs with scheme 'file-plugin'
            "mlflow.artifact_repository":
                "file-plugin=mlflow_test_plugin.local_artifact:PluginLocalArtifactRepository",
            # Define a RunContextProvider plugin. The entry point name for run context providers
            # is not used, and so is set to the string "unused" here
            "mlflow.run_context_provider":
                "unused=mlflow_test_plugin.run_context_provider:PluginRunContextProvider",
            # Define a Model Registry Store plugin for tracking URIs with scheme 'file-plugin'
            "mlflow.model_registry_store":
                "file-plugin=mlflow_test_plugin.sqlalchemy_store:PluginRegistrySqlAlchemyStore",
            # Define an MLflow Project Backend plugin called 'dummy-backend'
            "mlflow.project_backend":
                "dummy-backend=mlflow_test_plugin.dummy_backend:PluginDummyProjectBackend",
            # Define an MLflow model deployment plugin for target 'faketarget'
            "mlflow.deployments": "faketarget=mlflow_test_plugin.fake_deployment_plugin",
        },
    )
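Each entry-point string above has the form name=module:attribute, except the deployments entry, which names only a module. The sketch below shows how such a string splits into its parts and how the value can be resolved with importlib. parse_entry_point and load_entry_point_value are hypothetical helpers, and the parser is simplified (it ignores the optional [extras] syntax); for installed packages, importlib.metadata's EntryPoint.load() performs the equivalent resolution.

```python
import importlib


def parse_entry_point(spec):
    """Split 'name=module[:attr]' into (name, module, attr or None)."""
    name, _, value = spec.partition("=")
    module, _, attr = value.strip().partition(":")
    return name.strip(), module.strip(), (attr.strip() or None)


def load_entry_point_value(module, attr=None):
    """Import `module` and, if given, return its (possibly dotted) attribute."""
    obj = importlib.import_module(module)
    if attr:
        for part in attr.split("."):
            obj = getattr(obj, part)
    return obj


print(parse_entry_point("file-plugin=mlflow_test_plugin.file_store:PluginFileStore"))
# ('file-plugin', 'mlflow_test_plugin.file_store', 'PluginFileStore')
print(parse_entry_point("faketarget=mlflow_test_plugin.fake_deployment_plugin"))
# ('faketarget', 'mlflow_test_plugin.fake_deployment_plugin', None)

# A standard-library value resolves the same way a plugin class would.
print(load_entry_point_value("collections", "OrderedDict"))
```

The name part is what users see (a URI scheme, a backend name, or a deployment target), while the value part is only ever imported, which is why run context providers can leave their name as "unused".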

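Each element of the entry_points dictionary specifies a single plugin. You can choose to implement one or more plugin types in your package; you do not need to implement all of them. The type of plugin defined by each entry point, and its corresponding reference implementation in MLflow, are described below. You may find the reference implementations useful when writing your own plugin: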

Tracking Store
  Description: Plugins for overriding definitions of tracking APIs like mlflow.log_metric and mlflow.start_run for a specific tracking URI scheme.
  Entry-point group: mlflow.tracking_store
  Entry-point name and value: The entry point value (e.g. mlflow_test_plugin.file_store:PluginFileStore) specifies a custom subclass of mlflow.tracking.store.AbstractStore (e.g., the PluginFileStore class within the mlflow_test_plugin module). The entry point name (e.g. file-plugin) is the tracking URI scheme with which to associate the custom AbstractStore implementation. Users who install the example plugin and set a tracking URI of the form file-plugin://<path> will use the custom AbstractStore implementation defined in PluginFileStore. The full tracking URI is passed to the PluginFileStore constructor.
  Reference implementation: FileStore

ArtifactRepository
  Description: Plugins for defining artifact read/write APIs like mlflow.log_artifact and MlflowClient.download_artifacts for a specified artifact URI scheme (e.g. the scheme used by your in-house blob storage system).
  Entry-point group: mlflow.artifact_repository
  Entry-point name and value: The entry point value (e.g. mlflow_test_plugin.local_artifact:PluginLocalArtifactRepository) specifies a custom subclass of mlflow.store.artifact.artifact_repo.ArtifactRepository (e.g., the PluginLocalArtifactRepository class within the mlflow_test_plugin module). The entry point name (e.g. file-plugin) is the artifact URI scheme with which to associate the custom ArtifactRepository implementation. Users who install the example plugin and log to a run whose artifact URI is of the form file-plugin://<path> will use the custom ArtifactRepository implementation defined in PluginLocalArtifactRepository. The full artifact URI is passed to the PluginLocalArtifactRepository constructor.
  Reference implementation: LocalArtifactRepository

Run context providers
  Description: Plugins for specifying custom context tags at run creation time, e.g. tags identifying the git repository associated with a run.
  Entry-point group: mlflow.run_context_provider
  Entry-point name and value: The entry point name is unused. The entry point value (e.g. mlflow_test_plugin.run_context_provider:PluginRunContextProvider) specifies a custom subclass of mlflow.tracking.context.abstract_context.RunContextProvider (e.g., the PluginRunContextProvider class within the mlflow_test_plugin module) to register.
  Reference implementations: GitRunContext, DefaultRunContext

Model Registry Store
  Description: Plugins for overriding definitions of Model Registry APIs like mlflow.register_model.
  Entry-point group: mlflow.model_registry_store
  Note: The Model Registry is in beta (as of MLflow 1.5). Model Registry APIs are not guaranteed to be stable, and Model Registry plugins may break in the future.
  Entry-point name and value: The entry point value (e.g. mlflow_test_plugin.sqlalchemy_store:PluginRegistrySqlAlchemyStore) specifies a custom subclass of mlflow.tracking.model_registry.AbstractStore (e.g., the PluginRegistrySqlAlchemyStore class within the mlflow_test_plugin module). The entry point name (e.g. file-plugin) is the tracking URI scheme with which to associate the custom AbstractStore implementation. Users who install the example plugin and set a tracking URI of the form file-plugin://<path> will use the custom AbstractStore implementation defined in PluginRegistrySqlAlchemyStore. The full tracking URI is passed to the PluginRegistrySqlAlchemyStore constructor.
  Reference implementation: SqlAlchemyStore

MLflow Project backend
  Description: Plugins for running MLflow projects against custom execution backends (e.g. to run projects against your team's in-house cluster or job scheduler).
  Entry-point group: mlflow.project_backend
  Entry-point name and value: The entry point name (e.g. dummy-backend) is the name of the backend. The entry point value (e.g. mlflow_test_plugin.dummy_backend:PluginDummyProjectBackend) specifies a custom subclass of mlflow.projects.backend.abstract_backend.AbstractBackend.
  Reference implementation: N/A (will be added soon)

Deployments
  Description: Plugins for deploying models to custom serving tools.
  Entry-point group: mlflow.deployments
  Entry-point name and value: The entry point name (e.g. redisai) is the target name. The entry point value (e.g. mlflow_test_plugin.fake_deployment_plugin) specifies a module defining:
    1) Exactly one subclass of mlflow.deployments.BaseDeploymentClient (e.g., the PluginDeploymentClient class). MLflow's mlflow.deployments.get_deploy_client API directly returns an instance of this subclass to the user, so you're encouraged to write clear user-facing method and class docstrings as part of your plugin implementation.
    2) The run_local and target_help functions, with the target parameter excluded, as in the example module mlflow_test_plugin.fake_deployment_plugin.
  Reference implementation: PluginDeploymentClient
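To make the run-context-provider entry concrete: a provider implements in_context(), which reports whether it applies, and tags(), which returns the tags to set on new runs. The sketch below subclasses mlflow.tracking.context.abstract_context.RunContextProvider when MLflow is installed and otherwise falls back to a minimal stand-in so it runs on its own. HostnameRunContextProvider and the tag key "example.hostname" are hypothetical, chosen only for illustration.

```python
import socket

try:
    from mlflow.tracking.context.abstract_context import RunContextProvider
except ImportError:
    # Minimal stand-in (assumption) so the sketch runs without MLflow installed.
    class RunContextProvider:
        def in_context(self):
            raise NotImplementedError

        def tags(self):
            raise NotImplementedError


class HostnameRunContextProvider(RunContextProvider):
    """Hypothetical provider that tags each new run with the local hostname."""

    def in_context(self):
        # Returning True means this provider's tags() apply to every new run.
        return True

    def tags(self):
        # Hypothetical tag key; pick a namespaced key for a real plugin.
        return {"example.hostname": socket.gethostname()}


provider = HostnameRunContextProvider()
if provider.in_context():
    print(provider.tags())
```

To register such a provider you would list it under the mlflow.run_context_provider group in setup.py, e.g. "unused=my_plugin.context:HostnameRunContextProvider", where my_plugin.context is a hypothetical module path.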

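Testing Your Plugin

We recommend testing your plugin to ensure that it follows the contract expected by MLflow. For example, a Tracking Store plugin should contain tests verifying the correctness of its log_metric, log_param, etc. implementations. See also the tests for MLflow's reference implementations as examples: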