id: install_cluster-milvusoperator.md
label: Milvus Operator
related_key: Kubernetes
order: 2
group: install_cluster-docker.md

summary: Learn how to install a Milvus cluster on Kubernetes with Milvus Operator.

Install Milvus Cluster

{{fragments/installation_guide_cluster.md}}

{{tab}}

Create a Kubernetes cluster

If you have already deployed a K8s cluster for production, you can skip this step and proceed directly to Deploy Milvus Operator. If not, follow the steps below to quickly create a K8s cluster for testing and use it to install a Milvus cluster. This topic covers two ways to create a K8s cluster:

  • Create a K8s cluster in a virtual machine with minikube
  • Create a K8s cluster in Docker with kind
Clusters created with minikube or kind are for testing only and must not be used in production.

Create a K8s cluster in a virtual machine with minikube

minikube is a tool that makes it easy to run Kubernetes locally.

1. Install minikube

For more details, see Prerequisites.
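For reference, a minimal install sketch for a Linux amd64 machine, assuming curl is available (on macOS you can install the minikube package with a package manager such as Homebrew instead):

  # Download the latest minikube binary and install it into /usr/local/bin (Linux amd64)
  $ curl -LO https://storage.googleapis.com/minikube/releases/latest/minikube-linux-amd64
  $ sudo install minikube-linux-amd64 /usr/local/bin/minikube
  # Verify the installation
  $ minikube version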

2. Start a K8s cluster with minikube

After installing minikube, run the following command to start a K8s cluster.

  $ minikube start

After the K8s cluster starts successfully, you will see output similar to the following. The exact output may vary with your operating system and hypervisor.

  😄 minikube v1.21.0 on Darwin 11.4
  🎉 minikube 1.23.2 is available! Download it: https://github.com/kubernetes/minikube/releases/tag/v1.23.2
  💡 To disable this notice, run: 'minikube config set WantUpdateNotification false'
  Automatically selected the docker driver. Other choices: hyperkit, ssh
  👍 Starting control plane node minikube in cluster minikube
  🚜 Pulling base image ...
  minikube was unable to download gcr.io/k8s-minikube/kicbase:v0.0.23, but successfully downloaded kicbase/stable:v0.0.23 as a fallback image
  🔥 Creating docker container (CPUs=2, Memory=8100MB) ...
  This container is having trouble accessing https://k8s.gcr.io
  💡 To pull new external images, you may need to configure a proxy: https://minikube.sigs.k8s.io/docs/reference/networking/proxy/
  🐳 Preparing Kubernetes v1.20.7 on Docker 20.10.7
  Generating certificates and keys ...
  Booting up control plane ...
  Configuring RBAC rules ...
  🔎 Verifying Kubernetes components...
  Using image gcr.io/k8s-minikube/storage-provisioner:v5
  🌟 Enabled addons: storage-provisioner, default-storageclass
  🏄 Done! kubectl is now configured to use "minikube" cluster and "default" namespace by default
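A Milvus cluster consumes considerable CPU and memory. If the defaults chosen by minikube are too small for your machine, you can pass explicit resource flags when starting the cluster; the values below are only illustrative and should be adjusted to your hardware:

  # Start minikube with explicit resources (example values)
  $ minikube start --driver=docker --cpus=4 --memory=8g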

3. Check the K8s cluster status

Run $ kubectl cluster-info to check the status of the K8s cluster you just created, and make sure you can access it via kubectl. The output is similar to the following:

  Kubernetes control plane is running at https://127.0.0.1:63754
  KubeDNS is running at https://127.0.0.1:63754/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy
  To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.
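You can also list the nodes to confirm that the single-node cluster is Ready; the output below is indicative only:

  $ kubectl get nodes
  NAME       STATUS   ROLES                  AGE   VERSION
  minikube   Ready    control-plane,master   1m    v1.20.7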

Create a K8s cluster with kind

kind is a tool for running local Kubernetes clusters using Docker containers as nodes.

1. Create a configuration file

Create a kind.yaml configuration file.

  kind: Cluster
  apiVersion: kind.x-k8s.io/v1alpha4
  nodes:
  - role: control-plane
  - role: worker
  - role: worker
  - role: worker

2. Create a K8s cluster

Create a K8s cluster using the kind.yaml configuration file.

  $ kind create cluster --name myk8s --config kind.yaml

After the K8s cluster starts successfully, you will see output similar to the following:

  Creating cluster "myk8s" ...
  Ensuring node image (kindest/node:v1.21.1) 🖼
  Preparing nodes 📦 📦 📦 📦
  Writing configuration 📜
  Starting control-plane 🕹️
  Installing CNI 🔌
  Installing StorageClass 💾
  Joining worker nodes 🚜
  Set kubectl context to "kind-myk8s"
  You can now use your cluster with:
  kubectl cluster-info --context kind-myk8s
  Not sure what to do next? 😅 Check out https://kind.sigs.k8s.io/docs/user/quick-start/

3. Check the K8s cluster status

Run $ kubectl cluster-info to check the status of the K8s cluster you just created, and make sure you can access it via kubectl. The output is similar to the following:

  Kubernetes control plane is running at https://127.0.0.1:55668
  CoreDNS is running at https://127.0.0.1:55668/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy
  To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.
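To confirm that all four nodes defined in kind.yaml have joined, you can list the nodes under the kind-myk8s context; the node names below follow kind's default naming and may differ in your environment:

  $ kubectl get nodes --context kind-myk8s
  NAME                  STATUS   ROLES                  AGE   VERSION
  myk8s-control-plane   Ready    control-plane,master   2m    v1.21.1
  myk8s-worker          Ready    <none>                 90s   v1.21.1
  myk8s-worker2         Ready    <none>                 90s   v1.21.1
  myk8s-worker3         Ready    <none>                 90s   v1.21.1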

Deploy Milvus Operator

Milvus Operator deploys the full Milvus service stack on a target K8s cluster, including all Milvus components and related third-party dependencies such as etcd, Pulsar, and MinIO. Milvus Operator defines a Milvus cluster custom resource on top of Kubernetes custom resources. Once the custom resource is defined, you can use the K8s API declaratively to manage the Milvus deployment stack and keep the service scalable and highly available.

Prerequisites

  • Make sure you can access the K8s cluster via kubectl.
  • Make sure a StorageClass is installed. minikube and kind install a default StorageClass. Run kubectl get sc to check whether one is present. If a StorageClass is installed, you will see output like the following; if not, configure a StorageClass manually (a patch sketch follows below). For details, see Change the default StorageClass.
  NAME PROVISIONER RECLAIMPOLICY VOLUMEBINDINGMODE ALLOWVOLUMEEXPANSION AGE
  standard (default) k8s.io/minikube-hostpath Delete Immediate false 3m36s
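If your cluster already has a StorageClass that is simply not marked as default, one common way to make it the default is to patch its annotation. The StorageClass name my-storageclass below is a placeholder; replace it with the name shown by kubectl get sc:

  # Mark an existing StorageClass as the default
  $ kubectl patch storageclass my-storageclass \
    -p '{"metadata": {"annotations": {"storageclass.kubernetes.io/is-default-class": "true"}}}'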

1. Install cert-manager

Milvus Operator uses cert-manager to issue certificates for the webhook server. Run the following command to install cert-manager.

  $ kubectl apply -f https://github.com/jetstack/cert-manager/releases/download/v1.5.3/cert-manager.yaml

If the installation succeeds, you will see output similar to the following:

  customresourcedefinition.apiextensions.k8s.io/certificaterequests.cert-manager.io created
  customresourcedefinition.apiextensions.k8s.io/certificates.cert-manager.io created
  customresourcedefinition.apiextensions.k8s.io/challenges.acme.cert-manager.io created
  customresourcedefinition.apiextensions.k8s.io/clusterissuers.cert-manager.io created
  customresourcedefinition.apiextensions.k8s.io/issuers.cert-manager.io created
  customresourcedefinition.apiextensions.k8s.io/orders.acme.cert-manager.io created
  namespace/cert-manager created
  serviceaccount/cert-manager-cainjector created
  serviceaccount/cert-manager created
  serviceaccount/cert-manager-webhook created
  clusterrole.rbac.authorization.k8s.io/cert-manager-cainjector created
  clusterrole.rbac.authorization.k8s.io/cert-manager-controller-issuers created
  clusterrole.rbac.authorization.k8s.io/cert-manager-controller-clusterissuers created
  clusterrole.rbac.authorization.k8s.io/cert-manager-controller-certificates created
  clusterrole.rbac.authorization.k8s.io/cert-manager-controller-orders created
  clusterrole.rbac.authorization.k8s.io/cert-manager-controller-challenges created
  clusterrole.rbac.authorization.k8s.io/cert-manager-controller-ingress-shim created
  clusterrole.rbac.authorization.k8s.io/cert-manager-view created
  clusterrole.rbac.authorization.k8s.io/cert-manager-edit created
  clusterrole.rbac.authorization.k8s.io/cert-manager-controller-approve:cert-manager-io created
  clusterrole.rbac.authorization.k8s.io/cert-manager-controller-certificatesigningrequests created
  clusterrole.rbac.authorization.k8s.io/cert-manager-webhook:subjectaccessreviews created
  clusterrolebinding.rbac.authorization.k8s.io/cert-manager-cainjector created
  clusterrolebinding.rbac.authorization.k8s.io/cert-manager-controller-issuers created
  clusterrolebinding.rbac.authorization.k8s.io/cert-manager-controller-clusterissuers created
  clusterrolebinding.rbac.authorization.k8s.io/cert-manager-controller-certificates created
  clusterrolebinding.rbac.authorization.k8s.io/cert-manager-controller-orders created
  clusterrolebinding.rbac.authorization.k8s.io/cert-manager-controller-challenges created
  clusterrolebinding.rbac.authorization.k8s.io/cert-manager-controller-ingress-shim created
  clusterrolebinding.rbac.authorization.k8s.io/cert-manager-controller-approve:cert-manager-io created
  clusterrolebinding.rbac.authorization.k8s.io/cert-manager-controller-certificatesigningrequests created
  clusterrolebinding.rbac.authorization.k8s.io/cert-manager-webhook:subjectaccessreviews created
  role.rbac.authorization.k8s.io/cert-manager-cainjector:leaderelection created
  role.rbac.authorization.k8s.io/cert-manager:leaderelection created
  role.rbac.authorization.k8s.io/cert-manager-webhook:dynamic-serving created
  rolebinding.rbac.authorization.k8s.io/cert-manager-cainjector:leaderelection created
  rolebinding.rbac.authorization.k8s.io/cert-manager:leaderelection created
  rolebinding.rbac.authorization.k8s.io/cert-manager-webhook:dynamic-serving created
  service/cert-manager created
  service/cert-manager-webhook created
  deployment.apps/cert-manager-cainjector created
  deployment.apps/cert-manager created
  deployment.apps/cert-manager-webhook created
  mutatingwebhookconfiguration.admissionregistration.k8s.io/cert-manager-webhook created
  validatingwebhookconfiguration.admissionregistration.k8s.io/cert-manager-webhook created
cert-manager 1.1.3 or later is required.

Run $ kubectl get pods -n cert-manager to check whether cert-manager is running. If it is, all the pods are in the Running state, as shown below:

  NAME READY STATUS RESTARTS AGE
  cert-manager-848f547974-gccz8 1/1 Running 0 70s
  cert-manager-cainjector-54f4cc6b5-dpj84 1/1 Running 0 70s
  cert-manager-webhook-7c9588c76-tqncn 1/1 Running 0 70s
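If you are unsure which cert-manager version is installed, a quick way to check is to read the controller image tag; this is only a convenience sketch:

  # Print the cert-manager controller image; the tag is the installed version
  $ kubectl get deployment cert-manager -n cert-manager \
    -o jsonpath='{.spec.template.spec.containers[0].image}'
  quay.io/jetstack/cert-manager-controller:v1.5.3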

2. Install Milvus Operator

Run the following command to install Milvus Operator.

  $ kubectl apply -f https://raw.githubusercontent.com/milvus-io/milvus-operator/main/deploy/manifests/deployment.yaml

If the installation succeeds, you will see output similar to the following:

  namespace/milvus-operator created
  customresourcedefinition.apiextensions.k8s.io/milvusclusters.milvus.io created
  serviceaccount/milvus-operator-controller-manager created
  role.rbac.authorization.k8s.io/milvus-operator-leader-election-role created
  clusterrole.rbac.authorization.k8s.io/milvus-operator-manager-role created
  clusterrole.rbac.authorization.k8s.io/milvus-operator-metrics-reader created
  clusterrole.rbac.authorization.k8s.io/milvus-operator-proxy-role created
  rolebinding.rbac.authorization.k8s.io/milvus-operator-leader-election-rolebinding created
  clusterrolebinding.rbac.authorization.k8s.io/milvus-operator-manager-rolebinding created
  clusterrolebinding.rbac.authorization.k8s.io/milvus-operator-proxy-rolebinding created
  configmap/milvus-operator-manager-config created
  service/milvus-operator-controller-manager-metrics-service created
  service/milvus-operator-webhook-service created
  deployment.apps/milvus-operator-controller-manager created
  certificate.cert-manager.io/milvus-operator-serving-cert created
  issuer.cert-manager.io/milvus-operator-selfsigned-issuer created
  mutatingwebhookconfiguration.admissionregistration.k8s.io/milvus-operator-mutating-webhook-configuration created
  validatingwebhookconfiguration.admissionregistration.k8s.io/milvus-operator-validating-webhook-configuration created

Run $ kubectl get pods -n milvus-operator to check whether Milvus Operator is running. If it is, the Milvus Operator pod is in the Running state, as shown below:

  NAME READY STATUS RESTARTS AGE
  milvus-operator-controller-manager-698fc7dc8d-rlmtk 1/1 Running 0 46s
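If you script the installation, you can block until the operator deployment reports Available instead of polling the pod list; a minimal sketch:

  # Wait up to 5 minutes for the Milvus Operator deployment to become Available
  $ kubectl wait --namespace milvus-operator \
    --for=condition=Available deployment/milvus-operator-controller-manager \
    --timeout=300s
  deployment.apps/milvus-operator-controller-manager condition met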

Install Milvus Cluster

This topic installs a Milvus cluster with the default configuration, in which every Milvus component runs with multiple replicas and therefore consumes considerable resources. If your local resources are limited, you can install a Milvus cluster with the minimum configuration instead.

1. Deploy a Milvus cluster

With Milvus Operator running, run the following command to deploy a Milvus cluster.

  $ kubectl apply -f https://raw.githubusercontent.com/milvus-io/milvus-operator/main/config/samples/milvuscluster_default.yaml

When the deployment finishes, you will see the following output:

  milvuscluster.milvus.io/my-release created

2. Check the Milvus cluster status

Run the following command to check the status of the Milvus cluster.

  $ kubectl get mc my-release -o yaml

The status field of the output shows the current state of the Milvus cluster. While the cluster is still being created, status shows Unhealthy.

  apiVersion: milvus.io/v1alpha1
  kind: MilvusCluster
  metadata:
    ...
  status:
    conditions:
    - lastTransitionTime: "2021-11-02T02:52:04Z"
      message: 'Get "http://my-release-minio.default:9000/minio/admin/v3/info": dial
        tcp 10.96.78.153:9000: connect: connection refused'
      reason: ClientError
      status: "False"
      type: StorageReady
    - lastTransitionTime: "2021-11-02T02:52:04Z"
      message: connection error
      reason: PulsarNotReady
      status: "False"
      type: PulsarReady
    - lastTransitionTime: "2021-11-02T02:52:04Z"
      message: All etcd endpoints are unhealthy
      reason: EtcdNotReady
      status: "False"
      type: EtcdReady
    - lastTransitionTime: "2021-11-02T02:52:04Z"
      message: Milvus Dependencies is not ready
      reason: DependencyNotReady
      status: "False"
      type: MilvusReady
    endpoint: my-release-milvus.default:19530
    status: Unhealthy
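Instead of reading the full YAML every time, you can extract just the overall status field with a JSONPath expression; a minimal sketch:

  # Print only the overall status of the Milvus cluster (Unhealthy or Healthy)
  $ kubectl get mc my-release -o jsonpath='{.status.status}'
  Unhealthy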

Run the following command to check the current status of the Milvus pods.

  $ kubectl get pods

  NAME READY STATUS RESTARTS AGE
  my-release-etcd-0 0/1 Running 0 16s
  my-release-etcd-1 0/1 ContainerCreating 0 16s
  my-release-etcd-2 0/1 ContainerCreating 0 16s
  my-release-minio-0 1/1 Running 0 16s
  my-release-minio-1 1/1 Running 0 16s
  my-release-minio-2 0/1 Running 0 16s
  my-release-minio-3 0/1 ContainerCreating 0 16s
  my-release-pulsar-bookie-0 0/1 Pending 0 15s
  my-release-pulsar-bookie-1 0/1 Pending 0 15s
  my-release-pulsar-bookie-init-h6tfz 0/1 Init:0/1 0 15s
  my-release-pulsar-broker-0 0/1 Init:0/2 0 15s
  my-release-pulsar-broker-1 0/1 Init:0/2 0 15s
  my-release-pulsar-proxy-0 0/1 Init:0/2 0 16s
  my-release-pulsar-proxy-1 0/1 Init:0/2 0 15s
  my-release-pulsar-pulsar-init-d2t56 0/1 Init:0/2 0 15s
  my-release-pulsar-recovery-0 0/1 Init:0/1 0 16s
  my-release-pulsar-toolset-0 1/1 Running 0 16s
  my-release-pulsar-zookeeper-0 0/1 Pending 0 16s

3. Enable Milvus components

Milvus Operator first creates the third-party dependencies such as etcd, Pulsar, and MinIO, and then creates the Milvus components. That is why only the etcd, Pulsar, and MinIO pods are visible so far. Milvus Operator starts the Milvus components once all dependencies are up. The Milvus cluster status then looks as follows:

  ...
  status:
    conditions:
    - lastTransitionTime: "2021-11-02T05:59:41Z"
      reason: StorageReady
      status: "True"
      type: StorageReady
    - lastTransitionTime: "2021-11-02T06:06:23Z"
      message: Pulsar is ready
      reason: PulsarReady
      status: "True"
      type: PulsarReady
    - lastTransitionTime: "2021-11-02T05:59:41Z"
      message: Etcd endpoints is healthy
      reason: EtcdReady
      status: "True"
      type: EtcdReady
    - lastTransitionTime: "2021-11-02T06:06:24Z"
      message: '[datacoord datanode indexcoord indexnode proxy querycoord querynode
        rootcoord] not ready'
      reason: MilvusComponentNotHealthy
      status: "False"
      type: MilvusReady
    endpoint: my-release-milvus.default:19530
    status: Unhealthy
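Optionally, instead of re-running the command by hand, you can watch the pod list and wait for the Milvus components to reach the Running state (press Ctrl+C to stop watching):

  # Stream pod status changes as the Milvus components come up
  $ kubectl get pods -w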

Check the status of the Milvus pods again.

  $ kubectl get pods

  NAME READY STATUS RESTARTS AGE
  my-release-etcd-0 1/1 Running 0 6m49s
  my-release-etcd-1 1/1 Running 0 6m49s
  my-release-etcd-2 1/1 Running 0 6m49s
  my-release-milvus-datacoord-6c7bb4b488-k9htl 0/1 ContainerCreating 0 16s
  my-release-milvus-datanode-5c686bd65-wxtmf 0/1 ContainerCreating 0 16s
  my-release-milvus-indexcoord-586b9f4987-vb7m4 0/1 Running 0 16s
  my-release-milvus-indexnode-5b9787b54-xclbx 0/1 ContainerCreating 0 16s
  my-release-milvus-proxy-84f67cdb7f-pg6wf 0/1 ContainerCreating 0 16s
  my-release-milvus-querycoord-865cc56fb4-w2jmn 0/1 Running 0 16s
  my-release-milvus-querynode-5bcb59f6-nhqqw 0/1 ContainerCreating 0 16s
  my-release-milvus-rootcoord-fdcccfc84-9964g 0/1 Running 0 16s
  my-release-minio-0 1/1 Running 0 6m49s
  my-release-minio-1 1/1 Running 0 6m49s
  my-release-minio-2 1/1 Running 0 6m49s
  my-release-minio-3 1/1 Running 0 6m49s
  my-release-pulsar-bookie-0 1/1 Running 0 6m48s
  my-release-pulsar-bookie-1 1/1 Running 0 6m48s
  my-release-pulsar-bookie-init-h6tfz 0/1 Completed 0 6m48s
  my-release-pulsar-broker-0 1/1 Running 0 6m48s
  my-release-pulsar-broker-1 1/1 Running 0 6m48s
  my-release-pulsar-proxy-0 1/1 Running 0 6m49s
  my-release-pulsar-proxy-1 1/1 Running 0 6m48s
  my-release-pulsar-pulsar-init-d2t56 0/1 Completed 0 6m48s
  my-release-pulsar-recovery-0 1/1 Running 0 6m49s
  my-release-pulsar-toolset-0 1/1 Running 0 6m49s
  my-release-pulsar-zookeeper-0 1/1 Running 0 6m49s
  my-release-pulsar-zookeeper-1 1/1 Running 0 6m
  my-release-pulsar-zookeeper-2 1/1 Running 0 6m26s

Once all components are up, the status of the Milvus cluster shows Healthy.

  ...
  status:
    conditions:
    - lastTransitionTime: "2021-11-02T05:59:41Z"
      reason: StorageReady
      status: "True"
      type: StorageReady
    - lastTransitionTime: "2021-11-02T06:06:23Z"
      message: Pulsar is ready
      reason: PulsarReady
      status: "True"
      type: PulsarReady
    - lastTransitionTime: "2021-11-02T05:59:41Z"
      message: Etcd endpoints is healthy
      reason: EtcdReady
      status: "True"
      type: EtcdReady
    - lastTransitionTime: "2021-11-02T06:12:36Z"
      message: All Milvus components are healthy
      reason: MilvusClusterHealthy
      status: "True"
      type: MilvusReady
    endpoint: my-release-milvus.default:19530
    status: Healthy

Check the Milvus pod status again. You can see that all pods are now running.

  $ kubectl get pods

  NAME READY STATUS RESTARTS AGE
  my-release-etcd-0 1/1 Running 0 14m
  my-release-etcd-1 1/1 Running 0 14m
  my-release-etcd-2 1/1 Running 0 14m
  my-release-milvus-datacoord-6c7bb4b488-k9htl 1/1 Running 0 6m
  my-release-milvus-datanode-5c686bd65-wxtmf 1/1 Running 0 6m
  my-release-milvus-indexcoord-586b9f4987-vb7m4 1/1 Running 0 6m
  my-release-milvus-indexnode-5b9787b54-xclbx 1/1 Running 0 6m
  my-release-milvus-proxy-84f67cdb7f-pg6wf 1/1 Running 0 6m
  my-release-milvus-querycoord-865cc56fb4-w2jmn 1/1 Running 0 6m
  my-release-milvus-querynode-5bcb59f6-nhqqw 1/1 Running 0 6m
  my-release-milvus-rootcoord-fdcccfc84-9964g 1/1 Running 0 6m
  my-release-minio-0 1/1 Running 0 14m
  my-release-minio-1 1/1 Running 0 14m
  my-release-minio-2 1/1 Running 0 14m
  my-release-minio-3 1/1 Running 0 14m
  my-release-pulsar-bookie-0 1/1 Running 0 14m
  my-release-pulsar-bookie-1 1/1 Running 0 14m
  my-release-pulsar-bookie-init-h6tfz 0/1 Completed 0 14m
  my-release-pulsar-broker-0 1/1 Running 0 14m
  my-release-pulsar-broker-1 1/1 Running 0 14m
  my-release-pulsar-proxy-0 1/1 Running 0 14m
  my-release-pulsar-proxy-1 1/1 Running 0 14m
  my-release-pulsar-pulsar-init-d2t56 0/1 Completed 0 14m
  my-release-pulsar-recovery-0 1/1 Running 0 14m
  my-release-pulsar-toolset-0 1/1 Running 0 14m
  my-release-pulsar-zookeeper-0 1/1 Running 0 14m
  my-release-pulsar-zookeeper-1 1/1 Running 0 13m
  my-release-pulsar-zookeeper-2 1/1 Running 0 13m

Having installed a Milvus cluster, you can move on to learn how to Manage Milvus Connections.
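If you only want to try the endpoint from your local machine before setting up proper access, one quick option is to forward the Milvus service port; the service name my-release-milvus and port 19530 come from the endpoint shown in the status above:

  # Forward local port 19530 to the Milvus service inside the cluster
  $ kubectl port-forward service/my-release-milvus 19530:19530
  Forwarding from 127.0.0.1:19530 -> 19530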

Uninstall Milvus Cluster

Run the following command to uninstall the Milvus cluster.

  $ kubectl delete mc my-release
  • Deleting the Milvus instance with the default configuration does not delete the third-party dependencies such as etcd, Pulsar, and MinIO, so they can be reused the next time you install a Milvus instance.
  • If you also want to delete the third-party dependencies, including the data in their PVCs (PersistentVolumeClaims), see the configuration file. A sketch for reviewing leftover PVCs appears at the end of this section.
  • Delete the K8s cluster

    When you no longer need the K8s cluster in your test environment, you can delete it.

    If you created the K8s cluster with minikube, run $ minikube delete.

    If you created the K8s cluster with kind, run $ kind delete cluster --name myk8s.
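As noted above, the default configuration keeps the dependencies' data in PersistentVolumeClaims after the Milvus instance is deleted. If you decide to reclaim that storage manually instead of reusing it, a cautious sketch (PVC names vary per deployment, so list them first):

  # List the PVCs that remain in the namespace after deleting the Milvus instance
  $ kubectl get pvc
  # Delete a specific PVC only if you are sure its data is no longer needed
  $ kubectl delete pvc <pvc-name>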

What's next

Having installed Milvus, you can: