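Background
A single scheduler currently cannot meet high-throughput requirements in some scenarios. Besides optimizing the performance of a single scheduler, another option is to deploy multiple Volcano schedulers to raise overall scheduling throughput.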
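Solution
Previously, we used labels to divide the cluster nodes into multiple sections, made each Volcano scheduler responsible for one section, and then specified schedulerName in the Pod spec when submitting workloads. This is inconvenient in some cases, especially for large clusters. This document gives users another option for deploying multiple schedulers, one that requires fewer changes to workloads and nodes. A StatefulSet is used to deploy the Volcano schedulers, and jobs and nodes are automatically assigned to schedulers by a hash algorithm.
Multiple instances are started through a StatefulSet. Because each Pod in a StatefulSet has its own ID, from 0 to replicas - 1, each instance can easily obtain its ID and use it to partition the work.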
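The ID-based partitioning described above can be illustrated with a minimal sketch. It is not Volcano's actual implementation: the function names, the use of CRC32, and the modulo mapping are assumptions chosen only to show the idea. The sketch relies on two facts: StatefulSet Pods are named <statefulset-name>-<ordinal>, and the manifest passes the Pod name and scheduler count to each instance through SCHEDULER_POD_NAME and SCHEDULER_NUM.

```python
import zlib


def ordinal_from_pod_name(pod_name: str) -> int:
    """Return the StatefulSet ordinal encoded at the end of a Pod name."""
    # StatefulSet Pods are named <statefulset-name>-<ordinal>,
    # e.g. "volcano-scheduler-2" has ordinal 2.
    _, _, suffix = pod_name.rpartition("-")
    if not (suffix.isascii() and suffix.isdigit()):
        raise ValueError(f"no ordinal suffix in pod name: {pod_name!r}")
    return int(suffix)


def owner_index(key: str, scheduler_num: int) -> int:
    """Map a job or node key to a scheduler index in [0, scheduler_num).

    Hypothetical scheme: CRC32 modulo the scheduler count. CRC32 is used
    because it is stable across processes, unlike Python's built-in hash().
    """
    if scheduler_num <= 0:
        raise ValueError("scheduler_num must be positive")
    return zlib.crc32(key.encode("utf-8")) % scheduler_num


def is_responsible(pod_name: str, scheduler_num: int, key: str) -> bool:
    """Return True if the scheduler running in pod_name owns the given key."""
    return owner_index(key, scheduler_num) == ordinal_from_pod_name(pod_name)
```

With this scheme every key is owned by exactly one instance, so each scheduler can decide locally which jobs and nodes to handle without coordinating with the others. A plain modulo mapping reassigns most keys whenever the scheduler count changes, so a real deployment may prefer a scheme such as consistent hashing.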
kind: StatefulSet
apiVersion: apps/v1
metadata:
  name: volcano-scheduler
  namespace: volcano-system
  labels:
    app: volcano-scheduler
spec:
  replicas: 3
  selector:
    matchLabels:
      app: volcano-scheduler
  serviceName: "volcano-scheduler"
  template:
    metadata:
      labels:
        app: volcano-scheduler
    spec:
      serviceAccount: volcano-scheduler
      containers:
        - name: volcano-scheduler
          image: volcanosh/vc-scheduler:ae78900d21dce8522eb04b6817aac66c9abd01e2
          args:
            - --logtostderr
            - --scheduler-conf=/volcano.scheduler/volcano-scheduler.conf
            - -v=3
            - 2>&1
          imagePullPolicy: "IfNotPresent"
          env:
            - name: MULTI_SCHEDULER_ENABLE
              value: "true"
            - name: SCHEDULER_NUM
              value: "3"
            - name: SCHEDULER_POD_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
          volumeMounts:
            - name: scheduler-config
              mountPath: /volcano.scheduler
      volumes:
        - name: scheduler-config
          configMap:
            name: volcano-scheduler-configmap
---
apiVersion: v1
kind: Service
metadata:
  name: volcano-scheduler
  labels:
    app: volcano-scheduler
spec:
  ports:
    - port: 80
      name: volcano-scheduler
  clusterIP: None
  selector:
    app: volcano-scheduler
Note: SCHEDULER_NUM must be the same as replicas.
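Because the two values are set in separate places, they can drift apart when the StatefulSet is scaled. The following is a rough sketch of a pre-deployment check. It assumes the StatefulSet manifest has already been parsed into a Python dict (for example by a YAML loader), and the function name scheduler_num_mismatches is hypothetical.

```python
def scheduler_num_mismatches(statefulset: dict) -> list:
    """Return a message for each container whose SCHEDULER_NUM differs from replicas."""
    spec = statefulset["spec"]
    # Kubernetes defaults spec.replicas to 1 when the field is omitted.
    replicas = spec.get("replicas", 1)
    problems = []
    for container in spec["template"]["spec"]["containers"]:
        for env in container.get("env", []):
            if env.get("name") != "SCHEDULER_NUM":
                continue
            # Environment variable values are strings, so compare as text.
            if env.get("value") != str(replicas):
                problems.append(
                    f"container {container.get('name')!r}: "
                    f"SCHEDULER_NUM={env.get('value')!r} but replicas={replicas}"
                )
    return problems
```

An empty result means every SCHEDULER_NUM found matches replicas. The check does not report containers that omit SCHEDULER_NUM entirely.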
