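Background
Currently, a single scheduler cannot meet high-throughput requirements in some scenarios. Besides optimizing the performance of a single scheduler, another option is to deploy multiple Volcano schedulers to increase overall scheduling throughput.
Solution
Previously, we used labels to divide the cluster nodes into multiple sections, made each Volcano scheduler responsible for one section, and then specified schedulerName in the Pod spec at submission time. This is inconvenient in some cases, especially for large clusters. This document gives users another option for deploying multiple schedulers, one that requires fewer changes to workloads and nodes. A StatefulSet is used to deploy the Volcano schedulers. Jobs and nodes are automatically assigned to schedulers by a hash algorithm.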
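The hash-based assignment can be illustrated with a minimal sketch. It assumes each scheduler knows its own index and the total number of schedulers, and it uses a plain CRC32-modulo mapping chosen for illustration only; the document does not specify the hash Volcano actually uses, and the function names `owner_index` and `is_responsible` are hypothetical.

```python
import zlib


def owner_index(key: str, scheduler_num: int) -> int:
    """Map a job or node name to a scheduler index in [0, scheduler_num).

    Illustrative assumption: CRC32 modulo the scheduler count.
    This is not taken from the Volcano source code.
    """
    if scheduler_num <= 0:
        raise ValueError("scheduler_num must be positive")
    # crc32 is stable across processes, unlike Python's built-in hash().
    return zlib.crc32(key.encode("utf-8")) % scheduler_num


def is_responsible(key: str, my_index: int, scheduler_num: int) -> bool:
    """Return True if the scheduler with index my_index owns this key."""
    return owner_index(key, scheduler_num) == my_index


if __name__ == "__main__":
    # Example: see which of 3 schedulers would own a few node names.
    for name in ("node-1", "node-2", "node-3", "job-a"):
        print(name, "->", owner_index(name, 3))
```

Because every instance evaluates the same deterministic function, each job or node is claimed by exactly one scheduler without any coordination between instances. A plain modulo mapping reassigns most keys when the scheduler count changes, which is one reason a real implementation might prefer a different scheme.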
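Multiple instances are started through a StatefulSet. Each Pod in a StatefulSet has its own ordinal ID, ranging from 0 to replicas - 1, so each instance can easily obtain its ID and take its share of the work.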
kind: StatefulSet
apiVersion: apps/v1
metadata:
  name: volcano-scheduler
  namespace: volcano-system
  labels:
    app: volcano-scheduler
spec:
  replicas: 3
  selector:
    matchLabels:
      app: volcano-scheduler
  serviceName: "volcano-scheduler"
  template:
    metadata:
      labels:
        app: volcano-scheduler
    spec:
      serviceAccount: volcano-scheduler
      containers:
        - name: volcano-scheduler
          image: volcanosh/vc-scheduler:ae78900d21dce8522eb04b6817aac66c9abd01e2
          args:
            - --logtostderr
            - --scheduler-conf=/volcano.scheduler/volcano-scheduler.conf
            - -v=3
            - 2>&1
          imagePullPolicy: "IfNotPresent"
          env:
            - name: MULTI_SCHEDULER_ENABLE
              value: "true"
            - name: SCHEDULER_NUM
              value: "3"
            - name: SCHEDULER_POD_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
          volumeMounts:
            - name: scheduler-config
              mountPath: /volcano.scheduler
      volumes:
        - name: scheduler-config
          configMap:
            name: volcano-scheduler-configmap
---
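apiVersion: v1
kind: Service
metadata:
  name: volcano-scheduler
  # The headless Service must live in the same namespace as the StatefulSet.
  namespace: volcano-system
  labels:
    app: volcano-scheduler
spec:
  ports:
    - port: 80
      name: volcano-scheduler
  clusterIP: None
  selector:
    app: volcano-scheduler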
Note: SCHEDULER_NUM must be the same as the StatefulSet's replicas value.
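To show why the two values must agree, here is a minimal sketch of how an instance could derive its identity from the environment variables in the manifest above. It relies on the Kubernetes convention that StatefulSet Pods are named `<statefulset-name>-<ordinal>`; the function name `scheduler_identity` and the validation logic are assumptions for illustration, not Volcano's actual code.

```python
import os


def scheduler_identity(env=os.environ):
    """Return (index, total) for this replica, or None if multi-scheduler mode is off.

    Hypothetical helper: reads the MULTI_SCHEDULER_ENABLE, SCHEDULER_NUM and
    SCHEDULER_POD_NAME variables set in the StatefulSet manifest.
    """
    if env.get("MULTI_SCHEDULER_ENABLE", "").lower() != "true":
        return None
    total = int(env["SCHEDULER_NUM"])
    pod_name = env["SCHEDULER_POD_NAME"]
    # StatefulSet Pods are named <statefulset-name>-<ordinal>,
    # e.g. volcano-scheduler-0, volcano-scheduler-1, ...
    _, sep, ordinal = pod_name.rpartition("-")
    if not sep or not ordinal.isdecimal():
        raise ValueError(f"pod name {pod_name!r} has no ordinal suffix")
    index = int(ordinal)
    # If SCHEDULER_NUM is smaller than replicas, some Pods get an
    # out-of-range index; fail fast instead of silently misbehaving.
    if not 0 <= index < total:
        raise ValueError(
            f"ordinal {index} is outside 0..{total - 1}; "
            "SCHEDULER_NUM must equal the StatefulSet replicas"
        )
    return index, total
```

With `replicas: 3` and `SCHEDULER_NUM: "3"`, the Pod `volcano-scheduler-2` resolves to index 2 of 3. If SCHEDULER_NUM were larger than replicas, no Pod would own some indices, so under a hash-by-index scheme the corresponding jobs and nodes would be handled by no scheduler; this check cannot detect that case from inside a single Pod.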