Background

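In some scenarios a single scheduler cannot deliver the required scheduling throughput. Besides optimizing the performance of a single scheduler, another option is to deploy multiple Volcano schedulers to raise overall scheduling throughput.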

Solution

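Previously, we used labels to divide the cluster nodes into multiple sections, made each Volcano scheduler responsible for one section, and specified schedulerName in the Pod spec when submitting workloads. This is inconvenient in some cases, especially for large clusters. This document offers another way to deploy multiple schedulers that requires fewer changes to workloads and nodes. A StatefulSet is used to deploy the Volcano schedulers, and jobs and nodes are automatically assigned to schedulers by a hash algorithm.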

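Multiple instances are started through a StatefulSet. Because every Pod in a StatefulSet has its own ordinal ID, ranging from 0 to replicas - 1, each instance can easily obtain its ID and use it to partition the work.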
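Kubernetes names StatefulSet Pods as <statefulset-name>-<ordinal>, so an instance can recover its ID from its own Pod name. The following is a minimal Go sketch of that step, assuming the Pod name is already available to the process (for example through an environment variable). The helper ordinalFromPodName is hypothetical and is not taken from the Volcano source.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// ordinalFromPodName extracts the StatefulSet ordinal from a Pod name such as
// "volcano-scheduler-2". Hypothetical helper for illustration only.
func ordinalFromPodName(podName string) (int, error) {
	idx := strings.LastIndex(podName, "-")
	if idx < 0 || idx == len(podName)-1 {
		return 0, fmt.Errorf("pod name %q has no ordinal suffix", podName)
	}
	ordinal, err := strconv.Atoi(podName[idx+1:])
	if err != nil {
		return 0, fmt.Errorf("pod name %q has a non-numeric suffix: %w", podName, err)
	}
	return ordinal, nil
}

func main() {
	ordinal, err := ordinalFromPodName("volcano-scheduler-2")
	fmt.Println(ordinal, err) // prints: 2 <nil>
}
```

Returning an error instead of defaulting to 0 avoids a misnamed Pod silently claiming the first instance's share of the work.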

    kind: StatefulSet
    apiVersion: apps/v1
    metadata:
      name: volcano-scheduler
      namespace: volcano-system
      labels:
        app: volcano-scheduler
    spec:
      replicas: 3
      selector:
        matchLabels:
          app: volcano-scheduler
      serviceName: "volcano-scheduler"
      template:
        metadata:
          labels:
            app: volcano-scheduler
        spec:
          serviceAccount: volcano-scheduler
          containers:
            - name: volcano-scheduler
              image: volcanosh/vc-scheduler:ae78900d21dce8522eb04b6817aac66c9abd01e2
              args:
                - --logtostderr
                - --scheduler-conf=/volcano.scheduler/volcano-scheduler.conf
                - -v=3
                - 2>&1
              imagePullPolicy: "IfNotPresent"
              env:
                - name: MULTI_SCHEDULER_ENABLE
                  value: "true"
                - name: SCHEDULER_NUM
                  value: "3"
                - name: SCHEDULER_POD_NAME
                  valueFrom:
                    fieldRef:
                      fieldPath: metadata.name
              volumeMounts:
                - name: scheduler-config
                  mountPath: /volcano.scheduler
          volumes:
            - name: scheduler-config
              configMap:
                name: volcano-scheduler-configmap
    ---
    apiVersion: v1
    kind: Service
    metadata:
      name: volcano-scheduler
      labels:
        app: volcano-scheduler
    spec:
      ports:
        - port: 80
          name: volcano-scheduler
      clusterIP: None
      selector:
        app: volcano-scheduler

Note: SCHEDULER_NUM must be equal to the StatefulSet's replicas.
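To illustrate why the two values have to agree, here is a minimal Go sketch of hash-based assignment. This document only says that "a hash algorithm" is used, so the scheme below (FNV-1a of the key, modulo the instance count) is an assumption chosen for simplicity and may differ from what Volcano actually implements. The function responsibleFor and the sample keys are hypothetical.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// responsibleFor reports whether the scheduler instance with the given ordinal
// should handle key (for example a job or node name). Assumed scheme:
// FNV-1a hash of the key modulo the number of schedulers.
func responsibleFor(key string, ordinal, schedulerNum int) bool {
	if schedulerNum <= 0 || ordinal < 0 || ordinal >= schedulerNum {
		return false // an instance outside 0..schedulerNum-1 owns nothing
	}
	h := fnv.New32a()
	h.Write([]byte(key)) // Write on a hash.Hash never returns an error
	return int(h.Sum32()%uint32(schedulerNum)) == ordinal
}

func main() {
	const schedulerNum = 3 // would come from SCHEDULER_NUM
	for _, key := range []string{"default/job-a", "default/job-b", "node-1"} {
		for ordinal := 0; ordinal < schedulerNum; ordinal++ {
			if responsibleFor(key, ordinal, schedulerNum) {
				fmt.Printf("%s -> volcano-scheduler-%d\n", key, ordinal)
			}
		}
	}
}
```

Under this assumed scheme, every key maps to exactly one ordinal in 0..SCHEDULER_NUM-1. If SCHEDULER_NUM were larger than replicas, keys hashed to a missing ordinal would have no running instance to schedule them. If it were smaller, the instances with higher ordinals would receive no work.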