Background

The NVIDIA device plugin provides exclusive GPU access. For shared GPUs there used to be AliyunContainerService/gpushare-device-plugin, but it has gone unmaintained for a long time. Sharing mainly serves the company's small models: their call volume is low, so dedicating an entire GPU to each one would be wasteful.

Managing GPU Resources

How should shared and exclusive GPU resources be partitioned? GPU device plugins must be mutually exclusive: a machine running the NVIDIA device plugin must not also run volcano-device-plugin. We can therefore pin each plugin to its nodes with node labels, and manage the plugin deployments with DaemonSet controllers.

Exclusive GPU Management

Machine Setup

Machines whose exclusive GPU resources should be managed by the NVIDIA device plugin get the classification label classify=nvidia-gpu-plugin:

```
kubectl label node <node-name> classify=nvidia-gpu-plugin
```

Plugin Deployment

Deploy nvidia-device-plugin.yaml:

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: nvidia-device-plugin-daemonset
  namespace: kube-system
spec:
  selector:
    matchLabels:
      name: nvidia-device-plugin-ds
  updateStrategy:
    type: RollingUpdate
  template:
    metadata:
      annotations:
        scheduler.alpha.kubernetes.io/critical-pod: ""
      labels:
        name: nvidia-device-plugin-ds
    spec:
      tolerations:
        - key: CriticalAddonsOnly
          operator: Exists
        - key: nvidia.com/gpu
          operator: Exists
          effect: NoSchedule
      priorityClassName: "system-node-critical"
      nodeSelector:
        classify: nvidia-gpu-plugin # exclusive-GPU machines only
      containers:
        - image: nvcr.io/nvidia/k8s-device-plugin:v0.10.0
          name: nvidia-device-plugin-ctr
          args: ["--fail-on-init-error=false"]
          securityContext:
            allowPrivilegeEscalation: false
            capabilities:
              drop: ["ALL"]
          volumeMounts:
            - name: device-plugin
              mountPath: /var/lib/kubelet/device-plugins
      volumes:
        - name: device-plugin
          hostPath:
            path: /var/lib/kubelet/device-plugins
```

Apply the Configuration

```
$ kubectl apply -f nvidia-device-plugin.yaml -n kube-system
```
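Once the plugin registers the nvidia.com/gpu resource on a node, a workload claims a whole card through resources.limits. A minimal sketch (the pod name and image are illustrative, not from this setup):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: cuda-test # illustrative name
spec:
  nodeSelector:
    classify: nvidia-gpu-plugin # land on an exclusive-GPU node
  containers:
    - name: cuda-container
      image: nvcr.io/nvidia/cuda:11.0-base # illustrative image
      resources:
        limits:
          nvidia.com/gpu: 1 # one whole GPU, used exclusively
```

The nvidia.com/gpu limit must be an integer; the device plugin does not split a card between pods.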

Shared GPU Management

Machine Setup

Machines whose shared GPU resources should be managed by the Volcano device plugin get the classification label classify=volcano-gpu-share:

```
kubectl label node <node-name> classify=volcano-gpu-share
```

Plugin Deployment

Reference: https://github.com/volcano-sh/devices/blob/master/README.md#quick-start
volcano-device-plugin.yml:

```yaml
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: volcano-device-plugin
  namespace: kube-system
---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: volcano-device-plugin
rules:
  - apiGroups: [""]
    resources: ["nodes"]
    verbs: ["get"]
  - apiGroups: [""]
    resources: ["nodes/status"]
    verbs: ["patch"]
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["get", "list", "update"]
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: volcano-device-plugin
subjects:
  - kind: ServiceAccount
    name: volcano-device-plugin
    namespace: kube-system
roleRef:
  kind: ClusterRole
  name: volcano-device-plugin
  apiGroup: rbac.authorization.k8s.io
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: volcano-device-plugin
  namespace: kube-system
spec:
  selector:
    matchLabels:
      name: volcano-device-plugin
  updateStrategy:
    type: RollingUpdate
  template:
    metadata:
      annotations:
        scheduler.alpha.kubernetes.io/critical-pod: ""
      labels:
        name: volcano-device-plugin
    spec:
      tolerations:
        - key: CriticalAddonsOnly
          operator: Exists
        - key: volcano.sh/gpu-memory
          operator: Exists
          effect: NoSchedule
      priorityClassName: "system-node-critical"
      serviceAccount: volcano-device-plugin
      nodeSelector:
        classify: volcano-gpu-share # only nodes labeled for shared GPU run this plugin
      containers:
        - image: volcanosh/volcano-device-plugin:latest
          name: volcano-device-plugin
          env:
            - name: NODE_NAME
              valueFrom:
                fieldRef:
                  fieldPath: spec.nodeName
          securityContext:
            allowPrivilegeEscalation: false
            capabilities:
              drop: ["ALL"]
          volumeMounts:
            - name: device-plugin
              mountPath: /var/lib/kubelet/device-plugins
      volumes:
        - name: device-plugin
          hostPath:
            path: /var/lib/kubelet/device-plugins
```

Apply the Configuration

```
$ kubectl apply -f volcano-device-plugin.yml -n kube-system
```

Enable the Sharing Option in the Scheduler

```
$ kubectl edit cm -n volcano-system volcano-scheduler-configmap
```

Configure it as follows:

```yaml
kind: ConfigMap
apiVersion: v1
metadata:
  name: volcano-scheduler-configmap
  namespace: volcano-system
data:
  volcano-scheduler.conf: |
    actions: "enqueue, allocate, backfill"
    tiers:
    - plugins:
      - name: priority
      - name: gang
      - name: conformance
    - plugins:
      - name: drf
      - name: predicates
        arguments:
          predicate.GPUSharingEnable: true # enable gpu sharing
      - name: proportion
      - name: nodeorder
      - name: binpack
    # ... other settings
```

Check that startup succeeded:

```
$ kubectl describe nodes
....
Capacity:
  cpu:                    64
  ephemeral-storage:      148960192Ki
  hugepages-1Gi:          0
  hugepages-2Mi:          0
  memory:                 528054860Ki
  nvidia.com/gpu:         5     # nvidia.com/gpu present means the NVIDIA plugin is working
  pods:                   110
  volcano.sh/gpu-memory:  97072 # volcano.sh/gpu-memory present means the Volcano plugin is working
  volcano.sh/gpu-number:  4
....
```
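With sharing enabled, a pod requests a slice of GPU memory rather than a whole card, and the scheduler can pack several such pods onto one device. A minimal sketch (the pod name, image, and the 1024 MiB figure are illustrative; it assumes Volcano is the scheduler handling these pods):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-share-test # illustrative name
spec:
  schedulerName: volcano # assumes these pods are scheduled by Volcano
  nodeSelector:
    classify: volcano-gpu-share # land on a shared-GPU node
  containers:
    - name: cuda-container
      image: nvcr.io/nvidia/cuda:11.0-base # illustrative image
      resources:
        limits:
          volcano.sh/gpu-memory: 1024 # MiB of GPU memory, not a whole card
```

The node's volcano.sh/gpu-memory capacity shown above (97072) is the pool these per-pod requests are allocated from.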

Node Role Conversion

To convert a node between the shared and exclusive GPU roles, do not simply change its label: instances already running on it would not be moved to other machines. Instead, first take the node out of service, wait for all tasks to finish or actively evict them, and only then change the label.

  1. Mark the node unschedulable

     ```
     $ kubectl cordon <node-name>
     ```
  2. Wait for the node's pods to exit, or actively evict them

  3. Change the node label

     ```
     $ kubectl label node <node-name> --overwrite classify=[volcano-gpu-share|nvidia-gpu-plugin]
     ```
  4. Restore scheduling on the node

     ```
     $ kubectl uncordon <node-name>
     ```
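Step 2 above can be done in one shot with kubectl drain, which cordons the node (if not already cordoned) and evicts its pods; the flags shown are the usual ones for nodes running DaemonSets such as the device plugins themselves:

```shell
# Evict all evictable pods from the node. DaemonSet-managed pods
# (including the GPU device plugin) are skipped rather than evicted,
# and emptyDir volumes on the node are discarded.
kubectl drain <node-name> --ignore-daemonsets --delete-emptydir-data
```

On clusters older than v1.20, the last flag is spelled --delete-local-data instead.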