Background
The NVIDIA device plugin allocates GPUs exclusively. For GPU sharing there used to be AliyunContainerService/gpushare-device-plugin, but it has gone unmaintained for quite a while. Sharing mainly serves the company's small models: their call volume is low, so dedicating a whole card to each of them would be wasteful.
Managing GPU Resources
How should shared and exclusive GPU resources be divided? GPU device plugins must be mutually exclusive on a machine; for example, a node running the NVIDIA device plugin must not also run volcano-device-plugin. We therefore pin each plugin to its machines with a node label and deploy the plugins through DaemonSet controllers.
Exclusive GPU Management
Machine setup
Machines whose exclusive GPUs should be managed by the NVIDIA device plugin get the classification label classify=nvidia-gpu-plugin:
kubectl label node <node-name> classify=nvidia-gpu-plugin
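To confirm which machines carry the label (the same check works for the shared-GPU label below), list nodes by label selector:

$ kubectl get nodes -l classify=nvidia-gpu-plugin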
Plugin deployment
Deploy nvidia-device-plugin.yaml:
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: nvidia-device-plugin-daemonset
  namespace: kube-system
spec:
  selector:
    matchLabels:
      name: nvidia-device-plugin-ds
  updateStrategy:
    type: RollingUpdate
  template:
    metadata:
      annotations:
        scheduler.alpha.kubernetes.io/critical-pod: ""
      labels:
        name: nvidia-device-plugin-ds
    spec:
      tolerations:
      - key: CriticalAddonsOnly
        operator: Exists
      - key: nvidia.com/gpu
        operator: Exists
        effect: NoSchedule
      priorityClassName: "system-node-critical"
      nodeSelector:
        classify: nvidia-gpu-plugin   # only exclusive-GPU machines run this plugin
      containers:
      - image: nvcr.io/nvidia/k8s-device-plugin:v0.10.0
        name: nvidia-device-plugin-ctr
        args: ["--fail-on-init-error=false"]
        securityContext:
          allowPrivilegeEscalation: false
          capabilities:
            drop: ["ALL"]
        volumeMounts:
        - name: device-plugin
          mountPath: /var/lib/kubelet/device-plugins
      volumes:
      - name: device-plugin
        hostPath:
          path: /var/lib/kubelet/device-plugins
Apply the configuration:
$ kubectl apply -f nvidia-device-plugin.yaml -n kube-system
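Once the plugin has registered, a workload requests a whole card through the nvidia.com/gpu extended resource. A minimal sketch of such a pod (the pod name and image are illustrative, not part of this setup):

apiVersion: v1
kind: Pod
metadata:
  name: exclusive-gpu-demo          # hypothetical name, for illustration
spec:
  nodeSelector:
    classify: nvidia-gpu-plugin     # land on an exclusive-GPU machine
  containers:
  - name: cuda
    image: nvcr.io/nvidia/cuda:11.0-base   # example image
    command: ["nvidia-smi"]
    resources:
      limits:
        nvidia.com/gpu: 1           # one whole GPU; this plugin never splits a device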
Shared GPU Management
Machine configuration
Machines whose shared GPUs should be managed by the volcano device plugin get the classification label classify=volcano-gpu-share:
kubectl label node <node-name> classify=volcano-gpu-share
Plugin deployment
Reference: https://github.com/volcano-sh/devices/blob/master/README.md#quick-start
volcano-device-plugin.yml
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: volcano-device-plugin
  namespace: kube-system
---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: volcano-device-plugin
rules:
- apiGroups: [""]
  resources: ["nodes"]
  verbs: ["get"]
- apiGroups: [""]
  resources: ["nodes/status"]
  verbs: ["patch"]
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "list", "update"]
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: volcano-device-plugin
subjects:
- kind: ServiceAccount
  name: volcano-device-plugin
  namespace: kube-system
roleRef:
  kind: ClusterRole
  name: volcano-device-plugin
  apiGroup: rbac.authorization.k8s.io
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: volcano-device-plugin
  namespace: kube-system
spec:
  selector:
    matchLabels:
      name: volcano-device-plugin
  updateStrategy:
    type: RollingUpdate
  template:
    metadata:
      annotations:
        scheduler.alpha.kubernetes.io/critical-pod: ""
      labels:
        name: volcano-device-plugin
    spec:
      tolerations:
      - key: CriticalAddonsOnly
        operator: Exists
      - key: volcano.sh/gpu-memory
        operator: Exists
        effect: NoSchedule
      priorityClassName: "system-node-critical"
      serviceAccount: volcano-device-plugin
      nodeSelector:
        classify: volcano-gpu-share   # only machines labeled for shared GPU run this plugin
      containers:
      - image: volcanosh/volcano-device-plugin:latest
        name: volcano-device-plugin
        env:
        - name: NODE_NAME
          valueFrom:
            fieldRef:
              fieldPath: spec.nodeName
        securityContext:
          allowPrivilegeEscalation: false
          capabilities:
            drop: ["ALL"]
        volumeMounts:
        - name: device-plugin
          mountPath: /var/lib/kubelet/device-plugins
      volumes:
      - name: device-plugin
        hostPath:
          path: /var/lib/kubelet/device-plugins
Apply the configuration:
$ kubectl apply -f volcano-device-plugin.yml -n kube-system
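Before checking node capacity, you can verify the DaemonSet pods came up on the labeled machines (the label selector matches the manifest above):

$ kubectl get pods -n kube-system -l name=volcano-device-plugin -o wide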
Enable the sharing option in the scheduler
$ kubectl edit cm -n volcano-system volcano-scheduler-configmap
Set the following:
kind: ConfigMap
apiVersion: v1
metadata:
  name: volcano-scheduler-configmap
  namespace: volcano-system
data:
  volcano-scheduler.conf: |
    actions: "enqueue, allocate, backfill"
    tiers:
    - plugins:
      - name: priority
      - name: gang
      - name: conformance
    - plugins:
      - name: drf
      - name: predicates
        arguments:
          predicate.GPUSharingEnable: true   # enable gpu sharing
      - name: proportion
      - name: nodeorder
      - name: binpack
    # ... other settings
Check whether startup succeeded:
$ kubectl describe nodes
...
Capacity:
  cpu:                    64
  ephemeral-storage:      148960192Ki
  hugepages-1Gi:          0
  hugepages-2Mi:          0
  memory:                 528054860Ki
  nvidia.com/gpu:         5       # nvidia.com/gpu appearing means the NVIDIA plugin succeeded
  pods:                   110
  volcano.sh/gpu-memory:  97072   # volcano.sh/gpu-memory appearing means the volcano plugin succeeded
  volcano.sh/gpu-number:  4
...
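With sharing enabled, a pod asks for a slice of GPU memory instead of a whole card, and the volcano scheduler may pack several such pods onto the same device. A minimal sketch (the pod name, image, and the 1024 figure are illustrative):

apiVersion: v1
kind: Pod
metadata:
  name: shared-gpu-demo             # hypothetical name, for illustration
spec:
  containers:
  - name: cuda
    image: nvcr.io/nvidia/cuda:11.0-base   # example image
    resources:
      limits:
        volcano.sh/gpu-memory: 1024 # GPU memory in MiB; other pods may share the same card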
Node role conversion
To switch a node between the shared and exclusive GPU roles, do not just edit the label in place: the instances already running on the node would not be moved to other machines. Take the node out of service first, wait for all tasks to finish or actively evict them, and only then change the label.
Mark the machine unschedulable:
$ kubectl cordon <node-name>
Wait for the pods on the node to exit, or evict them proactively (see the drain sketch below).
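One way to evict the remaining pods is kubectl drain, skipping DaemonSet-managed pods such as the device plugin itself:

$ kubectl drain <node-name> --ignore-daemonsets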
Change the node label:
$ kubectl label node <node-name> --overwrite classify=[volcano-gpu-share|nvidia-gpu-plugin]
Restore scheduling on the node:
$ kubectl uncordon <node-name>
