Background
The NVIDIA device plugin allocates GPUs exclusively: each GPU is bound to a single pod. For GPU sharing there used to be AliyunContainerService/gpushare-device-plugin, but it has gone unmaintained for a long time. Sharing matters for the company's small models, which see little traffic, so dedicating a whole GPU to each of them would be wasteful.
Managing GPU Resources
How do we split machines between shared and exclusive GPU resources? GPU device plugins must be mutually exclusive on a node: a machine running the NVIDIA device plugin must not also run volcano-device-plugin. We therefore pin each plugin to its set of nodes with node labels and roll the plugins out with DaemonSet controllers.
Exclusive GPU Management
Machine Setup
Machines whose exclusive GPUs are managed by the NVIDIA device plugin get the classification label classify=nvidia-gpu-plugin:
kubectl label node <node-name> classify=nvidia-gpu-plugin
Plugin Deployment
Deploy nvidia-device-plugin.yaml:
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: nvidia-device-plugin-daemonset
  namespace: kube-system
spec:
  selector:
    matchLabels:
      name: nvidia-device-plugin-ds
  updateStrategy:
    type: RollingUpdate
  template:
    metadata:
      annotations:
        scheduler.alpha.kubernetes.io/critical-pod: ""
      labels:
        name: nvidia-device-plugin-ds
    spec:
      tolerations:
      - key: CriticalAddonsOnly
        operator: Exists
      - key: nvidia.com/gpu
        operator: Exists
        effect: NoSchedule
      priorityClassName: "system-node-critical"
      nodeSelector:
        classify: nvidia-gpu-plugin   # nvidia-gpu-plugin marks exclusive-GPU machines
      containers:
      - image: nvcr.io/nvidia/k8s-device-plugin:v0.10.0
        name: nvidia-device-plugin-ctr
        args: ["--fail-on-init-error=false"]
        securityContext:
          allowPrivilegeEscalation: false
          capabilities:
            drop: ["ALL"]
        volumeMounts:
        - name: device-plugin
          mountPath: /var/lib/kubelet/device-plugins
      volumes:
      - name: device-plugin
        hostPath:
          path: /var/lib/kubelet/device-plugins
Apply the configuration:
$ kubectl apply -f nvidia-device-plugin.yaml -n kube-system
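Once the DaemonSet is up, a workload requests a whole card through the nvidia.com/gpu extended resource. A minimal sketch (the pod name, image, and command are placeholders, not part of the setup above):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: cuda-test                  # placeholder name
spec:
  nodeSelector:
    classify: nvidia-gpu-plugin    # land on an exclusive-GPU machine
  restartPolicy: Never
  containers:
  - name: cuda-container
    image: nvcr.io/nvidia/cuda:11.4.2-base-ubuntu20.04
    command: ["nvidia-smi"]
    resources:
      limits:
        nvidia.com/gpu: 1          # one whole GPU, not shared with other pods
```

The device plugin attaches the GPU to this pod exclusively until the pod terminates; no other pod can be scheduled onto that card in the meantime.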
Shared GPU Management
Machine Setup
Machines whose shared GPUs are managed by the volcano device plugin get the classification label classify=volcano-gpu-share:
kubectl label node <node-name> classify=volcano-gpu-share
Plugin Deployment
Reference: https://github.com/volcano-sh/devices/blob/master/README.md#quick-start
volcano-device-plugin.yml:
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: volcano-device-plugin
  namespace: kube-system
---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: volcano-device-plugin
rules:
- apiGroups: [""]
  resources: ["nodes"]
  verbs: ["get"]
- apiGroups: [""]
  resources: ["nodes/status"]
  verbs: ["patch"]
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "list", "update"]
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: volcano-device-plugin
subjects:
- kind: ServiceAccount
  name: volcano-device-plugin
  namespace: kube-system
roleRef:
  kind: ClusterRole
  name: volcano-device-plugin
  apiGroup: rbac.authorization.k8s.io
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: volcano-device-plugin
  namespace: kube-system
spec:
  selector:
    matchLabels:
      name: volcano-device-plugin
  updateStrategy:
    type: RollingUpdate
  template:
    metadata:
      annotations:
        scheduler.alpha.kubernetes.io/critical-pod: ""
      labels:
        name: volcano-device-plugin
    spec:
      tolerations:
      - key: CriticalAddonsOnly
        operator: Exists
      - key: volcano.sh/gpu-memory
        operator: Exists
        effect: NoSchedule
      priorityClassName: "system-node-critical"
      serviceAccount: volcano-device-plugin
      nodeSelector:
        classify: volcano-gpu-share   # start this plugin only on machines labeled for GPU sharing
      containers:
      - image: volcanosh/volcano-device-plugin:latest
        name: volcano-device-plugin
        env:
        - name: NODE_NAME
          valueFrom:
            fieldRef:
              fieldPath: spec.nodeName
        securityContext:
          allowPrivilegeEscalation: false
          capabilities:
            drop: ["ALL"]
        volumeMounts:
        - name: device-plugin
          mountPath: /var/lib/kubelet/device-plugins
      volumes:
      - name: device-plugin
        hostPath:
          path: /var/lib/kubelet/device-plugins
Apply the configuration:
$ kubectl apply -f volcano-device-plugin.yml -n kube-system
Enable GPU Sharing in the Scheduler
$ kubectl edit cm -n volcano-system volcano-scheduler-configmap
Set the following:
kind: ConfigMap
apiVersion: v1
metadata:
  name: volcano-scheduler-configmap
  namespace: volcano-system
data:
  volcano-scheduler.conf: |
    actions: "enqueue, allocate, backfill"
    tiers:
    - plugins:
      - name: priority
      - name: gang
      - name: conformance
    - plugins:
      - name: drf
      - name: predicates
        arguments:
          predicate.GPUSharingEnable: true   # enable gpu sharing
      - name: proportion
      - name: nodeorder
      - name: binpack
    # ... other settings
Check that the plugins started successfully:
$ kubectl describe nodes
....
Capacity:
  cpu:                    64
  ephemeral-storage:      148960192Ki
  hugepages-1Gi:          0
  hugepages-2Mi:          0
  memory:                 528054860Ki
  nvidia.com/gpu:         5       # nvidia.com/gpu appearing means the NVIDIA plugin is working
  pods:                   110
  volcano.sh/gpu-memory:  97072   # volcano.sh/gpu-memory appearing means the volcano plugin is working
  volcano.sh/gpu-number:  4
....
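With sharing enabled, a pod on a shared node requests a slice of GPU memory rather than a whole card, and the volcano scheduler packs several such pods onto one GPU. A minimal sketch (pod name, image, and memory figure are placeholders; volcano.sh/gpu-memory is counted in MiB, matching the node capacity shown above):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-share-test             # placeholder name
spec:
  schedulerName: volcano           # the volcano scheduler makes the sharing decision
  restartPolicy: Never
  containers:
  - name: small-model
    image: nvcr.io/nvidia/cuda:11.4.2-base-ubuntu20.04
    command: ["nvidia-smi"]
    resources:
      limits:
        volcano.sh/gpu-memory: 1024   # 1024 MiB of one GPU's memory
```

The node's toleration/label setup already steers such pods to classify=volcano-gpu-share machines, since only those nodes advertise the volcano.sh/gpu-memory resource.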
Switching Node Roles
To move a node between the shared and exclusive GPU roles, do not simply overwrite the label: instances already running on the node would not be moved to other machines. Instead, take the node out of service first, wait for all tasks to finish or actively evict them, and only then change the label.
Mark the node unschedulable:
$ kubectl cordon <node-name>
Wait for the node's pods to exit, or evict them proactively, e.g.:
$ kubectl drain <node-name> --ignore-daemonsets
Change the node label:
$ kubectl label node <node-name> --overwrite classify=[volcano-gpu-share|nvidia-gpu-plugin]
Restore scheduling on the node:
$ kubectl uncordon <node-name>