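The default scheduler's scheduling process
1. Predicates (filtering): from all nodes, select those that meet the Pod's basic requirements.
2. Priorities (scoring): among the nodes that passed filtering, compute a score for each node with the priority functions and rank the nodes by score.
3. Selection: pick the node with the highest score to run the Pod; if several nodes tie for the highest score, one of them is chosen at random.
You can influence the filtering and scoring phases with your own settings so that the scheduling result matches what you expect.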
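Ways to influence scheduling
1. Node selector: nodeSelector; you can even set nodeName to pick one specific node directly.
2. Affinity scheduling: nodeAffinity (node affinity), podAffinity (Pod affinity), podAntiAffinity (Pod anti-affinity)
3. Taints and tolerations: Taint, toleration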
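The first and third options can be sketched as two small manifests. This is only an illustration under assumptions: the Pod names, the node name `k8s-node1`, and the taint `dedicated=gpu` are hypothetical placeholders, not values from a real cluster.

```yaml
# Option 1: nodeName binds the Pod to one node and bypasses the scheduler.
apiVersion: v1
kind: Pod
metadata:
  name: nodename-demo            # hypothetical name
spec:
  nodeName: k8s-node1            # assumed node name
  containers:
  - name: nginx
    image: nginx:latest
---
# Option 3: a toleration lets the Pod be scheduled onto a tainted node.
# Assumed taint, applied beforehand with:
#   kubectl taint nodes <node-name> dedicated=gpu:NoSchedule
apiVersion: v1
kind: Pod
metadata:
  name: toleration-demo          # hypothetical name
spec:
  containers:
  - name: nginx
    image: nginx:latest
  tolerations:
  - key: dedicated               # must match the taint key
    operator: Equal
    value: gpu                   # must match the taint value
    effect: NoSchedule           # must match the taint effect
```

A toleration only permits scheduling onto the tainted node; it does not attract the Pod to it. To pin the Pod there as well, it would have to be combined with nodeSelector or nodeAffinity.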
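| Policy | Match target | Supported operators | Topology domain support | Purpose |
|---|---|---|---|---|
| nodeAffinity | Node labels | In, NotIn, Exists, DoesNotExist, Gt, Lt | Not supported | Determines which nodes a Pod can be deployed on |
| podAffinity | Pod labels | In, NotIn, Exists, DoesNotExist | Supported | Determines which Pods a Pod can share a topology domain with |
| podAntiAffinity | Pod labels | In, NotIn, Exists, DoesNotExist | Supported | Determines which Pods a Pod must not share a topology domain with |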
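Match expressions
Affinity offers more flexible and expressive scheduling than nodeSelector or taints, for example through a richer set of match operators (In, NotIn, Exists, DoesNotExist, Gt and Lt). Each expression tests a label with one of the following operators:
- In: the label's value is in the given list
- NotIn: the label's value is not in the given list
- Exists: the label exists
- DoesNotExist: the label does not exist
- Gt: the label's value is greater than the given value (both are parsed as integers, not compared as strings; nodeAffinity only)
- Lt: the label's value is less than the given value (both are parsed as integers, not compared as strings; nodeAffinity only)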
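As a rough sketch of how the operators look in a manifest, the Pod below combines Exists and Gt in one term. The Pod name and the `cpu-cores` label are hypothetical; `disk-type` is the label key used later in this article. Note that Kubernetes parses the Gt/Lt operand as an integer, and that `values` is a list of strings, so the number must be quoted.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: operator-demo              # hypothetical name
spec:
  containers:
  - name: nginx
    image: nginx:latest
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: disk-type         # Exists/DoesNotExist take no values
            operator: Exists
          - key: cpu-cores         # hypothetical node label, e.g. cpu-cores=16
            operator: Gt
            values:
            - "8"                  # exactly one value, parsed as an integer
```

Under these assumptions, only nodes that carry a `disk-type` label and whose `cpu-cores` label is an integer greater than 8 pass the filtering phase.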
Scheduling by Pod-to-Node relationship
nodeSelector: directed scheduling
With nodeSelector, a Pod can be scheduled onto specific nodes.
(1) First, label the node with kubectl label:
kubectl label nodes <node-name> <label-key>=<label-value>
To remove a label, append "-" to the key:
kubectl label nodes <node-name> <label-key>-
(2) Then add a nodeSelector to the Pod definition:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
spec:
  selector:
    matchLabels:
      app: nginx
  replicas: 1
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:latest
      nodeSelector:
        key-1: value-1
Note that if a Pod specifies a nodeSelector and no node in the cluster carries the matching label, the Pod cannot be scheduled, even if other nodes in the cluster are available.
nodeAffinity: node affinity scheduling
kubectl explain pod.spec.affinity.nodeAffinity
- requiredDuringSchedulingIgnoredDuringExecution: hard affinity (filtering phase); the rule must be satisfied
  - nodeSelectorTerms: a list of node selector terms, each made up of:
    - matchExpressions: expressions matched against node labels. For example, with key zone, operator In and values foo and bar, the Pod is scheduled only to nodes whose zone label is foo or bar
    - matchFields: expressions matched against node fields (for example metadata.name)
- preferredDuringSchedulingIgnoredDuringExecution: soft affinity (scoring phase); satisfied if possible, but not mandatory
  - preference: the node selector term that the weight applies to
    - matchExpressions
    - matchFields
  - weight: the weight of the preference, in the range 1-100
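Example: with the hard affinity below, the Pod is scheduled only to a node that carries the label key=value or key=foo and whose name is k8s-node1 (matchExpressions and matchFields inside the same term must both match). If no node satisfies the rule, the Pod is not scheduled to any node and stays in the Pending state. The soft rules additionally prefer nodes labeled disk-type=ssd (weight 40) and the node named k8s-node1 (weight 50).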
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deploy
  namespace: default
spec:
  selector:
    matchLabels:
      app: nginx
  replicas: 1
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:latest
        imagePullPolicy: IfNotPresent
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:   # hard rule (filtering)
            nodeSelectorTerms:
            - matchExpressions:       # match node labels
              - key: key
                operator: In
                values:
                - value
                - foo
              matchFields:            # match node fields
              - key: metadata.name
                operator: In
                values:
                - k8s-node1
          preferredDuringSchedulingIgnoredDuringExecution:  # soft rule (scoring)
          - weight: 40                # weight
            preference:
              matchExpressions:       # match node labels
              - key: disk-type
                operator: In
                values:
                - ssd
          - weight: 50                # weight
            preference:
              matchFields:            # match node fields
              - key: metadata.name
                operator: In
                values:
                - k8s-node1
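nodeAffinity summary:
- If both nodeSelector and nodeAffinity are specified, the node must satisfy both.
- If nodeAffinity has multiple nodeSelectorTerms, the node only needs to satisfy one of them.
- If a nodeSelectorTerm has multiple matchExpressions, the node must satisfy all of them.
- Because of IgnoredDuringExecution, changing node labels does not affect Pods that are already running.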
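The AND/OR rules in this summary can be illustrated with a single Pod. This is a sketch only: the Pod name is hypothetical, and the label keys `key-1`, `zone` and `disk-type` are reused from the examples in this article.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: node-affinity-terms-demo   # hypothetical name
spec:
  containers:
  - name: nginx
    image: nginx:latest
  nodeSelector:                    # must match in addition to nodeAffinity
    key-1: value-1
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:         # terms are ORed: one matching term is enough
        - matchExpressions:        # expressions in one term are ANDed
          - key: zone
            operator: In
            values:
            - foo
          - key: disk-type
            operator: In
            values:
            - ssd
        - matchExpressions:
          - key: zone
            operator: In
            values:
            - bar
```

With this manifest a node qualifies only if it has key-1=value-1 and, in addition, either both zone=foo and disk-type=ssd, or zone=bar.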
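Scheduling by Pod-to-Pod relationship
podAffinity: Pod affinity scheduling
Must be scheduled together with certain Pods:
requiredDuringSchedulingIgnoredDuringExecution
Preferably scheduled together with certain Pods:
preferredDuringSchedulingIgnoredDuringExecution
topologyKey: the topology domain, identified by a node label key; nodes that share the same value for that label belong to the same domain
See the official Kubernetes documentation on topologyKey.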
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-test-affinity
  namespace: default
spec:
  selector:
    matchLabels:
      app: nginx
  replicas: 1
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:latest
        imagePullPolicy: IfNotPresent
      affinity:
        podAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:   # hard rule (filtering): must be co-located with matching Pods
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - nginx
            topologyKey: kubernetes.io/hostname   # node label key whose values group nodes into topology domains
          preferredDuringSchedulingIgnoredDuringExecution:  # soft rule (scoring)
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchExpressions:
                - key: app
                  operator: In
                  values:
                  - nginx
              topologyKey: kubernetes.io/hostname
If the kubernetes.io/hostname label is removed from the nodes before the Deployment is created, the new Pod stays Pending because no node matches.
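With kubernetes.io/hostname as the topologyKey, every node is its own topology domain. A wider domain can be sketched by switching to a zone label. The manifest below assumes the nodes carry the well-known label `topology.kubernetes.io/zone`; the Pod name and the `app: web` label are hypothetical.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: zone-affinity-demo         # hypothetical name
  labels:
    app: web                       # hypothetical label
spec:
  containers:
  - name: web
    image: nginx:latest
  affinity:
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: app
            operator: In
            values:
            - nginx                # co-locate with Pods labeled app=nginx
        # Nodes with the same value for this label form one topology domain,
        # so any node in the same zone as an app=nginx Pod is acceptable.
        topologyKey: topology.kubernetes.io/zone
```

Under this assumption the Pod may land on a different node than the nginx Pod, as long as both nodes report the same zone value.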
podAntiAffinity: Pod anti-affinity
Must not be scheduled together with certain Pods:
requiredDuringSchedulingIgnoredDuringExecution
Preferably not scheduled together with certain Pods:
preferredDuringSchedulingIgnoredDuringExecution
apiVersion: apps/v1
kind: Deployment
metadata:
  name: test-podantiaffinity
  namespace: default
spec:
  selector:
    matchLabels:
      app: nginx
  replicas: 1
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:latest
        imagePullPolicy: IfNotPresent
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - nginx
            topologyKey: kubernetes.io/hostname
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchExpressions:
                - key: app
                  operator: In
                  values:
                  - nginx
              topologyKey: kubernetes.io/hostname
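A common use of hard anti-affinity is to keep the replicas of one workload on separate nodes. The following is a minimal sketch of that idea, not a recommended production setup; the Deployment name and the `app: nginx-spread` label are hypothetical.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-spread               # hypothetical name
  namespace: default
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx-spread
  template:
    metadata:
      labels:
        app: nginx-spread          # hypothetical label
    spec:
      containers:
      - name: nginx
        image: nginx:latest
        imagePullPolicy: IfNotPresent
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - nginx-spread     # repel Pods of this same Deployment
            topologyKey: kubernetes.io/hostname   # at most one replica per node
```

Because the rule is required, a cluster with fewer than three schedulable nodes would leave the extra replicas Pending. Moving the same term under preferredDuringSchedulingIgnoredDuringExecution would let them share a node when necessary.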
