Viewing Events

  [root@clientvm ~]# kubectl describe pod nginx2
  [root@clientvm ~]# kubectl logs nginx2
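kubectl describe only shows the events of a single object. To get an overview of recent events in a namespace or the whole cluster, they can also be listed directly, for example:

  [root@clientvm ~]# kubectl get events --sort-by=.metadata.creationTimestamp
  [root@clientvm ~]# kubectl get events -n kube-system --sort-by=.metadata.creationTimestamp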

Node Maintenance

Disable Scheduling

  [root@clientvm ~]# kubectl cordon worker1.example.com
  node/worker1.example.com cordoned
  [root@clientvm ~]# kubectl get node
  NAME                  STATUS                     ROLES    AGE   VERSION
  master.example.com    Ready                      master   19d   v1.19.4
  worker1.example.com   Ready,SchedulingDisabled   <none>   18d   v1.19.4
  worker2.example.com   Ready                      <none>   18d   v1.19.4

Evicting Pods

  [root@clientvm ~]# kubectl drain worker1.example.com --ignore-daemonsets
  node/worker1.example.com already cordoned
  WARNING: ignoring DaemonSet-managed Pods: kube-system/kube-proxy-c9ghq
  node/worker1.example.com drained

Resume Scheduling

  [root@clientvm ~]# kubectl uncordon worker1.example.com
  node/worker1.example.com uncordoned
  [root@clientvm ~]#
  [root@clientvm ~]# kubectl get node
  NAME                  STATUS   ROLES    AGE   VERSION
  master.example.com    Ready    master   19d   v1.19.4
  worker1.example.com   Ready    <none>   18d   v1.19.4
  worker2.example.com   Ready    <none>   18d   v1.19.4

cordon: stop scheduling
Minimal impact: it only marks the node as SchedulingDisabled.
Pods created afterwards will not be scheduled onto this node.
Existing pods are unaffected and continue to serve traffic.
To resume scheduling:
kubectl uncordon node_name
================================================
drain: evict a node
First, the pods on the node are evicted and recreated on other nodes.
Then the node is marked SchedulingDisabled.
To resume scheduling:
kubectl uncordon node_name
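In practice drain often needs extra flags, for example when pods use emptyDir volumes or are not managed by a controller. A hedged sketch (on v1.19 the flag is --delete-local-data; newer kubectl releases rename it --delete-emptydir-data):

  # Also evict pods with emptyDir volumes and pods not managed by a controller
  [root@clientvm ~]# kubectl drain worker1.example.com --ignore-daemonsets --delete-local-data --force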

Installing Metrics Server

Metrics Server is a scalable, efficient source of container resource metrics for Kubernetes' built-in autoscaling pipelines.
Metrics Server collects resource metrics from the kubelets and exposes them through the Metrics API in the Kubernetes apiserver for use by the Horizontal Pod Autoscaler. The Metrics API can also be accessed with kubectl top, which makes autoscaling easier to debug.
Metrics Server offers:

  • A single deployment that works on most clusters
  • Scalable support for clusters of up to 5,000 nodes
  • Resource efficiency: Metrics Server uses 1m of CPU and 3 MB of memory per node

Installation follows the official documentation:
https://github.com/kubernetes-sigs/metrics-server
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
After downloading components.yaml, modify it as follows:

  containers:
  - args:
    - --cert-dir=/tmp
    - --secure-port=4443
    - --kubelet-preferred-address-types=InternalIP
    - --kubelet-insecure-tls
    - --kubelet-use-node-status-port
    image: bitnami/metrics-server:0.4.1
  [root@clientvm ~]# kubectl apply -f /resources/yaml/metrics-server-components.yaml
  serviceaccount/metrics-server created
  clusterrole.rbac.authorization.k8s.io/system:aggregated-metrics-reader created
  clusterrole.rbac.authorization.k8s.io/system:metrics-server created
  rolebinding.rbac.authorization.k8s.io/metrics-server-auth-reader created
  clusterrolebinding.rbac.authorization.k8s.io/metrics-server:system:auth-delegator created
  clusterrolebinding.rbac.authorization.k8s.io/system:metrics-server created
  service/metrics-server created
  deployment.apps/metrics-server created
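Before querying metrics, it is worth confirming that the metrics-server pod is ready and the aggregated Metrics API is registered; a quick check (the label and APIService name below are the ones used by the official manifests):

  [root@clientvm ~]# kubectl -n kube-system get pod -l k8s-app=metrics-server
  [root@clientvm ~]# kubectl get apiservices v1beta1.metrics.k8s.io
  [root@clientvm ~]# kubectl get --raw /apis/metrics.k8s.io/v1beta1/nodes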

Viewing Metrics

  [root@clientvm ~]# kubectl top node
  NAME                  CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
  master.example.com    119m         2%     1417Mi          51%
  worker1.example.com   24m          0%     953Mi           20%
  worker2.example.com   1079m        26%    1079Mi          23%
  [root@clientvm ~]# kubectl top pod
  NAME                     CPU(cores)   MEMORY(bytes)
  nginx-taint              0m           1Mi
  nginx2                   0m           1Mi
  readiness-exec           2m           0Mi
  test-pod-secret-volume   0m           1Mi
  with-node-affinity       0m           1Mi
  with-pod-affinity        0m           1Mi
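On recent kubectl versions, top also accepts namespace and sorting flags, which helps when looking for the busiest workloads, for example:

  [root@clientvm ~]# kubectl top pod -A --sort-by=cpu
  [root@clientvm ~]# kubectl top pod --sort-by=memory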

HPA(Horizontal Pod Autoscaler)

The Horizontal Pod Autoscaler automatically scales the number of Pods in a Deployment, ReplicaSet, or StatefulSet based on CPU utilization. Horizontal pod autoscaling does not apply to objects that cannot be scaled, such as DaemonSets.

The Horizontal Pod Autoscaler is implemented as a control loop in the controller manager (/etc/kubernetes/manifests/kube-controller-manager.yaml); its period is set by the --horizontal-pod-autoscaler-sync-period flag (default 15 seconds). Other relevant flags:
--horizontal-pod-autoscaler-downscale-stabilization: how long after the last scale-down finishes before another scale-down may run; the default is 5 minutes.
--horizontal-pod-autoscaler-initial-readiness-delay (default 30s): the pod readiness window; pods within this window are all treated as not yet ready.
--horizontal-pod-autoscaler-cpu-initialization-period (default 5 minutes): the pod initialization window; CPU metric samples from pods within this window are not used.
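These flags are passed to kube-controller-manager; on a kubeadm-built cluster that means editing the static pod manifest, and the kubelet recreates the pod when the file is saved. A minimal sketch with example values (not tuning recommendations):

  # /etc/kubernetes/manifests/kube-controller-manager.yaml (excerpt, example values)
  spec:
    containers:
    - command:
      - kube-controller-manager
      - --horizontal-pod-autoscaler-sync-period=15s
      - --horizontal-pod-autoscaler-downscale-stabilization=5m0s
      # ... keep the existing flags unchanged ...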

For per-Pod resource metrics such as CPU, the controller fetches the metric for every Pod targeted by the HorizontalPodAutoscaler from the resource metrics API. If a target utilization is set, the controller obtains the resource usage of the containers in each Pod and calculates the utilization.

Note: since Kubernetes 1.11, fetching metrics from Heapster has been deprecated.

Default HPA behavior:

  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
      - type: Percent
        value: 100
        periodSeconds: 15
    scaleUp:
      stabilizationWindowSeconds: 0
      policies:
      - type: Percent
        value: 100
        periodSeconds: 15
      - type: Pods
        value: 4
        periodSeconds: 15
      selectPolicy: Max

HPA Lab
1) Create the Deployment & Service

  [root@clientvm ~]# cat hpa-example.yaml
  apiVersion: apps/v1
  kind: Deployment
  metadata:
    name: php-apache
  spec:
    selector:
      matchLabels:
        run: php-apache
    replicas: 1
    template:
      metadata:
        labels:
          run: php-apache
      spec:
        containers:
        - name: php-apache
          image: pilchard/hpa-example
          imagePullPolicy: IfNotPresent
          ports:
          - containerPort: 80
          resources:
            limits:
              cpu: 500m
            requests:
              cpu: 200m
  ---
  apiVersion: v1
  kind: Service
  metadata:
    name: php-apache
    labels:
      run: php-apache
  spec:
    ports:
    - port: 80
    selector:
      run: php-apache
  [root@clientvm ~]# kubectl apply -f hpa-example.yaml
  deployment.apps/php-apache created
  service/php-apache created
  [root@master manifests]# curl 10.105.137.102
  OK!

2) Create the HPA

  [root@clientvm ~]# kubectl autoscale deployment -h
  Usage:
    kubectl autoscale (-f FILENAME | TYPE NAME | TYPE/NAME) [--min=MINPODS] --max=MAXPODS [--cpu-percent=CPU] [options]
  Examples:
    # Auto scale a deployment "foo", with the number of pods between 2 and 10, no target CPU utilization specified so a
    # default autoscaling policy will be used:
    kubectl autoscale deployment foo --min=2 --max=10
    # Auto scale a replication controller "foo", with the number of pods between 1 and 5, target CPU utilization at 80%:
    kubectl autoscale rc foo --max=5 --cpu-percent=80

  [root@clientvm ~]# kubectl autoscale deployment --max=10 php-apache --cpu-percent=30
  horizontalpodautoscaler.autoscaling/php-apache autoscaled
  [root@clientvm ~]# kubectl get hpa
  NAME         REFERENCE               TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
  php-apache   Deployment/php-apache   0%/30%    1         10        1          20s
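The same autoscaler can also be written declaratively. A minimal sketch using the autoscaling/v1 API, equivalent to the kubectl autoscale command above (the file name is illustrative):

  # php-apache-hpa.yaml: declarative equivalent of the autoscale command above
  apiVersion: autoscaling/v1
  kind: HorizontalPodAutoscaler
  metadata:
    name: php-apache
  spec:
    scaleTargetRef:
      apiVersion: apps/v1
      kind: Deployment
      name: php-apache
    minReplicas: 1
    maxReplicas: 10
    targetCPUUtilizationPercentage: 30

Apply it with kubectl apply -f php-apache-hpa.yaml instead of running kubectl autoscale.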

3) Test
Use the ab tool on the master to send a large number of concurrent requests:

  [root@master manifests]# ab -n 1000 -c 200 http://10.105.137.102/
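If ab is not available, a similar load can be generated from inside the cluster; this variant follows the upstream HPA walkthrough and assumes the busybox image can be pulled and that the php-apache Service name resolves in-cluster:

  # Run a temporary pod that requests the php-apache Service in a tight loop
  kubectl run -i --tty load-generator --rm --image=busybox --restart=Never -- /bin/sh -c "while sleep 0.01; do wget -q -O- http://php-apache; done"

Stop it with Ctrl+C; the HPA scales back down after the stabilization window.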

Observe from the clientvm node:

  [root@clientvm ~]# kubectl get pod -w
  NAME                          READY   STATUS              RESTARTS   AGE
  nginx2                        1/1     Running             0          4d23h
  php-apache-69f8f79bfc-sbzhm   1/1     Running             0          8m48s
  php-apache-69f8f79bfc-wdwwq   0/1     Pending             0          0s
  php-apache-69f8f79bfc-wdwwq   0/1     Pending             0          0s
  php-apache-69f8f79bfc-fkb7k   0/1     Pending             0          0s
  php-apache-69f8f79bfc-92pv4   0/1     Pending             0          0s
  php-apache-69f8f79bfc-fkb7k   0/1     Pending             0          0s
  php-apache-69f8f79bfc-92pv4   0/1     Pending             0          0s
  php-apache-69f8f79bfc-wdwwq   0/1     ContainerCreating   0          0s
  php-apache-69f8f79bfc-fkb7k   0/1     ContainerCreating   0          0s
  php-apache-69f8f79bfc-92pv4   0/1     ContainerCreating   0          0s
  php-apache-69f8f79bfc-92pv4   1/1     Running             0          2s
  php-apache-69f8f79bfc-fkb7k   1/1     Running             0          2s
  php-apache-69f8f79bfc-wdwwq   1/1     Running             0          2s
  php-apache-69f8f79bfc-fwb5x   0/1     Pending             0          0s
  php-apache-69f8f79bfc-fwb5x   0/1     Pending             0          0s
  php-apache-69f8f79bfc-s5624   0/1     Pending             0          0s
  php-apache-69f8f79bfc-qkf7f   0/1     Pending             0          0s
  php-apache-69f8f79bfc-s5624   0/1     Pending             0          0s
  php-apache-69f8f79bfc-jbgtp   0/1     Pending             0          0s
  php-apache-69f8f79bfc-qkf7f   0/1     Pending             0          0s
  php-apache-69f8f79bfc-fwb5x   0/1     ContainerCreating   0          0s
  php-apache-69f8f79bfc-jbgtp   0/1     Pending             0          0s
  php-apache-69f8f79bfc-s5624   0/1     ContainerCreating   0          0s
  php-apache-69f8f79bfc-qkf7f   0/1     ContainerCreating   0          0s
  php-apache-69f8f79bfc-jbgtp   0/1     ContainerCreating   0          0s
  php-apache-69f8f79bfc-qkf7f   1/1     Running             0          3s
  php-apache-69f8f79bfc-jbgtp   1/1     Running             0          3s
  php-apache-69f8f79bfc-fwb5x   1/1     Running             0          3s

  [root@clientvm k8s]# kubectl get hpa
  NAME         REFERENCE               TARGETS    MINPODS   MAXPODS   REPLICAS   AGE
  php-apache   Deployment/php-apache   206%/30%   1         10        10         7m3s

The HPA algorithm:

  desiredReplicas = ceil[currentReplicas * (currentMetricValue / desiredMetricValue)]
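For example, with the numbers observed in the test above (1 replica running at 206% CPU utilization against a 30% target), the controller would compute:

  desiredReplicas = ceil[1 * (206 / 30)] = ceil[6.87] = 7

It then re-evaluates every sync period, continuing to scale up until utilization falls below the target or maxReplicas (10 in this lab) is reached.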

etcd

Backup Etcd

  1. Find the etcd container on the master node

    [root@master manifests]# docker ps | grep etcd
    44623e84772a d4ca8726196c "etcd --advertise-cl…" 6 days ago Up 6 days k8s_etcd_etcd-master.example.com_kube-system_1511fba334ccb18c8972b0adfa135f94_0

  2. Copy the etcdctl binary onto the host

    [root@master manifests]# docker cp 44623e84772a:/usr/local/bin/etcdctl /usr/bin/
  3. Back up etcd, passing the CA and certificates taken from the static pod etcd.yaml

(screenshot: the certificate-related flags in /etc/kubernetes/manifests/etcd.yaml)
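If the screenshot is unavailable, the certificate paths can be read directly from the manifest, for example (flag names as generated by kubeadm):

    [root@master manifests]# grep -E "cert-file|key-file|trusted-ca-file" /etc/kubernetes/manifests/etcd.yaml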

    [root@master manifests]# ETCDCTL_API=3 etcdctl --endpoints=https://127.0.0.1:2379 \
      --cacert=/etc/kubernetes/pki/etcd/ca.crt \
      --cert=/etc/kubernetes/pki/etcd/peer.crt \
      --key=/etc/kubernetes/pki/etcd/peer.key snapshot save /tmp/etcd.db
  4. Verify the snapshot

    [root@master manifests]# ETCDCTL_API=3 etcdctl --write-out=table snapshot status /tmp/etcd.db
    +----------+----------+------------+------------+
    |   HASH   | REVISION | TOTAL KEYS | TOTAL SIZE |
    +----------+----------+------------+------------+
    | 88cde7c7 | 1498047  |       1854 |     3.3 MB |
    +----------+----------+------------+------------+

Restore Etcd

  1. Stop kubelet

    [root@master manifests]# systemctl stop kubelet.service
  2. Delete the existing DB data

    [root@master manifests]# cd /var/lib/etcd/
    [root@master etcd]# ls
    member
    [root@master etcd]# rm -rf /var/lib/etcd
  3. Restore the backup, using the values from the static pod etcd.yaml

(screenshot: the --name, --data-dir and --initial-cluster flags in /etc/kubernetes/manifests/etcd.yaml)
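Again, the required values can be read from the manifest itself, for example:

    [root@master manifests]# grep -E "name=|data-dir|initial-cluster" /etc/kubernetes/manifests/etcd.yaml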

    [root@master manifests]# ETCDCTL_API=3 etcdctl snapshot restore /tmp/etcd.db \
    > --data-dir="/var/lib/etcd" \
    > --name=master.example.com --skip-hash-check \
    > --initial-advertise-peer-urls=https://127.0.0.1:2380 \
    > --initial-cluster=master.example.com=https://127.0.0.1:2380
  4. Start the kubelet service

    [root@master manifests]# systemctl start kubelet.service
  5. Verify

    [root@master manifests]# ETCDCTL_API=3 etcdctl --endpoints=https://127.0.0.1:2379 --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/peer.crt --key=/etc/kubernetes/pki/etcd/peer.key endpoint health
    https://127.0.0.1:2379 is healthy: successfully committed proposal: took = 11.884856ms

    [root@clientvm ~]# kubectl get pod -n kube-system
    NAME                                         READY   STATUS    RESTARTS   AGE
    calico-kube-controllers-6dfcd885bf-dk4jj     1/1     Running   0          6d23h
    calico-node-7nlkr                            1/1     Running   0          6d23h
    calico-node-8xdqh                            1/1     Running   0          6d23h
    calico-node-dskkk                            1/1     Running   0          6d23h
    coredns-6d56c8448f-lsl8p                     1/1     Running   0          7d
    coredns-6d56c8448f-t8t55                     1/1     Running   0          23h
    etcd-master.example.com                      1/1     Running   1          7d
    kube-apiserver-master.example.com            1/1     Running   0          7d
    kube-controller-manager-master.example.com   1/1     Running   1          6d21h
    kube-proxy-2ddnd                             1/1     Running   0          7d
    kube-proxy-cjl2b                             1/1     Running   0          6d23h
    kube-proxy-n5djk                             1/1     Running   0          6d23h
    kube-scheduler-master.example.com            1/1     Running   1          6d21h
    metrics-server-85b5d6b8fb-vmprh              1/1     Running   0          3h51m

Cluster Upgrade

Upgrading the Master Node

  1. Choose the version to upgrade to

    [root@master ~]# yum list --showduplicates kubeadm
    kubeadm.x86_64   1.19.0-0   kubernetes
    kubeadm.x86_64   1.19.0-0   local
    kubeadm.x86_64   1.19.1-0   kubernetes
    kubeadm.x86_64   1.19.2-0   kubernetes
    kubeadm.x86_64   1.19.3-0   kubernetes
    kubeadm.x86_64   1.19.4-0   kubernetes
    kubeadm.x86_64   1.19.5-0   kubernetes
    kubeadm.x86_64   1.20.0-0   kubernetes
  2. Upgrade kubeadm

    [root@master ~]# yum install kubeadm-1.19.1-0 -y

  3. Verify the version

    [root@master ~]# kubeadm version
    kubeadm version: &version.Info{Major:"1", Minor:"19", GitVersion:"v1.19.1", GitCommit:"206bcadf021e76c27513500ca24182692aabd17e", GitTreeState:"clean", BuildDate:"2020-09-09T11:24:31Z", GoVersion:"go1.15", Compiler:"gc", Platform:"linux/amd64"}
  4. Mark the control-plane node unschedulable and evict its Pods

    [root@clientvm ~]# kubectl drain master.example.com --ignore-daemonsets

  5. Run the upgrade plan on the master

    kubeadm upgrade plan
  6. Run the upgrade command on the master

    Update the cluster configuration to the target version, then run the upgrade:

    [root@master ~]# kubeadm upgrade apply v1.19.1
    ......
    [upgrade/successful] SUCCESS! Your cluster was upgraded to "v1.19.1". Enjoy!
    [upgrade/kubelet] Now that your control plane is upgraded, please proceed with upgrading your kubelets if you haven't already done so.
  7. Resume scheduling on the node

    [root@clientvm ~]# kubectl uncordon master.example.com

  8. Upgrade kubelet and kubectl

    yum install -y kubelet-1.19.1-0 kubectl-1.19.1-0

  9. Restart the kubelet service

    systemctl daemon-reload
    systemctl restart kubelet

    [root@master ~]# kubectl get node
    NAME                  STATUS   ROLES    AGE    VERSION
    master.example.com    Ready    master   7d1h   v1.19.1
    worker1.example.com   Ready    <none>   7d1h   v1.19.0
    worker2.example.com   Ready    <none>   7d1h   v1.19.0

Upgrading Worker Nodes (one at a time)

  1. Upgrade kubeadm on the worker node

    [root@worker1 ~]# yum install kubeadm-1.19.1-0
    [root@worker1 ~]# kubeadm upgrade node
  2. Mark the node unschedulable and evict its workloads to prepare for maintenance

    kubectl drain worker1.example.com --ignore-daemonsets
  3. Upgrade kubelet and kubectl

    yum install -y kubelet-1.19.1-0 kubectl-1.19.1-0
  4. Restart the kubelet service

    systemctl daemon-reload
    systemctl restart kubelet
  5. Resume scheduling on the node

    kubectl uncordon worker1.example.com

    [root@master ~]# kubectl get node
    NAME                  STATUS   ROLES    AGE    VERSION
    master.example.com    Ready    master   7d1h   v1.19.1
    worker1.example.com   Ready    <none>   7d1h   v1.19.1
    worker2.example.com   Ready    <none>   7d1h   v1.19.0
  6. Verify the cluster status, then upgrade the remaining worker nodes one by one

    [root@master ~]# kubectl get node
    NAME                  STATUS   ROLES    AGE    VERSION
    master.example.com    Ready    master   7d1h   v1.19.1
    worker1.example.com   Ready    <none>   7d1h   v1.19.1
    worker2.example.com   Ready    <none>   7d1h   v1.19.1
  7. After the upgrade, verify the cluster components

    [root@clientvm ~]# kubectl get pod -n kube-system
    NAME                                         READY   STATUS    RESTARTS   AGE
    calico-kube-controllers-6dfcd885bf-dk4jj     1/1     Running   0          7d1h
    calico-node-7nlkr                            1/1     Running   0          7d1h
    calico-node-8xdqh                            1/1     Running   0          7d1h
    calico-node-dskkk                            1/1     Running   0          7d1h
    coredns-6d56c8448f-h6wvr                     1/1     Running   0          6m41s
    coredns-6d56c8448f-lsl8p                     1/1     Running   0          7d1h
    etcd-master.example.com                      1/1     Running   0          6m57s
    kube-apiserver-master.example.com            1/1     Running   0          6m42s
    kube-controller-manager-master.example.com   1/1     Running   0          6m40s
    kube-proxy-8422l                             1/1     Running   0          6m4s
    kube-proxy-df9t9                             1/1     Running   0          5m55s
    kube-proxy-vmm7j                             1/1     Running   0          5m25s
    kube-scheduler-master.example.com            1/1     Running   0          6m38s
    metrics-server-85b5d6b8fb-vmprh              1/1     Running   0          5h23m

    [root@clientvm ~]# kubectl version
    Client Version: version.Info{Major:"1", Minor:"19", GitVersion:"v1.19.1", GitCommit:"206bcadf021e76c27513500ca24182692aabd17e", GitTreeState:"clean", BuildDate:"2020-09-09T11:26:42Z", GoVersion:"go1.15", Compiler:"gc", Platform:"linux/amd64"}
    Server Version: version.Info{Major:"1", Minor:"19", GitVersion:"v1.19.1", GitCommit:"206bcadf021e76c27513500ca24182692aabd17e", GitTreeState:"clean", BuildDate:"2020-09-09T11:18:22Z", GoVersion:"go1.15", Compiler:"gc", Platform:"linux/amd64"}