Upgrade Notes

  • A working Kubernetes cluster built with kubeadm
  • You can upgrade across patch releases or one minor release at a time; skipping two minor releases is not recommended
  • Back up the cluster's resources before upgrading

Upgrade Goal

Upgrade Kubernetes from version 1.17.9 to 1.18.9.
The current cluster version and nodes are as follows:

  # kubectl get nodes
  NAME            STATUS   ROLES    AGE    VERSION
  ecs-968f-0005   Ready    node     102d   v1.17.9
  k8s-master      Ready    master   102d   v1.17.9

Back Up the Cluster

kubeadm upgrade does not touch your workloads, only Kubernetes' own components, but a backup is still a good idea. Here I back up all of the cluster's resources using an open-source script; the project is at https://github.com/solomonxu/k8s-backup-restore
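
Besides exporting resource YAML, you may also want a raw etcd snapshot before upgrading. A minimal sketch, assuming etcdctl is installed on the master and the cluster uses kubeadm's default etcd certificate paths:

  # Snapshot etcd (the certificate paths below assume kubeadm defaults; adjust if yours differ)
  ETCDCTL_API=3 etcdctl snapshot save /data/etcd-snapshot-$(date +%F).db \
    --endpoints=https://127.0.0.1:2379 \
    --cacert=/etc/kubernetes/pki/etcd/ca.crt \
    --cert=/etc/kubernetes/pki/etcd/server.crt \
    --key=/etc/kubernetes/pki/etcd/server.key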

(1) Download the script

  $ mkdir -p /data
  cd /data
  git clone https://github.com/solomonxu/k8s-backup-restore.git

(2) Run the backup

  cd /data/k8s-backup-restore
  ./bin/k8s_backup.sh
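
If you would rather not depend on the script, a rough manual equivalent is to dump each namespace's key resources as YAML yourself. A simplified sketch; the output path and resource list here are just examples:

  # Export a few common resource types from every namespace as YAML
  # (/data/k8s-manual-backup is an arbitrary example path)
  for ns in $(kubectl get ns -o jsonpath='{.items[*].metadata.name}'); do
    mkdir -p /data/k8s-manual-backup/$ns
    for res in deployments statefulsets daemonsets services configmaps secrets; do
      kubectl get $res -n $ns -o yaml > /data/k8s-manual-backup/$ns/$res.yaml
    done
  done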

What if you need to restore? Just run the following steps.
(1) Create the restore directory

  $ mkdir -p /data/k8s-backup-restore/data/restore

(2) Copy the YAML manifests you want to restore into that directory

  $ cp devops_deployments_gitlab.yaml ../../restore/

(3) Run the restore command

  cd /data/k8s-backup-restore
  ./bin/k8s_restore.sh

The script prints output like the following.

  2021-01-06 15:09:43.954083 [11623] - INFO Kubernetes Restore start now. All yaml files which located in path [/data/k8s-backup-restore/data/restore] will be applied.
  2021-01-06 15:09:43.957265 [11623] - INFO If you want to read the log record of restore, please input command ' tail -100f '
  2021-01-06 15:09:43.986869 [11623] - WARN WARNING!!! This will create 1 resources from yaml files into kubernetes cluster. While same name of resources will be deleted. Please consider it carefully!
  Do you want to continue? [yes/no/show] y
  2021-01-06 15:10:00.062598 [11623] - INFO Restore No.1 resources from yaml file: /data/k8s-backup-restore/data/restore/devops_deployments_gitlab.yaml...
  2021-01-06 15:10:00.066011 [11623] - INFO Run shell: kubectl delete -f /data/k8s-backup-restore/data/restore/devops_deployments_gitlab.yaml.
  deployment.apps "gitlab" deleted
  2021-01-06 15:10:00.423109 [11623] - INFO Delete resource from /data/k8s-backup-restore/data/restore/devops_deployments_gitlab.yaml: ok.
  2021-01-06 15:10:00.426383 [11623] - INFO Run shell: kubectl create -f /data/k8s-backup-restore/data/restore/devops_deployments_gitlab.yaml.
  deployment.apps/gitlab created
  2021-01-06 15:10:00.614960 [11623] - INFO Create resource from /data/k8s-backup-restore/data/restore/devops_deployments_gitlab.yaml: ok.
  2021-01-06 15:10:00.618572 [11623] - INFO Restore 1 resources from yaml files in all: count_delete_ok=1, count_delete_failed=0, count_create_ok=1, count_create_failed=0.
  2021-01-06 15:10:00.622002 [11623] - INFO Kubernetes Restore completed, all done.

(4) Verify that the resource was restored correctly

  $ kubectl get po -n devops
  NAME                      READY   STATUS    RESTARTS   AGE
  gitlab-65896f7557-786hj   1/1     Running   0          66s

Upgrade the Cluster

Master Upgrade

(1) Decide which version to upgrade to

  $ yum list --showduplicates kubeadm --disableexcludes=kubernetes

I chose version 1.18.9.
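
If the full list is long, you can narrow it to the 1.18 series with a quick filter:

  $ yum list --showduplicates kubeadm --disableexcludes=kubernetes | grep 1.18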

(2) Upgrade kubeadm

  $ yum install -y kubeadm-1.18.9-0 --disableexcludes=kubernetes

After the installation completes, verify that the version is correct.

  $ kubeadm version
  kubeadm version: &version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.9", GitCommit:"94f372e501c973a7fa9eb40ec9ebd2fe7ca69848", GitTreeState:"clean", BuildDate:"2020-09-16T13:54:01Z", GoVersion:"go1.13.15", Compiler:"gc", Platform:"linux/amd64"}

(3) Cordon and drain the node

  $ kubectl cordon k8s-master
  $ kubectl drain k8s-master
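
If the drain is blocked by DaemonSet-managed Pods (kube-proxy, the CNI agent, and so on), add --ignore-daemonsets:

  $ kubectl drain k8s-master --ignore-daemonsets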

(4) Run the upgrade plan to check whether the cluster can be upgraded

  $ kubeadm upgrade plan
  [upgrade/config] Making sure the configuration is correct:
  [upgrade/config] Reading configuration from the cluster...
  [upgrade/config] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
  [preflight] Running pre-flight checks.
  [upgrade] Running cluster health checks
  [upgrade] Fetching available versions to upgrade to
  [upgrade/versions] Cluster version: v1.17.9
  [upgrade/versions] kubeadm version: v1.18.9
  I0106 14:22:58.709642 10455 version.go:252] remote version is much newer: v1.20.1; falling back to: stable-1.18
  [upgrade/versions] Latest stable version: v1.18.14
  [upgrade/versions] Latest stable version: v1.18.14
  [upgrade/versions] Latest version in the v1.17 series: v1.17.16
  [upgrade/versions] Latest version in the v1.17 series: v1.17.16
  Components that must be upgraded manually after you have upgraded the control plane with 'kubeadm upgrade apply':
  COMPONENT            CURRENT       AVAILABLE
  Kubelet              2 x v1.17.9   v1.17.16
  Upgrade to the latest version in the v1.17 series:
  COMPONENT            CURRENT       AVAILABLE
  API Server           v1.17.9       v1.17.16
  Controller Manager   v1.17.9       v1.17.16
  Scheduler            v1.17.9       v1.17.16
  Kube Proxy           v1.17.9       v1.17.16
  CoreDNS              1.6.5         1.6.7
  Etcd                 3.4.3         3.4.3-0
  You can now apply the upgrade by executing the following command:
  kubeadm upgrade apply v1.17.16
  _____________________________________________________________________
  Components that must be upgraded manually after you have upgraded the control plane with 'kubeadm upgrade apply':
  COMPONENT            CURRENT       AVAILABLE
  Kubelet              2 x v1.17.9   v1.18.14
  Upgrade to the latest stable version:
  COMPONENT            CURRENT       AVAILABLE
  API Server           v1.17.9       v1.18.14
  Controller Manager   v1.17.9       v1.18.14
  Scheduler            v1.17.9       v1.18.14
  Kube Proxy           v1.17.9       v1.18.14
  CoreDNS              1.6.5         1.6.7
  Etcd                 3.4.3         3.4.3-0
  You can now apply the upgrade by executing the following command:
  kubeadm upgrade apply v1.18.14
  Note: Before you can perform this upgrade, you have to update kubeadm to v1.18.14.
  _____________________________________________________________________

The plan shows that I could go to an even newer release, but I'm still going to upgrade to 1.18.9.
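
If you want to preview the changes before touching anything, kubeadm upgrade apply also accepts a --dry-run flag (assuming your kubeadm build includes it), which only prints the actions it would perform:

  $ kubeadm upgrade apply v1.18.9 --dry-run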

(5) Upgrade the cluster

  $ kubeadm upgrade apply v1.18.9 --config kubeadm.yaml
  W0106 14:23:58.359112 11936 configset.go:202] WARNING: kubeadm cannot validate component configs for API groups [kubelet.config.k8s.io kubeproxy.config.k8s.io]
  [upgrade/config] Making sure the configuration is correct:
  W0106 14:23:58.367062 11936 common.go:94] WARNING: Usage of the --config flag for reconfiguring the cluster during upgrade is not recommended!
  W0106 14:23:58.367816 11936 configset.go:202] WARNING: kubeadm cannot validate component configs for API groups [kubelet.config.k8s.io kubeproxy.config.k8s.io]
  [preflight] Running pre-flight checks.
  [upgrade] Running cluster health checks
  [upgrade/version] You have chosen to change the cluster version to "v1.18.9"
  [upgrade/versions] Cluster version: v1.17.9
  [upgrade/versions] kubeadm version: v1.18.9
  [upgrade/confirm] Are you sure you want to proceed with the upgrade? [y/N]: y
  [upgrade/prepull] Will prepull images for components [kube-apiserver kube-controller-manager kube-scheduler etcd]
  [upgrade/prepull] Prepulling image for component etcd.
  [upgrade/prepull] Prepulling image for component kube-controller-manager.
  [upgrade/prepull] Prepulling image for component kube-scheduler.
  [upgrade/prepull] Prepulling image for component kube-apiserver.
  [apiclient] Found 1 Pods for label selector k8s-app=upgrade-prepull-kube-controller-manager
  [apiclient] Found 1 Pods for label selector k8s-app=upgrade-prepull-kube-apiserver
  [apiclient] Found 1 Pods for label selector k8s-app=upgrade-prepull-kube-scheduler
  [apiclient] Found 0 Pods for label selector k8s-app=upgrade-prepull-etcd
  [apiclient] Found 1 Pods for label selector k8s-app=upgrade-prepull-etcd
  [upgrade/prepull] Prepulled image for component etcd.
  [upgrade/prepull] Prepulled image for component kube-controller-manager.
  [upgrade/prepull] Prepulled image for component kube-apiserver.
  [upgrade/prepull] Prepulled image for component kube-scheduler.
  [upgrade/prepull] Successfully prepulled the images for all the control plane components
  [upgrade/apply] Upgrading your Static Pod-hosted control plane to version "v1.18.9"...
  Static pod: kube-apiserver-k8s-master hash: d002f0455950f5b76f6097191f93db28
  Static pod: kube-controller-manager-k8s-master hash: 54e96591b22cec4a1f5b76965fa90be7
  Static pod: kube-scheduler-k8s-master hash: da215ebee0354225c20c7bdf28b467f8
  [upgrade/etcd] Upgrading to TLS for etcd
  [upgrade/etcd] Non fatal issue encountered during upgrade: the desired etcd version for this Kubernetes version "v1.18.9" is "3.4.3-0", but the current etcd version is "3.4.3". Won't downgrade etcd, instead just continue
  [upgrade/staticpods] Writing new Static Pod manifests to "/etc/kubernetes/tmp/kubeadm-upgraded-manifests328986264"
  [upgrade/staticpods] Preparing for "kube-apiserver" upgrade
  [upgrade/staticpods] Renewing apiserver certificate
  [upgrade/staticpods] Renewing apiserver-kubelet-client certificate
  [upgrade/staticpods] Renewing front-proxy-client certificate
  [upgrade/staticpods] Renewing apiserver-etcd-client certificate
  [upgrade/staticpods] Moved new manifest to "/etc/kubernetes/manifests/kube-apiserver.yaml" and backed up old manifest to "/etc/kubernetes/tmp/kubeadm-backup-manifests-2021-01-06-14-24-18/kube-apiserver.yaml"
  [upgrade/staticpods] Waiting for the kubelet to restart the component
  [upgrade/staticpods] This might take a minute or longer depending on the component/version gap (timeout 5m0s)
  Static pod: kube-apiserver-k8s-master hash: d002f0455950f5b76f6097191f93db28
  Static pod: kube-apiserver-k8s-master hash: d002f0455950f5b76f6097191f93db28
  Static pod: kube-apiserver-k8s-master hash: 6bc4f16364bf23910ec81c9e91593d95
  [apiclient] Found 1 Pods for label selector component=kube-apiserver
  [upgrade/staticpods] Component "kube-apiserver" upgraded successfully!
  [upgrade/staticpods] Preparing for "kube-controller-manager" upgrade
  [upgrade/staticpods] Renewing controller-manager.conf certificate
  [upgrade/staticpods] Moved new manifest to "/etc/kubernetes/manifests/kube-controller-manager.yaml" and backed up old manifest to "/etc/kubernetes/tmp/kubeadm-backup-manifests-2021-01-06-14-24-18/kube-controller-manager.yaml"
  [upgrade/staticpods] Waiting for the kubelet to restart the component
  [upgrade/staticpods] This might take a minute or longer depending on the component/version gap (timeout 5m0s)
  Static pod: kube-controller-manager-k8s-master hash: 54e96591b22cec4a1f5b76965fa90be7
  Static pod: kube-controller-manager-k8s-master hash: a96ac50aab8a064c2101f684d34ee058
  [apiclient] Found 1 Pods for label selector component=kube-controller-manager
  [upgrade/staticpods] Component "kube-controller-manager" upgraded successfully!
  [upgrade/staticpods] Preparing for "kube-scheduler" upgrade
  [upgrade/staticpods] Renewing scheduler.conf certificate
  [upgrade/staticpods] Moved new manifest to "/etc/kubernetes/manifests/kube-scheduler.yaml" and backed up old manifest to "/etc/kubernetes/tmp/kubeadm-backup-manifests-2021-01-06-14-24-18/kube-scheduler.yaml"
  [upgrade/staticpods] Waiting for the kubelet to restart the component
  [upgrade/staticpods] This might take a minute or longer depending on the component/version gap (timeout 5m0s)
  Static pod: kube-scheduler-k8s-master hash: da215ebee0354225c20c7bdf28b467f8
  Static pod: kube-scheduler-k8s-master hash: 1a0670b7d3bff3fd96dbd08f176c1461
  [apiclient] Found 1 Pods for label selector component=kube-scheduler
  [upgrade/staticpods] Component "kube-scheduler" upgraded successfully!
  [upload-config] Storing the configuration used in ConfigMap "kubeadm-config" in the "kube-system" Namespace
  [kubelet] Creating a ConfigMap "kubelet-config-1.18" in namespace kube-system with the configuration for the kubelets in the cluster
  [kubelet-start] Downloading configuration for the kubelet from the "kubelet-config-1.18" ConfigMap in the kube-system namespace
  [kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
  [bootstrap-token] configured RBAC rules to allow Node Bootstrap tokens to get nodes
  [bootstrap-token] configured RBAC rules to allow Node Bootstrap tokens to post CSRs in order for nodes to get long term certificate credentials
  [bootstrap-token] configured RBAC rules to allow the csrapprover controller automatically approve CSRs from a Node Bootstrap Token
  [bootstrap-token] configured RBAC rules to allow certificate rotation for all node client certificates in the cluster
  [addons] Applied essential addon: CoreDNS
  [addons] Applied essential addon: kube-proxy
  [upgrade/successful] SUCCESS! Your cluster was upgraded to "v1.18.9". Enjoy!
  [upgrade/kubelet] Now that your control plane is upgraded, please proceed with upgrading your kubelets if you haven't already done so.

As the output shows, the upgrade completed successfully.
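
Optionally, you can also confirm that the control-plane static Pods are now running v1.18.9 images. A quick check, assuming kubeadm's default tier=control-plane label on the static Pods:

  $ kubectl -n kube-system get pods -l tier=control-plane \
      -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.containers[0].image}{"\n"}{end}'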

(6) Uncordon the node

  # kubectl uncordon k8s-master

(7) Upgrade the node

  $ kubeadm upgrade node
  [upgrade] Reading configuration from the cluster...
  [upgrade] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
  [upgrade] Upgrading your Static Pod-hosted control plane instance to version "v1.18.9"...
  Static pod: kube-apiserver-k8s-master hash: 6bc4f16364bf23910ec81c9e91593d95
  Static pod: kube-controller-manager-k8s-master hash: a96ac50aab8a064c2101f684d34ee058
  Static pod: kube-scheduler-k8s-master hash: 1a0670b7d3bff3fd96dbd08f176c1461
  [upgrade/etcd] Upgrading to TLS for etcd
  [upgrade/etcd] Non fatal issue encountered during upgrade: the desired etcd version for this Kubernetes version "v1.18.9" is "3.4.3-0", but the current etcd version is "3.4.3". Won't downgrade etcd, instead just continue
  [upgrade/staticpods] Writing new Static Pod manifests to "/etc/kubernetes/tmp/kubeadm-upgraded-manifests315032619"
  W0106 14:36:33.013476 30507 manifests.go:225] the default kube-apiserver authorization-mode is "Node,RBAC"; using "Node,RBAC"
  [upgrade/staticpods] Preparing for "kube-apiserver" upgrade
  [upgrade/staticpods] Current and new manifests of kube-apiserver are equal, skipping upgrade
  [upgrade/staticpods] Preparing for "kube-controller-manager" upgrade
  [upgrade/staticpods] Current and new manifests of kube-controller-manager are equal, skipping upgrade
  [upgrade/staticpods] Preparing for "kube-scheduler" upgrade
  [upgrade/staticpods] Current and new manifests of kube-scheduler are equal, skipping upgrade
  [upgrade] The control plane instance for this node was successfully updated!
  [kubelet-start] Downloading configuration for the kubelet from the "kubelet-config-1.18" ConfigMap in the kube-system namespace
  [kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
  [upgrade] The configuration for this node was successfully updated!
  [upgrade] Now you should go ahead and upgrade the kubelet package using your package manager.

(8) Upgrade kubectl and kubelet

  $ yum install -y kubelet-1.18.9-0 kubectl-1.18.9-0 --disableexcludes=kubernetes

Restart kubelet:

  $ systemctl daemon-reload
  $ systemctl restart kubelet
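
Once the kubelet has restarted, you can confirm the installed version directly on the node; it should now report v1.18.9:

  $ kubelet --version
  Kubernetes v1.18.9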

Node Upgrade

(1) Upgrade kubeadm

  yum install -y kubeadm-1.18.9-0 --disableexcludes=kubernetes

(2) Cordon and drain the node

  $ kubectl cordon ecs-968f-0005
  $ kubectl drain ecs-968f-0005

(3) Upgrade the node

  $ kubeadm upgrade node
  [upgrade] Reading configuration from the cluster...
  [upgrade] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
  [upgrade] Skipping phase. Not a control plane node.
  [kubelet-start] Downloading configuration for the kubelet from the "kubelet-config-1.18" ConfigMap in the kube-system namespace
  [kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
  [upgrade] The configuration for this node was successfully updated!
  [upgrade] Now you should go ahead and upgrade the kubelet package using your package manager.

(4) Upgrade kubelet

  yum install -y kubelet-1.18.9-0 --disableexcludes=kubernetes

Restart kubelet:

  $ systemctl daemon-reload
  $ systemctl restart kubelet

(5) Mark the node as schedulable again

  kubectl uncordon ecs-968f-0005

Verify the Cluster

(1) Verify that the cluster status is normal

  $ kubectl get no
  NAME            STATUS   ROLES    AGE    VERSION
  ecs-968f-0005   Ready    node     102d   v1.18.9
  k8s-master      Ready    master   102d   v1.18.9

(2) Verify that the cluster certificates are valid

  $ kubeadm alpha certs check-expiration
  [check-expiration] Reading configuration from the cluster...
  [check-expiration] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
  CERTIFICATE                EXPIRES                  RESIDUAL TIME   CERTIFICATE AUTHORITY   EXTERNALLY MANAGED
  admin.conf                 Jan 06, 2022 06:36 UTC   364d                                    no
  apiserver                  Jan 06, 2022 06:24 UTC   364d            ca                      no
  apiserver-etcd-client      Jan 06, 2022 06:24 UTC   364d            etcd-ca                 no
  apiserver-kubelet-client   Jan 06, 2022 06:24 UTC   364d            ca                      no
  controller-manager.conf    Jan 06, 2022 06:24 UTC   364d                                    no
  etcd-healthcheck-client    Sep 25, 2021 06:55 UTC   262d            etcd-ca                 no
  etcd-peer                  Sep 25, 2021 06:55 UTC   262d            etcd-ca                 no
  etcd-server                Sep 25, 2021 06:55 UTC   262d            etcd-ca                 no
  front-proxy-client         Jan 06, 2022 06:24 UTC   364d            front-proxy-ca          no
  scheduler.conf             Jan 06, 2022 06:24 UTC   364d                                    no

  CERTIFICATE AUTHORITY   EXPIRES                  RESIDUAL TIME   EXTERNALLY MANAGED
  ca                      Sep 23, 2030 06:55 UTC   9y              no
  etcd-ca                 Sep 23, 2030 06:55 UTC   9y              no
  front-proxy-ca          Sep 23, 2030 06:55 UTC   9y              no

Note: kubeadm upgrade also automatically renews the certificates it manages on this node. If you do not want the certificates to be renewed, pass --certificate-renewal=false.
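
For example (only if you explicitly want to skip renewal; the version here is simply the one used in this walkthrough):

  $ kubeadm upgrade apply v1.18.9 --certificate-renewal=false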

Failure Recovery

If the upgrade fails partway through and does not roll back, you can simply run kubeadm upgrade again; the command is idempotent. To recover from a bad state without changing the version, you can also run kubeadm upgrade apply --force.

During the upgrade, kubeadm writes backup directories under /etc/kubernetes/tmp:

  • kubeadm-backup-etcd-<date>-<time>
  • kubeadm-backup-manifests-<date>-<time>

kubeadm-backup-etcd contains a backup of the local etcd data. If the upgrade fails and cannot be fixed, you can copy its contents into the etcd data directory and recover manually.

kubeadm-backup-manifests contains the node's static Pod YAML manifests. If the upgrade fails and cannot be fixed, you can copy them back into /etc/kubernetes/manifests to recover manually.
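
A rough sketch of that manual rollback (the timestamped directory name will differ on your machine):

  # Copy the backed-up static Pod manifests back; the kubelet watches this
  # directory and will recreate the previous control plane Pods.
  $ cp /etc/kubernetes/tmp/kubeadm-backup-manifests-<date>-<time>/*.yaml /etc/kubernetes/manifests/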

Reference: https://v1-18.docs.kubernetes.io/zh/docs/tasks/administer-cluster/kubeadm/kubeadm-upgrade/