1. Cluster Upgrade
# List the Kubernetes versions supported by RKE
rke config --list-version --all
Modify the kubernetes_version defined in cluster.yml
kubernetes_version: v1.17.4-rancher1-1
Run the up operation
rke up --config ./cluster.yml
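Before running rke up, it is worth confirming that cluster.yml actually carries the intended target version; a quick check (a minimal sketch, assuming the key sits on a single line):
grep kubernetes_version cluster.yml
# expected: kubernetes_version: v1.17.4-rancher1-1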
[liwm@rmaster01 ~]$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
node01 Ready worker 21d v1.17.2
node02 Ready worker 21d v1.17.2
rmaster01 Ready controlplane,etcd 21d v1.17.2
rmaster02 Ready controlplane,etcd 21d v1.17.2
rmaster03 Ready controlplane,etcd 21d v1.17.2
[liwm@rmaster01 ~]$
[rancher@rmaster01 ~]$ rke config --list-version --all
v1.16.8-rancher1-1
v1.17.4-rancher1-1
v1.15.11-rancher1-1
[rancher@rmaster01 ~]$ vim cluster.yml
[rancher@rmaster01 ~]$
[rancher@rmaster01 ~]$ rke up --config cluster.yml
INFO[0000] Running RKE version: v1.0.5
INFO[0000] Initiating Kubernetes cluster
INFO[0000] [certificates] Generating admin certificates and kubeconfig
INFO[0000] Successfully Deployed state file at [./cluster.rkestate]
INFO[0000] Building Kubernetes cluster
INFO[0000] [dialer] Setup tunnel for host [192.168.31.133]
INFO[0000] [dialer] Setup tunnel for host [192.168.31.132]
INFO[0000] [dialer] Setup tunnel for host [192.168.31.131]
INFO[0000] [dialer] Setup tunnel for host [192.168.31.134]
INFO[0000] [dialer] Setup tunnel for host [192.168.31.130]
INFO[0001] [network] No hosts added existing cluster, skipping port check
INFO[0001] [certificates] Deploying kubernetes certificates to Cluster nodes
INFO[0001] Checking if container [cert-deployer] is running on host [192.168.31.133], try #1
INFO[0001] Checking if container [cert-deployer] is running on host [192.168.31.134], try #1
INFO[0001] Checking if container [cert-deployer] is running on host [192.168.31.131], try #1
INFO[0001] Checking if container [cert-deployer] is running on host [192.168.31.130], try #1
INFO[0001] Checking if container [cert-deployer] is running on host [192.168.31.132], try #1
INFO[0001] Image [rancher/rke-tools:v0.1.52] exists on host [192.168.31.133]
INFO[0001] Image [rancher/rke-tools:v0.1.52] exists on host [192.168.31.132]
INFO[0001] Image [rancher/rke-tools:v0.1.52] exists on host [192.168.31.130]
INFO[0001] Image [rancher/rke-tools:v0.1.52] exists on host [192.168.31.134]
INFO[0001] Image [rancher/rke-tools:v0.1.52] exists on host [192.168.31.131]
INFO[0001] Starting container [cert-deployer] on host [192.168.31.132], try #1
INFO[0001] Starting container [cert-deployer] on host [192.168.31.134], try #1
INFO[0002] Starting container [cert-deployer] on host [192.168.31.130], try #1
INFO[0002] Starting container [cert-deployer] on host [192.168.31.133], try #1
INFO[0003] Checking if container [cert-deployer] is running on host [192.168.31.132], try #1
INFO[0004] Checking if container [cert-deployer] is running on host [192.168.31.133], try #1
INFO[0004] Starting container [cert-deployer] on host [192.168.31.131], try #1
INFO[0004] Checking if container [cert-deployer] is running on host [192.168.31.130], try #1
INFO[0004] Checking if container [cert-deployer] is running on host [192.168.31.134], try #1
INFO[0006] Checking if container [cert-deployer] is running on host [192.168.31.131], try #1
INFO[0008] Checking if container [cert-deployer] is running on host [192.168.31.132], try #1
INFO[0008] Removing container [cert-deployer] on host [192.168.31.132], try #1
INFO[0009] Checking if container [cert-deployer] is running on host [192.168.31.133], try #1
INFO[0009] Removing container [cert-deployer] on host [192.168.31.133], try #1
INFO[0009] Checking if container [cert-deployer] is running on host [192.168.31.130], try #1
INFO[0009] Removing container [cert-deployer] on host [192.168.31.130], try #1
INFO[0009] Checking if container [cert-deployer] is running on host [192.168.31.134], try #1
INFO[0009] Removing container [cert-deployer] on host [192.168.31.134], try #1
INFO[0011] Checking if container [cert-deployer] is running on host [192.168.31.131], try #1
INFO[0011] Removing container [cert-deployer] on host [192.168.31.131], try #1
INFO[0011] [reconcile] Rebuilding and updating local kube config
INFO[0011] Successfully Deployed local admin kubeconfig at [./kube_config_cluster.yml]
INFO[0011] [reconcile] host [192.168.31.130] is active master on the cluster
INFO[0011] [certificates] Successfully deployed kubernetes certificates to Cluster nodes
INFO[0011] [reconcile] Reconciling cluster state
INFO[0011] [reconcile] Check etcd hosts to be deleted
INFO[0011] [reconcile] Check etcd hosts to be added
INFO[0011] [reconcile] Rebuilding and updating local kube config
INFO[0011] Successfully Deployed local admin kubeconfig at [./kube_config_cluster.yml]
INFO[0011] [reconcile] host [192.168.31.130] is active master on the cluster
INFO[0011] [reconcile] Reconciled cluster state successfully
INFO[0011] Pre-pulling kubernetes images
INFO[0011] Pulling image [rancher/hyperkube:v1.17.4-rancher1] on host [192.168.31.133], try #1
INFO[0011] Pulling image [rancher/hyperkube:v1.17.4-rancher1] on host [192.168.31.130], try #1
INFO[0011] Pulling image [rancher/hyperkube:v1.17.4-rancher1] on host [192.168.31.131], try #1
INFO[0011] Pulling image [rancher/hyperkube:v1.17.4-rancher1] on host [192.168.31.132], try #1
INFO[0011] Pulling image [rancher/hyperkube:v1.17.4-rancher1] on host [192.168.31.134], try #1
INFO[0459] Image [rancher/hyperkube:v1.17.4-rancher1] exists on host [192.168.31.130]
INFO[0640] Image [rancher/hyperkube:v1.17.4-rancher1] exists on host [192.168.31.132]
INFO[0840] Image [rancher/hyperkube:v1.17.4-rancher1] exists on host [192.168.31.134]
INFO[0903] Image [rancher/hyperkube:v1.17.4-rancher1] exists on host [192.168.31.133]
INFO[1021] Image [rancher/hyperkube:v1.17.4-rancher1] exists on host [192.168.31.131]
INFO[1021] Kubernetes images pulled successfully
INFO[1021] [etcd] Building up etcd plane..
INFO[1021] Image [rancher/rke-tools:v0.1.52] exists on host [192.168.31.130]
INFO[1023] Starting container [etcd-fix-perm] on host [192.168.31.130], try #1
INFO[1025] Successfully started [etcd-fix-perm] container on host [192.168.31.130]
INFO[1025] Waiting for [etcd-fix-perm] container to exit on host [192.168.31.130]
INFO[1025] Waiting for [etcd-fix-perm] container to exit on host [192.168.31.130]
INFO[1025] Container [etcd-fix-perm] is still running on host [192.168.31.130]
INFO[1026] Waiting for [etcd-fix-perm] container to exit on host [192.168.31.130]
INFO[1026] Removing container [etcd-fix-perm] on host [192.168.31.130], try #1
INFO[1028] [remove/etcd-fix-perm] Successfully removed container on host [192.168.31.130]
INFO[1028] [etcd] Running rolling snapshot container [etcd-snapshot-once] on host [192.168.31.130]
INFO[1028] Removing container [etcd-rolling-snapshots] on host [192.168.31.130], try #1
INFO[1029] [remove/etcd-rolling-snapshots] Successfully removed container on host [192.168.31.130]
INFO[1029] Image [rancher/rke-tools:v0.1.52] exists on host [192.168.31.130]
INFO[1029] Starting container [etcd-rolling-snapshots] on host [192.168.31.130], try #1
INFO[1030] [etcd] Successfully started [etcd-rolling-snapshots] container on host [192.168.31.130]
INFO[1035] Image [rancher/rke-tools:v0.1.52] exists on host [192.168.31.130]
INFO[1035] Starting container [rke-bundle-cert] on host [192.168.31.130], try #1
INFO[1036] [certificates] Successfully started [rke-bundle-cert] container on host [192.168.31.130]
INFO[1036] Waiting for [rke-bundle-cert] container to exit on host [192.168.31.130]
INFO[1036] Container [rke-bundle-cert] is still running on host [192.168.31.130]
INFO[1037] Waiting for [rke-bundle-cert] container to exit on host [192.168.31.130]
INFO[1038] [certificates] successfully saved certificate bundle [/opt/rke/etcd-snapshots//pki.bundle.tar.gz] on host [192.168.31.130]
INFO[1038] Removing container [rke-bundle-cert] on host [192.168.31.130], try #1
INFO[1038] Image [rancher/rke-tools:v0.1.52] exists on host [192.168.31.130]
INFO[1038] Starting container [rke-log-linker] on host [192.168.31.130], try #1
INFO[1039] [etcd] Successfully started [rke-log-linker] container on host [192.168.31.130]
INFO[1039] Removing container [rke-log-linker] on host [192.168.31.130], try #1
INFO[1039] [remove/rke-log-linker] Successfully removed container on host [192.168.31.130]
INFO[1040] Image [rancher/rke-tools:v0.1.52] exists on host [192.168.31.131]
INFO[1086] Starting container [etcd-fix-perm] on host [192.168.31.131], try #1
INFO[1135] Successfully started [etcd-fix-perm] container on host [192.168.31.131]
INFO[1135] Waiting for [etcd-fix-perm] container to exit on host [192.168.31.131]
INFO[1135] Waiting for [etcd-fix-perm] container to exit on host [192.168.31.131]
INFO[1147] Removing container [etcd-fix-perm] on host [192.168.31.131], try #1
INFO[1160] [remove/etcd-fix-perm] Successfully removed container on host [192.168.31.131]
INFO[1160] [etcd] Running rolling snapshot container [etcd-snapshot-once] on host [192.168.31.131]
INFO[1160] Removing container [etcd-rolling-snapshots] on host [192.168.31.131], try #1
INFO[1170] [remove/etcd-rolling-snapshots] Successfully removed container on host [192.168.31.131]
INFO[1170] Image [rancher/rke-tools:v0.1.52] exists on host [192.168.31.131]
INFO[1173] Starting container [etcd-rolling-snapshots] on host [192.168.31.131], try #1
INFO[1176] [etcd] Successfully started [etcd-rolling-snapshots] container on host [192.168.31.131]
INFO[1184] Image [rancher/rke-tools:v0.1.52] exists on host [192.168.31.131]
INFO[1189] Starting container [rke-bundle-cert] on host [192.168.31.131], try #1
INFO[1194] [certificates] Successfully started [rke-bundle-cert] container on host [192.168.31.131]
INFO[1194] Waiting for [rke-bundle-cert] container to exit on host [192.168.31.131]
INFO[1194] Container [rke-bundle-cert] is still running on host [192.168.31.131]
INFO[1195] Waiting for [rke-bundle-cert] container to exit on host [192.168.31.131]
INFO[1195] [certificates] successfully saved certificate bundle [/opt/rke/etcd-snapshots//pki.bundle.tar.gz] on host [192.168.31.131]
INFO[1195] Removing container [rke-bundle-cert] on host [192.168.31.131], try #1
INFO[1195] Image [rancher/rke-tools:v0.1.52] exists on host [192.168.31.131]
INFO[1195] Starting container [rke-log-linker] on host [192.168.31.131], try #1
INFO[1196] [etcd] Successfully started [rke-log-linker] container on host [192.168.31.131]
INFO[1196] Removing container [rke-log-linker] on host [192.168.31.131], try #1
INFO[1197] [remove/rke-log-linker] Successfully removed container on host [192.168.31.131]
INFO[1197] Image [rancher/rke-tools:v0.1.52] exists on host [192.168.31.132]
INFO[1197] Starting container [etcd-fix-perm] on host [192.168.31.132], try #1
INFO[1198] Successfully started [etcd-fix-perm] container on host [192.168.31.132]
INFO[1198] Waiting for [etcd-fix-perm] container to exit on host [192.168.31.132]
INFO[1198] Waiting for [etcd-fix-perm] container to exit on host [192.168.31.132]
INFO[1198] Container [etcd-fix-perm] is still running on host [192.168.31.132]
INFO[1199] Waiting for [etcd-fix-perm] container to exit on host [192.168.31.132]
INFO[1199] Removing container [etcd-fix-perm] on host [192.168.31.132], try #1
INFO[1199] [remove/etcd-fix-perm] Successfully removed container on host [192.168.31.132]
INFO[1200] [etcd] Running rolling snapshot container [etcd-snapshot-once] on host [192.168.31.132]
INFO[1200] Removing container [etcd-rolling-snapshots] on host [192.168.31.132], try #1
INFO[1200] [remove/etcd-rolling-snapshots] Successfully removed container on host [192.168.31.132]
INFO[1200] Image [rancher/rke-tools:v0.1.52] exists on host [192.168.31.132]
INFO[1200] Starting container [etcd-rolling-snapshots] on host [192.168.31.132], try #1
INFO[1201] [etcd] Successfully started [etcd-rolling-snapshots] container on host [192.168.31.132]
INFO[1206] Image [rancher/rke-tools:v0.1.52] exists on host [192.168.31.132]
INFO[1206] Starting container [rke-bundle-cert] on host [192.168.31.132], try #1
INFO[1206] [certificates] Successfully started [rke-bundle-cert] container on host [192.168.31.132]
INFO[1206] Waiting for [rke-bundle-cert] container to exit on host [192.168.31.132]
INFO[1206] Container [rke-bundle-cert] is still running on host [192.168.31.132]
INFO[1207] Waiting for [rke-bundle-cert] container to exit on host [192.168.31.132]
INFO[1207] [certificates] successfully saved certificate bundle [/opt/rke/etcd-snapshots//pki.bundle.tar.gz] on host [192.168.31.132]
INFO[1207] Removing container [rke-bundle-cert] on host [192.168.31.132], try #1
INFO[1207] Image [rancher/rke-tools:v0.1.52] exists on host [192.168.31.132]
INFO[1208] Starting container [rke-log-linker] on host [192.168.31.132], try #1
INFO[1208] [etcd] Successfully started [rke-log-linker] container on host [192.168.31.132]
INFO[1208] Removing container [rke-log-linker] on host [192.168.31.132], try #1
INFO[1209] [remove/rke-log-linker] Successfully removed container on host [192.168.31.132]
INFO[1209] [etcd] Successfully started etcd plane.. Checking etcd cluster health
INFO[1209] [controlplane] Building up Controller Plane..
INFO[1209] Checking if container [service-sidekick] is running on host [192.168.31.130], try #1
INFO[1209] Checking if container [service-sidekick] is running on host [192.168.31.131], try #1
INFO[1209] Checking if container [service-sidekick] is running on host [192.168.31.132], try #1
INFO[1209] [sidekick] Sidekick container already created on host [192.168.31.132]
INFO[1209] [sidekick] Sidekick container already created on host [192.168.31.131]
INFO[1209] Checking if container [kube-apiserver] is running on host [192.168.31.132], try #1
INFO[1209] Checking if container [kube-apiserver] is running on host [192.168.31.131], try #1
INFO[1209] Image [rancher/hyperkube:v1.17.4-rancher1] exists on host [192.168.31.132]
INFO[1209] Checking if container [old-kube-apiserver] is running on host [192.168.31.132], try #1
INFO[1209] Image [rancher/hyperkube:v1.17.4-rancher1] exists on host [192.168.31.131]
INFO[1209] Checking if container [old-kube-apiserver] is running on host [192.168.31.131], try #1
INFO[1209] Stopping container [kube-apiserver] on host [192.168.31.132] with stopTimeoutDuration [5s], try #1
INFO[1209] Stopping container [kube-apiserver] on host [192.168.31.131] with stopTimeoutDuration [5s], try #1
INFO[1209] [sidekick] Sidekick container already created on host [192.168.31.130]
INFO[1210] Checking if container [kube-apiserver] is running on host [192.168.31.130], try #1
INFO[1210] Image [rancher/hyperkube:v1.17.4-rancher1] exists on host [192.168.31.130]
INFO[1210] Checking if container [old-kube-apiserver] is running on host [192.168.31.130], try #1
INFO[1210] Stopping container [kube-apiserver] on host [192.168.31.130] with stopTimeoutDuration [5s], try #1
INFO[1211] Waiting for [kube-apiserver] container to exit on host [192.168.31.132]
INFO[1211] Renaming container [kube-apiserver] to [old-kube-apiserver] on host [192.168.31.132], try #1
INFO[1211] Starting container [kube-apiserver] on host [192.168.31.132], try #1
INFO[1212] [controlplane] Successfully updated [kube-apiserver] container on host [192.168.31.132]
INFO[1212] Removing container [old-kube-apiserver] on host [192.168.31.132], try #1
INFO[1212] [healthcheck] Start Healthcheck on service [kube-apiserver] on host [192.168.31.132]
INFO[1216] Waiting for [kube-apiserver] container to exit on host [192.168.31.130]
INFO[1216] Renaming container [kube-apiserver] to [old-kube-apiserver] on host [192.168.31.130], try #1
INFO[1216] Starting container [kube-apiserver] on host [192.168.31.130], try #1
INFO[1217] Waiting for [kube-apiserver] container to exit on host [192.168.31.131]
INFO[1217] Renaming container [kube-apiserver] to [old-kube-apiserver] on host [192.168.31.131], try #1
INFO[1217] Starting container [kube-apiserver] on host [192.168.31.131], try #1
INFO[1217] [controlplane] Successfully updated [kube-apiserver] container on host [192.168.31.130]
INFO[1217] Removing container [old-kube-apiserver] on host [192.168.31.130], try #1
INFO[1218] [healthcheck] Start Healthcheck on service [kube-apiserver] on host [192.168.31.130]
INFO[1218] [controlplane] Successfully updated [kube-apiserver] container on host [192.168.31.131]
INFO[1218] Removing container [old-kube-apiserver] on host [192.168.31.131], try #1
INFO[1219] [healthcheck] Start Healthcheck on service [kube-apiserver] on host [192.168.31.131]
INFO[1267] [healthcheck] service [kube-apiserver] on host [192.168.31.132] is healthy
INFO[1267] [healthcheck] service [kube-apiserver] on host [192.168.31.130] is healthy
INFO[1267] Image [rancher/rke-tools:v0.1.52] exists on host [192.168.31.130]
INFO[1267] Image [rancher/rke-tools:v0.1.52] exists on host [192.168.31.132]
INFO[1267] Starting container [rke-log-linker] on host [192.168.31.132], try #1
INFO[1268] Starting container [rke-log-linker] on host [192.168.31.130], try #1
INFO[1268] [controlplane] Successfully started [rke-log-linker] container on host [192.168.31.132]
INFO[1268] Removing container [rke-log-linker] on host [192.168.31.132], try #1
INFO[1269] [remove/rke-log-linker] Successfully removed container on host [192.168.31.132]
INFO[1269] Checking if container [kube-controller-manager] is running on host [192.168.31.132], try #1
INFO[1269] Image [rancher/hyperkube:v1.17.4-rancher1] exists on host [192.168.31.132]
INFO[1269] Checking if container [old-kube-controller-manager] is running on host [192.168.31.132], try #1
INFO[1269] Stopping container [kube-controller-manager] on host [192.168.31.132] with stopTimeoutDuration [5s], try #1
INFO[1269] [controlplane] Successfully started [rke-log-linker] container on host [192.168.31.130]
INFO[1269] Removing container [rke-log-linker] on host [192.168.31.130], try #1
INFO[1269] Waiting for [kube-controller-manager] container to exit on host [192.168.31.132]
INFO[1269] Renaming container [kube-controller-manager] to [old-kube-controller-manager] on host [192.168.31.132], try #1
INFO[1269] Starting container [kube-controller-manager] on host [192.168.31.132], try #1
INFO[1269] [remove/rke-log-linker] Successfully removed container on host [192.168.31.130]
INFO[1269] Checking if container [kube-controller-manager] is running on host [192.168.31.130], try #1
INFO[1269] Image [rancher/hyperkube:v1.17.4-rancher1] exists on host [192.168.31.130]
INFO[1269] Checking if container [old-kube-controller-manager] is running on host [192.168.31.130], try #1
INFO[1270] Stopping container [kube-controller-manager] on host [192.168.31.130] with stopTimeoutDuration [5s], try #1
INFO[1270] [controlplane] Successfully updated [kube-controller-manager] container on host [192.168.31.132]
INFO[1270] Removing container [old-kube-controller-manager] on host [192.168.31.132], try #1
INFO[1270] [healthcheck] Start Healthcheck on service [kube-controller-manager] on host [192.168.31.132]
INFO[1270] Waiting for [kube-controller-manager] container to exit on host [192.168.31.130]
INFO[1270] Renaming container [kube-controller-manager] to [old-kube-controller-manager] on host [192.168.31.130], try #1
INFO[1270] Starting container [kube-controller-manager] on host [192.168.31.130], try #1
INFO[1270] [healthcheck] service [kube-apiserver] on host [192.168.31.131] is healthy
INFO[1270] Image [rancher/rke-tools:v0.1.52] exists on host [192.168.31.131]
INFO[1270] [controlplane] Successfully updated [kube-controller-manager] container on host [192.168.31.130]
INFO[1270] Removing container [old-kube-controller-manager] on host [192.168.31.130], try #1
INFO[1271] [healthcheck] Start Healthcheck on service [kube-controller-manager] on host [192.168.31.130]
INFO[1271] Starting container [rke-log-linker] on host [192.168.31.131], try #1
INFO[1274] [controlplane] Successfully started [rke-log-linker] container on host [192.168.31.131]
INFO[1274] Removing container [rke-log-linker] on host [192.168.31.131], try #1
INFO[1274] [remove/rke-log-linker] Successfully removed container on host [192.168.31.131]
INFO[1274] Checking if container [kube-controller-manager] is running on host [192.168.31.131], try #1
INFO[1274] Image [rancher/hyperkube:v1.17.4-rancher1] exists on host [192.168.31.131]
INFO[1274] Checking if container [old-kube-controller-manager] is running on host [192.168.31.131], try #1
INFO[1274] Stopping container [kube-controller-manager] on host [192.168.31.131] with stopTimeoutDuration [5s], try #1
INFO[1275] Waiting for [kube-controller-manager] container to exit on host [192.168.31.131]
INFO[1275] Renaming container [kube-controller-manager] to [old-kube-controller-manager] on host [192.168.31.131], try #1
INFO[1275] Starting container [kube-controller-manager] on host [192.168.31.131], try #1
INFO[1275] [healthcheck] service [kube-controller-manager] on host [192.168.31.132] is healthy
INFO[1275] Image [rancher/rke-tools:v0.1.52] exists on host [192.168.31.132]
INFO[1275] Starting container [rke-log-linker] on host [192.168.31.132], try #1
INFO[1275] [controlplane] Successfully updated [kube-controller-manager] container on host [192.168.31.131]
INFO[1275] Removing container [old-kube-controller-manager] on host [192.168.31.131], try #1
INFO[1276] [healthcheck] Start Healthcheck on service [kube-controller-manager] on host [192.168.31.131]
INFO[1276] [controlplane] Successfully started [rke-log-linker] container on host [192.168.31.132]
INFO[1276] Removing container [rke-log-linker] on host [192.168.31.132], try #1
INFO[1276] [remove/rke-log-linker] Successfully removed container on host [192.168.31.132]
INFO[1276] Checking if container [kube-scheduler] is running on host [192.168.31.132], try #1
INFO[1277] Image [rancher/hyperkube:v1.17.4-rancher1] exists on host [192.168.31.132]
INFO[1277] Checking if container [old-kube-scheduler] is running on host [192.168.31.132], try #1
INFO[1277] Stopping container [kube-scheduler] on host [192.168.31.132] with stopTimeoutDuration [5s], try #1
INFO[1277] Waiting for [kube-scheduler] container to exit on host [192.168.31.132]
INFO[1277] Renaming container [kube-scheduler] to [old-kube-scheduler] on host [192.168.31.132], try #1
INFO[1277] Starting container [kube-scheduler] on host [192.168.31.132], try #1
INFO[1278] [controlplane] Successfully updated [kube-scheduler] container on host [192.168.31.132]
INFO[1278] Removing container [old-kube-scheduler] on host [192.168.31.132], try #1
INFO[1278] [healthcheck] Start Healthcheck on service [kube-scheduler] on host [192.168.31.132]
INFO[1281] [healthcheck] service [kube-controller-manager] on host [192.168.31.130] is healthy
INFO[1281] Image [rancher/rke-tools:v0.1.52] exists on host [192.168.31.130]
INFO[1281] Starting container [rke-log-linker] on host [192.168.31.130], try #1
INFO[1284] [healthcheck] service [kube-scheduler] on host [192.168.31.132] is healthy
INFO[1284] Image [rancher/rke-tools:v0.1.52] exists on host [192.168.31.132]
INFO[1284] [controlplane] Successfully started [rke-log-linker] container on host [192.168.31.130]
INFO[1284] Removing container [rke-log-linker] on host [192.168.31.130], try #1
INFO[1284] Starting container [rke-log-linker] on host [192.168.31.132], try #1
INFO[1285] [healthcheck] service [kube-controller-manager] on host [192.168.31.131] is healthy
INFO[1285] Image [rancher/rke-tools:v0.1.52] exists on host [192.168.31.131]
INFO[1285] [remove/rke-log-linker] Successfully removed container on host [192.168.31.130]
INFO[1285] Checking if container [kube-scheduler] is running on host [192.168.31.130], try #1
INFO[1285] [controlplane] Successfully started [rke-log-linker] container on host [192.168.31.132]
INFO[1285] Starting container [rke-log-linker] on host [192.168.31.131], try #1
INFO[1285] Image [rancher/hyperkube:v1.17.4-rancher1] exists on host [192.168.31.130]
INFO[1285] Checking if container [old-kube-scheduler] is running on host [192.168.31.130], try #1
INFO[1286] Removing container [rke-log-linker] on host [192.168.31.132], try #1
INFO[1286] Stopping container [kube-scheduler] on host [192.168.31.130] with stopTimeoutDuration [5s], try #1
INFO[1286] [remove/rke-log-linker] Successfully removed container on host [192.168.31.132]
INFO[1287] [controlplane] Successfully started [rke-log-linker] container on host [192.168.31.131]
INFO[1287] Removing container [rke-log-linker] on host [192.168.31.131], try #1
INFO[1287] Waiting for [kube-scheduler] container to exit on host [192.168.31.130]
INFO[1287] Renaming container [kube-scheduler] to [old-kube-scheduler] on host [192.168.31.130], try #1
INFO[1287] Starting container [kube-scheduler] on host [192.168.31.130], try #1
INFO[1287] [controlplane] Successfully updated [kube-scheduler] container on host [192.168.31.130]
INFO[1287] Removing container [old-kube-scheduler] on host [192.168.31.130], try #1
INFO[1287] [healthcheck] Start Healthcheck on service [kube-scheduler] on host [192.168.31.130]
INFO[1288] [remove/rke-log-linker] Successfully removed container on host [192.168.31.131]
INFO[1288] Checking if container [kube-scheduler] is running on host [192.168.31.131], try #1
INFO[1289] Image [rancher/hyperkube:v1.17.4-rancher1] exists on host [192.168.31.131]
INFO[1289] Checking if container [old-kube-scheduler] is running on host [192.168.31.131], try #1
INFO[1289] Stopping container [kube-scheduler] on host [192.168.31.131] with stopTimeoutDuration [5s], try #1
INFO[1290] Waiting for [kube-scheduler] container to exit on host [192.168.31.131]
INFO[1290] Renaming container [kube-scheduler] to [old-kube-scheduler] on host [192.168.31.131], try #1
INFO[1290] Starting container [kube-scheduler] on host [192.168.31.131], try #1
INFO[1291] [controlplane] Successfully updated [kube-scheduler] container on host [192.168.31.131]
INFO[1291] Removing container [old-kube-scheduler] on host [192.168.31.131], try #1
INFO[1291] [healthcheck] Start Healthcheck on service [kube-scheduler] on host [192.168.31.131]
INFO[1295] [healthcheck] service [kube-scheduler] on host [192.168.31.130] is healthy
INFO[1295] Image [rancher/rke-tools:v0.1.52] exists on host [192.168.31.130]
INFO[1295] Starting container [rke-log-linker] on host [192.168.31.130], try #1
INFO[1296] [controlplane] Successfully started [rke-log-linker] container on host [192.168.31.130]
INFO[1296] Removing container [rke-log-linker] on host [192.168.31.130], try #1
INFO[1296] [remove/rke-log-linker] Successfully removed container on host [192.168.31.130]
INFO[1298] [healthcheck] service [kube-scheduler] on host [192.168.31.131] is healthy
INFO[1298] Image [rancher/rke-tools:v0.1.52] exists on host [192.168.31.131]
INFO[1298] Starting container [rke-log-linker] on host [192.168.31.131], try #1
INFO[1299] [controlplane] Successfully started [rke-log-linker] container on host [192.168.31.131]
INFO[1300] Removing container [rke-log-linker] on host [192.168.31.131], try #1
INFO[1300] [remove/rke-log-linker] Successfully removed container on host [192.168.31.131]
INFO[1300] [controlplane] Successfully started Controller Plane..
INFO[1300] [authz] Creating rke-job-deployer ServiceAccount
INFO[1300] [authz] rke-job-deployer ServiceAccount created successfully
INFO[1300] [authz] Creating system:node ClusterRoleBinding
INFO[1301] [authz] system:node ClusterRoleBinding created successfully
INFO[1301] [authz] Creating kube-apiserver proxy ClusterRole and ClusterRoleBinding
INFO[1301] [authz] kube-apiserver proxy ClusterRole and ClusterRoleBinding created successfully
INFO[1302] Successfully Deployed state file at [./cluster.rkestate]
INFO[1302] [state] Saving full cluster state to Kubernetes
INFO[1302] [state] Successfully Saved full cluster state to Kubernetes ConfigMap: cluster-state
INFO[1302] [worker] Building up Worker Plane..
INFO[1302] Checking if container [service-sidekick] is running on host [192.168.31.130], try #1
INFO[1302] Checking if container [service-sidekick] is running on host [192.168.31.131], try #1
INFO[1302] Checking if container [service-sidekick] is running on host [192.168.31.132], try #1
INFO[1302] [sidekick] Sidekick container already created on host [192.168.31.130]
INFO[1302] [sidekick] Sidekick container already created on host [192.168.31.132]
INFO[1302] Checking if container [kubelet] is running on host [192.168.31.132], try #1
INFO[1302] Checking if container [kubelet] is running on host [192.168.31.130], try #1
INFO[1302] Image [rancher/hyperkube:v1.17.4-rancher1] exists on host [192.168.31.132]
INFO[1302] Checking if container [old-kubelet] is running on host [192.168.31.132], try #1
INFO[1302] [sidekick] Sidekick container already created on host [192.168.31.131]
INFO[1302] Image [rancher/hyperkube:v1.17.4-rancher1] exists on host [192.168.31.130]
INFO[1302] Checking if container [old-kubelet] is running on host [192.168.31.130], try #1
INFO[1302] Stopping container [kubelet] on host [192.168.31.132] with stopTimeoutDuration [5s], try #1
INFO[1302] Stopping container [kubelet] on host [192.168.31.130] with stopTimeoutDuration [5s], try #1
INFO[1302] Checking if container [kubelet] is running on host [192.168.31.131], try #1
INFO[1302] Image [rancher/hyperkube:v1.17.4-rancher1] exists on host [192.168.31.131]
INFO[1302] Checking if container [old-kubelet] is running on host [192.168.31.131], try #1
INFO[1302] Stopping container [kubelet] on host [192.168.31.131] with stopTimeoutDuration [5s], try #1
INFO[1302] Waiting for [kubelet] container to exit on host [192.168.31.132]
INFO[1302] Renaming container [kubelet] to [old-kubelet] on host [192.168.31.132], try #1
INFO[1302] Waiting for [kubelet] container to exit on host [192.168.31.130]
INFO[1302] Renaming container [kubelet] to [old-kubelet] on host [192.168.31.130], try #1
INFO[1302] Starting container [kubelet] on host [192.168.31.130], try #1
INFO[1302] Image [rancher/rke-tools:v0.1.52] exists on host [192.168.31.133]
INFO[1303] Starting container [kubelet] on host [192.168.31.132], try #1
INFO[1303] [worker] Successfully updated [kubelet] container on host [192.168.31.130]
INFO[1303] Removing container [old-kubelet] on host [192.168.31.130], try #1
INFO[1303] Image [rancher/rke-tools:v0.1.52] exists on host [192.168.31.134]
INFO[1304] Starting container [rke-log-linker] on host [192.168.31.134], try #1
INFO[1304] [worker] Successfully updated [kubelet] container on host [192.168.31.132]
INFO[1304] Removing container [old-kubelet] on host [192.168.31.132], try #1
INFO[1304] [healthcheck] Start Healthcheck on service [kubelet] on host [192.168.31.130]
INFO[1304] Starting container [rke-log-linker] on host [192.168.31.133], try #1
INFO[1304] [healthcheck] Start Healthcheck on service [kubelet] on host [192.168.31.132]
INFO[1304] Waiting for [kubelet] container to exit on host [192.168.31.131]
INFO[1304] Renaming container [kubelet] to [old-kubelet] on host [192.168.31.131], try #1
INFO[1305] Starting container [kubelet] on host [192.168.31.131], try #1
INFO[1306] [worker] Successfully updated [kubelet] container on host [192.168.31.131]
INFO[1306] Removing container [old-kubelet] on host [192.168.31.131], try #1
INFO[1306] [worker] Successfully started [rke-log-linker] container on host [192.168.31.133]
INFO[1307] Removing container [rke-log-linker] on host [192.168.31.133], try #1
INFO[1307] [remove/rke-log-linker] Successfully removed container on host [192.168.31.133]
INFO[1307] Checking if container [service-sidekick] is running on host [192.168.31.133], try #1
INFO[1307] [healthcheck] Start Healthcheck on service [kubelet] on host [192.168.31.131]
INFO[1307] [sidekick] Sidekick container already created on host [192.168.31.133]
INFO[1307] Checking if container [kubelet] is running on host [192.168.31.133], try #1
INFO[1307] Image [rancher/hyperkube:v1.17.4-rancher1] exists on host [192.168.31.133]
INFO[1307] Checking if container [old-kubelet] is running on host [192.168.31.133], try #1
INFO[1307] Stopping container [kubelet] on host [192.168.31.133] with stopTimeoutDuration [5s], try #1
INFO[1308] [worker] Successfully started [rke-log-linker] container on host [192.168.31.134]
INFO[1308] Removing container [rke-log-linker] on host [192.168.31.134], try #1
INFO[1308] Waiting for [kubelet] container to exit on host [192.168.31.133]
INFO[1308] Renaming container [kubelet] to [old-kubelet] on host [192.168.31.133], try #1
INFO[1308] Starting container [kubelet] on host [192.168.31.133], try #1
INFO[1309] [worker] Successfully updated [kubelet] container on host [192.168.31.133]
INFO[1309] Removing container [old-kubelet] on host [192.168.31.133], try #1
INFO[1309] [remove/rke-log-linker] Successfully removed container on host [192.168.31.134]
INFO[1309] Checking if container [service-sidekick] is running on host [192.168.31.134], try #1
INFO[1309] [sidekick] Sidekick container already created on host [192.168.31.134]
INFO[1309] Checking if container [kubelet] is running on host [192.168.31.134], try #1
INFO[1309] Image [rancher/hyperkube:v1.17.4-rancher1] exists on host [192.168.31.134]
INFO[1309] Checking if container [old-kubelet] is running on host [192.168.31.134], try #1
INFO[1309] Stopping container [kubelet] on host [192.168.31.134] with stopTimeoutDuration [5s], try #1
INFO[1309] [healthcheck] Start Healthcheck on service [kubelet] on host [192.168.31.133]
INFO[1309] Waiting for [kubelet] container to exit on host [192.168.31.134]
INFO[1309] Renaming container [kubelet] to [old-kubelet] on host [192.168.31.134], try #1
INFO[1309] Starting container [kubelet] on host [192.168.31.134], try #1
INFO[1310] [worker] Successfully updated [kubelet] container on host [192.168.31.134]
INFO[1310] Removing container [old-kubelet] on host [192.168.31.134], try #1
INFO[1310] [healthcheck] Start Healthcheck on service [kubelet] on host [192.168.31.134]
INFO[1331] [healthcheck] service [kubelet] on host [192.168.31.130] is healthy
INFO[1331] [healthcheck] service [kubelet] on host [192.168.31.132] is healthy
INFO[1331] Image [rancher/rke-tools:v0.1.52] exists on host [192.168.31.132]
INFO[1331] Image [rancher/rke-tools:v0.1.52] exists on host [192.168.31.130]
INFO[1331] Starting container [rke-log-linker] on host [192.168.31.132], try #1
INFO[1331] Starting container [rke-log-linker] on host [192.168.31.130], try #1
INFO[1332] [worker] Successfully started [rke-log-linker] container on host [192.168.31.132]
INFO[1332] Removing container [rke-log-linker] on host [192.168.31.132], try #1
INFO[1332] [worker] Successfully started [rke-log-linker] container on host [192.168.31.130]
INFO[1332] Removing container [rke-log-linker] on host [192.168.31.130], try #1
INFO[1332] [remove/rke-log-linker] Successfully removed container on host [192.168.31.132]
INFO[1332] Checking if container [kube-proxy] is running on host [192.168.31.132], try #1
INFO[1332] Image [rancher/hyperkube:v1.17.4-rancher1] exists on host [192.168.31.132]
INFO[1332] Checking if container [old-kube-proxy] is running on host [192.168.31.132], try #1
INFO[1332] Stopping container [kube-proxy] on host [192.168.31.132] with stopTimeoutDuration [5s], try #1
INFO[1333] [remove/rke-log-linker] Successfully removed container on host [192.168.31.130]
INFO[1333] Checking if container [kube-proxy] is running on host [192.168.31.130], try #1
INFO[1333] Image [rancher/hyperkube:v1.17.4-rancher1] exists on host [192.168.31.130]
INFO[1333] Checking if container [old-kube-proxy] is running on host [192.168.31.130], try #1
INFO[1333] Waiting for [kube-proxy] container to exit on host [192.168.31.132]
INFO[1333] Renaming container [kube-proxy] to [old-kube-proxy] on host [192.168.31.132], try #1
INFO[1333] Stopping container [kube-proxy] on host [192.168.31.130] with stopTimeoutDuration [5s], try #1
INFO[1333] Starting container [kube-proxy] on host [192.168.31.132], try #1
INFO[1333] [worker] Successfully updated [kube-proxy] container on host [192.168.31.132]
INFO[1333] Removing container [old-kube-proxy] on host [192.168.31.132], try #1
INFO[1333] [healthcheck] Start Healthcheck on service [kube-proxy] on host [192.168.31.132]
INFO[1334] [healthcheck] service [kubelet] on host [192.168.31.131] is healthy
INFO[1334] Image [rancher/rke-tools:v0.1.52] exists on host [192.168.31.131]
INFO[1334] Starting container [rke-log-linker] on host [192.168.31.131], try #1
INFO[1335] Waiting for [kube-proxy] container to exit on host [192.168.31.130]
INFO[1335] Renaming container [kube-proxy] to [old-kube-proxy] on host [192.168.31.130], try #1
INFO[1335] Starting container [kube-proxy] on host [192.168.31.130], try #1
INFO[1335] [healthcheck] service [kubelet] on host [192.168.31.133] is healthy
INFO[1335] Image [rancher/rke-tools:v0.1.52] exists on host [192.168.31.133]
INFO[1335] Starting container [rke-log-linker] on host [192.168.31.133], try #1
INFO[1335] [worker] Successfully updated [kube-proxy] container on host [192.168.31.130]
INFO[1335] Removing container [old-kube-proxy] on host [192.168.31.130], try #1
INFO[1336] [healthcheck] Start Healthcheck on service [kube-proxy] on host [192.168.31.130]
INFO[1336] [worker] Successfully started [rke-log-linker] container on host [192.168.31.131]
INFO[1336] Removing container [rke-log-linker] on host [192.168.31.131], try #1
INFO[1336] [remove/rke-log-linker] Successfully removed container on host [192.168.31.131]
INFO[1336] Checking if container [kube-proxy] is running on host [192.168.31.131], try #1
INFO[1336] Image [rancher/hyperkube:v1.17.4-rancher1] exists on host [192.168.31.131]
INFO[1336] Checking if container [old-kube-proxy] is running on host [192.168.31.131], try #1
INFO[1336] Stopping container [kube-proxy] on host [192.168.31.131] with stopTimeoutDuration [5s], try #1
INFO[1337] Waiting for [kube-proxy] container to exit on host [192.168.31.131]
INFO[1337] Renaming container [kube-proxy] to [old-kube-proxy] on host [192.168.31.131], try #1
INFO[1337] Starting container [kube-proxy] on host [192.168.31.131], try #1
INFO[1337] [worker] Successfully updated [kube-proxy] container on host [192.168.31.131]
INFO[1337] Removing container [old-kube-proxy] on host [192.168.31.131], try #1
INFO[1337] [healthcheck] Start Healthcheck on service [kube-proxy] on host [192.168.31.131]
INFO[1337] [worker] Successfully started [rke-log-linker] container on host [192.168.31.133]
INFO[1337] Removing container [rke-log-linker] on host [192.168.31.133], try #1
INFO[1338] [remove/rke-log-linker] Successfully removed container on host [192.168.31.133]
INFO[1338] [healthcheck] service [kubelet] on host [192.168.31.134] is healthy
INFO[1338] Checking if container [kube-proxy] is running on host [192.168.31.133], try #1
INFO[1338] Image [rancher/hyperkube:v1.17.4-rancher1] exists on host [192.168.31.133]
INFO[1338] Checking if container [old-kube-proxy] is running on host [192.168.31.133], try #1
INFO[1338] Stopping container [kube-proxy] on host [192.168.31.133] with stopTimeoutDuration [5s], try #1
INFO[1338] Image [rancher/rke-tools:v0.1.52] exists on host [192.168.31.134]
INFO[1338] Waiting for [kube-proxy] container to exit on host [192.168.31.133]
INFO[1338] Renaming container [kube-proxy] to [old-kube-proxy] on host [192.168.31.133], try #1
INFO[1338] Starting container [rke-log-linker] on host [192.168.31.134], try #1
INFO[1338] Starting container [kube-proxy] on host [192.168.31.133], try #1
INFO[1339] [worker] Successfully updated [kube-proxy] container on host [192.168.31.133]
INFO[1339] Removing container [old-kube-proxy] on host [192.168.31.133], try #1
INFO[1339] [healthcheck] Start Healthcheck on service [kube-proxy] on host [192.168.31.133]
INFO[1339] [healthcheck] service [kube-proxy] on host [192.168.31.132] is healthy
INFO[1339] Image [rancher/rke-tools:v0.1.52] exists on host [192.168.31.132]
INFO[1339] Starting container [rke-log-linker] on host [192.168.31.132], try #1
INFO[1340] [worker] Successfully started [rke-log-linker] container on host [192.168.31.134]
INFO[1340] [worker] Successfully started [rke-log-linker] container on host [192.168.31.132]
INFO[1341] Removing container [rke-log-linker] on host [192.168.31.132], try #1
INFO[1341] [remove/rke-log-linker] Successfully removed container on host [192.168.31.132]
INFO[1341] Removing container [rke-log-linker] on host [192.168.31.134], try #1
INFO[1341] [remove/rke-log-linker] Successfully removed container on host [192.168.31.134]
INFO[1341] Checking if container [kube-proxy] is running on host [192.168.31.134], try #1
INFO[1341] Image [rancher/hyperkube:v1.17.4-rancher1] exists on host [192.168.31.134]
INFO[1341] Checking if container [old-kube-proxy] is running on host [192.168.31.134], try #1
INFO[1341] Stopping container [kube-proxy] on host [192.168.31.134] with stopTimeoutDuration [5s], try #1
INFO[1341] [healthcheck] service [kube-proxy] on host [192.168.31.130] is healthy
INFO[1341] Image [rancher/rke-tools:v0.1.52] exists on host [192.168.31.130]
INFO[1341] Starting container [rke-log-linker] on host [192.168.31.130], try #1
INFO[1342] Waiting for [kube-proxy] container to exit on host [192.168.31.134]
INFO[1342] Renaming container [kube-proxy] to [old-kube-proxy] on host [192.168.31.134], try #1
INFO[1342] Starting container [kube-proxy] on host [192.168.31.134], try #1
INFO[1342] [worker] Successfully updated [kube-proxy] container on host [192.168.31.134]
INFO[1342] Removing container [old-kube-proxy] on host [192.168.31.134], try #1
INFO[1342] [healthcheck] service [kube-proxy] on host [192.168.31.131] is healthy
INFO[1343] [healthcheck] Start Healthcheck on service [kube-proxy] on host [192.168.31.134]
INFO[1343] Image [rancher/rke-tools:v0.1.52] exists on host [192.168.31.131]
INFO[1343] Starting container [rke-log-linker] on host [192.168.31.131], try #1
INFO[1343] [healthcheck] service [kube-proxy] on host [192.168.31.134] is healthy
INFO[1343] Image [rancher/rke-tools:v0.1.52] exists on host [192.168.31.134]
INFO[1343] [worker] Successfully started [rke-log-linker] container on host [192.168.31.130]
INFO[1343] Removing container [rke-log-linker] on host [192.168.31.130], try #1
INFO[1343] Starting container [rke-log-linker] on host [192.168.31.134], try #1
INFO[1344] [worker] Successfully started [rke-log-linker] container on host [192.168.31.134]
INFO[1344] Removing container [rke-log-linker] on host [192.168.31.134], try #1
INFO[1344] [worker] Successfully started [rke-log-linker] container on host [192.168.31.131]
INFO[1344] Removing container [rke-log-linker] on host [192.168.31.131], try #1
INFO[1344] [remove/rke-log-linker] Successfully removed container on host [192.168.31.134]
INFO[1344] [healthcheck] service [kube-proxy] on host [192.168.31.133] is healthy
INFO[1344] Image [rancher/rke-tools:v0.1.52] exists on host [192.168.31.133]
INFO[1344] Starting container [rke-log-linker] on host [192.168.31.133], try #1
INFO[1344] [remove/rke-log-linker] Successfully removed container on host [192.168.31.131]
INFO[1345] [remove/rke-log-linker] Successfully removed container on host [192.168.31.130]
INFO[1345] [worker] Successfully started [rke-log-linker] container on host [192.168.31.133]
INFO[1345] Removing container [rke-log-linker] on host [192.168.31.133], try #1
INFO[1345] [remove/rke-log-linker] Successfully removed container on host [192.168.31.133]
INFO[1345] [worker] Successfully started Worker Plane..
INFO[1345] Image [rancher/rke-tools:v0.1.52] exists on host [192.168.31.133]
INFO[1345] Image [rancher/rke-tools:v0.1.52] exists on host [192.168.31.131]
INFO[1345] Image [rancher/rke-tools:v0.1.52] exists on host [192.168.31.130]
INFO[1346] Image [rancher/rke-tools:v0.1.52] exists on host [192.168.31.132]
INFO[1346] Image [rancher/rke-tools:v0.1.52] exists on host [192.168.31.134]
INFO[1346] Starting container [rke-log-cleaner] on host [192.168.31.133], try #1
INFO[1346] Starting container [rke-log-cleaner] on host [192.168.31.132], try #1
INFO[1346] Starting container [rke-log-cleaner] on host [192.168.31.130], try #1
INFO[1346] Starting container [rke-log-cleaner] on host [192.168.31.131], try #1
INFO[1346] Starting container [rke-log-cleaner] on host [192.168.31.134], try #1
INFO[1346] [cleanup] Successfully started [rke-log-cleaner] container on host [192.168.31.133]
INFO[1346] Removing container [rke-log-cleaner] on host [192.168.31.133], try #1
INFO[1347] [remove/rke-log-cleaner] Successfully removed container on host [192.168.31.133]
INFO[1347] [cleanup] Successfully started [rke-log-cleaner] container on host [192.168.31.131]
INFO[1347] Removing container [rke-log-cleaner] on host [192.168.31.131], try #1
INFO[1347] [cleanup] Successfully started [rke-log-cleaner] container on host [192.168.31.132]
INFO[1347] Removing container [rke-log-cleaner] on host [192.168.31.132], try #1
INFO[1348] [cleanup] Successfully started [rke-log-cleaner] container on host [192.168.31.130]
INFO[1348] Removing container [rke-log-cleaner] on host [192.168.31.130], try #1
INFO[1348] [remove/rke-log-cleaner] Successfully removed container on host [192.168.31.130]
INFO[1349] [remove/rke-log-cleaner] Successfully removed container on host [192.168.31.131]
INFO[1349] [remove/rke-log-cleaner] Successfully removed container on host [192.168.31.132]
INFO[1350] [cleanup] Successfully started [rke-log-cleaner] container on host [192.168.31.134]
INFO[1350] Removing container [rke-log-cleaner] on host [192.168.31.134], try #1
INFO[1353] [remove/rke-log-cleaner] Successfully removed container on host [192.168.31.134]
INFO[1353] [sync] Syncing nodes Labels and Taints
INFO[1354] [sync] Successfully synced nodes Labels and Taints
INFO[1354] [network] Setting up network plugin: canal
INFO[1354] [addons] Saving ConfigMap for addon rke-network-plugin to Kubernetes
INFO[1354] [addons] Successfully saved ConfigMap for addon rke-network-plugin to Kubernetes
INFO[1354] [addons] Executing deploy job rke-network-plugin
INFO[1383] [addons] Setting up coredns
INFO[1383] [addons] Saving ConfigMap for addon rke-coredns-addon to Kubernetes
INFO[1383] [addons] Successfully saved ConfigMap for addon rke-coredns-addon to Kubernetes
INFO[1383] [addons] Executing deploy job rke-coredns-addon
INFO[1383] [addons] CoreDNS deployed successfully..
INFO[1383] [dns] DNS provider coredns deployed successfully
INFO[1383] [addons] Setting up Metrics Server
INFO[1383] [addons] Saving ConfigMap for addon rke-metrics-addon to Kubernetes
INFO[1383] [addons] Successfully saved ConfigMap for addon rke-metrics-addon to Kubernetes
INFO[1383] [addons] Executing deploy job rke-metrics-addon
INFO[1383] [addons] Metrics Server deployed successfully
INFO[1383] [ingress] Setting up nginx ingress controller
INFO[1383] [addons] Saving ConfigMap for addon rke-ingress-controller to Kubernetes
INFO[1383] [addons] Successfully saved ConfigMap for addon rke-ingress-controller to Kubernetes
INFO[1383] [addons] Executing deploy job rke-ingress-controller
INFO[1383] [ingress] ingress controller nginx deployed successfully
INFO[1383] [addons] Setting up user addons
INFO[1383] [addons] no user addons defined
INFO[1383] Finished building Kubernetes cluster successfully
[rancher@rmaster01 ~]$
[rancher@rmaster01 ~]$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
node01 Ready worker 21d v1.17.4
node02 Ready worker 21d v1.17.4
rmaster01 Ready controlplane,etcd 21d v1.17.4
rmaster02 Ready controlplane,etcd 21d v1.17.4
rmaster03 Ready controlplane,etcd 21d v1.17.4
[rancher@rmaster01 ~]$
2. Cluster Backup and Restore
# Back up etcd data; snapshot files are saved under /opt/rke/etcd-snapshots
rke etcd snapshot-save --config cluster.yml --name snapshot-test
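Once the command finishes, the snapshot can be verified on the etcd nodes (a quick check using the default snapshot path mentioned above):
ls -lh /opt/rke/etcd-snapshots/
# a file named after the snapshot (snapshot-test) should appear in the listing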
Restore etcd data
rke etcd snapshot-restore --config cluster.yml --name snapshot-test
- Scheduled backups
When the etcd-snapshot service is enabled for the cluster, you can check the logs of the etcd-rolling-snapshots container to confirm that backups are being created automatically.
docker logs etcd-rolling-snapshots
3. Certificate Management
By default a Kubernetes cluster requires certificates, and RKE generates them for the cluster automatically. It is important to rotate these certificates before they expire, and whenever a certificate has been compromised.
After rotation the Kubernetes components are restarted automatically. Certificate rotation is available for the following services:
- etcd
- kubelet
- kube-apiserver
- kube-proxy
- kube-scheduler
- kube-controller-manager
RKE can rotate the auto-generated certificates with a few simple commands:
- Rotate all service certificates with the same CA
- Rotate the certificate of a single service with the same CA
- Rotate the CA and all service certificates
When you are ready to rotate certificates, the RKE configuration file cluster.yml is required. When running rke cert rotate, the path to the configuration file can be passed with --config.
Rotate all service certificates with the same CA
rke cert rotate --config cluster.yml
Rotate the certificate of a single service with the same CA
rke cert rotate --service kubelet --config cluster.yml
Rotate the CA and all service certificates
rke cert rotate --rotate-ca --config cluster.yml
Certificates deployed by RKE are valid for 10 years by default:
[root@uat-rancher-node01 ~]# openssl x509 -in /etc/kubernetes/ssl/kube-apiserver.pem -noout -dates
notBefore=Aug 29 09:26:10 2020 GMT
notAfter=Aug 27 09:26:11 2030 GMT
[root@uat-rancher-node01 ~]#
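To check the expiry dates of all RKE-generated certificates at once, a small loop over the same directory works (a minimal sketch; key files are skipped):
for cert in /etc/kubernetes/ssl/kube-*.pem; do
  case "$cert" in *-key.pem) continue ;; esac
  echo "$cert: $(openssl x509 -in "$cert" -noout -enddate)"
done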
- Cluster node management
# To add or remove nodes, modify the cluster.yml file and run rke up again
If you are only adding or removing worker nodes, you can pass the --update-only flag to indicate that only worker-node resources should be updated
rke up --update-only --config cluster.yml
4. Deleting the Kubernetes Cluster
4.1 Remove the Kubernetes cluster
rke remove --config cluster.yml
4.2 Run the following script to clean up leftover data. Note: run it as the root user.
#!/bin/bash
# Stop the services
systemctl disable kubelet.service
systemctl disable kube-scheduler.service
systemctl disable kube-proxy.service
systemctl disable kube-controller-manager.service
systemctl disable kube-apiserver.service
systemctl stop kubelet.service
systemctl stop kube-scheduler.service
systemctl stop kube-proxy.service
systemctl stop kube-controller-manager.service
systemctl stop kube-apiserver.service
# Remove all containers
docker rm -f $(docker ps -qa)
# Remove all container volumes
docker volume rm $(docker volume ls -q)
# Unmount the kubelet-related mount points
for mount in $(mount | grep tmpfs | grep '/var/lib/kubelet' | awk '{ print $3 }') /var/lib/kubelet /var/lib/rancher; do umount $mount; done
# Back up directories
mv /etc/kubernetes /etc/kubernetes-bak-$(date +"%Y%m%d%H%M")
mv /var/lib/etcd /var/lib/etcd-bak-$(date +"%Y%m%d%H%M")
mv /var/lib/rancher /var/lib/rancher-bak-$(date +"%Y%m%d%H%M")
mv /opt/rke /opt/rke-bak-$(date +"%Y%m%d%H%M")
# Remove leftover paths
rm -rf /etc/ceph \
/etc/cni \
/opt/cni \
/run/secrets/kubernetes.io \
/run/calico \
/run/flannel \
/var/lib/calico \
/var/lib/cni \
/var/lib/kubelet \
/var/log/containers \
/var/log/pods \
/var/run/calico
# Clean up network interfaces
network_interface=`ls /sys/class/net`
for net_inter in $network_interface;
do
if ! echo $net_inter | grep -qiE 'lo|docker0|eth*|ens*';then
ip link delete $net_inter
fi
done
# Kill leftover processes
port_list='80 443 6443 2376 2379 2380 8472 9099 10250 10254'
for port in $port_list
do
pid=`netstat -atlnup|grep $port |awk '{print $7}'|awk -F '/' '{print $1}'|grep -v -|sort -rnk2|uniq`
if [[ -n $pid ]];then
kill -9 $pid
fi
done
pro_pid=`ps -ef |grep -v grep |grep kube|awk '{print $2}'`
if [[ -n $pro_pid ]];then
kill -9 $pro_pid
fi
# Flush the iptables tables
## Note: if the node has custom iptables rules, run the following commands with caution
sudo iptables --flush
sudo iptables --flush --table nat
sudo iptables --flush --table filter
sudo iptables --table nat --delete-chain
sudo iptables --table filter --delete-chain
systemctl restart docker
### Note: reboot the machine after the cleanup is done
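A hedged usage sketch (the file name clean-node.sh is an assumption): after rke remove completes, copy the script above to every node and run it as root:
chmod +x clean-node.sh
sudo ./clean-node.sh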
5. Restoring the Kubectl Configuration File
# Create the following script
#!/bin/bash
help ()
{
echo ' ================================================================ '
echo ' --master-ip: specify a master node IP; any K8S master node IP will do. '
echo ' Usage example: bash restore-kube-config.sh --master-ip=1.1.1.1 '
echo ' ================================================================'
}
case "$1" in
-h|--help) help; exit;;
esac
if [[ $1 == '' ]];then
help;
exit;
fi
CMDOPTS="$*"
for OPTS in $CMDOPTS;
do
key=$(echo ${OPTS} | awk -F"=" '{print $1}' )
value=$(echo ${OPTS} | awk -F"=" '{print $2}' )
case "$key" in
--master-ip) K8S_MASTER_NODE_IP=$value ;;
esac
done
# Get the Rancher Agent image
RANCHER_IMAGE=$( docker images --filter=label=io.cattle.agent=true |grep 'v2.' | \
grep -v -E 'rc|alpha|<none>' | head -n 1 | awk '{print $3}' )
if [ -d /opt/rke/etc/kubernetes/ssl ]; then
K8S_SSLDIR=/opt/rke/etc/kubernetes/ssl
else
K8S_SSLDIR=/etc/kubernetes/ssl
fi
CHECK_CLUSTER_STATE_CONFIGMAP=$( docker run --rm --entrypoint bash --net=host \
-v $K8S_SSLDIR:/etc/kubernetes/ssl:ro $RANCHER_IMAGE -c '\
if kubectl --kubeconfig /etc/kubernetes/ssl/kubecfg-kube-node.yaml \
-n kube-system get configmap full-cluster-state | grep full-cluster-state > /dev/null; then \
echo 'yes'; else echo 'no'; fi' )
if [ $CHECK_CLUSTER_STATE_CONFIGMAP != 'yes' ]; then
docker run --rm --net=host \
--entrypoint bash \
-e K8S_MASTER_NODE_IP=$K8S_MASTER_NODE_IP \
-v $K8S_SSLDIR:/etc/kubernetes/ssl:ro \
$RANCHER_IMAGE \
-c '\
kubectl --kubeconfig /etc/kubernetes/ssl/kubecfg-kube-node.yaml \
-n kube-system \
get secret kube-admin -o jsonpath={.data.Config} | base64 --decode | \
sed -e "/^[[:space:]]*server:/ s_:.*_: \"https://${K8S_MASTER_NODE_IP}:6443\"_"' > kubeconfig_admin.yaml
if [ -s kubeconfig_admin.yaml ]; then
echo 'Restore succeeded. Test it with the following command:'
echo ''
echo "kubectl --kubeconfig kubeconfig_admin.yaml get nodes"
else
echo "kubeconfig恢复失败。"
fi
else
docker run --rm --entrypoint bash --net=host \
-e K8S_MASTER_NODE_IP=$K8S_MASTER_NODE_IP \
-v $K8S_SSLDIR:/etc/kubernetes/ssl:ro \
$RANCHER_IMAGE \
-c '\
kubectl --kubeconfig /etc/kubernetes/ssl/kubecfg-kube-node.yaml \
-n kube-system \
get configmap full-cluster-state -o json | \
jq -r .data.\"full-cluster-state\" | \
jq -r .currentState.certificatesBundle.\"kube-admin\".config | \
sed -e "/^[[:space:]]*server:/ s_:.*_: \"https://${K8S_MASTER_NODE_IP}:6443\"_"' > kubeconfig_admin.yaml
if [ -s kubeconfig_admin.yaml ]; then
echo 'Restore succeeded. Test it with the following command:'
echo ''
echo "kubectl --kubeconfig kubeconfig_admin.yaml get nodes"
else
echo "kubeconfig恢复失败。"
fi
fi
Add execute permission
chmod +x restore-kube-config.sh
Restore the file
./restore-kube-config.sh --master-ip=<IP of any master node>
Test
kubectl --kubeconfig ./kubeconfig_admin.yaml get nodes
6. Restoring the rkestate State File
RKE creates the cluster.rkestate file in the same directory as the cluster configuration file cluster.yml. The .rkestate file contains the current state of the cluster, including the RKE configuration and the certificates; it must be kept in order to upgrade the cluster or perform any other operation on it through RKE.
Install the jq tool
yum -y install jq
Restore the file
kubectl --kubeconfig kube_config_cluster.yml get configmap -n kube-system \
full-cluster-state -o json | jq -r .data.\"full-cluster-state\" | jq -r . > cluster.rkestate
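A quick sanity check on the recovered file (a sketch that reuses the JSON path already used by the restore script in section 5):
jq -r '.currentState.certificatesBundle."kube-admin".config' cluster.rkestate | head -n 3
# the first lines should look like the beginning of a kubeconfig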
7. Node Maintenance
# Put a node into maintenance mode (mark it unschedulable)
kubectl cordon <node-name>
Leave maintenance mode
kubectl uncordon <node-name>
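A worked example using node01 from the cluster above (SchedulingDisabled is the status that cordon adds):
kubectl cordon node01
kubectl get nodes    # node01 now reports Ready,SchedulingDisabled
kubectl uncordon node01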
8. Node Workload Migration
kubectl drain <node-name>
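In practice drain usually needs extra flags, because DaemonSet pods cannot be evicted and pods using emptyDir volumes lose their local data; a hedged example, again using node01 (flag names as in kubectl v1.17):
kubectl drain node01 --ignore-daemonsets --delete-local-data
# once maintenance is done, make the node schedulable again
kubectl uncordon node01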