## Background
- Version: 1.9.4, with 3 masters.
- One of the masters crashes frequently. This article walks through hot-replacing that NotReady master with a new one.
- The cluster was installed with kubeadm in the official architecture: etcd shares the master's lifecycle, i.e. each master node runs both the master components and an etcd pod.
- etcd does not use the YAML that kubeadm generates automatically. When the new master initializes, it first joins the existing etcd cluster; once it is running normally, the etcd member list is corrected.
- Some of the steps below are specific to this cluster. Read them for the underlying idea rather than copying them verbatim.
## Steps
#### Copy the kubeadm/kubectl/kubelet packages from an existing master
```
cd /opt
scp admin@<MASTER_IP>:/opt/opt.tar .
tar -xvf opt.tar
sudo cp -r bin /opt/
sudo cp -r cni /opt/
vim ~/.bash_profile        # add /opt/bin to PATH
source ~/.bash_profile
```
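The PATH edit above amounts to a one-line addition to `~/.bash_profile`; a minimal sketch, assuming the binaries were unpacked to `/opt/bin` as in the copy step:

```shell
# Append the copied binaries to PATH (directory /opt/bin from the copy step)
export PATH="$PATH:/opt/bin"
echo "$PATH"
```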
#### Copy the kubelet configuration from an existing master
```
cd /opt
scp admin@<MASTER_IP>:/opt/config.tar .
tar -xvf config.tar
sudo cp kubelet.service /etc/systemd/system/
sudo cp -r kubelet.service.d/ /etc/systemd/system/
```
#### Disable firewalld/iptables/SELinux
```
sudo systemctl stop firewalld && sudo systemctl disable firewalld
sudo systemctl stop iptables && sudo systemctl disable iptables
sudo vim /etc/selinux/config   # set SELINUX=disabled
# verify with getenforce
```
#### Kernel parameter tuning
```
# Apply immediately
sudo sysctl -w net.bridge.bridge-nf-call-iptables=1
sudo sysctl -w net.bridge.bridge-nf-call-ip6tables=1
sudo swapoff -a

# Persist across reboots
sudo vim /etc/sysctl.conf
#   net.bridge.bridge-nf-call-iptables=1
#   net.bridge.bridge-nf-call-ip6tables=1
#   vm.swappiness=0
sudo sysctl -p
```
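A quick way to confirm nothing was missed is to assert that every required key is present in the sysctl file. The sketch below runs against a local sample file (`/tmp/sysctl-sample.conf` is a hypothetical stand-in); on a real master the target would be `/etc/sysctl.conf`:

```shell
# Sketch: assert that every required key is present in a sysctl file.
# Demonstrated on a sample file so it runs anywhere; on a real master
# the target would be /etc/sysctl.conf.
cat > /tmp/sysctl-sample.conf <<'EOF'
net.bridge.bridge-nf-call-iptables=1
net.bridge.bridge-nf-call-ip6tables=1
vm.swappiness=0
EOF
for key in net.bridge.bridge-nf-call-iptables net.bridge.bridge-nf-call-ip6tables vm.swappiness; do
  grep -q "^${key}=" /tmp/sysctl-sample.conf || echo "missing: ${key}"
done
```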
#### Copy the kubeadm configuration from an existing master
```
scp admin@<MASTER_IP>:/home/admin/kubeadm-debug.conf .
```
#### Add the new master to the etcd cluster
```
# On the new master
sudo rm -rf /var/lib/etcd

# Inside an existing etcd pod
export ETCDCTL_API=3
etcdctl member add etcd3 --peer-urls="http://<NEW_MASTER_IP>:2380"
```
#### Run kubeadm
NOTE: because the new member joins the existing etcd cluster, `ETCD_INITIAL_CLUSTER_STATE` in etcd.yaml must be set to `existing`.
```
sudo kubeadm reset
sudo scp admin@<MASTER_IP>:/etc/kubernetes/pki/* /etc/kubernetes/pki/
sudo rm -f /etc/kubernetes/pki/apiserver.crt
sudo rm -f /etc/kubernetes/pki/apiserver.key
sudo cp etcd.yaml /etc/kubernetes/manifests/
sudo ip link set cni0 down
sudo ip link del cni0
sudo ip link set flannel.1 down
sudo ip link del flannel.1
sudo kubeadm init --ignore-preflight-errors=FileAvailable--etc-kubernetes-manifests-etcd.yaml,ExternalEtcdVersion --config=kubeadm-debug.conf
```
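The NOTE above is the field that is easiest to get wrong: the copied etcd.yaml must say `existing`, not `new`. A minimal sketch of that flip, demonstrated on a local sample fragment (the sample file name is hypothetical; on a real master you would edit the manifest itself):

```shell
# Flip ETCD_INITIAL_CLUSTER_STATE from "new" to "existing" so the new
# member joins the running cluster instead of bootstrapping a fresh one.
cat > /tmp/etcd-state-sample.env <<'EOF'
ETCD_INITIAL_CLUSTER_STATE=new
EOF
sed -i 's/^ETCD_INITIAL_CLUSTER_STATE=.*/ETCD_INITIAL_CLUSTER_STATE=existing/' /tmp/etcd-state-sample.env
cat /tmp/etcd-state-sample.env
```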
#### Change the kubelet data directory
```
sudo vim /etc/systemd/system/kubelet.service.d/10-kubeadm.conf
# Environment="KUBELET_EXTRA_ARGS=$KUBELET_EXTRA_ARGS --root-dir=/home/admin/kubelet"
sudo systemctl daemon-reload
sudo systemctl restart kubelet
sudo systemctl enable kubelet
```
#### Reboot the machine
```
sudo reboot
```
#### Verify
```
kubectl get nodes
kubectl get po -n kube-system
```
#### Allow workload pods to be scheduled on the new master
```
kubectl taint nodes <NEW_MASTER_HOSTNAME> node-role.kubernetes.io/master:NoSchedule-
```
#### Remove the old master from the cluster
```
kubectl drain <OLD_MASTER_HOSTNAME> --delete-local-data --force --ignore-daemonsets
kubectl delete node <OLD_MASTER_HOSTNAME>
```
#### Update etcd's initial members to the cluster's current masters
```
vim /etc/kubernetes/manifests/etcd.yaml
```
## etcd Troubleshooting
- Cannot add a member to the 3-node etcd cluster
  - Context: 3 nodes = 2 nodes (same member name, alive) + 1 node (OLD_MASTER, dead)
  - Cause: adding the new master makes a 4-node cluster, which needs a quorum of (4/2)+1 = 3; only 2 nodes are alive, below quorum, so the member add is rejected
  - Fix: remove the dead node from the etcd cluster first
- Cannot add a member to the 2-node etcd cluster
  - Context: both members are alive but share the same member name
  - Cause: reading the source shows that etcd builds its member list from the ETCD_INITIAL_CLUSTER environment variable and stores member info keyed by member_name, so duplicate names collide
  - Fix: manually make the member names unique in ETCD_INITIAL_CLUSTER
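The quorum arithmetic behind the first failure can be sketched as:

```shell
# etcd quorum: an N-member cluster needs floor(N/2)+1 healthy members.
# Adding a member to a 3-node cluster makes N=4, so quorum rises to 3,
# but only 2 members were alive -- hence the rejected member add.
quorum() { echo $(( $1 / 2 + 1 )); }
echo "quorum(3)=$(quorum 3)"   # quorum(3)=2
echo "quorum(4)=$(quorum 4)"   # quorum(4)=3
```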
## References
- https://tonybai.com/2017/05/15/setup-a-ha-kubernetes-cluster-based-on-kubeadm-part2
## Postscript
#### api-server reaches etcd through the LB

```
sudo vim /etc/kubernetes/manifests/kube-apiserver.yaml
# point the etcd endpoint at http://{VIP}:2379
sudo systemctl daemon-reload
sudo systemctl restart kubelet
```

#### The worker kubelet reaches the api-server through the LB

```
sudo vim /etc/hosts
# {VIP} kubernetes.default.svc
sudo vim /etc/kubernetes/kubelet.conf
# server: https://kubernetes.default.svc:6443
sudo systemctl daemon-reload
sudo systemctl restart kubelet
```

#### kube-proxy reaches the api-server through the LB

Update /etc/hosts on every node in the cluster:

```
sudo vim /etc/hosts
# {VIP} kubernetes.default.svc
kubectl get configmap -n kube-system kube-proxy -o yaml > kube-proxy-cm.yaml
vim kube-proxy-cm.yaml
# server: https://kubernetes.default.svc:6443
kubectl apply -f kube-proxy-cm.yaml --force
kubectl delete pod -n kube-system -l k8s-app=kube-proxy
```
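The hosts entry shared by the last two postscript items can be sketched like this; the VIP value is a placeholder, and the sketch writes to a sample file rather than the real `/etc/hosts`:

```shell
# Map the apiserver's service name to the load-balancer VIP.
# 10.0.0.100 is a hypothetical VIP; on a real node the target file
# would be /etc/hosts rather than a sample copy.
VIP=10.0.0.100
echo "${VIP} kubernetes.default.svc" > /tmp/hosts-sample
cat /tmp/hosts-sample
```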
