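1. Problem description
- After a power outage, the Kubernetes cluster was restarted and one node was stuck in the NotReady state. All containers on that node were in the Exited state, and restarting the Docker service did not bring them back.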
- On the master node, inspect the failed node:
```shell
$ kubectl describe node server-1
# output
…
Type            Status   Reason             Message
MemoryPressure  Unknown  NodeStatusUnknown  Kubelet stopped posting node status.
DiskPressure    Unknown  NodeStatusUnknown  Kubelet stopped posting node status.
PIDPressure     Unknown  NodeStatusUnknown  Kubelet stopped posting node status.
Ready           Unknown  NodeStatusUnknown  Kubelet stopped posting node status.
…
```
- This points at the kubelet on that node. Checking the service shows that it is not running: it exits with status 255 and systemd keeps restarting it:
```shell
server-1:~$ systemctl status kubelet
● kubelet.service - kubelet: The Kubernetes Node Agent
   Loaded: loaded (/lib/systemd/system/kubelet.service; enabled; vendor preset: enabled)
  Drop-In: /etc/systemd/system/kubelet.service.d
           └─10-kubeadm.conf
   Active: activating (auto-restart) (Result: exit-code) since Wed 2021-03-31 15:52:19 CST; 4s ago
     Docs: https://kubernetes.io/docs/home/
  Process: 7420 ExecStart=/usr/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_CONFIG_ARGS $KUBELET_KUBEADM_ARGS $KUBELET_EXTRA_ARGS (code=exited, status=255)
 Main PID: 7420 (code=exited, status=255)
```
- Restarting kubelet does not help; it exits again.
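
The `systemctl status` output only shows that kubelet exited with status 255, not why. One common cause of kubelet exiting right after start is active swap, and in that case the kubelet log usually says so (a message along the lines of "running with swap on is not supported"). The following is a minimal sketch for checking this, assuming a systemd host where kubelet logs to the journal. The helper name `swap_is_active` is hypothetical and not part of any tool.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed (exit 0) if the given swaps table lists at
# least one swap area. The path defaults to /proc/swaps, whose first line
# is a column header.
swap_is_active() {
    swaps_file="${1:-/proc/swaps}"
    # Any non-empty line after the header means a swap device or file is in use.
    awk 'NR > 1 && NF > 0 { found = 1 } END { exit !found }' "$swaps_file"
}

# Usage on the failing node (not executed by this snippet):
#   journalctl -u kubelet -n 50 --no-pager    # last 50 kubelet log lines
#   swap_is_active && echo "swap is on" || echo "swap is off"
```

If the helper reports that swap is on and the journal mentions swap, the fix in the next section applies. Otherwise the journal output is the place to look for the actual error.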
2. Solution
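Disable swap. By default kubelet refuses to start while swap is enabled. This command is temporary and does not survive a reboot:

    $ sudo swapoff -a

Restart kubelet and check its status:

    $ systemctl restart kubelet
    $ systemctl status kubelet
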
Once kubelet is running again, the containers on the node return to the Up state automatically and the node returns to Ready.
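
Because `swapoff -a` only lasts until the next reboot, the same failure can return after another power loss if swap is configured in `/etc/fstab`. The sketch below shows one way to keep swap off persistently by commenting out the swap entries. It assumes GNU `sed` and a standard fstab layout; the function name `comment_out_swap` is hypothetical, and it is worth trying it on a copy of the file first.

```shell
#!/usr/bin/env bash
# Hypothetical helper: comment out swap entries in an fstab-style file so
# swap is not re-enabled at boot. A backup is written as <file>.bak.
# The path defaults to /etc/fstab; pass another path to test on a copy.
comment_out_swap() {
    fstab="${1:-/etc/fstab}"
    # Prefix '#' to uncommented lines that contain a whitespace-delimited
    # "swap" field (the type column of a swap entry).
    sed -E -i.bak '/^[^#].*[[:space:]]swap[[:space:]]/ s/^/#/' "$fstab"
}

# Usage on the node, as root (not executed by this snippet):
#   swapoff -a
#   comment_out_swap /etc/fstab
```

Editing `/etc/fstab` by hand achieves the same result. The function only makes the change repeatable across several nodes.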