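Symptoms
The pipeline hangs when docker build reaches a RUN step.
On node01, the node where the pipeline runs, docker run hangs when starting a container, and docker build hangs as soon as it reaches RUN (node02 has the same problem):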
[root@node01]# ll
total 12
-rwxr-xr-x 1 root root 46 May 26 18:28 build.sh
-rw-r--r-- 1 root root 74 May 26 18:27 Dockerfile
-rw-r--r-- 1 root root 42 May 26 18:27 start.sh
[root@node01]#
[root@node01]# cat Dockerfile
FROM busybox:1.26.2
ADD start.sh /start.sh
RUN chmod +x /start.sh
[root@node01]# cat start.sh
#!/bin/bash
set -ex
echo "-------------"
[root@node01]# cat build.sh
#!/bin/bash
docker build -t test-images:v1 .
[root@node01]# ./build.sh
Sending build context to Docker daemon 4.096kB
Step 1/3 : FROM busybox:1.26.2
---> c30178c5239f
Step 2/3 : ADD start.sh /start.sh
---> Using cache
---> dd4696798259
Step 3/3 : RUN chmod +x /start.sh
---> Running in 4aba829ca844
Restarting Docker does not resolve the problem.
Diagnosis
The Docker daemon log contains two warnings:
May 26 18:30:19 node01 dockerd[39925]: time="2021-05-26T18:30:19.551633194+08:00" level=warning msg="e8f604d9c7abd8957f599bdba46fa217c2dd23612ede438af48236ceab38e1ae cleanup: failed to unmount IPC: umount /var/lib/docker/containers/e8f604d9c7abd8957f599bdba46fa217c2dd23612ede438af48236ceab38e1ae/mounts/shm, flags: 0x2: no such file or directory"
May 26 18:30:43 node01 dockerd[39925]: time="2021-05-26T18:30:43.427128125+08:00" level=warning msg="failed to retrieve runc version: unknown output format: runc version 1.0.0-rc93\ncommit: 12644e614e25b05da6fd08a38ffa0cfe1903fdec\nspec: 1.0.2-dev\ngo: go1.13.15\nlibseccomp: 2.3.1\n"
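Source location of the first warning:
Healthy nodes log this warning too, so it can be ignored.
Source location of the second warning:
runc version on a healthy node: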
[root@master02 ~]#
[root@master02 ~]# docker info | grep runc
Runtimes: runc
Default Runtime: runc
runc version: dc9208a3303feef5b3839f4323d9beb36df0a9dd
runc version on the failing nodes:
[root@node01 ~]# docker info | grep runc
Runtimes: runc
Default Runtime: runc
runc version: N/A
[root@node02 ~]# docker info | grep runc
Runtimes: runc
Default Runtime: runc
runc version: N/A
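The runc version reported on the failing nodes is wrong. It shows N/A, which is exactly the value assigned in the Docker daemon code located above, and the same code path produces the warning in the failing nodes' Docker log. Running runc directly on a failing node shows what is actually installed: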
[root@node01 ~]# runc --version
runc version 1.0.0-rc93
commit: 12644e614e25b05da6fd08a38ffa0cfe1903fdec
spec: 1.0.2-dev
go: go1.13.15
libseccomp: 2.3.1
The expected runc build is commit dc9208a3303feef5b3839f4323d9beb36df0a9dd (1.0.0-rc10), as on the healthy node:
[root@master02 ~]# runc --version
runc version 1.0.0-rc10
commit: dc9208a3303feef5b3839f4323d9beb36df0a9dd
spec: 1.0.1-dev
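The two `runc --version` outputs differ in shape: rc93 prints five lines, rc10 prints three. One plausible reading of the "unknown output format" warning is that this Docker daemon version only accepts a three-line output and takes the commit from the second line. That is an assumption, not something confirmed in this post. The sketch below mimics that behaviour in shell; `parse_runc_version` is a hypothetical helper, not a Docker or runc command.

```shell
# Hypothetical helper: accept only a three-line `runc --version` output and
# print the value after ": " on the second line; otherwise print N/A.
# Assumption: this mirrors the daemon's parser; it is not taken from the post.
parse_runc_version() {
    # $1: full text printed by `runc --version`
    line_count=$(printf '%s\n' "$1" | awk 'END { print NR }')
    second_line=$(printf '%s\n' "$1" | sed -n '2p')
    if [ "$line_count" -eq 3 ]; then
        case "$second_line" in
            *": "*)
                printf '%s\n' "${second_line#*: }"
                return 0
                ;;
        esac
    fi
    printf 'N/A\n'
    return 1
}

# Outputs copied from the healthy node (rc10) and the failing node (rc93).
rc10_output='runc version 1.0.0-rc10
commit: dc9208a3303feef5b3839f4323d9beb36df0a9dd
spec: 1.0.1-dev'

rc93_output='runc version 1.0.0-rc93
commit: 12644e614e25b05da6fd08a38ffa0cfe1903fdec
spec: 1.0.2-dev
go: go1.13.15
libseccomp: 2.3.1'

parse_runc_version "$rc10_output"
parse_runc_version "$rc93_output" || true
```

Under this assumption the rc10 output yields the commit hash, while the rc93 output falls through to N/A, matching what `docker info` reports on the two kinds of node.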
Fix
Replace the binary directly
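Copying the runc binary from a healthy node to the failing node is enough.
Building from source also works. After the compiled runc binary was copied to the other affected machine, that machine recovered as well.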
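The replacement itself can be sketched as follows, assuming the known-good binary has already been fetched to a local path on the failing node. `replace_binary` is a hypothetical helper and the paths in the usage comment are examples only; check `command -v runc` for the path actually in use on your node.

```shell
# Hypothetical helper: back up the existing binary, then install the new one.
# $1: path of the known-good binary, $2: path of the binary to replace.
replace_binary() {
    src=$1
    dst=$2
    [ -f "$src" ] || { echo "missing source: $src" >&2; return 1; }
    # Keep a copy of the old binary so the change can be rolled back.
    if [ -f "$dst" ]; then
        cp -p "$dst" "$dst.bak" || return 1
    fi
    # Write under a temporary name, then rename into place, so the target
    # is never left half-written.
    cp "$src" "$dst.new" && chmod 0755 "$dst.new" && mv -f "$dst.new" "$dst"
}

# Example (paths are assumptions):
# replace_binary /tmp/runc.good "$(command -v runc)"
```

Renaming into place rather than copying over the target also avoids a "Text file busy" failure if the old binary happens to be executing at that moment.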
Build and install from source
Get the runc source: https://github.com/opencontainers/runc
A Go environment must be installed first (with Go modules disabled):
# pwd
/root/go/src/github.com/opencontainers
# git clone https://github.com/opencontainers/runc.git
# cd runc
# git checkout v1.0.0-rc10
Then follow the official documentation:
https://github.com/opencontainers/runc
[root@node02 runc]# export GO111MODULE=off
[root@node02 runc]# yum install -y libseccomp-devel
Loaded plugins: fastestmirror
Loading mirror speeds from cached hostfile
* base: mirrors.aliyun.com
* extras: mirrors.aliyun.com
* updates: mirrors.aliyun.com
Resolving Dependencies
--> Running transaction check
---> Package libseccomp-devel.x86_64 0:2.3.1-4.el7 will be installed
--> Finished Dependency Resolution
Dependencies Resolved
================================================================================
Package               Arch      Version        Repository    Size
================================================================================
Installing:
libseccomp-devel      x86_64    2.3.1-4.el7    base          64 k
Transaction Summary
================================================================================
Install 1 Package
Total download size: 64 k
Installed size: 198 k
Downloading packages:
libseccomp-devel-2.3.1-4.el7.x86_64.rpm | 64 kB 00:00:00
Running transaction check
Running transaction test
Transaction test succeeded
Running transaction
Installing : libseccomp-devel-2.3.1-4.el7.x86_64 1/1
Verifying : libseccomp-devel-2.3.1-4.el7.x86_64 1/1
Installed:
libseccomp-devel.x86_64 0:2.3.1-4.el7
Complete!
[root@node02 runc]# make
go build -buildmode=pie -ldflags "-X main.gitCommit="dc9208a3303feef5b3839f4323d9beb36df0a9dd" -X main.version=1.0.0-rc10 " -tags "seccomp" -o runc .
[root@node02 runc]# make install
install -D -m0755 runc /usr/local/sbin/runc
[root@node02 runc]# runc -version
runc version 1.0.0-rc10
commit: dc9208a3303feef5b3839f4323d9beb36df0a9dd
spec: 1.0.1-dev
After installation (no Docker restart needed), docker run works:
[root@node02 runc]# docker info | grep runc
Runtimes: runc
Default Runtime: runc
runc version: dc9208a3303feef5b3839f4323d9beb36df0a9dd
[root@node02 runc]# docker run -ti busybox:1.26.2 sh
/ #
Copy the compiled runc binary to the other affected machine:
scp /usr/local/sbin/runc 172.24.248.46:/usr/local/sbin/
After the copy, that machine works as well:
[root@node01]# docker info | grep runc
Runtimes: runc
Default Runtime: runc
runc version: dc9208a3303feef5b3839f4323d9beb36df0a9dd
[root@node01]# docker run -ti busybox:1.26.2 sh
/ # ^C
/ # exit
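The `docker info | grep runc` check used throughout this post can be wrapped in a small function, which is handy when several nodes need verifying. This is a minimal sketch: `check_runc_version` is a hypothetical helper that reads `docker info` text on stdin, so it can be fed either live output or a saved copy.

```shell
# Hypothetical helper: read `docker info` text on stdin, print the reported
# runc version, and fail if the line is missing or the value is N/A.
check_runc_version() {
    version=$(sed -n 's/^ *runc version: *//p' | head -n 1)
    case "$version" in
        ""|N/A)
            echo "runc version not detected"
            return 1
            ;;
        *)
            echo "$version"
            ;;
    esac
}

# Usage on a node (requires access to the Docker daemon):
# docker info 2>/dev/null | check_runc_version
```

A non-zero exit status flags a node that is still in the N/A state seen on node01 and node02 before the fix.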
The pipeline now runs normally.

