Symptoms

The pipeline hangs when docker build reaches a RUN step:
[screenshot: pipeline stuck at the RUN step of docker build]

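On node01, the node that runs the pipeline, starting a container with docker run hangs, and docker build also hangs once it reaches RUN (node02 has the same problem):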

    [root@node01]# ll
    total 12
    -rwxr-xr-x 1 root root 46 May 26 18:28 build.sh
    -rw-r--r-- 1 root root 74 May 26 18:27 Dockerfile
    -rw-r--r-- 1 root root 42 May 26 18:27 start.sh
    [root@node01]#
    [root@node01]# cat Dockerfile
    FROM busybox:1.26.2
    ADD start.sh /start.sh
    RUN chmod +x /start.sh
    [root@node01]# cat start.sh
    #!/bin/bash
    set -ex
    echo "-------------"
    [root@node01]# cat build.sh
    #!/bin/bash
    docker build -t test-images:v1 .
    [root@node01]# ./build.sh
    Sending build context to Docker daemon 4.096kB
    Step 1/3 : FROM busybox:1.26.2
    ---> c30178c5239f
    Step 2/3 : ADD start.sh /start.sh
    ---> Using cache
    ---> dd4696798259
    Step 3/3 : RUN chmod +x /start.sh
    ---> Running in 4aba829ca844

Restarting Docker does not fix it.
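Because the failure is a silent hang rather than an error, a time-limited probe helps tell a stuck node from a slow one when checking several nodes. A minimal sketch, assuming GNU coreutils timeout (standard on CentOS 7); the helper name run_with_deadline and the 30-second limit in the usage line are illustrative choices, not part of the original troubleshooting:

```bash
# Hypothetical helper (not from the original post): run a command with a
# deadline and report it if the command was cut off.
# GNU timeout exits with status 124 when the time limit is reached.
run_with_deadline() {
    local secs=$1
    shift
    timeout "$secs" "$@"
    local rc=$?
    if [ "$rc" -eq 124 ]; then
        echo "TIMEOUT after ${secs}s: $*" >&2
    fi
    return "$rc"
}
```

For example, run_with_deadline 30 docker run --rm busybox:1.26.2 true should return 124 on an affected node instead of blocking forever. timeout only stops the docker client, so a half-started container may be left behind; check docker ps -a afterwards.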

Diagnosis

The Docker daemon log shows two warnings:

    May 26 18:30:19 node01 dockerd[39925]: time="2021-05-26T18:30:19.551633194+08:00" level=warning msg="e8f604d9c7abd8957f599bdba46fa217c2dd23612ede438af48236ceab38e1ae cleanup: failed to unmount IPC: umount /var/lib/docker/containers/e8f604d9c7abd8957f599bdba46fa217c2dd23612ede438af48236ceab38e1ae/mounts/shm, flags: 0x2: no such file or directory"
    May 26 18:30:43 node01 dockerd[39925]: time="2021-05-26T18:30:43.427128125+08:00" level=warning msg="failed to retrieve runc version: unknown output format: runc version 1.0.0-rc93\ncommit: 12644e614e25b05da6fd08a38ffa0cfe1903fdec\nspec: 1.0.2-dev\ngo: go1.13.15\nlibseccomp: 2.3.1\n"

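Source location of the first warning:
[screenshot: dockerd source that logs the "failed to unmount IPC" warning]
Healthy nodes log this warning too, so it can be ignored.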

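Source location of the second warning:
[screenshot: dockerd source that logs "failed to retrieve runc version" and sets the runc commit to N/A]
The runc version as reported on a healthy node: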

    [root@master02 ~]#
    [root@master02 ~]# docker info | grep runc
    Runtimes: runc
    Default Runtime: runc
    runc version: dc9208a3303feef5b3839f4323d9beb36df0a9dd

The runc version as reported on the affected nodes:

    [root@node01 ~]# docker info | grep runc
    Runtimes: runc
    Default Runtime: runc
    runc version: N/A
    [root@node02 ~]# docker info | grep runc
    Runtimes: runc
    Default Runtime: runc
    runc version: N/A

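On the affected nodes the reported runc version is wrong: it shows N/A, which is exactly the value assigned in the daemon code from the screenshot above (the same code path logs the second warning on these nodes). Running runc directly on node01: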

    [root@node01 ~]# runc --version
    runc version 1.0.0-rc93
    commit: 12644e614e25b05da6fd08a38ffa0cfe1903fdec
    spec: 1.0.2-dev
    go: go1.13.15
    libseccomp: 2.3.1

On a healthy node, runc reports commit dc9208a3303feef5b3839f4323d9beb36df0a9dd, which is what docker info shows there:

    [root@master02 ~]# runc --version
    runc version 1.0.0-rc10
    commit: dc9208a3303feef5b3839f4323d9beb36df0a9dd
    spec: 1.0.1-dev
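Comparing the two runc --version outputs explains the N/A. In older dockerd releases (daemon/info_unix.go), as far as I recall, the output is trimmed and split into lines, and the commit is read from the second line only when there are exactly three lines (runc version, commit, spec); anything else triggers the "unknown output format" warning and sets the commit to N/A. rc10 prints three lines, rc93 prints five (it adds go: and libseccomp:). The function below is a rough shell re-creation of that check for illustration only, not Docker's actual code, and the exact Docker version on these nodes is not stated here:

```bash
# Rough re-creation (NOT Docker's actual Go code) of how older dockerd
# releases derive the runc commit from `runc --version`, as far as I recall:
# exactly three lines are expected, and the commit is read from line 2.
# Reads the version text on stdin; prints the commit ID or N/A.
parse_runc_commit() {
    local out nlines second
    out=$(cat)                 # $(...) drops trailing newlines (roughly like TrimSpace)
    nlines=$(printf '%s\n' "$out" | awk 'END { print NR }')
    if [ "$nlines" -eq 3 ]; then
        second=$(printf '%s\n' "$out" | sed -n '2p')
        case "$second" in
            *": "*": "*) ;;                            # more than one ": " separator: reject
            *": "*) echo "${second#*: }"; return 0 ;;  # "commit: <id>" -> "<id>"
        esac
    fi
    echo "N/A"                 # the value docker info then shows as "runc version"
}
```

Fed the outputs shown above, runc --version | parse_runc_commit gives N/A for the rc93 binary on node01 and dc9208a3303feef5b3839f4323d9beb36df0a9dd for the rc10 binary on master02, matching what docker info reports on each.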

The affected nodes run a newer runc (1.0.0-rc93) than the healthy node (1.0.0-rc10), so runc needs to be downgraded.
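To make the "newer than" check scriptable when many nodes are involved, the version string can be taken from the first line of runc --version and ordered with GNU sort -V. A sketch with hypothetical helper names, assuming GNU coreutils; note that sort -V is not a semver comparison (it ranks 1.0.0 below 1.0.0-rc93), so it is only trustworthy between release candidates of one series, as here:

```bash
# Hypothetical helper: print the version from the first line of
# `runc --version`, e.g. "runc version 1.0.0-rc93" -> "1.0.0-rc93".
runc_version_from() {
    sed -n '1s/^runc version //p'
}

# Hypothetical helper: succeed if version $1 sorts strictly after $2
# under GNU version sort (sort -V); not semver-aware.
version_gt() {
    [ "$1" != "$2" ] &&
        [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | tail -n 1)" = "$1" ]
}
```

For example: version_gt "$(runc --version | runc_version_from)" 1.0.0-rc10 && echo "runc is newer than the known-good 1.0.0-rc10"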

Solution

Replace the binary directly

Copying the runc binary from a healthy node to the affected nodes is enough.
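Copying the good binary straight over the existing one discards the rc93 build. A slightly more careful variant keeps a timestamped backup so the change can be rolled back. This is a sketch with a hypothetical helper name; which runc dockerd actually uses depends on its PATH (command -v runc shows the one your shell finds), and where runc lives on master02 is not stated in the original:

```bash
# Hypothetical helper: install a known-good runc over the current one,
# keeping a timestamped backup of the old binary.
#   $1 = path of the good binary (e.g. copied from a healthy node)
#   $2 = path of the runc binary to replace
replace_binary() {
    local src=$1 dest=$2 backup
    [ -x "$src" ] || { echo "not executable: $src" >&2; return 1; }
    if [ -e "$dest" ]; then
        backup="$dest.bak.$(date +%Y%m%d%H%M%S)"
        cp -p "$dest" "$backup" || return 1
        echo "backup: $backup"
    fi
    install -m 0755 "$src" "$dest"   # copy and set mode 0755
}
```

Usage, with the source path on master02 to be adjusted as needed:

    scp master02:/usr/local/sbin/runc /tmp/runc.good
    replace_binary /tmp/runc.good "$(command -v runc)"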

Alternatively, build runc from source. After the compiled runc binary was copied to the other affected machines, they worked normally as well.

Build from source

The runc source code: https://github.com/opencontainers/runc

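Install a Go toolchain first and turn Go modules off; the repository is cloned under GOPATH (/root/go here) and the v1.0.0-rc10 tag checked out: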

    # pwd
    /root/go/src/github.com/opencontainers
    # git clone https://github.com/opencontainers/runc.git
    # cd runc
    # git checkout v1.0.0-rc10

Then build it as described in the official documentation (https://github.com/opencontainers/runc):

    [root@node02 runc]# export GO111MODULE=off
    [root@node02 runc]# yum install -y libseccomp-devel
    Loaded plugins: fastestmirror
    Loading mirror speeds from cached hostfile
    * base: mirrors.aliyun.com
    * extras: mirrors.aliyun.com
    * updates: mirrors.aliyun.com
    Resolving Dependencies
    --> Running transaction check
    ---> Package libseccomp-devel.x86_64 0:2.3.1-4.el7 will be installed
    --> Finished Dependency Resolution
    Dependencies Resolved
    ================================================================================
    Package              Arch       Version            Repository   Size
    ================================================================================
    Installing:
    libseccomp-devel     x86_64     2.3.1-4.el7        base         64 k
    Transaction Summary
    ================================================================================
    Install 1 Package
    Total download size: 64 k
    Installed size: 198 k
    Downloading packages:
    libseccomp-devel-2.3.1-4.el7.x86_64.rpm | 64 kB 00:00:00
    Running transaction check
    Running transaction test
    Transaction test succeeded
    Running transaction
    Installing : libseccomp-devel-2.3.1-4.el7.x86_64 1/1
    Verifying : libseccomp-devel-2.3.1-4.el7.x86_64 1/1
    Installed:
    libseccomp-devel.x86_64 0:2.3.1-4.el7
    Complete!
    [root@node02 runc]# make
    go build -buildmode=pie -ldflags "-X main.gitCommit="dc9208a3303feef5b3839f4323d9beb36df0a9dd" -X main.version=1.0.0-rc10 " -tags "seccomp" -o runc .
    [root@node02 runc]# make install
    install -D -m0755 runc /usr/local/sbin/runc
    [root@node02 runc]# runc -version
    runc version 1.0.0-rc10
    commit: dc9208a3303feef5b3839f4323d9beb36df0a9dd
    spec: 1.0.1-dev
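Before running make install, or before shipping the binary to other nodes, it can help to confirm that the freshly built ./runc reports the intended commit; a checkout of the wrong tag would otherwise only show up later in docker info. A sketch; verify_runc_commit is a hypothetical helper name:

```bash
# Hypothetical helper: succeed only if binary $1 reports commit $2 on the
# "commit:" line of its --version output.
verify_runc_commit() {
    local bin=$1 expected=$2 actual
    actual=$("$bin" --version 2>/dev/null | sed -n 's/^commit: //p')
    if [ "$actual" = "$expected" ]; then
        echo "OK: $bin is at commit $actual"
    else
        echo "MISMATCH: $bin reports '${actual:-none}', expected $expected" >&2
        return 1
    fi
}
```

For example: verify_runc_commit ./runc dc9208a3303feef5b3839f4323d9beb36df0a9dd && make install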

Once it is installed (no Docker restart required), docker run works again:

    [root@node02 runc]# docker info | grep runc
    Runtimes: runc
    Default Runtime: runc
    runc version: dc9208a3303feef5b3839f4323d9beb36df0a9dd
    [root@node02 runc]# docker run -ti busybox:1.26.2 sh
    / #

Copy the compiled runc binary to the other affected machines:

    scp /usr/local/sbin/runc 172.24.248.46:/usr/local/sbin/

After the copy, those machines work normally too:

    [root@node01]# docker info | grep runc
    Runtimes: runc
    Default Runtime: runc
    runc version: dc9208a3303feef5b3839f4323d9beb36df0a9dd
    [root@node01]# docker run -ti busybox:1.26.2 sh
    / # ^C
    / # exit
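With more than one affected node, the copy and the check can go in one loop. A sketch assuming passwordless SSH as root to each node and that dockerd picks up runc from /usr/local/sbin, as it did on node01 and node02 here; distribute_runc is a hypothetical name, and any host besides 172.24.248.46 would be a placeholder. It reads the commit with docker info --format '{{.RuncCommit.ID}}' instead of grepping:

```bash
# Hypothetical helper: copy a runc binary to each host and check the runc
# commit that the host's dockerd reports afterwards.
#   $1 = local runc binary, $2 = expected commit, remaining args = hosts
distribute_runc() {
    local bin=$1 expected=$2 host seen rc=0
    shift 2
    for host in "$@"; do
        scp "$bin" "$host:/usr/local/sbin/runc" || { rc=1; continue; }
        seen=$(ssh "$host" "docker info --format '{{.RuncCommit.ID}}'")
        if [ "$seen" = "$expected" ]; then
            echo "$host: OK"
        else
            echo "$host: runc commit '$seen', expected $expected" >&2
            rc=1
        fi
    done
    return "$rc"
}
```

For example: distribute_runc /usr/local/sbin/runc dc9208a3303feef5b3839f4323d9beb36df0a9dd 172.24.248.46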

The pipeline runs normally again:

[screenshot: the pipeline now completes normally]