Install

Install dependencies

Install the dependencies for seccomp:

    root@i-gqqsb62d:~# apt install libseccomp-dev libseccomp2 seccomp
    ...
    root@i-gqqsb62d:~# apt show seccomp
    Package: seccomp
    Version: 2.4.3-1ubuntu3.18.04.3
    Priority: optional
    Section: utils
    Source: libseccomp
    Origin: Ubuntu
    Maintainer: Ubuntu Developers <ubuntu-devel-discuss@lists.ubuntu.com>
    Original-Maintainer: Kees Cook <kees@debian.org>
    Bugs: https://bugs.launchpad.net/ubuntu/+filebug
    Installed-Size: 26.6 kB
    Depends: libc6 (>= 2.3.4), libseccomp2 (>= 2.2.1)
    Suggests: libseccomp-dev
    Homepage: https://github.com/seccomp/libseccomp
    Task: ubuntu-core
    Supported: 5y
    Download-Size: 5256 B
    APT-Manual-Installed: yes
    APT-Sources: http://cn.archive.ubuntu.com/ubuntu bionic-updates/main amd64 Packages
    Description: helper tools for high level interface to Linux seccomp filter
     Provides helper tools for interacting with libseccomp. Currently, only
     a single tool exists, providing a way to easily enumerate syscalls across
     the supported architectures.

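seccomp is a security mechanism in the Linux kernel. By default a program can invoke any syscall, which is unsafe: an attacker who hijacks the program's control flow can, for example, use the execve syscall to spawn a shell. With seccomp, a program can disable specific syscalls, so that even a hijacked control flow can only reach the syscalls that remain allowed.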
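As a small illustration, the kernel reports each process's seccomp state in the Seccomp: field of /proc/<pid>/status (0 = disabled, 1 = strict, 2 = filter). The sketch below reads that field for the current shell. The helper name seccomp_mode_name is made up for this example, and the script assumes a Linux /proc filesystem.

```shell
#!/usr/bin/env bash
# Map the numeric value of the "Seccomp:" status field to a readable name.
# seccomp_mode_name is a hypothetical helper, not part of any package.
seccomp_mode_name() {
  case "$1" in
    0) echo "disabled" ;;
    1) echo "strict" ;;
    2) echo "filter" ;;
    *) echo "unknown" ;;
  esac
}

# Read the field for this shell ($$); skip quietly if /proc is unavailable.
if [ -r "/proc/$$/status" ]; then
  mode=$(awk '/^Seccomp:/ {print $2}' "/proc/$$/status")
  echo "seccomp mode of this shell: $(seccomp_mode_name "$mode")"
fi
```

A process running inside a container started with a seccomp profile would typically report "filter" here.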

Install Containerd

URL: https://github.com/containerd/containerd

Containerd publishes two release archives: containerd-${VERSION}-${OS}-${ARCH}.tar.gz and cri-containerd-${VERSION}-${OS}-${ARCH}.tar.gz. The cri-containerd archive contains all the binaries that Kubernetes needs. For local testing the first archive is enough; if Containerd will serve as the container runtime for Kubernetes, choose the second.

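Containerd relies on runc, and the first archive does not include the runc binary, so if you choose it you must install runc beforehand. For that reason I recommend using the cri-containerd archive directly. The commands in this guide download the cri-containerd-cni variant, which also bundles the CNI plugins and their configuration.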
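The archive name is assembled from a version, an OS, and an architecture, so the download URL can be built from those three values. The sketch below does that for the cri-containerd-cni variant; release_url is a hypothetical helper, and the URL pattern is taken from the v1.4.4 download used in this guide, so other releases may name their files differently.

```shell
#!/usr/bin/env bash
# Build the GitHub release URL for a cri-containerd-cni archive.
# Assumption: the pattern matches the v1.4.4 download shown in this guide.
release_url() {
  local version="$1" os="${2:-linux}" arch="${3:-amd64}"
  printf 'https://github.com/containerd/containerd/releases/download/v%s/cri-containerd-cni-%s-%s-%s.tar.gz\n' \
    "$version" "$version" "$os" "$arch"
}

# Print the URL for the release used in this guide.
release_url 1.4.4
```

The result can be passed straight to wget, for example wget "$(release_url 1.4.4)".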

  • Download
    # wget https://github.com/containerd/containerd/releases/download/v1.4.4/cri-containerd-cni-1.4.4-linux-amd64.tar.gz
  • List the archive contents
    root@i-gqqsb62d:~# ls -ltr
    total 96928
    -rw-r--r-- 1 root root 99252755 Mar 5 14:03 cri-containerd-cni-1.4.4-linux-amd64.tar.gz
    root@i-gqqsb62d:~# tar -tf cri-containerd-cni-1.4.4-linux-amd64.tar.gz
    etc/
    etc/systemd/
    etc/systemd/system/
    etc/systemd/system/containerd.service
    etc/crictl.yaml
    etc/cni/
    etc/cni/net.d/
    etc/cni/net.d/10-containerd-net.conflist
    usr/
    usr/local/
    usr/local/bin/
    usr/local/bin/containerd
    usr/local/bin/containerd-shim
    usr/local/bin/crictl
    usr/local/bin/containerd-shim-runc-v2
    usr/local/bin/critest
    usr/local/bin/containerd-shim-runc-v1
    usr/local/bin/ctr
    usr/local/sbin/
    usr/local/sbin/runc
    opt/
    opt/containerd/
    opt/containerd/cluster/
    opt/containerd/cluster/gce/
    opt/containerd/cluster/gce/env
    opt/containerd/cluster/gce/cni.template
    opt/containerd/cluster/gce/configure.sh
    opt/containerd/cluster/gce/cloud-init/
    opt/containerd/cluster/gce/cloud-init/node.yaml
    opt/containerd/cluster/gce/cloud-init/master.yaml
    opt/containerd/cluster/version
    opt/cni/
    opt/cni/bin/
    opt/cni/bin/bandwidth
    opt/cni/bin/host-device
    opt/cni/bin/flannel
    opt/cni/bin/static
    opt/cni/bin/loopback
    opt/cni/bin/dhcp
    opt/cni/bin/ptp
    opt/cni/bin/ipvlan
    opt/cni/bin/vlan
    opt/cni/bin/host-local
    opt/cni/bin/firewall
    opt/cni/bin/tuning
    opt/cni/bin/sbr
    opt/cni/bin/bridge
    opt/cni/bin/portmap
    opt/cni/bin/macvlan
  • Extract
    # tar -C / -zxvf cri-containerd-cni-1.4.4-linux-amd64.tar.gz
  • Add the binaries to PATH (optional)

Append /usr/local/bin and /usr/local/sbin to the $PATH environment variable in ~/.bashrc:

    echo 'export PATH=$PATH:/usr/local/bin:/usr/local/sbin' >> ~/.bashrc
    source ~/.bashrc
  • Start the service
    # systemctl start containerd.service
    # systemctl enable containerd.service
    # systemctl status containerd.service
    containerd.service - containerd container runtime
       Loaded: loaded (/etc/systemd/system/containerd.service; disabled; vendor preset: enabled)
       Active: active (running) since Tue 2021-03-23 13:14:47 CST; 7s ago
         Docs: https://containerd.io
      Process: 4101 ExecStartPre=/sbin/modprobe overlay (code=exited, status=0/SUCCESS)
     Main PID: 4107 (containerd)
        Tasks: 9
       CGroup: /system.slice/containerd.service
               └─4107 /usr/local/bin/containerd
  • Check the version
    root@i-gqqsb62d:~/etc/systemd/system# ctr version
    Client:
      Version:  v1.4.4
      Revision: 05f951a3781f4f2c1911b05e61c160e9c30eaa8e
      Go version: go1.15.8

    Server:
      Version:  v1.4.4
      Revision: 05f951a3781f4f2c1911b05e61c160e9c30eaa8e
      UUID: 063144b5-b13e-429c-ba9c-82bb2ab15aae
    root@i-gqqsb62d:~/etc/systemd/system# systemctl enable containerd.service
    Created symlink /etc/systemd/system/multi-user.target.wants/containerd.service → /etc/systemd/system/containerd.service.
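
After the installation steps above, a quick sanity check is to confirm that the extracted binaries are reachable through PATH. The sketch below is one way to do it; missing_binaries is a hypothetical helper, and the list of names is taken from the archive contents shown earlier.

```shell
#!/usr/bin/env bash
# Print every given command name that cannot be found on PATH.
# missing_binaries is a hypothetical helper for this guide.
missing_binaries() {
  local name
  for name in "$@"; do
    command -v "$name" >/dev/null 2>&1 || echo "$name"
  done
}

# Check the main binaries shipped in the cri-containerd-cni archive.
missing=$(missing_binaries containerd ctr crictl runc)
if [ -n "$missing" ]; then
  echo "not found on PATH:" $missing
else
  echo "all expected binaries are on PATH"
fi
```

If runc is reported missing, /usr/local/sbin is probably not on PATH for the current user.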

Generate the configuration file

Containerd's default configuration file is /etc/containerd/config.toml. A default configuration can be generated with:

    mkdir -p /etc/containerd
    containerd config default > /etc/containerd/config.toml

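Registry mirrors

Pulling images from public registries is very slow from within mainland China, so to save time Containerd should be configured with registry mirrors. Compared with Docker, Containerd's registry mirrors differ in two ways:

  • Containerd only applies mirrors to images pulled through CRI. In other words, a mirror takes effect only when the pull goes through crictl or Kubernetes; pulls made with ctr do not use it.
  • Docker only supports a mirror for Docker Hub, whereas Containerd supports mirrors for any registry.

Before configuring a mirror, take a look at the structure of Containerd's configuration. It may look complicated at first glance, but most of the complexity is in the plugins section: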

    root@i-gqqsb62d:/etc/containerd# cat config.toml
    version = 2
    root = "/var/lib/containerd"
    state = "/run/containerd"
    plugin_dir = ""
    disabled_plugins = []
    required_plugins = []
    oom_score = 0

    [grpc]
      address = "/run/containerd/containerd.sock"
      tcp_address = ""
      tcp_tls_cert = ""
      tcp_tls_key = ""
      uid = 0
      gid = 0
      max_recv_message_size = 16777216
      max_send_message_size = 16777216

    [ttrpc]
      address = ""
      uid = 0
      gid = 0

    [debug]
      address = ""
      uid = 0
      gid = 0
      level = ""

    [metrics]
      address = ""
      grpc_histogram = false

    [cgroup]
      path = ""

    [timeouts]
      "io.containerd.timeout.shim.cleanup" = "5s"
      "io.containerd.timeout.shim.load" = "5s"
      "io.containerd.timeout.shim.shutdown" = "3s"
      "io.containerd.timeout.task.state" = "2s"

    [plugins]
      [plugins."io.containerd.gc.v1.scheduler"]
        pause_threshold = 0.02
        deletion_threshold = 0
        mutation_threshold = 100
        schedule_delay = "0s"
        startup_delay = "100ms"
      [plugins."io.containerd.grpc.v1.cri"]
        disable_tcp_service = true
        stream_server_address = "127.0.0.1"
        stream_server_port = "0"
        stream_idle_timeout = "4h0m0s"
        enable_selinux = false
        selinux_category_range = 1024
        sandbox_image = "k8s.gcr.io/pause:3.2"
        stats_collect_period = 10
        systemd_cgroup = false
        enable_tls_streaming = false
        max_container_log_line_size = 16384
        disable_cgroup = false
        disable_apparmor = false
        restrict_oom_score_adj = false
        max_concurrent_downloads = 3
        disable_proc_mount = false
        unset_seccomp_profile = ""
        tolerate_missing_hugetlb_controller = true
        disable_hugetlb_controller = true
        ignore_image_defined_volumes = false
        [plugins."io.containerd.grpc.v1.cri".containerd]
          snapshotter = "overlayfs"
          default_runtime_name = "runc"
          no_pivot = false
          disable_snapshot_annotations = true
          discard_unpacked_layers = false
          [plugins."io.containerd.grpc.v1.cri".containerd.default_runtime]
            runtime_type = ""
            runtime_engine = ""
            runtime_root = ""
            privileged_without_host_devices = false
            base_runtime_spec = ""
          [plugins."io.containerd.grpc.v1.cri".containerd.untrusted_workload_runtime]
            runtime_type = ""
            runtime_engine = ""
            runtime_root = ""
            privileged_without_host_devices = false
            base_runtime_spec = ""
          [plugins."io.containerd.grpc.v1.cri".containerd.runtimes]
            [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
              runtime_type = "io.containerd.runc.v2"
              runtime_engine = ""
              runtime_root = ""
              privileged_without_host_devices = false
              base_runtime_spec = ""
              [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
        [plugins."io.containerd.grpc.v1.cri".cni]
          bin_dir = "/opt/cni/bin"
          conf_dir = "/etc/cni/net.d"
          max_conf_num = 1
          conf_template = ""
        [plugins."io.containerd.grpc.v1.cri".registry]
          [plugins."io.containerd.grpc.v1.cri".registry.mirrors]
            [plugins."io.containerd.grpc.v1.cri".registry.mirrors."docker.io"]
              endpoint = ["https://registry-1.docker.io"]
        [plugins."io.containerd.grpc.v1.cri".image_decryption]
          key_model = ""
        [plugins."io.containerd.grpc.v1.cri".x509_key_pair_streaming]
          tls_cert_file = ""
          tls_key_file = ""
      [plugins."io.containerd.internal.v1.opt"]
        path = "/opt/containerd"
      [plugins."io.containerd.internal.v1.restart"]
        interval = "10s"
      [plugins."io.containerd.metadata.v1.bolt"]
        content_sharing_policy = "shared"
      [plugins."io.containerd.monitor.v1.cgroups"]
        no_prometheus = false
      [plugins."io.containerd.runtime.v1.linux"]
        shim = "containerd-shim"
        runtime = "runc"
        runtime_root = ""
        no_shim = false
        shim_debug = false
      [plugins."io.containerd.runtime.v2.task"]
        platforms = ["linux/amd64"]
      [plugins."io.containerd.service.v1.diff-service"]
        default = ["walking"]
      [plugins."io.containerd.snapshotter.v1.devmapper"]
        root_path = ""
        pool_name = ""
        base_image_size = ""
        async_remove = false

Every top-level block under plugins is named in the form plugins."io.containerd.xxx.vx.xxx", and each one represents a plugin: io.containerd.xxx.vx is the plugin type, and the xxx after vx is the plugin ID. ctr lists them all:

    root@i-gqqsb62d:/etc/containerd# ctr plugin ls
    TYPE                            ID                       PLATFORMS      STATUS
    io.containerd.content.v1        content                  -              ok
    io.containerd.snapshotter.v1    aufs                     linux/amd64    ok
    io.containerd.snapshotter.v1    btrfs                    linux/amd64    error
    io.containerd.snapshotter.v1    devmapper                linux/amd64    error
    io.containerd.snapshotter.v1    native                   linux/amd64    ok
    io.containerd.snapshotter.v1    overlayfs                linux/amd64    ok
    io.containerd.snapshotter.v1    zfs                      linux/amd64    error
    io.containerd.metadata.v1       bolt                     -              ok
    io.containerd.differ.v1         walking                  linux/amd64    ok
    io.containerd.gc.v1             scheduler                -              ok
    io.containerd.service.v1        introspection-service    -              ok
    io.containerd.service.v1        containers-service       -              ok
    io.containerd.service.v1        content-service          -              ok
    io.containerd.service.v1        diff-service             -              ok
    io.containerd.service.v1        images-service           -              ok
    io.containerd.service.v1        leases-service           -              ok
    io.containerd.service.v1        namespaces-service       -              ok
    io.containerd.service.v1        snapshots-service        -              ok
    io.containerd.runtime.v1        linux                    linux/amd64    ok
    io.containerd.runtime.v2        task                     linux/amd64    ok
    io.containerd.monitor.v1        cgroups                  linux/amd64    ok
    io.containerd.service.v1        tasks-service            -              ok
    io.containerd.internal.v1       restart                  -              ok
    io.containerd.grpc.v1           containers               -              ok
    io.containerd.grpc.v1           content                  -              ok
    io.containerd.grpc.v1           diff                     -              ok
    io.containerd.grpc.v1           events                   -              ok
    io.containerd.grpc.v1           healthcheck              -              ok
    io.containerd.grpc.v1           images                   -              ok
    io.containerd.grpc.v1           leases                   -              ok
    io.containerd.grpc.v1           namespaces               -              ok
    io.containerd.internal.v1       opt                      -              ok
    io.containerd.grpc.v1           snapshots                -              ok
    io.containerd.grpc.v1           tasks                    -              ok
    io.containerd.grpc.v1           version                  -              ok
    io.containerd.grpc.v1           cri                      linux/amd64    ok

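The sub-blocks under a top-level block hold that plugin's settings. The cri plugin, for example, is divided into containerd, cni, and registry sections, and under containerd you can configure individual runtimes as well as the default runtime.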
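
The naming rule for plugin blocks (everything before the last dot is the type, the part after it is the ID) can be sketched with plain shell parameter expansion. plugin_type and plugin_id are hypothetical helper names used only for illustration.

```shell
#!/usr/bin/env bash
# Split a plugin block name such as io.containerd.grpc.v1.cri into
# its type (io.containerd.grpc.v1) and its ID (cri).
# Both helpers are hypothetical and exist only for this example.
plugin_type() { echo "${1%.*}"; }    # drop the shortest ".suffix"
plugin_id()   { echo "${1##*.}"; }   # keep only what follows the last dot

for name in io.containerd.grpc.v1.cri io.containerd.snapshotter.v1.overlayfs; do
  echo "type=$(plugin_type "$name") id=$(plugin_id "$name")"
done
```

The two values correspond to the TYPE and ID columns printed by ctr plugin ls.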

The mirror configuration lives in the registry block under the cri plugin, so the part that needs to change is:

    [plugins."io.containerd.grpc.v1.cri".registry]
      [plugins."io.containerd.grpc.v1.cri".registry.mirrors]
        [plugins."io.containerd.grpc.v1.cri".registry.mirrors."docker.io"]
          endpoint = ["https://registry-1.docker.io"]

  • registry.mirrors."xxx": the registry to configure a mirror for. For example, registry.mirrors."docker.io" configures the mirror for docker.io.
  • endpoint: the mirror service to use. For example, the mirror service provided by Northwest A&F University is recommended here as the mirror for docker.io:

    [plugins."io.containerd.grpc.v1.cri".registry]
      [plugins."io.containerd.grpc.v1.cri".registry.mirrors]
        [plugins."io.containerd.grpc.v1.cri".registry.mirrors."docker.io"]
          endpoint = ["https://dockerhub.mirrors.nwafu.edu.cn"]
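
If the edit has to be scripted rather than made by hand, one possible approach is a small sed wrapper that rewrites the endpoint line directly below the docker.io mirror header. This is only a sketch: set_docker_mirror is a hypothetical helper, it assumes the endpoint line immediately follows the header as in the default configuration, and it assumes the URL contains no | or & characters. The demo runs against a throwaway file, not the live configuration.

```shell
#!/usr/bin/env bash
# Replace the endpoint list on the line that follows the docker.io
# mirror header. A backup of the original is kept as FILE.bak.
# set_docker_mirror is a hypothetical helper for this guide.
set_docker_mirror() {
  local file="$1" url="$2"
  sed -i.bak -e '/registry\.mirrors\."docker\.io"]/{n;s|endpoint = .*|endpoint = ["'"$url"'"]|;}' "$file"
}

# Demo on a temporary file that mimics the relevant part of config.toml.
demo=$(mktemp)
cat > "$demo" <<'EOF'
[plugins."io.containerd.grpc.v1.cri".registry.mirrors."docker.io"]
  endpoint = ["https://registry-1.docker.io"]
EOF
set_docker_mirror "$demo" "https://dockerhub.mirrors.nwafu.edu.cn"
cat "$demo"
rm -f "$demo" "$demo.bak"
```

On a real host the target would be /etc/containerd/config.toml, followed by systemctl restart containerd so that the new configuration is loaded.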

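Storage configuration

Containerd has two separate storage paths: one for persistent data and one for runtime state.

    version = 2
    root = "/var/lib/containerd"
    state = "/run/containerd"

root holds persistent data, including snapshots, content, metadata, and the data of each plugin. Every plugin has its own directory. Containerd itself does not store any data; all of its functionality comes from the loaded plugins, which is a neat design.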

    root@i-gqqsb62d:/etc/containerd# tree -L 2 /var/lib/containerd/
    /var/lib/containerd/
    ├── io.containerd.content.v1.content
    │   └── ingest
    ├── io.containerd.grpc.v1.introspection
    │   └── uuid
    ├── io.containerd.metadata.v1.bolt
    │   └── meta.db
    ├── io.containerd.runtime.v1.linux
    ├── io.containerd.runtime.v2.task
    ├── io.containerd.snapshotter.v1.aufs
    │   └── snapshots
    ├── io.containerd.snapshotter.v1.btrfs
    ├── io.containerd.snapshotter.v1.native
    │   └── snapshots
    ├── io.containerd.snapshotter.v1.overlayfs
    │   └── snapshots
    └── tmpmounts

state holds temporary data, including sockets, PIDs, mount points, runtime state, and plugin data that does not need to be persisted.

    root@i-gqqsb62d:/etc/containerd# tree -L 2 /run/containerd/
    /run/containerd/
    ├── containerd.sock
    ├── containerd.sock.ttrpc
    ├── io.containerd.runtime.v1.linux
    └── io.containerd.runtime.v2.task
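
To find out which root and state paths a given host actually uses, the two top-level keys can be read back from config.toml. The sketch below uses awk rather than a real TOML parser, so it is only a rough approximation: toml_top_level_string is a hypothetical helper, it only looks at keys that appear before the first table header, and it assumes values without spaces.

```shell
#!/usr/bin/env bash
# Print the string value of a top-level key in a TOML file.
# Scanning stops at the first table header, so keys inside tables
# (for example [grpc] address) are never matched.
# toml_top_level_string is a hypothetical helper, not a TOML parser.
toml_top_level_string() {
  local file="$1" key="$2"
  awk -v key="$key" '
    /^[ \t]*\[/ { exit }
    $1 == key && $2 == "=" { gsub(/"/, "", $3); print $3; exit }
  ' "$file"
}

# Report both paths if a configuration file is present.
config="${1:-/etc/containerd/config.toml}"
if [ -r "$config" ]; then
  echo "root:  $(toml_top_level_string "$config" root)"
  echo "state: $(toml_top_level_string "$config" state)"
fi
```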

OOM

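One more setting deserves attention:

    oom_score = 0

Containerd is the guardian of the containers. When the system runs out of memory, the ideal outcome is for containers to be killed first rather than Containerd itself, so Containerd's OOM weight should be adjusted to make it less likely to be OOM-killed. The best choice is to set oom_score slightly lower than that of other daemons. The oom_score here actually corresponds to /proc/<pid>/oom_score_adj; early Linux kernels used oom_adj to adjust the weight, and later kernels switched to oom_score_adj. That file is described as follows:

When the final badness score is computed, oom_score_adj is added to the result, so the user can use this value to protect a process from being killed or to make a process the first to be killed every time. Its valid range is -1000 to 1000.

If the value is set to -1000, the process is never killed, because the badness score then always returns 0.

For Containerd, a value between -999 and 0 is recommended. On a Kubernetes worker node, consider setting it to -999.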
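
The range rule and the /proc file described above can be checked from the shell. The following sketch validates a candidate value against the -1000 to 1000 range and prints the current adjustment of a process; valid_oom_score_adj is a hypothetical helper, and the script assumes a Linux /proc filesystem.

```shell
#!/usr/bin/env bash
# Succeed only if the argument is an integer within -1000..1000,
# the range accepted by /proc/<pid>/oom_score_adj.
# valid_oom_score_adj is a hypothetical helper for this guide.
valid_oom_score_adj() {
  case "$1" in
    ''|*[!0-9-]*) return 1 ;;   # empty, or contains a non-digit, non-minus character
  esac
  [ "$1" -ge -1000 ] 2>/dev/null && [ "$1" -le 1000 ] 2>/dev/null
}

# Show the current adjustment of a process (defaults to this shell).
pid="${1:-$$}"
if [ -r "/proc/$pid/oom_score_adj" ]; then
  echo "oom_score_adj of PID $pid: $(cat "/proc/$pid/oom_score_adj")"
fi
```

Passing the PID of the running daemon, for example the output of pidof containerd, shows the value that is actually in effect.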

Systemd configuration

Running Containerd as a daemon managed by systemd is recommended. The unit file was already extracted from the archive above:

    cat /etc/systemd/system/containerd.service
    [Unit]
    Description=containerd container runtime
    Documentation=https://containerd.io
    After=network.target local-fs.target

    [Service]
    ExecStartPre=-/sbin/modprobe overlay
    ExecStart=/usr/local/bin/containerd
    Type=notify
    Delegate=yes
    KillMode=process
    Restart=always
    RestartSec=5
    # Having non-zero Limit*s causes performance problems due to accounting overhead
    # in the kernel. We recommend using cgroups to do container-local accounting.
    LimitNPROC=infinity
    LimitCORE=infinity
    LimitNOFILE=1048576
    # Comment TasksMax if your systemd version does not supports it.
    # Only systemd 226 and above support this version.
    TasksMax=infinity
    OOMScoreAdjust=-999

    [Install]
    WantedBy=multi-user.target

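Two parameters here are important:

  • Delegate: this option lets Containerd and the runtime manage the cgroups of the containers they create. Without it, systemd moves the processes into its own cgroups, and Containerd can no longer correctly obtain the containers' resource usage.
  • KillMode: this option controls how processes are killed when the Containerd service is stopped. By default, systemd looks in the unit's cgroup and kills all of Containerd's child processes, which is certainly not what we want. KillMode accepts the following values:
    • control-group (default): all processes in the unit's control group are killed
    • process: only the main process is killed
    • mixed: the main process receives SIGTERM, and the child processes receive SIGKILL
    • none: no process is killed; only the service's stop command is executed
    KillMode needs to be set to process, which ensures that upgrading or restarting Containerd does not kill the running containers.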
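
Both settings can be verified by reading them back from the unit file. The sketch below does this with sed; unit_value is a hypothetical helper, it ignores systemd drop-in overrides, and it assumes plain Key=Value lines as in the unit file shown above.

```shell
#!/usr/bin/env bash
# Print the value of a Key=Value line in a systemd unit file.
# If the key appears more than once, the last assignment is printed.
# unit_value is a hypothetical helper for this guide.
unit_value() {
  local file="$1" key="$2"
  sed -n "s/^${key}=//p" "$file" | tail -n 1
}

# Warn if either of the two important settings differs from the guide.
unit="${1:-/etc/systemd/system/containerd.service}"
if [ -r "$unit" ]; then
  [ "$(unit_value "$unit" Delegate)" = "yes" ] || echo "warning: Delegate is not yes"
  [ "$(unit_value "$unit" KillMode)" = "process" ] || echo "warning: KillMode is not process"
fi
```

On a running system, systemctl show containerd.service -p Delegate -p KillMode reports the values systemd has actually loaded, including any overrides.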