Install
Install dependencies
Install the seccomp dependencies:
root@i-gqqsb62d:~# apt install libseccomp-dev libseccomp2 seccomp
...
root@i-gqqsb62d:~# apt show seccomp
Package: seccomp
Version: 2.4.3-1ubuntu3.18.04.3
Priority: optional
Section: utils
Source: libseccomp
Origin: Ubuntu
Maintainer: Ubuntu Developers <ubuntu-devel-discuss@lists.ubuntu.com>
Original-Maintainer: Kees Cook <kees@debian.org>
Bugs: https://bugs.launchpad.net/ubuntu/+filebug
Installed-Size: 26.6 kB
Depends: libc6 (>= 2.3.4), libseccomp2 (>= 2.2.1)
Suggests: libseccomp-dev
Homepage: https://github.com/seccomp/libseccomp
Task: ubuntu-core
Supported: 5y
Download-Size: 5256 B
APT-Manual-Installed: yes
APT-Sources: http://cn.archive.ubuntu.com/ubuntu bionic-updates/main amd64 Packages
Description: helper tools for high level interface to Linux seccomp filter
Provides helper tools for interacting with libseccomp. Currently, only
a single tool exists, providing a way to easily enumerate syscalls across
the supported architectures.
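seccomp is a security mechanism in the kernel. Normally a program may use every syscall, which is unsafe. For example, after hijacking a program's control flow, an attacker can invoke the execve syscall to get a shell. With seccomp, a program can disable certain syscalls, so even if its control flow is hijacked, only a subset of syscalls remains available.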
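To check whether seccomp is active for a process, the kernel exposes a Seccomp field in /proc/<pid>/status (0 = disabled, 1 = strict, 2 = filter). The sketch below reads that field. The function names seccomp_mode and seccomp_label are hypothetical, and the sketch assumes a kernel recent enough to report the field.

```bash
# Print the seccomp mode recorded in a /proc/<pid>/status file:
# 0 = disabled, 1 = strict, 2 = filter.
# Defaults to the status file of the current shell ($$).
# Returns 1 (and prints nothing) if the file has no Seccomp: field.
seccomp_mode() {
  local status_file=${1:-/proc/$$/status}
  local mode
  mode=$(awk '$1 == "Seccomp:" { print $2; exit }' "$status_file" 2>/dev/null)
  [ -n "$mode" ] || return 1
  printf '%s\n' "$mode"
}

# Translate a numeric mode into a readable label.
seccomp_label() {
  case "$1" in
    0) echo disabled ;;
    1) echo strict ;;
    2) echo filter ;;
    *) echo unknown ;;
  esac
}
```

For example, `seccomp_label "$(seccomp_mode /proc/1/status)"` reports the mode of PID 1. The seccomp package installed above also provides scmp_sys_resolver, which translates syscall names to numbers (e.g. `scmp_sys_resolver execve`).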
Install Containerd
URL: https://github.com/containerd/containerd
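Containerd provides two tarballs: containerd-${VERSION}-${OS}-${ARCH}.tar.gz and cri-containerd-${VERSION}-${OS}-${ARCH}.tar.gz. The cri-containerd-${VERSION}-${OS}-${ARCH}.tar.gz tarball contains all the binaries Kubernetes needs. For local testing the first tarball is enough; if Containerd will serve as the container runtime for Kubernetes, choose the second. Containerd needs to call runc, and the first tarball does not include the runc binary, so if you pick it you must install runc beforehand. I therefore recommend using the cri-containerd tarball directly.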
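The release tarballs follow a fixed naming pattern, so you can assemble the download URL from the version, OS and architecture. The helper below is a hypothetical sketch. The URL layout is copied from the download command that follows, and the sketch assumes the chosen release actually publishes an asset for the requested variant (containerd, cri-containerd or cri-containerd-cni).

```bash
# Build a Containerd release download URL.
#   $1 version without the leading "v", e.g. 1.4.4
#   $2 OS, e.g. linux
#   $3 architecture, e.g. amd64
#   $4 tarball variant (default: cri-containerd-cni)
containerd_url() {
  local version=$1 os=$2 arch=$3 variant=${4:-cri-containerd-cni}
  case "$variant" in
    containerd|cri-containerd|cri-containerd-cni) ;;
    *) echo "unknown variant: $variant" >&2; return 1 ;;
  esac
  printf 'https://github.com/containerd/containerd/releases/download/v%s/%s-%s-%s-%s.tar.gz\n' \
    "$version" "$variant" "$version" "$os" "$arch"
}
```

Usage: `wget "$(containerd_url 1.4.4 linux amd64)"`.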
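- Download (this is the cri-containerd-cni variant, which additionally bundles the CNI plugins under opt/cni/bin, as the listing below shows)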
# wget https://github.com/containerd/containerd/releases/download/v1.4.4/cri-containerd-cni-1.4.4-linux-amd64.tar.gz
- Inspect the archive contents
root@i-gqqsb62d:~# ls -ltr
total 96928
-rw-r--r-- 1 root root 99252755 Mar 5 14:03 cri-containerd-cni-1.4.4-linux-amd64.tar.gz
root@i-gqqsb62d:~# tar -tf cri-containerd-cni-1.4.4-linux-amd64.tar.gz
etc/
etc/systemd/
etc/systemd/system/
etc/systemd/system/containerd.service
etc/crictl.yaml
etc/cni/
etc/cni/net.d/
etc/cni/net.d/10-containerd-net.conflist
usr/
usr/local/
usr/local/bin/
usr/local/bin/containerd
usr/local/bin/containerd-shim
usr/local/bin/crictl
usr/local/bin/containerd-shim-runc-v2
usr/local/bin/critest
usr/local/bin/containerd-shim-runc-v1
usr/local/bin/ctr
usr/local/sbin/
usr/local/sbin/runc
opt/
opt/containerd/
opt/containerd/cluster/
opt/containerd/cluster/gce/
opt/containerd/cluster/gce/env
opt/containerd/cluster/gce/cni.template
opt/containerd/cluster/gce/configure.sh
opt/containerd/cluster/gce/cloud-init/
opt/containerd/cluster/gce/cloud-init/node.yaml
opt/containerd/cluster/gce/cloud-init/master.yaml
opt/containerd/cluster/version
opt/cni/
opt/cni/bin/
opt/cni/bin/bandwidth
opt/cni/bin/host-device
opt/cni/bin/flannel
opt/cni/bin/static
opt/cni/bin/loopback
opt/cni/bin/dhcp
opt/cni/bin/ptp
opt/cni/bin/ipvlan
opt/cni/bin/vlan
opt/cni/bin/host-local
opt/cni/bin/firewall
opt/cni/bin/tuning
opt/cni/bin/sbr
opt/cni/bin/bridge
opt/cni/bin/portmap
opt/cni/bin/macvlan
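The plain containerd tarball does not ship runc, so it can be worth checking an archive before extracting it. The following is a minimal sketch. It assumes GNU tar and the member paths seen in the listing above (for example usr/local/sbin/runc), and both function names are hypothetical.

```bash
# Return 0 if the gzip tarball contains runc at usr/local/sbin/runc.
tarball_has_runc() {
  tar -tzf "$1" 2>/dev/null | grep -qx 'usr/local/sbin/runc'
}

# Print every expected binary that is NOT present in the tarball.
missing_binaries() {
  local tarball=$1 members bin
  members=$(tar -tzf "$tarball") || return 1
  for bin in usr/local/bin/containerd usr/local/bin/ctr \
             usr/local/bin/crictl usr/local/sbin/runc; do
    grep -qx "$bin" <<<"$members" || echo "$bin"
  done
}
```

Usage: `missing_binaries cri-containerd-cni-1.4.4-linux-amd64.tar.gz` prints nothing when all four binaries are present.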
- Extract
# tar -C / -zxvf cri-containerd-cni-1.4.4-linux-amd64.tar.gz
- Add the directories to PATH (optional)
Append /usr/local/bin and /usr/local/sbin to the $PATH environment variable in ~/.bashrc:
export PATH=$PATH:/usr/local/bin:/usr/local/sbin
Then reload the file:
source ~/.bashrc
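Sourcing ~/.bashrc repeatedly with a plain export line keeps appending duplicate entries to PATH. A small idempotent helper avoids that. This is a sketch, and the function name path_append is hypothetical.

```bash
# Append a directory to PATH only if it is not already present.
path_append() {
  case ":$PATH:" in
    *":$1:"*) ;;                 # already in PATH, nothing to do
    *) PATH=${PATH:+$PATH:}$1 ;; # append, avoiding a leading ":" if PATH is empty
  esac
  export PATH
}
```

In ~/.bashrc this would replace the export line with `path_append /usr/local/bin` and `path_append /usr/local/sbin`.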
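- Start the service and enable it at boot (the status output below appears to have been captured before the enable step, which is why Loaded still reports disabled; the enable output itself is shown after the version check)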
# systemctl start containerd.service
# systemctl enable containerd.service
# systemctl status containerd.service
● containerd.service - containerd container runtime
Loaded: loaded (/etc/systemd/system/containerd.service; disabled; vendor preset: enabled)
Active: active (running) since Tue 2021-03-23 13:14:47 CST; 7s ago
Docs: https://containerd.io
Process: 4101 ExecStartPre=/sbin/modprobe overlay (code=exited, status=0/SUCCESS)
Main PID: 4107 (containerd)
Tasks: 9
CGroup: /system.slice/containerd.service
└─4107 /usr/local/bin/containerd
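Scripts that call ctr right after starting the daemon may want to confirm that the gRPC socket exists first. The sketch below assumes the default socket path /run/containerd/containerd.sock from the [grpc] section of the configuration. The helper name is hypothetical, and it only checks that the path exists.

```bash
# Wait until a path exists, polling once per second.
#   $1 path, e.g. /run/containerd/containerd.sock
#   $2 timeout in seconds (default 10)
# Returns 0 as soon as the path exists, 1 on timeout.
wait_for_path() {
  local path=$1 timeout=${2:-10} i
  for ((i = 0; i < timeout; i++)); do
    [ -e "$path" ] && return 0
    sleep 1
  done
  [ -e "$path" ]
}
```

Usage: `systemctl start containerd.service && wait_for_path /run/containerd/containerd.sock 10 && ctr version`.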
- Check the version
root@i-gqqsb62d:~/etc/systemd/system# ctr version
Client:
Version: v1.4.4
Revision: 05f951a3781f4f2c1911b05e61c160e9c30eaa8e
Go version: go1.15.8
Server:
Version: v1.4.4
Revision: 05f951a3781f4f2c1911b05e61c160e9c30eaa8e
UUID: 063144b5-b13e-429c-ba9c-82bb2ab15aae
root@i-gqqsb62d:~/etc/systemd/system# systemctl enable containerd.service
Created symlink /etc/systemd/system/multi-user.target.wants/containerd.service → /etc/systemd/system/containerd.service.
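ctr version prints both the client and the server version. A mismatch usually means an older binary comes first on PATH or an older daemon is still running. The sketch below compares the two Version: lines. It assumes the output layout shown above, and the function name is hypothetical.

```bash
# Read `ctr version` output on stdin; succeed only if the client
# and server sections report the same Version.
versions_match() {
  awk '
    /^Client:/ { section = "client" }
    /^Server:/ { section = "server" }
    $1 == "Version:" { v[section] = $2 }
    END {
      if (v["client"] == "" || v["server"] == "") exit 1
      if (v["client"] != v["server"]) exit 1
    }'
}
```

Usage: `ctr version | versions_match && echo "client and server match"`.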
Generate the configuration file
The default Containerd configuration file is /etc/containerd/config.toml. Generate a default configuration with:
mkdir -p /etc/containerd
containerd config default > /etc/containerd/config.toml
Registry mirrors
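For reasons best left unsaid, pulling from public registries in mainland China is extremely slow. To save pull time, configure registry mirrors for Containerd. Containerd's registry mirrors differ from Docker's in two ways:
- Containerd only applies mirrors to images pulled through CRI. In other words, a mirror takes effect only when the pull is made by crictl or by Kubernetes; pulls made with ctr ignore it.
- Docker only supports configuring a mirror for Docker Hub, whereas Containerd supports mirrors for any registry.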
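Before configuring mirrors, take a look at how the Containerd configuration is structured. It may look complicated at first glance, and most of that complexity lies in the plugin section: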
root@i-gqqsb62d:/etc/containerd# cat config.toml
version = 2
root = "/var/lib/containerd"
state = "/run/containerd"
plugin_dir = ""
disabled_plugins = []
required_plugins = []
oom_score = 0
[grpc]
address = "/run/containerd/containerd.sock"
tcp_address = ""
tcp_tls_cert = ""
tcp_tls_key = ""
uid = 0
gid = 0
max_recv_message_size = 16777216
max_send_message_size = 16777216
[ttrpc]
address = ""
uid = 0
gid = 0
[debug]
address = ""
uid = 0
gid = 0
level = ""
[metrics]
address = ""
grpc_histogram = false
[cgroup]
path = ""
[timeouts]
"io.containerd.timeout.shim.cleanup" = "5s"
"io.containerd.timeout.shim.load" = "5s"
"io.containerd.timeout.shim.shutdown" = "3s"
"io.containerd.timeout.task.state" = "2s"
[plugins]
[plugins."io.containerd.gc.v1.scheduler"]
pause_threshold = 0.02
deletion_threshold = 0
mutation_threshold = 100
schedule_delay = "0s"
startup_delay = "100ms"
[plugins."io.containerd.grpc.v1.cri"]
disable_tcp_service = true
stream_server_address = "127.0.0.1"
stream_server_port = "0"
stream_idle_timeout = "4h0m0s"
enable_selinux = false
selinux_category_range = 1024
sandbox_image = "k8s.gcr.io/pause:3.2"
stats_collect_period = 10
systemd_cgroup = false
enable_tls_streaming = false
max_container_log_line_size = 16384
disable_cgroup = false
disable_apparmor = false
restrict_oom_score_adj = false
max_concurrent_downloads = 3
disable_proc_mount = false
unset_seccomp_profile = ""
tolerate_missing_hugetlb_controller = true
disable_hugetlb_controller = true
ignore_image_defined_volumes = false
[plugins."io.containerd.grpc.v1.cri".containerd]
snapshotter = "overlayfs"
default_runtime_name = "runc"
no_pivot = false
disable_snapshot_annotations = true
discard_unpacked_layers = false
[plugins."io.containerd.grpc.v1.cri".containerd.default_runtime]
runtime_type = ""
runtime_engine = ""
runtime_root = ""
privileged_without_host_devices = false
base_runtime_spec = ""
[plugins."io.containerd.grpc.v1.cri".containerd.untrusted_workload_runtime]
runtime_type = ""
runtime_engine = ""
runtime_root = ""
privileged_without_host_devices = false
base_runtime_spec = ""
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes]
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
runtime_type = "io.containerd.runc.v2"
runtime_engine = ""
runtime_root = ""
privileged_without_host_devices = false
base_runtime_spec = ""
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
[plugins."io.containerd.grpc.v1.cri".cni]
bin_dir = "/opt/cni/bin"
conf_dir = "/etc/cni/net.d"
max_conf_num = 1
conf_template = ""
[plugins."io.containerd.grpc.v1.cri".registry]
[plugins."io.containerd.grpc.v1.cri".registry.mirrors]
[plugins."io.containerd.grpc.v1.cri".registry.mirrors."docker.io"]
endpoint = ["https://registry-1.docker.io"]
[plugins."io.containerd.grpc.v1.cri".image_decryption]
key_model = ""
[plugins."io.containerd.grpc.v1.cri".x509_key_pair_streaming]
tls_cert_file = ""
tls_key_file = ""
[plugins."io.containerd.internal.v1.opt"]
path = "/opt/containerd"
[plugins."io.containerd.internal.v1.restart"]
interval = "10s"
[plugins."io.containerd.metadata.v1.bolt"]
content_sharing_policy = "shared"
[plugins."io.containerd.monitor.v1.cgroups"]
no_prometheus = false
[plugins."io.containerd.runtime.v1.linux"]
shim = "containerd-shim"
runtime = "runc"
runtime_root = ""
no_shim = false
shim_debug = false
[plugins."io.containerd.runtime.v2.task"]
platforms = ["linux/amd64"]
[plugins."io.containerd.service.v1.diff-service"]
default = ["walking"]
[plugins."io.containerd.snapshotter.v1.devmapper"]
root_path = ""
pool_name = ""
base_image_size = ""
async_remove = false
Every top-level block is named in the form plugins."io.containerd.xxx.vx.xxx", and each top-level block represents one plugin. io.containerd.xxx.vx is the plugin type, and the xxx after vx is the plugin ID. ctr lists them all:
root@i-gqqsb62d:/etc/containerd# ctr plugin ls
TYPE ID PLATFORMS STATUS
io.containerd.content.v1 content - ok
io.containerd.snapshotter.v1 aufs linux/amd64 ok
io.containerd.snapshotter.v1 btrfs linux/amd64 error
io.containerd.snapshotter.v1 devmapper linux/amd64 error
io.containerd.snapshotter.v1 native linux/amd64 ok
io.containerd.snapshotter.v1 overlayfs linux/amd64 ok
io.containerd.snapshotter.v1 zfs linux/amd64 error
io.containerd.metadata.v1 bolt - ok
io.containerd.differ.v1 walking linux/amd64 ok
io.containerd.gc.v1 scheduler - ok
io.containerd.service.v1 introspection-service - ok
io.containerd.service.v1 containers-service - ok
io.containerd.service.v1 content-service - ok
io.containerd.service.v1 diff-service - ok
io.containerd.service.v1 images-service - ok
io.containerd.service.v1 leases-service - ok
io.containerd.service.v1 namespaces-service - ok
io.containerd.service.v1 snapshots-service - ok
io.containerd.runtime.v1 linux linux/amd64 ok
io.containerd.runtime.v2 task linux/amd64 ok
io.containerd.monitor.v1 cgroups linux/amd64 ok
io.containerd.service.v1 tasks-service - ok
io.containerd.internal.v1 restart - ok
io.containerd.grpc.v1 containers - ok
io.containerd.grpc.v1 content - ok
io.containerd.grpc.v1 diff - ok
io.containerd.grpc.v1 events - ok
io.containerd.grpc.v1 healthcheck - ok
io.containerd.grpc.v1 images - ok
io.containerd.grpc.v1 leases - ok
io.containerd.grpc.v1 namespaces - ok
io.containerd.internal.v1 opt - ok
io.containerd.grpc.v1 snapshots - ok
io.containerd.grpc.v1 tasks - ok
io.containerd.grpc.v1 version - ok
io.containerd.grpc.v1 cri linux/amd64 ok
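The sub-blocks under a top-level block hold that plugin's settings. For example, the cri plugin is split into containerd, cni and registry settings. Under containerd you can configure various runtimes as well as the default runtime.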
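The STATUS column in the ctr plugin ls output shows which plugins failed to load. On this host btrfs, devmapper and zfs report error, typically because the corresponding filesystem or device-mapper pool is not set up. The sketch below filters that output and turns a TYPE and ID back into the config-block name. It assumes the four-column layout shown above, and both function names are hypothetical.

```bash
# Print "TYPE.ID" for every plugin whose STATUS is not "ok".
# Reads `ctr plugin ls` output on stdin and skips the header line.
failed_plugins() {
  awk 'NR > 1 && $NF != "ok" { print $1 "." $2 }'
}

# Turn a plugin TYPE and ID into the header of its config block.
plugin_block() {
  printf '[plugins."%s.%s"]\n' "$1" "$2"
}
```

Usage: `ctr plugin ls | failed_plugins`, or `plugin_block io.containerd.grpc.v1 cri` to find the block that configures the cri plugin.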
Registry mirrors are configured in the registry block under the cri plugin, so the part to modify is:
[plugins."io.containerd.grpc.v1.cri".registry]
[plugins."io.containerd.grpc.v1.cri".registry.mirrors]
[plugins."io.containerd.grpc.v1.cri".registry.mirrors."docker.io"]
endpoint = ["https://registry-1.docker.io"]
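- registry.mirrors."xxx": the registry to configure a mirror for. For example, registry.mirrors."docker.io" configures the mirror for docker.io.
- endpoint: the mirror service(s) to use. It is a list, and containerd tries the URLs in order and uses the first one that works. Here, for example, the mirror service provided by Northwest A&F University is recommended as the mirror for docker.io: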
[plugins."io.containerd.grpc.v1.cri".registry]
[plugins."io.containerd.grpc.v1.cri".registry.mirrors]
[plugins."io.containerd.grpc.v1.cri".registry.mirrors."docker.io"]
endpoint = ["https://dockerhub.mirrors.nwafu.edu.cn"]
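You can also script the endpoint edit. The sketch below rewrites the endpoint line that follows a given registry.mirrors."<registry>" header. It assumes that header and an existing endpoint line are already present, as in the default configuration generated above. The function name set_mirror is hypothetical. Containerd only reads config.toml at startup, so restart it afterwards.

```bash
# Replace the endpoint list for one registry mirror in config.toml.
#   $1 path to config.toml
#   $2 registry name, e.g. docker.io
#   $3.. one or more mirror endpoint URLs, tried in the given order
# Returns 1 and leaves the file untouched if no matching block is found.
set_mirror() {
  local file=$1 registry=$2
  shift 2
  local endpoints="" url tmp rc
  for url in "$@"; do
    endpoints+="${endpoints:+, }\"$url\""
  done
  tmp=$(mktemp) || return 1
  awk -v header="[plugins.\"io.containerd.grpc.v1.cri\".registry.mirrors.\"$registry\"]" \
      -v eps="$endpoints" '
    { trimmed = $0; sub(/^[ \t]+/, "", trimmed) }
    trimmed == header { in_block = 1; print; next }
    in_block && trimmed ~ /^endpoint[ \t]*=/ {
      indent = $0; sub(/[^ \t].*$/, "", indent)   # keep the original indentation
      print indent "endpoint = [" eps "]"
      in_block = 0; done = 1; next
    }
    trimmed ~ /^\[/ { in_block = 0 }              # another table starts: stop looking
    { print }
    END { if (!done) exit 1 }
  ' "$file" > "$tmp"
  rc=$?
  # Overwrite in place (keeping the file permissions) only if the edit succeeded.
  [ "$rc" -eq 0 ] && cat "$tmp" > "$file"
  rm -f "$tmp"
  return "$rc"
}
```

Usage: `set_mirror /etc/containerd/config.toml docker.io https://dockerhub.mirrors.nwafu.edu.cn && systemctl restart containerd.service`.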
Storage configuration
Containerd uses two different storage paths: one for persistent data and one for runtime state.
version = 2
root = "/var/lib/containerd"
state = "/run/containerd"
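root holds persistent data, including Snapshots, Content, Metadata and the data of the various plugins. Each plugin has its own directory. Containerd itself stores no data; all of its functionality comes from the plugins it loads, which is a rather clever design.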
root@i-gqqsb62d:/etc/containerd# tree -L 2 /var/lib/containerd/
/var/lib/containerd/
├── io.containerd.content.v1.content
│ └── ingest
├── io.containerd.grpc.v1.introspection
│ └── uuid
├── io.containerd.metadata.v1.bolt
│ └── meta.db
├── io.containerd.runtime.v1.linux
├── io.containerd.runtime.v2.task
├── io.containerd.snapshotter.v1.aufs
│ └── snapshots
├── io.containerd.snapshotter.v1.btrfs
├── io.containerd.snapshotter.v1.native
│ └── snapshots
├── io.containerd.snapshotter.v1.overlayfs
│ └── snapshots
└── tmpmounts
state holds temporary data, including sockets, PIDs, mount points, runtime state, and plugin data that does not need to be persisted.
root@i-gqqsb62d:/etc/containerd# tree -L 2 /run/containerd/
/run/containerd/
├── containerd.sock
├── containerd.sock.ttrpc
├── io.containerd.runtime.v1.linux
└── io.containerd.runtime.v2.task
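To find where a given host keeps its data, read the top-level root and state keys from config.toml. The sketch below assumes these keys appear before the first [table] header, as in the generated default (in TOML, keys after a table header belong to that table). Each plugin's directory under root is named <TYPE>.<ID>, matching the tree above. Both function names are hypothetical.

```bash
# Print the value of a top-level key (e.g. root or state) from config.toml.
# Stops at the first [table] header, since later keys belong to tables.
toml_top_key() {
  local key=$1 file=$2
  awk -v key="$key" '
    /^[ \t]*\[/ { exit }
    {
      line = $0
      sub(/^[ \t]+/, "", line)
      if (index(line, key) == 1) {
        rest = substr(line, length(key) + 1)
        if (rest ~ /^[ \t]*=/) {
          sub(/^[ \t]*=[ \t]*/, "", rest)
          gsub(/^"|"[ \t]*$/, "", rest)   # strip surrounding quotes
          print rest
          exit
        }
      }
    }' "$file"
}

# Directory a plugin uses under root: <root>/<TYPE>.<ID>
plugin_data_dir() {
  printf '%s/%s.%s\n' "$1" "$2" "$3"
}
```

Usage: `ls "$(plugin_data_dir "$(toml_top_key root /etc/containerd/config.toml)" io.containerd.snapshotter.v1 overlayfs)/snapshots"`.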
OOM
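One more setting deserves attention:
oom_score = 0
Containerd is the guardian of the containers. When the system runs out of memory, the containers should ideally be killed first, not Containerd itself. You should therefore adjust Containerd's OOM weight to lower the chance that it gets OOM-killed, ideally setting oom_score slightly lower than that of other daemons. This oom_score corresponds to /proc/<pid>/oom_score_adj. Early Linux kernels used oom_adj to adjust the weight, and later kernels switched to oom_score_adj. The file is described as follows: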
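When the final badness score is computed, oom_score_adj is added to the result. This lets users protect a process from being killed, or make a particular process the one that always gets killed. Its range is -1000 to 1000.
If the value is set to -1000, the process is never killed, because the kernel then treats it as unkillable when computing the badness score.
For Containerd, a value between -999 and 0 is recommended. On a Kubernetes worker node, consider setting it to -999.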
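Before changing oom_score, it can help to check what the running daemon actually uses and whether a proposed value falls in the ranges discussed above. The following is a sketch; the helper names are hypothetical.

```bash
# Print the current oom_score_adj of a process (default: this shell).
oom_adj_of() {
  cat "/proc/${1:-$$}/oom_score_adj"
}

# Classify a candidate oom_score for Containerd:
#   invalid     - not an integer in [-1000, 1000]
#   recommended - within [-999, 0]
#   allowed     - valid but outside the recommended range
classify_oom_score() {
  local v=$1 re='^-?(0|[1-9][0-9]{0,3})$'
  [[ $v =~ $re ]] || { echo invalid; return 1; }
  if (( v < -1000 || v > 1000 )); then
    echo invalid; return 1
  elif (( v >= -999 && v <= 0 )); then
    echo recommended
  else
    echo allowed
  fi
}
```

Usage: `oom_adj_of "$(pidof containerd)"` shows the value the daemon actually runs with. Compare it with OOMScoreAdjust in the systemd unit below. To set the value in the generated configuration, run `sed -i 's/^oom_score = 0$/oom_score = -999/' /etc/containerd/config.toml`.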
Systemd configuration
Running Containerd as a daemon managed by systemd is recommended. The unit file was already extracted from the tarball above:
cat /etc/systemd/system/containerd.service
[Unit]
Description=containerd container runtime
Documentation=https://containerd.io
After=network.target local-fs.target
[Service]
ExecStartPre=-/sbin/modprobe overlay
ExecStart=/usr/local/bin/containerd
Type=notify
Delegate=yes
KillMode=process
Restart=always
RestartSec=5
# Having non-zero Limit*s causes performance problems due to accounting overhead
# in the kernel. We recommend using cgroups to do container-local accounting.
LimitNPROC=infinity
LimitCORE=infinity
LimitNOFILE=1048576
# Comment TasksMax if your systemd version does not supports it.
# Only systemd 226 and above support this version.
TasksMax=infinity
OOMScoreAdjust=-999
[Install]
WantedBy=multi-user.target
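Two parameters are important here:
- Delegate: allows Containerd and the runtime to manage the cgroups of the containers they create themselves. Without this option, systemd moves the processes into its own cgroups, and Containerd cannot correctly obtain the containers' resource usage.
- KillMode: controls how the Containerd process is killed. By default, systemd looks up and kills all of Containerd's child processes in the process's cgroup, which is definitely not what we want.
KillMode accepts the following values:
- control-group (default): all child processes in the current control group are killed
- process: only the main process is killed
- mixed: the main process receives SIGTERM, and the child processes receive SIGKILL
- none: no process is killed; only the service's stop command is run
KillMode must be set to process, which ensures that existing containers are not killed when Containerd is upgraded or restarted.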
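To confirm that the installed unit file carries these two settings, inspect its [Service] section. The sketch below reads the file directly; the function names are hypothetical. On a live system, `systemctl show containerd.service -p Delegate -p KillMode` reports the effective values, including any drop-in overrides.

```bash
# Print the value of KEY from the [Service] section of a unit file.
# Comment lines are skipped; if the key appears several times,
# the last assignment is reported.
unit_service_value() {
  local file=$1 key=$2
  awk -v key="$key" '
    /^[ \t]*[#;]/ { next }
    /^[ \t]*\[/ { in_service = ($0 ~ /^[ \t]*\[Service\][ \t]*$/); next }
    in_service {
      eq = index($0, "=")
      if (eq == 0) next
      k = substr($0, 1, eq - 1); gsub(/[ \t]/, "", k)
      if (k == key) { v = substr($0, eq + 1); sub(/^[ \t]+/, "", v); sub(/[ \t]+$/, "", v) }
    }
    END { if (v != "") print v }
  ' "$file"
}

# Succeed only if Delegate=yes and KillMode=process are both set.
check_containerd_unit() {
  local file=${1:-/etc/systemd/system/containerd.service}
  [ "$(unit_service_value "$file" Delegate)" = yes ] &&
    [ "$(unit_service_value "$file" KillMode)" = process ]
}
```

Usage: `check_containerd_unit && echo "unit file OK"`.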