Install
Install dependencies
Install the dependencies for seccomp:
root@i-gqqsb62d:~# apt install libseccomp-dev libseccomp2 seccomp
...
root@i-gqqsb62d:~# apt show seccomp
Package: seccomp
Version: 2.4.3-1ubuntu3.18.04.3
Priority: optional
Section: utils
Source: libseccomp
Origin: Ubuntu
Maintainer: Ubuntu Developers <ubuntu-devel-discuss@lists.ubuntu.com>
Original-Maintainer: Kees Cook <kees@debian.org>
Bugs: https://bugs.launchpad.net/ubuntu/+filebug
Installed-Size: 26.6 kB
Depends: libc6 (>= 2.3.4), libseccomp2 (>= 2.2.1)
Suggests: libseccomp-dev
Homepage: https://github.com/seccomp/libseccomp
Task: ubuntu-core
Supported: 5y
Download-Size: 5256 B
APT-Manual-Installed: yes
APT-Sources: http://cn.archive.ubuntu.com/ubuntu bionic-updates/main amd64 Packages
Description: helper tools for high level interface to Linux seccomp filter
 Provides helper tools for interacting with libseccomp. Currently, only
 a single tool exists, providing a way to easily enumerate syscalls across
 the supported architectures.
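seccomp is a security mechanism in the Linux kernel. Normally a program can use every syscall, which is unsafe: for example, an attacker who hijacks the program's control flow can use the execve syscall to get a shell. With seccomp we can disable certain syscalls inside a program, so that even if its control flow is hijacked, only a restricted set of syscalls can be called.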
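As a small illustration of the mechanism described above, the sketch below reads the Seccomp field that the kernel exposes in /proc/<pid>/status and turns it into a label. The function names are made up for this example; the meaning of the values 0, 1 and 2 is taken from proc(5), and the field is only present on kernels built with seccomp support.

```shell
# Translate the numeric "Seccomp:" field of /proc/<pid>/status into a label.
# The values 0, 1 and 2 are the ones documented in proc(5).
seccomp_mode_name() {
  case "$1" in
    0) echo "disabled" ;;
    1) echo "strict" ;;
    2) echo "filter" ;;
    *) echo "unknown" ;;
  esac
}

# Print the seccomp mode of a process (defaults to this shell's PID).
# Prints "unknown" if the status file cannot be read or has no Seccomp field.
seccomp_mode_of() {
  local pid="${1:-$$}" value
  value="$(awk '/^Seccomp:/ {print $2}' "/proc/${pid}/status" 2>/dev/null)"
  seccomp_mode_name "${value}"
}
```

Calling seccomp_mode_of with the PID of a containerized process should typically report "filter" when a seccomp profile is applied to it.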
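Install Containerd
URL: https://github.com/containerd/containerd
Containerd provides two archives: containerd-${VERSION}-${OS}-${ARCH}.tar.gz and cri-containerd-${VERSION}-${OS}-${ARCH}.tar.gz. The cri-containerd archive contains all the binaries that Kubernetes needs. If you only want to test locally, the first archive is enough; if Containerd will serve as the container runtime for Kubernetes, choose the second one. Containerd needs to call runc, and the first archive does not include the runc binary, so if you pick it you must install runc beforehand. I therefore recommend using the cri-containerd archive directly. The steps below download the cri-containerd-cni variant, which additionally bundles the CNI plugins under /opt/cni/bin.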
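The archive naming scheme can be sketched as a small helper that builds the download URL. The function name and its parameters are hypothetical; the URL layout is simply copied from the wget command used in the download step, so it is only known to hold for that release.

```shell
# Build the release download URL for a containerd archive.
# flavor is the archive prefix, e.g. "containerd" or "cri-containerd-cni".
# The URL layout mirrors the v1.4.4 download link and is an assumption
# for any other release.
containerd_release_url() {
  local flavor="$1" version="$2" os="$3" arch="$4"
  printf 'https://github.com/containerd/containerd/releases/download/v%s/%s-%s-%s-%s.tar.gz\n' \
    "${version}" "${flavor}" "${version}" "${os}" "${arch}"
}

# Prints the URL of the archive used in this article.
containerd_release_url cri-containerd-cni 1.4.4 linux amd64
```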
- Download
# wget https://github.com/containerd/containerd/releases/download/v1.4.4/cri-containerd-cni-1.4.4-linux-amd64.tar.gz
- Inspect the archive contents
root@i-gqqsb62d:~# ls -ltr
total 96928
-rw-r--r-- 1 root root 99252755 Mar 5 14:03 cri-containerd-cni-1.4.4-linux-amd64.tar.gz
root@i-gqqsb62d:~# tar -tf cri-containerd-cni-1.4.4-linux-amd64.tar.gz
etc/
etc/systemd/
etc/systemd/system/
etc/systemd/system/containerd.service
etc/crictl.yaml
etc/cni/
etc/cni/net.d/
etc/cni/net.d/10-containerd-net.conflist
usr/
usr/local/
usr/local/bin/
usr/local/bin/containerd
usr/local/bin/containerd-shim
usr/local/bin/crictl
usr/local/bin/containerd-shim-runc-v2
usr/local/bin/critest
usr/local/bin/containerd-shim-runc-v1
usr/local/bin/ctr
usr/local/sbin/
usr/local/sbin/runc
opt/
opt/containerd/
opt/containerd/cluster/
opt/containerd/cluster/gce/
opt/containerd/cluster/gce/env
opt/containerd/cluster/gce/cni.template
opt/containerd/cluster/gce/configure.sh
opt/containerd/cluster/gce/cloud-init/
opt/containerd/cluster/gce/cloud-init/node.yaml
opt/containerd/cluster/gce/cloud-init/master.yaml
opt/containerd/cluster/version
opt/cni/
opt/cni/bin/
opt/cni/bin/bandwidth
opt/cni/bin/host-device
opt/cni/bin/flannel
opt/cni/bin/static
opt/cni/bin/loopback
opt/cni/bin/dhcp
opt/cni/bin/ptp
opt/cni/bin/ipvlan
opt/cni/bin/vlan
opt/cni/bin/host-local
opt/cni/bin/firewall
opt/cni/bin/tuning
opt/cni/bin/sbr
opt/cni/bin/bridge
opt/cni/bin/portmap
opt/cni/bin/macvlan
- Extract
# tar -C / -zxvf cri-containerd-cni-1.4.4-linux-amd64.tar.gz
- Add the directories to PATH (optional)
Append /usr/local/bin and /usr/local/sbin to the $PATH environment variable in ~/.bashrc, then reload the file:
export PATH=$PATH:/usr/local/bin:/usr/local/sbin
source ~/.bashrc
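Because ~/.bashrc is re-read by every new shell, a plain append can end up adding the same directory several times. The following is a minimal sketch of a guarded append; path_append is a hypothetical helper name, not something shipped with Containerd.

```shell
# Append a directory to PATH only if it is not already present.
path_append() {
  case ":${PATH}:" in
    *":$1:"*) ;;                 # already listed, nothing to do
    *) PATH="${PATH}:$1" ;;      # not listed yet, append it
  esac
}

path_append /usr/local/bin
path_append /usr/local/sbin
export PATH
```

Placing these lines in ~/.bashrc instead of the plain export keeps PATH free of duplicates no matter how often the file is sourced.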
- Start
# systemctl start containerd.service
# systemctl enable containerd.service
# systemctl status containerd.service
● containerd.service - containerd container runtime
   Loaded: loaded (/etc/systemd/system/containerd.service; disabled; vendor preset: enabled)
   Active: active (running) since Tue 2021-03-23 13:14:47 CST; 7s ago
     Docs: https://containerd.io
  Process: 4101 ExecStartPre=/sbin/modprobe overlay (code=exited, status=0/SUCCESS)
 Main PID: 4107 (containerd)
    Tasks: 9
   CGroup: /system.slice/containerd.service
           └─4107 /usr/local/bin/containerd
- Check the version
root@i-gqqsb62d:~/etc/systemd/system# ctr version
Client:
  Version: v1.4.4
  Revision: 05f951a3781f4f2c1911b05e61c160e9c30eaa8e
  Go version: go1.15.8

Server:
  Version: v1.4.4
  Revision: 05f951a3781f4f2c1911b05e61c160e9c30eaa8e
  UUID: 063144b5-b13e-429c-ba9c-82bb2ab15aae
root@i-gqqsb62d:~/etc/systemd/system# systemctl enable containerd.service
Created symlink /etc/systemd/system/multi-user.target.wants/containerd.service → /etc/systemd/system/containerd.service.
Generate the configuration file
Containerd's default configuration file is /etc/containerd/config.toml. A default configuration can be generated with the following commands:
mkdir /etc/containerd
containerd config default > /etc/containerd/config.toml
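Registry mirrors
For reasons best left unsaid, pulling images from public registries is extremely slow from within mainland China. To save time on pulls, Containerd needs to be configured with registry mirrors. Containerd's registry mirrors differ from Docker's in two ways:
- Containerd only supports mirrors for images pulled through CRI. In other words, a mirror takes effect only when the pull goes through crictl or Kubernetes; pulls made with ctr do not use it.
- Docker only supports configuring a mirror for Docker Hub, whereas Containerd supports configuring mirrors for any registry.
Before configuring a mirror, let's look at the structure of Containerd's configuration. At first glance it may seem complicated; the complexity lies mostly in the plugin configuration section: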
root@i-gqqsb62d:/etc/containerd# cat config.toml
version = 2
root = "/var/lib/containerd"
state = "/run/containerd"
plugin_dir = ""
disabled_plugins = []
required_plugins = []
oom_score = 0

[grpc]
  address = "/run/containerd/containerd.sock"
  tcp_address = ""
  tcp_tls_cert = ""
  tcp_tls_key = ""
  uid = 0
  gid = 0
  max_recv_message_size = 16777216
  max_send_message_size = 16777216

[ttrpc]
  address = ""
  uid = 0
  gid = 0

[debug]
  address = ""
  uid = 0
  gid = 0
  level = ""

[metrics]
  address = ""
  grpc_histogram = false

[cgroup]
  path = ""

[timeouts]
  "io.containerd.timeout.shim.cleanup" = "5s"
  "io.containerd.timeout.shim.load" = "5s"
  "io.containerd.timeout.shim.shutdown" = "3s"
  "io.containerd.timeout.task.state" = "2s"

[plugins]
  [plugins."io.containerd.gc.v1.scheduler"]
    pause_threshold = 0.02
    deletion_threshold = 0
    mutation_threshold = 100
    schedule_delay = "0s"
    startup_delay = "100ms"
  [plugins."io.containerd.grpc.v1.cri"]
    disable_tcp_service = true
    stream_server_address = "127.0.0.1"
    stream_server_port = "0"
    stream_idle_timeout = "4h0m0s"
    enable_selinux = false
    selinux_category_range = 1024
    sandbox_image = "k8s.gcr.io/pause:3.2"
    stats_collect_period = 10
    systemd_cgroup = false
    enable_tls_streaming = false
    max_container_log_line_size = 16384
    disable_cgroup = false
    disable_apparmor = false
    restrict_oom_score_adj = false
    max_concurrent_downloads = 3
    disable_proc_mount = false
    unset_seccomp_profile = ""
    tolerate_missing_hugetlb_controller = true
    disable_hugetlb_controller = true
    ignore_image_defined_volumes = false
    [plugins."io.containerd.grpc.v1.cri".containerd]
      snapshotter = "overlayfs"
      default_runtime_name = "runc"
      no_pivot = false
      disable_snapshot_annotations = true
      discard_unpacked_layers = false
      [plugins."io.containerd.grpc.v1.cri".containerd.default_runtime]
        runtime_type = ""
        runtime_engine = ""
        runtime_root = ""
        privileged_without_host_devices = false
        base_runtime_spec = ""
      [plugins."io.containerd.grpc.v1.cri".containerd.untrusted_workload_runtime]
        runtime_type = ""
        runtime_engine = ""
        runtime_root = ""
        privileged_without_host_devices = false
        base_runtime_spec = ""
      [plugins."io.containerd.grpc.v1.cri".containerd.runtimes]
        [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
          runtime_type = "io.containerd.runc.v2"
          runtime_engine = ""
          runtime_root = ""
          privileged_without_host_devices = false
          base_runtime_spec = ""
          [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
    [plugins."io.containerd.grpc.v1.cri".cni]
      bin_dir = "/opt/cni/bin"
      conf_dir = "/etc/cni/net.d"
      max_conf_num = 1
      conf_template = ""
    [plugins."io.containerd.grpc.v1.cri".registry]
      [plugins."io.containerd.grpc.v1.cri".registry.mirrors]
        [plugins."io.containerd.grpc.v1.cri".registry.mirrors."docker.io"]
          endpoint = ["https://registry-1.docker.io"]
    [plugins."io.containerd.grpc.v1.cri".image_decryption]
      key_model = ""
    [plugins."io.containerd.grpc.v1.cri".x509_key_pair_streaming]
      tls_cert_file = ""
      tls_key_file = ""
  [plugins."io.containerd.internal.v1.opt"]
    path = "/opt/containerd"
  [plugins."io.containerd.internal.v1.restart"]
    interval = "10s"
  [plugins."io.containerd.metadata.v1.bolt"]
    content_sharing_policy = "shared"
  [plugins."io.containerd.monitor.v1.cgroups"]
    no_prometheus = false
  [plugins."io.containerd.runtime.v1.linux"]
    shim = "containerd-shim"
    runtime = "runc"
    runtime_root = ""
    no_shim = false
    shim_debug = false
  [plugins."io.containerd.runtime.v2.task"]
    platforms = ["linux/amd64"]
  [plugins."io.containerd.service.v1.diff-service"]
    default = ["walking"]
  [plugins."io.containerd.snapshotter.v1.devmapper"]
    root_path = ""
    pool_name = ""
    base_image_size = ""
    async_remove = false
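Every top-level block under [plugins] is named in the form plugins."io.containerd.xxx.vx.xxx". Each of these blocks represents one plugin: io.containerd.xxx.vx is the plugin type, and the xxx after vx is the plugin ID. ctr lists them all at a glance: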
root@i-gqqsb62d:/etc/containerd# ctr plugin ls
TYPE                            ID                       PLATFORMS      STATUS
io.containerd.content.v1        content                  -              ok
io.containerd.snapshotter.v1    aufs                     linux/amd64    ok
io.containerd.snapshotter.v1    btrfs                    linux/amd64    error
io.containerd.snapshotter.v1    devmapper                linux/amd64    error
io.containerd.snapshotter.v1    native                   linux/amd64    ok
io.containerd.snapshotter.v1    overlayfs                linux/amd64    ok
io.containerd.snapshotter.v1    zfs                      linux/amd64    error
io.containerd.metadata.v1       bolt                     -              ok
io.containerd.differ.v1         walking                  linux/amd64    ok
io.containerd.gc.v1             scheduler                -              ok
io.containerd.service.v1        introspection-service    -              ok
io.containerd.service.v1        containers-service       -              ok
io.containerd.service.v1        content-service          -              ok
io.containerd.service.v1        diff-service             -              ok
io.containerd.service.v1        images-service           -              ok
io.containerd.service.v1        leases-service           -              ok
io.containerd.service.v1        namespaces-service       -              ok
io.containerd.service.v1        snapshots-service        -              ok
io.containerd.runtime.v1        linux                    linux/amd64    ok
io.containerd.runtime.v2        task                     linux/amd64    ok
io.containerd.monitor.v1        cgroups                  linux/amd64    ok
io.containerd.service.v1        tasks-service            -              ok
io.containerd.internal.v1       restart                  -              ok
io.containerd.grpc.v1           containers               -              ok
io.containerd.grpc.v1           content                  -              ok
io.containerd.grpc.v1           diff                     -              ok
io.containerd.grpc.v1           events                   -              ok
io.containerd.grpc.v1           healthcheck              -              ok
io.containerd.grpc.v1           images                   -              ok
io.containerd.grpc.v1           leases                   -              ok
io.containerd.grpc.v1           namespaces               -              ok
io.containerd.internal.v1       opt                      -              ok
io.containerd.grpc.v1           snapshots                -              ok
io.containerd.grpc.v1           tasks                    -              ok
io.containerd.grpc.v1           version                  -              ok
io.containerd.grpc.v1           cri                      linux/amd64    ok
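The sub-blocks below a top-level block hold that plugin's settings. The cri plugin, for example, is divided into containerd, cni and registry sections, and under containerd you can configure the individual runtimes as well as the default runtime.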
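The naming rule for plugin blocks (type first, ID after the last dot) can be illustrated with two tiny helpers based on shell parameter expansion. The function names are hypothetical and exist only for this example.

```shell
# Split a plugin block name such as "io.containerd.grpc.v1.cri"
# into its type and its ID.
plugin_type() { printf '%s\n' "${1%.*}"; }    # everything before the last dot
plugin_id()   { printf '%s\n' "${1##*.}"; }   # everything after the last dot

plugin_type io.containerd.grpc.v1.cri   # prints io.containerd.grpc.v1
plugin_id   io.containerd.grpc.v1.cri   # prints cri
```

The two printed values correspond to the TYPE and ID columns of ctr plugin ls.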
The mirror configuration lives in the registry block under the cri plugin block, so the part that needs to be modified is:
[plugins."io.containerd.grpc.v1.cri".registry]
  [plugins."io.containerd.grpc.v1.cri".registry.mirrors]
    [plugins."io.containerd.grpc.v1.cri".registry.mirrors."docker.io"]
      endpoint = ["https://registry-1.docker.io"]
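- registry.mirrors."xxx" : the registry for which a mirror is configured. For example, registry.mirrors."docker.io" configures the mirror for docker.io.
- endpoint : the mirror service to pull through. For example, the mirror service provided by Northwest A&F University is recommended here as the mirror for docker.io.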
[plugins."io.containerd.grpc.v1.cri".registry]
  [plugins."io.containerd.grpc.v1.cri".registry.mirrors]
    [plugins."io.containerd.grpc.v1.cri".registry.mirrors."docker.io"]
      endpoint = ["https://dockerhub.mirrors.nwafu.edu.cn"]
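The same change can be scripted instead of edited by hand. The following is a rough sketch, assuming the config file has the layout generated by containerd config default (the docker.io mirror header followed by a single endpoint line); set_dockerio_mirror is a hypothetical helper, and the mirror URL must not contain the characters | or &.

```shell
# Replace the endpoint list of the docker.io mirror in a containerd config file.
# Usage: set_dockerio_mirror <config.toml> <mirror-url>
set_dockerio_mirror() {
  local config="$1" mirror="$2" tmp
  tmp="$(mktemp)" || return 1
  # Within the range that starts at the registry.mirrors."docker.io" header and
  # ends at the next line containing "endpoint", rewrite the endpoint value.
  if sed '/registry\.mirrors\."docker\.io"/,/endpoint/ s|endpoint = .*|endpoint = ["'"${mirror}"'"]|' \
       "${config}" > "${tmp}"; then
    cat "${tmp}" > "${config}"   # overwrite in place, keeping file permissions
  fi
  rm -f "${tmp}"
}

# Example call (not executed here):
#   set_dockerio_mirror /etc/containerd/config.toml https://dockerhub.mirrors.nwafu.edu.cn
```

Containerd reads its configuration at startup, so the service has to be restarted (for instance with systemctl restart containerd) before the new endpoint is used.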
Storage configuration
Containerd has two separate storage paths: one for persistent data and one for runtime state.
version = 2
root = "/var/lib/containerd"
state = "/run/containerd"
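root holds persistent data, including snapshots, content, metadata and the data of the various plugins. Each plugin has its own directory. Containerd itself stores no data; all of its functionality comes from the plugins it has loaded, which is a rather clever design.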
root@i-gqqsb62d:/etc/containerd# tree -L 2 /var/lib/containerd/
/var/lib/containerd/
├── io.containerd.content.v1.content
│   └── ingest
├── io.containerd.grpc.v1.introspection
│   └── uuid
├── io.containerd.metadata.v1.bolt
│   └── meta.db
├── io.containerd.runtime.v1.linux
├── io.containerd.runtime.v2.task
├── io.containerd.snapshotter.v1.aufs
│   └── snapshots
├── io.containerd.snapshotter.v1.btrfs
├── io.containerd.snapshotter.v1.native
│   └── snapshots
├── io.containerd.snapshotter.v1.overlayfs
│   └── snapshots
└── tmpmounts
state holds temporary data, including sockets, PIDs, mount points, runtime state, and plugin data that does not need to be persisted.
root@i-gqqsb62d:/etc/containerd# tree -L 2 /run/containerd/
/run/containerd/
├── containerd.sock
├── containerd.sock.ttrpc
├── io.containerd.runtime.v1.linux
└── io.containerd.runtime.v2.task
OOM
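One more setting deserves attention:
oom_score = 0
Containerd is the guardian of the containers. If the system runs out of memory, the ideal outcome is that containers are killed first, not Containerd. Containerd's OOM weight should therefore be adjusted to reduce its chance of being OOM-killed. It is best to set oom_score to a value slightly lower than that of the other daemons. The oom_score here actually corresponds to /proc/<pid>/oom_score_adj. Early Linux kernels used oom_adj to adjust the weight; it was later replaced by oom_score_adj. The file is described as follows:
When the final badness score is computed, oom_score_adj is added to the result, so users can use this value to protect a process from being killed or to have a particular process killed every time. The valid range is -1000 to 1000.
If the value is set to -1000, the process is never killed, because its badness score then always evaluates to 0.
For Containerd, a value between -999 and 0 is recommended. On a Kubernetes worker node, consider setting it to -999.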
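Before writing a value into the configuration it can be worth checking that it falls inside the range the kernel accepts. The sketch below does that with plain shell; both function names are hypothetical and only illustrate the range rule described above.

```shell
# Check that a value is a valid oom_score_adj (an integer from -1000 to 1000).
valid_oom_score_adj() {
  case "$1" in
    # reject: empty, a lone "-", any non-digit/non-minus character,
    # or a "-" that is not the first character
    ''|-|*[!0-9-]*|?*-*) return 1 ;;
  esac
  [ "$1" -ge -1000 ] && [ "$1" -le 1000 ]
}

# Print the current oom_score_adj of a process (defaults to this shell).
oom_score_adj_of() {
  cat "/proc/${1:-$$}/oom_score_adj"
}
```

For a running daemon, something like oom_score_adj_of "$(pidof containerd)" shows the value actually in effect, assuming pidof is available and returns a single PID.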
Systemd configuration
It is recommended to run Containerd as a daemon managed by systemd. The unit file was already extracted from the archive above:
cat /etc/systemd/system/containerd.service
[Unit]
Description=containerd container runtime
Documentation=https://containerd.io
After=network.target local-fs.target

[Service]
ExecStartPre=-/sbin/modprobe overlay
ExecStart=/usr/local/bin/containerd
Type=notify
Delegate=yes
KillMode=process
Restart=always
RestartSec=5
# Having non-zero Limit*s causes performance problems due to accounting overhead
# in the kernel. We recommend using cgroups to do container-local accounting.
LimitNPROC=infinity
LimitCORE=infinity
LimitNOFILE=1048576
# Comment TasksMax if your systemd version does not supports it.
# Only systemd 226 and above support this version.
TasksMax=infinity
OOMScoreAdjust=-999

[Install]
WantedBy=multi-user.target
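Two parameters here are important:
- Delegate : this option lets Containerd and the runtime manage the cgroups of the containers they create. Without it, systemd moves the processes into its own cgroups, and Containerd can no longer obtain the containers' resource usage correctly.
- KillMode : this option controls how the processes of the Containerd service are killed. By default, systemd looks in the service's cgroup and kills all of Containerd's child processes, which is certainly not what we want.
The KillMode field accepts the following values:
- control-group (default): all child processes in the current control group are killed
- process: only the main process is killed
- mixed: the main process receives SIGTERM and the child processes receive SIGKILL
- none: no process is killed; only the service's stop command is executed.
We need to set KillMode to process, which ensures that upgrading or restarting Containerd does not kill the existing containers.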
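To confirm that a unit file really carries the two settings discussed above, the directives can be read back with a few lines of awk. This is only a sketch: unit_value is a hypothetical helper that parses the file as plain text and ignores systemd drop-ins and defaults.

```shell
# Print the value of a "Key=value" directive from a systemd unit file.
# If the key appears several times, the last occurrence is printed.
# Usage: unit_value <unit-file> <key>
unit_value() {
  awk -F= -v key="$2" '
    $1 == key { v = substr($0, length(key) + 2) }   # text after "Key="
    END       { if (v != "") print v }
  ' "$1"
}

# Example calls (not executed here):
#   unit_value /etc/systemd/system/containerd.service KillMode
#   unit_value /etc/systemd/system/containerd.service Delegate
```

For the effective values of a loaded unit, systemctl show containerd -p KillMode -p Delegate asks systemd directly.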
