containerd

  • The following slides outline the role containerd plays, including the kinds of services it provides.

[Figures 1 and 2: containerd overview slides]

  • Understand what is and isn’t provided by containerd. This document describes the full scope of the project.
  • Historical background on why networking was left out of containerd.
  • containerd-shim – after runc starts the container, runc exits, so no long-running runtime process is responsible for our containers. The shim is the component that sits between containerd and runc to make this possible. Containers do not die when dockerd or containerd dies, because they are ‘attached’ to the containerd-shim process. The containerd-shim’s job is to monitor the container’s stdin/stdout and to report back the exit code when the container exits.
  • Some of the containerd Makefile tasks:
    • make bin/containerd-shim – builds the containerd-shim binary
  • The following describes parts of the containerd source tree:
    • cmd/containerd – the containerd daemon source code
    • cmd/containerd-shim – the containerd-shim code
    • cmd/containerd-shim-runc-v1 – the containerd-shim-runc-v1 code
    • cmd/containerd-shim-runc-v2 – the containerd-shim-runc-v2 code
  • Examples of how to use containerd

    • Running ubuntu interactively

      • Make sure the image has been pulled using ‘ctr image pull’
      • Run the following commands to run it interactively, inspect it, and kill it

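        • sudo ./ctr run -t docker.io/library/ubuntu:latest u2 [ run the image as a container with ID u2; the sample output below was captured from a container with ID ubuntulatest ]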
        • sudo ./ctr container info ubuntulatest

          ```json
          {
            "ID": "ubuntulatest",
            "Labels": {
              "io.containerd.image.config.stop-signal": "SIGTERM"
            },
            "Image": "docker.io/library/ubuntu:latest",
            "Runtime": {
              "Name": "io.containerd.runc.v2",
              "Options": {
                "type_url": "containerd.runc.v1.Options"
              }
            },
            "SnapshotKey": "ubuntulatest",
            "Snapshotter": "overlayfs",
            "CreatedAt": "2020-01-01T00:24:30.509643667Z",
            "UpdatedAt": "2020-01-01T00:24:30.509643667Z",
            "Extensions": null,
            "Spec": {
              "ociVersion": "1.0.1-dev",
              "process": {
                "user": {
                  "uid": 0,
                  "gid": 0
                },
                "args": [
                  "/bin/bash"
                ],
                "env": [
                  "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
                ],
                "cwd": "/",
                "capabilities": {
                  "bounding": [
                    "CAP_CHOWN",
                    "CAP_DAC_OVERRIDE",
                    "CAP_FSETID",
                    "CAP_FOWNER",
                    "CAP_MKNOD",
                    "CAP_NET_RAW",
                    "CAP_SETGID",
                    "CAP_SETUID",
                    "CAP_SETFCAP",
                    "CAP_SETPCAP",
                    "CAP_NET_BIND_SERVICE",
                    "CAP_SYS_CHROOT",
                    "CAP_KILL",
                    "CAP_AUDIT_WRITE"
                  ],
                  "effective": [
                    "CAP_CHOWN",
                    "CAP_DAC_OVERRIDE",
                    "CAP_FSETID",
                    "CAP_FOWNER",
                    "CAP_MKNOD",
                    "CAP_NET_RAW",
                    "CAP_SETGID",
                    "CAP_SETUID",
                    "CAP_SETFCAP",
                    "CAP_SETPCAP",
                    "CAP_NET_BIND_SERVICE",
                    "CAP_SYS_CHROOT",
                    "CAP_KILL",
                    "CAP_AUDIT_WRITE"
                  ],
                  "inheritable": [
                    "CAP_CHOWN",
                    "CAP_DAC_OVERRIDE",
                    "CAP_FSETID",
                    "CAP_FOWNER",
                    "CAP_MKNOD",
                    "CAP_NET_RAW",
                    "CAP_SETGID",
                    "CAP_SETUID",
                    "CAP_SETFCAP",
                    "CAP_SETPCAP",
                    "CAP_NET_BIND_SERVICE",
                    "CAP_SYS_CHROOT",
                    "CAP_KILL",
                    "CAP_AUDIT_WRITE"
                  ],
                  "permitted": [
                    "CAP_CHOWN",
                    "CAP_DAC_OVERRIDE",
                    "CAP_FSETID",
                    "CAP_FOWNER",
                    "CAP_MKNOD",
                    "CAP_NET_RAW",
                    "CAP_SETGID",
                    "CAP_SETUID",
                    "CAP_SETFCAP",
                    "CAP_SETPCAP",
                    "CAP_NET_BIND_SERVICE",
                    "CAP_SYS_CHROOT",
                    "CAP_KILL",
                    "CAP_AUDIT_WRITE"
                  ]
                },
                "rlimits": [
                  {
                    "type": "RLIMIT_NOFILE",
                    "hard": 1024,
                    "soft": 1024
                  }
                ],
                "noNewPrivileges": true
              },
              "root": {
                "path": "rootfs"
              },
              "mounts": [
                {
                  "destination": "/proc",
                  "type": "proc",
                  "source": "proc",
                  "options": ["nosuid", "noexec", "nodev"]
                },
                {
                  "destination": "/dev",
                  "type": "tmpfs",
                  "source": "tmpfs",
                  "options": ["nosuid", "strictatime", "mode=755", "size=65536k"]
                },
                {
                  "destination": "/dev/pts",
                  "type": "devpts",
                  "source": "devpts",
                  "options": ["nosuid", "noexec", "newinstance", "ptmxmode=0666", "mode=0620", "gid=5"]
                },
                {
                  "destination": "/dev/shm",
                  "type": "tmpfs",
                  "source": "shm",
                  "options": ["nosuid", "noexec", "nodev", "mode=1777", "size=65536k"]
                },
                {
                  "destination": "/dev/mqueue",
                  "type": "mqueue",
                  "source": "mqueue",
                  "options": ["nosuid", "noexec", "nodev"]
                },
                {
                  "destination": "/sys",
                  "type": "sysfs",
                  "source": "sysfs",
                  "options": ["nosuid", "noexec", "nodev", "ro"]
                },
                {
                  "destination": "/run",
                  "type": "tmpfs",
                  "source": "tmpfs",
                  "options": ["nosuid", "strictatime", "mode=755", "size=65536k"]
                }
              ],
              "linux": {
                "resources": {
                  "devices": [
                    { "allow": false, "access": "rwm" },
                    { "allow": true, "type": "c", "major": 1, "minor": 3, "access": "rwm" },
                    { "allow": true, "type": "c", "major": 1, "minor": 8, "access": "rwm" },
                    { "allow": true, "type": "c", "major": 1, "minor": 7, "access": "rwm" },
                    { "allow": true, "type": "c", "major": 5, "minor": 0, "access": "rwm" },
                    { "allow": true, "type": "c", "major": 1, "minor": 5, "access": "rwm" },
                    { "allow": true, "type": "c", "major": 1, "minor": 9, "access": "rwm" },
                    { "allow": true, "type": "c", "major": 5, "minor": 1, "access": "rwm" },
                    { "allow": true, "type": "c", "major": 136, "access": "rwm" },
                    { "allow": true, "type": "c", "major": 5, "minor": 2, "access": "rwm" },
                    { "allow": true, "type": "c", "major": 10, "minor": 200, "access": "rwm" }
                  ]
                },
                "cgroupsPath": "/default/ubuntulatest",
                "namespaces": [
                  { "type": "pid" },
                  { "type": "ipc" },
                  { "type": "uts" },
                  { "type": "mount" },
                  { "type": "network" }
                ],
                "maskedPaths": [
                  "/proc/acpi",
                  "/proc/asound",
                  "/proc/kcore",
                  "/proc/keys",
                  "/proc/latency_stats",
                  "/proc/timer_list",
                  "/proc/timer_stats",
                  "/proc/sched_debug",
                  "/sys/firmware",
                  "/proc/scsi"
                ],
                "readonlyPaths": [
                  "/proc/bus",
                  "/proc/fs",
                  "/proc/irq",
                  "/proc/sys",
                  "/proc/sysrq-trigger"
                ]
              }
            }
          }
          ```
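The `ctr container info` output is plain JSON. It holds containerd's container record (image, runtime, snapshotter) with the OCI runtime spec embedded under `Spec`. The following Go program is a minimal sketch of reading a few of those fields programmatically. The `containerInfo` struct, `parseInfo` and `namespaceTypes` are hypothetical names. They cover only a subset of the fields above and are not containerd's own types.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// containerInfo is a hypothetical, partial view of the JSON printed by
// `ctr container info`; encoding/json silently ignores undeclared fields.
type containerInfo struct {
	ID          string `json:"ID"`
	Image       string `json:"Image"`
	Snapshotter string `json:"Snapshotter"`
	Runtime     struct {
		Name string `json:"Name"`
	} `json:"Runtime"`
	Spec struct {
		Process struct {
			Args []string `json:"args"`
		} `json:"process"`
		Linux struct {
			CgroupsPath string `json:"cgroupsPath"`
			Namespaces  []struct {
				Type string `json:"type"`
			} `json:"namespaces"`
		} `json:"linux"`
	} `json:"Spec"`
}

// parseInfo decodes the output of `ctr container info <id>`.
func parseInfo(data []byte) (containerInfo, error) {
	var info containerInfo
	err := json.Unmarshal(data, &info)
	return info, err
}

// namespaceTypes lists the Linux namespaces requested in the embedded OCI spec.
func namespaceTypes(info containerInfo) []string {
	types := make([]string, 0, len(info.Spec.Linux.Namespaces))
	for _, ns := range info.Spec.Linux.Namespaces {
		types = append(types, ns.Type)
	}
	return types
}

func main() {
	// Abbreviated copy of the output above.
	sample := []byte(`{
  "ID": "ubuntulatest",
  "Image": "docker.io/library/ubuntu:latest",
  "Runtime": {"Name": "io.containerd.runc.v2"},
  "Snapshotter": "overlayfs",
  "Spec": {
    "process": {"args": ["/bin/bash"]},
    "linux": {
      "cgroupsPath": "/default/ubuntulatest",
      "namespaces": [{"type": "pid"}, {"type": "ipc"}, {"type": "uts"}, {"type": "mount"}, {"type": "network"}]
    }
  }
}`)
	info, err := parseInfo(sample)
	if err != nil {
		panic(err)
	}
	fmt.Println(info.ID, info.Runtime.Name, info.Spec.Linux.CgroupsPath, info.Spec.Process.Args)
	fmt.Println("namespaces:", namespaceTypes(info))
}
```

The sketch declares only the fields it needs. As a result, the decoder ignores any extra fields that other containerd versions may add.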

        • sudo ./ctr t metrics ubuntulatest

          ```
          ID TIMESTAMP
          ubuntulatest 2020-01-01 00:29:16.295322149 +0000 UTC
          METRIC VALUE
          memory.usage_in_bytes 1433600
          memory.limit_in_bytes 9223372036854771712
          memory.stat.cache 0
          cpuacct.usage 19273664
          cpuacct.usage_percpu [205817 2386548 4771841 148926 5529908 227097 2037362 0 2107441 0 1017213 841511]
          pids.current 1
          pids.limit 0
          ```
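These metrics come from the container's cgroup and use the cgroup v1 names (memory.usage_in_bytes, cpuacct.usage, pids.current). Below is a small Go sketch that turns this text into a map. It assumes the layout shown above: header lines up to a "METRIC VALUE" line, then one metric per line. A value may itself contain spaces, as with cpuacct.usage_percpu. `parseMetrics` is a hypothetical helper and is not provided by ctr.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// parseMetrics extracts the METRIC/VALUE rows from `ctr task metrics` text output.
// Every line before the "METRIC VALUE" header is skipped.
func parseMetrics(out string) map[string]string {
	metrics := make(map[string]string)
	inTable := false
	sc := bufio.NewScanner(strings.NewReader(out))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) == 0 {
			continue
		}
		if !inTable {
			inTable = len(fields) == 2 && fields[0] == "METRIC" && fields[1] == "VALUE"
			continue
		}
		// Keep the rest of the line: some values (e.g. per-CPU usage) contain spaces.
		metrics[fields[0]] = strings.Join(fields[1:], " ")
	}
	return metrics
}

func main() {
	out := `ID TIMESTAMP
ubuntulatest 2020-01-01 00:29:16.295322149 +0000 UTC
METRIC VALUE
memory.usage_in_bytes 1433600
cpuacct.usage 19273664
pids.current 1`
	m := parseMetrics(out)
	fmt.Println(m["memory.usage_in_bytes"], m["cpuacct.usage"], m["pids.current"])
}
```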

        • sudo ./ctr shim --id ubuntulatest state

          ```json
          {
            "id": "ubuntulatest",
            "bundle": "/run/containerd/io.containerd.runtime.v2.task/default/ubuntulatest",
            "pid": 16032,
            "status": 2,
            "stdin": "/run/containerd/fifo/249557939/ubuntulatest-stdin",
            "stdout": "/run/containerd/fifo/249557939/ubuntulatest-stdout",
            "stderr": "/run/containerd/fifo/249557939/ubuntulatest-stderr",
            "exited_at": "0001-01-01T00:00:00Z"
          }
          ```
        • sudo ./ctr t kill u2 [ kill the container labelled u2 ]

    • Another example: download and run hello-world

      • sudo ./ctr image pull docker.io/library/hello-world:latest

        ```
        docker.io/library/hello-world:latest: resolved |++++++++++++++++++++++++++++++++++++++|
        index-sha256:4fe721ccc2e8dc7362278a29dc660d833570ec2682f4e4194f4ee23e415e1064: done |++++++++++++++++++++++++++++++++++++++|
        manifest-sha256:92c7f9c92844bbbb5d0a101b22f7c2a7949e40f8ea90c8b3bc396879d95e899a: done |++++++++++++++++++++++++++++++++++++++|
        layer-sha256:1b930d010525941c1d56ec53b97bd057a67ae1865eebf042686d2a2d18271ced: done |++++++++++++++++++++++++++++++++++++++|
        config-sha256:fce289e99eb9bca977dae136fbe2a82b6b7d4c372474c9235adc1741675f587e: done |++++++++++++++++++++++++++++++++++++++|
        elapsed: 3.7 s total: 4.8 Ki (1.3 KiB/s)
        unpacking linux/amd64 sha256:4fe721ccc2e8dc7362278a29dc660d833570ec2682f4e4194f4ee23e415e1064...
        ```

      • sudo ./ctr container create docker.io/library/hello-world:latest demo
      • sudo ./ctr task start demo
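  • containerd relies on the Linux ‘child subreaper’ kernel feature (prctl with PR_SET_CHILD_SUBREAPER) to reparent the container process to the shim. The shim marks itself as a subreaper, so when runc exits, the orphaned container process is reparented to the shim instead of to PID 1.

    [Figure 3]

    The following shows the process structure when running a container using containerd: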

    ```
    nanik 7741 3230 0 2019 ? 00:02:54 \_ /usr/libexec/gnome-terminal-server
    nanik 7750 7741 0 2019 pts/1 00:00:00 | \_ bash
    .....
    .....
    nanik 19294 7741 0 13:45 pts/5 00:00:00 | \_ bash
    root 5123 19294 0 17:51 pts/5 00:00:00 | | \_ sudo ./ctr run -t docker.io/library/ubuntu:latest u13
    root 5124 5123 0 17:51 pts/5 00:00:00 | | \_ ./ctr run -t docker.io/library/ubuntu:latest u13
    nanik 18313 7741 0 16:11 pts/11 00:00:00 | \_ bash
    .....
    .....
    .....
    .....
    .....
    root 5884 3230 0 17:52 ? 00:00:00 \_ /usr/bin/containerd-shim-runc-v2 -namespace default -id u13 -address /run/containerd/containerd.sock
    root 5906 5884 0 17:52 ? 00:00:00 \_ /bin/bash
    .....
    .....
    ```
  • As can be seen, containerd uses the shim called ‘containerd-shim-runc-v2’. runc has already exited after starting the container, and the shim (PID 5884) has taken over as the parent of the container process (/bin/bash, PID 5906). containerd supports shim v2.
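The reparenting described above can be reproduced without containerd. The following Linux-only Go sketch is not containerd's code. It works in three steps:

1. It marks the current process as a child subreaper with prctl(PR_SET_CHILD_SUBREAPER).
2. It starts a shell that backgrounds `sleep 1` and exits immediately. This stands in for runc exiting after it starts the container.
3. It reaps the orphaned `sleep` with wait4.

The final wait succeeds only because the orphan was reparented to this process instead of to PID 1. The prctl option values 36 and 37 are copied from <linux/prctl.h>. `sh` and `sleep` are assumed to be on PATH, and all function names are hypothetical.

```go
//go:build linux

package main

import (
	"fmt"
	"os/exec"
	"strconv"
	"strings"
	"syscall"
	"unsafe"
)

// prctl options copied from <linux/prctl.h>.
const (
	prSetChildSubreaper = 36
	prGetChildSubreaper = 37
)

// setChildSubreaper marks the calling process as a child subreaper, so orphaned
// descendants are reparented to it instead of to PID 1.
func setChildSubreaper() error {
	if _, _, errno := syscall.Syscall(syscall.SYS_PRCTL, prSetChildSubreaper, 1, 0); errno != 0 {
		return errno
	}
	return nil
}

// isChildSubreaper reads the flag back with PR_GET_CHILD_SUBREAPER.
func isChildSubreaper() (bool, error) {
	var flag int32
	if _, _, errno := syscall.Syscall(syscall.SYS_PRCTL, prGetChildSubreaper, uintptr(unsafe.Pointer(&flag)), 0); errno != 0 {
		return false, errno
	}
	return flag != 0, nil
}

// spawnOrphan plays the part of runc: a short-lived shell starts a background
// process, prints its PID and exits, leaving that process orphaned.
func spawnOrphan() (int, error) {
	out, err := exec.Command("sh", "-c", "sleep 1 >/dev/null 2>&1 & echo $!").Output()
	if err != nil {
		return 0, err
	}
	return strconv.Atoi(strings.TrimSpace(string(out)))
}

// reap waits for pid. wait4 fails with ECHILD unless pid is our child,
// so success shows the orphan was reparented to this process.
func reap(pid int) (int, error) {
	var status syscall.WaitStatus
	return syscall.Wait4(pid, &status, 0, nil)
}

func main() {
	if err := setChildSubreaper(); err != nil {
		panic(err)
	}
	pid, err := spawnOrphan()
	if err != nil {
		panic(err)
	}
	reaped, err := reap(pid)
	fmt.Printf("orphan pid=%d reaped pid=%d err=%v\n", pid, reaped, err)
}
```

Without the setChildSubreaper call, the `sleep` would be adopted by PID 1 (or the nearest other subreaper), and the wait4 call would fail with ECHILD.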
    [Figure 4]

    The shim is executed out-of-process (via exec) with the following arguments:

    ```
    0 = {string} "-namespace"
    1 = {string} "default"
    2 = {string} "-address"
    3 = {string} "/run/containerd/containerd.sock"
    4 = {string} "-publish-binary"
    5 = {string} "/tmp/___containerd"
    6 = {string} "-id"
    7 = {string} "u8"
    8 = {string} "-debug"
    9 = {string} "start"
    ```
  • /tmp/___containerd is the containerd executable. The -publish-binary flag tells the shim which binary to invoke when publishing events back to containerd.
    A comment by Michael Crosby about the shim:

    1. The shim allows for daemonless containers. It basically sits as the parent of the container's process to facilitate a few things.
    2. First it allows the runtimes, i.e. runc,to exit after it starts the container. This way we don't have to have the long running runtime processes for containers. When you start mysql you should only see the mysql process and the shim.
    3. Second it keeps the STDIO and other fds open for the container incase containerd and/or docker both die. If the shim was not running then the parent side of the pipes or the TTY master would be closed and the container would exit.
    4. Finally it allows the container's exit status to be reported back to a higher level tool like docker without having the be the actual parent of the container's process and do a wait.
  • containerd uses FIFOs for reporting events and exit codes, and also for the container's stdout and stdin:

    ```
    /run/containerd/fifo/195093460/<something_something>_stdout
    /run/containerd/fifo/195093460/<something_something>_stdin
    ```

runc

  • runc is a lightweight binary that implements the OCI runtime-spec for running containers. It deals with the low-level interfacing with Linux kernel features such as cgroups and namespaces.

  • runc locates its state (runtime root) directory using XDG_RUNTIME_DIR when running rootless, e.g. /run/user/1000/runc/
  • runc depends heavily on libcontainer.
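Below is a simplified Go sketch of how the default runtime root is picked. It follows the Stack Overflow explanation quoted further down in this document: /run/runc for root, and $XDG_RUNTIME_DIR/runc for rootless containers. runc's real logic handles more cases than this, and `defaultRuncRoot` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// defaultRuncRoot approximates runc's default --root: /run/runc when running as
// root, $XDG_RUNTIME_DIR/runc for rootless callers that have XDG_RUNTIME_DIR set.
// Simplified: the real runc applies additional rules.
func defaultRuncRoot(euid int, xdgRuntimeDir string) string {
	if euid != 0 && xdgRuntimeDir != "" {
		return filepath.Join(xdgRuntimeDir, "runc")
	}
	return "/run/runc"
}

func main() {
	fmt.Println(defaultRuncRoot(os.Geteuid(), os.Getenv("XDG_RUNTIME_DIR")))
}
```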
  • How is ‘runc’ executed by containerd? Some explanation follows:

    • containerd runs as a server that receives gRPC commands. The ctr CLI tool is one way to send commands to containerd to run, stop, etc. containers.
    • When containerd receives a command such as sudo ./ctr run -t docker.io/library/ubuntu:latest u67, it goes through services/tasks/service.go to prepare all the data needed to spin off the ‘containerd-shim-runc-v2’ executable.
    • The following is an example of the arguments prepared when executing ‘containerd-shim-runc-v2’:

      ```
      0 = {string} "-namespace"
      1 = {string} "default"
      2 = {string} "-address"
      3 = {string} "/run/containerd/containerd.sock"
      4 = {string} "-publish-binary"
      5 = {string} "/tmp/___containerd"
      6 = {string} "-id"
      7 = {string} "u8"
      8 = {string} "-debug"
      9 = {string} "start"
      ```
    • Full command used – /usr/bin/containerd-shim-runc-v2 -namespace default -id u67 -address /run/containerd/containerd.sock

    • The following is the log output while ‘containerd-shim-runc-v2’ is running (debug log statements were added to trace the source code):

      ```
      time="2020-01-02T23:38:00.438682328+11:00" level=info msg=setupDumpStacks...
      time="2020-01-02T23:38:00.438927931+11:00" level=info msg="calling newServer..."
      time="2020-01-02T23:38:00.439033554+11:00" level=info msg="registering ttrpc server"
      time="2020-01-02T23:38:00.439097725+11:00" level=info msg="calling serve..."
      time="2020-01-02T23:38:00.439198636+11:00" level=info msg="calling handleSignals..."
      time="2020-01-02T23:38:00.439220268+11:00" level=info msg="nanik starting signal loop" namespace=default path=/run/containerd/io.containerd.runtime.v2.task/default/u67 pid=17162
      time="2020-01-02T23:38:00.439861913+11:00" level=info msg="Create is called inside RegisterTaskService"
      time="2020-01-02T23:38:00.439974576+11:00" level=info msg="container NANIK "
      time="2020-01-02T23:38:00.440214591+11:00" level=info msg="--- CreateTaskRequest &CreateTaskRequest{ID:u67,Bundle:/run/containerd/io.containerd.runtime.v2.task/default/u67,Rootfs:[&types.Mount{Type:overlay,Source:overlay,Target:,Options:[workdir=/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/133/work upperdir=/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/133/fs lowerdir=/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/4/fs:/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/3/fs:/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/2/fs:/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/1/fs],XXX_unrecognized:[],}],Terminal:true,Stdin:/run/containerd/fifo/383837931/u67-stdin,Stdout:/run/containerd/fifo/383837931/u67-stdout,Stderr:,Checkpoint:,ParentCheckpoint:,Options:&types1.Any{TypeUrl:containerd.runc.v1.Options,Value:[],XXX_unrecognized:[],},XXX_unrecognized:[],}"
      time="2020-01-02T23:38:00.440329033+11:00" level=info msg="--- rootfs /run/containerd/io.containerd.runtime.v2.task/default/u67/rootfs"
      time="2020-01-02T23:38:00.440350982+11:00" level=info msg="--- opts.BinaryName "
      time="2020-01-02T23:38:00.440367236+11:00" level=info msg="--- opts.Bundle /run/containerd/io.containerd.runtime.v2.task/default/u67"
      time="2020-01-02T23:38:00.441364775+11:00" level=info msg="--- calling p.create with context.Background.WithValue(type namespaces.namespaceKey, val default).WithValue(type metadata.mdOutgoingKey, val <not Stringer>).WithValue(type ttrpc.metadataKey, val <not Stringer>).WithValue(type shim.OptsKey, val <not Stringer>).WithValue(type log.loggerKey, val <not Stringer>).WithCancel.WithCancel AND ... "
      time="2020-01-02T23:38:00.441809346+11:00" level=info msg="------- inside init.gocontext.Background.WithValue(type namespaces.namespaceKey, val default).WithValue(type metadata.mdOutgoingKey, val <not Stringer>).WithValue(type ttrpc.metadataKey, val <not Stringer>).WithValue(type shim.OptsKey, val <not Stringer>).WithValue(type log.loggerKey, val <not Stringer>).WithCancel.WithCancel/run/containerd/io.containerd.runtime.v2.task/default/u67&{<nil> /run/containerd/io.containerd.runtime.v2.task/default/u67/init.pid 0xc000154520 false false false []}" runtime=io.containerd.runc.v2
      time="2020-01-02T23:38:00.503607036+11:00" level=info msg="Start is called inside RegisterTaskService"
      time="2020-01-02T23:38:00.503640676+11:00" level=info msg="v2/service Start"
      time="2020-01-02T17:33:35.454812846+11:00" level=info msg="v2/service Delete"
      ```
    • The function that finally executes ‘runc’ is in containerd/go-runc/runc.go:

      ```go
      func (r *Runc) Create(context context.Context, id, bundle string, opts *CreateOpts) error
      ```
    • Logging code was added inside the Create(..) function; the following is the output:

      ```
      --- args [create --bundle /run/containerd/io.containerd.runtime.v2.task/default/u67]
      --- cmd /home/nanik/AndroidProjects/docker/docker/runc --root /run/containerd/runc/default --log /run/containerd/io.containerd.runtime.v2.task/default/u67/log.json --log-format json create --bundle /run/containerd/io.containerd.runtime.v2.task/default/u67 --pid-file /run/containerd/io.containerd.runtime.v2.task/default/u67/init.pid --console-socket /tmp/pty415594316/pty.sock u67
      ```

      The command used to execute ‘runc’ is as follows:

      ```
      /home/nanik/AndroidProjects/docker/docker/runc --root /run/containerd/runc/default --log /run/containerd/io.containerd.runtime.v2.task/default/u67/log.json --log-format json create --bundle /run/containerd/io.containerd.runtime.v2.task/default/u67 --pid-file /run/containerd/io.containerd.runtime.v2.task/default/u67/init.pid --console-socket /tmp/pty415594316/pty.sock u67
      ```
  • To use runc to see the Docker containers that are running:

    ```
    sudo ./runc --root /run/docker/runtime-runc/moby list

    ID PID STATUS BUNDLE CREATED OWNER
    f182f95645673b94af95495ea4c2a7c0f58dcce523f3d4e7174d7e482e136e08 12212 running /run/containerd/io.containerd.runtime.v1.linux/moby/f182f95645673b94af95495ea4c2a7c0f58dcce523f3d4e7174d7e482e136e08 2020-01-05T21:28:05.371871841Z root
    ```

  • [Good explanation from here](https://stackoverflow.com/questions/57009928/runc-and-ctr-commands-do-not-show-docker-images-and-containers):

The runtime (runc) uses so-called runtime root directory to store and obtain the information about containers. Under this root directory, runc places sub-directories (one per container), and each of them contains the state.json file, where the container state description resides. The default location for runtime root directory is either /run/runc (for non-rootless containers) or $XDG_RUNTIME_DIR/runc (for rootless containers) - the latter also usually points to somewhere under /run (e.g. /run/user/$UID/runc). When the container engine invokes runc, it may override the default runtime root directory and specify the custom one (--root option of runc). Docker uses this possibility, e.g. on my box, it specifies /run/docker/runtime-runc/moby as the runtime root.

That said, to make runc list see your Docker containers, you have to point it to Docker’s runtime root directory by specifying --root option. Also, given that Docker containers are not rootless by default, you will need the appropriate privileges to access the runtime root (e.g. with sudo). So, that’s how this should work:

```
$ docker run -d alpine sleep 1000
4acd4af5ba8da324b7a902618aeb3fd0b8fce39db5285546e1f80169f157fc69
$ sudo runc --root /run/docker/runtime-runc/moby/ list
ID PID STATUS BUNDLE CREATED OWNER
4acd4af5ba8da324b7a902618aeb3fd0b8fce39db5285546e1f80169f157fc69 18372 running /run/docker/containerd/daemon/io.containerd.runtime.v1.linux/moby/4acd4af5ba8da324b7a902618aeb3fd0b8fce39db5285546e1f80169f157fc69 2019-07-12T17:33:23.401746168Z root
```

As to images, you can not make runc see them, as it has no notion of image at all - instead, it operates on bundles. Creating the bundle (e.g. based on image) is responsibility of the caller (in your case - containerd).

libcontainer

  • Library used inside runc for container operations
  • [Explanation about libcontainer, runc and nsenter](https://stackoverflow.com/questions/42696589/libcontainer-runc-and-nsenter-bootstrap/42697174). More [in-depth explanation](https://groups.google.com/a/opencontainers.org/forum/#!msg/dev/CC1XH92oMrE/G1GRnBDGCAAJ)
Docker and the lower levels

  • **Docker CLI (docker) - /usr/bin/docker**
    Docker is used as a reference to the whole set of Docker tools, and at the beginning it was a monolith. But now docker-cli is only responsible for user-friendly communication with Docker.
    So commands like docker build … and docker run … are handled by the Docker CLI and result in invocations of the dockerd API.
  • **Dockerd - /usr/bin/dockerd**
    The Docker daemon, dockerd, listens for Docker API requests and manages the host’s container life-cycles by utilizing containerd.
    dockerd can listen for Docker Engine API requests via three different types of socket: unix, tcp, and fd. By default, a unix domain socket is created at /var/run/docker.sock, requiring either root permission or docker group membership. On systemd-based systems, you can communicate with the daemon via systemd socket activation by using dockerd -H fd://.
    There are many configuration options for the daemon, which are worth checking if you work with Docker (dockerd).
    My impression is that dockerd is here to serve all the features of the Docker (or Docker EE) platform, while actual container life-cycle management is “outsourced” to containerd.
  • **containerd - /usr/bin/docker-containerd**
    containerd was introduced in Docker 1.11 and has since taken the main responsibility for managing the container life-cycle. containerd is the executor for containers, but it has a wider scope than just executing containers. So it also takes care of:
    • Image push and pull
    • Managing storage
    • Of course, executing containers by calling runc with the right parameters to run them
    • Managing network primitives for interfaces
    • Managing network namespaces, so containers can join existing namespaces

  • containerd fully leverages the OCI runtime specification, image format specifications and the OCI reference implementation (runc). Because of its massive adoption, containerd is the industry standard for implementing OCI. It is currently available for Linux and Windows.
  • **RunC - /usr/bin/docker-runc**
    runc (the OCI runtime) can be seen as a component of containerd. runc is a command line client for running applications packaged according to the OCI format and is a compliant implementation of the OCI spec.
    Containers are configured using bundles. A bundle for a container is a directory that includes a specification file named config.json and a root filesystem. The root filesystem contains the contents of the container.
    Assuming you have an OCI bundle, you can execute the container.
  • **containerd-ctr - /usr/bin/docker-containerd-ctr**
    (docker-)containerd-ctr is a barebones CLI (ctr) designed specifically for development and debugging, for direct communication with containerd. It is included in containerd releases, which makes it less interesting for Docker users.
  • **containerd-shim - /usr/bin/docker-containerd-shim**
    The shim allows for daemonless containers. According to Michael Crosby, it basically sits as the parent of the container’s process to facilitate a few things:

First it allows the runtimes, i.e. runc,to exit after it starts the container. This way we don’t have to have the long running runtime processes for containers. Second it keeps the STDIO and other fds open for the container in case containerd and/or docker both die. If the shim was not running then the parent side of the pipes or the TTY master would be closed and the container would exit. Finally it allows the container’s exit status to be reported back to a higher level tool like docker without having the be the actual parent of the container’s process and do a wait.

  • Complete interaction between the Docker CLI, dockerd, containerd, containerd-shim and runc:
    • dockerd is sent POST Containers Create
      ↳ dockerd finds the requested image
      ↳ A container object is created and stored for future use
      ↳ Directories on the file system are set up for use by the container
    • dockerd is sent a POST Containers Start
      ↳ An OCI spec is created for the container
      ↳ containerd is contacted to create the container
      ↳ containerd stores the container spec in a database
      ↳ containerd is contacted to start the container
      ↳ containerd creates a task for the container
      ↳ The task uses a shim to call runc create
      ↳ containerd starts the task
      ↳ The task uses the shim to call runc start
      ↳ The shim / containerd continue to monitor the container until completion

Running a container with runc

The following is a step-by-step example of how to run an OCI-compliant image using runc. We are going to use Docker in this example.

  • Check out the **runc** project from [https://github.com/opencontainers/runc](https://github.com/opencontainers/runc) and build it by running **make**
  • Download **exportrootfs.sh** from [https://github.com/estesp/utils/blob/master/exportrootfs.sh](https://github.com/estesp/utils/blob/master/exportrootfs.sh) and add it to your PATH. Make sure to follow the instructions inside the script to compile uidmapshift.c, and include that in the PATH too
  • Make sure you have pulled the ubuntu:latest image using Docker
  • Run the image using the docker run command:

    ```
    docker run -it ubuntu:latest /bin/bash
    ```

  • Get the container ID using **docker ps**

    ```
    CONTAINER ID   IMAGE           COMMAND       CREATED          STATUS          PORTS   NAMES
    ebfbbecaf715   ubuntu:latest   "/bin/bash"   38 minutes ago   Up 38 minutes           zen_kirch
    ```

  • Create another separate directory and cd into it. Run the following command:

    ```
    sudo env "PATH=$PATH" exportrootfs.sh -u 0 -r 65536 ebf
    ```

  • ebf is the beginning of the container ID (ebfbbecaf715) shown in the output of **docker ps**
  • You will have a **rootfs** directory in your current directory, and it will look like the following:

    ```
    rootfs
    ├── bin
    ├── boot
    ├── dev
    ├── etc
    ├── home
    ├── lib
    ├── lib64
    ├── media
    ├── mnt
    ├── opt
    ├── proc
    ├── root
    ├── run
    ├── sbin
    ├── srv
    ├── sys
    ├── tmp
    ├── usr
    └── var
    ```

  • Create the runtime spec using the command:

    ```
    runc spec
    ```

  • You will see a new file called **config.json**
  • Open config.json and change **args** to the following:

    ```
    "args": [ "/bin/bash" ],
    ```

  • From the same directory (which now holds both config.json and rootfs), run the container using the following:

    ```
    sudo env "PATH=$PATH" runc run anycontainername
    ```

  • You will see bash running:

    ```
    root@runc:/# cat /etc/lsb-release
    DISTRIB_ID=Ubuntu
    DISTRIB_RELEASE=18.04
    DISTRIB_CODENAME=bionic
    DISTRIB_DESCRIPTION="Ubuntu 18.04.3 LTS"
    root@runc:/#
    ```

As can be seen, runc does not know how to pull or prepare the image. It only knows that there is a root filesystem together with a config.json that it needs to run. The Ubuntu container run in the example above has no network, because networking is taken care of by other projects, not by runc.
runc uses prestart hooks to run other applications required as part of the container setup, as shown [here](https://stackoverflow.com/questions/55064917/network-setup-for-rootless-runc-containers). The following config.json excerpt:

```
. . .
"hooks": {
  "prestart": [
    {
      "path": "/path/to/netns",
      "args": [ "", "--state-dir", "/path/to/netns/netns-state" ]
    }
  ]
},
. . .
```

shows the prestart hook that will be executed to set up the networking state using the netns executable. The netns tool is part of the genuinetools project.
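According to the OCI runtime spec, a hook receives the container's state as JSON on its stdin. That is how a network hook such as netns learns the container's PID, and therefore its network namespace (/proc/<pid>/ns/net). The following Go sketch shows the first step of such a hook. `hookState`, `parseState` and `netnsPath` are hypothetical names. The sketch only reports the namespace path and does not configure any networking.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"os"
	"strconv"
)

// hookState is the subset of the OCI "state" object used by this sketch.
type hookState struct {
	OCIVersion string `json:"ociVersion"`
	ID         string `json:"id"`
	Status     string `json:"status"`
	Pid        int    `json:"pid"`
	Bundle     string `json:"bundle"`
}

// parseState decodes the state document the runtime writes to the hook's stdin.
func parseState(data []byte) (hookState, error) {
	var st hookState
	err := json.Unmarshal(data, &st)
	return st, err
}

// netnsPath is the network namespace of the container's init process.
func netnsPath(st hookState) string {
	return "/proc/" + strconv.Itoa(st.Pid) + "/ns/net"
}

func main() {
	data, err := io.ReadAll(os.Stdin)
	if err != nil {
		fmt.Fprintln(os.Stderr, "hook: reading stdin:", err)
		os.Exit(1)
	}
	if len(data) == 0 {
		fmt.Fprintln(os.Stderr, "hook: expected the container state on stdin")
		return
	}
	st, err := parseState(data)
	if err != nil {
		fmt.Fprintln(os.Stderr, "hook: decoding state:", err)
		os.Exit(1)
	}
	// A real network hook would now set up interfaces (e.g. a veth pair) in
	// this namespace; this sketch only reports the path.
	fmt.Fprintln(os.Stderr, "container", st.ID, "netns", netnsPath(st))
}
```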