Namespace isolation is achieved by passing the various CLONE_NEW* namespace flags when forking the new child process (via clone()).
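As a rough illustration of those flags, here is a minimal sketch. Python has no direct clone() wrapper, so it swaps in unshare(2), which accepts the same CLONE_NEW* flags but applies them to the calling process rather than to a new child. The names `CLONE_FLAGS`, `clone_flags` and `unshare` are made up for this example; the numeric values are the ones from linux/sched.h.

```python
import ctypes
import os

# Flag values as defined in <linux/sched.h>.
CLONE_FLAGS = {
    "mnt": 0x00020000,   # CLONE_NEWNS
    "uts": 0x04000000,   # CLONE_NEWUTS
    "ipc": 0x08000000,   # CLONE_NEWIPC
    "user": 0x10000000,  # CLONE_NEWUSER
    "pid": 0x20000000,   # CLONE_NEWPID
    "net": 0x40000000,   # CLONE_NEWNET
}


def clone_flags(*namespaces):
    """OR together the flags for the requested namespace kinds."""
    flags = 0
    for ns in namespaces:
        flags |= CLONE_FLAGS[ns]
    return flags


def unshare(flags):
    """Move the calling process into new namespaces (usually needs privileges)."""
    libc = ctypes.CDLL(None, use_errno=True)
    if libc.unshare(flags) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))


if __name__ == "__main__":
    # Only compute and print the flag word; calling unshare() is left to the reader.
    print(hex(clone_flags("mnt", "pid")))
```

Calling `unshare(clone_flags("mnt"))` as root would give the process its own mount namespace, which is the precondition pivot_root relies on later in this post.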
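cgroups (in the v1 layout) work by creating a directory under each subsystem in /sys/fs/cgroup/, setting the quota parameters there, and writing the PID into tasks, after which the limits take effect.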
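The three cgroup steps above (create a directory, write the quota, write the PID into tasks) can be sketched as follows. This is only an illustration: `apply_cgroup_limit` and the group name `demo` are hypothetical, and the demo points at a temporary directory instead of the real /sys/fs/cgroup so that it runs without privileges. On a real cgroup v1 hierarchy the kernel creates the control files itself when the directory is made.

```python
import os
import tempfile


def apply_cgroup_limit(cgroup_root, subsystem, group, limits, pid):
    """Create <cgroup_root>/<subsystem>/<group>, write limits, then add pid to tasks."""
    group_dir = os.path.join(cgroup_root, subsystem, group)
    os.makedirs(group_dir, exist_ok=True)
    # Write each quota parameter, e.g. memory.limit_in_bytes.
    for filename, value in limits.items():
        with open(os.path.join(group_dir, filename), "w") as f:
            f.write(str(value))
    # Writing the PID into tasks is what attaches the process to the group.
    with open(os.path.join(group_dir, "tasks"), "w") as f:
        f.write(str(pid))
    return group_dir


if __name__ == "__main__":
    # Stand-in for /sys/fs/cgroup so the sketch runs unprivileged.
    fake_root = tempfile.mkdtemp()
    path = apply_cgroup_limit(
        fake_root, "memory", "demo",
        {"memory.limit_in_bytes": 100 * 1024 * 1024},
        os.getpid(),
    )
    print(sorted(os.listdir(path)))
```

Against a real system the first argument would be "/sys/fs/cgroup" and the call would need root.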
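For rootfs isolation, however, I only knew that pivot_root is used to switch the root filesystem directory of the process. I never really understood how it works underneath, so out of curiosity I looked into it.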

pivot_root

First, let's look at the man page for pivot_root:
http://man7.org/linux/man-pages/man2/pivot_root.2.html

  DESCRIPTION

  pivot_root() changes the root mount in the mount namespace of the
  calling process. More precisely, it moves the root mount to the
  directory put_old and makes new_root the new root mount. The calling
  process must have the CAP_SYS_ADMIN capability in the user namespace
  that owns the caller's mount namespace.

  pivot_root() changes the root directory and the current working
  directory of each process or thread in the same mount namespace to
  new_root if they point to the old root directory. (See also NOTES.)
  On the other hand, pivot_root() does not change the caller's current
  working directory (unless it is on the old root directory), and thus
  it should be followed by a chdir("/") call.

  The following restrictions apply:

  - new_root and put_old must be directories.
  - new_root and put_old must not be on the same mount as the current
    root.
  - put_old must be at or underneath new_root; that is, adding some
    nonnegative number of "/.." prefixes to the pathname pointed to by
    put_old must yield the same directory as new_root.
  - new_root must be a path to a mount point, but can't be "/". A
    path that is not already a mount point can be converted into one
    by bind mounting the path onto itself.
  - The propagation type of the parent mount of new_root and the
    parent mount of the current root directory must not be MS_SHARED;
    similarly, if put_old is an existing mount point, its propagation
    type must not be MS_SHARED. These restrictions ensure that
    pivot_root() never propagates any changes to another mount
    namespace.
  - The current root directory must be a mount point.

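  1. pivot_root changes the root mount for all processes in the mount namespace of the calling process: it moves the current root mount to put_old and makes new_root the new root mount.
  2. pivot_root does not change the calling process's current working directory (unless it is on the old root), so it is normally followed by chdir("/") to move into the root directory of the new root mount.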

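The root mount can be understood as the rootfs, that is, "/". pivot_root changes the "/" of the current mount namespace to new_root.
Note that pivot_root does not change the calling process's current working directory.
Note also that, in practice, the process should already have been forked with the mount namespace flag before pivot_root is called. Otherwise the root of every process sharing the original mount namespace would be switched as well.
Main constraints: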

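  1. new_root and put_old must both be directories.
  2. new_root and put_old must not be on the same mount as the current root.
  3. put_old must be new_root itself or a directory underneath it.
  4. new_root must be a mount point, and it must not be the "/" of the current mount namespace. A path that is not yet a mount point can be turned into one by bind mounting it onto itself.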
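Some of these constraints can be checked from user space before making the call. The sketch below is a partial pre-flight check under stated assumptions: `mount_points` and `check_pivot_root_args` are hypothetical helpers, mount points are taken from the fifth field of /proc/self/mountinfo, and the "same mount as the current root" and MS_SHARED propagation rules are not checked at all.

```python
import os
import re


def mount_points(mountinfo_text):
    """Extract mount point paths (5th field) from /proc/self/mountinfo content."""
    points = set()
    for line in mountinfo_text.splitlines():
        fields = line.split()
        if len(fields) >= 5:
            # The kernel escapes spaces and similar characters as octal, e.g. \040.
            points.add(re.sub(r"\\([0-7]{3})",
                              lambda m: chr(int(m.group(1), 8)), fields[4]))
    return points


def current_mounts():
    """Read the mount points visible to this process (Linux only)."""
    with open("/proc/self/mountinfo") as f:
        return mount_points(f.read())


def check_pivot_root_args(new_root, put_old, mounts):
    """Return the list of violated constraints; an empty list means none were found."""
    problems = []
    new_real = os.path.realpath(new_root)
    old_real = os.path.realpath(put_old)
    if not os.path.isdir(new_real):
        problems.append("new_root is not a directory")
    if not os.path.isdir(old_real):
        problems.append("put_old is not a directory")
    # put_old must be new_root itself or somewhere below it.
    if os.path.commonpath([new_real, old_real]) != new_real:
        problems.append("put_old is not at or underneath new_root")
    if new_real == "/":
        problems.append('new_root must not be "/"')
    elif new_real not in mounts:
        problems.append("new_root is not a mount point")
    return problems
```

A typical call would be `check_pivot_root_args(new_root, put_old, current_mounts())`. The mountinfo lookup is used instead of `os.path.ismount` because the latter cannot reliably detect a bind mount of a directory onto itself.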

Note that after pivot_root(new_root, put_old) and chdir("/"), put_old is a subdirectory of "/" and can be unmounted.
Docker uses pivot_root to switch and isolate the rootfs, and it follows the same constraints:

  1. Create a temporary subdirectory of new_root to serve as put_old, then call pivot_root to switch.
  2. chdir("/")
  3. Unmount put_old and remove the temporary directory.
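The steps above can be sketched roughly as below. This is not Docker's actual code, only an illustration of the sequence. Assumptions: the pivot_root syscall numbers in `SYS_PIVOT_ROOT` cover only x86_64 and aarch64; `LinuxOps`, `DryRunOps`, `pivot_into` and the directory name `.pivot_old` are hypothetical; the two preparatory mounts (making mounts private, bind mounting new_root onto itself) come from the restrictions in the man page rather than from the three steps themselves.

```python
import ctypes
import os
import platform

# Assumption: pivot_root syscall numbers for two common architectures only.
SYS_PIVOT_ROOT = {"x86_64": 155, "aarch64": 41}
MS_BIND, MS_REC, MS_PRIVATE = 4096, 16384, 1 << 18
MNT_DETACH = 2


class LinuxOps:
    """Thin ctypes wrappers; need CAP_SYS_ADMIN inside a separate mount namespace."""

    def __init__(self):
        self.libc = ctypes.CDLL(None, use_errno=True)

    def _check(self, ret, what):
        if ret != 0:
            err = ctypes.get_errno()
            raise OSError(err, f"{what}: {os.strerror(err)}")

    def mount(self, source, target, flags):
        ret = self.libc.mount(source.encode(), target.encode(), None,
                              ctypes.c_ulong(flags), None)
        self._check(ret, "mount")

    def pivot_root(self, new_root, put_old):
        nr = SYS_PIVOT_ROOT[platform.machine()]  # KeyError on other architectures
        ret = self.libc.syscall(ctypes.c_long(nr), new_root.encode(), put_old.encode())
        self._check(ret, "pivot_root")

    def umount2(self, target, flags):
        self._check(self.libc.umount2(target.encode(), flags), "umount2")

    mkdir = staticmethod(os.mkdir)
    chdir = staticmethod(os.chdir)
    rmdir = staticmethod(os.rmdir)


class DryRunOps:
    """Records the calls instead of executing them, to inspect the sequence safely."""

    def __init__(self):
        self.calls = []

    def __getattr__(self, name):
        def record(*args):
            self.calls.append((name,) + args)
        return record


def pivot_into(new_root, ops, put_old_name=".pivot_old"):
    # Make every mount private so nothing propagates (MS_SHARED is rejected).
    ops.mount("none", "/", MS_REC | MS_PRIVATE)
    # Bind mount new_root onto itself so that it is a mount point.
    ops.mount(new_root, new_root, MS_BIND | MS_REC)
    # Step 1: temporary put_old under new_root, then pivot.
    put_old = os.path.join(new_root, put_old_name)
    ops.mkdir(put_old)
    ops.pivot_root(new_root, put_old)
    # Step 2: move the working directory into the new root.
    ops.chdir("/")
    # Step 3: detach the old root and remove the now-empty directory.
    old = "/" + put_old_name
    ops.umount2(old, MNT_DETACH)
    ops.rmdir(old)


if __name__ == "__main__":
    dry = DryRunOps()
    pivot_into("/tmp/rootfs", dry)
    for call in dry.calls:
        print(call)
```

Passing `LinuxOps()` instead of `DryRunOps()` would perform the real switch, and should only be done as root in a process that already has its own mount namespace.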

pivot_root(".", ".")

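In practice there is a special case that avoids the temporary directory altogether: pivot_root(".", ".").
pivot_root(".", ".") uses the current working directory as both new_root and put_old. After the call, "/" is the new root and the old root is stacked on the same mount point (runc describes it as still reachable through /proc/self/cwd), so no subdirectory has to be created. In this case you need to unmount(".") first and then chdir("/").
https://github.com/opencontainers/runc/commit/f8e6b5af5e120ab7599885bd13a932d970ccc748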
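A minimal sketch of this variant follows, using the sequence given in the pivot_root man page (chdir to new_root, pivot_root(".", "."), lazy unmount of ".") plus the final chdir("/"). It is not the runc implementation, which works with file descriptors. Assumptions: new_root is already a mount point in a private mount namespace; the syscall numbers cover only x86_64 and aarch64; `LinuxOps`, `DryRunOps` and `pivot_in_place` are hypothetical names.

```python
import ctypes
import os
import platform

# Assumption: pivot_root syscall numbers for two common architectures only.
SYS_PIVOT_ROOT = {"x86_64": 155, "aarch64": 41}
MNT_DETACH = 2


class LinuxOps:
    """Real operations via ctypes; need CAP_SYS_ADMIN in a separate mount namespace."""

    def __init__(self):
        self.libc = ctypes.CDLL(None, use_errno=True)

    def _check(self, ret, what):
        if ret != 0:
            err = ctypes.get_errno()
            raise OSError(err, f"{what}: {os.strerror(err)}")

    chdir = staticmethod(os.chdir)

    def pivot_root(self, new_root, put_old):
        nr = SYS_PIVOT_ROOT[platform.machine()]  # KeyError on other architectures
        ret = self.libc.syscall(ctypes.c_long(nr), new_root.encode(), put_old.encode())
        self._check(ret, "pivot_root")

    def umount2(self, target, flags):
        self._check(self.libc.umount2(target.encode(), flags), "umount2")


class DryRunOps:
    """Records the calls instead of executing them."""

    def __init__(self):
        self.calls = []

    def __getattr__(self, name):
        def record(*args):
            self.calls.append((name,) + args)
        return record


def pivot_in_place(new_root, ops):
    # Enter the new root so that "." names it.
    ops.chdir(new_root)
    # Old root gets stacked on the same mount point; no temporary directory.
    ops.pivot_root(".", ".")
    # Lazily detach the old root that is stacked at ".".
    ops.umount2(".", MNT_DETACH)
    # Finally make sure the working directory is the new "/".
    ops.chdir("/")


if __name__ == "__main__":
    dry = DryRunOps()
    pivot_in_place("/tmp/rootfs", dry)
    for call in dry.calls:
        print(call)
```

Compared with the temporary-directory approach, this needs no mkdir or rmdir inside the container's rootfs, which helps when that rootfs is read-only.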

chroot vs pivot_root vs switch_root

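chroot only changes the "/" of the calling process.
pivot_root changes the "/" of the current mount namespace.
switch_root is similar to chroot, but it is meant specifically for system initialization (initramfs). Besides doing a chroot, it deletes everything under the old root to free the memory it occupied. It is intended to be run only by the process with pid=1 and is of no use elsewhere.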