Design

image.png

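  • Sandbox: a network stack; it may contain multiple Endpoints and can be implemented with a Namespace, a Jail, and so on
  • Endpoint: connects a Sandbox to a Network
  • Network: a set of Endpoints that can communicate with each other directly; it can be implemented with a Bridge, a VLAN, and so on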
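The relationship between the three concepts can be sketched as a toy model. This is only an illustration: the type and method names below (Network, Endpoint, Sandbox, NewEndpoint, Join, Reachable) are assumptions chosen for the example and are not libnetwork's actual API.

```go
package main

import "fmt"

// Network is a group of endpoints that can reach each other directly.
type Network struct {
	Name      string
	Endpoints []*Endpoint
}

// Endpoint connects one sandbox to one network.
type Endpoint struct {
	Name    string
	Network *Network
	Sandbox *Sandbox
}

// Sandbox stands in for an isolated network stack (for example a Linux
// network namespace). It may hold endpoints from several networks.
type Sandbox struct {
	ID        string
	Endpoints []*Endpoint
}

// NewEndpoint creates an endpoint on n; it is not attached to a sandbox yet.
func (n *Network) NewEndpoint(name string) *Endpoint {
	ep := &Endpoint{Name: name, Network: n}
	n.Endpoints = append(n.Endpoints, ep)
	return ep
}

// Join attaches the endpoint to a sandbox; an endpoint joins at most once.
func (ep *Endpoint) Join(sb *Sandbox) error {
	if ep.Sandbox != nil {
		return fmt.Errorf("endpoint %s already joined sandbox %s", ep.Name, ep.Sandbox.ID)
	}
	ep.Sandbox = sb
	sb.Endpoints = append(sb.Endpoints, ep)
	return nil
}

// Reachable reports whether two sandboxes share at least one network.
func Reachable(a, b *Sandbox) bool {
	for _, ea := range a.Endpoints {
		for _, eb := range b.Endpoints {
			if ea.Network == eb.Network {
				return true
			}
		}
	}
	return false
}

func main() {
	front := &Network{Name: "frontend"}
	back := &Network{Name: "backend"}
	web, db := &Sandbox{ID: "web"}, &Sandbox{ID: "db"}
	_ = front.NewEndpoint("web-eth0").Join(web)
	_ = back.NewEndpoint("web-eth1").Join(web)
	_ = back.NewEndpoint("db-eth0").Join(db)
	fmt.Println(Reachable(web, db)) // both have an endpoint on "backend"
}
```

The point of the sketch is that a Sandbox never refers to a Network directly; the Endpoint is the only link between the two.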


Docker Architecture

docker-network-arch.svg

Network Controller

docker-network-network-controller.svg

Initialize Network Controllers

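The Docker daemon owns the NetworkController. When the daemon starts, it creates the controller and initializes the default networks available on the current operating system. In daemon_unix.go, for example, initNetworkController creates three default networks on the controller: none (backed by the null driver), host, and bridge.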

    func (daemon *Daemon) initNetworkController(config *config.Config, activeSandboxes map[string]interface{}) (libnetwork.NetworkController, error) {
        netOptions, err := daemon.networkOptions(config, daemon.PluginStore, activeSandboxes)
        if err != nil {
            return nil, err
        }
        controller, err := libnetwork.New(netOptions...)
        if err != nil {
            return nil, fmt.Errorf("error obtaining controller instance: %v", err)
        }
        if len(activeSandboxes) > 0 {
            logrus.Info("There are old running containers, the network config will not take affect")
            return controller, nil
        }
        // Initialize default network on "null"
        if n, _ := controller.NetworkByName("none"); n == nil {
            if _, err := controller.NewNetwork("null", "none", "", libnetwork.NetworkOptionPersist(true)); err != nil {
                return nil, fmt.Errorf("Error creating default \"null\" network: %v", err)
            }
        }
        // Initialize default network on "host"
        if n, _ := controller.NetworkByName("host"); n == nil {
            if _, err := controller.NewNetwork("host", "host", "", libnetwork.NetworkOptionPersist(true)); err != nil {
                return nil, fmt.Errorf("Error creating default \"host\" network: %v", err)
            }
        }
        // Clear stale bridge network
        if n, err := controller.NetworkByName("bridge"); err == nil {
            if err = n.Delete(); err != nil {
                return nil, fmt.Errorf("could not delete the default bridge network: %v", err)
            }
            if len(config.NetworkConfig.DefaultAddressPools.Value()) > 0 && !daemon.configStore.LiveRestoreEnabled {
                removeDefaultBridgeInterface()
            }
        }
        if !config.DisableBridge {
            // Initialize default driver "bridge"
            if err := initBridgeDriver(controller, config); err != nil {
                return nil, err
            }
        } else {
            removeDefaultBridgeInterface()
        }
        // Set HostGatewayIP to the default bridge's IP if it is empty
        if daemon.configStore.HostGatewayIP == nil && controller != nil {
            if n, err := controller.NetworkByName("bridge"); err == nil {
                v4Info, v6Info := n.Info().IpamInfo()
                var gateway net.IP
                if len(v4Info) > 0 {
                    gateway = v4Info[0].Gateway.IP
                } else if len(v6Info) > 0 {
                    gateway = v6Info[0].Gateway.IP
                }
                daemon.configStore.HostGatewayIP = gateway
            }
        }
        return controller, nil
    }
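The listing treats the default networks in two different ways: none and host are created only if they are missing, while a stale bridge network is always deleted and then recreated unless the bridge is disabled. A minimal sketch of that pattern follows. toyController, ensure, and initDefaults are hypothetical names for this illustration and do not exist in libnetwork.

```go
package main

import "fmt"

// toyController is a hypothetical stand-in for the libnetwork controller;
// it only remembers which driver backs each network name.
type toyController struct {
	networks map[string]string // network name -> driver name
}

// ensure creates the network only if it is missing and reports whether it
// had to create it (the pattern applied to "none" and "host").
func (c *toyController) ensure(driver, name string) bool {
	if _, ok := c.networks[name]; ok {
		return false
	}
	c.networks[name] = driver
	return true
}

// initDefaults follows the same order as the listing: create-if-missing for
// "none" and "host", then clear any stale "bridge" and recreate it unless
// the bridge is disabled.
func initDefaults(c *toyController, disableBridge bool) {
	c.ensure("null", "none")
	c.ensure("host", "host")
	delete(c.networks, "bridge")
	if !disableBridge {
		c.networks["bridge"] = "bridge"
	}
}

func main() {
	c := &toyController{networks: map[string]string{"bridge": "stale"}}
	initDefaults(c, false)
	fmt.Println(len(c.networks), c.networks["bridge"])
}
```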


NetworkController Implementation

docker-network-network-controller-impl.svg
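controller is libnetwork's implementation of NetworkController. As the diagram shows, controller uses a driver table to distinguish between network types, uses the drivers to create Networks and Endpoints, and adds Endpoints to or removes them from a Sandbox.
A Container locates its Sandbox through SandboxID and SandboxKey. A Sandbox uses containerID to determine whether it belongs to a given Container.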
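The driver-table idea can be sketched as a map from network type to driver, with the controller delegating network creation to whichever driver is registered for the requested type. The Driver interface, Controller struct, and method names below are simplified assumptions for illustration and are much smaller than libnetwork's real driver API.

```go
package main

import (
	"errors"
	"fmt"
)

// Driver is a hypothetical, heavily reduced driver interface.
type Driver interface {
	CreateNetwork(id string) error
}

// recordingDriver only records the networks it was asked to create.
type recordingDriver struct {
	created []string
}

func (d *recordingDriver) CreateNetwork(id string) error {
	d.created = append(d.created, id)
	return nil
}

// ErrUnknownDriver is returned when no driver is registered for a type.
var ErrUnknownDriver = errors.New("unknown network type")

// Controller keeps a driver table keyed by network type.
type Controller struct {
	drivers map[string]Driver
}

// Register adds a driver to the table under the given network type.
func (c *Controller) Register(networkType string, d Driver) {
	if c.drivers == nil {
		c.drivers = make(map[string]Driver)
	}
	c.drivers[networkType] = d
}

// NewNetwork looks up the driver for networkType and delegates to it.
func (c *Controller) NewNetwork(networkType, id string) error {
	d, ok := c.drivers[networkType]
	if !ok {
		return fmt.Errorf("%w: %s", ErrUnknownDriver, networkType)
	}
	return d.CreateNetwork(id)
}

func main() {
	bridge := &recordingDriver{}
	c := &Controller{}
	c.Register("bridge", bridge)
	fmt.Println(c.NewNetwork("bridge", "net-1"))
	fmt.Println(c.NewNetwork("overlay", "net-2"))
}
```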

OS Layer Sandbox

Namespace

docker-network-sandbox.svg
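The Sandbox interface shown here does not list every method, only the subset that makes its capability boundary visible. The rest of this section uses the Namespace-based Sandbox implementation as the example.
As the diagram above suggests, routing, interface management, and similar functions are provided by netlink. Note that netlink configuration made inside a Namespace takes effect only within that Namespace. The key function for obtaining a handle to a Namespace, from which a netlink handle is then created, is: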

    func GetFromPath(path string) (NsHandle, error) {
        fd, err := syscall.Open(path, syscall.O_RDONLY, 0)
        if err != nil {
            return -1, err
        }
        return NsHandle(fd), nil
    }

With the returned NsHandle, a netlink Handle and its per-family SocketHandle instances can be created as follows:

    func NewHandleAt(ns netns.NsHandle, nlFamilies ...int) (*Handle, error) {
        return newHandle(ns, netns.None(), nlFamilies...)
    }

    // NewHandleAtFrom works as NewHandle but allows client to specify the
    // new and the origin netns Handle.
    func NewHandleAtFrom(newNs, curNs netns.NsHandle) (*Handle, error) {
        return newHandle(newNs, curNs)
    }

    func newHandle(newNs, curNs netns.NsHandle, nlFamilies ...int) (*Handle, error) {
        h := &Handle{sockets: map[int]*nl.SocketHandle{}}
        fams := nl.SupportedNlFamilies
        if len(nlFamilies) != 0 {
            fams = nlFamilies
        }
        for _, f := range fams {
            s, err := nl.GetNetlinkSocketAt(newNs, curNs, f)
            if err != nil {
                return nil, err
            }
            h.sockets[f] = &nl.SocketHandle{Socket: s}
        }
        return h, nil
    }
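GetFromPath works on paths such as /proc/self/ns/net. On Linux these entries are symbolic links whose target has the form net:[inode], and two paths refer to the same Namespace exactly when the inode matches. The standard-library sketch below reads and parses such a link. parseNsLink is a hypothetical helper written for this illustration; it is not part of the netns or netlink packages.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// parseNsLink extracts the namespace type and inode from a link target such
// as "net:[4026531840]", the format used by /proc/<pid>/ns/* on Linux.
func parseNsLink(target string) (nsType string, inode uint64, err error) {
	i := strings.Index(target, ":[")
	if i < 0 || !strings.HasSuffix(target, "]") {
		return "", 0, fmt.Errorf("unexpected namespace link %q", target)
	}
	inode, err = strconv.ParseUint(target[i+2:len(target)-1], 10, 64)
	if err != nil {
		return "", 0, err
	}
	return target[:i], inode, nil
}

func main() {
	// Linux only: on other systems Readlink simply returns an error.
	target, err := os.Readlink("/proc/self/ns/net")
	if err != nil {
		fmt.Println("cannot read namespace link:", err)
		return
	}
	fmt.Println(parseNsLink(target))
}
```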

Add Interface

docker-network-add-interface.svg

Bridge Network

docker-network-bridge-overview.svg

Create Network

Look up an existing Link in the system by the BridgeName from the configuration. If BridgeName is empty, the default bridge docker0 is used.

    func newInterface(nlh *netlink.Handle, config *networkConfiguration) (*bridgeInterface, error) {
        var err error
        i := &bridgeInterface{nlh: nlh}
        // Initialize the bridge name to the default if unspecified.
        if config.BridgeName == "" {
            config.BridgeName = DefaultBridgeName
        }
        // Attempt to find an existing bridge named with the specified name.
        i.Link, err = nlh.LinkByName(config.BridgeName)
        if err != nil {
            logrus.Debugf("Did not find any interface with name %s: %v", config.BridgeName, err)
        } else if _, ok := i.Link.(*netlink.Bridge); !ok {
            return nil, fmt.Errorf("existing interface %s is not a bridge", i.Link.Attrs().Name)
        }
        return i, nil
    }

Create a bridgeNetwork instance and store it in networks:

    // Create and set network handler in driver
    network := &bridgeNetwork{
        id:         config.ID,
        endpoints:  make(map[string]*bridgeEndpoint),
        config:     config,
        portMapper: portmapper.New(d.config.UserlandProxyPath),
        bridge:     bridgeIface,
        driver:     d,
    }
    d.Lock()
    d.networks[config.ID] = network
    d.Unlock()

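If the obtained bridgeInterface has no valid bridge device, the device-creation step and the sysctl step are added to the setup queue. For the default bridge (docker0), the sysctl step is queued even when the device already exists.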

    bridgeAlreadyExists := bridgeIface.exists()
    if !bridgeAlreadyExists {
        bridgeSetup.queueStep(setupDevice)
        bridgeSetup.queueStep(setupDefaultSysctl)
    }
    // For the default bridge, set expected sysctls
    if config.DefaultBridge {
        bridgeSetup.queueStep(setupDefaultSysctl)
    }

Depending on the configuration parameters, the corresponding setup steps are added to the queue:

    for _, step := range []struct {
        Condition bool
        Fn        setupStep
    }{
        // Enable IPv6 on the bridge if required. We do this even for a
        // previously existing bridge, as it may be here from a previous
        // installation where IPv6 wasn't supported yet and needs to be
        // assigned an IPv6 link-local address.
        {config.EnableIPv6, setupBridgeIPv6},
        // We ensure that the bridge has the expectedIPv4 and IPv6 addresses in
        // the case of a previously existing device.
        {bridgeAlreadyExists && !config.InhibitIPv4, setupVerifyAndReconcile},
        // Enable IPv6 Forwarding
        {enableIPv6Forwarding, setupIPv6Forwarding},
        // Setup Loopback Addresses Routing
        {!d.config.EnableUserlandProxy, setupLoopbackAddressesRouting},
        // Setup IPTables.
        {d.config.EnableIPTables, network.setupIPTables},
        //We want to track firewalld configuration so that
        //if it is started/reloaded, the rules can be applied correctly
        {d.config.EnableIPTables, network.setupFirewalld},
        // Setup DefaultGatewayIPv4
        {config.DefaultGatewayIPv4 != nil, setupGatewayIPv4},
        // Setup DefaultGatewayIPv6
        {config.DefaultGatewayIPv6 != nil, setupGatewayIPv6},
        // Add inter-network communication rules.
        {d.config.EnableIPTables, setupNetworkIsolationRules},
        //Configure bridge networking filtering if ICC is off and IP tables are enabled
        {!config.EnableICC && d.config.EnableIPTables, setupBridgeNetFiltering},
    } {
        if step.Condition {
            bridgeSetup.queueStep(step.Fn)
        }
    }

Finally, the step that brings the device up is queued, and the result of applying the queue is returned:

    bridgeSetup.queueStep(setupDeviceUp)
    return bridgeSetup.apply()
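The fragments above share one pattern: steps are collected in a queue, some unconditionally and some through a table of (condition, step) pairs, and the queue is applied at the end. The following self-contained sketch imitates that pattern. bridgeState, setupQueue, logStep, and buildQueue are hypothetical names for this illustration, the steps only record their names instead of touching the system, and the assumption that apply stops at the first failing step is a design choice of this sketch.

```go
package main

import "fmt"

// bridgeState is a hypothetical stand-in for the configuration and the
// bridge interface that the real setup steps receive.
type bridgeState struct {
	EnableIPv6 bool
	Log        []string // names of the steps that ran, in order
}

type step func(*bridgeState) error

type setupQueue struct {
	state *bridgeState
	steps []step
}

func (q *setupQueue) queueStep(s step) { q.steps = append(q.steps, s) }

// apply runs the queued steps in order and stops at the first error.
func (q *setupQueue) apply() error {
	for _, s := range q.steps {
		if err := s(q.state); err != nil {
			return err
		}
	}
	return nil
}

// logStep returns a step that only records its own name.
func logStep(name string) step {
	return func(st *bridgeState) error {
		st.Log = append(st.Log, name)
		return nil
	}
}

// buildQueue queues the device step when the bridge is missing, then the
// table-driven conditional steps, and finally the step that brings it up.
func buildQueue(st *bridgeState, exists bool) *setupQueue {
	q := &setupQueue{state: st}
	if !exists {
		q.queueStep(logStep("setupDevice"))
	}
	for _, c := range []struct {
		cond bool
		fn   step
	}{
		{st.EnableIPv6, logStep("setupBridgeIPv6")},
		{exists, logStep("setupVerifyAndReconcile")},
	} {
		if c.cond {
			q.queueStep(c.fn)
		}
	}
	q.queueStep(logStep("setupDeviceUp"))
	return q
}

func main() {
	st := &bridgeState{EnableIPv6: true}
	if err := buildQueue(st, false).apply(); err != nil {
		fmt.Println("setup failed:", err)
		return
	}
	fmt.Println(st.Log)
}
```

Separating "decide which steps run" from "run them" keeps the ordering visible in one place and makes each step a small, independently testable function.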

Setup Device

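Build a netlink.Bridge struct whose LinkAttrs use the BridgeName from the configuration, then create the bridge device through netlink. If the kernel supports setting the MAC address, a random MAC address is generated and assigned.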

    func setupDevice(config *networkConfiguration, i *bridgeInterface) error {
        var setMac bool
        // We only attempt to create the bridge when the requested device name is
        // the default one.
        if config.BridgeName != DefaultBridgeName && config.DefaultBridge {
            return NonDefaultBridgeExistError(config.BridgeName)
        }
        // Set the bridgeInterface netlink.Bridge.
        i.Link = &netlink.Bridge{
            LinkAttrs: netlink.LinkAttrs{
                Name: config.BridgeName,
            },
        }
        // Only set the bridge's MAC address if the kernel version is > 3.3, as it
        // was not supported before that.
        kv, err := kernel.GetKernelVersion()
        if err != nil {
            logrus.Errorf("Failed to check kernel versions: %v. Will not assign a MAC address to the bridge interface", err)
        } else {
            setMac = kv.Kernel > 3 || (kv.Kernel == 3 && kv.Major >= 3)
        }
        if setMac {
            hwAddr := netutils.GenerateRandomMAC()
            i.Link.Attrs().HardwareAddr = hwAddr
            logrus.Debugf("Setting bridge mac address to %s", hwAddr)
        }
        if err = i.nlh.LinkAdd(i.Link); err != nil {
            logrus.Debugf("Failed to create bridge %s via netlink. Trying ioctl", config.BridgeName)
            return ioctlCreateBridge(config.BridgeName, setMac)
        }
        return err
    }

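Bridge device creation and configuration are ultimately done through the netlink interface. As the listing shows, setupDevice falls back to ioctl only when LinkAdd fails.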
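Two details of setupDevice are easy to try in isolation: the kernel version test and the generation of a random MAC address. The sketch below uses only the standard library. supportsBridgeMAC repeats the comparison from the listing, while randomMAC is one plausible way to build a random unicast, locally administered address; it is an assumption of this example and not necessarily how netutils.GenerateRandomMAC works.

```go
package main

import (
	"crypto/rand"
	"fmt"
	"net"
)

// supportsBridgeMAC repeats the version test from the listing: it is true
// for kernel 3.3 and anything newer.
func supportsBridgeMAC(kernel, major int) bool {
	return kernel > 3 || (kernel == 3 && major >= 3)
}

// randomMAC returns a random unicast, locally administered MAC address.
func randomMAC() (net.HardwareAddr, error) {
	mac := make(net.HardwareAddr, 6)
	if _, err := rand.Read(mac); err != nil {
		return nil, err
	}
	mac[0] &^= 0x01 // clear the multicast bit, so the address is unicast
	mac[0] |= 0x02  // set the locally administered bit
	return mac, nil
}

func main() {
	fmt.Println(supportsBridgeMAC(3, 2), supportsBridgeMAC(3, 3), supportsBridgeMAC(5, 10))
	mac, err := randomMAC()
	if err != nil {
		fmt.Println("cannot generate MAC:", err)
		return
	}
	fmt.Println(mac)
}
```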

Networking Configuration

  • System Control
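    • /proc/sys/net/ipv6/conf/BridgeName/accept_ra -> 0: do not accept Router Advertisements
    • /proc/sys/net/ipv4/conf/BridgeName/route_localnet -> 1: allow packets addressed to loopback addresses (127.0.0.0/8) to be routed, so that external traffic can be redirected to loopback; must be used together with iptables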
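The two settings above are plain files under /proc/sys, so applying one amounts to writing a short string to the right path. A minimal sketch follows, assuming the bridge is named docker0. sysctlPath and writeSysctl are hypothetical helpers for this illustration, and writing to the real /proc/sys requires root, so main only prints the paths.

```go
package main

import (
	"fmt"
	"os"
	"path"
)

// sysctlPath builds the per-interface sysctl path for an address family
// ("ipv4" or "ipv6"), an interface name, and a key such as "accept_ra".
func sysctlPath(family, iface, key string) string {
	return path.Join("/proc/sys/net", family, "conf", iface, key)
}

// writeSysctl writes value to the file at p. On the real /proc/sys this
// needs root privileges.
func writeSysctl(p, value string) error {
	return os.WriteFile(p, []byte(value), 0o644)
}

func main() {
	fmt.Println(sysctlPath("ipv6", "docker0", "accept_ra"))      // value to write: "0"
	fmt.Println(sysctlPath("ipv4", "docker0", "route_localnet")) // value to write: "1"
}
```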


IPTABLES

  • INTERNAL
    • filter
      • DOCKER-ISOLATION-STAGE-1 -i BridgeInterface ! -d Network -j DROP
      • DOCKER-ISOLATION-STAGE-1 -o BridgeInterface ! -s Network -j DROP
  • NON INTERNAL
    • nat
      • DOCKER -t nat -i BridgeInterface -j RETURN
    • filter
      • FORWARD -i BridgeInterface ! -o BridgeInterface -j ACCEPT
    • HOST IP != nil
      • nat
        • POSTROUTING -t nat -s BridgeSubnet ! -o BridgeInterface -j SNAT --to-source HOSTIP
        • POSTROUTING -t nat -m addrtype --src-type LOCAL -o BridgeInterface -j SNAT --to-source HOSTIP
    • HOST IP == nil
      • nat
        • POSTROUTING -t nat -s BridgeSubnet ! -o BridgeInterface -j MASQUERADE
        • POSTROUTING -t nat -m addrtype --src-type LOCAL -o BridgeInterface -j MASQUERADE
    • Inter Container Communication Enabled
      • filter
        • FORWARD -i BridgeInterface -o BridgeInterface -j ACCEPT
    • Inter Container Communication Disabled
      • filter
        • FORWARD -i BridgeInterface -o BridgeInterface -j DROP
    • nat
      • PREROUTING -m addrtype --dst-type LOCAL -j DOCKER
      • OUTPUT -m addrtype --dst-type LOCAL -j DOCKER
    • filter
      • FORWARD -o BridgeInterface -j DOCKER
      • FORWARD -o BridgeInterface -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
    • filter
      • -I FORWARD -j DOCKER-ISOLATION-STAGE-1
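Two of the choices in the list above depend on configuration: SNAT versus MASQUERADE (whether a host IP is set) and ACCEPT versus DROP (whether inter-container communication is enabled). The sketch below builds the corresponding iptables argument lists. natRule and iccRule are hypothetical helpers, the use of -A to append is an assumption of this example, and the addresses are illustrative. The sketch only constructs arguments and does not execute iptables.

```go
package main

import (
	"fmt"
	"strings"
)

// natRule returns the POSTROUTING arguments for traffic that leaves the
// bridge subnet: SNAT when a host IP is configured, MASQUERADE otherwise.
func natRule(subnet, bridge, hostIP string) []string {
	args := []string{"-t", "nat", "-A", "POSTROUTING", "-s", subnet, "!", "-o", bridge, "-j"}
	if hostIP != "" {
		return append(args, "SNAT", "--to-source", hostIP)
	}
	return append(args, "MASQUERADE")
}

// iccRule returns the FORWARD arguments for traffic between containers on
// the same bridge: ACCEPT when ICC is enabled, DROP otherwise.
func iccRule(bridge string, enableICC bool) []string {
	target := "DROP"
	if enableICC {
		target = "ACCEPT"
	}
	return []string{"-A", "FORWARD", "-i", bridge, "-o", bridge, "-j", target}
}

func main() {
	fmt.Println("iptables", strings.Join(natRule("172.17.0.0/16", "docker0", ""), " "))
	fmt.Println("iptables", strings.Join(natRule("172.17.0.0/16", "docker0", "192.0.2.10"), " "))
	fmt.Println("iptables", strings.Join(iccRule("docker0", false), " "))
}
```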


Create Endpoint

docker-network-bridge-network.svg
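There is one global default Bridge device, docker0. Each Container has its own independent network stack, which is connected to the Bridge device through a veth pair.
Containers on the same node can exchange IP traffic directly: they resolve each other's MAC addresses through ARP, and the Bridge forwards the frames. Traffic that leaves the node goes through the default gateway device docker0 and is then redirected to eth0 by the IPTABLES rules.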
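Which of the two paths a packet takes depends on whether its destination lies inside the bridge subnet. The sketch below makes that decision with the standard net package. nextHop is a hypothetical helper, and the subnet 172.17.0.0/16 with gateway 172.17.0.1 is only an example configuration for docker0.

```go
package main

import (
	"fmt"
	"net"
)

// nextHop returns the address a container sends to first: the destination
// itself when it is on the bridge subnet (resolved through ARP), otherwise
// the default gateway, which is the bridge address.
func nextHop(subnetCIDR, gateway, dst string) (string, error) {
	_, subnet, err := net.ParseCIDR(subnetCIDR)
	if err != nil {
		return "", err
	}
	ip := net.ParseIP(dst)
	if ip == nil {
		return "", fmt.Errorf("invalid destination %q", dst)
	}
	if subnet.Contains(ip) {
		return dst, nil // same bridge: delivered directly
	}
	return gateway, nil // off-subnet: goes to docker0, then NAT towards eth0
}

func main() {
	for _, dst := range []string{"172.17.0.3", "8.8.8.8"} {
		hop, err := nextHop("172.17.0.0/16", "172.17.0.1", dst)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(dst, "via", hop)
	}
}
```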