1. Communication between containers in the same Pod on the same node

  1. Uses the container network mode.
  2. In this mode, a newly created container shares a Network Namespace with an existing, specified container instead of with the host. The new container does not create its own network interface or configure its own IP; it shares the specified container's IP address, port range, and so on. Apart from networking, the two containers remain isolated in everything else, such as the filesystem and the process list. # [Processes in the two containers can communicate over the lo device.]

2. Communication between different Pods on the same node

# Pod-to-Pod communication on the same node:
# One premise: the interfaces we see with ifconfig all live at the kernel level, i.e. in kernel space.
Pod - Pod [Same Node]: With Flannel, communication between Pods on the same node normally relies on a Linux bridge. Unlike the plain Docker case, a Kubernetes Flannel environment uses the Linux bridge cni0 rather than the original docker0.
Run brctl show to see the corresponding bridge name and interfaces:
[root@k8s-1 ~]# brctl show
bridge name bridge id STP enabled interfaces
cni0 8000.1edb12e1c079 no veth54a9e98a # cni0 is a virtual Linux bridge
docker0 8000.0242da0ca579 no
[root@k8s-1 ~]#
Here one end of the link is an interface in the ROOT NS, while the Pod's eth0 interface sits in the Pod's own namespace. The two need to talk to each other; which mechanism fits? --- # [veth pair]
A veth device (pair) is defined as:
# The veth devices are virtual Ethernet devices. They can act as tunnels between network namespaces; the topology looks like this:
APP APP
| |
| |
kernel ------------------------------ kernel
| |
network stack network stack
| |
|______________________________|
veth-m veth-n
#
How do we build this pattern on plain Linux?
#5.2.1: create the namespaces
ip netns a ns1
ip netns a ns2
#5.2.2: create a veth pair: veth0 and veth1
ip l a veth0 type veth peer name veth1
#5.2.3: move veth0 and veth1 into the two namespaces
ip l s veth0 netns ns1
ip l s veth1 netns ns2
#5.2.4: assign IPs to veth0 and veth1 and bring them up
ip netns exec ns1 ip a a 10.1.1.2/24 dev veth0
ip netns exec ns1 ip l s veth0 up
ip netns exec ns2 ip a a 10.1.1.3/24 dev veth1
ip netns exec ns2 ip l s veth1 up
# ping veth1 from veth0
[root@k8s-1 ~]# ip netns exec ns1 ping 10.1.1.3
PING 10.1.1.3 (10.1.1.3) 56(84) bytes of data.
64 bytes from 10.1.1.3: icmp_seq=1 ttl=64 time=0.060 ms
Two questions need answering at this point:
1. How do we find out which interface is the peer of the Pod's eth0?
2. When a packet from Pod-1 enters the kernel, how does it get forwarded to Pod-2?
#1. Use ethtool -S eth0:
[root@k8s-1 ~]# kubectl exec -it cni-59h6g bash
kubectl exec [POD] [COMMAND] is DEPRECATED and will be removed in a future version. Use kubectl exec [POD] -- [COMMAND] instead.
bash-5.1# ethtool -S eth0
NIC statistics:
peer_ifindex: 6 ## the peer's ifindex seen from inside the Pod is 6; in the ROOT NS we can look for the interface whose ifindex is 6.
rx_queue_0_xdp_packets: 0
rx_queue_0_xdp_bytes: 0
rx_queue_0_drops: 0
rx_queue_0_xdp_redirect: 0
rx_queue_0_xdp_drops: 0
rx_queue_0_xdp_tx: 0
rx_queue_0_xdp_tx_errors: 0
tx_queue_0_xdp_xmit: 0
tx_queue_0_xdp_xmit_errors: 0
bash-5.1# exit
exit
[root@k8s-1 ~]# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: ens33: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether 00:0c:29:bd:fb:4a brd ff:ff:ff:ff:ff:ff
inet 172.12.1.11/24 brd 172.12.1.255 scope global noprefixroute ens33
valid_lft forever preferred_lft forever
inet6 fe80::e222:32bb:f400:f0c3/64 scope link noprefixroute
valid_lft forever preferred_lft forever
3: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default
link/ether 02:42:da:0c:a5:79 brd ff:ff:ff:ff:ff:ff
inet 172.17.0.1/16 brd 172.17.255.255 scope global docker0
valid_lft forever preferred_lft forever
4: flannel.ipip: flags=193<UP,RUNNING,NOARP> mtu 1480
inet 10.244.0.0 netmask 255.255.255.255
inet6 fe80::200:5efe:ac0c:10b prefixlen 64 scopeid 0x20<link>
tunnel txqueuelen 0 (IPIP Tunnel)
RX packets 0 bytes 0 (0.0 B)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 0 bytes 0 (0.0 B)
TX errors 1 dropped 0 overruns 0 carrier 0 collisions 0
5: cni0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1472 qdisc noqueue state UP group default qlen 1000
link/ether 1e:db:12:e1:c0:79 brd ff:ff:ff:ff:ff:ff
inet 10.244.0.1/24 brd 10.244.0.255 scope global cni0
valid_lft forever preferred_lft forever
inet6 fe80::1cdb:12ff:fee1:c079/64 scope link
valid_lft forever preferred_lft forever
6: veth54a9e98a@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1472 qdisc noqueue master cni0 state UP group default # here is ifindex 6
link/ether 66:c5:ab:c8:03:4e brd ff:ff:ff:ff:ff:ff link-netnsid 0
inet6 fe80::64c5:abff:fec8:34e/64 scope link
valid_lft forever preferred_lft forever
[root@k8s-1 ~]#
[root@k8s-1 ~]# brctl show
bridge name bridge id STP enabled interfaces
cni0 8000.1edb12e1c079 no veth54a9e98a # this interface is attached to the cni0 bridge
docker0 8000.0242da0ca579 no
[root@k8s-1 ~]#
Next, let's analyze the case of two Pods on the same node:
# network topo:
10.244.1.10 10.244.1.7
[ns1] [ns2]
| |
-- [cni0] --
#
[root@k8s-1 ~]# kubectl get pods -o wide | grep k8s-2
cc 1/1 Running 0 32h 10.244.1.10 k8s-2 <none> <none>
cni-svtwf 1/1 Running 1 109d 10.244.1.7 k8s-2 <none> <none>
[root@k8s-1 ~]#
From pod cni-svtwf, ping the pod cc. The packet capture shows:
[root@k8s-1 ~]# kubectl exec -it cni-svtwf bash
kubectl exec [POD] [COMMAND] is DEPRECATED and will be removed in a future version. Use kubectl exec [POD] -- [COMMAND] instead.
bash-5.1# tcpdump -n -e -i eth0
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), snapshot length 262144 bytes
14:31:06.751041 9a:16:45:30:3a:c6 > ee:6f:75:01:ed:bb, ethertype IPv4 (0x0800), length 98: 10.244.1.7 > 10.244.1.10: ICMP echo request, id 17920, seq 0, length 64
14:31:06.751096 ee:6f:75:01:ed:bb > 9a:16:45:30:3a:c6, ethertype IPv4 (0x0800), length 98: 10.244.1.10 > 10.244.1.7: ICMP echo reply, id 17920, seq 0, length 64
^C
# The capture shows that the MAC address of cni-svtwf's eth0 is 9a:16:45:30:3a:c6,
and the MAC address of cc's eth0 is ee:6f:75:01:ed:bb.
Now look at the MAC table on the cni0 bridge:
[root@k8s-2 ~]# brctl showmacs cni0
port no mac addr is local? ageing timer
1 1e:89:b9:2c:44:b1 yes 0.00
1 1e:89:b9:2c:44:b1 yes 0.00
3 32:2e:01:1d:a1:53 yes 0.00
3 32:2e:01:1d:a1:53 yes 0.00
2 4a:bc:c1:08:30:04 no 1.70
2 5e:14:4c:e2:2b:22 yes 0.00
2 5e:14:4c:e2:2b:22 yes 0.00
3 9a:16:45:30:3a:c6 no 32.42 # the MAC of cni-svtwf's eth0, learned on bridge port 3
4 b2:8e:21:90:45:39 yes 0.00
4 b2:8e:21:90:45:39 yes 0.00
1 ce:22:d7:ee:59:7d no 1.70
4 ee:6f:75:01:ed:bb no 32.42 # the MAC of cc's eth0, learned on bridge port 4
[root@k8s-2 ~]#
# In this bridge mode, the veth peers sit between each pod and the bridge. How do we build the same thing on plain Linux?
# network topo
10.1.1.2 10.1.1.3
[ns1] [ns2]
| |
-- [br0] --
#
# create the namespaces
ip netns a ns1
ip netns a ns2
# first create the bridge br0
ip l a br0 type bridge
ip l s br0 up
# then create two veth pairs
ip l a veth0 type veth peer name br-veth0
ip l a veth1 type veth peer name br-veth1
# attach one end of each pair to a namespace and the other to br0
ip l s veth0 netns ns1
ip l s br-veth0 master br0
ip l s br-veth0 up
ip l s veth1 netns ns2
ip l s br-veth1 master br0
ip l s br-veth1 up
# assign IPs to the veth ends inside the namespaces and bring them up
ip netns exec ns1 ip a a 10.1.1.2/24 dev veth0
ip netns exec ns1 ip l s veth0 up
ip netns exec ns2 ip a a 10.1.1.3/24 dev veth1
ip netns exec ns2 ip l s veth1 up
# ping test:
[root@k8s-2 ~]# ip netns exec ns1 ping 10.1.1.3
PING 10.1.1.3 (10.1.1.3) 56(84) bytes of data.
64 bytes from 10.1.1.3: icmp_seq=1 ttl=64 time=0.173 ms
64 bytes from 10.1.1.3: icmp_seq=2 ttl=64 time=0.068 ms
^C
--- 10.1.1.3 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1020ms
rtt min/avg/max/mdev = 0.068/0.120/0.173/0.053 ms
# ports on bridge br0:
[root@k8s-2 ~]# brctl show
bridge name bridge id STP enabled interfaces
br0 8000.8e495ef498dd no br-veth0
br-veth1
# the MAC table, including the MACs inside each ns:
[root@k8s-2 ~]# brctl showmacs br0
port no mac addr is local? ageing timer
2 06:b5:78:d9:7b:75 no 16.28 # ns2
1 1a:7e:30:46:72:11 no 12.18 # ns1
1 8e:49:5e:f4:98:dd yes 0.00
1 8e:49:5e:f4:98:dd yes 0.00
2 ca:b6:7e:06:c2:bf yes 0.00
2 ca:b6:7e:06:c2:bf yes 0.00
[root@k8s-2 ~]#

3. Linux kernel forwarding

  1. "IP forwarding" is a synonym for "routing." It is called "kernel IP forwarding" because it is a feature of the Linux kernel.
  2. A router has multiple network interfaces. If traffic comes in on one interface that matches a subnet of another network interface, a router then forwards that traffic to the other network interface.
  3. # 重点看这里:
  4. # So, let's say you have two NICs, one (NIC 1) is at address 192.168.2.1/24, and the other (NIC 2) is 192.168.3.1/24. If forwarding is enabled, and a packet comes in on NIC 1 with a "destination address" of 192.168.3.8, the router will resend that packet out of NIC 2.
  5. It's common for routers functioning as gateways to the Internet to have a default route whereby any traffic that doesn't match any NICs will go through the default route's NIC. So in the above example, if you have an internet connection on NIC 2, you'd set NIC 2 as your default route and then any traffic coming in from NIC 1 that isn't destined for something on 192.168.2.0/24 will go through NIC 2. Hopefully there's other routers past NIC 2 that can further route it (in the case of the Internet, the next hop would be your ISP's router, and then their providers upstream router, etc.)
  6. Enabling ip_forward tells your Linux system to do this. For it to be meaningful, you need two network interfaces (any 2 or more of wired NIC cards, Wifi cards or chipsets, PPP links over a 56k modem or serial, etc.).
  7. When doing routing, security is important and that's where Linux's packet filter, iptables, gets involved. So you will need an iptables configuration consistent with your needs.
  8. Note that enabling forwarding with iptables disabled and/or without taking firewalling and security into account could leave you open to vulnerabilities if one of the NICs is facing the Internet or a subnet you don't have control over.


# [05 - Kubernetes-Flannel IPIP Mode - Figure 1]

# topo:
v1 v2
10.1.1.2 10.1.2.2
[ns1] [ns2]
| |
- [kernel] - # echo 1 > /proc/sys/net/ipv4/ip_forward # enable kernel forwarding
v1_r| |v2_r
10.1.1.1 10.1.2.1
IP forward configuration logic:
#
1. Create two namespaces:
ip netns add ns1
ip netns add ns2
2. Create two veth pairs, with one end of each placed in a namespace:
ip link add v1 type veth peer name v1_r
ip link add v2 type veth peer name v2_r
ip link set v1 netns ns1
ip link set v2 netns ns2
3. Assign IPs to both ends of each pair and bring them up:
ip a a 10.1.1.1/24 dev v1_r
ip l s v1_r up
ip a a 10.1.2.1/24 dev v2_r
ip l s v2_r up
ip netns exec ns1 ip a a 10.1.1.2/24 dev v1
ip netns exec ns1 ip l s v1 up
ip netns exec ns2 ip a a 10.1.2.2/24 dev v2
ip netns exec ns2 ip l s v2 up
4. Add routes:
ip netns exec ns1 route add -net 10.1.2.0 netmask 255.255.255.0 gw 10.1.1.1  # compare with Docker, which adds a default route automatically
ip netns exec ns2 route add -net 10.1.1.0 netmask 255.255.255.0 gw 10.1.2.1  # compare with Docker's default-route setup
5. Enable kernel forwarding:
echo 1 > /proc/sys/net/ipv4/ip_forward
# Routing table:
[root@k8s-1 ~]# ip netns exec ns1 route -n 
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
10.1.1.0        0.0.0.0         255.255.255.0   U     0      0        0 v1
10.1.2.0        10.1.1.1        255.255.255.0   UG    0      0        0 v1  # this route is matched
[root@k8s-1 ~]# 
# Packet capture:
[root@k8s-1 ~]# ip netns exec ns1 ping 10.1.2.2
PING 10.1.2.2 (10.1.2.2) 56(84) bytes of data.
64 bytes from 10.1.2.2: icmp_seq=1 ttl=63 time=0.087 ms
^C
--- 10.1.2.2 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.087/0.087/0.087/0.000 ms
[root@k8s-1 ~]# 

# Capture details:
ip netns exec ns1 arp -d 10.1.1.1 # flush the ARP cache entry for the gateway
[root@k8s-1 ~]# ip netns exec ns1 tcpdump  -ne -i v1
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on v1, link-type EN10MB (Ethernet), capture size 262144 bytes
14:13:03.636646 26:e9:e2:4b:ff:e9 > Broadcast, ethertype ARP (0x0806), length 42: Request who-has 10.1.1.1 tell 10.1.1.2, length 28
14:13:03.636673 42:58:a1:ed:f4:d3 > 26:e9:e2:4b:ff:e9, ethertype ARP (0x0806), length 42: Reply 10.1.1.1 is-at 42:58:a1:ed:f4:d3, length 28
14:13:03.636676 26:e9:e2:4b:ff:e9 > 42:58:a1:ed:f4:d3, ethertype IPv4 (0x0800), length 98: 10.1.1.2 > 10.1.2.2: ICMP echo request, id 42491, seq 1, length 64
14:13:03.636717 42:58:a1:ed:f4:d3 > 26:e9:e2:4b:ff:e9, ethertype IPv4 (0x0800), length 98: 10.1.2.2 > 10.1.1.2: ICMP echo reply, id 42491, seq 1, length 64
14:13:08.935762 42:58:a1:ed:f4:d3 > 26:e9:e2:4b:ff:e9, ethertype ARP (0x0806), length 42: Request who-has 10.1.1.2 tell 10.1.1.1, length 28
14:13:08.935820 26:e9:e2:4b:ff:e9 > 42:58:a1:ed:f4:d3, ethertype ARP (0x0806), length 42: Reply 10.1.1.2 is-at 26:e9:e2:4b:ff:e9, length 28

4. DIY IPIP Tunnel

Tunneling is a way to transform data frames to allow them to pass through networks with incompatible address spaces or even incompatible protocols. There are different kinds of tunnels: some process only IPv4 packets and some can carry any type of frame. The Linux kernel supports 3 tunnel types: IPIP (IPv4 in IPv4), GRE (IPv4/IPv6 over IPv4) and SIT (IPv6 over IPv4). Tunnels are managed with the ip program, part of iproute2:
 $ /sbin/ip tunnel help
 Usage: ip tunnel { add | change | del | show } [ NAME ]
           [ mode { ipip | gre | sit } ] [ remote ADDR ] [ local ADDR ]
           [ [i|o]seq ] [ [i|o]key KEY ] [ [i|o]csum ]
           [ ttl TTL ] [ tos TOS ] [ [no]pmtudisc ] [ dev PHYS_DEV ]

 Where: NAME := STRING
        ADDR := { IP_ADDRESS | any }
        TOS  := { NUMBER | inherit }
        TTL  := { 1..255 | inherit }
        KEY  := { DOTTED_QUAD | NUMBER }

# iproute2 is usually shipped with documentation, of which you need the file ip-tunnels.ps to learn about tunnel management. In Fedora Core 4 it is /usr/share/doc/iproute-2.6.11/ip-tunnels.ps.

#IPIP tunnels
The IPIP tunnel is the simplest kind. It has the lowest overhead, but can encapsulate only IPv4 unicast traffic, so you will not be able to set up OSPF, RIP or any other multicast-based protocol. You can set up only one tunnel per unique pair of tunnel endpoints. It works with FreeBSD and Cisco IOS. The kernel module is 'ipip'. The following example demonstrates configuration of an IPIP tunnel with four IPv4 routes, manually or via /etc/net.


# IPIP packet format:
No.     Time           Source                Destination           Protocol Length Info
1       0.000000       10.10.100.10          10.10.200.10          ICMP     118    Echo (ping) request  id=0x8578, seq=1/256, ttl=64 (reply in 2)

Frame 1: 118 bytes on wire (944 bits), 118 bytes captured (944 bits)
Ethernet II, Src: 2a:32:a4:68:2d:fb (2a:32:a4:68:2d:fb), Dst: 52:0c:6d:7a:80:12 (52:0c:6d:7a:80:12)  # Outer_MAC
Internet Protocol Version 4, Src: 10.10.10.2, Dst: 10.10.20.2         # Outer_IP
Internet Protocol Version 4, Src: 10.10.100.10, Dst: 10.10.200.10     # Inner_IP
Internet Control Message Protocol                                     # No Inner MAC

# [Figure 3]

# IPIP tunnel configuration approach:
# topo:
              v1         v2
          10.10.10.2  10.10.20.2
             [ns1]      [ns2]
               |          |
        10.10.100.10   10.10.200.10
         tun1 ---[kernel]---tun2     # echo 1 > /proc/sys/net/ipv4/ip_forward    # enable kernel forwarding
           v1_r|          |v2_r
          10.10.10.1  10.10.20.1

                   /\
                   ||  # the topo above, redrawn as the topo below
                   \/
         tun1                tun2
     10.10.100.10         10.10.200.10
          |                   |
  10.10.10.2[ens33]     10.10.20.2[ens33]
          |                   |
     10.10.10.1[gw]------10.10.20.1[gw]
# We want isolation here, so we create separate network namespaces and use each NS to simulate a different host. In a real environment this could just as well be built with actual separate hosts.
# Configuration steps:
1. Create the namespaces
ip netns add ns1
ip netns add ns2
2. Add the veth pairs
ip link add v1 type veth peer name v1_r
ip link add v2 type veth peer name v2_r
ip link set v1 netns ns1
ip link set v2 netns ns2
3. Assign addresses
ip a a 10.10.10.1/24 dev v1_r
ip l s v1_r up
ip a a 10.10.20.1/24 dev v2_r
ip l s v2_r up
ip netns exec ns1 ip a a 10.10.10.2/24 dev v1
ip netns exec ns1 ip l s v1 up
ip netns exec ns2 ip a a 10.10.20.2/24 dev v2
ip netns exec ns2 ip l s v2 up
4. Add routes
ip netns exec ns1 route add -net 10.10.20.0 netmask 255.255.255.0 gw 10.10.10.1
ip netns exec ns2 route add -net 10.10.10.0 netmask 255.255.255.0 gw 10.10.20.1
5. Enable IP forwarding
echo 1 > /proc/sys/net/ipv4/ip_forward
6. Create tunl1 as an IPIP tunnel in ns1
ip netns exec ns1 ip tunnel add tunl1 mode ipip remote 10.10.20.2 local 10.10.10.2
ip netns exec ns1 ip l s tunl1 up
ip netns exec ns1 ip a a 10.10.100.10 peer 10.10.200.10 dev tunl1
7. Create tunl2 as an IPIP tunnel in ns2
ip netns exec ns2 ip tunnel add tunl2 mode ipip remote 10.10.10.2 local 10.10.20.2
ip netns exec ns2 ip l s tunl2 up
ip netns exec ns2 ip a a 10.10.200.10 peer 10.10.100.10 dev tunl2

# Packet forwarding logic: [10.10.100.10 ping 10.10.200.10]
[root@k8s-2 ~]# ip netns exec ns1 ping 10.10.200.10
PING 10.10.200.10 (10.10.200.10) 56(84) bytes of data.
64 bytes from 10.10.200.10: icmp_seq=1 ttl=64 time=0.065 ms
^C
--- 10.10.200.10 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.065/0.065/0.065/0.000 ms
[root@k8s-2 ~]# 
# Forwarding logic:
# 1. Inspect inside ns1:
[root@k8s-2 ~]# ip netns exec ns1 ifconfig 
tunl1: flags=209<UP,POINTOPOINT,RUNNING,NOARP>  mtu 1480                         # tunl1 info: IP, but no MAC
        inet 10.10.100.10  netmask 255.255.255.255  destination 10.10.200.10
        inet6 fe80::5efe:a0a:a02  prefixlen 64  scopeid 0x20<link>
        tunnel   txqueuelen 1000  (IPIP Tunnel)
        RX packets 3  bytes 252 (252.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 3  bytes 252 (252.0 B)
        TX errors 8  dropped 0 overruns 0  carrier 0  collisions 0

v1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500                         # v1 info: IP and MAC
        inet 10.10.10.2  netmask 255.255.255.0  broadcast 0.0.0.0
        inet6 fe80::2832:a4ff:fe68:2dfb  prefixlen 64  scopeid 0x20<link>
        ether 2a:32:a4:68:2d:fb  txqueuelen 1000  (Ethernet)
        RX packets 19  bytes 1458 (1.4 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 19  bytes 1458 (1.4 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

[root@k8s-2 ~]# ip netns exec ns1 route -n         # ns1's routing table
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
10.10.10.0      0.0.0.0         255.255.255.0   U     0      0        0 v1
10.10.20.0      10.10.10.1      255.255.255.0   UG    0      0        0 v1
10.10.200.10    0.0.0.0         255.255.255.255 UH    0      0        0 tunl1
[root@k8s-2 ~]# 
# The destination address 10.10.200.10 is not on the same subnet as 10.10.100.10, so the routing table is consulted
and this entry matches:
10.10.200.10    0.0.0.0         255.255.255.255 UH    0      0        0 tunl1 # traffic to 10.10.200.10 leaves via interface tunl1
[root@k8s-2 ~]# ip netns exec ns1 tcpdump -ne -i tunl1
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on tunl1, link-type RAW (Raw IP), capture size 262144 bytes  # The capture shows raw IP data: no Ethernet header, which means no destination MAC needs to be resolved at this point. This matters.
15:35:10.629336 ip: 10.10.100.10 > 10.10.200.10: ICMP echo request, id 13981, seq 1, length 64
15:35:10.629439 ip: 10.10.200.10 > 10.10.100.10: ICMP echo reply, id 13981, seq 1, length 64
# And since tunl1 is a tunnel device, the packet then gets a further layer of IPIP encapsulation.

The encapsulated packet therefore looks like this: [the outer S_IP and D_IP are on different subnets, so the packet must go through the L3 gateway, i.e. another route lookup]
# Outer_S_IP:10.10.10.2   Outer_D_IP:10.10.20.2   Outer_S_MAC:$(10.10.10.2)  Outer_D_MAC:$(10.10.10.1)
# Inner_S_IP:10.10.100.10 Inner_D_IP: 10.10.200.10 Inner_S_MAC:     //        Inner_D_MAC:     //
Route lookup:
[root@k8s-2 ~]# ip netns exec ns1 route -n 
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
10.10.10.0      0.0.0.0         255.255.255.0   U     0      0        0 v1       
10.10.20.0      10.10.10.1      255.255.255.0   UG    0      0        0 v1  # this route matches: the packet to 10.10.20.2 is handed to its gateway 10.10.10.1; the egress interface is v1, which supplies the outer S_MAC
10.10.200.10    0.0.0.0         255.255.255.255 UH    0      0        0 tunl1
[root@k8s-2 ~]# 

IP and MAC details of each interface:
[root@k8s-2 ~]# ip netns exec ns1 ifconfig 
tunl1: flags=209<UP,POINTOPOINT,RUNNING,NOARP>  mtu 1480
        inet 10.10.100.10  netmask 255.255.255.255  destination 10.10.200.10
        inet6 fe80::5efe:a0a:a02  prefixlen 64  scopeid 0x20<link>
        tunnel   txqueuelen 1000  (IPIP Tunnel)                    
        RX packets 5  bytes 420 (420.0 B)                                      # no MAC address
        RX errors 0  dropped 0  overruns 0  frame 0 
        TX packets 5  bytes 420 (420.0 B)
        TX errors 10  dropped 0 overruns 0  carrier 0  collisions 0

v1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 10.10.10.2  netmask 255.255.255.0  broadcast 0.0.0.0
        inet6 fe80::2832:a4ff:fe68:2dfb  prefixlen 64  scopeid 0x20<link>
        ether 2a:32:a4:68:2d:fb  txqueuelen 1000  (Ethernet)
        RX packets 26  bytes 1988 (1.9 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 25  bytes 1918 (1.8 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

[root@k8s-2 ~]# ifconfig 
v1_r: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 10.10.10.1  netmask 255.255.255.0  broadcast 0.0.0.0
        inet6 fe80::500c:6dff:fe7a:8012  prefixlen 64  scopeid 0x20<link>
        ether 52:0c:6d:7a:80:12  txqueuelen 1000  (Ethernet)
        RX packets 25  bytes 1918 (1.8 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 26  bytes 1988 (1.9 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

# Packet capture:
[root@k8s-2 ~]# ip netns exec ns1 tcpdump -ne -i v1 
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on v1, link-type EN10MB (Ethernet), capture size 262144 bytes
15:54:19.032974 2a:32:a4:68:2d:fb > 52:0c:6d:7a:80:12, ethertype IPv4 (0x0800), length 118: 10.10.10.2 > 10.10.20.2: 10.10.100.10 > 10.10.200.10: ICMP echo request, id 21857, seq 1, length 64 (ipip-proto-4)
15:54:19.033105 52:0c:6d:7a:80:12 > 2a:32:a4:68:2d:fb, ethertype IPv4 (0x0800), length 118: 10.10.20.2 > 10.10.10.2: 10.10.200.10 > 10.10.100.10: ICMP echo reply, id 21857, seq 1, length 64 (ipip-proto-4)
# Outer_S_IP:10.10.10.2      Outer_D_IP:10.10.20.2      Outer_S_MAC:2a:32:a4:68:2d:fb    Outer_D_MAC:52:0c:6d:7a:80:12
# Inner_S_IP:10.10.100.10    Inner_D_IP: 10.10.200.10    Inner_S_MAC:     //              Inner_D_MAC:     //
# These fields can be read directly off the capture above.

5. Flannel's IPIP Backend

#
---
kind: ConfigMap
apiVersion: v1
metadata:
  name: kube-flannel-cfg
  namespace: kube-system
  labels:
    tier: node
    app: flannel
data:
  cni-conf.json: |
    {
      "name": "cbr0",
      "cniVersion": "0.3.1",
      "plugins": [
        {
          "type": "flannel",
          "delegate": {
            "hairpinMode": true,
            "isDefaultGateway": true
          }
        },
        {
          "type": "portmap",
          "capabilities": {
            "portMappings": true
          }
        }
      ]
    }
  net-conf.json: |
    {
      "Network": "10.244.0.0/16",
      "Backend": {
        "Type": "ipip"
      }
    }
---
#


How does Flannel invoke the IPIP module to build the overlay network? Here we mainly walk through the principle of cross-node communication:

env:
[root@k8s-1 ~]# kubectl get pods -o wide 
NAME        READY   STATUS    RESTARTS   AGE    IP           NODE    NOMINATED NODE   READINESS GATES
cni-8v2vj   1/1     Running   1          120d   10.244.0.3   k8s-1   <none>           <none>
cni-w8b7p   1/1     Running   1          120d   10.244.1.7   k8s-2   <none>           <none>
[root@k8s-1 ~]# 
[root@k8s-1 ~]# kubectl exec -it cni-8v2vj  -- ifconfig                      # Pod info on k8s-1
eth0      Link encap:Ethernet  HWaddr 26:32:89:A6:66:6E  
          inet addr:10.244.0.3  Bcast:10.244.0.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1480  Metric:1
          RX packets:13 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:1178 (1.1 KiB)  TX bytes:42 (42.0 B)

lo        Link encap:Local Loopback  
          inet addr:127.0.0.1  Mask:255.0.0.0
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)

[root@k8s-1 ~]# 
[root@k8s-1 ~]# ifconfig                     # node info for k8s-1
cni0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1480
        inet 10.244.0.1  netmask 255.255.255.0  broadcast 10.244.0.255
        inet6 fe80::45:21ff:fe42:c49d  prefixlen 64  scopeid 0x20<link>
        ether 02:45:21:42:c4:9d  txqueuelen 1000  (Ethernet)
        RX packets 1  bytes 28 (28.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 9  bytes 854 (854.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

docker0: flags=4099<UP,BROADCAST,MULTICAST>  mtu 1500
        inet 172.17.0.1  netmask 255.255.0.0  broadcast 172.17.255.255
        ether 02:42:4c:45:1b:a0  txqueuelen 0  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

ens33: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 172.12.1.11  netmask 255.255.255.0  broadcast 172.12.1.255
        inet6 fe80::e222:32bb:f400:f0c3  prefixlen 64  scopeid 0x20<link>
        ether 00:0c:29:bd:fb:4a  txqueuelen 1000  (Ethernet)
        RX packets 2293  bytes 293007 (286.1 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 2687  bytes 1106994 (1.0 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

flannel.ipip: flags=193<UP,RUNNING,NOARP>  mtu 1480
        inet 10.244.0.0  netmask 255.255.255.255
        inet6 fe80::200:5efe:ac0c:10b  prefixlen 64  scopeid 0x20<link>
        tunnel   txqueuelen 0  (IPIP Tunnel)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 8  dropped 0 overruns 0  carrier 0  collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10<host>
        loop  txqueuelen 1000  (Local Loopback)
        RX packets 120749  bytes 22923848 (21.8 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 120749  bytes 22923848 (21.8 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

veth5b471bd4: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1480
        inet6 fe80::5877:c0ff:fe36:b2f3  prefixlen 64  scopeid 0x20<link>
        ether 5a:77:c0:36:b2:f3  txqueuelen 0  (Ethernet)
        RX packets 1  bytes 42 (42.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 13  bytes 1178 (1.1 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

[root@k8s-1 ~]# 


###########################################################

[root@k8s-1 ~]# kubectl exec -it cni-w8b7p  -- ifconfig 
eth0      Link encap:Ethernet  HWaddr F2:22:7C:58:D4:DB  
          inet addr:10.244.1.7  Bcast:10.244.1.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1480  Metric:1
          RX packets:16 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:1304 (1.2 KiB)  TX bytes:42 (42.0 B)

lo        Link encap:Local Loopback  
          inet addr:127.0.0.1  Mask:255.0.0.0
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)

[root@k8s-1 ~]# 

[root@k8s-2 ~]# ifconfig 
cni0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1480
        inet 10.244.1.1  netmask 255.255.255.0  broadcast 10.244.1.255
        inet6 fe80::2842:51ff:fec2:2085  prefixlen 64  scopeid 0x20<link>
        ether 2a:42:51:c2:20:85  txqueuelen 1000  (Ethernet)
        RX packets 1734  bytes 141204 (137.8 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 1766  bytes 170858 (166.8 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

docker0: flags=4099<UP,BROADCAST,MULTICAST>  mtu 1500
        inet 172.17.0.1  netmask 255.255.0.0  broadcast 172.17.255.255
        ether 02:42:09:5d:0c:7c  txqueuelen 0  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

ens33: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 172.12.1.12  netmask 255.255.255.0  broadcast 172.12.1.255
        inet6 fe80::a9cd:74a4:47fc:9fec  prefixlen 64  scopeid 0x20<link>
        ether 00:0c:29:e2:bf:86  txqueuelen 1000  (Ethernet)
        RX packets 2642  bytes 1132782 (1.0 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 2132  bytes 281910 (275.3 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

flannel.ipip: flags=193<UP,RUNNING,NOARP>  mtu 1480
        inet 10.244.1.0  netmask 255.255.255.255
        inet6 fe80::200:5efe:ac0c:10c  prefixlen 64  scopeid 0x20<link>
        tunnel   txqueuelen 0  (IPIP Tunnel)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 8  dropped 0 overruns 0  carrier 0  collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10<host>
        loop  txqueuelen 1000  (Local Loopback)
        RX packets 165  bytes 11936 (11.6 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 165  bytes 11936 (11.6 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

veth22c373b4: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1480
        inet6 fe80::40bc:fff:fe62:dec6  prefixlen 64  scopeid 0x20<link>
        ether 42:bc:0f:62:de:c6  txqueuelen 0  (Ethernet)
        RX packets 870  bytes 82962 (81.0 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 893  bytes 86195 (84.1 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

veth419fa604: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1480
        inet6 fe80::182f:58ff:fed3:4b1d  prefixlen 64  scopeid 0x20<link>
        ether 1a:2f:58:d3:4b:1d  txqueuelen 0  (Ethernet)
        RX packets 863  bytes 82476 (80.5 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 897  bytes 86459 (84.4 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

vethea5a8f1b: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1480
        inet6 fe80::7086:95ff:fe37:ab9b  prefixlen 64  scopeid 0x20<link>
        ether 72:86:95:37:ab:9b  txqueuelen 0  (Ethernet)
        RX packets 1  bytes 42 (42.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 16  bytes 1304 (1.2 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

[root@k8s-2 ~]# 
1. First, the packet leaves the Pod into the ROOT NS: 10.244.0.3 pings 10.244.1.7.
Since 10.244.1.7 is not on the same subnet as 10.244.0.3, an L3 routing-table lookup is needed:
[root@k8s-1 ~]# kubectl exec -it cni-8v2vj -- ifconfig 
eth0      Link encap:Ethernet  HWaddr 26:32:89:A6:66:6E  
          inet addr:10.244.0.3  Bcast:10.244.0.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1480  Metric:1
          RX packets:13 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:1178 (1.1 KiB)  TX bytes:42 (42.0 B)

lo        Link encap:Local Loopback  
          inet addr:127.0.0.1  Mask:255.0.0.0
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)

[root@k8s-1 ~]# 
[root@k8s-1 ~]# kubectl exec -it cni-8v2vj -- route -n 
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         10.244.0.1      0.0.0.0         UG    0      0        0 eth0
10.244.0.0      0.0.0.0         255.255.255.0   U     0      0        0 eth0
10.244.0.0      10.244.0.1      255.255.0.0     UG    0      0        0 eth0
[root@k8s-1 ~]# 
S_IP: 10.244.0.3   S_MAC: MAC(10.244.0.3)
D_IP: 10.244.1.7   D_MAC: MAC(10.244.0.1)   # the gateway's MAC, since the destination is off-subnet
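The choice of D_MAC follows from a simple subnet check: 10.244.1.7 does not fall inside the Pod's 10.244.0.0/24, so the frame is addressed to the default gateway 10.244.0.1 on cni0. A minimal sketch of that check in plain shell (the `ip2int` helper is ours, not part of any tool):

```shell
# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip2int() {
    old=$IFS; IFS=.
    set -- $1
    IFS=$old
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

src=$(ip2int 10.244.0.3)
dst=$(ip2int 10.244.1.7)
mask=$(ip2int 255.255.255.0)

# Same /24? Compare the network parts under the mask.
if [ $(( src & mask )) -eq $(( dst & mask )) ]; then
    echo "same subnet: deliver directly, D_MAC = MAC(10.244.1.7)"
else
    echo "off subnet: send to gateway, D_MAC = MAC(10.244.0.1)"
fi
# → off subnet: send to gateway, D_MAC = MAC(10.244.0.1)
```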
Packet capture:  # captured on the Pod's eth0:
No.     Time                          Source                Destination           Protocol Length Info
      1 2021-04-01 20:31:08.004680    de:12:83:ae:98:64     Broadcast             ARP      42     Who has 10.244.1.4? Tell 10.244.1.1

Frame 1: 42 bytes on wire (336 bits), 42 bytes captured (336 bits)
Ethernet II, Src: de:12:83:ae:98:64 (de:12:83:ae:98:64), Dst: Broadcast (ff:ff:ff:ff:ff:ff)
Address Resolution Protocol (request)

No.     Time                          Source                Destination           Protocol Length Info
      2 2021-04-01 20:31:08.004712    a2:fe:41:22:15:94     de:12:83:ae:98:64     ARP      42     10.244.1.4 is at a2:fe:41:22:15:94

Frame 2: 42 bytes on wire (336 bits), 42 bytes captured (336 bits)
Ethernet II, Src: a2:fe:41:22:15:94 (a2:fe:41:22:15:94), Dst: de:12:83:ae:98:64 (de:12:83:ae:98:64)
Address Resolution Protocol (reply)

No.     Time                          Source                Destination           Protocol Length Info
      3 2021-04-01 20:31:08.004730    10.244.0.2            10.244.1.4            ICMP     98     Echo (ping) request  id=0x1000, seq=0/0, ttl=62 (reply in 4)

Frame 3: 98 bytes on wire (784 bits), 98 bytes captured (784 bits)
Ethernet II, Src: de:12:83:ae:98:64 (de:12:83:ae:98:64), Dst: a2:fe:41:22:15:94 (a2:fe:41:22:15:94)
Internet Protocol Version 4, Src: 10.244.0.2, Dst: 10.244.1.4
Internet Control Message Protocol

No.     Time                          Source                Destination           Protocol Length Info
      4 2021-04-01 20:31:08.004751    10.244.1.4            10.244.0.2            ICMP     98     Echo (ping) reply    id=0x1000, seq=0/0, ttl=64 (request in 3)

Frame 4: 98 bytes on wire (784 bits), 98 bytes captured (784 bits)
Ethernet II, Src: a2:fe:41:22:15:94 (a2:fe:41:22:15:94), Dst: de:12:83:ae:98:64 (de:12:83:ae:98:64)
Internet Protocol Version 4, Src: 10.244.1.4, Dst: 10.244.0.2
Internet Control Message Protocol
# The IPs and MACs here come from a different run; map them onto the addresses of the environment described above.

# 2. The packet now reaches the ROOT NS:
Check the node's routing table:
[root@k8s-1 ~]# route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         172.12.1.2      0.0.0.0         UG    100    0        0 ens33
10.244.0.0      0.0.0.0         255.255.255.0   U     0      0        0 cni0 
10.244.1.0      172.12.1.12     255.255.255.0   UG    0      0        0 flannel.ipip  # D_IP is 10.244.1.7, so this entry matches and the packet leaves via flannel.ipip
172.12.1.0      0.0.0.0         255.255.255.0   U     100    0        0 ens33
172.17.0.0      0.0.0.0         255.255.0.0     U     0      0        0 docker0
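The kernel picks the 10.244.1.0/24 entry by longest-prefix match: the destination is masked with each route's Genmask and compared against its Destination, and among all matching routes the longest mask wins. Sketched in plain shell (the `ip2int` and `matches` helpers are ours):

```shell
# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip2int() {
    old=$IFS; IFS=.
    set -- $1
    IFS=$old
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# Does address $1 fall inside network $2 with netmask $3?
matches() {
    [ $(( $(ip2int "$1") & $(ip2int "$3") )) -eq "$(ip2int "$2")" ]
}

dst=10.244.1.7
matches "$dst" 10.244.0.0 255.255.255.0 && echo "matches 10.244.0.0/24 -> cni0"
matches "$dst" 10.244.1.0 255.255.255.0 && echo "matches 10.244.1.0/24 -> flannel.ipip"
matches "$dst" 0.0.0.0    0.0.0.0       && echo "matches 0.0.0.0/0     -> ens33 (default)"
# Both the /24 and the /0 routes match; the longest mask (/24) wins,
# so the packet leaves via flannel.ipip toward gateway 172.12.1.12.
```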
From what we observed earlier, the packet on this interface is raw IP data:
the flannel.ipip device only converts the packet to raw IP format, then instructs the kernel to perform IPIP encapsulation.
First, the conversion to raw IP data:  # captured on the flannel.ipip interface:
No.     Time                          Source                Destination           Protocol Length Info
1       2021-07-31 17:10:56.190468    10.244.0.3            10.244.1.7            ICMP     84     Echo (ping) request  id=0x4a00, seq=0/0, ttl=63 (reply in 2)

Frame 1: 84 bytes on wire (672 bits), 84 bytes captured (672 bits)
Raw packet data  # note: no link-layer header on this interface
Internet Protocol Version 4, Src: 10.244.0.3, Dst: 10.244.1.7
Internet Control Message Protocol

No.     Time                          Source                Destination           Protocol Length Info
2       2021-07-31 17:10:56.190828    10.244.1.7            10.244.0.3            ICMP     84     Echo (ping) reply    id=0x4a00, seq=0/0, ttl=63 (request in 1)

Frame 2: 84 bytes on wire (672 bits), 84 bytes captured (672 bits)
Raw packet data  # note: no link-layer header on this interface
Internet Protocol Version 4, Src: 10.244.1.7, Dst: 10.244.0.3
Internet Control Message Protocol

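The 84-byte frames and the "Raw packet data" line are consistent with each other: flannel.ipip is a Layer-3 device, so the frame is just the inner IPv4 packet with no Ethernet header. A quick size check (56 bytes is Linux ping's default payload):

```shell
ip_hdr=20      # IPv4 header without options
icmp_hdr=8     # ICMP echo header
payload=56     # Linux ping's default payload size

# flannel.ipip carries no Ethernet header: the frame IS the IP packet.
echo "flannel.ipip frame: $(( ip_hdr + icmp_hdr + payload )) bytes"        # 84
# On the Pod's eth0 the same packet carries a 14-byte Ethernet II header.
echo "pod eth0 frame:     $(( 14 + ip_hdr + icmp_hdr + payload )) bytes"   # 98
```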
# That was step one.
We also said the device instructs the kernel to perform IPIP encapsulation,
so look at the same routing table again:
[root@k8s-1 ~]# route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         172.12.1.2      0.0.0.0         UG    100    0        0 ens33
10.244.0.0      0.0.0.0         255.255.255.0   U     0      0        0 cni0
10.244.1.0      172.12.1.12     255.255.255.0   UG    0      0        0 flannel.ipip   # egress interface flannel.ipip, gateway 172.12.1.12 (the other host, k8s-2)
172.12.1.0      0.0.0.0         255.255.255.0   U     100    0        0 ens33
172.17.0.0      0.0.0.0         255.255.0.0     U     0      0        0 docker0
[root@k8s-1 ~]# ip -d link show flannel.ipip   ## this command shows the ipip tunnel's details:
5: flannel.ipip@NONE: <NOARP,UP,LOWER_UP> mtu 1480 qdisc noqueue state UNKNOWN mode DEFAULT group default 
    link/ipip 172.12.1.11 brd 0.0.0.0 promiscuity 0 
    ipip remote any local 172.12.1.11 ttl inherit nopmtudisc addrgenmode eui64 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535 
The local address is 172.12.1.11, and the routing table above says the next hop (gateway) is 172.12.1.12, so the IPIP tunnel in use is local:172.12.1.11 remote:172.12.1.12.
The encapsulated packet therefore looks like:
Outer_S_IP: 172.12.1.11   Outer_D_IP: 172.12.1.12   Outer_S_MAC: MAC(172.12.1.11)   Outer_D_MAC: MAC(172.12.1.12)
Inner_S_IP: 10.244.0.3    Inner_D_IP: 10.244.1.7    Inner_S_MAC: (none)             Inner_D_MAC: (none)
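Outside Kubernetes, a comparable IPIP tunnel can be sketched by hand with iproute2, in the spirit of the earlier veth example (run as root; the interface name `tun0` is our choice, and unlike flannel's `remote any` device this sketch pins a fixed peer):

```shell
# On k8s-1 (172.12.1.11):
modprobe ipip                                   # load the IPIP tunnel module
ip tunnel add tun0 mode ipip local 172.12.1.11 remote 172.12.1.12 ttl 64
ip link set tun0 up
ip route add 10.244.1.0/24 dev tun0             # Pod CIDR of k8s-2 via the tunnel

# On k8s-2 (172.12.1.12), mirror it:
modprobe ipip
ip tunnel add tun0 mode ipip local 172.12.1.12 remote 172.12.1.11 ttl 64
ip link set tun0 up
ip route add 10.244.0.0/24 dev tun0             # Pod CIDR of k8s-1 via the tunnel
```

This is a configuration sketch only; flannel itself programs the tunnel and routes automatically.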

The capture shows:  # captured on ens33:
No.     Time                          Source                Destination           Protocol Length Info
675     2021-07-31 17:10:56.190480    10.244.0.3            10.244.1.7            ICMP     118    Echo (ping) request  id=0x4a00, seq=0/0, ttl=63 (reply in 676)

Frame 675: 118 bytes on wire (944 bits), 118 bytes captured (944 bits)
Ethernet II, Src: Vmware_bd:fb:4a (00:0c:29:bd:fb:4a), Dst: Vmware_e2:bf:86 (00:0c:29:e2:bf:86)  # outer MAC addresses
Internet Protocol Version 4, Src: 172.12.1.11, Dst: 172.12.1.12
Internet Protocol Version 4, Src: 10.244.0.3, Dst: 10.244.1.7
Internet Control Message Protocol

No.     Time                          Source                Destination           Protocol Length Info
676     2021-07-31 17:10:56.190828    10.244.1.7            10.244.0.3            ICMP     118    Echo (ping) reply    id=0x4a00, seq=0/0, ttl=63 (request in 675)

Frame 676: 118 bytes on wire (944 bits), 118 bytes captured (944 bits)
Ethernet II, Src: Vmware_e2:bf:86 (00:0c:29:e2:bf:86), Dst: Vmware_bd:fb:4a (00:0c:29:bd:fb:4a)
Internet Protocol Version 4, Src: 172.12.1.12, Dst: 172.12.1.11
Internet Protocol Version 4, Src: 10.244.1.7, Dst: 10.244.0.3                                    # inner (Pod) addresses
Internet Control Message Protocol
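As a closing check, the frame sizes across the captures and the flannel.ipip MTU of 1480 are all explained by the single 20-byte outer IPv4 header that IPIP adds:

```shell
inner_pkt=84    # inner IPv4 packet: 20 (IP) + 8 (ICMP) + 56 (payload)
outer_ip=20     # IPIP encapsulation = exactly one extra IPv4 header
eth_hdr=14      # Ethernet II header on the physical NIC

echo "ens33 frame:      $(( eth_hdr + outer_ip + inner_pkt )) bytes"  # 118, as captured
echo "flannel.ipip MTU: $(( 1500 - outer_ip ))"                       # 1480, as configured
```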