ENV:
kubeadm init --kubernetes-version=v1.20.5 --image-repository registry.aliyuncs.com/google_containers --pod-network-cidr=10.244.0.0/16 --service-cidr=10.96.0.0/12 --skip-phases=addon/kube-proxy --ignore-preflight-errors=Swap

[root@dev3 ~]# kubectl get nodes -o wide
NAME   STATUS   ROLES                  AGE   VERSION   INTERNAL-IP    EXTERNAL-IP   OS-IMAGE                KERNEL-VERSION               CONTAINER-RUNTIME
dev1   Ready    control-plane,master   30d   v1.20.5   192.168.2.31   <none>        CentOS Linux 7 (Core)   5.15.8-1.el7.elrepo.x86_64   docker://20.10.12
dev2   Ready    <none>                 30d   v1.20.5   192.168.2.32   <none>        CentOS Linux 7 (Core)   5.15.8-1.el7.elrepo.x86_64   docker://20.10.12
dev3   Ready    <none>                 30d   v1.20.5   192.168.2.33   <none>        CentOS Linux 7 (Core)   5.15.8-1.el7.elrepo.x86_64   docker://20.10.12
[root@dev3 ~]#
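Because kubeadm was run with --skip-phases=addon/kube-proxy, it is worth confirming that no kube-proxy DaemonSet exists before installing Cilium as its replacement; a minimal check (expected to return NotFound on this cluster):
```properties
# kube-proxy was skipped at kubeadm init time, so this should report NotFound
kubectl -n kube-system get ds kube-proxy
```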
1. Add the repo:
helm repo add cilium https://helm.cilium.io/
2. Generate the installation template:
helm template cilium cilium/cilium --version 1.10.6 \
--namespace kube-system \
--set kubeProxyReplacement=strict \
--set k8sServiceHost=192.168.2.31 \
--set k8sServicePort=6443 > 1.10.6_bpf_nativerouting.yaml
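The rendered manifest then has to be applied and the agent pods watched until they are Running; a minimal sketch using the file generated above:
```properties
kubectl apply -f 1.10.6_bpf_nativerouting.yaml
kubectl -n kube-system get pods -l k8s-app=cilium -o wide -w
```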
3. Add the following parameters to set the log level to debug (see the ConfigMap sketch after this step):
debug-verbose: "datapath"
debug: "true"
monitor-aggregation: "none"
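These keys go into the cilium-config ConfigMap of the rendered manifest; a minimal sketch of the relevant data section (only the keys touched in this step are shown):
```properties
apiVersion: v1
kind: ConfigMap
metadata:
  name: cilium-config
  namespace: kube-system
data:
  debug: "true"
  debug-verbose: "datapath"
  monitor-aggregation: "none"
```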
4. Based on the log below, set the following parameter (an equivalent Helm-side setting is sketched after this step):
[root@dev1 kubernetes]# kubectl -nkube-system logs -f cilium-7h7jl| grep -i falling
level=info msg="BPF host routing requires enable-bpf-masquerade. Falling back to legacy host routing (enable-host-legacy-routing=true)." subsys=daemon
enable-bpf-masquerade: "true"
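The same setting can also be baked in at render time; the 1.10 Helm chart exposes it as bpf.masquerade (a sketch, verify the value name against the chart version you use):
```properties
helm template cilium cilium/cilium --version 1.10.6 \
  --namespace kube-system \
  --set kubeProxyReplacement=strict \
  --set bpf.masquerade=true \
  --set k8sServiceHost=192.168.2.31 \
  --set k8sServicePort=6443 > 1.10.6_bpf_nativerouting.yaml
```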
5. Set the routing mode to Native Routing:
# Encapsulation mode for communication between nodes
# Possible values:
# - disabled
# - vxlan (default)
# - geneve
tunnel: disabled
6. Troubleshooting
level=fatal msg="Error while creating daemon" error="invalid daemon configuration: native routing cidr must be configured with option --native-routing-cidr in combination with --masquerade --tunnel=disabled --ipam=cluster-pool --enable-ipv4=true" subsys=daemon
level=debug msg="Controller func execution time: 1.512µs" name=endpoint-gc subsys=controller uuid=b432ab1b-f6b4-4038-a6af-0641205399a5

Native-Routing
  36. The native routing datapath is enabled with tunnel: disabled and enables the native packet forwarding mode. The native packet forwarding mode leverages the routing capabilities of the network Cilium runs on instead of performing encapsulation.
  37. In native routing mode, Cilium will delegate all packets which are not addressed to another local endpoint to the routing subsystem of the Linux kernel. This means that the packet will be routed as if a local process would have emitted the packet. As a result, the network connecting the cluster nodes must be capable of routing PodCIDRs.
  38. Cilium automatically enables IP forwarding in the Linux kernel when native routing is configured.
  39. Requirements on the network
In order to run the native routing mode, the network connecting the hosts on which Cilium is running must be capable of forwarding IP traffic using addresses given to pods or other workloads.
The Linux kernel on the node must be aware of how to forward packets of pods or other workloads of all nodes running Cilium. This can be achieved in two ways:
  42. The node itself does not know how to route all pod IPs but a router exists on the network that knows how to reach all other pods. In this scenario, the Linux node is configured to contain a default route to point to such a router. This model is used for cloud provider network integration. See Google Cloud, AWS ENI, and Azure IPAM for more details.
  43. Each individual node is made aware of all pod IPs of all other nodes and routes are inserted into the Linux kernel routing table to represent this. If all nodes share a single L2 network, then this can be taken care of by enabling the option auto-direct-node-routes: true. Otherwise, an additional system component such as a BGP daemon must be run to distribute the routes. See the guide Using kube-router to run BGP on how to achieve this using the kube-router project.
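With all three lab nodes on one L2 segment, auto-direct-node-routes: true lets each agent install the per-PodCIDR routes itself; a quick sanity check on any node could be (the 10.244.x.0/24 PodCIDRs are this lab's values):
```properties
# The other nodes' PodCIDRs should appear as plain kernel routes via their node IPs
ip route show | grep 10.244.
```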
Configuration
The following configuration options must be set to run the datapath in native routing mode:
[The main configuration happens here. Note that parameter names differ between versions, so always check the official docs for the version you are running. A concrete sketch for this cluster follows below.]
tunnel: disabled: Enable native routing mode.
native-routing-cidr: x.x.x.x/y: Set the CIDR in which native routing can be performed.
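Putting the fatal error from step 6 and the docs together for this cluster (pod network 10.244.0.0/16 from the kubeadm init above), the cilium-config additions could look like this; the key names are 1.10-specific, and auto-direct-node-routes is an assumption that fits this single-L2 lab:
```properties
tunnel: "disabled"
native-routing-cidr: "10.244.0.0/16"
enable-bpf-masquerade: "true"
auto-direct-node-routes: "true"   # all nodes share the 192.168.2.0/24 L2 segment
```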
7. Enable host routing
[root@dev1 kubernetes]# kubectl -nkube-system logs cilium-m62pk | grep -i fall
level=info msg=" --enable-xt-socket-fallback='true'" subsys=daemon
level=debug msg="Skipping Leases support fallback discovery" subsys=k8s
level=info msg="BPF host routing requires enable-bpf-masquerade. Falling back to legacy host routing (enable-host-legacy-routing=true)." subsys=daemon
[root@dev1 kubernetes]#
Add the following (and restart the agents afterwards, as sketched below):
# If you want to run cilium in debug mode change this value to true
# debug: "false"
debug-verbose: "datapath"
debug: "true"
monitor-aggregation: "none"
enable-bpf-masquerade: "true" # added based on the log above
# The agent can be put into the following three policy enforcement modes
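Running agents do not pick up ConfigMap edits on their own, so after changing cilium-config the DaemonSet has to be restarted; one way to do this:
```properties
kubectl -n kube-system edit configmap cilium-config
kubectl -n kube-system rollout restart ds/cilium
kubectl -n kube-system rollout status ds/cilium
```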
8. Check the running mode:
[root@dev1 kubernetes]# kubectl -nkube-system exec -it cilium-5z5k7 -- cilium status
KVStore:                Ok   Disabled
Kubernetes:             Ok   1.20 (v1.20.5) [linux/amd64]
Kubernetes APIs:        ["cilium/v2::CiliumClusterwideNetworkPolicy", "cilium/v2::CiliumEndpoint", "cilium/v2::CiliumNetworkPolicy", "cilium/v2::CiliumNode", "core/v1::Namespace", "core/v1::Node", "core/v1::Pods", "core/v1::Service", "discovery/v1beta1::EndpointSlice", "networking.k8s.io/v1::NetworkPolicy"]
KubeProxyReplacement:   Strict   [ens33 192.168.2.33 (Direct Routing)]
Cilium:                 Ok   1.10.6 (v1.10.6-17d3d15)
NodeMonitor:            Listening for events on 128 CPUs with 64x4096 of shared memory
Cilium health daemon:   Ok
IPAM:                   IPv4: 3/254 allocated from 10.244.2.0/24,
BandwidthManager:       Disabled
Host Routing:           BPF
Masquerading:           BPF   [ens33]   10.0.0.0/8 [IPv4: Enabled, IPv6: Disabled]
Controller Status:      25/25 healthy
Proxy Status:           OK, ip 10.244.2.85, 0 redirects active on ports 10000-20000
Hubble:                 Ok   Current/Max Flows: 4095/4095 (100.00%), Flows/s: 3.22   Metrics: Disabled
Encryption:             Disabled
Cluster health:         3/3 reachable   (2022-01-19T06:44:27Z)
[root@dev1 kubernetes]#
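The lines that matter here are Host Routing: BPF and Masquerading: BPF; a quick way to check them on every agent at once (a sketch, pod names differ per cluster):
```properties
for p in $(kubectl -n kube-system get pods -l k8s-app=cilium -o name); do
  kubectl -n kube-system exec $p -- cilium status | grep -E "Host Routing|Masquerading|KubeProxyReplacement"
done
```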

[Figure 1: Cilium Host Routing Enabled with BPF Mode (Native Routing)]

- [x] **1. Pod-to-Pod communication on the same node**
```properties
env:
[root@dev1 kubernetes]# kubectl get pods -o wide -w
NAME                          READY   STATUS    RESTARTS   AGE   IP             NODE   NOMINATED NODE   READINESS GATES
cni-client-7ccd98bdb8-kt9m4   1/1     Running   0          17s   10.244.0.102   dev1
cni-same-7d599bc5d8-cgbgs     1/1     Running   0          29s   10.244.0.25    dev1
^C[root@dev1 kubernetes]# kubectl exec -it cni-client-7ccd98bdb8-kt9m4 -- ping -c 2 10.244.0.25
PING 10.244.0.25 (10.244.0.25): 56 data bytes
64 bytes from 10.244.0.25: seq=0 ttl=63 time=0.223 ms
64 bytes from 10.244.0.25: seq=1 ttl=63 time=0.151 ms

--- 10.244.0.25 ping statistics ---
2 packets transmitted, 2 packets received, 0% packet loss
round-trip min/avg/max = 0.151/0.187/0.223 ms
[root@dev1 kubernetes]# kubectl exec -it cni-client-7ccd98bdb8-kt9m4 -- ping -c 2 10.244.0.25
PING 10.244.0.25 (10.244.0.25): 56 data bytes
64 bytes from 10.244.0.25: seq=0 ttl=63 time=0.132 ms
64 bytes from 10.244.0.25: seq=1 ttl=63 time=0.165 ms

--- 10.244.0.25 ping statistics ---
2 packets transmitted, 2 packets received, 0% packet loss
round-trip min/avg/max = 0.132/0.148/0.165 ms
[root@dev1 kubernetes]#
[The same-node case is actually handled by the bpf_redirect_peer capability, exactly as in VXLAN mode.]

[ICMP Request]
[root@dev1 ~]# tcpdump -pne -i lxc89c8c94364d6
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on lxc89c8c94364d6, link-type EN10MB (Ethernet), capture size 262144 bytes
14:48:43.692133 1e:42:49:8a:4e:bb > d2:eb:8d:a6:d4:e7, ethertype IPv4 (0x0800), length 98: 10.244.0.102 > 10.244.0.25: ICMP echo request, id 7936, seq 0, length 64
14:48:44.692423 1e:42:49:8a:4e:bb > d2:eb:8d:a6:d4:e7, ethertype IPv4 (0x0800), length 98: 10.244.0.102 > 10.244.0.25: ICMP echo request, id 7936, seq 1, length 64
^C
2 packets captured
2 packets received by filter
0 packets dropped by kernel
[root@dev1 ~]#

5. Log from cilium monitor -vv:
[root@dev1 kubernetes]# kubectl exec -it cni-client-7ccd98bdb8-kt9m4 -- ping -c 1 10.244.0.25
PING 10.244.0.25 (10.244.0.25): 56 data bytes
64 bytes from 10.244.0.25: seq=0 ttl=63 time=0.186 ms

--- 10.244.0.25 ping statistics ---
1 packets transmitted, 1 packets received, 0% packet loss
round-trip min/avg/max = 0.186/0.186/0.186 ms
[root@dev1 kubernetes]#
[Here it "looks like" there are two Conntrack lookups, but further down there is: Attempting local delivery for container id 149 from seclabel 22099, at]
CPU 03: MARK 0x0 FROM 2073 DEBUG: Conntrack lookup 1/2: src=10.244.0.102:12544 dst=10.244.0.25:0
CPU 03: MARK 0x0 FROM 2073 DEBUG: Conntrack lookup 2/2: nexthdr=1 flags=1
CPU 03: MARK 0x0 FROM 2073 DEBUG: CT verdict: New, revnat=0
CPU 03: MARK 0x0 FROM 2073 DEBUG: Successfully mapped addr=10.244.0.25 to identity=11575
CPU 03: MARK 0x0 FROM 2073 DEBUG: Conntrack create: proxy-port=0 revnat=0 src-identity=22099 lb=0.0.0.0
CPU 03: MARK 0x0 FROM 2073 DEBUG: Attempting local delivery for container id 149 from seclabel 22099
CPU 03: MARK 0x0 FROM 149 DEBUG: Conntrack lookup 1/2: src=10.244.0.102:12544 dst=10.244.0.25:0
CPU 03: MARK 0x0 FROM 149 DEBUG: Conntrack lookup 2/2: nexthdr=1 flags=0
CPU 03: MARK 0x0 FROM 149 DEBUG: CT verdict: New, revnat=0

CPU 03: MARK 0x0 FROM 149 DEBUG: Conntrack create: proxy-port=0 revnat=0 src-identity=22099 lb=0.0.0.0

Ethernet        {Contents=[..14..] Payload=[..86..] SrcMAC=c6:dc:81:06:fc:cb DstMAC=1e:ea:b6:69:74:23 EthernetType=IPv4 Length=0}
IPv4    {Contents=[..20..] Payload=[..64..] Version=4 IHL=5 TOS=0 Length=84 Id=17761 Flags=DF FragOffset=0 TTL=63 Protocol=ICMPv4 Checksum=57313 SrcIP=10.244.0.102 DstIP=10.244.0.25 Options=[] Padding=[]}
ICMPv4  {Contents=[..8..] Payload=[..56..] TypeCode=EchoRequest Checksum=11788 Id=12544 Seq=0}
CPU 03: MARK 0x0 FROM 149 to-endpoint: 98 bytes (98 captured), state new, interface lxc1ddf0c103b7b identity 22099->11575, orig-ip 10.244.0.102, to endpoint 149
CPU 03: MARK 0x0 FROM 149 DEBUG: Conntrack lookup 1/2: src=10.244.0.25:0 dst=10.244.0.102:12544
CPU 03: MARK 0x0 FROM 149 DEBUG: Conntrack lookup 2/2: nexthdr=1 flags=1
CPU 03: MARK 0x0 FROM 149 DEBUG: CT entry found lifetime=9819, revnat=0
CPU 03: MARK 0x0 FROM 149 DEBUG: CT verdict: Reply, revnat=0
CPU 03: MARK 0x0 FROM 149 DEBUG: Successfully mapped addr=10.244.0.102 to identity=22099
CPU 03: MARK 0x0 FROM 149 DEBUG: Attempting local delivery for container id 2073 from seclabel 11575
CPU 03: MARK 0x0 FROM 2073 DEBUG: Conntrack lookup 1/2: src=10.244.0.25:0 dst=10.244.0.102:12544
CPU 03: MARK 0x0 FROM 2073 DEBUG: Conntrack lookup 2/2: nexthdr=1 flags=0
CPU 03: MARK 0x0 FROM 2073 DEBUG: CT entry found lifetime=9819, revnat=0

CPU 03: MARK 0x0 FROM 2073 DEBUG: CT verdict: Reply, revnat=0

Ethernet        {Contents=[..14..] Payload=[..86..] SrcMAC=d2:eb:8d:a6:d4:e7 DstMAC=1e:42:49:8a:4e:bb EthernetType=IPv4 Length=0}
IPv4    {Contents=[..20..] Payload=[..64..] Version=4 IHL=5 TOS=0 Length=84 Id=11330 Flags= FragOffset=0 TTL=63 Protocol=ICMPv4 Checksum=14593 SrcIP=10.244.0.25 DstIP=10.244.0.102 Options=[] Padding=[]}
ICMPv4  {Contents=[..8..] Payload=[..56..] TypeCode=EchoReply Checksum=13836 Id=12544 Seq=0}
CPU 03: MARK 0x0 FROM 2073 to-endpoint: 98 bytes (98 captured), state reply, interface lxc89c8c94364d6 identity 11575->22099, orig-ip 10.244.0.25, to endpoint 2073
```
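To reproduce a trace like the one above, cilium monitor can be scoped to one of the two endpoints; a sketch (the endpoint IDs 149 and 2073 come from this capture, and <cilium-pod> stands for any Cilium agent pod on dev1):
```properties
# List the endpoint IDs of the pods on this node
kubectl -n kube-system exec -it <cilium-pod> -- cilium endpoint list
# Follow verbose datapath events related to one endpoint
kubectl -n kube-system exec -it <cilium-pod> -- cilium monitor -vv --related-to 2073
```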


- [x] **2. Pod-to-Pod communication across different nodes**
```properties
env:
[root@dev1 ~]# kubectl get pods -o wide 
NAME                          READY   STATUS    RESTARTS   AGE     IP             NODE   NOMINATED NODE   READINESS GATES
cni-client-7ccd98bdb8-kt9m4   1/1     Running   0          118m    10.244.0.102   dev1   <none>           <none>
cni-server-7559c58f9-h72mw    1/1     Running   0          8m18s   10.244.2.145   dev3   <none>           <none>
[root@dev1 ~]# 

CPU 03: MARK 0x0 FROM 472 DEBUG: Successfully mapped addr=192.168.2.2 to identity=2
CPU 03: MARK 0x0 FROM 472 DEBUG: Successfully mapped addr=192.168.2.2 to identity=2
CPU 04: MARK 0x0 FROM 2073 DEBUG: Conntrack lookup 1/2: src=10.244.0.102:22016 dst=10.244.2.145:0
CPU 04: MARK 0x0 FROM 2073 DEBUG: Conntrack lookup 2/2: nexthdr=1 flags=1
CPU 04: MARK 0x0 FROM 2073 DEBUG: CT verdict: New, revnat=0
CPU 04: MARK 0x0 FROM 2073 DEBUG: Successfully mapped addr=10.244.2.145 to identity=17840
CPU 04: MARK 0x0 FROM 2073 DEBUG: Conntrack create: proxy-port=0 revnat=0 src-identity=22099 lb=0.0.0.0
CPU 03: MARK 0x0 FROM 472 DEBUG: Successfully mapped addr=10.244.2.145 to identity=17840
CPU 03: MARK 0x0 FROM 472 DEBUG: Attempting local delivery for container id 2073 from seclabel 17840
CPU 03: MARK 0x0 FROM 2073 DEBUG: Conntrack lookup 1/2: src=10.244.2.145:0 dst=10.244.0.102:22016
CPU 03: MARK 0x0 FROM 2073 DEBUG: Conntrack lookup 2/2: nexthdr=1 flags=0
CPU 03: MARK 0x0 FROM 2073 DEBUG: CT entry found lifetime=16257, revnat=0
CPU 03: MARK 0x0 FROM 2073 DEBUG: CT verdict: Reply, revnat=0
------------------------------------------------------------------------------
level=info msg="Initializing dissection cache..." subsys=monitor
Ethernet        {Contents=[..14..] Payload=[..86..] SrcMAC=d2:eb:8d:a6:d4:e7 DstMAC=1e:42:49:8a:4e:bb EthernetType=IPv4 Length=0}
IPv4    {Contents=[..20..] Payload=[..64..] Version=4 IHL=5 TOS=0 Length=84 Id=63334 Flags= FragOffset=0 TTL=62 Protocol=ICMPv4 Checksum=27748 SrcIP=10.244.2.145 DstIP=10.244.0.102 Options=[] Padding=[]}
ICMPv4  {Contents=[..8..] Payload=[..56..] TypeCode=EchoReply Checksum=170 Id=22016 Seq=0}
CPU 03: MARK 0x0 FROM 2073 to-endpoint: 98 bytes (98 captured), state reply, interface lxc89c8c94364d6 identity 17840->22099, orig-ip 10.244.2.145, to endpoint 2073
CPU 03: MARK 0x0 FROM 472 DEBUG: Successfully mapped addr=192.168.2.2 to identity=2
CPU 03: MARK 0x0 FROM 472 DEBUG: Successfully mapped addr=192.168.2.2 to identity=2
```
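Because this is native routing, cross-node pod traffic should appear on the node uplink with the original pod IPs and no VXLAN encapsulation; a quick check on dev1 (interface ens33 taken from the cilium status output above):
```properties
# Expect plain ICMP between 10.244.0.102 and 10.244.2.145, not UDP/8472 VXLAN frames
tcpdump -pne -i ens33 icmp and host 10.244.2.145
```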

[Figure 2: Cilium Host Routing Enabled with BPF Mode (Native Routing)]
[Figure 3: Cilium Host Routing Enabled with BPF Mode (Native Routing)]