ENV:
kubeadm init --kubernetes-version=v1.20.5 --image-repository registry.aliyuncs.com/google_containers --pod-network-cidr=10.244.0.0/16 --service-cidr=10.96.0.0/12 --skip-phases=addon/kube-proxy --ignore-preflight-errors=Swap
[root@dev3 ~]# kubectl get nodes -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
dev1 Ready control-plane,master 30d v1.20.5 192.168.2.31 <none> CentOS Linux 7 (Core) 5.15.8-1.el7.elrepo.x86_64 docker://20.10.12
dev2 Ready <none> 30d v1.20.5 192.168.2.32 <none> CentOS Linux 7 (Core) 5.15.8-1.el7.elrepo.x86_64 docker://20.10.12
dev3 Ready <none> 30d v1.20.5 192.168.2.33 <none> CentOS Linux 7 (Core) 5.15.8-1.el7.elrepo.x86_64 docker://20.10.12
[root@dev3 ~]#
1. Add the Helm repo:
helm repo add cilium https://helm.cilium.io/
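After adding the repo, refreshing the local index makes sure the pinned 1.10.6 chart used below resolves; a routine step, shown here for completeness:

```bash
# Refresh the local chart index and confirm the pinned chart version is visible
helm repo update
helm search repo cilium/cilium --version 1.10.6
```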
2. Generate the installation manifest:
helm template cilium cilium/cilium --version 1.10.6 \
--namespace kube-system \
--set kubeProxyReplacement=strict \
--set k8sServiceHost=192.168.2.31 \
--set k8sServicePort=6443 > 1.10.6_bpf_nativerouting.yaml
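The rendered manifest can then be applied and the rollout watched. A minimal sketch, assuming the file name from the command above and the standard k8s-app=cilium label on the Cilium DaemonSet:

```bash
# Apply the rendered Cilium manifest and wait for the agents to become ready
kubectl apply -f 1.10.6_bpf_nativerouting.yaml
kubectl -n kube-system rollout status daemonset/cilium
kubectl -n kube-system get pods -l k8s-app=cilium -o wide
```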
3. Add the following parameters to raise the log level to debug:
debug-verbose: "datapath"
debug: "true"
monitor-aggregation: "none"
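These keys end up in the cilium-config ConfigMap rendered by the chart. If you prefer to change them on a running cluster instead of editing the YAML, a hedged sketch (assuming the default cilium-config ConfigMap and cilium DaemonSet names):

```bash
# Turn on datapath debug logging directly in the cilium-config ConfigMap
kubectl -n kube-system patch configmap cilium-config --type merge \
  -p '{"data":{"debug":"true","debug-verbose":"datapath","monitor-aggregation":"none"}}'

# The agents read the ConfigMap at startup, so restart them afterwards
kubectl -n kube-system rollout restart daemonset/cilium
```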
4. Based on the log message below, set this parameter:
[root@dev1 kubernetes]# kubectl -nkube-system logs -f cilium-7h7jl| grep -i falling
level=info msg="BPF host routing requires enable-bpf-masquerade. Falling back to legacy host routing (enable-host-legacy-routing=true)." subsys=daemon
enable-bpf-masquerade: "true"
5. Set the datapath mode to Native Routing:
# Encapsulation mode for communication between nodes
# Possible values:
# - disabled
# - vxlan (default)
# - geneve
tunnel: disabled
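The same setting can be baked into the helm template command from step 2 instead of editing the rendered YAML. A sketch that first discovers the exact value names for this chart version rather than guessing them; keep the other --set flags from step 2 when re-rendering:

```bash
# Discover the chart values that control tunneling / native routing in 1.10.6
helm show values cilium/cilium --version 1.10.6 | grep -iE 'tunnel|routing'

# Then re-render with tunneling disabled (add the --set flags from step 2 as well)
helm template cilium cilium/cilium --version 1.10.6 \
  --namespace kube-system \
  --set tunnel=disabled \
  > 1.10.6_bpf_nativerouting.yaml
```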
6. Troubleshooting:
level=fatal msg="Error while creating daemon" error="invalid daemon configuration: native routing cidr must be configured with option --native-routing-cidr in combination with --masquerade --tunnel=disabled --ipam=cluster-pool --enable-ipv4=true" subsys=daemon
level=debug msg="Controller func execution time: 1.512µs" name=endpoint-gc subsys=controller uuid=b432ab1b-f6b4-4038-a6af-0641205399a5
Native-Routing
The native routing datapath is enabled with tunnel: disabled and enables the native packet forwarding mode. The native packet forwarding mode leverages the routing capabilities of the network Cilium runs on instead of performing encapsulation.
In native routing mode, Cilium will delegate all packets which are not addressed to another local endpoint to the routing subsystem of the Linux kernel. This means that the packet will be routed as if a local process would have emitted the packet. As a result, the network connecting the cluster nodes must be capable of routing PodCIDRs.
Cilium automatically enables IP forwarding in the Linux kernel when native routing is configured.
Requirements on the network
In order to run the native routing mode, the network connecting the hosts on which Cilium is running must be capable of forwarding IP traffic using addresses given to pods or other workloads.
The Linux kernel on the node must be aware of how to forward packets of pods or other workloads of all nodes running Cilium. This can be achieved in two ways:
The node itself does not know how to route all pod IPs but a router exists on the network that knows how to reach all other pods. In this scenario, the Linux node is configured to contain a default route to point to such a router. This model is used for cloud provider network integration. See Google Cloud, AWS ENI, and Azure IPAM for more details.
Each individual node is made aware of all pod IPs of all other nodes and routes are inserted into the Linux kernel routing table to represent this. If all nodes share a single L2 network, then this can be taken care of by enabling the option auto-direct-node-routes: true. Otherwise, an additional system component such as a BGP daemon must be run to distribute the routes. See the guide Using kube-router to run BGP on how to achieve this using the kube-router project.
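In this environment all three nodes sit on the same L2 segment (192.168.2.0/24), so the second approach applies. A hedged sketch of what enabling auto-direct-node-routes and checking the resulting routes could look like; the example route is illustrative, built from the PodCIDR and interface names seen elsewhere in these notes:

```bash
# All nodes share one L2 network, so let Cilium install per-node PodCIDR routes itself
kubectl -n kube-system patch configmap cilium-config --type merge \
  -p '{"data":{"auto-direct-node-routes":"true"}}'
kubectl -n kube-system rollout restart daemonset/cilium

# On dev1, expect direct routes to the other nodes' PodCIDRs,
# e.g. 10.244.2.0/24 via 192.168.2.33 dev ens33
ip route | grep 10.244
```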
Configuration
The following configuration options must be set to run the datapath in native routing mode:
[The main configuration is shown here. Note that parameter names differ across versions, so refer to the official documentation for the version you are running.]
tunnel: disabled: Enable native routing mode.
native-routing-cidr: x.x.x.x/y: Set the CIDR in which native routing can be performed.
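For this cluster the fatal error from step 6 maps directly onto these two options; the pod CIDR comes from the kubeadm init command at the top (10.244.0.0/16). A hedged sketch against the cilium-config ConfigMap; as noted above, newer Cilium releases rename this key, so check the docs for your version:

```bash
# Native routing for this cluster: disable tunneling and declare the pod CIDR
kubectl -n kube-system patch configmap cilium-config --type merge \
  -p '{"data":{"tunnel":"disabled","native-routing-cidr":"10.244.0.0/16"}}'
kubectl -n kube-system rollout restart daemonset/cilium
```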
7. Enable BPF host routing:
[root@dev1 kubernetes]# kubectl -nkube-system logs cilium-m62pk | grep -i fall
level=info msg=" --enable-xt-socket-fallback='true'" subsys=daemon
level=debug msg="Skipping Leases support fallback discovery" subsys=k8s
level=info msg="BPF host routing requires enable-bpf-masquerade. Falling back to legacy host routing (enable-host-legacy-routing=true)." subsys=daemon
[root@dev1 kubernetes]#
Add the following, then restart the agents and re-check the log (see the sketch after this snippet):
# If you want to run cilium in debug mode change this value to true
# debug: "false"
debug-verbose: "datapath"
debug: "true"
monitor-aggregation: "none"
enable-bpf-masquerade: "true" # added based on the log message above.
# The agent can be put into the following three policy enforcement modes
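A minimal way to confirm the fallback message is gone after the change, assuming the agents are restarted so they pick up the new ConfigMap values:

```bash
# Restart the agents so they pick up enable-bpf-masquerade
kubectl -n kube-system rollout restart daemonset/cilium
kubectl -n kube-system rollout status daemonset/cilium

# The "Falling back to legacy host routing" message should no longer appear
for p in $(kubectl -n kube-system get pods -l k8s-app=cilium -o name); do
  kubectl -n kube-system logs "$p" | grep -i falling || echo "$p: no fallback message"
done
```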
8. Check the running mode:
[root@dev1 kubernetes]# kubectl -nkube-system exec -it cilium-5z5k7 -- cilium status
KVStore: Ok Disabled
Kubernetes: Ok 1.20 (v1.20.5) [linux/amd64]
Kubernetes APIs: ["cilium/v2::CiliumClusterwideNetworkPolicy", "cilium/v2::CiliumEndpoint", "cilium/v2::CiliumNetworkPolicy", "cilium/v2::CiliumNode", "core/v1::Namespace", "core/v1::Node", "core/v1::Pods", "core/v1::Service", "discovery/v1beta1::EndpointSlice", "networking.k8s.io/v1::NetworkPolicy"]
KubeProxyReplacement: Strict [ens33 192.168.2.33 (Direct Routing)]
Cilium: Ok 1.10.6 (v1.10.6-17d3d15)
NodeMonitor: Listening for events on 128 CPUs with 64x4096 of shared memory
Cilium health daemon: Ok
IPAM: IPv4: 3/254 allocated from 10.244.2.0/24,
BandwidthManager: Disabled
Host Routing: BPF
Masquerading: BPF [ens33] 10.0.0.0/8 [IPv4: Enabled, IPv6: Disabled]
Controller Status: 25/25 healthy
Proxy Status: OK, ip 10.244.2.85, 0 redirects active on ports 10000-20000
Hubble: Ok Current/Max Flows: 4095/4095 (100.00%), Flows/s: 3.22 Metrics: Disabled
Encryption: Disabled
Cluster health: 3/3 reachable (2022-01-19T06:44:27Z)
[root@dev1 kubernetes]#
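Since kube-proxy was skipped at kubeadm init time, service handling should be entirely in eBPF. A quick hedged cross-check, reusing the agent pod name from the status command above as an example:

```bash
# No kube-proxy DaemonSet should exist in this cluster (skipped at kubeadm init)
kubectl -n kube-system get daemonset kube-proxy
# expected: Error from server (NotFound)

# ClusterIP/NodePort services are instead programmed as eBPF load-balancer entries
kubectl -n kube-system exec -it cilium-5z5k7 -- cilium service list
```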
- [x] **1. Pod-to-Pod communication on the same node**
```properties
env:
[root@dev1 kubernetes]# kubectl get pods -o wide -w
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
cni-client-7ccd98bdb8-kt9m4 1/1 Running 0 17s 10.244.0.102 dev1
cni-same-7d599bc5d8-cgbgs 1/1 Running 0 29s 10.244.0.25 dev1
^C[root@dev1 kubernetes]# kubectl exec -it cni-client-7ccd98bdb8-kt9m4 -- ping -c 2 10.244.0.25
PING 10.244.0.25 (10.244.0.25): 56 data bytes
64 bytes from 10.244.0.25: seq=0 ttl=63 time=0.223 ms
64 bytes from 10.244.0.25: seq=1 ttl=63 time=0.151 ms

--- 10.244.0.25 ping statistics ---
2 packets transmitted, 2 packets received, 0% packet loss
round-trip min/avg/max = 0.151/0.187/0.223 ms
[root@dev1 kubernetes]# kubectl exec -it cni-client-7ccd98bdb8-kt9m4 -- ping -c 2 10.244.0.25
PING 10.244.0.25 (10.244.0.25): 56 data bytes
64 bytes from 10.244.0.25: seq=0 ttl=63 time=0.132 ms
64 bytes from 10.244.0.25: seq=1 ttl=63 time=0.165 ms

--- 10.244.0.25 ping statistics ---
2 packets transmitted, 2 packets received, 0% packet loss
round-trip min/avg/max = 0.132/0.148/0.165 ms
[root@dev1 kubernetes]#
Same-node traffic here is actually handled by the bpf_redirect_peer capability, just as in VXLAN mode.
[ICMP Request]
[root@dev1 ~]# tcpdump -pne -i lxc89c8c94364d6
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on lxc89c8c94364d6, link-type EN10MB (Ethernet), capture size 262144 bytes
14:48:43.692133 1e:42:49:8a:4e:bb > d2:eb:8d:a6:d4:e7, ethertype IPv4 (0x0800), length 98: 10.244.0.102 > 10.244.0.25: ICMP echo request, id 7936, seq 0, length 64
14:48:44.692423 1e:42:49:8a:4e:bb > d2:eb:8d:a6:d4:e7, ethertype IPv4 (0x0800), length 98: 10.244.0.102 > 10.244.0.25: ICMP echo request, id 7936, seq 1, length 64
^C
2 packets captured
2 packets received by filter
0 packets dropped by kernel
[root@dev1 ~]#
5. cilium monitor -vv log:
[root@dev1 kubernetes]# kubectl exec -it cni-client-7ccd98bdb8-kt9m4 -- ping -c 1 10.244.0.25
PING 10.244.0.25 (10.244.0.25): 56 data bytes
64 bytes from 10.244.0.25: seq=0 ttl=63 time=0.186 ms

--- 10.244.0.25 ping statistics ---
1 packets transmitted, 1 packets received, 0% packet loss
round-trip min/avg/max = 0.186/0.186/0.186 ms
[root@dev1 kubernetes]#
[Here it looks as though there are two conntrack lookups, but note the later line: "Attempting local delivery for container id 149 from seclabel 22099".]
CPU 03: MARK 0x0 FROM 2073 DEBUG: Conntrack lookup 1/2: src=10.244.0.102:12544 dst=10.244.0.25:0
CPU 03: MARK 0x0 FROM 2073 DEBUG: Conntrack lookup 2/2: nexthdr=1 flags=1
CPU 03: MARK 0x0 FROM 2073 DEBUG: CT verdict: New, revnat=0
CPU 03: MARK 0x0 FROM 2073 DEBUG: Successfully mapped addr=10.244.0.25 to identity=11575
CPU 03: MARK 0x0 FROM 2073 DEBUG: Conntrack create: proxy-port=0 revnat=0 src-identity=22099 lb=0.0.0.0
CPU 03: MARK 0x0 FROM 2073 DEBUG: Attempting local delivery for container id 149 from seclabel 22099
CPU 03: MARK 0x0 FROM 149 DEBUG: Conntrack lookup 1/2: src=10.244.0.102:12544 dst=10.244.0.25:0
CPU 03: MARK 0x0 FROM 149 DEBUG: Conntrack lookup 2/2: nexthdr=1 flags=0
CPU 03: MARK 0x0 FROM 149 DEBUG: CT verdict: New, revnat=0
CPU 03: MARK 0x0 FROM 149 DEBUG: Conntrack create: proxy-port=0 revnat=0 src-identity=22099 lb=0.0.0.0
Ethernet {Contents=[..14..] Payload=[..86..] SrcMAC=c6:dc:81:06:fc:cb DstMAC=1e:ea:b6:69:74:23 EthernetType=IPv4 Length=0}
IPv4 {Contents=[..20..] Payload=[..64..] Version=4 IHL=5 TOS=0 Length=84 Id=17761 Flags=DF FragOffset=0 TTL=63 Protocol=ICMPv4 Checksum=57313 SrcIP=10.244.0.102 DstIP=10.244.0.25 Options=[] Padding=[]}
ICMPv4 {Contents=[..8..] Payload=[..56..] TypeCode=EchoRequest Checksum=11788 Id=12544 Seq=0}
CPU 03: MARK 0x0 FROM 149 to-endpoint: 98 bytes (98 captured), state new, interface lxc1ddf0c103b7b, identity 22099->11575, orig-ip 10.244.0.102, to endpoint 149
CPU 03: MARK 0x0 FROM 149 DEBUG: Conntrack lookup 1/2: src=10.244.0.25:0 dst=10.244.0.102:12544
CPU 03: MARK 0x0 FROM 149 DEBUG: Conntrack lookup 2/2: nexthdr=1 flags=1
CPU 03: MARK 0x0 FROM 149 DEBUG: CT entry found lifetime=9819, revnat=0
CPU 03: MARK 0x0 FROM 149 DEBUG: CT verdict: Reply, revnat=0
CPU 03: MARK 0x0 FROM 149 DEBUG: Successfully mapped addr=10.244.0.102 to identity=22099
CPU 03: MARK 0x0 FROM 149 DEBUG: Attempting local delivery for container id 2073 from seclabel 11575
CPU 03: MARK 0x0 FROM 2073 DEBUG: Conntrack lookup 1/2: src=10.244.0.25:0 dst=10.244.0.102:12544
CPU 03: MARK 0x0 FROM 2073 DEBUG: Conntrack lookup 2/2: nexthdr=1 flags=0
CPU 03: MARK 0x0 FROM 2073 DEBUG: CT entry found lifetime=9819, revnat=0
CPU 03: MARK 0x0 FROM 2073 DEBUG: CT verdict: Reply, revnat=0
Ethernet {Contents=[..14..] Payload=[..86..] SrcMAC=d2:eb:8d:a6:d4:e7 DstMAC=1e:42:49:8a:4e:bb EthernetType=IPv4 Length=0}
IPv4 {Contents=[..20..] Payload=[..64..] Version=4 IHL=5 TOS=0 Length=84 Id=11330 Flags= FragOffset=0 TTL=63 Protocol=ICMPv4 Checksum=14593 SrcIP=10.244.0.25 DstIP=10.244.0.102 Options=[] Padding=[]}
ICMPv4 {Contents=[..8..] Payload=[..56..] TypeCode=EchoReply Checksum=13836 Id=12544 Seq=0}
CPU 03: MARK 0x0 FROM 2073 to-endpoint: 98 bytes (98 captured), state reply, interface lxc89c8c94364d6, identity 11575->22099, orig-ip 10.244.0.25, to endpoint 2073
```
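The endpoint IDs (149, 2073) and the lxc interface in the capture above can be mapped back to pods with the agent CLI. A small sketch; the agent pod name is reused from step 8 and must be the agent running on dev1:

```bash
# Map endpoint IDs and pod IPs to their lxc interfaces and identities
kubectl -n kube-system exec -it cilium-5z5k7 -- cilium endpoint list

# Watch only the events related to one endpoint while repeating the ping
kubectl -n kube-system exec -it cilium-5z5k7 -- cilium monitor -vv --related-to 149
```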
- [x] **2. Pod-to-Pod communication across nodes**
```properties
env:
[root@dev1 ~]# kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
cni-client-7ccd98bdb8-kt9m4 1/1 Running 0 118m 10.244.0.102 dev1 <none> <none>
cni-server-7559c58f9-h72mw 1/1 Running 0 8m18s 10.244.2.145 dev3 <none> <none>
[root@dev1 ~]#
CPU 03: MARK 0x0 FROM 472 DEBUG: Successfully mapped addr=192.168.2.2 to identity=2
CPU 03: MARK 0x0 FROM 472 DEBUG: Successfully mapped addr=192.168.2.2 to identity=2
CPU 04: MARK 0x0 FROM 2073 DEBUG: Conntrack lookup 1/2: src=10.244.0.102:22016 dst=10.244.2.145:0
CPU 04: MARK 0x0 FROM 2073 DEBUG: Conntrack lookup 2/2: nexthdr=1 flags=1
CPU 04: MARK 0x0 FROM 2073 DEBUG: CT verdict: New, revnat=0
CPU 04: MARK 0x0 FROM 2073 DEBUG: Successfully mapped addr=10.244.2.145 to identity=17840
CPU 04: MARK 0x0 FROM 2073 DEBUG: Conntrack create: proxy-port=0 revnat=0 src-identity=22099 lb=0.0.0.0
CPU 03: MARK 0x0 FROM 472 DEBUG: Successfully mapped addr=10.244.2.145 to identity=17840
CPU 03: MARK 0x0 FROM 472 DEBUG: Attempting local delivery for container id 2073 from seclabel 17840
CPU 03: MARK 0x0 FROM 2073 DEBUG: Conntrack lookup 1/2: src=10.244.2.145:0 dst=10.244.0.102:22016
CPU 03: MARK 0x0 FROM 2073 DEBUG: Conntrack lookup 2/2: nexthdr=1 flags=0
CPU 03: MARK 0x0 FROM 2073 DEBUG: CT entry found lifetime=16257, revnat=0
CPU 03: MARK 0x0 FROM 2073 DEBUG: CT verdict: Reply, revnat=0
------------------------------------------------------------------------------
level=info msg="Initializing dissection cache..." subsys=monitor
Ethernet {Contents=[..14..] Payload=[..86..] SrcMAC=d2:eb:8d:a6:d4:e7 DstMAC=1e:42:49:8a:4e:bb EthernetType=IPv4 Length=0}
IPv4 {Contents=[..20..] Payload=[..64..] Version=4 IHL=5 TOS=0 Length=84 Id=63334 Flags= FragOffset=0 TTL=62 Protocol=ICMPv4 Checksum=27748 SrcIP=10.244.2.145 DstIP=10.244.0.102 Options=[] Padding=[]}
ICMPv4 {Contents=[..8..] Payload=[..56..] TypeCode=EchoReply Checksum=170 Id=22016 Seq=0}
CPU 03: MARK 0x0 FROM 2073 to-endpoint: 98 bytes (98 captured), state reply, interface lxc89c8c94364d6, identity 17840->22099, orig-ip 10.244.2.145, to endpoint 2073
CPU 03: MARK 0x0 FROM 472 DEBUG: Successfully mapped addr=192.168.2.2 to identity=2
CPU 03: MARK 0x0 FROM 472 DEBUG: Successfully mapped addr=192.168.2.2 to identity=2
```
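With tunnel: disabled, the request for 10.244.2.145 should leave dev1 over the node network with no VXLAN header. A hedged way to confirm this on dev1, using the interface name from the status output above:

```bash
# The route should point at the remote node or an upstream router, not cilium_vxlan
ip route get 10.244.2.145

# The ICMP request should appear on the physical interface with pod IPs intact
tcpdump -pne -i ens33 icmp and host 10.244.2.145
```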