Introduction
Cluster network environment
NIC name: ens192
10.103.22.183 master01
10.103.22.184 node02
10.103.22.185 node03
What is Calico
Calico is an open source networking and network security solution for containers, virtual machines, and native host-based workloads. Calico supports a broad range of platforms including Kubernetes, OpenShift, Docker EE, OpenStack, and bare metal services. Combining flexible networking capabilities with security enforcement that runs anywhere, Calico delivers a solution with native Linux kernel performance and true cloud-native scalability.
Why use Calico
- Best practices for network security
Calico's rich network policy model makes it easy to lock down communication so the only traffic that flows is the traffic you want. You can think of Calico's security enforcement as wrapping each of your workloads in its own personal firewall that is dynamically reconfigured in real time as you deploy new services or scale applications up and down.
Calico's policy engine can enforce the same policy model at the host networking layer and (if you are using Istio & Envoy) at the service mesh layer, protecting your infrastructure from compromised workloads.
- Performance
Calico uses the highly optimized forwarding and access control capabilities built into the Linux kernel to deliver native Linux networking data-plane performance, typically without any of the encap/decap overhead associated with first-generation SDN networks.
- Scalability
Calico's core design principles draw on best-practice cloud-native design patterns combined with standards-based network protocols trusted by the world's largest internet carriers. The result is a solution with exceptional scalability that has been running at scale in production for years. Calico's development and test cycle includes regularly testing clusters with thousands of nodes. Whether you run a 10-node cluster, a 100-node cluster, or more, you benefit from the performance and scalability improvements demanded by the largest Kubernetes clusters.
- Interoperability
Calico lets Kubernetes workloads and non-Kubernetes or legacy workloads communicate seamlessly and securely. Kubernetes pods are first-class citizens on your network, able to talk to any other workload on it. In addition, Calico can seamlessly extend to secure your existing host-based workloads alongside Kubernetes, whether in the public cloud, on-prem on VMs, or on bare-metal servers. All workloads are subject to the same network policy model, so the only traffic allowed to flow is the traffic you intend to flow.
- Works the way Linux works
Calico uses the Linux primitives that existing system administrators are already familiar with. Type your favorite Linux networking command and you get the result you expect. In the vast majority of deployments, the packet that leaves your application is the packet that goes on the wire, with no encapsulation, tunnelling, or overlays, so all the existing tools that system and network administrators use to gain visibility and analyze network issues keep working exactly as they do today.
- Support for Kubernetes network policy
Calico's network policy engine formed the original reference implementation of Kubernetes network policy during the development of the API. Calico is distinguished by implementing the full set of features defined by the API, giving users all the capabilities and flexibility envisaged when the API was defined. For users who need even more power, Calico supports an extended set of network policy capabilities that work seamlessly alongside the Kubernetes API, giving users greater flexibility in defining their network policies.
Component overview
Main Calico components
- calico/node: this agent runs as part of the Calico daemon. It manages interfaces and routes, reports the node's status, and enforces policy.
- BIRD: a BGP client that distributes the routes programmed by Felix.
- etcd: an optional distributed datastore.
- Calico Controller: the Calico policy controller.
calico/node
calico/node is a Pod made up of two containers:
- A calico/node container that runs two daemons:
a. Felix
b. the BIRD BGP daemon (optional)
- A calico-cni plugin that responds to CNI requests from the kubelet on the node.
Felix
The Felix component is the heart of Calico networking. It runs on every node in the cluster and is responsible for interface and route management, status reporting, and policy enforcement.
Interface and route management
The Felix daemon programs interfaces and creates routes in the kernel routing table so that pods get routable IP addresses as they are created. Felix creates a virtual network interface for each pod and assigns it an IP address from Calico IPAM. Interfaces are prefixed with cali unless configured otherwise.
Status reporting
Felix exposes metrics for instance status reporting, which can be scraped by monitoring tools such as Prometheus.
Policy enforcement
Felix enforces network policy. Felix watches the labels on pods and compares them with the defined network policy objects to decide whether to allow or deny traffic to the pod. Felix also writes information about interfaces, their IP addresses, and host network state to etcd.
Typha
The Typha daemon sits between the datastore (for example the Kubernetes API server) and the many instances of Felix. Typha's main purpose is to increase scale by reducing each node's impact on the datastore. Services such as Felix and confd connect to Typha instead of connecting to the datastore directly, and Typha maintains a single datastore connection on behalf of all of its clients. It caches the datastore state and de-duplicates events so that they can be fanned out to many listeners.
BIRD
BIRD is a BGP daemon that distributes the routing information written by Felix to the BIRD agents on the other nodes in the cluster. The BIRD agent is installed alongside the Calico daemons in the calico/node pod. This ensures that traffic is routable across nodes. By default Calico creates a full-mesh topology, which means every BIRD agent has to connect to every other BIRD agent in the cluster.
For larger deployments, BIRD can be configured as a route reflector. In a route reflector topology, one BIRD instance acts as a central point that the other BIRD agents communicate with, which also reduces the number of connections each BGP agent has to open.
etcd
Calico uses a distributed datastore called etcd to store Calico resource configuration and network policy rules. The Felix daemon communicates with the etcd datastore to publish the route, node, and interface information for each node.
For higher availability, larger deployments should run a multi-node etcd cluster. In that setup etcd replicates the Calico configuration across the etcd cluster so that it is always in the last known good state.
An alternative deployment model uses the Kubernetes API server as the distributed datastore, which removes the need to build and maintain a separate etcd datastore.
Installing Calico
50 nodes or fewer
curl https://docs.projectcalico.org/manifests/calico.yaml -O
More than 50 nodes
curl https://docs.projectcalico.org/manifests/calico-typha.yaml -o calico.yaml
Adjust the number of calico-typha replicas
apiVersion: apps/v1beta1
kind: Deployment
metadata:
  name: calico-typha
  ...
spec:
  ...
  replicas: <number of replicas>
- The official docs recommend one additional calico-typha replica per 200 nodes
- In general a calico-typha replica count of 3 is recommended for high availability
- calico-typha must not have more replicas than there are Kubernetes nodes
- If the calico-typha replica count is 0, Felix will not start (see the scaling sketch below)
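If the manifest has already been applied, the replica count can also be adjusted in place with kubectl scale. A minimal sketch, assuming the Deployment is named calico-typha in the kube-system namespace as in the manifest above:
# one replica per ~200 nodes, at least 3 for HA, never 0 and never more than the node count
kubectl -n kube-system scale deployment calico-typha --replicas=3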
Installation
Apply
The second installation method (for more than 50 nodes) is used here.
kubectl apply -f calico-typha.yaml
Modify the configuration
apiVersion: apps/v1
kind: Deployment
metadata:
  name: calico-typha
  namespace: kube-system
  labels:
    k8s-app: calico-typha
spec:
  replicas: 2
  revisionHistoryLimit: 2
  selector:
    matchLabels:
      k8s-app: calico-typha
- calico-typha replica count set to 2
# the pod IP address range that Calico allocates from
- name: CALICO_IPV4POOL_CIDR
  value: "192.110.0.0/16"
- Configures the pod address pool (verification sketch below)
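To confirm the value that actually took effect, the environment variable can be read back from the calico-node DaemonSet. A minimal sketch, assuming the default DaemonSet name calico-node in kube-system:
# print the CALICO_IPV4POOL_CIDR env var currently set on the calico-node containers
kubectl -n kube-system get daemonset calico-node \
  -o jsonpath='{.spec.template.spec.containers[0].env[?(@.name=="CALICO_IPV4POOL_CIDR")].value}'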
Check
kubectl get pods -n kube-system -o wide |grep calico
calico-kube-controllers-6d4bfc7c57-jpfwx   1/1   Running   2   50m   192.110.140.65   node02     <none>   <none>
calico-node-prnv8                          1/1   Running   0   50m   10.103.22.183    master01   <none>   <none>
calico-node-rsg5h                          1/1   Running   0   50m   10.103.22.185    node03     <none>   <none>
calico-node-tljfz                          1/1   Running   0   50m   10.103.22.184    node02     <none>   <none>
calico-typha-9dfb6964-q6mtq                1/1   Running   0   50m   10.103.22.185    node03     <none>   <none>
calico-typha-9dfb6964-whndn                1/1   Running   0   50m   10.103.22.184    node02     <none>   <none>
- Everything is running normally
View routes
ip route
default via 10.103.22.1 dev ens192 proto static metric 100
10.103.22.0/24 dev ens192 proto kernel scope link src 10.103.22.184 metric 100
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1
blackhole 192.110.140.64/26 proto bird
192.110.140.65 dev cali9fe3f3b71b4 scope link
192.110.140.68 dev calie60333d71a4 scope link
192.110.140.69 dev calibb530faa265 scope link
192.110.186.192/26 via 10.103.22.185 dev tunl0 proto bird onlink
192.110.241.64/26 via 10.103.22.183 dev tunl0 proto bird onlink
- Calico has added the routes
Installing the calicoctl command
Calico provides a command-line utility named calicoctl for managing Calico configuration. The host running calicoctl needs connectivity to the Calico etcd datastore; alternatively, calicoctl can be configured to connect to the Kubernetes API datastore.
You can run calicoctl as a binary or as a container on any host with network access to the Calico datastore. There are three installation options:
- As a binary on a single host
- As a container on a single host
- As a Kubernetes pod
Installing calicoctl as a binary (the method used here)
Download the calicoctl binary
$ curl -O -L https://github.com/projectcalico/calicoctl/releases/download/v3.11.1/calicoctl
Make the file executable
$ chmod +x calicoctl
Move calicoctl into a directory on your PATH
$ mv calicoctl /usr/local/bin
Configure calicoctl's configuration file
$ cat /etc/calico/calicoctl.cfg
apiVersion: projectcalico.org/v3
kind: CalicoAPIConfig
metadata:
spec:
  datastoreType: "kubernetes"
  kubeconfig: "/root/.kube/config"
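As an alternative to the configuration file, calicoctl can read its datastore settings from environment variables. A minimal sketch, assuming the same kubeconfig path as above:
# one-off invocation without /etc/calico/calicoctl.cfg
DATASTORE_TYPE=kubernetes KUBECONFIG=/root/.kube/config calicoctl get nodes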
Check the result
calicoctl node status
Calico process is running.

IPv4 BGP status
+---------------+-------------------+-------+----------+-------------+
| PEER ADDRESS  |     PEER TYPE     | STATE |  SINCE   |    INFO     |
+---------------+-------------------+-------+----------+-------------+
| 10.103.22.184 | node-to-node mesh | up    | 05:58:02 | Established |
| 10.103.22.185 | node-to-node mesh | up    | 05:58:02 | Established |
+---------------+-------------------+-------+----------+-------------+

IPv6 BGP status
No IPv6 peers found.
- The BGP status is now displayed
Installing calicoctl as a container
$ docker pull calico/ctl:v3.11.1
Running calicoctl as a Kubernetes pod
Deploy the calicoctl container to a node using the YAML that matches your datastore type.
- etcd
$ kubectl apply -f https://docs.projectcalico.org/v3.11/manifests/calicoctl-etcd.yaml
or
- Kubernetes API datastore
$ kubectl apply -f https://docs.projectcalico.org/v3.11/manifests/calicoctl.yaml
- You can then run it through kubectl, for example:
$ kubectl exec -ti -n kube-system calicoctl -- /calicoctl get profiles -o wide
NAME              TAGS
kns.default       kns.default
kns.kube-system   kns.kube-system
- Setting an alias is recommended
$ alias calicoctl="kubectl exec -i -n kube-system calicoctl /calicoctl -- "
[warning] To use the calicoctl alias with a manifest file, redirect the file into stdin, for example:
calicoctl create -f - < my_manifest.yaml
Configuring BGP route reflectors and peers
BGP configuration
By default Calico uses a node-to-node BGP full mesh. As the cluster grows, keeping routes in sync across the full mesh becomes increasingly expensive, so the default has to be changed.
To change the default node-to-node mesh, first create a BGPConfiguration resource named default.
Default BGP peering status
calicoctl node status
Calico process is running.

IPv4 BGP status
+---------------+-------------------+-------+------------+-------------+
| PEER ADDRESS  |     PEER TYPE     | STATE |   SINCE    |    INFO     |
+---------------+-------------------+-------+------------+-------------+
| 10.103.22.183 | node-to-node mesh | up    | 2020-10-24 | Established |
| 10.103.22.184 | node-to-node mesh | up    | 2020-12-10 | Established |
| 10.103.22.185 | node-to-node mesh | up    | 2020-10-24 | Established |
+---------------+-------------------+-------+------------+-------------+

IPv6 BGP status
No IPv6 peers found.
- The node-to-node mesh mode is in use
- With a full mesh, the number of BGP peerings grows roughly as n(n-1)/2 for n nodes
Create the default BGPConfiguration
vim calico-default.yaml
apiVersion: projectcalico.org/v3
kind: BGPConfiguration
metadata:
  name: default
spec:
  logSeverityScreen: Info
  nodeToNodeMeshEnabled: false
  asNumber: 63400
- calicoctl apply -f calico-default.yaml
- Disables the node-to-node mesh mode
- logSeverityScreen: log level
- nodeToNodeMeshEnabled: whether the node-to-node mesh is enabled (verification sketch below)
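To confirm the change took effect, the configuration can be read back; a minimal sketch:
# show the default BGP configuration now in effect
calicoctl get bgpconfiguration default -o yaml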
Configuring a node as a route reflector
Calico nodes can be configured to act as route reflectors. Each node acting as a route reflector must have a cluster ID, typically an unused IPv4 address.
Configure master01 as a route reflector
# list the nodes
calicoctl get node
NAME
master01
node02
node03
# configure master01 as a route reflector
calicoctl patch node master01 -p '{"spec": {"bgp": {"routeReflectorClusterID": "244.0.0.1"}}}'
- Configures the node as a route reflector with cluster ID 244.0.0.1 (verification sketch below)
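To check that the patch was applied, inspect the node's BGP spec; a minimal sketch:
# the bgp section should now contain routeReflectorClusterID: 244.0.0.1
calicoctl get node master01 -o yaml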
Label the route reflector
# add a label to the route reflector
kubectl label node master01 route-reflector=true
- It is common practice to label the node to mark it as a route reflector, so that it can be selected by a BGPPeer resource.
Create the BGPPeer
vim calico-bgppeer.yaml
apiVersion: projectcalico.org/v3
kind: BGPPeer
metadata:
  name: peer-with-route-reflectors
spec:
  nodeSelector: all()
  peerSelector: route-reflector == 'true'
- calicoctl apply -f calico-bgppeer.yaml
- The label selectors distinguish route reflector nodes from the other nodes (see the check below)
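The peer resource just created can be listed to confirm its selectors; a minimal sketch:
# list the BGPPeer resources and their selectors
calicoctl get bgppeer -o yaml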
Check the BGP peering status
- You can use calicoctl to view the current state of a particular node's BGP connections. This is useful for confirming that your configuration behaves as intended.
# checked on master01
calicoctl node status
Calico process is running.

IPv4 BGP status
+---------------+---------------+-------+----------+-------------+
| PEER ADDRESS  |   PEER TYPE   | STATE |  SINCE   |    INFO     |
+---------------+---------------+-------+----------+-------------+
| 10.103.22.184 | node specific | up    | 14:06:27 | Established |
| 10.103.22.185 | node specific | up    | 14:06:25 | Established |
+---------------+---------------+-------+----------+-------------+

IPv6 BGP status
No IPv6 peers found.
- master01 has established BGP peerings with node02 and node03
- Because master01 is the route reflector, it establishes a BGP peering with every node
# checked on node02
calicoctl node status
Calico process is running.

IPv4 BGP status
+---------------+---------------+-------+----------+-------------+
| PEER ADDRESS  |   PEER TYPE   | STATE |  SINCE   |    INFO     |
+---------------+---------------+-------+----------+-------------+
| 10.103.22.183 | node specific | up    | 14:06:27 | Established |
+---------------+---------------+-------+----------+-------------+

IPv6 BGP status
No IPv6 peers found.
- node02 has established a peering with master01
- Regular nodes only peer with the route reflector
- This reduces the number of BGP peerings
- With a single route reflector the number of BGP peerings is roughly n-1 for n nodes
Configuring a global BGP peer
apiVersion: projectcalico.org/v3
kind: BGPPeer
metadata:
  name: my-global-peer
spec:
  peerIP: 192.20.30.40
  asNumber: 64567
- The example above creates a global BGP peer that configures every Calico node to peer with 192.20.30.40 in AS 64567.
Configuring per-node BGP peers
A per-node BGP peer applies to one or more nodes in the cluster. You can select the nodes either by specifying the node name exactly or by using a label selector.
apiVersion: projectcalico.org/v3
kind: BGPPeer
metadata:
  name: rack1-tor
spec:
  peerIP: 192.20.30.40
  asNumber: 64567
  nodeSelector: rack == 'rack-1'
Adjusting the global AS number
- Change the default global AS number
By default, all Calico nodes use autonomous system number 64512 unless told otherwise. The following command changes it to 64513.
calicoctl patch bgpconfiguration default -p '{"spec": {"asNumber": "64513"}}'
- To change the AS number of a specific node, use the following (verification sketch after the command)
calicoctl patch node node-1 -p '{"spec": {"bgp": {"asNumber": "64514"}}}'
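The AS number in effect for each node can be read back; a minimal sketch:
# the ASN column shows the AS number each node is using
calicoctl get nodes -o wide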
Advertising the Kubernetes service subnets and external subnets
Add the following to the BGPConfiguration
serviceClusterIPs:
  - cidr: 192.110.0.0/16
serviceExternalIPs:
  - cidr: 10.103.23.0/24
- serviceClusterIPs advertises the cluster's service IP range
- serviceExternalIPs advertises service external IP ranges outside the cluster
Note:
serviceClusterIPs use case: a host in the cluster's subnet that has not joined the cluster needs to reach pod services directly by IP rather than through an ingress proxy.
serviceExternalIPs use case: a host outside the cluster's subnet that has not joined the cluster needs to reach pod services directly by IP rather than through an ingress proxy.
The prerequisite for both is that the machine that needs to reach the pods runs a BIRD service and peers with the Calico nodes running in the cluster, so that it can learn the routes (see the patch sketch below).
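Instead of editing YAML, the same fields can be added to the existing default BGPConfiguration with a patch. A minimal sketch using the CIDRs above; adjust them to your own environment:
# advertise the service cluster IP range and an external IP range over BGP
calicoctl patch bgpconfiguration default --patch \
  '{"spec": {"serviceClusterIPs": [{"cidr": "192.110.0.0/16"}], "serviceExternalIPs": [{"cidr": "10.103.23.0/24"}]}}'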
IP-in-IP mode
Encapsulation types
Each IP pool can have its own encapsulation configuration; however, you cannot mix encapsulation types within a single IP pool.
- Configure IP in IP encapsulation for only cross subnet traffic
- Configure IP in IP encapsulation for all inter workload traffic
IPv4/IPv6 address support
IP in IP and VXLAN are supported for IPv4 addresses only.
IP in IP in practice
Calico can selectively encapsulate only the traffic that crosses subnet boundaries. We recommend the IP in IP cross-subnet option to keep the overhead to a minimum.
- Cross-subnet means plain routing is used when nodes are in the same subnet, and IPIP encapsulation is used when they are in different subnets.
Note: switching the encapsulation mode disrupts connections that are in progress (a quick way to check the current mode is sketched below).
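Before and after changing modes it is useful to see which encapsulation each IP pool is using; a minimal sketch:
# the IPIPMODE and VXLANMODE columns show the active encapsulation per pool
calicoctl get ippool -o wide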
Always (Calico's default)
Configure the IPPool
Set ipipMode to Always
vim ippool-always.yaml
apiVersion: projectcalico.org/v3
kind: IPPool
metadata:
  name: ippool-ipip-always
spec:
  cidr: 192.168.0.0/16
  ipipMode: Always
  natOutgoing: true
- All pod traffic is encapsulated with IP in IP
- ipipMode: Always is the upstream Calico default
- calicoctl apply -f calico-ippool-default.yaml (or patch the default pool directly, as sketched below)
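Instead of re-applying the full IPPool manifest, the existing default pool can also be patched in place. A minimal sketch, assuming the default pool name default-ipv4-ippool:
# switch the default pool's IPIP mode (use CrossSubnet or Never for the other variants)
calicoctl patch ippool default-ipv4-ippool -p '{"spec": {"ipipMode": "Always"}}'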
View the IPPool
calicoctl get IPPool default-ipv4-ippool -o yaml
apiVersion: projectcalico.org/v3
kind: IPPool
metadata:
  creationTimestamp: "2020-12-21T05:57:19Z"
  name: default-ipv4-ippool
  resourceVersion: "591742"
  uid: 84960860-4a7c-4b40-b767-946c2baf981e
spec:
  blockSize: 26
  cidr: 192.110.0.0/16
  ipipMode: Always
  natOutgoing: true
  nodeSelector: all()
  vxlanMode: Never
Verify with packet captures
Check which nodes the pods are on
kubectl get pods -o wide
NAME        READY   STATUS    RESTARTS   AGE   IP                NODE     NOMINATED NODE   READINESS GATES
web-dr8xc   1/1     Running   1          39h   192.110.186.198   node03   <none>           <none>
web-fbns2   1/1     Running   1          39h   192.110.186.197   node03   <none>           <none>
web-ftlqt   1/1     Running   1          39h   192.110.140.68    node02   <none>           <none>
Exec into the web-ftlqt pod (on node02) and start a continuous ping to 192.110.186.197 (on node03)
kubectl exec -it web-ftlqt -- /bin/bash
ping 192.110.186.197
PING 192.110.186.197 (192.110.186.197): 48 data bytes
56 bytes from 192.110.186.197: icmp_seq=0 ttl=62 time=1.270 ms
56 bytes from 192.110.186.197: icmp_seq=1 ttl=62 time=0.537 ms
Capture packets on node03
View the network interfaces
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
2: ens192: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 00:50:56:a0:ce:ff brd ff:ff:ff:ff:ff:ff
    inet 10.103.22.185/24 brd 10.103.22.255 scope global noprefixroute ens192
       valid_lft forever preferred_lft forever
3: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default
    link/ether 02:42:c0:81:9f:46 brd ff:ff:ff:ff:ff:ff
    inet 172.17.0.1/16 brd 172.17.255.255 scope global docker0
       valid_lft forever preferred_lft forever
4: dummy0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 46:40:96:15:95:e7 brd ff:ff:ff:ff:ff:ff
5: kube-ipvs0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default
    link/ether 12:0c:d9:ad:73:ac brd ff:ff:ff:ff:ff:ff
    inet 192.100.182.252/32 brd 192.100.182.252 scope global kube-ipvs0
       valid_lft forever preferred_lft forever
    inet 192.100.0.10/32 brd 192.100.0.10 scope global kube-ipvs0
       valid_lft forever preferred_lft forever
    inet 192.100.0.1/32 brd 192.100.0.1 scope global kube-ipvs0
       valid_lft forever preferred_lft forever
    inet 192.100.227.87/32 brd 192.100.227.87 scope global kube-ipvs0
       valid_lft forever preferred_lft forever
6: cali5ed4c62f10d@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
    link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff link-netnsid 0
7: cali7dd0f951039@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
    link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff link-netnsid 1
8: tunl0@NONE: <NOARP,UP,LOWER_UP> mtu 1480 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ipip 0.0.0.0 brd 0.0.0.0
    inet 192.110.186.192/32 scope global tunl0
       valid_lft forever preferred_lft forever
9: cali7d765aed89d@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
    link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff link-netnsid 2
- ens192: the host NIC
- tunl0: the tunnel device created by Calico
- Devices whose names start with cali are the interfaces Calico creates for pods
- kube-ipvs0: the device created by IPVS
First capture on the ens192 interface
# capture on ens192 (the host NIC)
tcpdump -i ens192 -nn dst 192.110.140.68
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on ens192, link-type EN10MB (Ethernet), capture size 262144 bytes
- No packets are seen: the traffic sent between the pods is encapsulated, so the ens192 interface on node03 never sees packets with destination 192.110.140.68
Capture on the tunl0 interface
# capture on tunl0 (the Calico IPIP-mode tunnel device)
tcpdump -i tunl0 -nn dst 192.110.140.68
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on tunl0, link-type RAW (Raw IP), capture size 262144 bytes
11:30:40.303384 IP 192.110.186.197 > 192.110.140.68: ICMP echo reply, id 19712, seq 5633, length 56
11:30:41.304847 IP 192.110.186.197 > 192.110.140.68: ICMP echo reply, id 19712, seq 5889, length 56
11:30:42.305407 IP 192.110.186.197 > 192.110.140.68: ICMP echo reply, id 19712, seq 6145, length 56
11:30:43.306508 IP 192.110.186.197 > 192.110.140.68: ICMP echo reply, id 19712, seq 6401, length 56
11:30:44.310433 IP 192.110.186.197 > 192.110.140.68: ICMP echo reply, id 19712, seq 6657, length 56
11:30:45.309017 IP 192.110.186.197 > 192.110.140.68: ICMP echo reply, id 19712, seq 6913, length 56
11:30:46.310418 IP 192.110.186.197 > 192.110.140.68: ICMP echo reply, id 19712, seq 7169, length 56
7 packets captured
7 packets received by filter
0 packets dropped by kernel
- Packets are seen
- The pod traffic passes through the tunl0 tunnel device
- The packets are encapsulated on their way between the pods on node02 and node03
CrossSubnet
Configure the IPPool
Set ipipMode to CrossSubnet
vim ippool-crosssubnet.yaml
apiVersion: projectcalico.org/v3
kind: IPPool
metadata:
  name: ippool-ipip-crosssubnet
spec:
  cidr: 192.168.0.0/16
  ipipMode: CrossSubnet
  natOutgoing: true
Using the built-in IPPool configuration instead
apiVersion: projectcalico.org/v3
kind: IPPool
metadata:
  name: default-ipv4-ippool
spec:
  cidr: 192.110.0.0/16
  ipipMode: CrossSubnet
  natOutgoing: true
  nodeSelector: all()
  vxlanMode: Never
- IP-in-IP encapsulation is applied selectively
- Plain routing is used when nodes are in the same subnet; IPIP encapsulation is used when they are in different subnets.
- calicoctl apply -f calico-ippool-default.yaml
View the IPPool
calicoctl get IPPool default-ipv4-ippool -o yaml
apiVersion: projectcalico.org/v3
kind: IPPool
metadata:
  creationTimestamp: "2020-12-21T05:57:19Z"
  name: default-ipv4-ippool
  resourceVersion: "799024"
  uid: 84960860-4a7c-4b40-b767-946c2baf981e
spec:
  blockSize: 26
  cidr: 192.110.0.0/16
  ipipMode: CrossSubnet
  natOutgoing: true
  nodeSelector: all()
  vxlanMode: Never
- ipipMode has been changed to CrossSubnet
View routes (before CrossSubnet takes effect)
ip route
default via 10.103.22.1 dev ens192 proto static metric 100
10.103.22.0/24 dev ens192 proto kernel scope link src 10.103.22.184 metric 100
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1
blackhole 192.110.140.64/26 proto bird
192.110.140.65 dev cali9fe3f3b71b4 scope link
192.110.140.68 dev calie60333d71a4 scope link
192.110.140.69 dev calibb530faa265 scope link
192.110.186.192/26 via 10.103.22.185 dev tunl0 proto bird onlink
192.110.241.64/26 via 10.103.22.183 dev tunl0 proto bird onlink
- Before the change takes effect, the routes for the pod subnets still point at tunl0
- Traffic is still IPIP-encapsulated and carried over the tunl0 tunnel device
View routes (CrossSubnet in effect)
ip route
default via 10.103.22.1 dev ens192 proto static metric 100
10.103.22.0/24 dev ens192 proto kernel scope link src 10.103.22.184 metric 100
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1
blackhole 192.110.140.64/26 proto bird
192.110.140.65 dev cali9fe3f3b71b4 scope link
192.110.140.66 dev cali827b87b13d2 scope link
192.110.140.69 dev calibb530faa265 scope link
192.110.186.192/26 via 10.103.22.185 dev ens192 proto bird
192.110.241.64/26 via 10.103.22.183 dev ens192 proto bird
- CrossSubnet is in effect
- The routes for the pod subnets now point at the ens192 interface
Verify with packet captures
Check which nodes the pods are on
kubectl get pods -o wide
NAME        READY   STATUS    RESTARTS   AGE   IP                NODE     NOMINATED NODE   READINESS GATES
web-2djb2   1/1     Running   0          11m   192.110.140.66    node02   <none>           <none>
web-6gv9d   1/1     Running   0          11m   192.110.186.194   node03   <none>           <none>
web-ccct6   1/1     Running   0          11m   192.110.186.193   node03   <none>           <none>
Exec into the web-2djb2 pod (on node02) and start a continuous ping to 192.110.186.194 (on node03)
kubectl exec -it web-2djb2 -- /bin/bash
ping 192.110.186.194
PING 192.110.186.194 (192.110.186.194): 48 data bytes
56 bytes from 192.110.186.194: icmp_seq=0 ttl=62 time=1.251 ms
56 bytes from 192.110.186.194: icmp_seq=1 ttl=62 time=0.500 ms
56 bytes from 192.110.186.194: icmp_seq=2 ttl=62 time=6.477 ms
56 bytes from 192.110.186.194: icmp_seq=3 ttl=62 time=0.763 ms
56 bytes from 192.110.186.194: icmp_seq=4 ttl=62 time=0.530 ms
56 bytes from 192.110.186.194: icmp_seq=5 ttl=62 time=0.470 ms
56 bytes from 192.110.186.194: icmp_seq=6 ttl=62 time=0.590 ms
56 bytes from 192.110.186.194: icmp_seq=7 ttl=62 time=0.528 ms
56 bytes from 192.110.186.194: icmp_seq=8 ttl=62 time=0.430 ms
^C
--- 192.110.186.194 ping statistics ---
9 packets transmitted, 9 packets received, 0% packet loss
round-trip min/avg/max/stddev = 0.430/1.282/6.477/1.852 ms
Capture packets on node03
View the network interfaces
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
2: ens192: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 00:50:56:a0:ce:ff brd ff:ff:ff:ff:ff:ff
    inet 10.103.22.185/24 brd 10.103.22.255 scope global noprefixroute ens192
       valid_lft forever preferred_lft forever
3: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default
    link/ether 02:42:c0:81:9f:46 brd ff:ff:ff:ff:ff:ff
    inet 172.17.0.1/16 brd 172.17.255.255 scope global docker0
       valid_lft forever preferred_lft forever
4: dummy0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 46:40:96:15:95:e7 brd ff:ff:ff:ff:ff:ff
5: kube-ipvs0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default
    link/ether 12:0c:d9:ad:73:ac brd ff:ff:ff:ff:ff:ff
    inet 192.100.182.252/32 brd 192.100.182.252 scope global kube-ipvs0
       valid_lft forever preferred_lft forever
    inet 192.100.0.10/32 brd 192.100.0.10 scope global kube-ipvs0
       valid_lft forever preferred_lft forever
    inet 192.100.0.1/32 brd 192.100.0.1 scope global kube-ipvs0
       valid_lft forever preferred_lft forever
    inet 192.100.227.87/32 brd 192.100.227.87 scope global kube-ipvs0
       valid_lft forever preferred_lft forever
6: cali5ed4c62f10d@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
    link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff link-netnsid 0
8: tunl0@NONE: <NOARP,UP,LOWER_UP> mtu 1480 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ipip 0.0.0.0 brd 0.0.0.0
    inet 192.110.186.192/32 scope global tunl0
       valid_lft forever preferred_lft forever
11: cali2b2abdd19e8@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1480 qdisc noqueue state UP group default
    link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff link-netnsid 1
12: cali64dc30e7c56@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1480 qdisc noqueue state UP group default
    link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff link-netnsid 2
- ens192: the host NIC
- tunl0: the tunnel device created by Calico
- Devices whose names start with cali are the interfaces Calico creates for pods
- kube-ipvs0: the device created by IPVS
First capture on the ens192 interface
# capture on ens192 (the host NIC)
tcpdump -i ens192 -nn dst 192.110.140.66
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on ens192, link-type EN10MB (Ethernet), capture size 262144 bytes
14:50:29.840816 IP 192.110.186.194 > 192.110.140.66: ICMP echo reply, id 18688, seq 0, length 56
14:50:30.841764 IP 192.110.186.194 > 192.110.140.66: ICMP echo reply, id 18688, seq 256, length 56
14:50:31.846405 IP 192.110.186.194 > 192.110.140.66: ICMP echo reply, id 18688, seq 512, length 56
14:50:32.847888 IP 192.110.186.194 > 192.110.140.66: ICMP echo reply, id 18688, seq 768, length 56
14:50:33.849056 IP 192.110.186.194 > 192.110.140.66: ICMP echo reply, id 18688, seq 1024, length 56
14:50:34.850389 IP 192.110.186.194 > 192.110.140.66: ICMP echo reply, id 18688, seq 1280, length 56
14:50:35.851610 IP 192.110.186.194 > 192.110.140.66: ICMP echo reply, id 18688, seq 1536, length 56
14:50:36.852774 IP 192.110.186.194 > 192.110.140.66: ICMP echo reply, id 18688, seq 1792, length 56
8 packets captured
8 packets received by filter
0 packets dropped by kernel
- Packets are seen
- The traffic is routed directly over the ens192 interface between the pods
- No encapsulation is used
Capture on the tunl0 interface
# capture on tunl0 (the Calico IPIP-mode tunnel device)
tcpdump -i tunl0 -nn dst 192.110.140.66
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on tunl0, link-type RAW (Raw IP), capture size 262144 bytes
- No packets are seen
- The packets are transmitted by plain routing
- No encapsulation takes place
Verification with nodes in different subnets (not verified; no suitable environment available)
Expected result
Traffic between nodes in different subnets would be encapsulated and transmitted with IP in IP
Never (no IP in IP)
Configure the IPPool
Set ipipMode to Never
vim ippool-never.yaml
apiVersion: projectcalico.org/v3
kind: IPPool
metadata:
  name: ippool-ipip-never
spec:
  cidr: 192.168.0.0/16
  ipipMode: Never
  natOutgoing: true
Using the built-in IPPool configuration instead
apiVersion: projectcalico.org/v3
kind: IPPool
metadata:
  name: default-ipv4-ippool
spec:
  cidr: 192.110.0.0/16
  ipipMode: Never
  natOutgoing: true
  nodeSelector: all()
  vxlanMode: Never
- IP-in-IP mode is not used
- All nodes must be in the same subnet
- calicoctl apply -f calico-ippool-default.yaml
View the IPPool
calicoctl get IPPool default-ipv4-ippool -o yaml
apiVersion: projectcalico.org/v3
kind: IPPool
metadata:
  creationTimestamp: "2020-12-21T05:57:19Z"
  name: default-ipv4-ippool
  resourceVersion: "804703"
  uid: 84960860-4a7c-4b40-b767-946c2baf981e
spec:
  blockSize: 26
  cidr: 192.110.0.0/16
  ipipMode: Never
  natOutgoing: true
  nodeSelector: all()
  vxlanMode: Never
- ipipMode has been set to Never
View routes
ip route
default via 10.103.22.1 dev ens192 proto static metric 100
10.103.22.0/24 dev ens192 proto kernel scope link src 10.103.22.184 metric 100
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1
blackhole 192.110.140.64/26 proto bird
192.110.140.65 dev cali9fe3f3b71b4 scope link
192.110.140.66 dev cali827b87b13d2 scope link
192.110.140.69 dev calibb530faa265 scope link
192.110.186.192/26 via 10.103.22.185 dev ens192 proto bird
192.110.241.64/26 via 10.103.22.183 dev ens192 proto bird
- Never is in effect
- The routes for the pod subnets point at the ens192 interface
Verify with packet captures
Check which nodes the pods are on
kubectl get pods -o wide
NAME        READY   STATUS    RESTARTS   AGE   IP                NODE     NOMINATED NODE   READINESS GATES
web-2djb2   1/1     Running   0          11m   192.110.140.66    node02   <none>           <none>
web-6gv9d   1/1     Running   0          11m   192.110.186.194   node03   <none>           <none>
web-ccct6   1/1     Running   0          11m   192.110.186.193   node03   <none>           <none>
Exec into the web-2djb2 pod (on node02) and start a continuous ping to 192.110.186.194 (on node03)
kubectl exec -it web-2djb2 -- /bin/bash
ping 192.110.186.194
PING 192.110.186.194 (192.110.186.194): 48 data bytes
56 bytes from 192.110.186.194: icmp_seq=0 ttl=62 time=1.251 ms
56 bytes from 192.110.186.194: icmp_seq=1 ttl=62 time=0.500 ms
56 bytes from 192.110.186.194: icmp_seq=2 ttl=62 time=6.477 ms
56 bytes from 192.110.186.194: icmp_seq=3 ttl=62 time=0.763 ms
56 bytes from 192.110.186.194: icmp_seq=4 ttl=62 time=0.530 ms
56 bytes from 192.110.186.194: icmp_seq=5 ttl=62 time=0.470 ms
56 bytes from 192.110.186.194: icmp_seq=6 ttl=62 time=0.590 ms
56 bytes from 192.110.186.194: icmp_seq=7 ttl=62 time=0.528 ms
56 bytes from 192.110.186.194: icmp_seq=8 ttl=62 time=0.430 ms
^C
--- 192.110.186.194 ping statistics ---
9 packets transmitted, 9 packets received, 0% packet loss
round-trip min/avg/max/stddev = 0.430/1.282/6.477/1.852 ms
Capture packets on node03
View the network interfaces
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
2: ens192: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 00:50:56:a0:ce:ff brd ff:ff:ff:ff:ff:ff
    inet 10.103.22.185/24 brd 10.103.22.255 scope global noprefixroute ens192
       valid_lft forever preferred_lft forever
3: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default
    link/ether 02:42:c0:81:9f:46 brd ff:ff:ff:ff:ff:ff
    inet 172.17.0.1/16 brd 172.17.255.255 scope global docker0
       valid_lft forever preferred_lft forever
4: dummy0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 46:40:96:15:95:e7 brd ff:ff:ff:ff:ff:ff
5: kube-ipvs0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default
    link/ether 12:0c:d9:ad:73:ac brd ff:ff:ff:ff:ff:ff
    inet 192.100.182.252/32 brd 192.100.182.252 scope global kube-ipvs0
       valid_lft forever preferred_lft forever
    inet 192.100.0.10/32 brd 192.100.0.10 scope global kube-ipvs0
       valid_lft forever preferred_lft forever
    inet 192.100.0.1/32 brd 192.100.0.1 scope global kube-ipvs0
       valid_lft forever preferred_lft forever
    inet 192.100.227.87/32 brd 192.100.227.87 scope global kube-ipvs0
       valid_lft forever preferred_lft forever
6: cali5ed4c62f10d@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
    link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff link-netnsid 0
8: tunl0@NONE: <NOARP,UP,LOWER_UP> mtu 1480 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ipip 0.0.0.0 brd 0.0.0.0
    inet 192.110.186.192/32 scope global tunl0
       valid_lft forever preferred_lft forever
11: cali2b2abdd19e8@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1480 qdisc noqueue state UP group default
    link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff link-netnsid 1
12: cali64dc30e7c56@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1480 qdisc noqueue state UP group default
    link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff link-netnsid 2
- ens192: the host NIC
- tunl0: the tunnel device created by Calico
- Devices whose names start with cali are the interfaces Calico creates for pods
- kube-ipvs0: the device created by IPVS
First capture on the ens192 interface
# capture on ens192 (the host NIC)
tcpdump -i ens192 -nn dst 192.110.140.66
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on ens192, link-type EN10MB (Ethernet), capture size 262144 bytes
14:50:29.840816 IP 192.110.186.194 > 192.110.140.66: ICMP echo reply, id 18688, seq 0, length 56
14:50:30.841764 IP 192.110.186.194 > 192.110.140.66: ICMP echo reply, id 18688, seq 256, length 56
14:50:31.846405 IP 192.110.186.194 > 192.110.140.66: ICMP echo reply, id 18688, seq 512, length 56
14:50:32.847888 IP 192.110.186.194 > 192.110.140.66: ICMP echo reply, id 18688, seq 768, length 56
14:50:33.849056 IP 192.110.186.194 > 192.110.140.66: ICMP echo reply, id 18688, seq 1024, length 56
14:50:34.850389 IP 192.110.186.194 > 192.110.140.66: ICMP echo reply, id 18688, seq 1280, length 56
14:50:35.851610 IP 192.110.186.194 > 192.110.140.66: ICMP echo reply, id 18688, seq 1536, length 56
14:50:36.852774 IP 192.110.186.194 > 192.110.140.66: ICMP echo reply, id 18688, seq 1792, length 56
8 packets captured
8 packets received by filter
0 packets dropped by kernel
- Packets are seen
- The traffic is routed directly over the ens192 interface between the pods
- No encapsulation is used
Capture on the tunl0 interface
# capture on tunl0 (the Calico IPIP-mode tunnel device)
tcpdump -i tunl0 -nn dst 192.110.140.66
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on tunl0, link-type RAW (Raw IP), capture size 262144 bytes
- No packets are seen
- The packets are transmitted by plain routing
- No encapsulation takes place
VXLAN mode
Encapsulation types
Each IP pool can have its own encapsulation configuration; however, you cannot mix encapsulation types within a single IP pool.
- Configure VXLAN encapsulation for only cross subnet traffic
- Configure VXLAN encapsulation for all inter workload traffic
CrossSubnet
Configure the IPPool
Set vxlanMode to CrossSubnet
vim ippool-vxlan-crosssubnet.yaml
apiVersion: projectcalico.org/v3
kind: IPPool
metadata:
  name: ippool-vxlan-cross-subnet-1
spec:
  cidr: 192.168.0.0/16
  vxlanMode: CrossSubnet
  natOutgoing: true
Using the built-in IPPool configuration instead
apiVersion: projectcalico.org/v3
kind: IPPool
metadata:
  name: default-ipv4-ippool
spec:
  cidr: 192.110.0.0/16
  ipipMode: Never
  natOutgoing: true
  nodeSelector: all()
  vxlanMode: CrossSubnet
- VXLAN encapsulation is applied selectively
- Plain routing is used when nodes are in the same subnet; VXLAN encapsulation is used when they are in different subnets.
- calicoctl apply -f calico-vxlan-test.yaml
View the IPPool
calicoctl get IPPool default-ipv4-ippool -o yaml
apiVersion: projectcalico.org/v3
kind: IPPool
metadata:
  creationTimestamp: "2020-12-21T05:57:19Z"
  name: default-ipv4-ippool
  resourceVersion: "808677"
  uid: 84960860-4a7c-4b40-b767-946c2baf981e
spec:
  blockSize: 26
  cidr: 192.110.0.0/16
  ipipMode: Never
  natOutgoing: true
  nodeSelector: all()
  vxlanMode: CrossSubnet
- vxlanMode has been changed to CrossSubnet
View the host's interfaces
ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
2: ens192: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 00:50:56:a0:ce:ff brd ff:ff:ff:ff:ff:ff
    inet 10.103.22.185/24 brd 10.103.22.255 scope global noprefixroute ens192
       valid_lft forever preferred_lft forever
3: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default
    link/ether 02:42:c0:81:9f:46 brd ff:ff:ff:ff:ff:ff
    inet 172.17.0.1/16 brd 172.17.255.255 scope global docker0
       valid_lft forever preferred_lft forever
4: dummy0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 46:40:96:15:95:e7 brd ff:ff:ff:ff:ff:ff
5: kube-ipvs0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default
    link/ether 12:0c:d9:ad:73:ac brd ff:ff:ff:ff:ff:ff
    inet 192.100.182.252/32 brd 192.100.182.252 scope global kube-ipvs0
       valid_lft forever preferred_lft forever
    inet 192.100.0.10/32 brd 192.100.0.10 scope global kube-ipvs0
       valid_lft forever preferred_lft forever
    inet 192.100.0.1/32 brd 192.100.0.1 scope global kube-ipvs0
       valid_lft forever preferred_lft forever
    inet 192.100.227.87/32 brd 192.100.227.87 scope global kube-ipvs0
       valid_lft forever preferred_lft forever
6: cali5ed4c62f10d@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
    link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff link-netnsid 0
8: tunl0@NONE: <NOARP,UP,LOWER_UP> mtu 1480 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ipip 0.0.0.0 brd 0.0.0.0
11: cali2b2abdd19e8@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1480 qdisc noqueue state UP group default
    link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff link-netnsid 1
12: cali64dc30e7c56@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1480 qdisc noqueue state UP group default
    link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff link-netnsid 2
13: vxlan.calico: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN group default
    link/ether 66:43:c9:45:ef:85 brd ff:ff:ff:ff:ff:ff
    inet 192.168.186.192/32 scope global vxlan.calico
       valid_lft forever preferred_lft forever
- The vxlan.calico device (the VXLAN tunnel) has been created (see the inspection sketch below)
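The VXLAN parameters of the new device can be inspected with ip; a minimal sketch:
# show VXLAN details (VNI, UDP port, local endpoint) of the tunnel device
ip -d link show vxlan.calico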
Verify with packet captures
Check which nodes the pods are on
kubectl get pods -o wide
NAME        READY   STATUS    RESTARTS   AGE   IP                NODE     NOMINATED NODE   READINESS GATES
web-2djb2   1/1     Running   1          98m   192.110.140.67    node02   <none>           <none>
web-6gv9d   1/1     Running   1          98m   192.110.186.197   node03   <none>           <none>
web-ccct6   1/1     Running   1          98m   192.110.186.195   node03   <none>           <none>
Exec into the web-2djb2 pod (on node02) and start a continuous ping to 192.110.186.195 (on node03)
kubectl exec -it web-2djb2 -- /bin/bash
ping 192.110.186.195
PING 192.110.186.195 (192.110.186.195): 48 data bytes
56 bytes from 192.110.186.195: icmp_seq=0 ttl=62 time=1.827 ms
56 bytes from 192.110.186.195: icmp_seq=1 ttl=62 time=0.528 ms
56 bytes from 192.110.186.195: icmp_seq=2 ttl=62 time=0.703 ms
56 bytes from 192.110.186.195: icmp_seq=3 ttl=62 time=1.418 ms
^C
--- 192.110.186.195 ping statistics ---
4 packets transmitted, 4 packets received, 0% packet loss
round-trip min/avg/max/stddev = 0.528/1.119/1.827/0.527 ms
Capture packets on node03
View the network interfaces
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
2: ens192: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 00:50:56:a0:ce:ff brd ff:ff:ff:ff:ff:ff
    inet 10.103.22.185/24 brd 10.103.22.255 scope global noprefixroute ens192
       valid_lft forever preferred_lft forever
3: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default
    link/ether 02:42:41:42:1b:cf brd ff:ff:ff:ff:ff:ff
    inet 172.17.0.1/16 brd 172.17.255.255 scope global docker0
       valid_lft forever preferred_lft forever
4: dummy0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 0a:17:1c:2f:75:0f brd ff:ff:ff:ff:ff:ff
5: kube-ipvs0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default
    link/ether 0a:de:71:6e:3a:86 brd ff:ff:ff:ff:ff:ff
    inet 192.100.182.252/32 brd 192.100.182.252 scope global kube-ipvs0
       valid_lft forever preferred_lft forever
    inet 192.100.227.87/32 brd 192.100.227.87 scope global kube-ipvs0
       valid_lft forever preferred_lft forever
    inet 192.100.0.10/32 brd 192.100.0.10 scope global kube-ipvs0
       valid_lft forever preferred_lft forever
    inet 192.100.0.1/32 brd 192.100.0.1 scope global kube-ipvs0
       valid_lft forever preferred_lft forever
6: cali2b2abdd19e8@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP group default
    link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff link-netnsid 0
7: cali5ed4c62f10d@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP group default
    link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff link-netnsid 1
8: cali64dc30e7c56@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP group default
    link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff link-netnsid 2
9: tunl0@NONE: <NOARP,UP,LOWER_UP> mtu 1480 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ipip 0.0.0.0 brd 0.0.0.0
10: vxlan.calico: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN group default
    link/ether 66:43:c9:45:ef:85 brd ff:ff:ff:ff:ff:ff
    inet 192.110.186.198/32 scope global vxlan.calico
       valid_lft forever preferred_lft forever
- ens192: the host NIC
- vxlan.calico: the tunnel device created by Calico
- Devices whose names start with cali are the interfaces Calico creates for pods
- kube-ipvs0: the device created by IPVS
First capture on the ens192 interface
# capture on ens192 (the host NIC)
tcpdump -i ens192 -nn dst 192.110.140.67
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on ens192, link-type EN10MB (Ethernet), capture size 262144 bytes
16:13:23.748235 IP 192.110.186.195 > 192.110.140.67: ICMP echo reply, id 24064, seq 0, length 56
16:13:24.747423 IP 192.110.186.195 > 192.110.140.67: ICMP echo reply, id 24064, seq 256, length 56
16:13:25.748290 IP 192.110.186.195 > 192.110.140.67: ICMP echo reply, id 24064, seq 512, length 56
16:13:26.754372 IP 192.110.186.195 > 192.110.140.67: ICMP echo reply, id 24064, seq 768, length 56
^C
4 packets captured
5 packets received by filter
0 packets dropped by kernel
- Packets are seen
- The traffic is routed directly over the ens192 interface between the pods
- No encapsulation is used
Capture on the vxlan.calico interface
# capture on vxlan.calico (the Calico VXLAN-mode tunnel device)
tcpdump -i vxlan.calico -nn dst 192.110.140.67
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on vxlan.calico, link-type EN10MB (Ethernet), capture size 262144 bytes
^C
0 packets captured
0 packets received by filter
0 packets dropped by kernel
- No packets are seen
- The packets are transmitted by plain routing
- No encapsulation takes place
Verification with nodes in different subnets (not verified; no suitable environment available)
Expected result
Traffic between nodes in different subnets would be encapsulated and transmitted with VXLAN
Always
Configure the IPPool
Set vxlanMode to Always
vim ippool-always.yaml
apiVersion: projectcalico.org/v3
kind: IPPool
metadata:
  name: default-ipv4-ippool
spec:
  cidr: 192.110.0.0/16
  ipipMode: Never
  natOutgoing: true
  nodeSelector: all()
  vxlanMode: Always
- vxlanMode: Always encapsulates all pod traffic with VXLAN
- calicoctl apply -f calico-vxlan-test.yaml
View the IPPool
calicoctl get IPPool default-ipv4-ippool -o yaml
apiVersion: projectcalico.org/v3
kind: IPPool
metadata:
  creationTimestamp: "2020-12-21T05:57:19Z"
  name: default-ipv4-ippool
  resourceVersion: "815620"
  uid: 84960860-4a7c-4b40-b767-946c2baf981e
spec:
  blockSize: 26
  cidr: 192.110.0.0/16
  ipipMode: Never
  natOutgoing: true
  nodeSelector: all()
  vxlanMode: Always
- vxlanMode has been changed to Always
Verify with packet captures
Check which nodes the pods are on
kubectl get pods -o wide
NAME        READY   STATUS    RESTARTS   AGE   IP                NODE     NOMINATED NODE   READINESS GATES
web-2djb2   1/1     Running   1          98m   192.110.140.67    node02   <none>           <none>
web-6gv9d   1/1     Running   1          98m   192.110.186.197   node03   <none>           <none>
web-ccct6   1/1     Running   1          98m   192.110.186.195   node03   <none>           <none>
Exec into the web-2djb2 pod (on node02) and start a continuous ping to 192.110.186.195 (on node03)
kubectl exec -it web-2djb2 -- /bin/bash
ping 192.110.186.195
PING 192.110.186.195 (192.110.186.195): 48 data bytes
56 bytes from 192.110.186.195: icmp_seq=0 ttl=62 time=0.996 ms
56 bytes from 192.110.186.195: icmp_seq=1 ttl=62 time=0.725 ms
56 bytes from 192.110.186.195: icmp_seq=2 ttl=62 time=2.307 ms
56 bytes from 192.110.186.195: icmp_seq=3 ttl=62 time=2.684 ms
56 bytes from 192.110.186.195: icmp_seq=4 ttl=62 time=0.621 ms
^C
--- 192.110.186.195 ping statistics ---
5 packets transmitted, 5 packets received, 0% packet loss
round-trip min/avg/max/stddev = 0.621/1.467/2.684/0.857 ms
Capture packets on node03
View the network interfaces
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
2: ens192: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 00:50:56:a0:ce:ff brd ff:ff:ff:ff:ff:ff
    inet 10.103.22.185/24 brd 10.103.22.255 scope global noprefixroute ens192
       valid_lft forever preferred_lft forever
3: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default
    link/ether 02:42:41:42:1b:cf brd ff:ff:ff:ff:ff:ff
    inet 172.17.0.1/16 brd 172.17.255.255 scope global docker0
       valid_lft forever preferred_lft forever
4: dummy0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 0a:17:1c:2f:75:0f brd ff:ff:ff:ff:ff:ff
5: kube-ipvs0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default
    link/ether 0a:de:71:6e:3a:86 brd ff:ff:ff:ff:ff:ff
    inet 192.100.182.252/32 brd 192.100.182.252 scope global kube-ipvs0
       valid_lft forever preferred_lft forever
    inet 192.100.227.87/32 brd 192.100.227.87 scope global kube-ipvs0
       valid_lft forever preferred_lft forever
    inet 192.100.0.10/32 brd 192.100.0.10 scope global kube-ipvs0
       valid_lft forever preferred_lft forever
    inet 192.100.0.1/32 brd 192.100.0.1 scope global kube-ipvs0
       valid_lft forever preferred_lft forever
6: cali2b2abdd19e8@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP group default
    link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff link-netnsid 0
7: cali5ed4c62f10d@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP group default
    link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff link-netnsid 1
8: cali64dc30e7c56@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP group default
    link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff link-netnsid 2
9: tunl0@NONE: <NOARP,UP,LOWER_UP> mtu 1480 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ipip 0.0.0.0 brd 0.0.0.0
10: vxlan.calico: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN group default
    link/ether 66:43:c9:45:ef:85 brd ff:ff:ff:ff:ff:ff
    inet 192.110.186.198/32 scope global vxlan.calico
       valid_lft forever preferred_lft forever
- ens192: the host NIC
- vxlan.calico: the tunnel device created by Calico
- Devices whose names start with cali are the interfaces Calico creates for pods
- kube-ipvs0: the device created by IPVS
First capture on the ens192 interface
# capture on ens192 (the host NIC)
tcpdump -i ens192 -nn dst 192.110.186.195
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on ens192, link-type EN10MB (Ethernet), capture size 262144 bytes
^C
0 packets captured
0 packets received by filter
0 packets dropped by kernel
- No packets are seen: the traffic sent between the pods is encapsulated, so the ens192 interface on node03 never sees packets with destination 192.110.186.195
Capture on the vxlan.calico interface
# capture on vxlan.calico (the Calico VXLAN-mode tunnel device)
tcpdump -i vxlan.calico -nn dst 192.110.186.195
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on vxlan.calico, link-type EN10MB (Ethernet), capture size 262144 bytes
16:32:58.895301 IP 192.110.140.67 > 192.110.186.195: ICMP echo request, id 27392, seq 15104, length 56
16:32:59.896622 IP 192.110.140.67 > 192.110.186.195: ICMP echo request, id 27392, seq 15360, length 56
16:33:00.897224 IP 192.110.140.67 > 192.110.186.195: ICMP echo request, id 27392, seq 15616, length 56
16:33:01.899095 IP 192.110.140.67 > 192.110.186.195: ICMP echo request, id 27392, seq 15872, length 56
16:33:02.900233 IP 192.110.140.67 > 192.110.186.195: ICMP echo request, id 27392, seq 16128, length 56
16:33:03.901294 IP 192.110.140.67 > 192.110.186.195: ICMP echo request, id 27392, seq 16384, length 56
16:33:04.902684 IP 192.110.140.67 > 192.110.186.195: ICMP echo request, id 27392, seq 16640, length 56
^C
7 packets captured
7 packets received by filter
0 packets dropped by kernel
- Packets are seen
- The packets travel through the vxlan.calico device to the pod on node03
- The packets are encapsulated before reaching the pod on node03
Never (no VXLAN)
Configure the IPPool
Set vxlanMode to Never
vim ippool-never.yaml
apiVersion: projectcalico.org/v3
kind: IPPool
metadata:
  name: ippool-vxlan
spec:
  cidr: 192.168.0.0/16
  vxlanMode: Never
  natOutgoing: true
Using the built-in IPPool configuration instead
apiVersion: projectcalico.org/v3
kind: IPPool
metadata:
  name: default-ipv4-ippool
spec:
  cidr: 192.110.0.0/16
  ipipMode: Never
  natOutgoing: true
  nodeSelector: all()
  vxlanMode: Never
- VXLAN mode is not used
- All nodes must be in the same subnet
- calicoctl apply -f calico-vxlan-test.yaml
View the IPPool
apiVersion: projectcalico.org/v3
kind: IPPool
metadata:
  creationTimestamp: "2020-12-21T05:57:19Z"
  name: default-ipv4-ippool
  resourceVersion: "816709"
  uid: 84960860-4a7c-4b40-b767-946c2baf981e
spec:
  blockSize: 26
  cidr: 192.110.0.0/16
  ipipMode: Never
  natOutgoing: true
  nodeSelector: all()
  vxlanMode: Never
- vxlanMode has been set to Never
View routes
ip route
default via 10.103.22.1 dev ens192 proto static metric 100
10.103.22.0/24 dev ens192 proto kernel scope link src 10.103.22.184 metric 100
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1
blackhole 192.110.140.64/26 proto bird
192.110.140.67 dev cali827b87b13d2 scope link
192.110.140.68 dev cali9fe3f3b71b4 scope link
192.110.140.69 dev calibb530faa265 scope link
192.110.186.192/26 via 10.103.22.185 dev ens192 proto bird
192.110.241.64/26 via 10.103.22.183 dev ens192 proto bird
- Never is in effect
- The routes for the pod subnets point at the ens192 interface
Verify with packet captures
Check which nodes the pods are on
kubectl get pods -o wide
NAME        READY   STATUS    RESTARTS   AGE    IP                NODE     NOMINATED NODE   READINESS GATES
web-2djb2   1/1     Running   1          124m   192.110.140.67    node02   <none>           <none>
web-6gv9d   1/1     Running   1          124m   192.110.186.197   node03   <none>           <none>
web-ccct6   1/1     Running   1          124m   192.110.186.195   node03   <none>           <none>
Exec into the web-2djb2 pod (on node02) and start a continuous ping to 192.110.186.195 (on node03)
kubectl exec -it web-2djb2 -- /bin/bash
ping 192.110.186.195
PING 192.110.186.195 (192.110.186.195): 48 data bytes
56 bytes from 192.110.186.195: icmp_seq=0 ttl=62 time=0.828 ms
56 bytes from 192.110.186.195: icmp_seq=1 ttl=62 time=0.464 ms
56 bytes from 192.110.186.195: icmp_seq=2 ttl=62 time=1.384 ms
56 bytes from 192.110.186.195: icmp_seq=3 ttl=62 time=0.444 ms
56 bytes from 192.110.186.195: icmp_seq=4 ttl=62 time=0.418 ms
^C
--- 192.110.186.195 ping statistics ---
5 packets transmitted, 5 packets received, 0% packet loss
round-trip min/avg/max/stddev = 0.418/0.708/1.384/0.370 ms
Capture packets on node03
View the network interfaces
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
2: ens192: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 00:50:56:a0:ce:ff brd ff:ff:ff:ff:ff:ff
    inet 10.103.22.185/24 brd 10.103.22.255 scope global noprefixroute ens192
       valid_lft forever preferred_lft forever
3: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default
    link/ether 02:42:41:42:1b:cf brd ff:ff:ff:ff:ff:ff
    inet 172.17.0.1/16 brd 172.17.255.255 scope global docker0
       valid_lft forever preferred_lft forever
4: dummy0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 0a:17:1c:2f:75:0f brd ff:ff:ff:ff:ff:ff
5: kube-ipvs0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default
    link/ether 0a:de:71:6e:3a:86 brd ff:ff:ff:ff:ff:ff
    inet 192.100.182.252/32 brd 192.100.182.252 scope global kube-ipvs0
       valid_lft forever preferred_lft forever
    inet 192.100.227.87/32 brd 192.100.227.87 scope global kube-ipvs0
       valid_lft forever preferred_lft forever
    inet 192.100.0.10/32 brd 192.100.0.10 scope global kube-ipvs0
       valid_lft forever preferred_lft forever
    inet 192.100.0.1/32 brd 192.100.0.1 scope global kube-ipvs0
       valid_lft forever preferred_lft forever
6: cali2b2abdd19e8@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP group default
    link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff link-netnsid 0
7: cali5ed4c62f10d@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP group default
    link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff link-netnsid 1
8: cali64dc30e7c56@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP group default
    link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff link-netnsid 2
9: tunl0@NONE: <NOARP,UP,LOWER_UP> mtu 1480 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ipip 0.0.0.0 brd 0.0.0.0
10: vxlan.calico: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN group default
    link/ether 66:43:c9:45:ef:85 brd ff:ff:ff:ff:ff:ff
    inet 192.110.186.198/32 scope global vxlan.calico
       valid_lft forever preferred_lft forever
- ens192: the host NIC
- vxlan.calico: the tunnel device created by Calico
- Devices whose names start with cali are the interfaces Calico creates for pods
- kube-ipvs0: the device created by IPVS
First capture on the ens192 interface
# capture on ens192 (the host NIC)
tcpdump -i ens192 -nn dst 192.110.186.195
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on ens192, link-type EN10MB (Ethernet), capture size 262144 bytes
16:41:35.376592 IP 192.110.140.67 > 192.110.186.195: ICMP echo request, id 30208, seq 0, length 56
16:41:36.377526 IP 192.110.140.67 > 192.110.186.195: ICMP echo request, id 30208, seq 256, length 56
16:41:37.378774 IP 192.110.140.67 > 192.110.186.195: ICMP echo request, id 30208, seq 512, length 56
16:41:38.380282 IP 192.110.140.67 > 192.110.186.195: ICMP echo request, id 30208, seq 768, length 56
16:41:39.380485 IP 192.110.140.67 > 192.110.186.195: ICMP echo request, id 30208, seq 1024, length 56
16:41:40.382600 IP 192.110.140.67 > 192.110.186.195: ICMP echo request, id 30208, seq 1280, length 56
^C
6 packets captured
6 packets received by filter
0 packets dropped by kernel
- Packets are seen
- The traffic is routed directly over the ens192 interface between the pods
- No encapsulation is used
Capture on the vxlan.calico interface
# capture on vxlan.calico (the Calico VXLAN-mode tunnel device)
tcpdump -i vxlan.calico -nn dst 192.110.186.195
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on vxlan.calico, link-type EN10MB (Ethernet), capture size 262144 bytes
^C
0 packets captured
0 packets received by filter
0 packets dropped by kernel
- No packets are seen
- The packets are transmitted by plain routing
- No encapsulation takes place
Summary of Calico encapsulation types
IP in IP
- CrossSubnet: for environments that span subnets
  - Plain routing within the same subnet
  - IPIP encapsulation between different subnets
  - Flexible and efficient
- Always: IPIP encapsulation everywhere
  - All traffic is IPIP-encapsulated
  - Encapsulating packets reduces performance
- Never: no IPIP encapsulation
  - Packets are not encapsulated
  - Plain routing is used
  - Only works when all nodes are in the same subnet
VXLAN
- CrossSubnet: for environments that span subnets
  - Plain routing within the same subnet
  - VXLAN encapsulation between different subnets
  - Flexible and efficient
- Always: VXLAN encapsulation everywhere
  - All traffic is VXLAN-encapsulated
  - Encapsulating packets reduces performance
- Never: no VXLAN encapsulation
  - Packets are not encapsulated
  - Plain routing is used
  - Only works when all nodes are in the same subnet
Never
- Both ipipMode and vxlanMode set to Never
- Calico uses plain routing
- Only works when all nodes are in the same subnet (quick check sketched below)
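A quick way to confirm which combination is currently active on the default pool; a minimal sketch:
# print just the encapsulation fields of the default IP pool
calicoctl get ippool default-ipv4-ippool -o yaml | grep -E 'ipipMode|vxlanMode'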
