Service
A Service is a layer-4 load-balancing resource. It is not a process that handles client requests itself; it exists only as iptables or ipvs rules programmed by kube-proxy on every node.
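Whether those rules are iptables or ipvs depends on the kube-proxy mode. A quick way to check it (a sketch, assuming a kubeadm cluster, where kube-proxy keeps its configuration in a ConfigMap in kube-system):

# Print the configured proxy mode; an empty value means the iptables default
kubectl -n kube-system get configmap kube-proxy -o yaml | grep 'mode:'

The ipvsadm output later in this article shows that the cluster used here runs in ipvs mode. The Deployment below provides the backend Pods used in the following examples.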
apiVersion: apps/v1
kind: Deployment
metadata:
  name: url-analysis-deployment
  labels:
    app: url-analysis
spec:
  replicas: 6
  selector:
    matchLabels:
      app: url-analysis
  template:
    metadata:
      labels:
        app: url-analysis
    spec:
      containers:
      - name: url-analysis
        image: addenda1998/url-analysis:v1.0
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 8080
ClusterIP Service
apiVersion: v1
kind: Service
metadata:
name: cluster-ip-svc
namespace: dev
spec:
selector:
app: url-analysis
ports:
- name: http
protocol: TCP
    # Port the Service itself exposes
port: 8080
    # Target port on the backend Pods
targetPort: 56010
Test:
[root@k8s-master service]# kubectl get service -owide
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE SELECTOR
cluster-ip-svc ClusterIP 10.96.240.122 <none> 8080/TCP 13m app=url-analysis
[root@k8s-master service]# curl 10.96.240.122:8080/hello
ClientIp : 192.168.235.192, ServerName : url-analysis-deployment-547845566c-hgnlm, ServerIp : 192.168.169.144
[root@k8s-master service]# ip addr
...
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
link/ether 52:54:00:ff:8e:6d brd ff:ff:ff:ff:ff:ff
inet 10.1.1.12/24 brd 10.1.1.255 scope global noprefixroute eth0
valid_lft forever preferred_lft forever
inet6 fe80::5054:ff:feff:8e6d/64 scope link noprefixroute
valid_lft forever preferred_lft forever
3: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default
link/ether 02:42:76:9e:39:7b brd ff:ff:ff:ff:ff:ff
inet 172.17.0.1/16 brd 172.17.255.255 scope global docker0
valid_lft forever preferred_lft forever
4: tunl0@NONE: <NOARP,UP,LOWER_UP> mtu 1480 qdisc noqueue state UNKNOWN group default qlen 1000
link/ipip 0.0.0.0 brd 0.0.0.0
inet 192.168.235.192/32 scope global tunl0
valid_lft forever preferred_lft forever
Endpoints
Every time a Service is created, an Endpoints object with the same name is created along with it.
[root@k8s-master service]# kubectl get endpoints -owide
NAME ENDPOINTS AGE
cluster-ip-svc 192.168.169.141:56010,192.168.169.143:56010,192.168.169.144:56010 + 3 more... 30m
[root@k8s-master service]# kubectl describe endpoints cluster-ip-svc
Name: cluster-ip-svc
Namespace: dev
Labels: <none>
Annotations: endpoints.kubernetes.io/last-change-trigger-time: 2021-11-21T13:03:14Z
Subsets:
Addresses: 192.168.169.141,192.168.169.143,192.168.169.144,192.168.36.127,192.168.36.66,192.168.36.67
NotReadyAddresses: <none>
Ports:
Name Port Protocol
---- ---- --------
http 56010 TCP
Events: <none>
readinessProbe
apiVersion: apps/v1
kind: Deployment
metadata:
name: url-analysis-rp-deployment
labels:
app: url-analysis-rp
spec:
replicas: 6
selector:
matchLabels:
app: url-analysis-rp
template:
metadata:
labels:
app: url-analysis-rp
spec:
containers:
- name: url-analysis
image: addenda1998/url-analysis:v1.0
imagePullPolicy: IfNotPresent
readinessProbe:
httpGet:
path: '/hello'
port: 56010
initialDelaySeconds: 15
periodSeconds: 10
The readinessProbe above means: 15 seconds after the container starts, Kubernetes begins probing it with an HTTP GET on port 56010 at /hello, and the result is used to decide whether the Pod is added to the Endpoints. After the first probe it is repeated every 10 seconds.
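The describe output below refers to a Service named cluster-ip-rp-svc that is not shown above. A minimal sketch of its manifest, assuming it simply mirrors cluster-ip-svc but selects the url-analysis-rp Pods:

apiVersion: v1
kind: Service
metadata:
  name: cluster-ip-rp-svc
  namespace: dev
spec:
  selector:
    app: url-analysis-rp
  ports:
  - name: http
    protocol: TCP
    # port assumed to match cluster-ip-svc; the targetPort is confirmed by the Endpoints below
    port: 8080
    targetPort: 56010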
[root@k8s-master service]# kubectl describe endpoints cluster-ip-rp-svc
Name: cluster-ip-rp-svc
Namespace: dev
Labels: <none>
Annotations: endpoints.kubernetes.io/last-change-trigger-time: 2021-11-21T13:27:35Z
Subsets:
Addresses: 192.168.169.156,192.168.169.157,192.168.169.158,192.168.36.68,192.168.36.69,192.168.36.70
NotReadyAddresses: <none>
Ports:
Name Port Protocol
---- ---- --------
http 56010 TCP
Events: <none>
Pods that pass the probe are listed under .subsets.addresses; those that do not are listed under .subsets.notReadyAddresses.
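A quick way to print the two lists (a sketch using jsonpath; the field names follow the v1 Endpoints API):

# IPs of Pods that passed the readiness probe
kubectl -n dev get endpoints cluster-ip-rp-svc -o jsonpath='{.subsets[*].addresses[*].ip}'
# IPs of Pods that did not
kubectl -n dev get endpoints cluster-ip-rp-svc -o jsonpath='{.subsets[*].notReadyAddresses[*].ip}'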
NodePort Service
apiVersion: v1
kind: Service
metadata:
name: node-port-svc
namespace: dev
spec:
type: NodePort
selector:
app: url-analysis
externalTrafficPolicy: Local
ports:
- name: http
protocol: TCP
    # Port the Service itself exposes
port: 8080
    # Target port on the backend Pods
targetPort: 80
    # Port opened on every node
nodePort: 32223
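nodePort has to fall inside the API server's node-port range, which defaults to 30000-32767 (32223 above is within it). The range is controlled by a kube-apiserver flag:

# kube-apiserver flag, default value shown; only change it if you need ports outside this range
--service-node-port-range=30000-32767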
externalTrafficPolicy
[root@k8s-master service]# kubectl get pod -owide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
url-analysis-deployment-547845566c-pftqj 1/1 Running 1 22h 192.168.169.161 k8s-node2 <none> <none>
Node IPs for reference:
k8s-node1: 10.1.1.17
k8s-node2: 10.1.1.9
k8s-master: 10.1.1.12
Local
With externalTrafficPolicy set to Local, a node that has no matching Pod drops the packets; from the outside, the request simply hangs until it times out.
[root@k8s-master service]# curl 10.1.1.9:32223/hello
ClientIp : 10.1.1.12, ServerName : url-analysis-deployment-547845566c-pftqj, ServerIp : 192.168.169.161
[root@k8s-master service]# curl 10.1.1.17:32223/hello
curl: (7) Failed to connect to 10.1.1.17 port 32223: Connection timed out
Cluster
With externalTrafficPolicy set to Cluster, a node that has no matching Pod forwards the packets to a node that does.
[root@k8s-master service]# curl 10.1.1.9:32223/hello
ClientIp : 10.1.1.9, ServerName : url-analysis-deployment-547845566c-pftqj, ServerIp : 192.168.169.161
[root@k8s-master service]# curl 10.1.1.17:32223/hello
ClientIp : 192.168.36.64, ServerName : url-analysis-deployment-547845566c-pftqj, ServerIp : 192.168.169.161
A rough comparison shows that no SNAT is performed with Local, while with Cluster SNAT is performed. When SNAT is done, if the node that received the request also hosts the Pod, the source address is rewritten to that node's real NIC IP; otherwise it is rewritten to the address of the node's Kubernetes virtual NIC (tunl0).
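One way to confirm this without looking at application logs is to watch the packet counters on the masquerade rule in the nat table before and after a request (a sketch; the chain and rule comments appear in the iptables dump further below and may differ slightly between kube-proxy versions):

# The counter of the rule commented "kubernetes service traffic requiring SNAT"
# increases under Cluster but stays flat under Local
iptables -t nat -L KUBE-POSTROUTING -n -v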
Analyzing the rules
Two parameters were set when installing Kubernetes:
kubeadm init \
...
--service-cidr=10.96.0.0/16 \
--pod-network-cidr=192.168.0.0/16
Now we can see that every Service address is allocated from 10.96.0.0/16 and every Pod address from 192.168.0.0/16.
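If you no longer remember what was passed to kubeadm init, the values can be read back from the cluster (a sketch, assuming a kubeadm cluster, which stores its ClusterConfiguration in the kubeadm-config ConfigMap):

# podSubnet and serviceSubnet correspond to --pod-network-cidr and --service-cidr
kubectl -n kube-system get configmap kubeadm-config -o yaml | grep -E 'podSubnet|serviceSubnet'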
NICs on k8s-master:
[root@k8s-master service]# ip addr
1: ...
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
link/ether 52:54:00:ff:8e:6d brd ff:ff:ff:ff:ff:ff
inet 10.1.1.12/24 brd 10.1.1.255 scope global noprefixroute eth0
valid_lft forever preferred_lft forever
inet6 fe80::5054:ff:feff:8e6d/64 scope link noprefixroute
valid_lft forever preferred_lft forever
3: ...
4: tunl0@NONE: <NOARP,UP,LOWER_UP> mtu 1480 qdisc noqueue state UNKNOWN group default qlen 1000
link/ipip 0.0.0.0 brd 0.0.0.0
inet 192.168.235.192/32 scope global tunl0
valid_lft forever preferred_lft forever
7: kube-ipvs0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default
link/ether 2e:f4:86:c9:ab:d7 brd ff:ff:ff:ff:ff:ff
inet 10.96.0.1/32 scope global kube-ipvs0
valid_lft forever preferred_lft forever
inet 10.96.0.10/32 scope global kube-ipvs0
valid_lft forever preferred_lft forever
inet 10.96.198.86/32 scope global kube-ipvs0
valid_lft forever preferred_lft forever
NICs on k8s-node1:
[root@k8s-node1 ~]# ip addr
1: ...
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
link/ether 52:54:00:4c:1a:1c brd ff:ff:ff:ff:ff:ff
inet 10.1.1.17/24 brd 10.1.1.255 scope global noprefixroute eth0
valid_lft forever preferred_lft forever
inet6 fe80::5054:ff:fe4c:1a1c/64 scope link noprefixroute
valid_lft forever preferred_lft forever
3: ...
4: tunl0@NONE: <NOARP,UP,LOWER_UP> mtu 1480 qdisc noqueue state UNKNOWN group default qlen 1000
link/ipip 0.0.0.0 brd 0.0.0.0
inet 192.168.36.64/32 scope global tunl0
valid_lft forever preferred_lft forever
7: cali108db64c264@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1480 qdisc noqueue state UP group default
link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff link-netnsid 0
inet6 fe80::ecee:eeff:feee:eeee/64 scope link
valid_lft forever preferred_lft forever
8: calidb8c7ca4f61@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1480 qdisc noqueue state UP group default
link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff link-netnsid 1
inet6 fe80::ecee:eeff:feee:eeee/64 scope link
valid_lft forever preferred_lft forever
9: calia7a2ce60b92@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1480 qdisc noqueue state UP group default
link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff link-netnsid 2
inet6 fe80::ecee:eeff:feee:eeee/64 scope link
valid_lft forever preferred_lft forever
10: kube-ipvs0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default
link/ether da:72:4d:47:58:66 brd ff:ff:ff:ff:ff:ff
inet 10.96.0.10/32 scope global kube-ipvs0
valid_lft forever preferred_lft forever
inet 10.96.0.1/32 scope global kube-ipvs0
valid_lft forever preferred_lft forever
inet 10.96.198.86/32 scope global kube-ipvs0
valid_lft forever preferred_lft forever
NICs on k8s-node2:
[root@k8s-node2 ~]# ip addr
1: ...
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
link/ether 52:54:00:4e:5e:45 brd ff:ff:ff:ff:ff:ff
inet 10.1.1.9/24 brd 10.1.1.255 scope global noprefixroute eth0
valid_lft forever preferred_lft forever
inet6 fe80::5054:ff:fe4e:5e45/64 scope link noprefixroute
valid_lft forever preferred_lft forever
3: ...
4: tunl0@NONE: <NOARP,UP,LOWER_UP> mtu 1480 qdisc noqueue state UNKNOWN group default qlen 1000
link/ipip 0.0.0.0 brd 0.0.0.0
inet 192.168.169.128/32 scope global tunl0
valid_lft forever preferred_lft forever
8: kube-ipvs0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default
link/ether b6:27:e4:6e:85:a2 brd ff:ff:ff:ff:ff:ff
inet 10.96.0.1/32 scope global kube-ipvs0
valid_lft forever preferred_lft forever
inet 10.96.0.10/32 scope global kube-ipvs0
valid_lft forever preferred_lft forever
inet 10.96.198.86/32 scope global kube-ipvs0
valid_lft forever preferred_lft forever
12: califbd678050ba@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1480 qdisc noqueue state UP group default
link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff link-netnsid 1
inet6 fe80::ecee:eeff:feee:eeee/64 scope link
valid_lft forever preferred_lft forever
13: calie97258c7108@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1480 qdisc noqueue state UP group default
link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff link-netnsid 0
inet6 fe80::ecee:eeff:feee:eeee/64 scope link
valid_lft forever preferred_lft forever
Create a Deployment:
[root@k8s-master service]# kubectl get pod -owide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
url-analysis-deployment-547845566c-4rt9d 1/1 Running 0 8m40s 192.168.169.166 k8s-node2 <none> <none>
url-analysis-deployment-547845566c-64gq4 1/1 Running 0 8m18s 192.168.169.168 k8s-node2 <none> <none>
ClusterIP
Create a ClusterIP Service:
[root@k8s-master service]# kubectl get service -owide
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE SELECTOR
cluster-ip-svc ClusterIP 10.96.198.86 <none> 8080/TCP 33s app=url-analysis
The following ipvs rule can be seen on all three machines:
[root@k8s-master service]# ipvsadm -Ln
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
-> RemoteAddress:Port Forward Weight ActiveConn InActConn
...
TCP 10.96.198.86:8080 rr
-> 192.168.169.166:56010 Masq 1 0 0
-> 192.168.169.168:56010 Masq 1 0 0
...
In other words, when a packet hitting the network stack is destined for 10.96.198.86:8080, it is DNATed to 192.168.169.166:56010 or 192.168.169.168:56010. How 10.96.198.86 and 192.168.169.166/192.168.169.168 are actually connected to each other we leave aside for now.
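The DNAT can also be observed directly in the IPVS connection table right after a request (a sketch; the exact entry depends on which backend round-robin picked):

# Send one request, then list the connection entries for the Service address
curl 10.96.198.86:8080/hello
ipvsadm -L -n -c | grep 10.96.198.86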
NodePort
The difference between NodePort and ClusterIP is that a NodePort Service additionally opens a port on every node, much like a Docker port mapping.
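In ipvs mode these node ports are tracked in an ipset that the nat rules match against (a sketch; the set name shows up in the iptables dump further below):

# TCP node ports of all NodePort Services on this node
ipset list KUBE-NODE-PORT-TCP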
Local
Since our Pods are on k8s-node2, the following ipvs rules can be seen there.
[root@k8s-node2 ~]# ipvsadm -L -n
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
-> RemoteAddress:Port Forward Weight ActiveConn InActConn
TCP 172.17.0.1:32223 rr
-> 192.168.169.169:56010 Masq 1 0 0
-> 192.168.169.170:56010 Masq 1 0 0
TCP 192.168.169.128:32223 rr
-> 192.168.169.169:56010 Masq 1 0 0
-> 192.168.169.170:56010 Masq 1 0 0
TCP 10.1.1.9:32223 rr
-> 192.168.169.169:56010 Masq 1 0 0
-> 192.168.169.170:56010 Masq 1 0 1
TCP 10.96.143.212:8080 rr
-> 192.168.169.169:56010 Masq 1 0 0
-> 192.168.169.170:56010 Masq 1 0 0
TCP 127.0.0.1:32223 rr
-> 192.168.169.169:56010 Masq 1 0 0
-> 192.168.169.170:56010 Masq 1 0 0
But on k8s-master and k8s-node1 you can only see:
[root@k8s-node1 ~]# ipvsadm -Ln
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
-> RemoteAddress:Port Forward Weight ActiveConn InActConn
TCP 172.17.0.1:32223 rr
TCP 192.168.36.64:32223 rr
TCP 10.1.1.17:32223 rr
TCP 10.96.143.212:8080 rr
-> 192.168.169.172:56010 Masq 1 0 0
-> 192.168.169.173:56010 Masq 1 0 0
TCP 127.0.0.1:32223 rr
So, compared with a plain ClusterIP Service, some extra DNAT rules for the node port are added.
Cluster
With externalTrafficPolicy set to Cluster, the node-port ipvs rules, backends included, are also visible on k8s-node1 and k8s-master.
[root@k8s-master service]# ipvsadm -Ln
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
-> RemoteAddress:Port Forward Weight ActiveConn InActConn
TCP 192.168.235.192:32223 rr
-> 192.168.169.172:56010 Masq 1 0 0
-> 192.168.169.173:56010 Masq 1 0 0
TCP 10.1.1.12:32223 rr
-> 192.168.169.172:56010 Masq 1 0 0
-> 192.168.169.173:56010 Masq 1 0 0
TCP 10.96.143.212:8080 rr
-> 192.168.169.172:56010 Masq 1 0 0
-> 192.168.169.173:56010 Masq 1 0 0
TCP 127.0.0.1:32223 rr
-> 192.168.169.172:56010 Masq 1 0 0
-> 192.168.169.173:56010 Masq 1 0 0
TCP 172.17.0.1:32223 rr
-> 192.168.169.172:56010 Masq 1 0 0
-> 192.168.169.173:56010 Masq 1 0 0
Let's also take a look at the iptables rules.
[root@k8s-node1 ~]# iptables -t nat -L -n
Chain PREROUTING (policy ACCEPT)
target prot opt source destination
cali-PREROUTING all -- 0.0.0.0/0 0.0.0.0/0 /* cali:6gwbT8clXdHdC1b1 */
KUBE-SERVICES all -- 0.0.0.0/0 0.0.0.0/0 /* kubernetes service portals */
DOCKER all -- 0.0.0.0/0 0.0.0.0/0 ADDRTYPE match dst-type LOCAL
Chain INPUT (policy ACCEPT)
target prot opt source destination
Chain POSTROUTING (policy ACCEPT)
target prot opt source destination
cali-POSTROUTING all -- 0.0.0.0/0 0.0.0.0/0 /* cali:O3lYWMrLQYEMJtB5 */
KUBE-POSTROUTING all -- 0.0.0.0/0 0.0.0.0/0 /* kubernetes postrouting rules */
MASQUERADE all -- 172.17.0.0/16 0.0.0.0/0
Chain OUTPUT (policy ACCEPT)
target prot opt source destination
cali-OUTPUT all -- 0.0.0.0/0 0.0.0.0/0 /* cali:tVnHkvAo15HuiPy0 */
KUBE-SERVICES all -- 0.0.0.0/0 0.0.0.0/0 /* kubernetes service portals */
DOCKER all -- 0.0.0.0/0 !127.0.0.0/8 ADDRTYPE match dst-type LOCAL
Chain DOCKER (2 references)
target prot opt source destination
RETURN all -- 0.0.0.0/0 0.0.0.0/0
Chain KUBE-MARK-DROP (1 references)
target prot opt source destination
MARK all -- 0.0.0.0/0 0.0.0.0/0 MARK or 0x8000
Chain KUBE-MARK-MASQ (3 references)
target prot opt source destination
MARK all -- 0.0.0.0/0 0.0.0.0/0 MARK or 0x4000
Chain KUBE-POSTROUTING (1 references)
target prot opt source destination
MASQUERADE all -- 0.0.0.0/0 0.0.0.0/0 /* Kubernetes endpoints dst ip:port, source ip for solving hairpin purpose */ match-set KUBE-LOOP-BACK dst,dst,src
RETURN all -- 0.0.0.0/0 0.0.0.0/0 mark match ! 0x4000/0x4000
MARK all -- 0.0.0.0/0 0.0.0.0/0 MARK xor 0x4000
MASQUERADE all -- 0.0.0.0/0 0.0.0.0/0 /* kubernetes service traffic requiring SNAT */ random-fully
Chain KUBE-KUBELET-CANARY (0 references)
target prot opt source destination
Chain KUBE-SERVICES (2 references)
target prot opt source destination
KUBE-MARK-MASQ all -- !192.168.0.0/16 0.0.0.0/0 /* Kubernetes service cluster ip + port for masquerade purpose */ match-set KUBE-CLUSTER-IP dst,dst
KUBE-NODE-PORT all -- 0.0.0.0/0 0.0.0.0/0 ADDRTYPE match dst-type LOCAL
ACCEPT all -- 0.0.0.0/0 0.0.0.0/0 match-set KUBE-CLUSTER-IP dst,dst
Chain KUBE-FIREWALL (0 references)
target prot opt source destination
KUBE-MARK-DROP all -- 0.0.0.0/0 0.0.0.0/0
Chain KUBE-NODE-PORT (1 references)
target prot opt source destination
RETURN tcp -- 0.0.0.0/0 0.0.0.0/0 /* Kubernetes nodeport TCP port with externalTrafficPolicy=local */ match-set KUBE-NODE-PORT-LOCAL-TCP dst
KUBE-MARK-MASQ tcp -- 0.0.0.0/0 0.0.0.0/0 /* Kubernetes nodeport TCP port for masquerade purpose */ match-set KUBE-NODE-PORT-TCP dst
Chain KUBE-LOAD-BALANCER (0 references)
target prot opt source destination
KUBE-MARK-MASQ all -- 0.0.0.0/0 0.0.0.0/0
The listing above is for Local. Compared with Cluster it contains exactly one extra rule: the RETURN rule in the KUBE-NODE-PORT chain that matches the KUBE-NODE-PORT-LOCAL-TCP ipset (the one commented "Kubernetes nodeport TCP port with externalTrafficPolicy=local"), which does not exist under Cluster. This is also why no SNAT is done in Local mode: packets hitting that ipset are RETURNed immediately and never reach the KUBE-MARK-MASQ rule that follows, so they are never marked for masquerading in KUBE-POSTROUTING.
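To see the difference without dumping the whole nat table, list just that chain under each policy (a sketch; the packet counters also reveal which rule the test traffic actually hit):

# Under Local this chain contains the extra RETURN rule; under Cluster it does not
iptables -t nat -L KUBE-NODE-PORT -n -v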
KUBE-NODE-PORT-LOCAL-TCP under Cluster:
[root@k8s-node1 ~]# ipset list KUBE-NODE-PORT-LOCAL-TCP
Name: KUBE-NODE-PORT-LOCAL-TCP
Type: bitmap:port
Revision: 3
Header: range 0-65535
Size in memory: 8300
References: 0
Number of entries: 0
Members:
KUBE-NODE-PORT-LOCAL-TCP under Local:
[root@k8s-node1 ~]# ipset list KUBE-NODE-PORT-LOCAL-TCP
Name: KUBE-NODE-PORT-LOCAL-TCP
Type: bitmap:port
Revision: 3
Header: range 0-65535
Size in memory: 8300
References: 1
Number of entries: 1
Members:
32223
Now test which interfaces' port 32223 is actually DNATed:
[root@k8s-node2 ~]# curl 127.0.0.1:32223/hello
curl: (7) Failed to connect to 127.0.0.1 port 32223: Connection timed out
[root@k8s-node2 ~]# curl 10.1.1.9:32223/hello
ClientIp : 10.1.1.9, ServerName : url-analysis-deployment-547845566c-4rt9d, ServerIp : 192.168.169.164
[root@k8s-node2 ~]# curl 172.17.0.1:32223/hello
ClientIp : 10.1.1.9, ServerName : url-analysis-deployment-547845566c-64gq4, ServerIp : 192.168.169.174
[root@k8s-node2 ~]# curl 192.168.169.128:32223/hello
ClientIp : 192.168.169.128, ServerName : url-analysis-deployment-547845566c-4rt9d, ServerIp : 192.168.169.164
As you can see, port 32223 is DNATed on every interface except the loopback.
