Service

A Service is a load-balancing resource that works at layer 4 of the network stack. It is realized purely as iptables or ipvs rules, so it does not handle client requests itself; it only forwards them.
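
Whether those rules end up as iptables or ipvs depends on the kube-proxy mode. On a kubeadm cluster it can be checked with the command below (a sketch, assuming kube-proxy's default ConfigMap name):

# kubeadm stores the kube-proxy configuration, including the proxy mode, in this ConfigMap
kubectl -n kube-system get configmap kube-proxy -o yaml | grep mode

The Deployment below provides the backend Pods that the Services in this section select.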

apiVersion: apps/v1
kind: Deployment
metadata:
  name: url-analysis-deployment
  labels:
    app: url-analysis
spec:
  replicas: 6
  selector:
    matchLabels:
      app: url-analysis
  template:
    metadata:
      labels:
        app: url-analysis
    spec:
      containers:
      - name: url-analysis
        image: addenda1998/url-analysis:v1.0
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 8080

ClusterIP Service

apiVersion: v1
kind: Service
metadata:
  name: cluster-ip-svc
  namespace: dev
spec:
  selector:
    app: url-analysis
  ports:
  - name: http
    protocol: TCP
    # Port of the Service itself
    port: 8080
    # Target (container) port
    targetPort: 56010
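
Both manifests can be applied with kubectl. A minimal sketch, assuming they are saved under these (hypothetical) file names; the Deployment manifest does not set a namespace, so -n dev is passed explicitly to keep it next to the Service:

kubectl apply -f url-analysis-deployment.yaml -n dev
kubectl apply -f cluster-ip-svc.yaml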

Test it:

[root@k8s-master service]# kubectl get service -owide
NAME             TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE   SELECTOR
cluster-ip-svc   ClusterIP   10.96.240.122   <none>        8080/TCP   13m   app=url-analysis
[root@k8s-master service]# curl 10.96.240.122:8080/hello
ClientIp : 192.168.235.192, ServerName : url-analysis-deployment-547845566c-hgnlm, ServerIp : 192.168.169.144
[root@k8s-master service]# ip addr
...
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 52:54:00:ff:8e:6d brd ff:ff:ff:ff:ff:ff
    inet 10.1.1.12/24 brd 10.1.1.255 scope global noprefixroute eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::5054:ff:feff:8e6d/64 scope link noprefixroute
       valid_lft forever preferred_lft forever
3: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default
    link/ether 02:42:76:9e:39:7b brd ff:ff:ff:ff:ff:ff
    inet 172.17.0.1/16 brd 172.17.255.255 scope global docker0
       valid_lft forever preferred_lft forever
4: tunl0@NONE: <NOARP,UP,LOWER_UP> mtu 1480 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ipip 0.0.0.0 brd 0.0.0.0
    inet 192.168.235.192/32 scope global tunl0
       valid_lft forever preferred_lft forever

Endpoints

Every time a Service is created, an Endpoints object with the same name is created as well.
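
The object can be inspected like any other resource; a sketch using the Service created above:

# The Endpoints object carries the ready Pod IPs and the target port resolved from the Service
kubectl -n dev get endpoints cluster-ip-svc -o yaml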

[root@k8s-master service]# kubectl get endpoints -owide
NAME             ENDPOINTS                                                                       AGE
cluster-ip-svc   192.168.169.141:56010,192.168.169.143:56010,192.168.169.144:56010 + 3 more...   30m
[root@k8s-master service]# kubectl describe endpoints cluster-ip-svc
Name:         cluster-ip-svc
Namespace:    dev
Labels:       <none>
Annotations:  endpoints.kubernetes.io/last-change-trigger-time: 2021-11-21T13:03:14Z
Subsets:
  Addresses:          192.168.169.141,192.168.169.143,192.168.169.144,192.168.36.127,192.168.36.66,192.168.36.67
  NotReadyAddresses:  <none>
  Ports:
    Name  Port   Protocol
    ----  ----   --------
    http  56010  TCP
Events:  <none>

readinessProbe

apiVersion: apps/v1
kind: Deployment
metadata:
  name: url-analysis-rp-deployment
  labels:
    app: url-analysis-rp
spec:
  replicas: 6
  selector:
    matchLabels:
      app: url-analysis-rp
  template:
    metadata:
      labels:
        app: url-analysis-rp
    spec:
      containers:
      - name: url-analysis
        image: addenda1998/url-analysis:v1.0
        imagePullPolicy: IfNotPresent
        readinessProbe:
          httpGet:
            path: '/hello'
            port: 56010
          initialDelaySeconds: 15
          periodSeconds: 10

The Deployment above means that, 15 seconds after the container starts, the readiness probe begins checking it, and the result decides whether the Pod is added to the Endpoints. After the first check it probes again every 10 seconds.
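
The describe output below refers to a Service named cluster-ip-rp-svc, whose manifest is not shown. A minimal sketch, assuming it mirrors cluster-ip-svc but selects the app: url-analysis-rp Pods:

apiVersion: v1
kind: Service
metadata:
  name: cluster-ip-rp-svc
  namespace: dev
spec:
  selector:
    app: url-analysis-rp
  ports:
  - name: http
    protocol: TCP
    port: 8080
    targetPort: 56010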

[root@k8s-master service]# kubectl describe endpoints cluster-ip-rp-svc
Name:         cluster-ip-rp-svc
Namespace:    dev
Labels:       <none>
Annotations:  endpoints.kubernetes.io/last-change-trigger-time: 2021-11-21T13:27:35Z
Subsets:
  Addresses:          192.168.169.156,192.168.169.157,192.168.169.158,192.168.36.68,192.168.36.69,192.168.36.70
  NotReadyAddresses:  <none>
  Ports:
    Name  Port   Protocol
    ----  ----   --------
    http  56010  TCP

Events:  <none>

Pods that pass the readiness probe are listed under .Subsets.Addresses; those that do not are listed under .Subsets.NotReadyAddresses.
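
A quick way to pull out just the not-ready addresses is a jsonpath query (a sketch; it prints nothing when every Pod is ready):

kubectl -n dev get endpoints cluster-ip-rp-svc \
  -o jsonpath='{.subsets[*].notReadyAddresses[*].ip}'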

NodePort Service

apiVersion: v1
kind: Service
metadata:
  name: node-port-svc
  namespace: dev
spec:
  type: NodePort
  selector:
    app: url-analysis
  externalTrafficPolicy: Local
  ports: 
  - name: http
    protocol: TCP
    # Port of the Service itself
    port: 8080
    # Target (container) port
    targetPort: 56010
    # Port opened on the host
    nodePort: 32223

externalTrafficPolicy

[root@k8s-master service]# kubectl get pod -owide
NAME                                       READY   STATUS    RESTARTS   AGE   IP                NODE        NOMINATED NODE   READINESS GATES
url-analysis-deployment-547845566c-pftqj   1/1     Running   1          22h   192.168.169.161   k8s-node2   <none>           <none>

k8s-node1:10.1.1.17
k8s-node2:10.1.1.9
k8s-master:10.1.1.12

Local

Setting externalTrafficPolicy to Local means that when the current node has no matching Pod, packets are dropped; from the outside, the request simply hangs until it times out.
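
When reproducing this, a connect timeout keeps curl from waiting for the kernel default (a sketch):

curl --connect-timeout 5 10.1.1.17:32223/hello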

[root@k8s-master service]# curl 10.1.1.9:32223/hello
ClientIp : 10.1.1.12, ServerName : url-analysis-deployment-547845566c-pftqj, ServerIp : 192.168.169.161
[root@k8s-master service]# curl 10.1.1.17:32223/hello
curl: (7) Failed to connect to 10.1.1.17 port 32223: Connection timed out

Cluster

When it is set to Cluster, if the current node has no Pod, the Service forwards the packet to a node that does.

[root@k8s-master service]# curl 10.1.1.9:32223/hello
ClientIp : 10.1.1.9, ServerName : url-analysis-deployment-547845566c-pftqj, ServerIp : 192.168.169.161
[root@k8s-master service]# curl 10.1.1.17:32223/hello
ClientIp : 192.168.36.64, ServerName : url-analysis-deployment-547845566c-pftqj, ServerIp : 192.168.169.161

A rough analysis shows that with Local no SNAT is performed, while with Cluster SNAT is performed. When SNATing, if the node being accessed also hosts a Pod, the source address is rewritten to the node's real NIC IP; otherwise it is rewritten to the address of the node's Kubernetes virtual NIC (the tunl0 interface here).
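
To reproduce the comparison without re-creating the Service, the policy can be switched in place; a sketch using the manifest above:

# Strategic-merge patch; switch "Cluster" back to "Local" to restore the original behaviour
kubectl -n dev patch service node-port-svc -p '{"spec":{"externalTrafficPolicy":"Cluster"}}'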

Rule analysis

When installing Kubernetes with kubeadm, two parameters need to be configured:

kubeadm init \
...
--service-cidr=10.96.0.0/16 \
--pod-network-cidr=192.168.0.0/16

Now we can see that every Service address is allocated from 10.96.0.0/16 and every Pod address from 192.168.0.0/16.
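
On a kubeadm control plane the two CIDRs also end up as flags in the static Pod manifests, so they can be double-checked on disk (a sketch, assuming the default kubeadm manifest paths):

grep service-cluster-ip-range /etc/kubernetes/manifests/kube-apiserver.yaml
grep cluster-cidr /etc/kubernetes/manifests/kube-controller-manager.yaml
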
NICs on k8s-master:

[root@k8s-master service]# ip addr
1: ...
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 52:54:00:ff:8e:6d brd ff:ff:ff:ff:ff:ff
    inet 10.1.1.12/24 brd 10.1.1.255 scope global noprefixroute eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::5054:ff:feff:8e6d/64 scope link noprefixroute
       valid_lft forever preferred_lft forever
3: ...
4: tunl0@NONE: <NOARP,UP,LOWER_UP> mtu 1480 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ipip 0.0.0.0 brd 0.0.0.0
    inet 192.168.235.192/32 scope global tunl0
       valid_lft forever preferred_lft forever
7: kube-ipvs0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default
    link/ether 2e:f4:86:c9:ab:d7 brd ff:ff:ff:ff:ff:ff
    inet 10.96.0.1/32 scope global kube-ipvs0
       valid_lft forever preferred_lft forever
    inet 10.96.0.10/32 scope global kube-ipvs0
       valid_lft forever preferred_lft forever
    inet 10.96.198.86/32 scope global kube-ipvs0
       valid_lft forever preferred_lft forever

NICs on k8s-node1:

[root@k8s-node1 ~]# ip addr
1: ...
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 52:54:00:4c:1a:1c brd ff:ff:ff:ff:ff:ff
    inet 10.1.1.17/24 brd 10.1.1.255 scope global noprefixroute eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::5054:ff:fe4c:1a1c/64 scope link noprefixroute
       valid_lft forever preferred_lft forever
3: ...
4: tunl0@NONE: <NOARP,UP,LOWER_UP> mtu 1480 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ipip 0.0.0.0 brd 0.0.0.0
    inet 192.168.36.64/32 scope global tunl0
       valid_lft forever preferred_lft forever
7: cali108db64c264@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1480 qdisc noqueue state UP group default
    link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet6 fe80::ecee:eeff:feee:eeee/64 scope link
       valid_lft forever preferred_lft forever
8: calidb8c7ca4f61@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1480 qdisc noqueue state UP group default
    link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff link-netnsid 1
    inet6 fe80::ecee:eeff:feee:eeee/64 scope link
       valid_lft forever preferred_lft forever
9: calia7a2ce60b92@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1480 qdisc noqueue state UP group default
    link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff link-netnsid 2
    inet6 fe80::ecee:eeff:feee:eeee/64 scope link
       valid_lft forever preferred_lft forever
10: kube-ipvs0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default
    link/ether da:72:4d:47:58:66 brd ff:ff:ff:ff:ff:ff
    inet 10.96.0.10/32 scope global kube-ipvs0
       valid_lft forever preferred_lft forever
    inet 10.96.0.1/32 scope global kube-ipvs0
       valid_lft forever preferred_lft forever
    inet 10.96.198.86/32 scope global kube-ipvs0
       valid_lft forever preferred_lft forever

NICs on k8s-node2:

[root@k8s-node2 ~]# ip addr
1: ...
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 52:54:00:4e:5e:45 brd ff:ff:ff:ff:ff:ff
    inet 10.1.1.9/24 brd 10.1.1.255 scope global noprefixroute eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::5054:ff:fe4e:5e45/64 scope link noprefixroute
       valid_lft forever preferred_lft forever
3: ...
4: tunl0@NONE: <NOARP,UP,LOWER_UP> mtu 1480 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ipip 0.0.0.0 brd 0.0.0.0
    inet 192.168.169.128/32 scope global tunl0
       valid_lft forever preferred_lft forever
8: kube-ipvs0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default
    link/ether b6:27:e4:6e:85:a2 brd ff:ff:ff:ff:ff:ff
    inet 10.96.0.1/32 scope global kube-ipvs0
       valid_lft forever preferred_lft forever
    inet 10.96.0.10/32 scope global kube-ipvs0
       valid_lft forever preferred_lft forever
    inet 10.96.198.86/32 scope global kube-ipvs0
       valid_lft forever preferred_lft forever
12: califbd678050ba@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1480 qdisc noqueue state UP group default
    link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff link-netnsid 1
    inet6 fe80::ecee:eeff:feee:eeee/64 scope link
       valid_lft forever preferred_lft forever
13: calie97258c7108@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1480 qdisc noqueue state UP group default
    link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet6 fe80::ecee:eeff:feee:eeee/64 scope link
       valid_lft forever preferred_lft forever

Create a Deployment:

[root@k8s-master service]# kubectl get pod -owide
NAME                                       READY   STATUS    RESTARTS   AGE     IP                NODE        NOMINATED NODE   READINESS GATES
url-analysis-deployment-547845566c-4rt9d   1/1     Running   0          8m40s   192.168.169.166   k8s-node2   <none>           <none>
url-analysis-deployment-547845566c-64gq4   1/1     Running   0          8m18s   192.168.169.168   k8s-node2   <none>           <none>

ClusterIP

Create a ClusterIP Service:

[root@k8s-master service]# kubectl get service -owide
NAME             TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)    AGE   SELECTOR
cluster-ip-svc   ClusterIP   10.96.198.86   <none>        8080/TCP   33s   app=url-analysis

The following ipvs rules can be seen on all three machines:

[root@k8s-master service]# ipvsadm -Ln
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
...
TCP  10.96.198.86:8080 rr
  -> 192.168.169.166:56010        Masq    1      0          0
  -> 192.168.169.168:56010        Masq    1      0          0
...

In other words, when a packet entering the network stack is destined for 10.96.198.86:8080, it is DNATed to 192.168.169.166:56010 or 192.168.169.168:56010. How traffic between 10.96.198.86 and 192.168.169.166/192.168.169.168 is actually routed is not considered here.
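
The full set of ClusterIP:port pairs handled this way is tracked in an ipset that the iptables rules shown later refer to; it can be listed directly (a sketch):

ipset list KUBE-CLUSTER-IP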

NodePort

The difference between NodePort and ClusterIP is that NodePort additionally opens a port on each host, much like Docker port mapping.
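
The mapping shows up in the Service's PORT(S) column as port:nodePort (a sketch):

kubectl -n dev get service node-port-svc -o wide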

Local

Since our Pods are on k8s-node2, the following ipvs rules can be seen there.

[root@k8s-node2 ~]# ipvsadm -L -n
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  172.17.0.1:32223 rr
  -> 192.168.169.169:56010        Masq    1      0          0
  -> 192.168.169.170:56010        Masq    1      0          0
TCP  192.168.169.128:32223 rr
  -> 192.168.169.169:56010        Masq    1      0          0
  -> 192.168.169.170:56010        Masq    1      0          0
TCP  10.1.1.9:32223 rr
  -> 192.168.169.169:56010        Masq    1      0          0
  -> 192.168.169.170:56010        Masq    1      0          1
TCP  10.96.143.212:8080 rr
  -> 192.168.169.169:56010        Masq    1      0          0
  -> 192.168.169.170:56010        Masq    1      0          0
TCP  127.0.0.1:32223 rr
  -> 192.168.169.169:56010        Masq    1      0          0
  -> 192.168.169.170:56010        Masq    1      0          0

But on k8s-master and k8s-node1 you can only see:

[root@k8s-node1 ~]# ipvsadm -Ln
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  172.17.0.1:32223 rr
TCP  192.168.36.64:32223 rr
TCP  10.1.1.17:32223 rr
TCP  10.96.143.212:8080 rr
  -> 192.168.169.172:56010        Masq    1      0          0
  -> 192.168.169.173:56010        Masq    1      0          0
TCP  127.0.0.1:32223 rr

So, compared with ClusterIP mode, NodePort adds some extra DNAT rules: one virtual server for the node port on each local address.

Cluster

With Cluster, the node-port ipvs rules on k8s-node1 and k8s-master also have real servers attached.

[root@k8s-master service]# ipvsadm -Ln
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  192.168.235.192:32223 rr
  -> 192.168.169.172:56010        Masq    1      0          0
  -> 192.168.169.173:56010        Masq    1      0          0
TCP  10.1.1.12:32223 rr
  -> 192.168.169.172:56010        Masq    1      0          0
  -> 192.168.169.173:56010        Masq    1      0          0
TCP  10.96.143.212:8080 rr
  -> 192.168.169.172:56010        Masq    1      0          0
  -> 192.168.169.173:56010        Masq    1      0          0
TCP  127.0.0.1:32223 rr
  -> 192.168.169.172:56010        Masq    1      0          0
  -> 192.168.169.173:56010        Masq    1      0          0
TCP  172.17.0.1:32223 rr
  -> 192.168.169.172:56010        Masq    1      0          0
  -> 192.168.169.173:56010        Masq    1      0          0

Let's also take a look at the iptables rules.

[root@k8s-node1 ~]# iptables -t nat -L -n
Chain PREROUTING (policy ACCEPT)
target     prot opt source               destination
cali-PREROUTING  all  --  0.0.0.0/0            0.0.0.0/0            /* cali:6gwbT8clXdHdC1b1 */
KUBE-SERVICES  all  --  0.0.0.0/0            0.0.0.0/0            /* kubernetes service portals */
DOCKER     all  --  0.0.0.0/0            0.0.0.0/0            ADDRTYPE match dst-type LOCAL

Chain INPUT (policy ACCEPT)
target     prot opt source               destination

Chain POSTROUTING (policy ACCEPT)
target     prot opt source               destination
cali-POSTROUTING  all  --  0.0.0.0/0            0.0.0.0/0            /* cali:O3lYWMrLQYEMJtB5 */
KUBE-POSTROUTING  all  --  0.0.0.0/0            0.0.0.0/0            /* kubernetes postrouting rules */
MASQUERADE  all  --  172.17.0.0/16        0.0.0.0/0

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination
cali-OUTPUT  all  --  0.0.0.0/0            0.0.0.0/0            /* cali:tVnHkvAo15HuiPy0 */
KUBE-SERVICES  all  --  0.0.0.0/0            0.0.0.0/0            /* kubernetes service portals */
DOCKER     all  --  0.0.0.0/0           !127.0.0.0/8          ADDRTYPE match dst-type LOCAL

Chain DOCKER (2 references)
target     prot opt source               destination
RETURN     all  --  0.0.0.0/0            0.0.0.0/0

Chain KUBE-MARK-DROP (1 references)
target     prot opt source               destination
MARK       all  --  0.0.0.0/0            0.0.0.0/0            MARK or 0x8000

Chain KUBE-MARK-MASQ (3 references)
target     prot opt source               destination
MARK       all  --  0.0.0.0/0            0.0.0.0/0            MARK or 0x4000

Chain KUBE-POSTROUTING (1 references)
target     prot opt source               destination
MASQUERADE  all  --  0.0.0.0/0            0.0.0.0/0            /* Kubernetes endpoints dst ip:port, source ip for solving hairpin purpose */ match-set KUBE-LOOP-BACK dst,dst,src
RETURN     all  --  0.0.0.0/0            0.0.0.0/0            mark match ! 0x4000/0x4000
MARK       all  --  0.0.0.0/0            0.0.0.0/0            MARK xor 0x4000
MASQUERADE  all  --  0.0.0.0/0            0.0.0.0/0            /* kubernetes service traffic requiring SNAT */ random-fully

Chain KUBE-KUBELET-CANARY (0 references)
target     prot opt source               destination

Chain KUBE-SERVICES (2 references)
target     prot opt source               destination
KUBE-MARK-MASQ  all  -- !192.168.0.0/16       0.0.0.0/0            /* Kubernetes service cluster ip + port for masquerade purpose */ match-set KUBE-CLUSTER-IP dst,dst
KUBE-NODE-PORT  all  --  0.0.0.0/0            0.0.0.0/0            ADDRTYPE match dst-type LOCAL
ACCEPT     all  --  0.0.0.0/0            0.0.0.0/0            match-set KUBE-CLUSTER-IP dst,dst

Chain KUBE-FIREWALL (0 references)
target     prot opt source               destination
KUBE-MARK-DROP  all  --  0.0.0.0/0            0.0.0.0/0

Chain KUBE-NODE-PORT (1 references)
target     prot opt source               destination
RETURN     tcp  --  0.0.0.0/0            0.0.0.0/0            /* Kubernetes nodeport TCP port with externalTrafficPolicy=local */ match-set KUBE-NODE-PORT-LOCAL-TCP dst
KUBE-MARK-MASQ  tcp  --  0.0.0.0/0            0.0.0.0/0            /* Kubernetes nodeport TCP port for masquerade purpose */ match-set KUBE-NODE-PORT-TCP dst

Chain KUBE-LOAD-BALANCER (0 references)
target     prot opt source               destination
KUBE-MARK-MASQ  all  --  0.0.0.0/0            0.0.0.0/0

The rules above are from Local mode. Compared with Cluster mode there is only one extra rule: the RETURN rule in the KUBE-NODE-PORT chain that matches the KUBE-NODE-PORT-LOCAL-TCP set, which does not exist in Cluster mode. This is also why Local mode does not SNAT: in Local mode, packets that match KUBE-NODE-PORT-LOCAL-TCP are RETURNed immediately and never reach the KUBE-MARK-MASQ rule below it.
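
To compare just this chain between the two modes, it can be dumped in rule-spec form before and after switching the policy (a sketch):

iptables -t nat -S KUBE-NODE-PORT
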
Look at KUBE-NODE-PORT-LOCAL-TCP in Cluster mode:

[root@k8s-node1 ~]# ipset list KUBE-NODE-PORT-LOCAL-TCP
Name: KUBE-NODE-PORT-LOCAL-TCP
Type: bitmap:port
Revision: 3
Header: range 0-65535
Size in memory: 8300
References: 0
Number of entries: 0
Members:

Look at KUBE-NODE-PORT-LOCAL-TCP in Local mode:

[root@k8s-node1 ~]# ipset list KUBE-NODE-PORT-LOCAL-TCP
Name: KUBE-NODE-PORT-LOCAL-TCP
Type: bitmap:port
Revision: 3
Header: range 0-65535
Size in memory: 8300
References: 1
Number of entries: 1
Members:
32223

Test which interfaces' port 32223 actually gets DNATed.

[root@k8s-node2 ~]# curl 127.0.0.1:32223/hello
curl: (7) Failed to connect to 127.0.0.1 port 32223: Connection timed out
[root@k8s-node2 ~]# curl 10.1.1.9:32223/hello
ClientIp : 10.1.1.9, ServerName : url-analysis-deployment-547845566c-4rt9d, ServerIp : 192.168.169.164
[root@k8s-node2 ~]# curl 172.17.0.1:32223/hello
ClientIp : 10.1.1.9, ServerName : url-analysis-deployment-547845566c-64gq4, ServerIp : 192.168.169.174
[root@k8s-node2 ~]# curl 192.168.169.128:32223/hello
ClientIp : 192.168.169.128, ServerName : url-analysis-deployment-547845566c-4rt9d, ServerIp : 192.168.169.164

As you can see, port 32223 is DNATed on every interface except the loopback interface.