Using keepalived to Provide a Virtual IP Address for Kubernetes

This article looks at how to use IPVS (The Linux Virtual Server Project) to configure a VIP for Kubernetes.

Introduction

Kubernetes v1.6 offers three ways to expose a Service:

  1. L4 LoadBalancer: only available on cloud providers such as GCE or AWS.
  2. NodePort: opens a port on every node; requests arriving on that port are forwarded to a randomly chosen pod.
  3. L7 Ingress: an Ingress is a load balancer (for example nginx, HAProxy, traefik, vulcand) that routes HTTP/HTTPS requests to the matching service endpoints.

With these options available, why do we still need keepalived?

                                                    ___________________
                                                   |                   |
                                             |-----| Host IP: 10.4.0.3 |
                                             |     |___________________|
                                             |
                                             |      ___________________
                                             |     |                   |
  Public ----(example.com = 10.4.0.3/4/5)----|-----| Host IP: 10.4.0.4 |
                                             |     |___________________|
                                             |
                                             |      ___________________
                                             |     |                   |
                                             |-----| Host IP: 10.4.0.5 |
                                                   |___________________|

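Suppose the Ingress runs on three Kubernetes nodes, which expose the 10.4.0.x addresses for load balancing.

DNS round robin (RR) rotates requests for example.com across these three nodes. If 10.4.0.3 goes down, one third of the traffic is still sent to 10.4.0.3, so there is a period of downtime until the failure is noticed and the DNS record is corrected.

Strictly speaking, this is not real High Availability (HA).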
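
To make the one-third figure concrete, here is a minimal Python sketch that models DNS round robin as a plain rotation over the three host IPs from the diagram. The function name `failed_fraction` and the idealized behaviour (perfectly even rotation, no client-side caching, no retries) are assumptions made for illustration only.

```python
from itertools import cycle, islice


def failed_fraction(hosts, down, requests):
    """Return the share of requests that round robin sends to hosts in `down`.

    Idealized model: every request goes to the next host in the rotation,
    whether or not that host is alive.
    """
    targets = islice(cycle(hosts), requests)
    failed = sum(1 for host in targets if host in down)
    return failed / requests


hosts = ["10.4.0.3", "10.4.0.4", "10.4.0.5"]
# With 10.4.0.3 down, roughly one request in three is lost.
print(failed_fraction(hosts, down={"10.4.0.3"}, requests=300))
```

In this model the loss stays at about one third for as long as the dead address remains in the rotation, which is the gap a floating VIP is meant to close.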

This is where IPVS can help. The idea is to map a virtual IP (VIP) to each service and expose that VIP outside the Kubernetes cluster.

Difference from service-loadbalancer and ingress-nginx

Consider the following diagram:

                                                 ___________________
                                                |                   |
                                                | VIP: 10.4.0.50    |
                                          |-----| Host IP: 10.4.0.3 |
                                          |     | Role: Master      |
                                          |     |___________________|
                                          |
                                          |      ___________________
                                          |     |                   |
                                          |     | VIP: Unassigned   |
  Public ----(example.com = 10.4.0.50)----|-----| Host IP: 10.4.0.4 |
                                          |     | Role: Slave       |
                                          |     |___________________|
                                          |
                                          |      ___________________
                                          |     |                   |
                                          |     | VIP: Unassigned   |
                                          |-----| Host IP: 10.4.0.5 |
                                                | Role: Slave       |
                                                |___________________|

Only one node is elected Master (the election is done through VRRP), and our VIP is 10.4.0.50. If 10.4.0.3 goes down, one of the remaining nodes is elected Master and takes over the VIP, which gives us real HA.
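
The failover described above can be sketched in a few lines of Python. This is a simplified model of a VRRP election, not keepalived's implementation: among the nodes that are still alive, the highest priority wins, and a tie goes to the numerically higher IP address. The priority values and the `alive` flag are hypothetical.

```python
from ipaddress import ip_address


def elect_master(nodes):
    """Pick the master among live nodes.

    Highest priority wins; a tie is broken by the higher IP address.
    Returns None when no node is alive.
    """
    alive = [n for n in nodes if n["alive"]]
    if not alive:
        return None
    return max(alive, key=lambda n: (n["priority"], ip_address(n["ip"])))


nodes = [
    {"ip": "10.4.0.3", "priority": 103, "alive": True},  # hypothetical priorities
    {"ip": "10.4.0.4", "priority": 102, "alive": True},
    {"ip": "10.4.0.5", "priority": 101, "alive": True},
]

print(elect_master(nodes)["ip"])  # 10.4.0.3 holds the VIP while every node is up
nodes[0]["alive"] = False         # 10.4.0.3 goes down
print(elect_master(nodes)["ip"])  # 10.4.0.4 takes over the VIP
```

Clients keep talking to 10.4.0.50 throughout; only the node answering for that address changes.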

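Requirements

The only requirement is that DaemonSets work properly in the Kubernetes cluster where keepalived-vip will run.

RBAC

Kubernetes 1.6 and later may have RBAC enabled, so we first need to set up the rules. For details on RBAC, see the Kubernetes documentation.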

vip-rbac.yaml

  apiVersion: rbac.authorization.k8s.io/v1beta1
  kind: ClusterRole
  metadata:
    name: kube-keepalived-vip
  rules:
  - apiGroups: [""]
    resources:
    - pods
    - nodes
    - endpoints
    - services
    - configmaps
    verbs: ["get", "list", "watch"]
  ---
  apiVersion: v1
  kind: ServiceAccount
  metadata:
    name: kube-keepalived-vip
    namespace: default
  ---
  apiVersion: rbac.authorization.k8s.io/v1beta1
  kind: ClusterRoleBinding
  metadata:
    name: kube-keepalived-vip
  roleRef:
    apiGroup: rbac.authorization.k8s.io
    kind: ClusterRole
    name: kube-keepalived-vip
  subjects:
  - kind: ServiceAccount
    name: kube-keepalived-vip
    namespace: default

clusterrolebinding.yaml

  apiVersion: rbac.authorization.k8s.io/v1alpha1
  kind: ClusterRoleBinding
  metadata:
    name: kube-keepalived-vip
  roleRef:
    apiGroup: rbac.authorization.k8s.io
    kind: ClusterRole
    name: kube-keepalived-vip
  subjects:
  - kind: ServiceAccount
    name: kube-keepalived-vip
    namespace: default

  $ kubectl create -f vip-rbac.yaml
  $ kubectl create -f clusterrolebinding.yaml

Example

First, create a simple service.

nginx-deployment.yaml

  apiVersion: apps/v1beta1
  kind: Deployment
  metadata:
    name: nginx-deployment
  spec:
    replicas: 3
    template:
      metadata:
        labels:
          app: nginx
      spec:
        containers:
        - name: nginx
          image: nginx:1.7.9
          ports:
          - containerPort: 80
  ---
  apiVersion: v1
  kind: Service
  metadata:
    name: nginx
    labels:
      app: nginx
  spec:
    type: NodePort
    ports:
    - port: 80
      nodePort: 30302
      targetPort: 80
      protocol: TCP
      name: http
    selector:
      app: nginx

In short, the pods listen on port 80, and the Service exposes them through NodePort 30302.

  $ kubectl create -f nginx-deployment.yaml

Next, we create the ConfigMap.

  $ echo "apiVersion: v1
  kind: ConfigMap
  metadata:
    name: vip-configmap
  data:
    10.87.2.50: default/nginx" | kubectl create -f -

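Note that 10.87.2.50 must be replaced with an unused IP on your own subnet, e.g. 10.87.2.X.
The nginx after the slash is the name of the service, and default before it is its namespace; change these to match your own service.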
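
To illustrate the shape of a ConfigMap entry, the following Python sketch validates one `VIP: namespace/service` pair and splits it into its parts. The helper `parse_vip_entry` is hypothetical and is not how keepalived-vip itself reads the ConfigMap; it only shows what each piece of the entry means.

```python
from ipaddress import ip_address


def parse_vip_entry(vip, target):
    """Split one ConfigMap entry into (vip, namespace, service).

    Raises ValueError if the key is not a valid IP address or the value
    is not of the form 'namespace/service'.
    """
    ip_address(vip)  # raises ValueError for a malformed address
    namespace, sep, service = target.partition("/")
    if not sep or not namespace or not service:
        raise ValueError(f"expected 'namespace/service', got {target!r}")
    return vip, namespace, service


data = {"10.87.2.50": "default/nginx"}
for vip, target in data.items():
    print(parse_vip_entry(vip, target))
```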

Then verify that it was created:

  $ kubectl get configmap
  NAME            DATA   AGE
  vip-configmap   1      23h

Next, set up keepalived-vip:

  apiVersion: extensions/v1beta1
  kind: DaemonSet
  metadata:
    name: kube-keepalived-vip
  spec:
    template:
      metadata:
        labels:
          name: kube-keepalived-vip
      spec:
        hostNetwork: true
        # run under the ServiceAccount created in vip-rbac.yaml so the RBAC rules apply
        serviceAccountName: kube-keepalived-vip
        containers:
        - image: gcr.io/google_containers/kube-keepalived-vip:0.9
          name: kube-keepalived-vip
          imagePullPolicy: Always
          securityContext:
            privileged: true
          volumeMounts:
          - mountPath: /lib/modules
            name: modules
            readOnly: true
          - mountPath: /dev
            name: dev
          # use downward API
          env:
          - name: POD_NAME
            valueFrom:
              fieldRef:
                fieldPath: metadata.name
          - name: POD_NAMESPACE
            valueFrom:
              fieldRef:
                fieldPath: metadata.namespace
          # to use unicast
          args:
          - --services-configmap=default/vip-configmap
          # unicast uses the ip of the nodes instead of multicast
          # this is useful if running in cloud providers (like AWS)
          #- --use-unicast=true
        volumes:
        - name: modules
          hostPath:
            path: /lib/modules
        - name: dev
          hostPath:
            path: /dev

Create the DaemonSet from the manifest above, then confirm that it exists:

  $ kubectl get daemonset kube-keepalived-vip
  NAME                  DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE-SELECTOR   AGE
  kube-keepalived-vip   5         5         5       5            5

Check the status of the pods:

  $ kubectl get pod -o wide | grep keepalive
  kube-keepalived-vip-c4sxw   1/1   Running   0   23h   10.87.2.6    10.87.2.6
  kube-keepalived-vip-c9p7n   1/1   Running   0   23h   10.87.2.8    10.87.2.8
  kube-keepalived-vip-psdp9   1/1   Running   0   23h   10.87.2.10   10.87.2.10
  kube-keepalived-vip-xfmxg   1/1   Running   0   23h   10.87.2.12   10.87.2.12
  kube-keepalived-vip-zjts7   1/1   Running   3   23h   10.87.2.4    10.87.2.4

Pick any one of the pods and look at the configuration inside it:

  $ kubectl exec kube-keepalived-vip-c4sxw -- cat /etc/keepalived/keepalived.conf
  global_defs {
    vrrp_version 3
    vrrp_iptables KUBE-KEEPALIVED-VIP
  }
  vrrp_instance vips {
    state BACKUP
    interface eno1
    virtual_router_id 50
    priority 103
    nopreempt
    advert_int 1
    track_interface {
      eno1
    }
    virtual_ipaddress {
      10.87.2.50
    }
  }
  # Service: default/nginx
  virtual_server 10.87.2.50 80 {    # the VIP and the port exposed by the service
    delay_loop 5
    lvs_sched wlc
    lvs_method NAT
    persistence_timeout 1800
    protocol TCP
    real_server 10.2.49.30 8080 {   # the actual pod endpoint behind the service
      weight 1
      TCP_CHECK {
        connect_port 80
        connect_timeout 3
      }
    }
  }
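
The `lvs_sched wlc` line in the generated configuration selects IPVS's weighted least-connection scheduler. As a rough illustration, the Python sketch below picks the real server with the lowest ratio of active connections to weight. This is a simplified model, and the kernel's actual cost function is more detailed; the second and third servers and all connection counts are hypothetical.

```python
def pick_wlc(servers):
    """Return the server with the lowest active / weight ratio.

    Servers with weight 0 receive no new connections and are skipped.
    Returns None when no server is eligible.
    """
    candidates = [s for s in servers if s["weight"] > 0]
    if not candidates:
        return None
    return min(candidates, key=lambda s: s["active"] / s["weight"])


servers = [
    {"addr": "10.2.49.30:8080", "weight": 1, "active": 4},
    {"addr": "10.2.49.31:8080", "weight": 2, "active": 6},  # hypothetical server
    {"addr": "10.2.49.32:8080", "weight": 1, "active": 5},  # hypothetical server
]

# Ratios are 4.0, 3.0 and 5.0, so the second server gets the next connection.
print(pick_wlc(servers)["addr"])
```

With every `weight` set to 1, as in the configuration above, this reduces to plain least-connection scheduling.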

Finally, test that it works:

  $ curl 10.87.2.50
  <!DOCTYPE html>
  <html>
  <head>
  <title>Welcome to nginx!</title>
  <style>
      body {
          width: 35em;
          margin: 0 auto;
          font-family: Tahoma, Verdana, Arial, sans-serif;
      }
  </style>
  </head>
  <body>
  <h1>Welcome to nginx!</h1>
  <p>If you see this page, the nginx web server is successfully installed and
  working. Further configuration is required.</p>
  <p>For online documentation and support please refer to
  <a href="http://nginx.org/">nginx.org</a>.<br/>
  Commercial support is available at
  <a href="http://nginx.com/">nginx.com</a>.</p>
  <p><em>Thank you for using nginx.</em></p>
  </body>
  </html>

10.87.2.50:80 (the VIP we chose; no node actually uses this address as its own host IP) now routes us to the service.

All of the code above can be found on GitHub.

References