Environment

    kubectl version
    Client Version: version.Info{Major:"1", Minor:"16", GitVersion:"v1.16.9", GitCommit:"a17149e1a189050796ced469dbd78d380f2ed5ef", GitTreeState:"clean", BuildDate:"2020-04-16T11:44:51Z", GoVersion:"go1.13.9", Compiler:"gc", Platform:"linux/amd64"}
    Server Version: version.Info{Major:"1", Minor:"16", GitVersion:"v1.16.9", GitCommit:"a17149e1a189050796ced469dbd78d380f2ed5ef", GitTreeState:"clean", BuildDate:"2020-04-16T11:36:15Z", GoVersion:"go1.13.9", Compiler:"gc", Platform:"linux/amd64"}

Terminology

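  • Endpoint: the address a scrape target is actually fetched from, composed as __scheme__ + __address__ + __metrics_path__. Most relabeling rules therefore rewrite one or all of these labels; __address__ and __metrics_path__ are the most frequently rewritten.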
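The composition rule can be sketched in Python. This is a simplified model, assuming no __param_* labels are set (Prometheus appends those as URL query parameters); the function name scrape_url is hypothetical.

```python
# Simplified model of how Prometheus builds the URL it scrapes from a
# target's reserved labels. __param_* query parameters are not modelled.
from __future__ import annotations


def scrape_url(labels: dict[str, str]) -> str:
    """Return <__scheme__>://<__address__><__metrics_path__>."""
    scheme = labels.get("__scheme__", "http")          # job-level `scheme`
    address = labels["__address__"]                     # host:port, always required
    path = labels.get("__metrics_path__", "/metrics")  # job-level `metrics_path`
    return f"{scheme}://{address}{path}"
```

For the etcd target discussed below, scrape_url({"__scheme__": "https", "__address__": "172.23.16.106:2379", "__metrics_path__": "/metrics"}) yields https://172.23.16.106:2379/metrics.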

How it works

The cluster components themselves, etcd included, expose their own metrics on a /metrics endpoint.

Preparing for monitoring

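In a cluster bootstrapped with kubeadm, etcd runs as a static Pod and by default has no Service or corresponding Endpoints object that an in-cluster Prometheus could use. The first step is therefore to create a Service (and with it an Endpoints object) that exposes etcd to Prometheus.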

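Understanding static Pods

Static Pods are managed directly by the kubelet daemon on a specific node, without the API server observing them. Unlike Pods managed by the control plane (for example, through a Deployment), static Pods are watched by the kubelet itself, which restarts them if they fail.

A static Pod is always bound to the kubelet on one specific node. The kubelet automatically tries to create a mirror Pod on the API server for each static Pod, so static Pods are visible through the API server but cannot be managed from there.

See the official Kubernetes documentation on static Pods for details.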

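Static Pods can be created in two ways:

  • Local manifest files: the kubelet uses the directory given by --pod-manifest-path at startup (or staticPodPath in its config file; /etc/kubernetes/manifests in kubeadm clusters). It scans this directory periodically and creates or updates Pods from the .yaml or .json files it finds there:
    • Put a Pod manifest into the directory and, once the kubelet picks it up, the Pod is created on that node automatically.
    • Change the manifest and the kubelet detects it and updates the Pod.
    • Delete the manifest and the kubelet deletes the Pod.
    • Because static Pods cannot be managed through the API server, they cannot be updated or deleted with kubectl; edit or delete the manifest file instead.
  • Web-hosted manifest: with --manifest-url, the kubelet periodically downloads the manifest from that URL; everything else works as in the first method.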
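The create/update/delete behaviour described above boils down to comparing two snapshots of the manifest directory. The sketch below is a hypothetical, simplified model of that reconcile step, not kubelet's actual code: the helpers snapshot and reconcile are invented for illustration, the .yaml/.json filter follows the description above, and dotfiles are skipped because the Kubernetes documentation says the kubelet ignores files starting with a dot.

```python
# Hypothetical model of the static Pod reconcile step: diff two snapshots
# of the manifest directory. Not kubelet's implementation.
from __future__ import annotations

import hashlib


def snapshot(files: dict[str, str]) -> dict[str, str]:
    """Map manifest file name -> content hash for candidate manifests."""
    return {
        name: hashlib.sha256(body.encode()).hexdigest()
        for name, body in files.items()
        # Hidden files are ignored; only .yaml/.json are treated as manifests here.
        if not name.startswith(".") and name.endswith((".yaml", ".json"))
    }


def reconcile(old: dict[str, str], new: dict[str, str]) -> dict[str, list[str]]:
    """Decide which static Pods to create, update or delete."""
    return {
        "create": sorted(n for n in new if n not in old),
        "update": sorted(n for n in new if n in old and new[n] != old[n]),
        "delete": sorted(n for n in old if n not in new),
    }
```

Hashing the file contents is one simple way to detect "the file changed"; a real agent would also have to cope with partially written files and parse errors.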

Local manifest files: a walkthrough
Check the kubelet process arguments:

    ps -ef | grep kubelet
    root 561 1 6 Nov26 ? 10:05:57 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --config=/var/lib/kubelet/config.yaml --cgroup-driver=systemd --hostname-override=172.23.16.106 --network-plugin=cni --pod-infra-container-image=registry.aliyuncs.com/google_containers/pause:3.2 --root-dir=/var/lib/kubelet

Inspect the file passed via --config=/var/lib/kubelet/config.yaml:

    more /var/lib/kubelet/config.yaml
    ---
    ...
    staticPodPath: /etc/kubernetes/manifests
    streamingConnectionIdleTimeout: 4h0m0s
    syncFrequency: 1m0s
    tlsCertFile: /var/lib/kubelet/pki/kubelet.crt
    tlsPrivateKeyFile: /var/lib/kubelet/pki/kubelet.key
    volumeStatsAggPeriod: 1m0s
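If you want to pick staticPodPath out of that file in a script, a minimal sketch follows. It assumes the field appears as a top-level "key: value" line, as in the listing above, and avoids a YAML library for that reason; the helper name top_level_value is hypothetical, and a real YAML parser is the safer choice in practice.

```python
# Minimal sketch: read a top-level "key: value" entry from a kubelet config
# file without a YAML library. Nested keys and multi-line values are not handled.
from __future__ import annotations


def top_level_value(config_text: str, key: str) -> str | None:
    for line in config_text.splitlines():
        # Top-level keys start in column 0; skip indented lines and comments.
        if line.startswith((" ", "\t", "#")):
            continue
        name, sep, value = line.partition(":")
        if sep and name.strip() == key:
            return value.strip() or None
    return None
```

Usage: top_level_value(open("/var/lib/kubelet/config.yaml").read(), "staticPodPath") should return /etc/kubernetes/manifests on the node shown above.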

List the existing static Pod manifests:

    ll /etc/kubernetes/manifests
    total 20
    -rw-r--r-- 1 root root 2354 Nov  5 18:01 etcd-external.yaml
    -rw------- 1 root root 3059 Nov  5 18:03 kube-apiserver.yaml
    -rw------- 1 root root 2997 Nov  5 18:03 kube-controller-manager.yaml
    -rw------- 1 root root 1325 Nov  5 18:03 kube-scheduler.yaml
    -rw-r--r-- 1 root root 1157 Nov  5 18:00 lb-kube-apiserver.yaml

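How are these Pods exposed? Apart from kube-apiserver (Kubernetes automatically creates the Service kubernetes in the default namespace, pointing at it), every static Pod needs a manually created Service. Take etcd-external.yaml as an example: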

    cat etcd-external.yaml
    ---
    apiVersion: v1
    kind: Pod
    metadata:
      creationTimestamp: null
      labels:
        component: etcd
        tier: control-plane
      name: etcd
      namespace: kube-system
    spec:
      containers:
      - command:
        - etcd
        - --name=etcd-172.23.16.106
        - --advertise-client-urls=https://172.23.16.106:2379
        - --trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
        - --cert-file=/etc/kubernetes/pki/etcd/server.crt
        - --key-file=/etc/kubernetes/pki/etcd/server.key
        - --peer-trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
        - --peer-cert-file=/etc/kubernetes/pki/etcd/peer.crt
        - --peer-key-file=/etc/kubernetes/pki/etcd/peer.key
        - --peer-client-cert-auth=true
        - --listen-peer-urls=https://172.23.16.106:2380
        - --listen-metrics-urls=http://127.0.0.1:2381
        - --listen-client-urls=https://127.0.0.1:2379,https://172.23.16.106:2379
        - --initial-cluster-state=existing
        - --initial-advertise-peer-urls=https://172.23.16.106:2380
        - --initial-cluster=etcd-172.23.16.106=https://172.23.16.106:2380
        - --initial-cluster-token=etcd-cluster-token
        - --client-cert-auth=true
        - --snapshot-count=10000
        - --data-dir=/var/lib/etcd
        # Compact history every hour; recommended for cluster stability
        - --auto-compaction-retention=1
        # Maximum request (and Raft message) size: 10 MiB here, up from the 1.5 MiB default
        - --max-request-bytes=10485760
        # Backend DB size quota: 2 GB by default; 8 GB is the suggested maximum
        - --quota-backend-bytes=8589934592
        image: registry.aliyuncs.com/google_containers/etcd:3.4.3-0
        imagePullPolicy: IfNotPresent
        livenessProbe:
          httpGet:
            host: 127.0.0.1
            path: /health
            port: 2381
            scheme: HTTP
          failureThreshold: 8
          initialDelaySeconds: 15
          timeoutSeconds: 15
        name: etcd
        resources: {}
        volumeMounts:
        - mountPath: /var/lib/etcd
          name: etcd-data
        - mountPath: /etc/kubernetes/pki/etcd
          name: etcd-certs
        - mountPath: /etc/localtime
          name: localtime
          readOnly: true
      hostNetwork: true
      priorityClassName: system-cluster-critical
      volumes:
      - hostPath:
          path: /var/lib/etcd
          type: DirectoryOrCreate
        name: etcd-data
      - hostPath:
          path: /etc/kubernetes/pki/etcd
          type: DirectoryOrCreate
        name: etcd-certs
      - hostPath:
          path: /etc/localtime
          type: File
        name: localtime
    status: {}

The Pod carries the labels component: etcd and tier: control-plane, so a Service can select it by label.
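A Service's spec.selector uses equality-based matching: every key in the selector must be present on the Pod with exactly the same value, and any extra Pod labels are ignored. The helper selector_matches below is a hypothetical sketch of that rule, applied to the etcd Pod's labels from the manifest above.

```python
# Sketch of equality-based label selection as used by a Service's spec.selector.
from __future__ import annotations


def selector_matches(selector: dict[str, str], labels: dict[str, str]) -> bool:
    # All selector pairs must appear on the Pod; extra Pod labels don't matter.
    return all(labels.get(k) == v for k, v in selector.items())


# Labels of the etcd static Pod shown above.
etcd_pod_labels = {"component": "etcd", "tier": "control-plane"}
```

A selector of {"component": "etcd"} therefore matches the etcd Pod, while {"component": "kube-scheduler"} does not.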

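Note also that etcd requires mutual TLS on its client port (2379). This follows from --client-cert-auth=true, which makes etcd demand a client certificate signed by the CA in --trusted-ca-file; --peer-client-cert-auth=true does the same for peer traffic on port 2380:

    - --trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
    - --cert-file=/etc/kubernetes/pki/etcd/server.crt
    - --key-file=/etc/kubernetes/pki/etcd/server.key
    - --client-cert-auth=true
    - --peer-trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
    - --peer-cert-file=/etc/kubernetes/pki/etcd/peer.crt
    - --peer-key-file=/etc/kubernetes/pki/etcd/peer.key
    - --peer-client-cert-auth=true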

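Requests to the endpoint must therefore present a client certificate and key. This example reuses the peer pair (--peer-cert-file=/etc/kubernetes/pki/etcd/peer.crt and --peer-key-file=/etc/kubernetes/pki/etcd/peer.key), which works because it is signed by the same etcd CA (ca.crt) and is usable for client authentication (kubeadm issues peer certificates for both server and client use).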
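The same mutual-TLS setup can be expressed with Python's standard ssl module, which is handy for ad-hoc checks. This is a sketch: the helper name etcd_client_context is hypothetical, and the default paths assume the kubeadm-style layout shown above.

```python
# Sketch: a client-side TLS context that presents a client certificate,
# as etcd requires when --client-cert-auth=true.
from __future__ import annotations

import ssl

PKI = "/etc/kubernetes/pki/etcd"  # assumed location, as on the node above


def etcd_client_context(
    ca_file: str | None = f"{PKI}/ca.crt",
    cert_file: str | None = f"{PKI}/peer.crt",
    key_file: str | None = f"{PKI}/peer.key",
    insecure: bool = False,
) -> ssl.SSLContext:
    # Verify etcd's server certificate against the etcd CA unless insecure.
    ctx = ssl.create_default_context(cafile=None if insecure else ca_file)
    if insecure:
        # Equivalent of curl -k: do not verify the server certificate.
        ctx.check_hostname = False
        ctx.verify_mode = ssl.CERT_NONE
    if cert_file:
        # Client certificate and key; etcd refuses clients that lack one.
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    return ctx
```

On the etcd node, urllib.request.urlopen("https://172.23.16.106:2379/metrics", context=etcd_client_context()) should then behave like the curl calls that use --cacert, --cert and --key.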

Creating the Service

  • Commands:

    mkdir -p /u01/repo/exporter
    cd /u01/repo/exporter
    vim etcd.yaml
    kubectl apply -f etcd.yaml

  • YAML manifest (etcd.yaml):

    apiVersion: v1
    kind: Service
    metadata:
      namespace: kube-system
      name: etcd
      labels:
        component: etcd
      annotations:
        prometheus.io/scrape: "true"
    spec:
      selector:
        component: etcd
      type: ClusterIP
      clusterIP: None
      ports:
      - name: http-metrics
        port: 2379
        targetPort: 2379
        protocol: TCP

clusterIP: None makes this a headless Service: no virtual IP is allocated, but its Endpoints object still lists the etcd Pod's address. Because the Pod uses hostNetwork, that address is the node IP.

Verifying the endpoint

    cd /etc/kubernetes/pki/etcd
    # TOKEN holds the ServiceAccount token (hops-admin in this example)
    ## Method 1: -k skips verification of the server certificate
    curl -k --cert ./peer.crt --key ./peer.key https://172.23.16.106:2379/metrics --header "Authorization: Bearer $TOKEN"
    ## Method 2: --cacert specifies the CA used to verify the server certificate
    curl --cacert ./ca.crt --cert ./peer.crt --key ./peer.key https://172.23.16.106:2379/metrics --header "Authorization: Bearer $TOKEN"
    ## Method 3: tested to work even without the token
    curl --cacert ./ca.crt --cert ./peer.crt --key ./peer.key https://172.23.16.106:2379/metrics
    ---
    ...
    # HELP etcd_cluster_version Which version is running. 1 for 'cluster_version' label with current cluster version
    # TYPE etcd_cluster_version gauge
    etcd_cluster_version{cluster_version="3.4"} 1
    # HELP etcd_debugging_disk_backend_commit_rebalance_duration_seconds The latency distributions of commit.rebalance called by bboltdb backend.
    # TYPE etcd_debugging_disk_backend_commit_rebalance_duration_seconds histogram
    etcd_debugging_disk_backend_commit_rebalance_duration_seconds_bucket{le="0.001"} 247054
    etcd_debugging_disk_backend_commit_rebalance_duration_seconds_bucket{le="0.002"} 247065
    etcd_debugging_disk_backend_commit_rebalance_duration_seconds_bucket{le="0.004"} 247071
    etcd_debugging_disk_backend_commit_rebalance_duration_seconds_bucket{le="0.008"} 247072
    etcd_debugging_disk_backend_commit_rebalance_duration_seconds_bucket{le="0.016"} 247075

Note: run curl --help to see all curl options.

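  • -k: skip verification of the server certificate (curl help: -k, --insecure Allow connections to SSL sites without certs (H))
  • --cacert ./ca.crt: the root certificate used to verify the server certificate (--cacert FILE CA certificate to verify peer against (SSL))
  • --cert ./peer.crt: the client certificate (-E, --cert CERT[:PASSWD] Client certificate file and password (SSL))
  • --key ./peer.key: the client certificate's private key (--key KEY Private key file name (SSL/SSH))
  • --header "Authorization: Bearer $TOKEN": the ServiceAccount token used for authentication. In general its permissions should follow the principle of least privilege, but in this example the token turned out not to be needed at all, since etcd authenticates the request by its client certificate. If a token is required and you do not know the minimal permission set, binding the cluster-admin role to the ServiceAccount (hops-admin here) works.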
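The curl output is in the Prometheus text exposition format: # HELP and # TYPE comment lines, then samples of the form name{labels} value. For quick post-processing, a minimal parser might look like this. It is a sketch with stated limits (no escaped quotes or braces inside label values, no timestamps or exemplars); the helper name parse_samples is hypothetical, and the official prometheus_client library ships a full parser.

```python
# Minimal parser for simple sample lines of the Prometheus text format.
from __future__ import annotations

import re

SAMPLE = re.compile(
    r"^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>[^}]*)\})?\s+(?P<value>\S+)"
)
LABEL = re.compile(r'(?P<k>[a-zA-Z_][a-zA-Z0-9_]*)="(?P<v>[^"]*)"')


def parse_samples(text: str) -> list[tuple[str, dict[str, str], float]]:
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and # HELP / # TYPE comments
        m = SAMPLE.match(line)
        if m is None:
            continue  # anything else, e.g. the "---" / "..." markers above
        labels = {lm["k"]: lm["v"] for lm in LABEL.finditer(m["labels"] or "")}
        samples.append((m["name"], labels, float(m["value"])))
    return samples
```

Fed the output above, it returns ("etcd_cluster_version", {"cluster_version": "3.4"}, 1.0) as the first sample.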

Key configuration items

Scrape job configuration

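  • Job name: K8S-ETCD
  • Metrics path: /metrics
  • Scheme: https
  • Request parameters: none
  • Authentication and TLS: bearer token; etcd CA file, client certificate and client key; server certificate verification disabled
  • Notes: the actual scrape address must be accessed over HTTPS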

Kubernetes service discovery configuration

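  • Role (resource type): endpoints
  • API server address: https://172.23.16.106:8443 (the load-balancer address)
  • Namespaces: default, kube-system (normally this should be left empty, which means all namespaces)
  • Authentication and TLS: bearer token; server certificate verification disabled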
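To see what the endpoints role works with, here is a sketch of the equivalent Kubernetes API list call and of how one Endpoints object becomes scrape targets (one target per address and port). Assumptions: Prometheus really uses client-go list/watch and attaches many more __meta_kubernetes_* labels; the helpers list_endpoints_request and targets_from_endpoints are hypothetical; the sample object mirrors the headless etcd Service created earlier.

```python
# Sketch of endpoints-role discovery: list Endpoints via the API server,
# then turn each address/port pair into a target with a few meta labels.
from __future__ import annotations

import urllib.request


def list_endpoints_request(api_server: str, namespace: str, token: str) -> urllib.request.Request:
    # The bearer token here is the one configured under kubernetes_sd_configs.
    url = f"{api_server}/api/v1/namespaces/{namespace}/endpoints"
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})


def targets_from_endpoints(ep: dict) -> list[dict[str, str]]:
    ns, svc = ep["metadata"]["namespace"], ep["metadata"]["name"]
    targets = []
    for subset in ep.get("subsets", []):
        for addr in subset.get("addresses", []):
            for port in subset.get("ports", []):
                targets.append({
                    "__address__": f"{addr['ip']}:{port['port']}",
                    "__meta_kubernetes_namespace": ns,
                    "__meta_kubernetes_service_name": svc,  # Endpoints share the Service's name
                    "__meta_kubernetes_endpoint_port_name": port.get("name", ""),
                })
    return targets
```

With the etcd Pod on hostNetwork, the resulting __address__ is the node IP plus 2379, which is why no address rewrite is needed later.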

Relabeling configuration

K8S-ETCD

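The discovered __address__ is already the etcd endpoint itself (the node IP, since the static Pod runs with hostNetwork), so no rewrite is needed here. Rewriting it to go through an API server proxy URL may in fact prevent the target from being scraped.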

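Rules, applied in order (fields not listed keep their defaults: separator ;, regex (.*), replacement $1):

  1. keep: source labels [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name], regex kube-system;etcd;http-metrics
  2. labelmap: regex __meta_kubernetes_service_label_(.+)
  3. replace: source labels [__meta_kubernetes_namespace], target label kubernetes_namespace
  4. replace: source labels [__meta_kubernetes_service_name], target label kubernetes_name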
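To check what these four rules do to a discovered target, here is a simplified re-implementation of the keep, labelmap and replace actions. It is not Prometheus's code: other actions (drop, hashmod, labeldrop, ...) are omitted, the target_label itself is not template-expanded, and Prometheus's later removal of labels starting with __ is not modelled. Prometheus anchors relabel regexes at both ends, hence re.fullmatch. The names relabel and ETCD_RULES are hypothetical.

```python
# Simplified model of Prometheus relabeling (keep / labelmap / replace).
from __future__ import annotations

import re


def _expand(m: re.Match, template: str) -> str:
    # Substitute $1 / ${1} with capture groups; unknown groups become "".
    def group(g: re.Match) -> str:
        idx = int(g.group(1) or g.group(2))
        return (m.group(idx) or "") if idx <= m.re.groups else ""
    return re.sub(r"\$\{(\d+)\}|\$(\d+)", group, template)


def relabel(labels: dict[str, str], rules: list[dict]) -> dict[str, str] | None:
    out = dict(labels)
    for r in rules:
        action = r.get("action", "replace")
        regex = re.compile(r.get("regex", "(.*)"))
        replacement = r.get("replacement", "$1")
        value = r.get("separator", ";").join(out.get(s, "") for s in r.get("source_labels", []))
        if action == "keep":
            if not regex.fullmatch(value):
                return None  # target is dropped
        elif action == "labelmap":
            for name, val in list(out.items()):
                m = regex.fullmatch(name)
                if m:
                    out[_expand(m, replacement)] = val
        elif action == "replace":
            m = regex.fullmatch(value)
            if m:
                result = _expand(m, replacement)
                if result:
                    out[r["target_label"]] = result
                else:
                    out.pop(r["target_label"], None)  # empty result removes the label
    return out


ETCD_RULES = [
    {"source_labels": ["__meta_kubernetes_namespace", "__meta_kubernetes_service_name",
                       "__meta_kubernetes_endpoint_port_name"],
     "regex": "kube-system;etcd;http-metrics", "action": "keep"},
    {"regex": "__meta_kubernetes_service_label_(.+)", "action": "labelmap"},
    {"source_labels": ["__meta_kubernetes_namespace"], "target_label": "kubernetes_namespace"},
    {"source_labels": ["__meta_kubernetes_service_name"], "target_label": "kubernetes_name"},
]
```

Applied to the etcd target, the Service label component=etcd is copied to component, kubernetes_namespace becomes kube-system and kubernetes_name becomes etcd; a target from any other Service is dropped by the first rule.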

Final scrape job configuration

    - job_name: K8S-ETCD
      honor_timestamps: true
      scrape_interval: 1m
      scrape_timeout: 10s
      metrics_path: /metrics
      scheme: https
      file_sd_configs:
      - files:
        - /u01/prometheus/target/nodes/K8S-ETCD_targets_hosts.json
        refresh_interval: 5m
      kubernetes_sd_configs:
      - api_server: https://172.23.16.106:8443
        role: endpoints
        bearer_token: <secret>
        tls_config:
          insecure_skip_verify: true
        namespaces:
          names:
          - default
          - kube-system
      bearer_token: <secret>
      tls_config:
        ca_file: /u01/prometheus/tls/ca/4cb22d45db4d490c85a56a2257211a3c.crt
        cert_file: /u01/prometheus/tls/cert/e5dff40c73a34587b8708b63e43965f8.crt
        key_file: /u01/prometheus/tls/key/c43bd44814114b30bb1662d98d74199f.key
        insecure_skip_verify: true
      relabel_configs:
      - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
        separator: ;
        regex: kube-system;etcd;http-metrics
        replacement: $1
        action: keep
      - separator: ;
        regex: __meta_kubernetes_service_label_(.+)
        replacement: $1
        action: labelmap
      - source_labels: [__meta_kubernetes_namespace]
        separator: ;
        regex: (.*)
        target_label: kubernetes_namespace
        replacement: $1
        action: replace
      - source_labels: [__meta_kubernetes_service_name]
        separator: ;
        regex: (.*)
        target_label: kubernetes_name
        replacement: $1
        action: replace
      - separator: ;
        regex: (.*)
        target_label: PersonNum
        replacement: hops
        action: replace
      - separator: ;
        regex: (.*)
        target_label: _tenant_id
        replacement: "0"
        action: replace

Verification preview

[Screenshot: verification preview]

Conclusion

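To discover etcd endpoints automatically:

  • Configure the CA certificate, client certificate and client key at the scrape-job level, because etcd requires mutual TLS with client certificates. The job-level authentication and TLS settings are the ones used for the actual scrape; those under kubernetes_sd_configs are only used to reach the API server for service discovery.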