环境信息
kubectl version
Client Version: version.Info{Major:"1", Minor:"16", GitVersion:"v1.16.9", GitCommit:"a17149e1a189050796ced469dbd78d380f2ed5ef", GitTreeState:"clean", BuildDate:"2020-04-16T11:44:51Z", GoVersion:"go1.13.9", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"16", GitVersion:"v1.16.9", GitCommit:"a17149e1a189050796ced469dbd78d380f2ed5ef", GitTreeState:"clean", BuildDate:"2020-04-16T11:36:15Z", GoVersion:"go1.13.9", Compiler:"gc", Platform:"linux/amd64"}
术语解释
- Endpoint:抓取目标实际访问的抓取地址,组成为:
__scheme__
+__address__
+__metrics_path__
。因此,很多标签重写都是重写的其中之一,或者全部,例如,__address_
以及__metrics_path_
两个是高频重写点
原理解释
集群组件本身都提供了/metrics
端点暴露自身指标。
监控准备
以kubeadm启动的k8s集群中,etcd是以static pod的形式启动的,默认没有service及对应的endpoint可供集群内的prometheus访问。所以首先创建一个用来为prometheus提供接口的service(endpoint)。
了解Static Pod
静态Pod由特定节点上的kubelet daemon程序直接管理,API Server不会监控它们。不像其他Pod会被Control Plane(例如,Deployment)管理,而是由kubelet负责监视静态Pod(如果失败了,由kubelet负责重启)。
静态Pod总是绑定到特定节点的Kubelet进程上。kubelet会自动尝试为每一个静态Pod在API Server上创建一个镜像Pod,也就是说,在API Server上可以看到静态Pod,但是却不能管理。
具体可参考官方文档。
静态Pod有两种创建方式:
- 本地配置文件方式:kubelet 启动时由 —pod-manifest-path 指定的目录(默认/etc/kubernetes/manifests),kubelet会定期扫描这个目录,并根据这个目录下的.yaml或.json文件进行创建和更新操作
- 如果把pod的yaml描述文件放到这个目录中,等kubelet扫描到文件,会自动在本机创建出来 pod;
- 如果把pod的yaml文件更改了,kubelet也会识别到,会自动更新 pod;
- 如果把pod的yaml文件删除了,kubelet会自动删除掉pod;
- 因为静态pod不能被api-server直接管理,所以它的更新删除操作不能由kubectl来执行,只能直接修改或删除文本文件。
- HTTP仓库配置文件方式:
--manifest-url
,kubelet定期从url获取文件,其余操作和第一种方式一样。
本地配置文件方式:解析
查看kubelet进程的配置:
ps -ef | grep kubelet
root 561 1 6 11月26 ? 10:05:57 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --config=/var/lib/kubelet/config.yaml --cgroup-driver=systemd --hostname-override=172.23.16.106 --network-plugin=cni --pod-infra-container-image=registry.aliyuncs.com/google_containers/pause:3.2 --root-dir=/var/lib/kubelet
查看配置文件--config=/var/lib/kubelet/config.yaml
:
more /var/lib/kubelet/config.yaml
---
...
staticPodPath: /etc/kubernetes/manifests
streamingConnectionIdleTimeout: 4h0m0s
syncFrequency: 1m0s
tlsCertFile: /var/lib/kubelet/pki/kubelet.crt
tlsPrivateKeyFile: /var/lib/kubelet/pki/kubelet.key
volumeStatsAggPeriod: 1m0s
查看现有StaticPod:
ll /etc/kubernetes/manifests
总用量 20
-rw-r--r-- 1 root root 2354 11月 5 18:01 etcd-external.yaml
-rw------- 1 root root 3059 11月 5 18:03 kube-apiserver.yaml
-rw------- 1 root root 2997 11月 5 18:03 kube-controller-manager.yaml
-rw------- 1 root root 1325 11月 5 18:03 kube-scheduler.yaml
-rw-r--r-- 1 root root 1157 11月 5 18:00 lb-kube-apiserver.yaml
如何暴露访问?除了kube-apiserver
之外(Kubernetes会在default
命名空间下自动创建一个Service kubernetes
指向这个静态Pod),其他静态Pod都需要手工创建Service,以etcd-external.yaml
为例:
cat etcd-external.yaml
---
apiVersion: v1
kind: Pod
metadata:
creationTimestamp: null
labels:
component: etcd
tier: control-plane
name: etcd
namespace: kube-system
spec:
containers:
- command:
- etcd
- --name=etcd-172.23.16.106
- --advertise-client-urls=https://172.23.16.106:2379
- --trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
- --cert-file=/etc/kubernetes/pki/etcd/server.crt
- --key-file=/etc/kubernetes/pki/etcd/server.key
- --peer-trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
- --peer-cert-file=/etc/kubernetes/pki/etcd/peer.crt
- --peer-key-file=/etc/kubernetes/pki/etcd/peer.key
- --peer-client-cert-auth=true
- --listen-peer-urls=https://172.23.16.106:2380
- --listen-metrics-urls=http://127.0.0.1:2381
- --listen-client-urls=https://127.0.0.1:2379,https://172.23.16.106:2379
- --initial-cluster-state=existing
- --initial-advertise-peer-urls=https://172.23.16.106:2380
- --initial-cluster=etcd-172.23.16.106=https://172.23.16.106:2380
- --initial-cluster-token=etcd-cluster-token
- --client-cert-auth=true
- --snapshot-count=10000
- --data-dir=/var/lib/etcd
# 推荐一小时压缩一次数据这样可以极大的保证集群稳定
- --auto-compaction-retention=1
# Etcd Raft消息最大字节数,官方推荐是10M
- --max-request-bytes=10485760
# ETCD db数据大小,默认是2G,官方推荐是8G
- --quota-backend-bytes=8589934592
image: registry.aliyuncs.com/google_containers/etcd:3.4.3-0
imagePullPolicy: IfNotPresent
livenessProbe:
httpGet:
host: 127.0.0.1
path: /health
port: 2381
scheme: HTTP
failureThreshold: 8
initialDelaySeconds: 15
timeoutSeconds: 15
name: etcd
resources: {}
volumeMounts:
- mountPath: /var/lib/etcd
name: etcd-data
- mountPath: /etc/kubernetes/pki/etcd
name: etcd-certs
- mountPath: /etc/localtime
name: localtime
readOnly: true
hostNetwork: true
priorityClassName: system-cluster-critical
volumes:
- hostPath:
path: /var/lib/etcd
type: DirectoryOrCreate
name: etcd-data
- hostPath:
path: /etc/kubernetes/pki/etcd
type: DirectoryOrCreate
name: etcd-certs
- hostPath:
path: /etc/localtime
type: File
name: localtime
status: {}
可以看到标签信息进而进行标签选择。
同时,需要注意,ETCD需要双向SSL验证,从如下配置--peer-client-cert-auth=true
可知:
- --trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
- --cert-file=/etc/kubernetes/pki/etcd/server.crt
- --key-file=/etc/kubernetes/pki/etcd/server.key
- --peer-trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
- --peer-cert-file=/etc/kubernetes/pki/etcd/peer.crt
- --peer-key-file=/etc/kubernetes/pki/etcd/peer.key
- --peer-client-cert-auth=true
因此,请求端点时,需要配置客户端证书及秘钥,也即
--peer-cert-file=/etc/kubernetes/pki/etcd/peer.crt
及--peer-key-file=/etc/kubernetes/pki/etcd/peer.key
。
服务准备
- 执行脚本
mkdir -p /u01/repo/exporter
cd /u01/repo/exporter
vim etcd.yaml
kubectl apply -f etcd.yaml
- YAML脚本
apiVersion: v1
kind: Service
metadata:
namespace: kube-system
name: etcd
labels:
component: etcd
annotations:
prometheus.io/scrape: "true"
spec:
selector:
component: etcd
type: ClusterIP
clusterIP: None
ports:
- name: http-metrics
port: 2379
targetPort: 2379
protocol: TCP
服务验证
cd /etc/kubernetes/pki/etcd
## 方式一:通过-k参数指定不验证服务器端证书
curl -k --cert ./peer.crt --key ./peer.key https://172.23.16.106:2379/metrics --header "Authorization: Bearer eyJhbGciOiJSUzI1NiIsImtpZCI6Ijc0QXUtSGlaNGFoTjVuRFhRTF8zREN3T0VFQ2loYkpzNXYzTHdXZ3FVMEEifQ.eyJpc3MiOiJrdWJlcm5ldGVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9uYW1lc3BhY2UiOiJrdWJlLXN5c3RlbSIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VjcmV0Lm5hbWUiOiJob3BzLWFkbWluLXRva2VuLXo5Yjh0Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9zZXJ2aWNlLWFjY291bnQubmFtZSI6ImhvcHMtYWRtaW4iLCJrdWJlcm5ldGVzLmlvL3NlcnZpY2VhY2NvdW50L3NlcnZpY2UtYWNjb3VudC51aWQiOiI5ZWMzYjE5ZC1hYWQxLTRmMDUtOGRlNC1mMGRmZTM4NjViZTYiLCJzdWIiOiJzeXN0ZW06c2VydmljZWFjY291bnQ6a3ViZS1zeXN0ZW06aG9wcy1hZG1pbiJ9.I5UM5xUbQ6Qc9B-stcZp022tXIYjKSf4VtbF-FzCtCBJOy8gQdqoxfWj5gQa7O5TYgBH_YXxpy-q-J217wupxPeH_owNUGOBGj3eB8Sbs-BUVINqltGAAS9Mmzh2a-ApP9u1OsvEYafXf0vYAYSCBWnNyhUsljx4o-Yo6CmdyQh0f6FqEhcZMKXbyasVVcmZelswFUftLM4BRrxaF3JGvlw_PG7HAvIPDhQqC43gzI_m6xtVzjpGBSe_GTf7TY5_7Cs2c6u6ZowPVId0KNaCDTzlg0Jm5kIhETU8rbYp1kwunguabh1bH7PUG7m1daFsvkPktUW3Uc6QXsZcbB4qDw"
## 方式二:通过--cacert指定CA证书
curl --cacert ./ca.crt --cert ./peer.crt --key ./peer.key https://172.23.16.106:2379/metrics --header "Authorization: Bearer eyJhbGciOiJSUzI1NiIsImtpZCI6Ijc0QXUtSGlaNGFoTjVuRFhRTF8zREN3T0VFQ2loYkpzNXYzTHdXZ3FVMEEifQ.eyJpc3MiOiJrdWJlcm5ldGVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9uYW1lc3BhY2UiOiJrdWJlLXN5c3RlbSIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VjcmV0Lm5hbWUiOiJob3BzLWFkbWluLXRva2VuLXo5Yjh0Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9zZXJ2aWNlLWFjY291bnQubmFtZSI6ImhvcHMtYWRtaW4iLCJrdWJlcm5ldGVzLmlvL3NlcnZpY2VhY2NvdW50L3NlcnZpY2UtYWNjb3VudC51aWQiOiI5ZWMzYjE5ZC1hYWQxLTRmMDUtOGRlNC1mMGRmZTM4NjViZTYiLCJzdWIiOiJzeXN0ZW06c2VydmljZWFjY291bnQ6a3ViZS1zeXN0ZW06aG9wcy1hZG1pbiJ9.I5UM5xUbQ6Qc9B-stcZp022tXIYjKSf4VtbF-FzCtCBJOy8gQdqoxfWj5gQa7O5TYgBH_YXxpy-q-J217wupxPeH_owNUGOBGj3eB8Sbs-BUVINqltGAAS9Mmzh2a-ApP9u1OsvEYafXf0vYAYSCBWnNyhUsljx4o-Yo6CmdyQh0f6FqEhcZMKXbyasVVcmZelswFUftLM4BRrxaF3JGvlw_PG7HAvIPDhQqC43gzI_m6xtVzjpGBSe_GTf7TY5_7Cs2c6u6ZowPVId0KNaCDTzlg0Jm5kIhETU8rbYp1kwunguabh1bH7PUG7m1daFsvkPktUW3Uc6QXsZcbB4qDw"
## 方式三:经过验证,即使不指定Token,也可以正常访问
curl --cacert ./ca.crt --cert ./peer.crt --key ./peer.key https://172.23.16.106:2379/metrics
---
...
# HELP etcd_cluster_version Which version is running. 1 for 'cluster_version' label with current cluster version
# TYPE etcd_cluster_version gauge
etcd_cluster_version{cluster_version="3.4"} 1
# HELP etcd_debugging_disk_backend_commit_rebalance_duration_seconds The latency distributions of commit.rebalance called by bboltdb backend.
# TYPE etcd_debugging_disk_backend_commit_rebalance_duration_seconds histogram
etcd_debugging_disk_backend_commit_rebalance_duration_seconds_bucket{le="0.001"} 247054
etcd_debugging_disk_backend_commit_rebalance_duration_seconds_bucket{le="0.002"} 247065
etcd_debugging_disk_backend_commit_rebalance_duration_seconds_bucket{le="0.004"} 247071
etcd_debugging_disk_backend_commit_rebalance_duration_seconds_bucket{le="0.008"} 247072
etcd_debugging_disk_backend_commit_rebalance_duration_seconds_bucket{le="0.016"} 247075
注意:curl
所有参数可通过curl --help
查看。
-k
:忽略验证服务器端证书,-k, --insecure Allow connections to SSL sites without certs (H)
--cacert ./ca.crt
:指定验证服务器端证书的根证书,--cacert FILE CA certificate to verify peer against (SSL)
--cert ./peer.crt
:指定客户端证书,-E, --cert CERT[:PASSWD] Client certificate file and password (SSL)
--key ./peer.key
:指定客户端证书秘钥,--key KEY Private key file name (SSL/SSH)
--header "Authorization: Bearer $Token"
:指定Service Account的认证Token,正常来讲需要通过最小权限原则设置,但是,首先,本例中最终验证的结果是可以不配置;一般来讲如果需要配置,且不知道该用什么样的最小权限集合,则配置cluster-admin
集群管理员角色至Service Account
即可,本例中是hops-admin
核心配置项解析
抓取任务配置
任务编码 | 指标路径 | 协议 | 请求参数 | 认证授权及TLS | 说明 |
---|---|---|---|---|---|
K8S-ETCD | /metrics | https | - | Token认证,禁用验证服务器端证书 | 实际的抓取地址需要通过HTTPS访问 |
K8S服务发现配置
资源类型 | API Server地址 | 命名空间 | 认证授权及TLS | 说明 |
---|---|---|---|---|
endpoints | https://172.23.16.106:8443 | default,kube-system | Token认证,禁用验证服务器端证书 | API Server地址此处配置的是负载均衡地址,命名空间正常讲应该留空,代表所有空间 |
标签重写配置
K8S-ETCD
服务发现出来的__address__
本身就是ETCD端点的IP,同时,基于静态Pod的原理,此处是不需要做重写的,反而可能无法通过apiserver proxy URLs
的方式正常访问。
序号 | 重写动作 | 来源标签名称 | 分隔符 | 正则匹配 | Hash模数 | 目标标签名称 | 目标标签替换值 |
---|---|---|---|---|---|---|---|
1 | 保持 | [meta_kubernetes_namespace, meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name] | kube-system;etcd;http-metrics | ||||
2 | 标签映射 | __meta_kubernetes_service_label_(.+) |
|||||
3 | 替换 | [__meta_kubernetes_namespace] |
kubernetes_namespace | ||||
4 | 替换 | [__meta_kubernetes_service_name] |
kubernetes_name |
最终抓取任务配置
- job_name: K8S-ETCD
honor_timestamps: true
scrape_interval: 1m
scrape_timeout: 10s
metrics_path: /metrics
scheme: https
file_sd_configs:
- files:
- /u01/prometheus/target/nodes/K8S-ETCD_targets_hosts.json
refresh_interval: 5m
kubernetes_sd_configs:
- api_server: https://172.23.16.106:8443
role: endpoints
bearer_token: <secret>
tls_config:
insecure_skip_verify: true
namespaces:
names:
- default
- kube-system
bearer_token: <secret>
tls_config:
ca_file: /u01/prometheus/tls/ca/4cb22d45db4d490c85a56a2257211a3c.crt
cert_file: /u01/prometheus/tls/cert/e5dff40c73a34587b8708b63e43965f8.crt
key_file: /u01/prometheus/tls/key/c43bd44814114b30bb1662d98d74199f.key
insecure_skip_verify: true
relabel_configs:
- source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
separator: ;
regex: kube-system;etcd;http-metrics
replacement: $1
action: keep
- separator: ;
regex: __meta_kubernetes_service_label_(.+)
replacement: $1
action: labelmap
- source_labels: [__meta_kubernetes_namespace]
separator: ;
regex: (.*)
target_label: kubernetes_namespace
replacement: $1
action: replace
- source_labels: [__meta_kubernetes_service_name]
separator: ;
regex: (.*)
target_label: kubernetes_name
replacement: $1
action: replace
- separator: ;
regex: (.*)
target_label: PersonNum
replacement: hops
action: replace
- separator: ;
regex: (.*)
target_label: _tenant_id
replacement: "0"
action: replace
验证预览
结论
通过自动发现识别ETCD端点:
- 需要在抓取任务维度配置CA证书、客户端证书、客户端秘钥,ETCD需要SSL客户端双向验证(抓取任务维度的认证和TLS才是用于实际抓取时的认证和TLS,K8S维度的是用于访问API Server服务器做服务发现的)