1. Create the service discovery rule

Discover targets at the pod level (role: pod), in prometheus-additional.yaml:

- job_name: 'kubernetes-pods'
  kubernetes_sd_configs:
  - role: pod
  relabel_configs:
  # Keep only pods annotated with prometheus.io/scrape: "true"
  - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
    action: keep
    regex: true
  # If prometheus.io/path is set and non-empty, use it as the metrics path (default: /metrics)
  - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
    action: replace
    regex: (.+)
    target_label: __metrics_path__
  # Rewrite the scrape address to <pod IP>:<prometheus.io/port>
  - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
    action: replace
    regex: ([^:]+)(?::\d+)?;(\d+)
    replacement: ${1}:${2}
    target_label: __address__
  # Attach the namespace and pod name to every scraped series
  - source_labels: [__meta_kubernetes_namespace]
    action: replace
    target_label: kubernetes_namespace
  - source_labels: [__meta_kubernetes_pod_name]
    action: replace
    target_label: kubernetes_pod_name
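To make the address rewrite concrete, here is a hypothetical pod target traced through the third rule. Prometheus joins the source_labels values with the default separator ";" and matches the fully anchored regex against the result; the IP and port values below are made up for illustration.

```yaml
# Hypothetical pod target, traced through the __address__ rewrite rule
input:
  __address__: "10.244.1.5:8080"                               # pod IP plus a declared container port
  __meta_kubernetes_pod_annotation_prometheus_io_port: "9100"  # value of prometheus.io/port
joined: "10.244.1.5:8080;9100"    # source_labels joined with ";"
captures:
  "${1}": "10.244.1.5"            # ([^:]+)
  "${2}": "9100"                  # (\d+); the optional (?::\d+) group consumes ":8080"
output:
  __address__: "10.244.1.5:9100"  # replacement ${1}:${2}
```

If the pod has no prometheus.io/port annotation, the joined value ends in ";", the regex does not match, and __address__ is left unchanged.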

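Add these annotations to every pod that should be scraped:

prometheus.io/path: "/metrics"
prometheus.io/port: "<port>"
prometheus.io/scrape: "true"

scrape must be "true" for the pod to be kept, and port is the container port that serves metrics. path is optional: if it is omitted or empty, the (.+) regex does not match and the default /metrics is used.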
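Because role: pod reads pod annotations, the annotations must sit on the pod itself; for a Deployment that means the pod template's metadata, not the Deployment's own metadata. A minimal sketch, assuming a hypothetical Deployment named demo-app whose container serves metrics on port 8080:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: demo-app                  # hypothetical name
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: demo-app
  template:
    metadata:
      labels:
        app: demo-app
      annotations:                # pod-level annotations, seen by role: pod discovery
        prometheus.io/scrape: "true"
        prometheus.io/port: "8080"
        prometheus.io/path: "/metrics"
    spec:
      containers:
      - name: demo-app
        image: example.com/demo-app:latest   # hypothetical image
        ports:
        - containerPort: 8080
```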

2. Create the Secret

Save the configuration above as prometheus-additional.yaml, then create a Secret from that file:

$ kubectl create secret generic additional-configs --from-file=prometheus-additional.yaml -n monitoring
secret "additional-configs" created

To update the Secret this way, first delete it with kubectl delete secret additional-configs -n monitoring, then recreate it with the command above. Other update methods exist as well.
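An alternative to deleting and recreating, sketched here as an assumption rather than the author's method: keep the Secret as a manifest using stringData, then edit it and re-run kubectl apply -f on it. The file name is hypothetical; the Secret name and key must still match what additionalScrapeConfigs references.

```yaml
# prometheus-additional-secret.yaml (hypothetical file name)
apiVersion: v1
kind: Secret
metadata:
  name: additional-configs        # must match additionalScrapeConfigs.name
  namespace: monitoring
type: Opaque
stringData:
  # Key must match additionalScrapeConfigs.key; the value is the scrape config from step 1
  prometheus-additional.yaml: |
    - job_name: 'kubernetes-pods'
      kubernetes_sd_configs:
      - role: pod
      relabel_configs:
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: true
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
        action: replace
        regex: (.+)
        target_label: __metrics_path__
      - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
        action: replace
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: ${1}:${2}
        target_label: __address__
      - source_labels: [__meta_kubernetes_namespace]
        action: replace
        target_label: kubernetes_namespace
      - source_labels: [__meta_kubernetes_pod_name]
        action: replace
        target_label: kubernetes_pod_name
```

stringData accepts plain text, so the scrape config can be edited directly without base64-encoding it by hand.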

3. Reference the Secret in the Prometheus resource

Next, in the manifest that declares the Prometheus resource, add this extra configuration through the additionalScrapeConfigs field:

prometheus-prometheus.yaml:

apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  labels:
    prometheus: k8s
  name: k8s
  namespace: monitoring
spec:
  alerting:
    alertmanagers:
    - name: alertmanager-main
      namespace: monitoring
      port: web
  image: quay.io/prometheus/prometheus:v2.22.1
  nodeSelector:
    kubernetes.io/os: linux
  podMonitorNamespaceSelector: {}
  podMonitorSelector: {}
  probeNamespaceSelector: {}
  probeSelector: {}
  replicas: 2
  resources:
    requests:
      memory: 400Mi
  ruleSelector:
    matchLabels:
      prometheus: k8s
      role: alert-rules
  securityContext:
    fsGroup: 2000
    runAsNonRoot: true
    runAsUser: 1000
  serviceAccountName: prometheus-k8s
  serviceMonitorNamespaceSelector: {}
  serviceMonitorSelector: {}
  version: v2.22.1
  additionalScrapeConfigs:            # newly added
    name: additional-configs          # the Secret created in step 2
    key: prometheus-additional.yaml   # the key inside that Secret

Then apply the updated Prometheus custom resource:

kubectl apply -f prometheus-prometheus.yaml

4. Grant RBAC permissions

The default prometheus-clusterRole.yaml looks like this:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus-k8s
rules:
- apiGroups:
  - ""
  resources:
  - nodes/metrics
  verbs:
  - get
- nonResourceURLs:
  - /metrics
  verbs:
  - get

These rules grant no list permission on Services or Pods, which service discovery requires, so add the permissions we need:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus-k8s
rules:
  - apiGroups:
      - ""
    resources:
      - nodes
      - services
      - endpoints
      - pods
      - nodes/proxy
    verbs:
      - get
      - list
      - watch
  - apiGroups:
      - ""
    resources:
      - configmaps
      - nodes/metrics
    verbs:
      - get
  - nonResourceURLs:
      - /metrics
    verbs:
      - get
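The ClusterRole only takes effect through a binding to the service account Prometheus runs as (serviceAccountName: prometheus-k8s in the Prometheus resource above). kube-prometheus normally ships such a binding already; the manifest below is a sketch for reference, assuming the binding is named prometheus-k8s and the service account lives in the monitoring namespace.

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus-k8s
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus-k8s            # the ClusterRole updated above
subjects:
- kind: ServiceAccount
  name: prometheus-k8s            # matches serviceAccountName in the Prometheus spec
  namespace: monitoring
```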

Apply the updated ClusterRole, then recreate all Prometheus pods. The targets page at http://10.1.9.124:31001/targets should now list the kubernetes-pods job.

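To check the loaded configuration, open http://10.1.9.124:31001/config. After refreshing the page, the new job appears there, confirming that the configuration has taken effect.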