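Black-box monitoring tests a service's externally visible behavior from the user's point of view. Common black-box checks include HTTP probes and TCP probes, which verify that a site or service is reachable and measure how quickly it responds.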

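Black-box monitoring differs from white-box monitoring in that it is failure-oriented: it detects a failure quickly once it has occurred, whereas white-box monitoring focuses on proactively discovering or predicting potential problems.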

Blackbox Exporter is the official black-box monitoring solution from the Prometheus community. It lets users probe network endpoints over HTTP, HTTPS, DNS, TCP, and ICMP.

As before, we first run the blackbox-exporter service in the Kubernetes cluster and supply its configuration through a ConfigMap, as shown below (blackbox.yaml):

    apiVersion: v1
    kind: Service
    metadata:
      name: blackbox
      namespace: monitoring
    spec:
      selector:
        app: blackbox
      ports:
        - port: 9115
          targetPort: 9115
    ---
    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: blackbox-config
      namespace: monitoring
    data:
      blackbox.yaml: |-
        modules:
          http_2xx:
            prober: http
            timeout: 10s
            http:
              valid_http_versions: ["HTTP/1.1", "HTTP/2"]
              valid_status_codes: [200]
              method: GET
              preferred_ip_protocol: "ip4"
          http_post_2xx:
            prober: http
            timeout: 10s
            http:
              valid_http_versions: ["HTTP/1.1", "HTTP/2"]
              valid_status_codes: [200]
              method: POST
              preferred_ip_protocol: "ip4"
          tcp_connect:
            prober: tcp
            timeout: 10s
          ping:
            prober: icmp
            timeout: 5s
            icmp:
              preferred_ip_protocol: "ip4"
          dns:
            prober: dns
            dns:
              transport_protocol: "tcp"
              preferred_ip_protocol: "ip4"
              query_name: "kubernetes.default.svc.cluster.local"
    ---
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: blackbox
      namespace: monitoring
    spec:
      selector:
        matchLabels:
          app: blackbox
      template:
        metadata:
          labels:
            app: blackbox
        spec:
          containers:
            - name: blackbox
              image: prom/blackbox-exporter:v0.16.0
              args:
                - "--config.file=/etc/blackbox_exporter/blackbox.yaml"
                - "--log.level=error"
              ports:
                - containerPort: 9115
              volumeMounts:
                - name: config
                  mountPath: /etc/blackbox_exporter
          volumes:
            - name: config
              configMap:
                name: blackbox-config

Then create the resources from the manifest:

    # kubectl apply -f .
    service/blackbox created
    configmap/blackbox-config created
    deployment.apps/blackbox created

Next, add the Blackbox scrape configuration to Prometheus. Because Prometheus was deployed with the Prometheus Operator, we add it as an additional scrape configuration in prometheus-additional.yaml:

    - job_name: 'kubernetes-service-endpoints'
      kubernetes_sd_configs:
        - role: endpoints
      relabel_configs:
        - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
          action: keep
          regex: true
        - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
          action: replace
          target_label: __scheme__
          regex: (https?)
        - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
          action: replace
          target_label: __metrics_path__
          regex: (.+)
        - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
          action: replace
          target_label: __address__
          regex: ([^:]+)(?::\d+)?;(\d+)
          replacement: $1:$2
        - action: labelmap
          regex: __meta_kubernetes_service_label_(.+)
        - source_labels: [__meta_kubernetes_namespace]
          action: replace
          target_label: kubernetes_namespace
        - source_labels: [__meta_kubernetes_service_name]
          action: replace
          target_label: kubernetes_name
    - job_name: "kubernetes-service-dns"
      metrics_path: /probe
      params:
        module: [dns]
      static_configs:
        - targets:
            - kube-dns.kube-system:53
      relabel_configs:
        - source_labels: [__address__]
          target_label: __param_target
        - source_labels: [__param_target]
          target_label: instance
        - target_label: __address__
          replacement: blackbox.monitoring:9115
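
To make the relabeling in the kubernetes-service-dns job concrete, here is a minimal Python sketch that applies the same three steps to a plain dictionary and prints the request Prometheus would end up sending. The function names `relabel_for_blackbox` and `probe_url` are made up for illustration, and the sketch only mimics the default replace behavior (copying the source label value); it is not Prometheus code.

```python
from urllib.parse import urlencode


def relabel_for_blackbox(labels, exporter="blackbox.monitoring:9115"):
    """Mimic the three relabel steps of the kubernetes-service-dns job."""
    out = dict(labels)
    # 1. __address__ -> __param_target (becomes the ?target= query parameter)
    out["__param_target"] = out["__address__"]
    # 2. __param_target -> instance (series are labeled with the probed host)
    out["instance"] = out["__param_target"]
    # 3. __address__ is overwritten so the scrape goes to the exporter itself
    out["__address__"] = exporter
    return out


def probe_url(labels, module, metrics_path="/probe"):
    """Build the URL the scrape would request (scheme assumed to be http)."""
    query = urlencode({"module": module, "target": labels["__param_target"]})
    return f"http://{labels['__address__']}{metrics_path}?{query}"


target = relabel_for_blackbox({"__address__": "kube-dns.kube-system:53"})
print(target["instance"])
print(probe_url(target, "dns"))
```

The point of the design is that Prometheus never contacts kube-dns directly: it scrapes the exporter, and the exporter performs the probe named by the target and module parameters.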

Then recreate the secret:

    # kubectl delete secret additional-config -n monitoring
    # kubectl -n monitoring create secret generic additional-config --from-file=prometheus-additional.yaml

Then reload Prometheus:

    # curl -X POST "http://10.68.215.41:9090/-/reload"
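
The reload call can also be scripted. The sketch below builds the same POST request with Python's standard library; the address is the one used in this article, and the helper name `reload_request` is hypothetical. Note that Prometheus only serves /-/reload when it was started with --web.enable-lifecycle.

```python
import urllib.request


def reload_request(base_url):
    """Return a POST request for Prometheus' /-/reload endpoint (not yet sent)."""
    url = f"{base_url.rstrip('/')}/-/reload"
    return urllib.request.Request(url, data=b"", method="POST")


req = reload_request("http://10.68.215.41:9090")
print(req.get_method(), req.full_url)

# Sending it requires network access to the Prometheus server:
# with urllib.request.urlopen(req, timeout=10) as resp:
#     print(resp.status)
```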

The new target should now appear on the Targets page.

Then query probe_success{job="kubernetes-service-dns"} on the Graph page.
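
The same expression can also be evaluated outside the Graph page through Prometheus' HTTP API (/api/v1/query). The following is a minimal sketch, assuming the Prometheus address used in this article; the helper names and the sample response are illustrative only, not captured output.

```python
import json
from urllib.parse import urlencode


def instant_query_url(base_url, expr):
    """Build an instant-query URL for the Prometheus HTTP API."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': expr})}"


def failed_instances(response_text):
    """Return the instance labels whose probe_success sample is 0."""
    body = json.loads(response_text)
    if body.get("status") != "success":
        raise ValueError("query failed")
    return [
        sample["metric"].get("instance", "")
        for sample in body["data"]["result"]
        if float(sample["value"][1]) == 0.0
    ]


url = instant_query_url(
    "http://10.68.215.41:9090", 'probe_success{job="kubernetes-service-dns"}'
)
print(url)

# Hand-written response in the API's vector format, for illustration only.
sample = json.dumps({
    "status": "success",
    "data": {"resultType": "vector", "result": [
        {"metric": {"instance": "kube-dns.kube-system:53"},
         "value": [1700000000, "0"]},
    ]},
})
print(failed_instances(sample))
```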

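Besides DNS, we also configured an http_2xx module above, i.e. an HTTP probe. HTTP probes are among the most commonly used probes in black-box monitoring: they let you monitor a website or HTTP service effectively, covering both its availability and user-experience metrics such as response time. Beyond alerting promptly when a service misbehaves, they also help administrators analyze and optimize the site experience. Here we use the module to check HTTP services.
Since the http_2xx module is already configured in Blackbox, we only need to add scrape jobs to Prometheus. We can combine this with Prometheus service discovery: the Service and Ingress discovery roles are well suited to black-box monitoring. The configuration is as follows: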

    - job_name: 'kubernetes-service-endpoints'
      kubernetes_sd_configs:
        - role: endpoints
      relabel_configs:
        - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
          action: keep
          regex: true
        - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
          action: replace
          target_label: __scheme__
          regex: (https?)
        - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
          action: replace
          target_label: __metrics_path__
          regex: (.+)
        - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
          action: replace
          target_label: __address__
          regex: ([^:]+)(?::\d+)?;(\d+)
          replacement: $1:$2
        - action: labelmap
          regex: __meta_kubernetes_service_label_(.+)
        - source_labels: [__meta_kubernetes_namespace]
          action: replace
          target_label: kubernetes_namespace
        - source_labels: [__meta_kubernetes_service_name]
          action: replace
          target_label: kubernetes_name
    - job_name: "kubernetes-service-dns"
      metrics_path: /probe
      params:
        module: [dns]
      static_configs:
        - targets:
            - kube-dns.kube-system:53
      relabel_configs:
        - source_labels: [__address__]
          target_label: __param_target
        - source_labels: [__param_target]
          target_label: instance
        - target_label: __address__
          replacement: blackbox.monitoring:9115
    - job_name: "kubernetes-http-service"
      metrics_path: /probe
      params:
        module: [http_2xx]
      kubernetes_sd_configs:
        - role: service
      relabel_configs:
        # Keep only Services annotated with prometheus.io/http-probe: "true"
        - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_http_probe]
          action: keep
          regex: true
        - source_labels: [__address__]
          target_label: __param_target
        - target_label: __address__
          replacement: blackbox.monitoring:9115
        - source_labels: [__param_target]
          target_label: instance
        - action: labelmap
          regex: __meta_kubernetes_service_label_(.+)
        - source_labels: [__meta_kubernetes_namespace]
          target_label: kubernetes_namespace
        - source_labels: [__meta_kubernetes_service_name]
          target_label: kubernetes_name
    - job_name: "kubernetes-ingresses"
      metrics_path: /probe
      params:
        module: [http_2xx]
      kubernetes_sd_configs:
        - role: ingress
      relabel_configs:
        # Keep only Ingresses annotated with prometheus.io/http-probe: "true"
        - source_labels: [__meta_kubernetes_ingress_annotation_prometheus_io_http_probe]
          action: keep
          regex: true
        # Build the probed URL as <scheme>://<host><path>
        - source_labels: [__meta_kubernetes_ingress_scheme,__address__,__meta_kubernetes_ingress_path]
          regex: (.+);(.+);(.+)
          replacement: ${1}://${2}${3}
          target_label: __param_target
        - target_label: __address__
          replacement: blackbox.monitoring:9115
        - source_labels: [__param_target]
          target_label: instance
        # The ingress role exposes ingress_* meta labels, not service_* ones
        - action: labelmap
          regex: __meta_kubernetes_ingress_label_(.+)
        - source_labels: [__meta_kubernetes_namespace]
          target_label: kubernetes_namespace
        - source_labels: [__meta_kubernetes_ingress_name]
          target_label: kubernetes_name

Then recreate the secret:

    # kubectl delete secret additional-config -n monitoring
    # kubectl -n monitoring create secret generic additional-config --from-file=prometheus-additional.yaml

Then reload Prometheus:

    # curl -X POST "http://10.68.215.41:9090/-/reload"

However, the Prometheus logs now report an error. It is caused by insufficient RBAC permissions, so we modify prometheus-clusterRole.yaml:

    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRole
    metadata:
      name: prometheus-k8s
    rules:
      - apiGroups:
          - ""
        resources:
          - nodes/metrics
          - configmaps
        verbs:
          - get
      - apiGroups:
          - ""
        resources:
          - nodes
          - pods
          - services
          - endpoints
          - nodes/proxy
        verbs:
          - get
          - list
          - watch
      # Ingress discovery needs list/watch on ingresses.
      # Newer clusters serve Ingress from networking.k8s.io instead of extensions.
      - apiGroups:
          - "extensions"
          - "networking.k8s.io"
        resources:
          - ingresses
        verbs:
          - get
          - list
          - watch
      - nonResourceURLs:
          - /metrics
        verbs:
          - get

Then apply it again:

    # kubectl apply -f prometheus-clusterRole.yaml

Then check the Prometheus web UI again.

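There is still no data, though. The two jobs above keep only targets whose __meta_kubernetes_service_annotation_prometheus_io_http_probe or __meta_kubernetes_ingress_annotation_prometheus_io_http_probe meta label is true, so for either job to discover a target, the Service or Ingress must carry the corresponding annotation:

    annotations:
      prometheus.io/http-probe: "true"

For example: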

    # extensions/v1beta1 Deployments were removed in Kubernetes 1.16; use apps/v1
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: redis
      namespace: kube-ops
    spec:
      # apps/v1 requires an explicit selector matching the pod labels
      selector:
        matchLabels:
          app: redis
      template:
        metadata:
          annotations:
            prometheus.io/scrape: "true"
            prometheus.io/port: "9121"
          labels:
            app: redis
        spec:
          containers:
            - name: redis
              image: redis:4
              resources:
                requests:
                  cpu: 100m
                  memory: 100Mi
              ports:
                - containerPort: 6379
            - name: redis-exporter
              image: oliver006/redis_exporter:latest
              resources:
                requests:
                  cpu: 100m
                  memory: 100Mi
              ports:
                - containerPort: 9121
    ---
    kind: Service
    apiVersion: v1
    metadata:
      name: redis
      namespace: kube-ops
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "9121"
        prometheus.io/http-probe: "true"
    spec:
      selector:
        app: redis
      ports:
        - name: redis
          port: 6379
          targetPort: 6379
        - name: prom
          port: 9121
          targetPort: 9121
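
The annotation key and the meta label used in the keep rules are the same name in two spellings: Kubernetes service discovery replaces characters that are not valid in a label name with underscores. Here is a small sketch of that mapping, assuming the sanitization rule is "anything outside [a-zA-Z0-9_] becomes _"; the function name `annotation_meta_label` is hypothetical.

```python
import re


def annotation_meta_label(role, annotation_key):
    """Map an annotation key to the meta label name seen in relabel_configs."""
    sanitized = re.sub(r"[^a-zA-Z0-9_]", "_", annotation_key)
    return f"__meta_kubernetes_{role}_annotation_{sanitized}"


for role in ("service", "ingress"):
    print(annotation_meta_label(role, "prometheus.io/http-probe"))
```

This is why the relabel rules can match on ..._prometheus_io_http_probe even though the manifest spells the key with a dot, a slash, and a hyphen.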

The target then shows up in the web UI.

The probe metrics are also visible on the Graph page.

If you need control over the probed path or port, you can add the corresponding rules to relabel_configs yourself. For example, to customize black-box probing of a Service, configure something like this:

    - source_labels: [__meta_kubernetes_service_name, __meta_kubernetes_namespace, __meta_kubernetes_service_annotation_prometheus_io_http_probe_port, __meta_kubernetes_service_annotation_prometheus_io_http_probe_path]
      action: replace
      target_label: __param_target
      regex: (.+);(.+);(.+);(.+)
      replacement: $1.$2:$3$4

With this rule in place, the Service needs annotations like these:

    annotations:
      prometheus.io/http-probe: "true"
      prometheus.io/http-probe-port: "8080"
      prometheus.io/http-probe-path: "/healthz"
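
To see what this rule produces, the sketch below imitates a replace action in Python: the source label values are joined with ";", the regex must match the whole joined string, and $N in the replacement refers to a capture group. The helper `relabel_replace` is hypothetical and only handles numeric group references; it is an approximation of the behavior, not Prometheus' implementation.

```python
import re


def relabel_replace(values, regex, replacement, separator=";"):
    """Return the new label value, or None when the regex does not match."""
    match = re.fullmatch(regex, separator.join(values))
    if match is None:
        return None  # a non-matching replace leaves the target label untouched
    # Translate $1 / ${1} into Python's \g<1> group reference syntax.
    template = re.sub(r"\$\{?(\d+)\}?", r"\\g<\1>", replacement)
    return match.expand(template)


rule = (r"(.+);(.+);(.+);(.+)", "$1.$2:$3$4")

# Service "redis" in namespace "kube-ops" with both annotations set.
print(relabel_replace(["redis", "kube-ops", "8080", "/healthz"], *rule))

# Port annotation missing: the joined string has an empty field, so no match.
print(relabel_replace(["redis", "kube-ops", "", "/healthz"], *rule))
```

Because every group uses `.+`, a Service that lacks either annotation simply keeps whatever __param_target it already had, so both annotations should be set together.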

Ping check

    - job_name: 'ping_all'
      scrape_interval: 1m
      metrics_path: /probe
      params:
        module: [ping]
      static_configs:
        - targets:
            - 192.168.1.2
          labels:
            instance: node2
        - targets:
            - 192.168.1.3
          labels:
            instance: node3
      relabel_configs:
        - source_labels: [__address__]
          target_label: __param_target
        - target_label: __address__
          replacement: 127.0.0.1:9115 # Blackbox Exporter address

HTTP check

    - job_name: 'http_get_all' # Blackbox Exporter module
      scrape_interval: 30s
      metrics_path: /probe
      params:
        module: [http_2xx]
      static_configs:
        - targets:
            - https://www.coolops.cn
      relabel_configs:
        - source_labels: [__address__]
          target_label: __param_target
        - source_labels: [__param_target]
          target_label: instance
        - target_label: __address__
          replacement: 127.0.0.1:9115 # Blackbox Exporter address

Monitoring host liveness

    - job_name: node_status
      metrics_path: /probe
      params:
        # The module name must exist in the Blackbox config; the ConfigMap
        # earlier in this article names its ICMP module "ping" instead.
        module: [icmp]
      static_configs:
        - targets: ['10.165.94.31']
          labels:
            instance: node_status
            group: 'node'
      relabel_configs:
        - source_labels: [__address__]
          target_label: __param_target
        - target_label: __address__
          replacement: 127.0.0.1:9115 # Blackbox Exporter address

Monitoring port status

    - job_name: 'prometheus_port_status'
      metrics_path: /probe
      params:
        module: [tcp_connect]
      static_configs:
        - targets: ['172.19.155.133:8765']
          labels:
            instance: 'port_status'
            group: 'tcp'
      relabel_configs:
        - source_labels: [__address__]
          target_label: __param_target
        - source_labels: [__param_target]
          target_label: instance
        - target_label: __address__
          replacement: 127.0.0.1:9115 # Blackbox Exporter address

Alerting

    groups:
      - name: example
        rules:
          - alert: curlHttpStatus
            # The job label must match the name of your HTTP probe scrape job.
            expr: probe_http_status_code{job="blackbox-http"}>=400 and probe_success{job="blackbox-http"}==0
            #for: 1m
            labels:
              docker: number
            annotations:
              summary: 'Business alert: website unreachable'
              description: '{{$labels.instance}} is unreachable, please check promptly. Current status code: {{$value}}'
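
It helps to spell out what the alert expression matches. The `and` operator keeps a series only when both sides have a sample with the same labels, so the rule fires for a target only when its probe failed and an HTTP status of 400 or above was actually received. The Python sketch below mirrors that condition (the function name is hypothetical). To my knowledge, Blackbox Exporter reports probe_http_status_code as 0 when no HTTP response arrives at all, for example on a timeout; under that assumption such failures would not trigger this rule, and alerting on probe_success == 0 alone would be the broader check.

```python
def curl_http_status_fires(status_code, probe_success):
    """Mirror: probe_http_status_code >= 400 and probe_success == 0."""
    return status_code >= 400 and probe_success == 0


# (status code, probe_success) pairs chosen for illustration.
cases = {
    "healthy": (200, 1),
    "server error": (503, 0),
    "no response (assumed status 0)": (0, 0),
}
for name, (code, ok) in cases.items():
    print(f"{name}: {curl_http_status_fires(code, ok)}")
```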

For the Prometheus configuration file, see the example in the official repository: https://github.com/prometheus/prometheus/blob/master/documentation/examples/prometheus-kubernetes.yml
For the Blackbox configuration file, see the official example: https://github.com/prometheus/blackbox_exporter/blob/master/example.yml