For more advanced Prometheus monitoring options, see the official exporters documentation: https://prometheus.io/docs/instrumenting/exporters/

Writing Prometheus data to Elasticsearch

prometheusbeat can ship Prometheus data into Elasticsearch.

prometheusbeat project: https://github.com/infonova/prometheusbeat

```shell
# Start prometheusbeat with Docker
docker run -d \
  --restart always \
  --name prometheusbeat \
  -p 8080:8080 \
  -v /etc/prometheusbeat/prometheusbeat.yml:/prometheusbeat.yml \
  infonova/prometheusbeat:latest
```

Then add the following to the Prometheus configuration (remote_write is a list, so the entry needs a leading dash):

```yaml
remote_write:
  - url: "http://{prometheusbeat_IP}:8080/prometheus"
```

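Prometheus SNMP monitoring

The method below does collect the data, but there is no good Grafana dashboard for it; for monitoring network traffic, Cacti is still the better choice.

Reference: https://blog.csdn.net/YUKEKECHEN/article/details/85960248

Installation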

```shell
# Install snmp_exporter
# Project: https://github.com/prometheus/snmp_exporter
yum -y install net-snmp
docker run -d \
  --restart always \
  --name snmp_export \
  -p 9116:9116 \
  prom/snmp-exporter
```

Add the following job to the Prometheus configuration (under scrape_configs):

```yaml
- job_name: 'snmp'
  static_configs:
    - targets:
        - 192.168.1.1  # gateway address
      labels:
        tag: aliyun-hb2-10
  metrics_path: /snmp
  params:
    module: [if_mib]
  relabel_configs:
    - source_labels: [__address__]
      target_label: __param_target
    - source_labels: [__param_target]
      target_label: instance
    - target_label: __address__
      # snmp_exporter host:port; quoted so YAML does not parse the braces as a mapping
      replacement: "{snmp_export_IP}:9116"
```
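The three relabel_configs rules are what turn a device address into a request to the exporter. The Python below is a simplified model of that rewrite for illustration, not Prometheus's own code; the helper name `relabel_for_exporter` is hypothetical, and 172.25.20.90:9116 is the example exporter host used in the verification step below.

```python
from urllib.parse import urlencode


def relabel_for_exporter(target, exporter, module, metrics_path="/snmp", extra_labels=None):
    """Simplified model of the snmp job's relabel_configs (hypothetical helper)."""
    labels = {"__address__": target, **(extra_labels or {})}
    # Rule 1: __address__ -> __param_target, which becomes the ?target= query parameter
    labels["__param_target"] = labels["__address__"]
    # Rule 2: __param_target -> instance, so series are labelled with the device, not the exporter
    labels["instance"] = labels["__param_target"]
    # Rule 3: no source_labels, so __address__ is simply overwritten with the exporter host:port
    labels["__address__"] = exporter
    # params: module: [if_mib] is sent as ?module=if_mib alongside the target
    query = urlencode({"module": module, "target": labels["__param_target"]})
    return labels, f"http://{labels['__address__']}{metrics_path}?{query}"


if __name__ == "__main__":
    labels, url = relabel_for_exporter(
        "192.168.1.1", "172.25.20.90:9116", "if_mib", extra_labels={"tag": "aliyun-hb2-10"}
    )
    print(labels["instance"])  # 192.168.1.1
    print(url)  # http://172.25.20.90:9116/snmp?module=if_mib&target=192.168.1.1
```

The blackbox_exporter jobs later on follow the same pattern, with /probe in place of /snmp.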

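Verify the SNMP metrics

```shell
# Quote the URL: an unquoted & makes the shell background curl and drops the module parameter
curl "http://{snmp_export_IP}:9116/snmp?target={switch_SNMP_address}&module=if_mib"
# e.g.
curl "http://172.25.20.90:9116/snmp?target=10.10.10.253&module=if_mib"
```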

Configure the SNMP traffic rules and alerts

vim /etc/prometheus/rules/traffic.yml

```yaml
groups:
  - name: traffic
    rules:
      - record: traffic_out_bps
        expr: (ifHCOutOctets - (ifHCOutOctets offset 1m)) *8/60
        #expr: sum by (tag, job, instance, ifIndex) ((ifHCOutOctets - (ifHCOutOctets offset 1m)) *8/60)
        #labels:
        #  instance: ""
        #  ifIndex: ""
      - record: traffic_in_bps
        expr: (ifHCInOctets - (ifHCInOctets offset 1m)) *8/60
      ### alert
      - alert: BeijingProxyTrafficOutProblem
        expr: (sum by(tag) (avg_over_time(traffic_out_bps{ifIndex=~"7|9", tag=~"beijing.+"}[5m]) /1024/1024)) >= 200
        for: 2m
        labels:
          level: CRITICAL
        annotations:
          message: "traffic out has problem (network: , current: Mbps)"
      - alert: BeijingProxyTrafficInProblem
        expr: (sum by(tag) (avg_over_time(traffic_in_bps{ifIndex=~"7|9", tag=~"beijing.+"}[5m]) /1024/1024)) >= 500
        for: 2m
        labels:
          level: CRITICAL
        annotations:
          message: "traffic in has problem (network: , current: Mbps)"
      - alert: BeijingProxyWanTrafficOutProblem
        expr: (sum by(tag) (avg_over_time(traffic_out_bps{ifIndex=~"6|8", tag=~"beijing.+"}[5m]) /1024/1024)) >= 30
        for: 2m
        labels:
          level: CRITICAL
        annotations:
          message: "traffic out bond0 has problem (network: , current: Mbps)"
      - alert: BeijingProxyWanTrafficInProblem
        expr: (sum by(tag) (avg_over_time(traffic_in_bps{ifIndex=~"6|8", tag=~"beijing.+"}[5m]) /1024/1024)) >= 30
        for: 2m
        labels:
          level: CRITICAL
        annotations:
          message: "traffic in bond0 has problem (network: , current: Mbps)"
      - alert: AliyunProxyTrafficOutProblem
        expr: (sum by(tag) (avg_over_time(traffic_out_bps{ifIndex="2", tag=~"aliyun.+"}[5m]) /1024/1024)) > 200
        for: 2m
        labels:
          level: CRITICAL
        annotations:
          message: "traffic out has problem (network: , current: Mbps)"
      - alert: AliyunProxyTrafficInProblem
        expr: (sum by(tag) (avg_over_time(traffic_in_bps{ifIndex="2", tag=~"aliyun.+"}[5m]) /1024/1024)) > 200
        for: 2m
        labels:
          level: CRITICAL
        annotations:
          message: "traffic in has problem (network: , current: Mbps)"
```
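The recording rules turn the 64-bit octet counters into bits per second: the octet delta over one minute, times 8 bits per byte, divided by 60 seconds; the alerts then divide by 1024 twice to get Mbps. A minimal Python sketch of that arithmetic (function names are illustrative only) also shows one caveat: when a counter resets, for example after a device reboot, the plain difference goes negative. PromQL's `rate()` accounts for counter resets, so something like `rate(ifHCOutOctets[5m]) * 8` may be worth considering instead; `bps_reset_aware` below only approximates that behaviour.

```python
def bps_from_counters(prev_octets, curr_octets, seconds=60):
    """Same arithmetic as (ifHCOutOctets - (ifHCOutOctets offset 1m)) * 8 / 60."""
    return (curr_octets - prev_octets) * 8 / seconds


def bps_reset_aware(prev_octets, curr_octets, seconds=60):
    """Rough approximation of rate()'s reset handling: a drop means the counter restarted at 0."""
    delta = curr_octets - prev_octets
    if delta < 0:
        delta = curr_octets  # count only what accumulated since the reset
    return delta * 8 / seconds


def to_mbps(bps):
    """Convert with /1024/1024, as the alert expressions do."""
    return bps / 1024 / 1024


if __name__ == "__main__":
    # 1,572,864,000 octets in one minute is exactly the 200 Mbps alert threshold
    print(to_mbps(bps_from_counters(0, 1_572_864_000)))  # 200.0
    # After a reset the plain difference is negative; the reset-aware version is not
    print(bps_from_counters(5_000_000, 600) < 0)  # True
    print(bps_reset_aware(5_000_000, 600))  # 80.0
```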

Prometheus network service monitoring

Prometheus provides blackbox_exporter for network probing; it supports HTTP, DNS, TCP, ICMP and other probe types.

Configuration file

blackbox_exporter configuration file, blackbox.yml:

```yaml
modules:
  http_2xx:
    prober: http
    timeout: 10s
    http:
      preferred_ip_protocol: "ip4"  # required if the HTTP probe uses IPv4; IPv6 is still rarely used in China
  # Module for POST requests. Since each endpoint takes different parameters, define one module per endpoint
  # (e.g. this http_post_2xx_query module probes the query.action endpoint)
  http_post_2xx_query:
    prober: http
    timeout: 15s
    http:
      preferred_ip_protocol: "ip4"  # use IPv4
      method: POST
      headers:
        Content-Type: application/json  # request header
      body: '{"hmac":"","params":{"publicFundsKeyWords":"xxx"}}'  # request parameters
  tcp_connect:
    prober: tcp
  pop3s_banner:
    prober: tcp
    tcp:
      query_response:
        - expect: "^+OK"
      tls: true
      tls_config:
        insecure_skip_verify: false
  ssh_banner:
    prober: tcp
    tcp:
      query_response:
        - expect: "^SSH-2.0-"
  irc_banner:
    prober: tcp
    tcp:
      query_response:
        - send: "NICK prober"
        - send: "USER prober prober prober :prober"
        - expect: "PING :([^ ]+)"
          send: "PONG ${1}"
        - expect: "^:[^ ]+ 001"
  # icmp:
  #   prober: icmp
  #   timeout: 5s
  #   icmp:
  ping:  # ICMP probe module
    prober: icmp
    timeout: 5s
    icmp:
      preferred_ip_protocol: "ip4"
```
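The banner modules rely on query_response, a list of steps in which `expect` reads lines until one matches a regular expression and `send` writes a line, optionally reusing capture groups from the same step's `expect` (the `${1}` in irc_banner). The Python below is a simplified model of those semantics for illustration, not blackbox_exporter's implementation; the function name and sample server lines are hypothetical, timeouts and TLS are ignored, and Python's `re` is not identical to Go's RE2 syntax, so some patterns may need adjusting.

```python
import re


def run_query_response(steps, server_lines):
    """Simplified model of the tcp prober's query_response (hypothetical helper).

    steps: dicts with optional "expect" (regex) and "send" (text) keys.
    server_lines: the lines the server would send, in order.
    Returns (success, lines_sent).
    """
    incoming = iter(server_lines)
    sent = []
    for step in steps:
        match = None
        if "expect" in step:
            # Consume server lines until one matches; running out fails the probe.
            for line in incoming:
                match = re.search(step["expect"], line)
                if match:
                    break
            if match is None:
                return False, sent
        if "send" in step:
            text = step["send"]
            if match is not None:
                # Replace ${N} with capture group N from this step's expect.
                text = re.sub(r"\$\{(\d+)\}", lambda m: match.group(int(m.group(1))) or "", text)
            sent.append(text)
    return True, sent


if __name__ == "__main__":
    irc_steps = [
        {"send": "NICK prober"},
        {"send": "USER prober prober prober :prober"},
        {"expect": "PING :([^ ]+)", "send": "PONG ${1}"},
        {"expect": "^:[^ ]+ 001"},
    ]
    ok, sent = run_query_response(irc_steps, ["PING :abc123", ":irc.example.net 001 prober :hi"])
    print(ok, sent[-1])  # True PONG abc123
```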

Installation

```shell
### Start blackbox_exporter
docker run -d -p 9115:9115 --name blackbox_exporter \
  --restart=always \
  -v /etc/prometheus/blackbox.yml:/etc/prometheus/blackbox.yml \
  docker.io/prom/blackbox-exporter \
  --config.file=/etc/prometheus/blackbox.yml
```

If you are not running it in Docker, note the following:

  • blackbox_exporter is usually run as a non-root user (here, the prometheus user). To use the icmp prober, the binary needs the CAP_NET_RAW capability, which you can grant with: setcap cap_net_raw+ep blackbox_exporter

Use cases

Ping check

Add the following to the Prometheus configuration:

```yaml
#### Network service monitoring -- ping ####
- job_name: 'ping_all'
  scrape_interval: 1m
  metrics_path: /probe
  params:
    module: [ping]
  static_configs:
    - targets:
        - 192.168.2.107
      labels:
        instance: test01
    - targets:
        - 192.168.2.108
      labels:
        instance: test02
  relabel_configs:
    - source_labels: [__address__]
      target_label: __param_target
    - target_label: __address__
      replacement: 172.25.20.91:9115  # blackbox_exporter address:port
```
  • Verify:

curl "http://localhost:9115/probe?module=ping&target=192.168.2.107"
This returns the probe metrics for the target 192.168.2.107.

HTTP check

Using the basic http_2xx module defined above as an example, add the following to the Prometheus configuration:

```yaml
### http ###
- job_name: 'blackbox-http'
  metrics_path: /probe
  params:
    module: [http_2xx]  # Expect an HTTP 2xx response.
  static_configs:
    - targets:
        - http://192.168.3.214:8803/zlead
        - http://prometheus.io   # Target to probe with http.
        - https://prometheus.io  # Target to probe with https.
  relabel_configs:
    - source_labels: [__address__]
      target_label: __param_target
    - source_labels: [__param_target]
      target_label: instance
    - target_label: __address__
      replacement: 172.25.20.91:9115  # The blackbox exporter's real hostname:port
```
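  • Apply the configuration (the /-/reload endpoint only works when Prometheus is started with --web.enable-lifecycle):

curl -X POST http://172.25.20.90:9090/-/reload

  • Check:

curl "http://localhost:9115/probe?module=http_2xx&target=prometheus.io"
or, with debug output:
curl "http://localhost:9115/probe?target=prometheus.io&module=http_2xx&debug=true"

  • In the output, probe_success is 1 when the HTTP check succeeds and 0 when it fails; this is the metric to alert on.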
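For a scripted spot check outside Prometheus, probe_success can be read straight from the /probe output. A minimal standard-library Python sketch; `probe_succeeded` and `fetch_probe` are hypothetical helper names, and blackbox_exporter is assumed to listen on localhost:9115 as in the curl commands above.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def probe_succeeded(metrics_text):
    """Return True/False from the probe_success line of a /probe response, or None if absent."""
    for line in metrics_text.splitlines():
        # Skip "# HELP" / "# TYPE" comments; probe_success carries no labels.
        if line.startswith("probe_success "):
            return float(line.split()[1]) == 1.0
    return None


def fetch_probe(exporter, module, target, timeout=15):
    """Call blackbox_exporter's /probe endpoint, like the curl commands above."""
    query = urlencode({"module": module, "target": target})
    with urlopen(f"http://{exporter}/probe?{query}", timeout=timeout) as resp:
        return resp.read().decode("utf-8")


if __name__ == "__main__":
    sample = (
        "# HELP probe_success Displays whether or not the probe was a success\n"
        "# TYPE probe_success gauge\n"
        "probe_success 1\n"
    )
    print(probe_succeeded(sample))  # True
```

Against a live exporter this would be used as `probe_succeeded(fetch_probe("localhost:9115", "http_2xx", "prometheus.io"))`.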

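TCP check

  • Check whether the ports of business components are listening
  • Define and probe application-layer protocols

Add the following to the Prometheus configuration: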

```yaml
### TCP port check ###
# Similar to telnet
- job_name: "blackbox_telnet_port"
  scrape_interval: 5s
  metrics_path: /probe
  params:
    module: [tcp_connect]
  static_configs:
    - targets: ['192.168.2.108:3306']
      labels:
        group: 'mysql-server'
    - targets: ['192.168.2.208:80']
      labels:
        group: 'Process status of nginx(main) server'
  relabel_configs:
    - source_labels: [__address__]
      target_label: __param_target
    - source_labels: [__param_target]
      target_label: instance
    - target_label: __address__
      replacement: 172.25.20.91:9115  # The blackbox exporter's real hostname:port
```
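The tcp_connect module only checks that a TCP handshake to host:port completes, much like telnet. A rough standard-library Python equivalent can help confirm by hand that a port is reachable from the blackbox host; `tcp_port_open` is a hypothetical name and the 5-second default timeout is an assumption.

```python
import socket


def tcp_port_open(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port can be established within timeout."""
    try:
        # create_connection resolves the host and completes the three-way handshake
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False
```

For example, `tcp_port_open("192.168.2.108", 3306)` checks the mysql-server target from the job above.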

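POST check

  • API connectivity
  • Probes a business API address to determine whether the endpoint is online
  • Add the block below to the Prometheus configuration
  • It uses the http_post_2xx_query module from blackbox.yml (which probes the query.action endpoint)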

```yaml
### http-post ###
- job_name: 'blackbox_http_2xx_post'
  scrape_interval: 10s
  metrics_path: /probe
  params:
    module: [http_post_2xx_query]
  static_configs:
    - targets:
        - http://lphr.com/#/login  # note: the "#/login" fragment is never sent to the server
      labels:
        group: 'Interface monitoring'
  relabel_configs:
    - source_labels: [__address__]
      target_label: __param_target
    - source_labels: [__param_target]
      target_label: instance
    - target_label: __address__
      replacement: 172.25.20.91:9115  # The blackbox exporter's real hostname:port
```
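The http_post_2xx_query module sends a POST with a JSON body and a Content-Type header and, by default, treats any 2xx status as success. The Python below builds an equivalent request with the standard library, which can help reproduce a failing probe by hand; the helper names are hypothetical and any URL passed in is a placeholder you would substitute.

```python
import json
import urllib.request

# Same body as the http_post_2xx_query module in blackbox.yml
BODY = {"hmac": "", "params": {"publicFundsKeyWords": "xxx"}}


def build_post_probe(url):
    """Build (but do not send) a request shaped like the http_post_2xx_query probe."""
    data = json.dumps(BODY, separators=(",", ":")).encode("utf-8")
    return urllib.request.Request(
        url,
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_is_2xx(url, timeout=15):
    """Send the request; True if the final status is 2xx."""
    try:
        with urllib.request.urlopen(build_post_probe(url), timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except OSError:  # HTTPError (4xx/5xx), URLError and timeouts are all OSError subclasses
        return False
```

Usage would look like `post_is_2xx("http://{api_host}/query.action")`, where `{api_host}` stands for your own service.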


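Alert testing

Network service alerts

Whether an ICMP, TCP, HTTP or POST probe succeeded can be read from the probe_success metric:

  • probe_success == 0  ## connectivity problem
  • probe_success == 1  ## connectivity OK

The alert therefore just checks whether this metric equals 0 and fires if it does.

Add an alert rule file under /etc/prometheus/rules/: blackbox-alert.yml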

```yaml
groups:
  - name: blackbox_network_stats
    rules:
      - alert: blackbox_network_stats
        expr: probe_success == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Instance {{ $labels.instance }} is down"
          description: "This requires immediate action!"
```
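`for: 1m` means the alert only fires after `probe_success == 0` has been true at every rule evaluation for at least one minute; until then it is pending, and a single healthy evaluation resets it. A simplified Python model of that pending/firing logic; the helper name is hypothetical and the 15-second spacing in the example is an assumed evaluation interval.

```python
def first_firing_time(samples, for_seconds=60):
    """Simplified model of an alert with `for:`.

    samples: [(timestamp, probe_success), ...], one per rule evaluation.
    Returns the evaluation timestamp at which the alert would fire, or None.
    """
    pending_since = None
    for ts, value in samples:
        if value == 0:  # `probe_success == 0` returns a result
            if pending_since is None:
                pending_since = ts  # alert enters the pending state
            if ts - pending_since >= for_seconds:
                return ts  # condition held for the whole `for` window
        else:
            pending_since = None  # any healthy evaluation resets the timer
    return None


if __name__ == "__main__":
    samples = [(0, 1), (15, 0), (30, 0), (45, 0), (60, 0), (75, 0)]
    print(first_firing_time(samples))  # 75
```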

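HTTPS certificate expiry warning

Besides checking that an HTTP service is up, the HTTP probe exposes probe_ssl_earliest_cert_expiry, which can be used to warn before an SSL certificate expires.

Enter probe_ssl_earliest_cert_expiry at http://{prometheus_IP}:9090/graph to view it.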

Add an alert rule file under /etc/prometheus/rules/: blackbox-https-alert.yml

```yaml
groups:
  - name: ssl_expiry.rules
    rules:
      - alert: SSLCertExpiringSoon
        expr: probe_ssl_earliest_cert_expiry{job="blackbox-http"} - time() < 86400 * 30  # warn 30 days before expiry
        for: 10m
```
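probe_ssl_earliest_cert_expiry is a Unix timestamp in seconds, so subtracting time() gives the seconds left, and 86400 * 30 is 30 days. A small Python sketch of the same comparison; the names are hypothetical and `now` is injectable only to keep the example deterministic. In the Prometheus UI, `(probe_ssl_earliest_cert_expiry - time()) / 86400` similarly shows the days remaining.

```python
import time

THIRTY_DAYS = 86400 * 30  # seconds, as in the rule


def days_left(cert_expiry_ts, now=None):
    """Days until the earliest certificate in the chain expires."""
    now = time.time() if now is None else now
    return (cert_expiry_ts - now) / 86400


def expiring_soon(cert_expiry_ts, now=None, threshold=THIRTY_DAYS):
    """Mirror of: probe_ssl_earliest_cert_expiry - time() < 86400 * 30."""
    now = time.time() if now is None else now
    return cert_expiry_ts - now < threshold


if __name__ == "__main__":
    now = 1_700_000_000
    print(expiring_soon(now + 10 * 86400, now))  # True
    print(days_left(now + 45 * 86400, now))  # 45.0
```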

References