Black-box monitoring with Prometheus
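Most exporters are installed on the machines they monitor. When you need to check TCP ports or the state of a layer-7 application instead, black-box monitoring is the tool: the blackbox exporter probes targets from the outside and does not need to be installed on them.

Its HTTP endpoint listens on port 9115 by default. Its configuration file, blackbox.yml, defines probing modules built on the basic probers (http, dns, tcp, icmp), and the Prometheus server configuration declares which module probes which targets. The example below starts a set of instances with docker-compose. Docker networks come with built-in DNS, so service names are used everywhere in place of IP addresses.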
docker-compose.yml

    version: '3.4'
    services:
      prometheus:
        image: prom/prometheus:v2.15.1
        hostname: prometheus
        volumes:
          - /usr/share/zoneinfo/Asia/Shanghai:/etc/localtime:ro
          - ./prometheus.yml:/etc/prometheus/prometheus.yml
          - ./alert.rules:/etc/prometheus/alert.rules
          - prometheus_data:/prometheus
        command:
          - '--web.enable-lifecycle'
          - '--config.file=/etc/prometheus/prometheus.yml'
        ports:
          - '9090:9090'
        networks:
          prometheus:
            aliases:
              - prometheus
        logging:
          driver: json-file
          options:
            max-file: '3'
            max-size: 100m
      node-exporter:
        image: prom/node-exporter:v0.18.1
        hostname: node-exporter
        volumes:
          - /usr/share/zoneinfo/Asia/Shanghai:/etc/localtime:ro
          - /proc:/host/proc:ro
          - /sys:/host/sys:ro
          - /:/host/rootfs:ro
        command:
          - '--path.procfs=/host/proc'
          - '--path.sysfs=/host/sys'
        ports:
          - '9100:9100'
        networks:
          prometheus:
            aliases:
              - exporter
        logging:
          driver: json-file
          options:
            max-file: '3'
            max-size: 100m
      black-exporter:
        image: prom/blackbox-exporter:v0.16.0
        hostname: black-exporter
        volumes:
          - /usr/share/zoneinfo/Asia/Shanghai:/etc/localtime:ro
          - ./blackbox.yml:/config/blackbox.yml
        command:
          - '--config.file=/config/blackbox.yml'
        ports:
          - '9115:9115'
        networks:
          prometheus:
            aliases:
              - black-exporter
        logging:
          driver: json-file
          options:
            max-file: '3'
            max-size: 100m
      grafana:
        image: grafana/grafana:6.5.2
        hostname: grafana
        volumes:
          - /usr/share/zoneinfo/Asia/Shanghai:/etc/localtime:ro
          - grafana_data:/var/lib/grafana
        environment:
          - GF_SECURITY_ADMIN_PASSWORD=pass
        depends_on:
          - prometheus
        ports:
          - '3000:3000'
        networks:
          prometheus:
            aliases:
              - grafana
        logging:
          driver: json-file
          options:
            max-file: '3'
            max-size: 100m
    networks:
      prometheus:
        driver: bridge
    volumes:
      grafana_data: {}
      prometheus_data: {}
prometheus.yml

    global:
      scrape_interval: 5s
      external_labels:
        monitor: 'my-monitor'
    scrape_configs:
      - job_name: 'prometheus'
        static_configs:
          - targets: ['prometheus:9090']
      - job_name: 'balck_box'
        scrape_interval: 10s
        static_configs:
          - targets: ['black-exporter:9115']
      - job_name: 'balck_test'
        metrics_path: /probe
        params:
          module: [tcp_connect]
        static_configs:
          - targets:
              - 120.52.137.xxx:81
              - xxxxxx:123
        relabel_configs:
          - source_labels: [__address__]
            target_label: __param_target
          - source_labels: [__param_target]
            target_label: instance
          - target_label: __address__
            replacement: black-exporter:9115
blackbox.yml

    modules:
      http_2xx_example:   # module name; any valid name works
        prober: http      # prober type
        timeout: 5s
        http:
          valid_http_versions: ["HTTP/1.1", "HTTP/2"]
          valid_status_codes: []  # Defaults to 2xx
          method: GET
          headers:
            Host: vhost.example.com
            Accept-Language: en-US
          no_follow_redirects: false
          fail_if_ssl: false
          fail_if_not_ssl: false
          fail_if_matches_regexp:
            - "Could not connect to database"
          fail_if_not_matches_regexp:
            - "Download the latest version here"
          tls_config:
            insecure_skip_verify: false
          preferred_ip_protocol: "ip4" # defaults to "ip6"
          ip_protocol_fallback: false  # no fallback to "ip6"
      http_post_2xx:
        prober: http
        timeout: 5s
        http:
          method: POST
          headers:
            Content-Type: application/json
          body: '{}'
      http_basic_auth_example:
        prober: http
        timeout: 5s
        http:
          method: POST
          headers:
            Host: "login.example.com"
          basic_auth:
            username: "username"
            password: "mysecret"
      http_custom_ca_example:
        prober: http
        http:
          method: GET
          tls_config:
            ca_file: "/certs/my_cert.crt"
      tls_connect_tls:
        prober: tcp
        timeout: 5s
        tcp:
          tls: true
      tcp_connect:
        prober: tcp
        timeout: 5s
      imap_starttls:
        prober: tcp
        timeout: 5s
        tcp:
          query_response:
            - expect: "OK.*STARTTLS"
            - send: ". STARTTLS"
            - expect: "OK"
            - starttls: true
            - send: ". capability"
            - expect: "CAPABILITY IMAP4rev1"
      smtp_starttls:
        prober: tcp
        timeout: 5s
        tcp:
          query_response:
            - expect: "^220 ([^ ]+) ESMTP (.+)$"
            - send: "EHLO prober"
            - expect: "^250-STARTTLS"
            - send: "STARTTLS"
            - expect: "^220"
            - starttls: true
            - send: "EHLO prober"
            - expect: "^250-AUTH"
            - send: "QUIT"
      ssh_banner:
        prober: tcp
        tcp:
          query_response:
            - expect: "^SSH-"
      irc_banner_example:
        prober: tcp
        timeout: 5s
        tcp:
          query_response:
            - send: "NICK prober"
            - send: "USER prober prober prober :prober"
            - expect: "PING :([^ ]+)"
              send: "PONG ${1}"
            - expect: "^:[^ ]+ 001"
      icmp_example:
        prober: icmp
        timeout: 5s
        icmp:
          preferred_ip_protocol: "ip4"
          source_ip_address: "127.0.0.1"
      dns_udp_example:
        prober: dns
        timeout: 5s
        dns:
          query_name: "www.prometheus.io"
          query_type: "A"
          valid_rcodes:
            - NOERROR
          validate_answer_rrs:
            fail_if_matches_regexp:
              - ".*127.0.0.1"
            fail_if_not_matches_regexp:
              - "www.prometheus.io.\t300\tIN\tA\t127.0.0.1"
          validate_authority_rrs:
            fail_if_matches_regexp:
              - ".*127.0.0.1"
          validate_additional_rrs:
            fail_if_matches_regexp:
              - ".*127.0.0.1"
      dns_soa:
        prober: dns
        dns:
          query_name: "prometheus.io"
          query_type: "SOA"
      dns_tcp_example:
        prober: dns
        dns:
          transport_protocol: "tcp" # defaults to "udp"
          preferred_ip_protocol: "ip4" # defaults to "ip6"
          query_name: "www.prometheus.io"
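The prober definitions above follow the official example. The probe job in the Prometheus configuration shown earlier is the final version; for a simple probe you can start with something like this:

      - job_name: 'balck_test'
        metrics_path: /probe
        params:
          module: [tcp_connect]
          target:
            - 120.52.137.xxx:81
            - xxxx:44
        static_configs:
          - targets: ['black-exporter:9115']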
The values declared under params are sent as query parameters to the blackbox exporter's HTTP endpoint: Prometheus requests the /probe route on black-exporter:9115 with the module and the target to probe as parameters.

We can therefore reproduce the request with curl (Prometheus sends the same HTTP request when it scrapes) and read the metrics in the response. The example below uses curl to ask the blackbox exporter to probe 192.168.1.2 with the ping module and shows the metrics it returns:

    $ curl "http://127.0.0.1:9115/probe?module=ping&target=192.168.1.2"
    # HELP probe_dns_lookup_time_seconds Returns the time taken for probe dns lookup in seconds
    # TYPE probe_dns_lookup_time_seconds gauge
    probe_dns_lookup_time_seconds 2.6453e-05
    # HELP probe_duration_seconds Returns how long the probe took to complete in seconds
    # TYPE probe_duration_seconds gauge
    probe_duration_seconds 0.000351649
    # HELP probe_ip_protocol Specifies whether probe ip protocol is IP4 or IP6
    # TYPE probe_ip_protocol gauge
    probe_ip_protocol 4
    # HELP probe_success Displays whether or not the probe was a success
    # TYPE probe_success gauge
    probe_success 1
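The curl call above references a module named `ping`, which is not among the modules in the blackbox.yml shown earlier. A minimal sketch of what such a module might look like, assuming it simply mirrors `icmp_example` without the fixed source address (the module name and timeout here are assumptions):

```yaml
modules:
  ping:                 # assumed name, chosen to match ?module=ping in the curl call
    prober: icmp
    timeout: 5s         # assumed; same value as the other example modules
    icmp:
      preferred_ip_protocol: "ip4"
```

Whatever name is used in the `module` query parameter has to exist under `modules:` in the file the exporter was started with.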
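The file I provided uses relabeling: the job lists the real targets, but relabeling rewrites the address so that the request ultimately goes to the blackbox exporter's port. Together with the params-only form, these are the two common ways to write a probe job. With the params-only form, though, what if we want to attach labels to the targets? `params` does not support labels, so we have to use Prometheus relabeling, which is this part of the file I provided:

      - job_name: 'balck_test'
        metrics_path: /probe
        params:
          module: [tcp_connect]
        static_configs:
          - targets:
              - 120.52.137.xxx:81
              - xxxxxx:123
        relabel_configs:
          - source_labels: [__address__]
            target_label: __param_target
          - source_labels: [__param_target]
            target_label: instance
          - target_label: __address__
            replacement: black-exporter:9115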
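- Step 1: read the target's `__address__` value and write it to `__param_target`. For any label of the form `__param_<name>`, the name and its value are added as a URL query parameter on the request sent to the blackbox exporter; `__param_module`, for example, corresponds to `module` under `params`.
- Step 2: read the value of `__param_target` and write it to the `instance` label.
- Step 3: overwrite the target's `__address__` label with the address of the Blackbox Exporter instance.
- Step 4: send the request to black-exporter:9115 and get the probe metrics for the instance.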
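Because the real targets stay in `static_configs` under this layout, extra labels can be attached per target group with the `labels` key. A minimal sketch, reusing the placeholder targets from the job above; the label names `env` and `service` and their values are hypothetical:

```yaml
  - job_name: 'balck_test'
    metrics_path: /probe
    params:
      module: [tcp_connect]
    static_configs:
      - targets:
          - 120.52.137.xxx:81
        labels:
          env: prod            # hypothetical label
          service: example-a   # hypothetical label
      - targets:
          - xxxxxx:123
        labels:
          env: test            # hypothetical label
          service: example-b   # hypothetical label
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: black-exporter:9115
```

The extra labels end up on every probe metric for that group, so they can be used in alert expressions and Grafana queries.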
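Separately, while monitoring our SUSE hosts directly we found that layer 4 stays reachable even after the kernel hangs: both ssh and telnet to the SSH port still get the OpenSSH banner back, so the `ssh_banner` module considers the host alive. We therefore decided to monitor the application layer instead. I asked a colleague what the failure looked like, and he said that logging in with the SAP client returned an error. I captured his login with tcpdump, opened the capture in Wireshark, and turned the HTTP request headers of the login into a module. Since then, every kernel hang has triggered an alert in time.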
```yaml
  http_post_sap:
    prober: http
    timeout: 3s
    http:
      method: POST
      headers:
        POST: '/SAPControl HTTP/1.1'
        Accept: 'text/xml, text/html, image/gif, image/jpeg, *; q=.2, */*; q=.2'
        Content-Type: 'text/xml; charset=utf-8'
        Cache-Control: 'no-cache'
        Pragma: 'no-cache'
        User-Agent: 'Java/1.8.0_172'
        Connection: 'keep-alive'
        Content-Length: '200'
      body: |
        <SOAP-ENV:Envelope xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/" xmlns:q0="urn:SAPControl"><SOAP-ENV:Header/><SOAP-ENV:Body><q0:GetInstanceProperties/></SOAP-ENV:Body></SOAP-ENV:Envelope>
```

```yaml
  - job_name: 'hana_up'
    scrape_interval: 4s
    metrics_path: /probe
    params:
      module: ['http_post_sap']
    static_configs:
      - targets:
          - "http://10.20.4.14:50013/SAPControl"
          - "http://10.20.4.4:50013/SAPControl"
          - "http://10.20.4.9:50013/SAPControl"
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: black-exporter:9115
```
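The alerting side of the `hana_up` job is not shown here. The docker-compose file mounts `./alert.rules` at `/etc/prometheus/alert.rules`, but the prometheus.yml shown earlier has no `rule_files` entry, so a sketch of the missing pieces might look like the following. The group name, alert name, `for` duration, and `severity` label are all assumptions:

```yaml
# prometheus.yml (fragment): load the mounted rules file
rule_files:
  - /etc/prometheus/alert.rules
```

```yaml
# alert.rules (fragment)
groups:
  - name: hana_probe.rules            # assumed group name
    rules:
      - alert: HanaProbeFailed        # assumed alert name
        expr: probe_success{job="hana_up"} == 0
        for: 30s                      # assumed; tune to tolerate single failed probes
        labels:
          severity: critical          # assumed label
        annotations:
          summary: "SAPControl probe failed for {{ $labels.instance }}"
```

`probe_success` is 0 whenever the probe fails for any reason, including a timeout, so it covers the hung-kernel case as long as the application stops answering within the module's `timeout`.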
Monitoring SSL certificate expiry

An HTTP GET probe against an HTTPS endpoint already exposes the certificate expiry time as a metric, so the work is mostly in the alert expression.

    # blackbox.yml
    modules:
      http_2xx:
        prober: http
        timeout: 10s
        http:
          preferred_ip_protocol: "ip4" # required when HTTP probes should use IPv4; IPv6 is still rarely used in China

    # prometheus.yml
    scrape_configs:
      - job_name: 'blackbox'
        metrics_path: /probe
        params:
          module: [http_2xx]  # Look for a HTTP 200 response.
        static_configs:
          - targets:
              - example.com   # Target to probe
        relabel_configs:
          - source_labels: [__address__]
            target_label: __param_target
          - source_labels: [__param_target]
            target_label: instance
          - target_label: __address__
            replacement: black-exporter:9115
Alert rule

    groups:
      - name: ssl_expiry.rules
        rules:
          - alert: SSLCertExpiringSoon
            expr: probe_ssl_earliest_cert_expiry{job="blackbox"} - time() < 86400 * 30
            for: 20m
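The rule above can only fire if `probe_ssl_earliest_cert_expiry` exists for the target, and as far as I know the exporter emits that metric only when the probe actually goes over TLS. A hedged sketch of the same setup with an explicit `https://` target and a human-readable annotation; the `severity` label and the annotation wording are assumptions, not part of the original rule:

```yaml
# prometheus.yml (fragment): probe the HTTPS URL explicitly
static_configs:
  - targets:
      - https://example.com
```

```yaml
# alert.rules (fragment)
groups:
  - name: ssl_expiry.rules
    rules:
      - alert: SSLCertExpiringSoon
        # expiry timestamp minus now = seconds left; 86400 * 30 is 30 days
        expr: probe_ssl_earliest_cert_expiry{job="blackbox"} - time() < 86400 * 30
        for: 20m
        labels:
          severity: warning   # assumed label
        annotations:
          summary: "Certificate for {{ $labels.instance }} expires in {{ $value | humanizeDuration }}"
```

Since the comparison keeps the left-hand value, `$value` is the number of seconds remaining, which `humanizeDuration` renders as days and hours.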
