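This setup requires Prometheus and the Alertmanager notification component.
For the specific software packages, see: https://www.yuque.com/g/qinxi-cvygi/gndo6n/folder/19640486
When everything is running normally, the processes look like this: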
$ ps -ef|grep prome
root 726 1 0 2020 ? 17:25:18 /usr/local/prometheus/prometheus --config.file=/usr/local/prometheus/prometheus.yml --storage.tsdb.path=/data/prometheus
root 2563 758 0 10:52 ? 00:00:00 /bin/bash /data/shell/monitor_prometheus.sh
root 2565 758 0 10:52 ? 00:00:00 /usr/local/prometheus/blackbox_exporter/blackbox_exporter --config.file=/usr/local/prometheus/blackbox_exporter/blackbox.yml
root 2566 758 0 10:52 ? 00:00:00 ./usr/local/alertmanager/webhook_dingtalk/dingtalk/prometheus-webhook-dingtalk --ding.profile=webhook1=https://oapi.dingtalk.com/robot/send?access_token=8bc2cdc7d19d2448447b40f4c9bb19794dc3af0c572c45016ca6044e7c42361e
Main Prometheus configuration file:
# my global config
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets: ["localhost:9093"]
          # - alertmanager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  - /usr/local/prometheus/rules/*.yml

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']
        labels:
          instance: prometheus

  # - job_name: 'hz-p-inner'
  #   static_configs:
  #     - targets: ['198.126.61.194:9100']
  #       labels:
  #         instance: hz-p-inner

  - job_name: 'blackbox'
    metrics_path: /probe
    params:
      module: [http_2xx]
    static_configs:
      - targets:
          - https://p.coach.123.com
          - http://p.bdwechat.123.com/wechat
          - http://p.coach.123.com
          - http://klass.api.com/actuator/health
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: 127.0.0.1:9115

  - job_name: 'blackbox_http_2xx_post'
    metrics_path: /probe
    params:
      module: [http_post_2xx]
    static_configs:
      - targets:
          - https://www.123.com/api/new_receive_trial_klass
          - http://p.coach.123.com/mini_program/verification_code
          - http://p.coach.123.com/api/login
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: 127.0.0.1:9115
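In the two blackbox jobs, the relabel rules copy each listed target into the `target` query parameter, keep it as the `instance` label, and redirect the actual scrape to the exporter at 127.0.0.1:9115. The following is a minimal sketch of the request this should produce, which can help when probing a target by hand. The function name `probe_url` is hypothetical, and the target is passed without URL encoding, which is assumed to be acceptable for simple URLs like the ones in this config.

```shell
#!/bin/sh
# probe_url TARGET [MODULE] [EXPORTER]
# Hypothetical helper: prints the blackbox_exporter URL that corresponds to
# one entry in static_configs after the relabel rules have been applied.
probe_url() {
    target="$1"
    module="${2:-http_2xx}"            # same default as the 'blackbox' job
    exporter="${3:-127.0.0.1:9115}"    # the __address__ replacement in the config
    printf 'http://%s/probe?module=%s&target=%s\n' "$exporter" "$module" "$target"
}

# Example (needs a running blackbox_exporter):
#   curl -s "$(probe_url https://p.coach.123.com)" | grep probe_success
```

For the POST job, the module would be passed explicitly, for example `probe_url http://p.coach.123.com/api/login http_post_2xx`.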
Supervisor daemon configuration:
$ ls /etc/supervisor/conf.d/
alertmanager.conf blackbox_exporter.conf prometheus.conf web-hook-dingtalk.conf
The individual daemon configuration files are as follows:
$ cat alertmanager.conf
[program:alertmanager]
directory = /usr/local/alertmanager
command = /usr/local/alertmanager/alertmanager --config.file=/usr/local/alertmanager/alertmanager.yml
autostart = true
autorestart = true
startsecs = 3
startretries = 20
$ cat blackbox_exporter.conf
[program:blackbox_exporter]
directory = /usr/local/prometheus/blackbox_exporter
command = /usr/local/prometheus/blackbox_exporter/blackbox_exporter --config.file=/usr/local/prometheus/blackbox_exporter/blackbox.yml
autostart = true
autorestart = true
startsecs = 3
startretries = 20
$ cat prometheus.conf
[program:monitor_prometheus]
user = root
directory = /data/shell
command = /bin/bash /data/shell/monitor_prometheus.sh
stdout_logfile = /var/log/supervisor/monitor_prometheus.log
stdout_logfile_maxbytes = 50MB
stdout_logfile_backups = 10
autostart = true
autorestart = true
startsecs = 3
startretries = 20
$ cat /data/shell/monitor_prometheus.sh
while true; do
    # Count running Prometheus processes by matching the config file name.
    count=$(ps -ef | grep prometheus.yml | grep -v "grep" | wc -l)
    echo $count
    sleep 5
    if [ $count -eq 0 ]; then
        # Prometheus is down: log the time, notify DingTalk, then start it again.
        echo "$(date)-" >> /tmp/test.log
        curl 'https://oapi.dingtalk.com/robot/send?access_token=a8ca044089002471**********2a7825632631' \
            -H 'Content-Type: application/json' \
            -d '{"msgtype": "text","text": {"content": "hz-prome promethues正在重启,Restarting..."}}'
        nohup /usr/local/prometheus/prometheus --config.file=/usr/local/prometheus/prometheus.yml --storage.tsdb.path=/data/prometheus &
    fi
done
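The watchdog decides whether Prometheus is alive by counting `ps -ef` lines that mention `prometheus.yml` while skipping the line of the `grep` process itself. As a rough sketch, that check can be pulled into a small function that reads process lines from standard input, which makes it possible to try out on sample text. The name `count_procs` is hypothetical and is not part of the original script.

```shell
#!/bin/sh
# count_procs PATTERN
# Hypothetical helper: reads `ps -ef` style lines on stdin and prints how many
# contain PATTERN (fixed string), ignoring lines that belong to grep itself.
count_procs() {
    # grep -c still prints 0 when nothing matches, but exits non-zero;
    # `|| true` keeps the function's exit status clean for callers using `set -e`.
    grep -F -- "$1" | grep -v -c 'grep' || true
}

# Usage inside the watchdog loop:
#   count=$(ps -ef | count_procs prometheus.yml)
```

Like the original pipeline, this also drops any process whose command line happens to contain the word `grep`, so it is only as precise as the pattern that is passed in.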
$ cat web-hook-dingtalk.conf
[program:dingtalk]
directory = /usr/local/alertmanager/webhook_dingtalk/dingtalk
; absolute path, so the command still resolves once the working directory is set
command = /usr/local/alertmanager/webhook_dingtalk/dingtalk/prometheus-webhook-dingtalk --ding.profile="webhook1=https://oapi.dingtalk.com/robot/send?access_token=8bc2cdc7d19d2448447b40f4**********6ca6044e7c42361e"
stdout_logfile = /usr/local/alertmanager/webhook_dingtalk/dingtalk/dingtalk.log
stdout_logfile_maxbytes = 50MB
stdout_logfile_backups = 10
autostart = true
autorestart = true
startsecs = 3
startretries = 20
