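Requires Prometheus and the Alertmanager notification components.
For details on the software, see: https://www.yuque.com/g/qinxi-cvygi/gndo6n/folder/19640486
When everything is running normally, the processes look like this (Alertmanager itself does not show up here because its command line does not contain "prome"):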
$ ps -ef|grep prome
root 726 1 0 2020 ? 17:25:18 /usr/local/prometheus/prometheus --config.file=/usr/local/prometheus/prometheus.yml --storage.tsdb.path=/data/prometheus
root 2563 758 0 10:52 ? 00:00:00 /bin/bash /data/shell/monitor_prometheus.sh
root 2565 758 0 10:52 ? 00:00:00 /usr/local/prometheus/blackbox_exporter/blackbox_exporter --config.file=/usr/local/prometheus/blackbox_exporter/blackbox.yml
root 2566 758 0 10:52 ? 00:00:00 ./usr/local/alertmanager/webhook_dingtalk/dingtalk/prometheus-webhook-dingtalk --ding.profile=webhook1=https://oapi.dingtalk.com/robot/send?access_token=8bc2cdc7d19d2448447b40f4**********6ca6044e7c42361e
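Besides looking at ps, the HTTP side of each component can be checked. A minimal sketch, assuming the ports used elsewhere in this note (Prometheus on 9090 and Alertmanager on 9093 as in prometheus.yml, blackbox_exporter on 9115 as in the relabel rules) and the standard `/-/healthy` endpoint of Prometheus and Alertmanager. `check_endpoints` is a hypothetical helper name; the DingTalk webhook's listen port is not recorded in this note, so it is left out.

```shell
#!/usr/bin/env bash
# check_endpoints [name=url ...]: hypothetical helper.
# Prints "OK <name>" or "FAIL <name>" per endpoint; returns 1 if any failed.
check_endpoints() {
  local pair name url failed=0
  if [ "$#" -eq 0 ]; then
    # Defaults: assumed local ports, see the lead-in.
    set -- prometheus=http://localhost:9090/-/healthy \
           alertmanager=http://localhost:9093/-/healthy \
           blackbox_exporter=http://localhost:9115/metrics
  fi
  for pair in "$@"; do
    name=${pair%%=*}   # text before the first "="
    url=${pair#*=}     # text after the first "=" (URL may contain "=")
    # -f: HTTP status >= 400 counts as failure; -s: silent; -m 5: 5 s timeout
    if curl -fs -o /dev/null -m 5 "$url"; then
      echo "OK $name"
    else
      echo "FAIL $name"
      failed=1
    fi
  done
  return "$failed"
}
```

Running check_endpoints with no arguments checks the three defaults; a non-zero exit status makes it usable from cron or another watchdog.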
Prometheus main configuration file (/usr/local/prometheus/prometheus.yml, as passed via --config.file):
# my global config
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets: ["localhost:9093"]
          # - alertmanager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  - /usr/local/prometheus/rules/*.yml

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']
        labels:
          instance: prometheus

  # - job_name: 'hz-p-inner'
  #   static_configs:
  #     - targets: ['198.126.61.194:9100']
  #       labels:
  #         instance: hz-p-inner

  - job_name: 'blackbox'
    metrics_path: /probe
    params:
      module: [http_2xx]
    static_configs:
      - targets:
          - https://p.coach.123.com
          - http://p.bdwechat.123.com/wechat
          - http://p.coach.123.com
          - http://klass.api.com/actuator/health
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: 127.0.0.1:9115

  - job_name: 'blackbox_http_2xx_post'
    metrics_path: /probe
    params:
      module: [http_post_2xx]
    static_configs:
      - targets:
          - https://www.123.com/api/new_receive_trial_klass
          - http://p.coach.123.com/mini_program/verification_code
          - http://p.coach.123.com/api/login
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: 127.0.0.1:9115
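The relabel_configs in both blackbox jobs copy each listed URL into the target query parameter (and the instance label) and then point __address__ at the local blackbox_exporter, so every scrape becomes a request to 127.0.0.1:9115/probe. The sketch below rebuilds that URL by hand, which is handy for testing one target from the shell. `urlencode` and `probe_url` are hypothetical helper names, and the encoder assumes plain ASCII input (true for all targets listed above).

```shell
#!/usr/bin/env bash
# urlencode STRING: percent-encode everything except RFC 3986 unreserved
# characters (letters, digits, . ~ _ -). Assumes ASCII input.
urlencode() {
  local s=$1 out='' c hex i
  for (( i = 0; i < ${#s}; i++ )); do
    c=${s:i:1}
    case $c in
      [[:alnum:].~_-]) out+=$c ;;
      *) printf -v hex '%%%02X' "'$c"; out+=$hex ;;  # "'c" = char code
    esac
  done
  printf '%s\n' "$out"
}

# probe_url MODULE TARGET: the URL Prometheus scrapes after relabeling,
# assuming blackbox_exporter on 127.0.0.1:9115 as in the relabel rules.
probe_url() {
  printf 'http://127.0.0.1:9115/probe?module=%s&target=%s\n' \
    "$(urlencode "$1")" "$(urlencode "$2")"
}
```

For example, curl -s "$(probe_url http_2xx https://p.coach.123.com)" | grep probe_success should show probe_success 1 when the site answers with a 2xx status. After editing prometheus.yml, promtool check config /usr/local/prometheus/prometheus.yml validates it, and sending SIGHUP to the Prometheus process makes it reload; the /-/reload HTTP endpoint only works with --web.enable-lifecycle, which is not among the flags in the process listing.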
Supervisor daemon configuration:
$ ls /etc/supervisor/conf.d/
alertmanager.conf blackbox_exporter.conf prometheus.conf web-hook-dingtalk.conf
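Supervisor does not pick up new or edited files in conf.d on its own. A minimal sketch of the usual sequence using standard supervisorctl subcommands: reread detects changed files, update adds, removes or restarts the affected programs, and status shows their state. The wrapper name `apply_supervisor_changes` is hypothetical.

```shell
#!/usr/bin/env bash
# apply_supervisor_changes [PROGRAM...]: hypothetical wrapper around supervisorctl.
# Stops at the first failing step and returns its exit status.
apply_supervisor_changes() {
  supervisorctl reread || return   # re-read config files, report what changed
  supervisorctl update || return   # apply: add/remove/restart affected programs
  supervisorctl status "$@"        # state of the named programs (all if none)
}
```

The program names to pass come from the [program:...] headers below: apply_supervisor_changes alertmanager blackbox_exporter monitor_prometheus dingtalk.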
The individual program configuration files (in /etc/supervisor/conf.d/):
$ cat alertmanager.conf
[program:alertmanager]
directory = /usr/local/alertmanager
command = /usr/local/alertmanager/alertmanager --config.file=/usr/local/alertmanager/alertmanager.yml
autostart = true
autorestart = true
startsecs = 3
startretries = 20
$ cat blackbox_exporter.conf
[program:blackbox_exporter]
directory = /usr/local/prometheus/blackbox_exporter
command = /usr/local/prometheus/blackbox_exporter/blackbox_exporter --config.file=/usr/local/prometheus/blackbox_exporter/blackbox.yml
autostart = true
autorestart = true
startsecs = 3
startretries = 20
$ cat prometheus.conf
[program:monitor_prometheus]
user = root
directory = /data/shell
command = /bin/bash /data/shell/monitor_prometheus.sh
stdout_logfile = /var/log/supervisor/monitor_prometheus.log
stdout_logfile_maxbytes = 50MB
stdout_logfile_backups = 10
autostart = true
autorestart = true
startsecs = 3
startretries = 20
$ cat /data/shell/monitor_prometheus.sh
#!/bin/bash
# Watchdog: if no process with "prometheus.yml" on its command line exists,
# send a DingTalk notification and start Prometheus again.
while true; do
    count=$(ps -ef | grep prometheus.yml | grep -v "grep" | wc -l)
    echo "$count"
    if [ "$count" -eq 0 ]; then
        echo "$(date) - prometheus not running, restarting" >> /tmp/test.log
        curl 'https://oapi.dingtalk.com/robot/send?access_token=a8ca044089002471**********2a7825632631' \
            -H 'Content-Type: application/json' \
            -d '
            {"msgtype": "text",
             "text": {
                 "content": "hz-prome: prometheus process not found, Restarting..."
             }
            }'
        nohup /usr/local/prometheus/prometheus --config.file=/usr/local/prometheus/prometheus.yml --storage.tsdb.path=/data/prometheus &
    fi
    # Sleep after the check so the restart decision uses a fresh count.
    sleep 5
done
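The script writes the JSON body by hand, which works for a fixed message but breaks as soon as the text contains a double quote, backslash or newline (for example if a hostname or $(date) output is added). A minimal sketch of a safer builder for the same DingTalk "text" message; `json_escape`, `text_payload`, `notify` and the DINGTALK_WEBHOOK variable (the full robot URL including access_token) are hypothetical names, and the escaper only covers the characters listed in its comment.

```shell
#!/usr/bin/env bash
# json_escape STRING: escape ", \, newline, carriage return and tab for use
# inside a JSON string. Other control characters are assumed absent.
json_escape() {
  local s=$1 out='' c i
  for (( i = 0; i < ${#s}; i++ )); do
    c=${s:i:1}
    case $c in
      '"')   out+='\"' ;;
      '\')   out+='\\' ;;
      $'\n') out+='\n' ;;
      $'\r') out+='\r' ;;
      $'\t') out+='\t' ;;
      *)     out+=$c ;;   # everything else (including UTF-8 text) unchanged
    esac
  done
  printf '%s' "$out"
}

# text_payload MESSAGE: DingTalk robot body of msgtype "text".
text_payload() {
  printf '{"msgtype":"text","text":{"content":"%s"}}' "$(json_escape "$1")"
}

# notify MESSAGE: POST the payload to the hypothetical DINGTALK_WEBHOOK URL.
notify() {
  curl -fsS -m 10 -H 'Content-Type: application/json' \
    -d "$(text_payload "$1")" "$DINGTALK_WEBHOOK"
}
```

Inside the watchdog loop, the inline curl call could then become notify "hz-prome: prometheus process not found at $(date), Restarting...".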
$ cat web-hook-dingtalk.conf
[program:dingtalk]
directory = /usr/local/alertmanager/webhook_dingtalk/dingtalk
command = /usr/local/alertmanager/webhook_dingtalk/dingtalk/prometheus-webhook-dingtalk --ding.profile="webhook1=https://oapi.dingtalk.com/robot/send?access_token=8bc2cdc7d19d2448447b40f4**********6ca6044e7c42361e"
stdout_logfile = /usr/local/alertmanager/webhook_dingtalk/dingtalk/dingtalk.log
stdout_logfile_maxbytes = 50MB
stdout_logfile_backups = 10
autostart = true
autorestart = true
startsecs = 3
startretries = 20
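Supervisor program sections accept a fixed set of option names, and a misspelled one (for example dictory instead of directory) is easy to overlook. A rough lint sketch: it flags any key outside a small allow-list that covers the options used in this note plus a few common ones. The allow-list is deliberately incomplete, so extend it rather than trust every hit; `lint_supervisor_conf` is a hypothetical name.

```shell
#!/usr/bin/env bash
# lint_supervisor_conf [DIR]: print "file:line: unknown key" for option names
# outside a small, intentionally incomplete allow-list.
# Returns 1 if anything was flagged, 2 if DIR has no .conf files.
lint_supervisor_conf() {
  local dir=${1:-/etc/supervisor/conf.d}
  local files=("$dir"/*.conf)
  [ -e "${files[0]}" ] || { echo "no .conf files in $dir" >&2; return 2; }
  awk -F= '
    BEGIN {
      n = split("command directory user autostart autorestart startsecs " \
                "startretries stdout_logfile stdout_logfile_maxbytes " \
                "stdout_logfile_backups stderr_logfile redirect_stderr " \
                "environment priority stopsignal stopwaitsecs", k, " ")
      for (i = 1; i <= n; i++) ok[k[i]] = 1
    }
    /^[ \t]*$/    { next }   # blank line
    /^[ \t]*[;#]/ { next }   # comment
    /^[ \t]*\[/   { next }   # section header such as [program:dingtalk]
    {
      key = $1               # text before the first "="
      gsub(/[ \t]/, "", key)
      if (!(key in ok)) {
        printf "%s:%d: unknown key \"%s\"\n", FILENAME, FNR, key
        bad = 1
      }
    }
    END { exit bad }
  ' "${files[@]}"
}
```

Running lint_supervisor_conf with no argument checks /etc/supervisor/conf.d.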