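1: Deploy dingtalk
First start a group chat in DingTalk and create a group. Open the group settings, go to Smart Group Assistant, add a custom robot, and copy the generated token and secret.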
# Use exactly v1.3.0; neither newer nor older versions worked
docker pull timonwong/prometheus-webhook-dingtalk:v1.3.0
docker run -d -p 8060:8060 --name dingtalk timonwong/prometheus-webhook-dingtalk:v1.3.0 --ding.profile="webhook1=https://oapi.dingtalk.com/robot/send?access_token=5fc1108fe6cdde33853b7bde77c27fe4c643bc83754dfbd4138763240d11a9d0"
sudo docker exec -it dingtalk /bin/sh
cd /etc/prometheus-webhook-dingtalk
vi config.yml
################## Change the contents to the following: START
targets:
  webhook1:
    url: https://oapi.dingtalk.com/robot/send?access_token=5fc1108fe6cdde33853b7bde77c27fe4c643bc83754dfbd4138763240d11a9d0
    # secret for signature
    secret: SEC3c78c556e8ea85b71d050e12570e78be0aae3f950a8e3ac3eaf9eeacbeaf6f47
################## Change the contents to the following: END
# Leave the container shell, then restart it so the new config is loaded
exit
docker restart dingtalk
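The `secret` in config.yml is what the bridge uses to sign each request to DingTalk. As a rough illustration of what that signature looks like, here is a minimal Python sketch of the signing scheme for custom robots (HMAC-SHA256 over the timestamp, a newline, and the secret, then Base64 and URL encoding). The function names `dingtalk_sign` and `signed_url` are made up for this example and are not part of prometheus-webhook-dingtalk.

```python
import base64
import hashlib
import hmac
import time
import urllib.parse
from typing import Optional


def dingtalk_sign(secret: str, timestamp_ms: int) -> str:
    """Return the URL-encoded signature for a robot that has signing enabled."""
    # The string to sign is "<timestamp in ms>\n<secret>", keyed with the secret itself.
    string_to_sign = f"{timestamp_ms}\n{secret}"
    digest = hmac.new(
        secret.encode("utf-8"),
        string_to_sign.encode("utf-8"),
        hashlib.sha256,
    ).digest()
    # Base64-encode the raw digest, then URL-encode it for use as a query parameter.
    return urllib.parse.quote_plus(base64.b64encode(digest).decode("ascii"))


def signed_url(webhook_url: str, secret: str, timestamp_ms: Optional[int] = None) -> str:
    """Append timestamp and sign parameters to a webhook URL that already has access_token."""
    if timestamp_ms is None:
        timestamp_ms = int(time.time() * 1000)
    sign = dingtalk_sign(secret, timestamp_ms)
    return f"{webhook_url}&timestamp={timestamp_ms}&sign={sign}"
```

With the secret set in config.yml you should not need to do this yourself; the sketch is only useful for testing the robot directly, for example by posting a message to `signed_url(...)` by hand.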
2: Deploy alertmanager
# Create a directory for the data first: mkdir -p /home/user00/prometheus/alertmanager
docker run -p 9093:9093 --name alertmanager -v /home/user00/prometheus/alertmanager/alertmanager.yml:/etc/alertmanager/alertmanager.yml -d prom/alertmanager
# alertmanager.yml is configured as follows (the file must already exist at the mounted path when the container starts); Alertmanager listens on port 9093
########################################################
route:
  receiver: 'default-receiver'
  group_by: ['alertname']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 1h
  routes:
    - receiver: bnzq-java
      group_wait: 10s
      match:
        team: BNZQ-java
receivers:
  - name: 'default-receiver'
    webhook_configs:
      - url: http://127.0.0.1:8060/dingtalk/webhook1/send
        send_resolved: true
  - name: 'bnzq-java'
    webhook_configs:
      - url: http://127.0.0.1:8060/dingtalk/webhook1/send
        send_resolved: true
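To check the Alertmanager to dingtalk path before Prometheus is wired in, you can push a hand-made alert into Alertmanager's v2 API (`POST /api/v2/alerts`). The following Python sketch shows one way to do that. The helper names `build_test_alert` and `post_alerts` and the alert name `manualTest` are invented for this example; the `team` label is set so that the alert matches the `bnzq-java` route above.

```python
import json
import urllib.request
from datetime import datetime, timezone
from typing import Any, Dict, List


def build_test_alert(team: str, alertname: str = "manualTest") -> List[Dict[str, Any]]:
    """Build a one-element alert list in the shape Alertmanager's v2 API accepts."""
    return [
        {
            "labels": {"alertname": alertname, "team": team},
            "annotations": {"summary": "Manual test alert"},
            # RFC 3339 timestamp in UTC
            "startsAt": datetime.now(timezone.utc).isoformat(),
        }
    ]


def post_alerts(alertmanager_url: str, alerts: List[Dict[str, Any]], timeout: float = 5.0) -> int:
    """POST the alerts to Alertmanager and return the HTTP status code."""
    request = urllib.request.Request(
        alertmanager_url.rstrip("/") + "/api/v2/alerts",
        data=json.dumps(alerts).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

For example, `post_alerts("http://127.0.0.1:9093", build_test_alert("BNZQ-java"))` should return 200 if Alertmanager is reachable at that address, and a message should appear in the DingTalk group after `group_wait` has elapsed.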
3: Configure grafana. This is needed if you want monitoring and alerting on ES logs or on data from other non-Prometheus data sources.
4: Configure Prometheus to connect to alertmanagers
# Add the following configuration to prometheus.yml
alerting:
  alertmanagers:
    - static_configs:
        - targets: ['127.0.0.1:9093']
5: Write an alerting rule. Demo: detect when the bnzq-account machine goes down
groups:
  - name: testDownGroup
    rules:
      - alert: appDown
        # up is 0 when the scrape of the target fails, so == 0 means the instance is down
        expr: up{job="bnzq-account",instance="10.5.2.130:19201"} == 0
        for: 10s
        labels:
          team: BNZQ-java
        annotations:
          summary: "Instance {{ $labels.instance }} is down"
          description: "Machine {{ $labels.job }} - {{ $labels.instance }} has been down for more than 10 seconds!"
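The `for: 10s` clause means the alert does not fire on the first evaluation where the expression matches. It first goes to pending and only becomes firing once the expression has matched continuously for the given duration. The Python sketch below models that behaviour in a simplified way. The function `alert_states` and the 15-second evaluation interval used in the example are assumptions for illustration and do not come from Prometheus itself.

```python
from typing import List, Optional, Tuple


def alert_states(samples: List[Tuple[float, bool]], for_seconds: float) -> List[str]:
    """Return the alert state after each rule evaluation.

    samples: (evaluation time in seconds, whether the expression returned a result)
    for_seconds: the rule's `for` duration in seconds
    """
    states: List[str] = []
    active_since: Optional[float] = None  # time of the first match in the current run
    for eval_time, matched in samples:
        if not matched:
            # The expression returned nothing, so the alert is reset.
            active_since = None
            states.append("inactive")
            continue
        if active_since is None:
            active_since = eval_time
        held_for = eval_time - active_since
        states.append("firing" if held_for >= for_seconds else "pending")
    return states


# Target is up at t=0, down at t=15 and t=30, and back up at t=45.
timeline = [(0, False), (15, True), (30, True), (45, False)]
print(alert_states(timeline, for_seconds=10))
# ['inactive', 'pending', 'firing', 'inactive']
```

Under these assumptions the alert reaches Alertmanager only at the second evaluation that sees the target down, which is why there is a short delay before the DingTalk message shows up.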
Copy the rule file into the container
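# Copy into the prometheus container, since the rule file is moved inside that container below
docker cp /home/test_down.yml prometheus:/home
sudo docker exec -it prometheus /bin/sh
mv /home/test_down.yml /etc/prometheus
# Edit the config file prometheus.yml and add the rule
rule_files:
  - "test_down.yml"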
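After restarting Prometheus, wait a moment; once the rule's condition has held for the `for` duration, you can see an alert fire.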
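Besides looking at the Alerts page in the web UI, the alert state can be read from Prometheus's HTTP API at `/api/v1/alerts`. The Python sketch below fetches that endpoint and lists the alerts that are currently firing. The helper names `fetch_alerts` and `firing_alerts` are made up for this example, and the address `http://127.0.0.1:9090` in the usage note is an assumption, since the Prometheus port is not given above.

```python
import json
import urllib.request
from typing import Any, Dict, List


def fetch_alerts(prometheus_url: str, timeout: float = 5.0) -> Dict[str, Any]:
    """GET /api/v1/alerts from Prometheus and return the decoded JSON body."""
    url = prometheus_url.rstrip("/") + "/api/v1/alerts"
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


def firing_alerts(api_response: Dict[str, Any]) -> List[str]:
    """Return "alertname@instance" for every alert whose state is firing."""
    alerts = api_response.get("data", {}).get("alerts", [])
    result = []
    for alert in alerts:
        if alert.get("state") != "firing":
            continue  # skip alerts that are still pending
        labels = alert.get("labels", {})
        result.append(f"{labels.get('alertname', '')}@{labels.get('instance', '')}")
    return result
```

Calling `firing_alerts(fetch_alerts("http://127.0.0.1:9090"))` should list `appDown@10.5.2.130:19201` once the demo rule is firing, assuming Prometheus is reachable at that address.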
