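1: Deploy dingtalk
    First, start a group chat in DingTalk to create a group. Open the group settings, choose Smart Group Assistant, add a custom robot, and copy the generated token and secret.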

    # Use v1.3.0 specifically; neither newer nor older versions work
    docker pull timonwong/prometheus-webhook-dingtalk:v1.3.0
    docker run -d -p 8060:8060 --name dingtalk timonwong/prometheus-webhook-dingtalk:v1.3.0 --ding.profile="webhook1=https://oapi.dingtalk.com/robot/send?access_token=5fc1108fe6cdde33853b7bde77c27fe4c643bc83754dfbd4138763240d11a9d0"
    sudo docker exec -it dingtalk /bin/sh
    cd /etc/prometheus-webhook-dingtalk
    vi config.yml
    ################## Change the contents to the following: START
    targets:
      webhook1:
        url: https://oapi.dingtalk.com/robot/send?access_token=5fc1108fe6cdde33853b7bde77c27fe4c643bc83754dfbd4138763240d11a9d0
        # secret for signature
        secret: SEC3c78c556e8ea85b71d050e12570e78be0aae3f950a8e3ac3eaf9eeacbeaf6f47
    ################## Change the contents to the following: END
    # Leave the container shell, then restart the container from the host
    exit
    docker restart dingtalk
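
Before wiring up Alertmanager, it can help to check the bridge on its own by posting a hand-built alert to it. The sketch below assumes the container is reachable at 127.0.0.1:8060 and that the target is named webhook1, as configured above. The payload follows Alertmanager's webhook format; the alert name `smokeTest`, the annotation text, and the timestamps are placeholders invented for this test.

```shell
#!/bin/sh
# Hypothetical smoke test: alert name, annotation text and timestamps are placeholders.
WEBHOOK_URL="${WEBHOOK_URL:-http://127.0.0.1:8060/dingtalk/webhook1/send}"

# Print a minimal payload in Alertmanager's webhook format.
build_payload() {
  cat <<'EOF'
{
  "version": "4",
  "status": "firing",
  "receiver": "default-receiver",
  "groupLabels": {"alertname": "smokeTest"},
  "commonLabels": {"alertname": "smokeTest", "team": "BNZQ-java"},
  "commonAnnotations": {"summary": "test message from curl"},
  "externalURL": "http://127.0.0.1:9093",
  "alerts": [
    {
      "status": "firing",
      "labels": {"alertname": "smokeTest", "team": "BNZQ-java"},
      "annotations": {"summary": "test message from curl"},
      "startsAt": "2024-01-01T00:00:00Z",
      "endsAt": "0001-01-01T00:00:00Z",
      "generatorURL": "http://127.0.0.1:9090"
    }
  ]
}
EOF
}

# POST the payload to the bridge, which forwards it to the DingTalk robot.
send_test_alert() {
  build_payload | curl -sS -H 'Content-Type: application/json' --data-binary @- "$WEBHOOK_URL"
}
```

Source the file and call `send_test_alert`; if the token and secret are correct, a message should show up in the group.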

    2: Deploy alertmanager

    # Create a directory to hold the data
    mkdir -p /home/user00/prometheus/alertmanager
    docker run -p 9093:9093 --name alertmanager -v /home/user00/prometheus/alertmanager/alertmanager.yml:/etc/alertmanager/alertmanager.yml -d prom/alertmanager
    # alertmanager.yml is configured as follows (Alertmanager listens on port 9093)
    ########################################################
    route:
      receiver: 'default-receiver'
      group_by: ['alertname']
      group_wait: 10s
      group_interval: 10s
      repeat_interval: 1h
      routes:
      - receiver: bnzq-java
        group_wait: 10s
        match:
          team: BNZQ-java
    receivers:
    # 127.0.0.1 is resolved inside the alertmanager container; if the two
    # containers do not share a network, use the host IP instead
    - name: 'default-receiver'
      webhook_configs:
      - url: http://127.0.0.1:8060/dingtalk/webhook1/send
        send_resolved: true
    - name: 'bnzq-java'
      webhook_configs:
      - url: http://127.0.0.1:8060/dingtalk/webhook1/send
        send_resolved: true
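
A routing mistake in alertmanager.yml is easy to miss, so one way to check the file is with `amtool`, which ships in the prom/alertmanager image. This is a sketch assuming the container is named alertmanager and the config is mounted at the path used in the docker run command; the helper function names are made up here.

```shell
#!/bin/sh
# Assumes the container name and config path from the docker run command.
AM_CONTAINER="${AM_CONTAINER:-alertmanager}"
AM_CONFIG=/etc/alertmanager/alertmanager.yml

# Validate the syntax of the mounted config file.
check_config() {
  docker exec "$AM_CONTAINER" amtool check-config "$AM_CONFIG"
}

# Show which receiver an alert with the given labels would be routed to.
route_for() {
  docker exec "$AM_CONTAINER" amtool config routes test --config.file="$AM_CONFIG" "$@"
}
```

For example, `route_for team=BNZQ-java` should name the bnzq-java receiver, while `route_for team=other` should fall through to default-receiver.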

    3: Configure grafana. This is needed if you want monitoring and alerting on ES logs or on data from other non-Prometheus data sources.

    4: Configure Prometheus to connect to alertmanager

    # Add the following configuration to prometheus.yml
    alerting:
      alertmanagers:
      - static_configs:
        - targets: ['127.0.0.1:9093']
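
Once Prometheus has reloaded this configuration, its HTTP API can confirm that it found the Alertmanager. A minimal sketch, assuming Prometheus serves on its default port 9090 on this host (the source does not state the port):

```shell
#!/bin/sh
# Assumption: Prometheus listens on 127.0.0.1:9090.
PROM_URL="${PROM_URL:-http://127.0.0.1:9090}"

# List the Alertmanager instances Prometheus knows about; the configured
# target should appear under "activeAlertmanagers" in the JSON response.
list_alertmanagers() {
  curl -sS "$PROM_URL/api/v1/alertmanagers"
}
```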

    5: Write alerting rules. Demo: alert when the bnzq-account machine goes down

    groups:
    - name: testDownGroup
      rules:
      - alert: appDown
        # up == 0 means the target is down (use == 1 temporarily to force the alert to fire as a test)
        expr: up{job="bnzq-account",instance="10.5.2.130:19201"} == 0
        for: 10s
        labels:
          team: BNZQ-java
        annotations:
          summary: "Instance {{ $labels.instance }} is down"
          description: "Machine {{ $labels.job }} - {{ $labels.instance }} has been down for more than 10 seconds!"
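
Indentation errors in a rule file are common, so it may be worth validating it with `promtool` before loading it. The sketch below runs promtool from a throwaway container; it assumes the rule file is saved as /home/test_down.yml on the host and that the prom/prometheus image (which includes promtool) is available. The function name is made up here.

```shell
#!/bin/sh
# Assumptions: rule file at /home/test_down.yml, prom/prometheus image available.
RULE_FILE="${RULE_FILE:-/home/test_down.yml}"

# Mount the rule file read-only into a temporary container and check it.
check_rules() {
  docker run --rm \
    -v "$RULE_FILE:/tmp/rules.yml:ro" \
    --entrypoint promtool \
    prom/prometheus check rules /tmp/rules.yml
}
```

Calling `check_rules` reports syntax errors in the file, or a success message with the number of rules found.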

    Copy the rule file into the Prometheus container

    docker cp /home/test_down.yml prometheus:/home
    sudo docker exec -it prometheus /bin/sh
    mv /home/test_down.yml /etc/prometheus
    # Edit prometheus.yml and add the rule file
    rule_files:
      - "test_down.yml"

    After restarting Prometheus, wait a moment; once the rule's expression has matched for 10 seconds, the alert fires.
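
The restart and the check can also be done from the command line. This is a sketch assuming the container is named prometheus, as in the docker exec command above, and that it serves on the default port 9090; the function names are made up here.

```shell
#!/bin/sh
# Assumptions: container named "prometheus", API on 127.0.0.1:9090.
PROM_CONTAINER="${PROM_CONTAINER:-prometheus}"
PROM_URL="${PROM_URL:-http://127.0.0.1:9090}"

# Restart Prometheus so it re-reads prometheus.yml and the rule file.
restart_prometheus() {
  docker restart "$PROM_CONTAINER"
}

# List the loaded rule groups; testDownGroup should be among them.
list_rules() {
  curl -sS "$PROM_URL/api/v1/rules"
}

# List alerts that are currently pending or firing.
list_alerts() {
  curl -sS "$PROM_URL/api/v1/alerts"
}
```

Run `restart_prometheus`, then `list_rules` to confirm the group was loaded, and `list_alerts` after the `for` duration has elapsed.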