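Preface: Yesterday's post gave an overview of how to set up Prometheus + Grafana. Today's post continues with how to configure alerting in Prometheus, which is done through AlertManager.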

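1. Install AlertManager

AlertManager is installed in much the same way as an exporter: extract the archive, edit the configuration file, and start the binary. Download link: https://github.com/prometheus/alertmanager/releases/download/v0.15.1/alertmanager-0.15.1.linux-amd64.tar.gz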
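The download-and-extract step can be scripted. The following is a minimal Python sketch, not a definitive installer: the URL pattern comes from the download link above, while the helper names and the `/usr/local` install prefix are assumptions made for illustration.

```python
import tarfile
import urllib.request
from pathlib import Path

# Release asset URL pattern, taken from the download link in this post.
RELEASE_URL = (
    "https://github.com/prometheus/alertmanager/releases/download/"
    "v{version}/alertmanager-{version}.linux-amd64.tar.gz"
)


def release_url(version: str) -> str:
    """Build the download URL for a given AlertManager version."""
    return RELEASE_URL.format(version=version)


def download_and_extract(version: str, dest: str = "/usr/local") -> Path:
    """Download the tarball and unpack it under dest (assumed install prefix)."""
    dest_dir = Path(dest)
    archive = dest_dir / f"alertmanager-{version}.linux-amd64.tar.gz"
    urllib.request.urlretrieve(release_url(version), archive)
    with tarfile.open(archive, "r:gz") as tar:
        tar.extractall(dest_dir)
    # The archive unpacks into a directory named after the release.
    return dest_dir / f"alertmanager-{version}.linux-amd64"
```

Calling `download_and_extract("0.15.1")` would leave the files in a versioned directory; the rest of this post uses /usr/local/alertmanager, so a rename or symlink to that path is assumed.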

2. Configuration file

The AlertManager configuration file sits in the extracted directory. In my setup the path is /usr/local/alertmanager/alertmanager.yml, and its contents are as follows:

    [root@db-monitor-01 alertmanager]# cat alertmanager.yml
    global:
      smtp_smarthost: 'xxx:25'
      smtp_from: 'xxx@8531.cn'
      smtp_auth_username: 'xxx@8531.cn'
      smtp_auth_password: 'xxx'
      smtp_require_tls: false
    templates:
      - '/usr/local/alertmanager/template/*.tmpl'
    route:
      group_by: ['alertname']
      repeat_interval: 1m
      receiver: xucl
    receivers:
    - name: 'xucl'
      email_configs:
      - to: 'xxx@8531.cn'
        html: '{{ template "alert.html" . }}'
        headers: { Subject: " {{ .CommonAnnotations.summary }}" }
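To illustrate what `group_by: ['alertname']` does, here is a small Python sketch of the grouping idea: alerts that share the same values for the listed labels end up in one notification. This is a simplified model rather than AlertManager's actual code, and the instance names are hypothetical.

```python
from collections import defaultdict


def group_alerts(alerts, group_by):
    """Bucket alerts by the values of the group_by labels."""
    groups = defaultdict(list)
    for alert in alerts:
        # A missing label is treated as an empty value in this sketch.
        key = tuple(alert["labels"].get(name, "") for name in group_by)
        groups[key].append(alert)
    return dict(groups)


# Hypothetical alerts: two hosts with high memory usage, one host down.
alerts = [
    {"labels": {"alertname": "NodeMemoryUsage", "instance": "db-01:9100"}},
    {"labels": {"alertname": "NodeMemoryUsage", "instance": "db-02:9100"}},
    {"labels": {"alertname": "InstanceDown", "instance": "db-03:9100"}},
]

for key, members in group_alerts(alerts, ["alertname"]).items():
    print(key, [a["labels"]["instance"] for a in members])
```

One consequence worth checking: the Subject header uses `.CommonAnnotations.summary`, and CommonAnnotations only holds annotations whose values are identical across every alert in the group. If the summary text embeds the instance name, a group that spans several hosts will likely produce an empty subject.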

3. Alert rules

memory_over.yml

    [root@db-monitor-01 rules]# cat /usr/local/prometheus/rules/memory_over.yml
    groups:
    - name: NodeMemoryUsage
      rules:
      - alert: NodeMemoryUsage
        expr: round((node_memory_MemTotal_bytes - (node_memory_MemFree_bytes+node_memory_Buffers_bytes+node_memory_Cached_bytes )) / node_memory_MemTotal_bytes * 100) > 80
        for: 1m
        labels:
          user: xucl
        annotations:
          summary: "{{$labels.instance}}: High Memory usage detected"
          description: "{{$labels.instance}}: Memory usage is above 80% (current value is:{{ $value }})"
          value: "{{ $value }}"
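The arithmetic in the `expr` above can be checked by hand. The sketch below mirrors it in Python with hypothetical sample values; `promql_round` imitates PromQL's `round()`, which rounds to the nearest integer and resolves ties by rounding up.

```python
import math


def promql_round(x: float) -> int:
    """Round to the nearest integer, ties rounding up (PromQL round() default)."""
    return math.floor(x + 0.5)


def memory_usage_percent(total, free, buffers, cached) -> int:
    """Mirror the rule: round((total - (free + buffers + cached)) / total * 100)."""
    return promql_round((total - (free + buffers + cached)) / total * 100)


GIB = 1024 ** 3
# Hypothetical sample: 16 GiB total, 1 GiB free, 0.5 GiB buffers, 1.5 GiB cached.
usage = memory_usage_percent(16 * GIB, 1 * GIB, 0.5 * GIB, 1.5 * GIB)
print(usage, usage > 80)
```

Because the value is rounded before the comparison, the condition `> 80` first becomes true at a rounded value of 81, which corresponds to a raw usage of 80.5% or more.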

node_down.yml

    [root@db-monitor-01 rules]# cat /usr/local/prometheus/rules/node_down.yml
    groups:
    - name: InstanceDown
      rules:
      - alert: InstanceDown
        expr: up == 0
        for: 1m
        labels:
          user: xucl
        annotations:
          summary: "Instance {{ $labels.instance }} down"
          description: "{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 1 minute."
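The `for: 1m` clause means the condition must hold continuously for one minute before the alert fires; until then the alert is pending. The Python sketch below is a simplified model of that behaviour, not Prometheus's implementation, and the 15-second evaluation interval is an assumption.

```python
def alert_states(samples, for_seconds):
    """Return the alert state at each rule evaluation.

    samples: list of (timestamp_seconds, condition_true) pairs.
    """
    states = []
    active_since = None  # time at which the condition first became true
    for ts, condition in samples:
        if not condition:
            # Any evaluation where the condition is false resets the timer.
            active_since = None
            states.append("inactive")
            continue
        if active_since is None:
            active_since = ts
        held = ts - active_since
        states.append("firing" if held >= for_seconds else "pending")
    return states


# Hypothetical evaluations every 15s: `up == 0` holds from t=15 onward.
samples = [(0, False)] + [(t, True) for t in range(15, 106, 15)]
print(alert_states(samples, for_seconds=60))
```

In this model a target that flaps back up even once within the minute restarts the pending period, which is why short outages never reach AlertManager.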

4. Modify the Prometheus configuration file

Append the following to the end of the existing prometheus.yml:

    alerting:
      alertmanagers:
      - static_configs:
        - targets: ["localhost:9093"]
    rule_files:
      - "rules/memory_over.yml"
      - "rules/node_down.yml"
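Relative `rule_files` entries are resolved against the directory that contains the Prometheus config file, and each entry may be a glob. The Python sketch below checks that the entries above match real files before Prometheus is restarted. The config path /usr/local/prometheus/prometheus.yml is an assumption inferred from the rule paths in this post.

```python
import glob
import os


def resolve_rule_files(config_path, rule_files):
    """Expand rule_files entries relative to the config file's directory."""
    base_dir = os.path.dirname(os.path.abspath(config_path))
    resolved = {}
    for pattern in rule_files:
        full = pattern if os.path.isabs(pattern) else os.path.join(base_dir, pattern)
        resolved[pattern] = sorted(glob.glob(full))  # empty list: nothing matched
    return resolved


if __name__ == "__main__":
    # Assumed config location; adjust to the local install.
    found = resolve_rule_files(
        "/usr/local/prometheus/prometheus.yml",
        ["rules/memory_over.yml", "rules/node_down.yml"],
    )
    for entry, matches in found.items():
        print(entry, "->", matches or "NOT FOUND")
```

This only confirms that the files exist. Checking the rule syntax itself is a separate step.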

5. Template file

    [root@db-monitor-01 template]# cat /usr/local/alertmanager/template/alert.tmpl
    {{ define "alert.html" }}
    <style type="text/css">
    table
    {
        border-collapse: collapse;
        margin: 0 auto;
        text-align: center;
    }
    table td, table th
    {
        border: 1px solid #cad9ea;
        color: #666;
        height: 30px;
    }
    table thead th
    {
        background-color: #CCE8EB;
        width: 100px;
    }
    table tr:nth-child(odd)
    {
        background: #fff;
    }
    table tr:nth-child(even)
    {
        background: #F5FAFA;
    }
    </style>
    <table width="90%" class="table">
    <tr><td>Alert</td>
        <td>Host</td>
        <td>Alert threshold</td>
        <td>Start time</td>
    </tr>
    {{ range $i, $alert := .Alerts }}
    <tr><td>{{ index $alert.Labels "alertname" }}</td>
        <td>{{ index $alert.Labels "instance" }}</td>
        <td>{{ index $alert.Annotations "value" }}</td>
        <td>{{ $alert.StartsAt }}</td>
    </tr>
    {{ end }}
    </table>
    {{ end }}

6. Start and test

Start AlertManager first, then start Prometheus:

    ./alertmanager --log.level=debug
    ./prometheus --config.file=prometheus.yml --storage.tsdb.path="/storage/data" --storage.tsdb.retention=30d
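Once AlertManager is running, the email path can be exercised without waiting for a real rule to fire by posting a hand-made alert to it. The Python sketch below assumes the v1 HTTP API path `/api/v1/alerts`, which the 0.15.x release used here exposes (newer releases use `/api/v2/alerts` instead), and the default listen address `localhost:9093`. The alert name and values are hypothetical.

```python
import json
import urllib.request
from datetime import datetime, timezone


def build_test_alert(alertname, instance, summary, value):
    """Build one alert in the JSON shape AlertManager accepts on its alerts endpoint."""
    return {
        "labels": {"alertname": alertname, "instance": instance},
        "annotations": {"summary": summary, "value": value},
        "startsAt": datetime.now(timezone.utc).isoformat(),
    }


def post_alerts(alerts, base_url="http://localhost:9093"):
    """POST a list of alerts to AlertManager (v1 API path, assumed for 0.15.x)."""
    req = urllib.request.Request(
        base_url + "/api/v1/alerts",
        data=json.dumps(alerts).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=5) as resp:
        return resp.status


# Example call (requires a running AlertManager):
#   alert = build_test_alert("TestAlert", "db-monitor-01",
#                            "db-monitor-01: test alert", "42")
#   post_alerts([alert])
```

The `summary` and `value` annotations are included because the email Subject and the HTML template in this post read them.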

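Log in to the web UI to check:
[MySQL] Prometheus Monitoring and Alerting with AlertManager - Figure 1
The alert rules have taken effect.
Next, look at the alerts:
[MySQL] Prometheus Monitoring and Alerting with AlertManager - Figure 2

You can try lowering the threshold a little to trigger an alert. The alert email received looks like this:
[MySQL] Prometheus Monitoring and Alerting with AlertManager - Figure 3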