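Alertmanager receives the alerts that Prometheus sends and forwards them to notification channels such as email, WeChat, DingTalk, and Slack. It can also deduplicate and group alerts and reduce alert noise, which makes it a practical alert notification system.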
Installing Alertmanager
#1. Deploy Alertmanager
Official download page: https://prometheus.io/download/
mkdir -p /usr/local/monitor/ && cd /usr/local/monitor/
wget https://github.com/prometheus/alertmanager/releases/download/v0.22.2/alertmanager-0.22.2.linux-amd64.tar.gz
tar xf alertmanager-0.22.2.linux-amd64.tar.gz
#2. Manage Alertmanager with systemctl
# Create the systemd unit file
vim /etc/systemd/system/alertmanager.service
[Unit]
Description=alertmanager
After=network.target
[Service]
Type=simple
User=root
ExecStart=/usr/local/monitor/alertmanager-0.22.2.linux-amd64/alertmanager --config.file=/usr/local/monitor/alertmanager-0.22.2.linux-amd64/alertmanager.yml
Restart=on-failure
[Install]
WantedBy=multi-user.target
#3. Edit the Alertmanager configuration file (alertmanager.yml)
route:
  group_by: ['alertname']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 1h
  receiver: 'web.hook'
receivers:
  - name: 'web.hook'
    webhook_configs:
      - url: 'http://localhost:8060/dingtalk/ops_dingding/send'
        send_resolved: true
inhibit_rules:
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'dev', 'instance']
#4. Reload systemd and start the service
systemctl daemon-reload
systemctl enable alertmanager.service
systemctl restart alertmanager.service
systemctl status alertmanager.service
By default, Alertmanager listens on port 9093 (HTTP API and web UI) and port 9094 (cluster communication).
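Before wiring up the rest of the chain, it can help to confirm that Alertmanager actually came up. The sketch below polls its readiness endpoint with curl. The function name wait_for_ready and its retry defaults are illustrative, and the example call assumes Alertmanager listens on the default localhost:9093.

```shell
# Poll a URL until it returns a 2xx response or the attempts run out.
# Usage: wait_for_ready URL [ATTEMPTS] [DELAY_SECONDS]
wait_for_ready() {
  url="$1"
  attempts="${2:-10}"
  delay="${3:-2}"
  i=1
  while [ "$i" -le "$attempts" ]; do
    # -f: fail on HTTP errors, -s: no progress output, --max-time: cap each try.
    if curl -fs -o /dev/null --max-time 3 "$url"; then
      echo "ready: $url"
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "not ready after $attempts attempts: $url" >&2
  return 1
}

# Example, assuming the default port:
# wait_for_ready http://localhost:9093/-/ready
```

If the call fails, systemctl status alertmanager.service and journalctl -u alertmanager.service are the usual places to look next.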
Deploying the prometheus-webhook-dingtalk DingTalk alerting plugin
1.1. Binary deployment
wget https://github.com/timonwong/prometheus-webhook-dingtalk/releases/download/v1.4.0/prometheus-webhook-dingtalk-1.4.0.linux-amd64.tar.gz
1.2. Configure prometheus-webhook-dingtalk
$ cp config.example.yml config.yml
## Request timeout
# timeout: 5s
## Customizable templates path
templates:  # uncommented
  - contrib/templates/legacy/template.tmpl  # uncommented
## You can also override default template using `default_message`
## The following example to use the 'legacy' template from v0.3.0
# default_message:
#   title: '{{ template "legacy.title" . }}'
#   text: '{{ template "legacy.content" . }}'
## Targets, previously was known as "profiles"
targets:
  webhook1:
    url: https://oapi.dingtalk.com/robot/send?access_token=xxxxxxxxxxxx
    # secret for signature
    secret: SEC000000000000000000000
  webhook2:
    url: https://oapi.dingtalk.com/robot/send?access_token=xxxxxxxxxxxx
  webhook_legacy:
    url: https://oapi.dingtalk.com/robot/send?access_token=xxxxxxxxxxxx
    # Customize template content
    message:
      # Use legacy template
      title: '{{ template "legacy.title" . }}'
      text: '{{ template "legacy.content" . }}'
  webhook_mention_all:
    url: https://oapi.dingtalk.com/robot/send?access_token=xxxxxxxxxxxx
    mention:
      all: true
  webhook_mention_users:
    url: https://oapi.dingtalk.com/robot/send?access_token=xxxxxxxxxxxx
    mention:
      mobiles: ['156xxxx8827', '189xxxx8325']
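The secret field applies to robots whose security setting is signature-based. prometheus-webhook-dingtalk signs its requests itself when secret is set, so the following is only a sketch for testing such a robot by hand with curl. It assumes the signing scheme described in DingTalk's custom robot documentation (HMAC-SHA256 over "timestamp\nsecret" keyed with the secret, then Base64, then URL-encoding) and that openssl is installed. dingtalk_sign is a hypothetical helper name.

```shell
# Compute the "sign" query parameter for a DingTalk robot that uses signatures.
# Usage: dingtalk_sign TIMESTAMP_MS SECRET
dingtalk_sign() {
  ts="$1"
  secret="$2"
  # The string to sign is "<timestamp>\n<secret>", keyed with the secret itself.
  printf '%s\n%s' "$ts" "$secret" \
    | openssl dgst -sha256 -hmac "$secret" -binary \
    | openssl base64 \
    | sed -e 's/+/%2B/g' -e 's|/|%2F|g' -e 's/=/%3D/g'  # URL-encode Base64 symbols
}

# Example with the placeholder values from the config (replace with your own):
# ts=$(($(date +%s) * 1000))   # DingTalk expects milliseconds
# sign=$(dingtalk_sign "$ts" "SEC000000000000000000000")
# curl "https://oapi.dingtalk.com/robot/send?access_token=xxxxxxxxxxxx&timestamp=${ts}&sign=${sign}" \
#   -H 'Content-Type: application/json' \
#   -d '{"msgtype": "text", "text": {"content": "signature test"}}'
```

The timestamp passed in the URL has to be the same value that was signed, which is why the example stores it in a variable first.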
1.3. Configure the alert template
$ vim contrib/templates/legacy/template.tmpl
{{ define "__subject" }}[{{ .Status | toUpper }}{{ if eq .Status "firing" }}:{{ .Alerts.Firing | len }}{{ end }}] {{ .GroupLabels.SortedPairs.Values | join " " }} {{ if gt (len .CommonLabels) (len .GroupLabels) }}({{ with .CommonLabels.Remove .GroupLabels.Names }}{{ .Values | join " " }}{{ end }}){{ end }}{{ end }}
{{ define "__alertmanagerURL" }}{{ .ExternalURL }}/#/alerts?receiver={{ .Receiver }}{{ end }}
{{ define "__text_alert_list" }}{{ range . }}
**Labels**
{{ range .Labels.SortedPairs }}> - {{ .Name }}: {{ .Value | markdown | html }}
{{ end }}
**Annotations**
{{ range .Annotations.SortedPairs }}> - {{ .Name }}: {{ .Value | markdown | html }}
{{ end }}
**Source:** [{{ .GeneratorURL }}]({{ .GeneratorURL }})
{{ end }}{{ end }}
{{ define "___text_alert_list" }}{{ range . }}
---
**Alert name:** {{ .Labels.alertname | upper }}
**Severity:** {{ .Labels.severity | upper }}
**Triggered at:** {{ dateInZone "2006.01.02 15:04:05" (.StartsAt) "Asia/Shanghai" }}
**Details:** {{ range .Annotations.SortedPairs }} {{ .Value | markdown | html }}
{{ end }}
**Labels:**
{{ range .Labels.SortedPairs }}{{ if and (ne (.Name) "severity") (ne (.Name) "summary") (ne (.Name) "team") }}> - {{ .Name }}: {{ .Value | markdown | html }}
{{ end }}{{ end }}
{{ end }}
{{ end }}
{{ define "___text_alertresovle_list" }}{{ range . }}
---
**Alert name:** {{ .Labels.alertname | upper }}
**Severity:** {{ .Labels.severity | upper }}
**Triggered at:** {{ dateInZone "2006.01.02 15:04:05" (.StartsAt) "Asia/Shanghai" }}
**Resolved at:** {{ dateInZone "2006.01.02 15:04:05" (.EndsAt) "Asia/Shanghai" }}
**Details:** {{ range .Annotations.SortedPairs }} {{ .Value | markdown | html }}
{{ end }}
**Labels:**
{{ range .Labels.SortedPairs }}{{ if and (ne (.Name) "severity") (ne (.Name) "summary") (ne (.Name) "team") }}> - {{ .Name }}: {{ .Value | markdown | html }}
{{ end }}{{ end }}
{{ end }}
{{ end }}
{{/* Default */}}
{{ define "_default.title" }}{{ template "__subject" . }}{{ end }}
{{ define "_default.content" }} [{{ .Status | toUpper }}{{ if eq .Status "firing" }}:{{ .Alerts.Firing | len }}{{ end }}\] **[{{ index .GroupLabels "alertname" }}]({{ template "__alertmanagerURL" . }})**
{{ if gt (len .Alerts.Firing) 0 -}}

**========Alerts firing========**
{{ template "___text_alert_list" .Alerts.Firing }}
{{- end }}
{{ if gt (len .Alerts.Resolved) 0 -}}

**========Alerts resolved========**
{{ template "___text_alertresovle_list" .Alerts.Resolved }}
{{- end }}
{{- end }}
{{/* Legacy */}}
{{ define "legacy.title" }}{{ template "__subject" . }}{{ end }}
{{ define "legacy.content" }} [{{ .Status | toUpper }}{{ if eq .Status "firing" }}:{{ .Alerts.Firing | len }}{{ end }}\] **[{{ index .GroupLabels "alertname" }}]({{ template "__alertmanagerURL" . }})**
{{ template "__text_alert_list" .Alerts.Firing }}
{{- end }}
{{/* Following names for compatibility */}}
{{ define "_ding.link.title" }}{{ template "_default.title" . }}{{ end }}
{{ define "_ding.link.content" }}{{ template "_default.content" . }}{{ end }}
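1.4. Get the DingTalk webhook URL
Configure a DingTalk robot:
Create a DingTalk group (this may require administrator rights).
Open Group Settings → Smart Group Assistant → Add Robot.
In the dialog that appears, choose Add Robot, select Custom, and click Add.
Give the robot a name.
Under Security Settings, select at least one option. This guide uses Custom Keywords. Then click Finish.
Copy the webhook URL for later use.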
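1.5. Start the service
$ vim /etc/systemd/system/prometheus-webhook-dingtalk.service
[Unit]
Description=prometheus-webhook-dingtalk
After=network-online.target
[Service]
Restart=on-failure
WorkingDirectory=/usr/local/monitor/prometheus-webhook-dingtalk-1.4.0
ExecStart=/usr/local/monitor/prometheus-webhook-dingtalk-1.4.0/prometheus-webhook-dingtalk --config.file=/usr/local/monitor/prometheus-webhook-dingtalk-1.4.0/config.yml
[Install]
WantedBy=multi-user.target
Note: the 1.x releases of prometheus-webhook-dingtalk read their robots from the targets section of the file passed with --config.file, so be sure to put your own webhook URL there. To send alerts to several DingTalk custom robots, define one target per robot. Older 0.x releases used a repeatable --ding.profile="name=url" command-line flag for the same purpose. The target name must match the path segment in the Alertmanager webhook URL: for http://localhost:8060/dingtalk/ops_dingding/send, config.yml needs a target named ops_dingding. WorkingDirectory is set because the templates path in config.yml is relative.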
systemctl daemon-reload
systemctl enable prometheus-webhook-dingtalk.service
systemctl start prometheus-webhook-dingtalk.service
systemctl status prometheus-webhook-dingtalk.service
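By default, prometheus-webhook-dingtalk listens on port 8060.
Note: sometimes everything is configured but DingTalk receives no alerts, while running the ExecStart command by hand does deliver them. In that case the cause may be the ownership of the downloaded prometheus-webhook-dingtalk binary. Setting its owner and group to root can help:
chown root:root /usr/local/monitor/prometheus-webhook-dingtalk-1.4.0/prometheus-webhook-dingtalk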
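To check the plugin independently of Alertmanager, one option is to post a hand-written payload to the same URL that Alertmanager is configured to call. The sketch below prints a minimal payload in the Alertmanager webhook format (version 4). The helper name sample_webhook_payload, the TestAlert name, and the label and annotation values are made up for illustration. The target name ops_dingding is taken from the Alertmanager configuration above.

```shell
# Print a minimal Alertmanager-style webhook payload with one firing alert.
sample_webhook_payload() {
  cat <<'EOF'
{
  "version": "4",
  "status": "firing",
  "receiver": "web.hook",
  "groupLabels": {"alertname": "TestAlert"},
  "commonLabels": {"alertname": "TestAlert", "severity": "warning"},
  "commonAnnotations": {"summary": "manual test"},
  "externalURL": "http://localhost:9093",
  "alerts": [
    {
      "status": "firing",
      "labels": {"alertname": "TestAlert", "severity": "warning"},
      "annotations": {"summary": "manual test"},
      "startsAt": "2024-01-01T00:00:00Z",
      "endsAt": "0001-01-01T00:00:00Z",
      "generatorURL": "http://localhost:9090/graph"
    }
  ]
}
EOF
}

# Example: send the payload to the local plugin (default port 8060).
# sample_webhook_payload | curl -s -H 'Content-Type: application/json' \
#   -d @- http://localhost:8060/dingtalk/ops_dingding/send
```

If the robot uses Custom Keywords, the rendered message has to contain the keyword, otherwise DingTalk rejects it even though the plugin accepted the request.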
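Configuring Prometheus
For the full Prometheus installation and configuration, refer to this article. Adding DingTalk alerting requires changes to the Prometheus configuration file.
Example: prometheus.yml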
......
# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets: ['localhost:9093']
          #- "localhost:9093"
# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  - "/usr/local/monitor/prometheus-2.29.2/rules/*.yml"
  #- "second_rules.yml"
.....
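In the Alertmanager configuration block (alerting), specify the address and port of Alertmanager, then list the alert rule files under rule_files.
After editing prometheus.yml, restart Prometheus for the change to take effect.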
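The rule files themselves are not shown here, so the following is a minimal sketch of what one might contain. It writes a single rule that fires when a scrape target has been down for one minute. The file name instance_down.yml, the group name, and the InstanceDown rule are illustrative assumptions. RULES_DIR defaults to a local ./rules directory, so point it at the directory from rule_files (for example /usr/local/monitor/prometheus-2.29.2/rules) when using it for real.

```shell
# Directory to write the rule file into; override to match rule_files in prometheus.yml.
RULES_DIR="${RULES_DIR:-./rules}"
mkdir -p "$RULES_DIR"

# The quoted 'EOF' keeps the shell from expanding $labels inside the template.
cat > "$RULES_DIR/instance_down.yml" <<'EOF'
groups:
  - name: example
    rules:
      - alert: InstanceDown
        expr: up == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Instance {{ $labels.instance }} is down"
EOF

echo "wrote $RULES_DIR/instance_down.yml"
```

The severity label matters because both the inhibit rule in alertmanager.yml and the DingTalk template read it. The promtool binary shipped with Prometheus can validate the file with promtool check rules before the restart.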
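It often happens that an alert rule fires but no notification arrives. One possible cause is that the alert message does not contain the custom keyword configured in the DingTalk robot's security settings.
The following command tests whether the DingTalk robot can receive notifications. The message content must include the robot's keyword.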
curl 'https://oapi.dingtalk.com/robot/send?access_token=f0c2dd678*****2f75' \
  -H 'Content-Type: application/json' \
  -d '{"msgtype": "text",
       "text": {
         "content": "Alert: shooter DingTalk robot group message test"
       }
      }'