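Alertmanager receives the alerts that Prometheus sends. It supports a wide range of notification channels, including email, WeChat, DingTalk, and Slack, and it makes deduplicating, grouping, and noise-reducing alerts straightforward, which makes it a very practical alert notification system.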
Installing Alertmanager
#1. Deploy Alertmanager
Official download page: https://prometheus.io/download/
mkdir -p /usr/local/monitor/ && cd /usr/local/monitor/
wget https://github.com/prometheus/alertmanager/releases/download/v0.22.2/alertmanager-0.22.2.linux-amd64.tar.gz
tar xf alertmanager-0.22.2.linux-amd64.tar.gz
#2. Manage Alertmanager with systemctl
# Add the systemd unit file
vim /etc/systemd/system/alertmanager.service

[Unit]
Description=alertmanager
After=network.target

[Service]
Type=simple
User=root
ExecStart=/usr/local/monitor/alertmanager-0.22.2.linux-amd64/alertmanager --config.file=/usr/local/monitor/alertmanager-0.22.2.linux-amd64/alertmanager.yml
Restart=on-failure

[Install]
WantedBy=multi-user.target
#3. Edit the Alertmanager configuration file
route:
  group_by: ['alertname']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 1h
  receiver: 'web.hook'
receivers:
  - name: 'web.hook'
    webhook_configs:
      - url: 'http://localhost:8060/dingtalk/ops_dingding/send'
        send_resolved: true
inhibit_rules:
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'dev', 'instance']
#4. Reload the configuration and start the service
systemctl daemon-reload
systemctl enable alertmanager.service
systemctl restart alertmanager.service
systemctl status alertmanager.service
By default Alertmanager listens on ports 9093 (web UI and HTTP API) and 9094 (cluster communication).
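Once the service is up, it can help to confirm that Alertmanager answers on port 9093 and accepts an alert before wiring in Prometheus. The sketch below is one way to do that, assuming Alertmanager is reachable at http://localhost:9093; the function names, the AM_URL variable, and the label values are made up for illustration. It uses the `/-/healthy` endpoint and the v2 alerts API.

```shell
# Assumed base URL of Alertmanager; override AM_URL if yours differs.
AM_URL="${AM_URL:-http://localhost:9093}"

# Print a one-element JSON array in the shape the v2 alerts endpoint accepts.
# Arguments: alert name (default TestAlert), severity (default warning).
# The instance label and summary are placeholder values for a manual test.
build_test_alert() {
  local name="${1:-TestAlert}" severity="${2:-warning}"
  printf '[{"labels":{"alertname":"%s","severity":"%s","instance":"test-host"},"annotations":{"summary":"manual test alert"}}]\n' \
    "$name" "$severity"
}

# Check the health endpoint, then post the test alert to Alertmanager.
send_test_alert() {
  curl -fsS "$AM_URL/-/healthy" && echo
  build_test_alert "$@" |
    curl -fsS -X POST -H 'Content-Type: application/json' --data @- "$AM_URL/api/v2/alerts"
}
```

Running `send_test_alert DiskFull critical` should make the alert show up in the Alertmanager web UI; after `group_wait` it is forwarded to the configured receiver.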
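Deploying the prometheus-webhook-dingtalk DingTalk alerting plugin
1.1. Binary deployment
wget https://github.com/timonwong/prometheus-webhook-dingtalk/releases/download/v1.4.0/prometheus-webhook-dingtalk-1.4.0.linux-amd64.tar.gz
tar xf prometheus-webhook-dingtalk-1.4.0.linux-amd64.tar.gz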
1.2. Configure prometheus-webhook-dingtalk
$ cp config.example.yml config.yml

## Request timeout
# timeout: 5s

## Customizable templates path
templates:                                   # uncomment
  - contrib/templates/legacy/template.tmpl   # uncomment

## You can also override default template using `default_message`
## The following example to use the 'legacy' template from v0.3.0
# default_message:
#   title: '{{ template "legacy.title" . }}'
#   text: '{{ template "legacy.content" . }}'

## Targets, previously was known as "profiles"
targets:
  webhook1:
    url: https://oapi.dingtalk.com/robot/send?access_token=xxxxxxxxxxxx
    # secret for signature
    secret: SEC000000000000000000000
  webhook2:
    url: https://oapi.dingtalk.com/robot/send?access_token=xxxxxxxxxxxx
  webhook_legacy:
    url: https://oapi.dingtalk.com/robot/send?access_token=xxxxxxxxxxxx
    # Customize template content
    message:
      # Use legacy template
      title: '{{ template "legacy.title" . }}'
      text: '{{ template "legacy.content" . }}'
  webhook_mention_all:
    url: https://oapi.dingtalk.com/robot/send?access_token=xxxxxxxxxxxx
    mention:
      all: true
  webhook_mention_users:
    url: https://oapi.dingtalk.com/robot/send?access_token=xxxxxxxxxxxx
    mention:
      mobiles: ['156xxxx8827', '189xxxx8325']
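Each target (or profile) name ends up in the URL that Alertmanager posts to, following the same pattern as the webhook URL in the Alertmanager configuration (`/dingtalk/<name>/send`). The small helper below makes that mapping explicit; the function name and the DINGTALK_BASE variable are hypothetical, and the base address assumes the plugin runs locally on its default port 8060.

```shell
# Assumed base address of prometheus-webhook-dingtalk (default port 8060).
DINGTALK_BASE="${DINGTALK_BASE:-http://localhost:8060}"

# Print the URL that Alertmanager's webhook_configs should point at
# for the given target name.
dingtalk_webhook_url() {
  printf '%s/dingtalk/%s/send\n' "$DINGTALK_BASE" "$1"
}
```

For example, `dingtalk_webhook_url ops_dingding` prints the URL used in the Alertmanager receiver, so the name in that URL has to match a target or profile that the plugin actually knows about.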
1.3. Configure the alert template
$ vim contrib/templates/legacy/template.tmpl

{{ define "__subject" }}[{{ .Status | toUpper }}{{ if eq .Status "firing" }}:{{ .Alerts.Firing | len }}{{ end }}] {{ .GroupLabels.SortedPairs.Values | join " " }} {{ if gt (len .CommonLabels) (len .GroupLabels) }}({{ with .CommonLabels.Remove .GroupLabels.Names }}{{ .Values | join " " }}{{ end }}){{ end }}{{ end }}
{{ define "__alertmanagerURL" }}{{ .ExternalURL }}/#/alerts?receiver={{ .Receiver }}{{ end }}

{{ define "__text_alert_list" }}{{ range . }}
**Labels**
{{ range .Labels.SortedPairs }}> - {{ .Name }}: {{ .Value | markdown | html }}
{{ end }}
**Annotations**
{{ range .Annotations.SortedPairs }}> - {{ .Name }}: {{ .Value | markdown | html }}
{{ end }}
**Source:** [{{ .GeneratorURL }}]({{ .GeneratorURL }})
{{ end }}{{ end }}

{{ define "___text_alert_list" }}{{ range . }}
---
**Alert name:** {{ .Labels.alertname | upper }}

**Severity:** {{ .Labels.severity | upper }}

**Started at:** {{ dateInZone "2006.01.02 15:04:05" (.StartsAt) "Asia/Shanghai" }}

**Details:** {{ range .Annotations.SortedPairs }} {{ .Value | markdown | html }}
{{ end }}

**Labels:**
{{ range .Labels.SortedPairs }}{{ if and (ne (.Name) "severity") (ne (.Name) "summary") (ne (.Name) "team") }}> - {{ .Name }}: {{ .Value | markdown | html }}
{{ end }}{{ end }}
{{ end }}
{{ end }}

{{ define "___text_alertresovle_list" }}{{ range . }}
---
**Alert name:** {{ .Labels.alertname | upper }}

**Severity:** {{ .Labels.severity | upper }}

**Started at:** {{ dateInZone "2006.01.02 15:04:05" (.StartsAt) "Asia/Shanghai" }}

**Ended at:** {{ dateInZone "2006.01.02 15:04:05" (.EndsAt) "Asia/Shanghai" }}

**Details:** {{ range .Annotations.SortedPairs }} {{ .Value | markdown | html }}
{{ end }}

**Labels:**
{{ range .Labels.SortedPairs }}{{ if and (ne (.Name) "severity") (ne (.Name) "summary") (ne (.Name) "team") }}> - {{ .Name }}: {{ .Value | markdown | html }}
{{ end }}{{ end }}
{{ end }}
{{ end }}

{{/* Default */}}
{{ define "_default.title" }}{{ template "__subject" . }}{{ end }}
{{ define "_default.content" }} [{{ .Status | toUpper }}{{ if eq .Status "firing" }}:{{ .Alerts.Firing | len }}{{ end }}\] **[{{ index .GroupLabels "alertname" }}]({{ template "__alertmanagerURL" . }})**
{{ if gt (len .Alerts.Firing) 0 -}}

**======== Alerts Firing ========**
{{ template "___text_alert_list" .Alerts.Firing }}
{{- end }}
{{ if gt (len .Alerts.Resolved) 0 -}}

**======== Alerts Resolved ========**
{{ template "___text_alertresovle_list" .Alerts.Resolved }}
{{- end }}
{{- end }}

{{/* Legacy */}}
{{ define "legacy.title" }}{{ template "__subject" . }}{{ end }}
{{ define "legacy.content" }} [{{ .Status | toUpper }}{{ if eq .Status "firing" }}:{{ .Alerts.Firing | len }}{{ end }}\] **[{{ index .GroupLabels "alertname" }}]({{ template "__alertmanagerURL" . }})**
{{ template "__text_alert_list" .Alerts.Firing }}
{{- end }}

{{/* Following names for compatibility */}}
{{ define "_ding.link.title" }}{{ template "_default.title" . }}{{ end }}
{{ define "_ding.link.content" }}{{ template "_default.content" . }}{{ end }}
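1.4. Get the DingTalk webhook URL
Set up a DingTalk robot as follows:
1. Create a DingTalk group (this may require administrator rights).
2. Open Group Settings → Smart Group Assistant → Add Robot.
3. In the dialog that opens, choose Add Robot, select Custom, then click Add.
4. Give the robot a name.
5. Under security settings, pick at least one option. This guide uses Custom Keywords. Then click Finish.
6. Copy the webhook URL for later use.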
1.5. Start the service
$ vim /etc/systemd/system/prometheus-webhook-dingtalk.service

[Unit]
Description=prometheus-webhook-dingtalk
After=network-online.target

[Service]
Restart=on-failure
ExecStart=/usr/local/monitor/prometheus-webhook-dingtalk-1.4.0/prometheus-webhook-dingtalk --ding.profile="ops_dingding=https://oapi.dingtalk.com/robot/send?access_token=88c328c8ac98*******95a15fe9c9"

[Install]
WantedBy=multi-user.target
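Remember to replace the webhook URL with your own, and make sure the ExecStart path matches the directory where you extracted the binary. The --ding.profile flag may be given multiple times on the command line, which allows alerts to be sent to several DingTalk custom robots at once. The profile name (here ops_dingding) must match the name used in the Alertmanager webhook URL. Note that --ding.profile appears to date from pre-1.0 releases; if your 1.x binary rejects it, start the service with --config.file pointing at config.yml instead and define a target named ops_dingding there.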
systemctl daemon-reload
systemctl enable prometheus-webhook-dingtalk.service
systemctl start prometheus-webhook-dingtalk.service
systemctl status prometheus-webhook-dingtalk.service
By default prometheus-webhook-dingtalk listens on port 8060.
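Note: sometimes everything is configured but DingTalk still receives no alerts, while running the ExecStart command above by hand does deliver them. In that case the cause is the ownership of the downloaded prometheus-webhook-dingtalk binary. Set its owner and group to root:
chown root:root /usr/local/monitor/prometheus-webhook-dingtalk-1.4.0/prometheus-webhook-dingtalk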
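Configuring Prometheus
The complete Prometheus installation and configuration is covered in a separate article. To add DingTalk alerting, only the Prometheus configuration file needs to change.
Example: prometheus.yml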
......
# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets: ['localhost:9093']
          #- "localhost:9093"

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  - "/usr/local/monitor/prometheus-2.29.2/rules/*.yml"
  #- "second_rules.yml"
.....
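In the Alertmanager configuration section (alerting), specify the address and port of Alertmanager, then list the alert rule files under rule_files.
After editing prometheus.yml, restart Prometheus for the change to take effect.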
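The rule_files glob only matters once a rule file exists in that directory. As a minimal sketch, the helper below writes one illustrative rule; the function name, the file name instance_down.yml, and the rule itself (the common `up == 0` example) are assumptions for demonstration, not rules from this setup.

```shell
# Write an illustrative alerting rule into the given rules directory.
# The quoted heredoc delimiter keeps $labels from being expanded by the shell.
write_example_rule() {
  local rules_dir="$1"
  mkdir -p "$rules_dir"
  cat > "$rules_dir/instance_down.yml" <<'EOF'
groups:
  - name: example
    rules:
      - alert: InstanceDown
        expr: up == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Instance {{ $labels.instance }} is down"
EOF
}
```

Calling `write_example_rule /usr/local/monitor/prometheus-2.29.2/rules` places the file where the glob above picks it up. The promtool binary shipped with Prometheus can validate it with `promtool check rules <file>` before the restart.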
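It often happens that an alert rule fires but no notification arrives. One possible cause is that the DingTalk robot's custom keywords do not include any word that appears in the alert message.
The following command tests whether the DingTalk robot can receive notifications at all.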
curl 'https://oapi.dingtalk.com/robot/send?access_token=f0c2dd678*****2f75' \
  -H 'Content-Type: application/json' \
  -d '{"msgtype": "text","text": {"content": "Alert: shooter DingTalk robot group message test"}}'
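Because a missing keyword is such a common cause of silent failures, the test can check for the keyword locally before calling DingTalk. The sketch below wraps the curl call above; the function names are hypothetical, and the payload builder does no JSON escaping, so it assumes the message contains no double quotes or backslashes.

```shell
# Succeed if the message contains the robot's custom keyword.
has_keyword() {
  local message="$1" keyword="$2"
  case "$message" in
    *"$keyword"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Build a DingTalk text payload (no JSON escaping is performed).
build_ding_text() {
  printf '{"msgtype":"text","text":{"content":"%s"}}\n' "$1"
}

# Send a test message only if it contains the keyword.
# Arguments: webhook URL, keyword, message.
send_ding_test() {
  local webhook_url="$1" keyword="$2" message="$3"
  if ! has_keyword "$message" "$keyword"; then
    echo "message lacks keyword '$keyword'; the robot would likely reject it" >&2
    return 1
  fi
  build_ding_text "$message" |
    curl -fsS -H 'Content-Type: application/json' --data @- "$webhook_url"
}
```

A call such as `send_ding_test "$WEBHOOK_URL" "Alert" "Alert: robot test"` sends the message, while a message without the keyword fails locally with an explanation instead of disappearing silently.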
