Alertmanager installation
Download: https://github.com/prometheus/alertmanager/releases/download/v0.24.0/alertmanager-0.24.0.linux-amd64.tar.gz
Upload the package to the server, then extract and rename it:
[root@Prometheus software]# tar -zxvf alertmanager-0.24.0.linux-amd64.tar.gz -C /usr/local/
[root@Prometheus software]# cd /usr/local/
[root@Prometheus local]# mv alertmanager-0.24.0.linux-amd64 alertmanager
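To confirm the uploaded tarball was not corrupted in transit (ideally before running tar), you can compare its SHA-256 digest with the published one. The following is a minimal Python sketch. It assumes you also downloaded the sha256sums.txt file from the same GitHub release page. It hashes the archive with hashlib and compares the result with the matching line. The function names are illustrative and are not part of any Prometheus tooling.

```python
import hashlib
from pathlib import Path
from typing import Optional


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hex SHA-256 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def expected_sha256(sums_text: str, filename: str) -> Optional[str]:
    """Look up `filename` in sha256sum-style text ("<digest>  <name>" per line)."""
    for line in sums_text.splitlines():
        parts = line.split()
        # sha256sum marks binary-mode entries with a leading "*" on the name.
        if len(parts) == 2 and parts[1].lstrip("*") == filename:
            return parts[0].lower()
    return None


def verify_tarball(path: Path, sums_text: str) -> bool:
    """True only if the file is listed in the sums text and its digest matches."""
    expected = expected_sha256(sums_text, path.name)
    return expected is not None and expected == sha256_of(path)
```

For example, `verify_tarball(Path("alertmanager-0.24.0.linux-amd64.tar.gz"), Path("sha256sums.txt").read_text())`. On a system with GNU coreutils, `sha256sum --ignore-missing -c sha256sums.txt` performs the same check directly from the shell.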
Create the systemd unit file:
[root@Prometheus local]# vim /usr/lib/systemd/system/alertmanager.service
[Unit]
Description=alertmanager System
After=network.target

[Service]
Type=simple
ExecStart=/usr/local/alertmanager/alertmanager --config.file /usr/local/alertmanager/alertmanager.yml
ExecReload=/bin/kill -HUP $MAINPID
KillMode=process
Restart=on-failure

[Install]
WantedBy=multi-user.target
Start the service:
[root@Prometheus local]# systemctl daemon-reload
[root@Prometheus local]# systemctl start alertmanager
[root@Prometheus local]# systemctl enable --now alertmanager
[root@Prometheus local]# systemctl status alertmanager
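`systemctl status` only shows that the process is running. Alertmanager also exposes a readiness endpoint, `/-/ready`, on its listen port (9093 by default). Below is a small Python sketch that polls this endpoint using only the standard library. The base URL `http://192.168.10.111:9093` in the usage line comes from the prometheus.yml target below. It is an assumption about where your Alertmanager listens.

```python
import time
import urllib.request


def is_ready(base_url: str, timeout: float = 2.0) -> bool:
    """True if GET <base_url>/-/ready returns HTTP 200."""
    try:
        with urllib.request.urlopen(base_url.rstrip("/") + "/-/ready", timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # URLError, HTTPError (non-2xx) and socket timeouts are all OSError subclasses.
        return False


def wait_until_ready(base_url: str, attempts: int = 10, delay: float = 1.0) -> bool:
    """Poll the readiness endpoint, e.g. right after `systemctl start alertmanager`."""
    for _ in range(attempts):
        if is_ready(base_url):
            return True
        time.sleep(delay)
    return False
```

For example, `wait_until_ready("http://192.168.10.111:9093")` returns True once the service answers. If it keeps returning False, check `journalctl -u alertmanager` for startup errors.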
Integrate Prometheus with Alertmanager:
[root@Prometheus local]# mkdir -p /usr/local/prometheus/rules
[root@Prometheus local]# vim /usr/local/prometheus/prometheus.yml
# 1. Edit the alerting section of prometheus.yml
# 2. Edit the rule_files section of prometheus.yml
alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - 192.168.10.111:9093 # Alertmanager address

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  - "rules/*.yml" # alert rule files
  # - "first_rules.yml"
  # - "second_rules.yml"
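Once Prometheus has been restarted with this file, its HTTP API lists the Alertmanagers it is actually sending alerts to, under `GET /api/v1/alertmanagers`. The following Python sketch parses that response. It assumes Prometheus is reachable at `192.168.10.111:9090`. The port is the one checked with lsof later in this guide, but the address is an assumption.

```python
import json
import urllib.request


def active_alertmanagers(api_response: str) -> list[str]:
    """URLs Prometheus currently sends alerts to, from /api/v1/alertmanagers JSON."""
    body = json.loads(api_response)
    if body.get("status") != "success":
        raise ValueError(f"Prometheus API error: {body.get('error')}")
    return [am["url"] for am in body["data"].get("activeAlertmanagers", [])]


def fetch_alertmanagers(prometheus_url: str, timeout: float = 5.0) -> list[str]:
    """Query a running Prometheus (needs network access to it)."""
    url = prometheus_url.rstrip("/") + "/api/v1/alertmanagers"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return active_alertmanagers(resp.read().decode("utf-8"))
```

`fetch_alertmanagers("http://192.168.10.111:9090")` should include an entry pointing at 192.168.10.111:9093. An empty list suggests the alerting block was not picked up.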
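Alertmanager email alerts
Enable SMTP on the 163 mailbox and obtain its authorization code (it is used as smtp_auth_password below).
Configure email sending: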
[root@Prometheus local]# vim /usr/local/alertmanager/alertmanager.yml
global:
  resolve_timeout: 5m
  smtp_smarthost: 'smtp.163.com:25' # send mail through the 163 SMTP server
  smtp_from: 'muyaobin@163.com' # sender: your 163 address
  smtp_auth_username: 'muyaobin@163.com' # same as smtp_from
  smtp_auth_password: 'LKPZVCYSLHVEHGRI' # authorization code of your 163 mailbox
  smtp_require_tls: false # do not require TLS
route:
  group_by: ['alertname']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 1h # repeat the notification every hour
  receiver: 'email'
receivers:
  - name: 'email'
    email_configs:
      - to: 'muyaobin@163.com'
        send_resolved: true # also send an email when the alert resolves
inhibit_rules: # inhibition rules
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'dev', 'instance']
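The inhibit_rules entry is meant to mute warning-level alerts while a critical alert with the same alertname, dev and instance labels is firing. The Python function below is a simplified model of that check, useful for reasoning about which alerts a rule would suppress. It supports plain equality matchers only and is not Alertmanager's implementation. Label names must be spelled exactly as the alert rules set them: a misspelled key such as `serverity` never matches anything. The node-down rule added below sets `severity: 1`, so this particular inhibit rule does not affect it. The alert name `HighLoad` in the example is hypothetical.

```python
def matches(labels: dict[str, str], required: dict[str, str]) -> bool:
    """Equality matching, as in source_match / target_match."""
    return all(labels.get(name) == value for name, value in required.items())


def is_inhibited(target: dict[str, str], firing: list[dict[str, str]],
                 source_match: dict[str, str], target_match: dict[str, str],
                 equal: list[str]) -> bool:
    """Simplified model of a single inhibit_rules entry.

    `target` is muted when it matches target_match and some *other* firing
    alert matches source_match and has the same values for every label in
    `equal` (a label missing on both sides counts as equal).
    """
    if not matches(target, target_match):
        return False
    return any(
        source is not target
        and matches(source, source_match)
        and all(source.get(name) == target.get(name) for name in equal)
        for source in firing
    )
```

For example, with `crit = {"alertname": "HighLoad", "instance": "a:9100", "severity": "critical"}` firing, a matching `"severity": "warning"` alert on the same instance is inhibited. The same warning on a different instance is still sent.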
Add the alert rule:
[root@Prometheus local]# vim /usr/local/prometheus/rules/host_monitor.yml
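groups:
  - name: node-down
    rules:
      - alert: node-down
        expr: up == 0
        for: 5s # optional pending period: the alert is only sent after the condition has held this long; meanwhile the new alert is in the pending state
        labels: # custom labels: an extra set of labels attached to the alert
          severity: 1
          team: node
        annotations:
          summary: "{{$labels.instance}}"
          description: "{{$labels.instance}}: job {{$labels.job}} has been down for more than 5 seconds"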
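How `for: 5s` plays out depends on how often Prometheus evaluates the rule. Below is a simplified Python model of one rule's state at each evaluation. It assumes a 15s evaluation_interval, the value in the stock prometheus.yml; check your own global section. Under that assumption, the first evaluation that sees `up == 0` only marks the alert pending. The next one, 15s later, exceeds the 5s hold and moves it to firing, at which point Prometheus sends it to Alertmanager. The model ignores details such as how long Prometheus keeps resolved alerts around.

```python
from typing import Optional


def rule_states(evaluations: list[tuple[float, bool]], for_seconds: float) -> list[str]:
    """Simplified state of one alerting rule at each evaluation.

    evaluations: (time in seconds, did `expr` return a result at that time?)
    """
    states = []
    active_since: Optional[float] = None  # when the condition was first seen true
    for t, condition_true in evaluations:
        if not condition_true:
            active_since = None
            states.append("inactive")
            continue
        if active_since is None:
            active_since = t
        states.append("firing" if t - active_since >= for_seconds else "pending")
    return states
```

For example, `rule_states([(0, False), (15, True), (30, True), (45, False)], 5)` returns `["inactive", "pending", "firing", "inactive"]`.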
Validate the configuration file (promtool also checks the rule files listed under rule_files):
[root@Prometheus local]# /usr/local/prometheus/promtool check config /usr/local/prometheus/prometheus.yml
Restart Prometheus:
[root@Prometheus local]# cd /usr/local/prometheus
[root@Prometheus prometheus]# pkill prometheus
[root@Prometheus prometheus]# lsof -i:9090
[root@Prometheus prometheus]# ./prometheus &
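Once Prometheus is back up, `GET /api/v1/rules` reports every loaded rule group. You can use it to confirm that rules/host_monitor.yml was picked up and that the node-down rule is healthy. The Python sketch below reduces the response to `{rule name: (state, health)}`. The Prometheus address in the usage note is an assumption.

```python
import json
import urllib.request


def alerting_rules(api_response: str) -> dict[str, tuple[str, str]]:
    """Map alerting-rule name -> (state, health) from /api/v1/rules JSON."""
    body = json.loads(api_response)
    if body.get("status") != "success":
        raise ValueError(f"Prometheus API error: {body.get('error')}")
    result = {}
    for group in body["data"]["groups"]:
        for rule in group["rules"]:
            # Recording rules have type "recording" and no alert state; skip them.
            if rule.get("type") == "alerting":
                result[rule["name"]] = (rule.get("state", ""), rule.get("health", ""))
    return result


def fetch_rules(prometheus_url: str, timeout: float = 5.0) -> dict[str, tuple[str, str]]:
    """Query a running Prometheus (needs network access to it)."""
    url = prometheus_url.rstrip("/") + "/api/v1/rules"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return alerting_rules(resp.read().decode("utf-8"))
```

While every target is up, `fetch_rules("http://192.168.10.111:9090")` should report the node-down rule with state inactive and health ok.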
Trigger an alert
When node_exporter is killed, an alert email is sent.
When node_exporter is restarted, a resolution email is sent.
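Killing node_exporter exercises the whole pipeline. To test only the Alertmanager-to-email leg, you can post an alert straight to Alertmanager's v2 API, `POST /api/v2/alerts`. It accepts a JSON array of alerts with labels, annotations, startsAt and endsAt. The following is a Python sketch. The alertname `email-test` and the instance/job values are hypothetical. endsAt is set a few minutes ahead so that the alert resolves on its own. With send_resolved: true, that should also produce a resolution email.

```python
import json
import urllib.request
from datetime import datetime, timedelta, timezone


def build_test_alert(instance: str, job: str, minutes: int = 5) -> list[dict]:
    """One-alert payload for POST /api/v2/alerts.

    "email-test" is a hypothetical alertname, chosen so the test is easy to
    tell apart from the real node-down alert.
    """
    now = datetime.now(timezone.utc)
    return [{
        "labels": {"alertname": "email-test", "instance": instance, "job": job,
                   "severity": "1", "team": "node"},
        "annotations": {"summary": instance,
                        "description": f"{instance}: job {job} manual test alert"},
        # RFC 3339 timestamps; an endsAt in the future lets the alert resolve itself.
        "startsAt": now.isoformat(),
        "endsAt": (now + timedelta(minutes=minutes)).isoformat(),
    }]


def post_alerts(alertmanager_url: str, alerts: list[dict], timeout: float = 5.0) -> int:
    """Send the alerts and return the HTTP status code."""
    req = urllib.request.Request(
        alertmanager_url.rstrip("/") + "/api/v2/alerts",
        data=json.dumps(alerts).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status
```

For example, `post_alerts("http://192.168.10.111:9093", build_test_alert("192.168.10.111:9100", "node"))` should return 200. After group_wait (10s here), the email should arrive.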
Improve the alert template
Create a template file:
[root@Prometheus ~]# vim /usr/local/alertmanager/email.tmpl
{{ define "email.to.html" }}
{{ range .Alerts }}
=========start==========<br>
Alert source: prometheus_alert <br>
Severity: level {{ .Labels.severity }} <br>
Alert type: {{ .Labels.alertname }} <br>
Failed host: {{ .Labels.instance }} <br>
Summary: {{ .Annotations.summary }} <br>
Details: {{ .Annotations.description }} <br>
Fired at: {{ .StartsAt }} <br>
=========end==========<br>
{{ end }}
{{ end }}
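You can preview the email layout offline, without waiting for a real alert. The Python function below mimics the template's loop for a list of alert dicts. It is a rough stand-in written for illustration only. Alertmanager itself renders the real template with Go's html/template package, which also HTML-escapes label and annotation values. html.escape approximates that escaping here.

```python
from html import escape


def render_email(alerts: list[dict]) -> str:
    """Approximate the "email.to.html" body in Python for an offline preview."""
    blocks = []
    for alert in alerts:
        labels = alert.get("labels", {})
        notes = alert.get("annotations", {})
        rows = [
            "Alert source: prometheus_alert",
            f"Severity: level {labels.get('severity', '')}",
            f"Alert type: {labels.get('alertname', '')}",
            f"Failed host: {labels.get('instance', '')}",
            f"Summary: {notes.get('summary', '')}",
            f"Details: {notes.get('description', '')}",
            f"Fired at: {alert.get('startsAt', '')}",
        ]
        # One "<row> <br>" per line, framed by the start/end markers.
        body = "".join(f"{escape(row)} <br>" for row in rows)
        blocks.append(f"=========start==========<br>{body}=========end==========<br>")
    return "".join(blocks)
```

Writing the result to a .html file and opening it in a browser gives a quick look at the layout.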
Update alertmanager.yml to use the template:
[root@Prometheus ~]# vim /usr/local/alertmanager/alertmanager.yml
global:
  resolve_timeout: 5m
  smtp_smarthost: 'smtp.163.com:25' # send mail through the 163 SMTP server
  smtp_from: 'muyaobin@163.com' # sender: your 163 address
  smtp_auth_username: 'muyaobin@163.com' # same as smtp_from
  smtp_auth_password: 'LKPZVCYSLHVEHGRI' # authorization code of your 163 mailbox
  smtp_require_tls: false # do not require TLS
templates:
  - '/usr/local/alertmanager/email.tmpl'
route:
  group_by: ['alertname']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 1h # repeat the notification every hour
  receiver: 'email' # must match a receivers[].name below
receivers:
  - name: 'email'
    email_configs:
      - to: 'muyaobin@163.com'
        html: '{{ template "email.to.html" . }}' # render the body with the template
        send_resolved: true # also send an email when the alert resolves
inhibit_rules: # inhibition rules
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'dev', 'instance']
Check the alertmanager.yml configuration, then restart Alertmanager:
[root@Prometheus ~]# /usr/local/alertmanager/amtool check-config /usr/local/alertmanager/alertmanager.yml
[root@Prometheus ~]# systemctl restart alertmanager
Simulate a host-down alert:
When node_exporter is killed, an alert email is sent.
Modify the template to add recovery information. Each section ranges over .Alerts.Firing or .Alerts.Resolved, so an alert only appears in the section that matches its current state:
[root@Prometheus ~]# vim /usr/local/alertmanager/email.tmpl
{{ define "email.to.html" }}
{{ if gt (len .Alerts.Firing) 0 }}
{{ range .Alerts.Firing }}
@Alert information: <br>
Alert source: prometheus_alert <br>
Severity: level {{ .Labels.severity }} <br>
Alert type: {{ .Labels.alertname }} <br>
Failed host: {{ .Labels.instance }} <br>
Summary: {{ .Annotations.summary }} <br>
Details: {{ .Annotations.description }}, stopped working <br>
Fired at: {{ .StartsAt.Local.Format "2006-01-02 15:04:05" }} <br>
{{ end }}
{{ end }}
{{ if gt (len .Alerts.Resolved) 0 }}
{{ range .Alerts.Resolved }}
@Recovery information: <br>
Host: {{ .Labels.instance }} <br>
Summary: {{ .Annotations.summary }} is back to normal <br>
Resolved at: {{ .EndsAt.Local.Format "2006-01-02 15:04:05" }} <br>
{{ end }}
{{ end }}
{{ end }}
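This template relies on two things that are easy to misread. First, `.Alerts.Firing` and `.Alerts.Resolved` split the notification's alerts by status. Second, `"2006-01-02 15:04:05"` is a Go reference-time layout, not a literal date. It corresponds to the strftime pattern `%Y-%m-%d %H:%M:%S`, and `.Local` converts the UTC timestamp to the Alertmanager host's time zone. The following is a Python analogue of both, for illustration only. The UTC+8 zone in the example is an assumption about the server's locale, and the sketch expects second- or microsecond-precision timestamps.

```python
from datetime import datetime, timedelta, timezone

# strftime equivalent of the Go layout "2006-01-02 15:04:05".
GO_LAYOUT_EQUIV = "%Y-%m-%d %H:%M:%S"


def to_local(rfc3339: str, tz: timezone) -> str:
    """Python analogue of {{ .StartsAt.Local.Format "2006-01-02 15:04:05" }}."""
    if rfc3339.endswith("Z"):
        rfc3339 = rfc3339[:-1] + "+00:00"  # fromisoformat needs an explicit offset
    return datetime.fromisoformat(rfc3339).astimezone(tz).strftime(GO_LAYOUT_EQUIV)


def split_by_status(alerts: list[dict]) -> tuple[list[dict], list[dict]]:
    """Mirror .Alerts.Firing / .Alerts.Resolved by partitioning on each alert's status."""
    firing = [a for a in alerts if a["status"] == "firing"]
    resolved = [a for a in alerts if a["status"] == "resolved"]
    return firing, resolved
```

For example, `to_local("2022-06-01T08:00:00Z", timezone(timedelta(hours=8)))` gives `"2022-06-01 16:00:00"`. This is why the email shows local time instead of UTC once `.Local.Format` is used.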
Simulate a host-down alert:
When node_exporter is killed, an alert email is sent.
When node_exporter is restarted, a resolution email is sent.



