1. Configure and start alertmanager
    ## Create the alertmanager data directory
    mkdir -p /data/alertmanager/
    chmod -R 777 /data/alertmanager

    ## Edit the alertmanager configuration file
    # For a detailed description of the options, see the official docs: https://prometheus.io/docs/alerting/configuration/
    # Write the configuration file /etc/alertmanager/config.yml based on the template in Attachment 1

    ## Start alertmanager with docker
    docker run -d -p 9093:9093 \
      -v /etc/alertmanager/config.yml:/etc/alertmanager/config.yml \
      -v /data/alertmanager:/data/alertmanager \
      --name alertmanager \
      --restart=always \
      quay.io/prometheus/alertmanager \
      --config.file=/etc/alertmanager/config.yml \
      --storage.path=/data/alertmanager
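To confirm the container actually came up, one option is to poll Alertmanager's health endpoint. The following is a minimal sketch, assuming Alertmanager is reachable on localhost:9093 as published by the docker run command above; the helper name wait_for_alertmanager and the retry count are illustrative assumptions, not part of Alertmanager.

```shell
# Sketch: poll Alertmanager's /-/healthy endpoint until it answers or we give up.
# wait_for_alertmanager is a hypothetical helper name.
wait_for_alertmanager() {
  wfa_url="${1:-http://localhost:9093}"   # base URL (assumed: port from docker run)
  wfa_retries="${2:-10}"                  # number of attempts (assumed value)
  wfa_i=0
  while [ "$wfa_i" -lt "$wfa_retries" ]; do
    # -f makes curl return non-zero on HTTP errors
    if curl -fsS "${wfa_url}/-/healthy" >/dev/null 2>&1; then
      echo "alertmanager is healthy"
      return 0
    fi
    wfa_i=$((wfa_i + 1))
    sleep 1
  done
  echo "alertmanager did not become healthy after ${wfa_retries} attempts" >&2
  return 1
}
```

For example, running "wait_for_alertmanager http://localhost:9093 30 || docker logs alertmanager" prints the container log if the service never becomes healthy.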
2. Add the alertmanager configuration to prometheus
Add the following to the prometheus configuration file:
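    # Alertmanager configuration
    alerting:
      alertmanagers:
        - static_configs:
            # Address prometheus uses to reach alertmanager, i.e. the IP and port alertmanager listens on.
            # If prometheus runs in its own container without host networking, "localhost" refers to
            # that container, so use the host IP here instead.
            - targets: ["localhost:9093"]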
    # Restart prometheus
    docker restart prometheus
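A broken configuration file will keep prometheus from starting, so it can be worth validating the file before the restart. The sketch below assumes the container is named prometheus (as in the restart command), that the configuration is mounted at /etc/prometheus/prometheus.yml inside the container, and that prometheus listens on localhost:9090; the function names are hypothetical.

```shell
# Assumptions: container name, in-container config path, and Prometheus URL.
PROM_CONTAINER="${PROM_CONTAINER:-prometheus}"
PROM_CONFIG="${PROM_CONFIG:-/etc/prometheus/prometheus.yml}"
PROM_URL="${PROM_URL:-http://localhost:9090}"

# Restart only if promtool (shipped in the Prometheus image) accepts the config.
check_and_restart_prometheus() {
  if docker exec "$PROM_CONTAINER" promtool check config "$PROM_CONFIG"; then
    docker restart "$PROM_CONTAINER"
  else
    echo "config check failed; not restarting" >&2
    return 1
  fi
}

# Ask Prometheus which Alertmanagers it is currently sending alerts to.
list_alertmanagers() {
  curl -fsS "${PROM_URL}/api/v1/alertmanagers"
}
```

After the restart, the JSON returned by list_alertmanagers should list the target on port 9093 under activeAlertmanagers.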
3. Configure rules in prometheus
2 Add the alert rules setting to the prometheus configuration file
    # alertmanager rules
    rule_files:
      - "/etc/prometheus/rules/*.yml"
3 Add two rule groups
    vim /etc/prometheus/rules/testAlert.yml
    groups:
      - name: ServiceStatus  # rule group name
        rules:
          - alert: ServiceStatusAlert  # name of this rule
            expr: up == 0  # match condition: up is 1 when the target is online, 0 when it is down
            for: 10s  # how long the condition must hold before the alert fires
            labels:  # labels
              project: zhidaoAPP  # custom label
            annotations:  # alert body
              summary: "Instance {{ $labels.instance }} down"
              description: "{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 10 seconds."
      - name: hostStatsAlert
        rules:
          - alert: hostCpuUsageAlert
            # node_cpu is the metric name used by node_exporter before 0.16; later versions expose node_cpu_seconds_total
            expr: sum(avg without (cpu)(irate(node_cpu{mode!='idle'}[5m]))) by (instance) > 0.85
            for: 1m
            labels:
              severity: page
            annotations:
              summary: "Instance {{ $labels.instance }} CPU usage high"
              description: "{{ $labels.instance }} CPU usage above 85% (current value: {{ $value }})"
          - alert: hostMemUsageAlert
            # node_exporter 0.16 and later add a _bytes suffix to these metric names
            expr: (node_memory_MemTotal - node_memory_MemAvailable)/node_memory_MemTotal > 0.85
            for: 1m
            labels:
              severity: page
            annotations:
              summary: "Instance {{ $labels.instance }} MEM usage high"
              description: "{{ $labels.instance }} MEM usage above 85% (current value: {{ $value }})"
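An alert expression that returns no series never fires, which often happens when a metric name does not match the installed exporter version. A quick way to check is to run each expr through the prometheus query API before relying on the rule. This is a minimal sketch, assuming prometheus is reachable on localhost:9090; preview_expr is a hypothetical helper name.

```shell
PROM_URL="${PROM_URL:-http://localhost:9090}"   # assumed Prometheus address

# Evaluate a PromQL expression once and print the raw JSON result.
preview_expr() {
  # -G turns the url-encoded data into GET query parameters
  curl -fsS -G "${PROM_URL}/api/v1/query" --data-urlencode "query=$1"
}
```

For instance, preview_expr 'up == 0' lists the targets that are currently down, and preview_expr 'node_cpu' returning an empty result suggests the exporter uses the newer metric name.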
4 Restart prometheus
    docker restart prometheus
View the alert rules in prometheus:
http://{YOUR_prometheus_IP}:9090/alerts
View the alerts in alertmanager:
http://{YOUR_alertmanager_IP}:9093/#/alerts
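The same information is available from the HTTP APIs, which is convenient on a host without a browser. The sketch below assumes both services run on localhost with the ports used in this article; the function names are hypothetical.

```shell
PROM_URL="${PROM_URL:-http://localhost:9090}"   # assumed Prometheus address
AM_URL="${AM_URL:-http://localhost:9093}"       # assumed Alertmanager address

# Pending and firing alerts as Prometheus sees them.
list_prometheus_alerts() {
  curl -fsS "${PROM_URL}/api/v1/alerts"
}

# Alerts that have actually reached Alertmanager.
list_alertmanager_alerts() {
  curl -fsS "${AM_URL}/api/v2/alerts"
}
```

An alert that shows up in the first list but not in the second usually points at the alerting section of the prometheus configuration rather than at the rule itself.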
4. Add DingTalk alerting
Start the dingtalk plugin
    docker run -d --name dingtalk \
      -p 8060:8060 \
      --restart=always \
      docker.io/timonwong/prometheus-webhook-dingtalk \
      --ding.profile="ops_dingding=https://oapi.dingtalk.com/robot/send?access_token=139c51c0c3f8dabf9d0ea50b042ef6593bea61340a7d116ef2ce51e4e538b8a9" \
      --ding.profile="webhook2=https://oapi.dingtalk.com/robot/send?access_token=yyyyyyyyyyy"  # more than one profile can be given
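Configure alertmanager: add a webhook receiver, and a route that sends alerts to it, pointing at the dingtalk plugin (see the "webhook" receiver in the configuration template in Attachment 1).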
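Once the route is in place, a hand-made alert can exercise the whole path without waiting for a real outage. The following sketch posts one alert to the Alertmanager v2 API, assuming Alertmanager is on localhost:9093; the label values mirror the ServiceStatusAlert rule so the alert matches the webhook route, and the function names and the instance value are hypothetical.

```shell
AM_URL="${AM_URL:-http://localhost:9093}"   # assumed Alertmanager address

# Print a JSON array containing one test alert; $1 is a made-up instance name.
build_test_alert() {
  cat <<EOF
[
  {
    "labels": {
      "alertname": "ServiceStatusAlert",
      "instance": "$1",
      "project": "zhidaoAPP"
    },
    "annotations": {
      "summary": "Instance $1 down (manual test)"
    }
  }
]
EOF
}

# Send the test alert to Alertmanager.
send_test_alert() {
  build_test_alert "$1" |
    curl -fsS -X POST -H 'Content-Type: application/json' \
      --data-binary @- "${AM_URL}/api/v2/alerts"
}
```

Calling send_test_alert test-host:9100 should produce a DingTalk message after the route's group_wait has elapsed. Because the alert carries no end time, Alertmanager treats it as resolved once its resolve timeout passes.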
Attachments:
1. Alertmanager configuration template
Example configuration: https://raw.githubusercontent.com/prometheus/alertmanager/master/doc/examples/simple.yml
    vim /etc/alertmanager/config.yml
    global:
      # Email settings
      smtp_smarthost: smtp.ym.163.com:587  # enterprise mailboxes must use port 587; sending via port 465 fails
      smtp_from: alert@xxx.com
      smtp_auth_username: alert@xxx.com
      smtp_auth_identity: alert@xxx.com
      smtp_auth_password: XXXXXXX
    route:
      ## default receiver
      receiver: 'default'
      group_wait: 30s
      group_interval: 1m
      repeat_interval: 4h
      group_by: ['cluster','alertname']
      routes:
        - receiver: webhook
          group_wait: 10s
          match:  # use match_re for regular-expression matching
            alertname: ServiceStatusAlert  # label an alert must carry to be routed to this receiver
    receivers:
      - name: default
        email_configs:
          - to: 152xxxx8332@163.com
            send_resolved: true
      # webhook: DingTalk
      - name: webhook
        webhook_configs:
          - url: http://{prometheus-webhook-dingtalk_IP}:8060/dingtalk/ops_dingding/send
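YAML indentation mistakes are easy to make in this file, and Alertmanager refuses to start on an invalid configuration. One way to catch them first is amtool, which ships in the Alertmanager image. The sketch below assumes the file lives at /etc/alertmanager/config.yml on the host and that the running container is named alertmanager, as in step 1; the function names are hypothetical.

```shell
AM_CONFIG="${AM_CONFIG:-/etc/alertmanager/config.yml}"   # host path from step 1

# Validate the config in a throwaway container so the running one is untouched.
check_alertmanager_config() {
  docker run --rm \
    -v "${AM_CONFIG}:/etc/alertmanager/config.yml:ro" \
    --entrypoint amtool \
    quay.io/prometheus/alertmanager \
    check-config /etc/alertmanager/config.yml
}

# Restart the running container only when the check passes.
apply_alertmanager_config() {
  check_alertmanager_config && docker restart alertmanager
}
```

Running apply_alertmanager_config after each edit leaves the current Alertmanager running if the new file does not parse.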
