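Alertmanager is configured via command-line flags and a configuration file. The command-line flags configure immutable system parameters, while the configuration file defines inhibition rules, notification routing, and notification receivers.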

The visual editor can assist in building routing trees.

To view all available command-line flags, run alertmanager -h.

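Alertmanager can reload its configuration at runtime. If the new configuration is not well-formed, the changes are not applied and an error is logged. A configuration reload is triggered by sending a SIGHUP to the process or by sending an HTTP POST request to the /-/reload endpoint.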

1. Configuration file

To specify which configuration file to load, use the --config.file flag.

    ./alertmanager --config.file=simple.yml

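The file is written in YAML format, defined by the scheme described below. Brackets indicate that a parameter is optional. For non-list parameters the value is set to the specified default.

Generic placeholders are defined as follows:

  • <duration>: a duration matching the regular expression [0-9]+(ms|[smhdwy])
  • <labelname>: a string matching the regular expression [a-zA-Z_][a-zA-Z0-9_]*
  • <labelvalue>: a string of Unicode characters
  • <filepath>: a valid path in the current working directory
  • <boolean>: a boolean that can take the values true or false
  • <string>: a regular string
  • <secret>: a regular string that is a secret, such as a password
  • <tmpl_string>: a string which is template-expanded before usage
  • <tmpl_secret>: a string which is template-expanded before usage and is a secret

The other placeholders are specified separately.

A valid example file can be found here.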

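The global configuration specifies parameters that are valid in all other configuration contexts. They also serve as defaults for other configuration sections.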

    global:
      # ResolveTimeout is the time after which an alert is declared resolved
      # if it has not been updated.
      [ resolve_timeout: <duration> | default = 5m ]

      # The default SMTP From header field.
      [ smtp_from: <tmpl_string> ]
      # The default SMTP smarthost used for sending emails, including port number.
      # Port number usually is 25, or 587 for SMTP over TLS (sometimes referred to as STARTTLS).
      # Example: smtp.example.org:587
      [ smtp_smarthost: <string> ]
      # The default hostname to identify to the SMTP server.
      [ smtp_hello: <string> | default = "localhost" ]
      [ smtp_auth_username: <string> ]
      # SMTP Auth using LOGIN and PLAIN.
      [ smtp_auth_password: <secret> ]
      # SMTP Auth using PLAIN.
      [ smtp_auth_identity: <string> ]
      # SMTP Auth using CRAM-MD5.
      [ smtp_auth_secret: <secret> ]
      # The default SMTP TLS requirement.
      [ smtp_require_tls: <boolean> | default = true ]

      # The API URL to use for Slack notifications.
      [ slack_api_url: <secret> ]
      [ victorops_api_key: <secret> ]
      [ victorops_api_url: <string> | default = "https://alert.victorops.com/integrations/generic/20131114/alert/" ]
      [ pagerduty_url: <string> | default = "https://events.pagerduty.com/v2/enqueue" ]
      [ opsgenie_api_key: <secret> ]
      [ opsgenie_api_url: <string> | default = "https://api.opsgenie.com/" ]
      [ hipchat_api_url: <string> | default = "https://api.hipchat.com/" ]
      [ hipchat_auth_token: <secret> ]
      [ wechat_api_url: <string> | default = "https://qyapi.weixin.qq.com/cgi-bin/" ]
      [ wechat_api_secret: <secret> ]
      [ wechat_api_corp_id: <string> ]

      # The default HTTP client configuration
      [ http_config: <http_config> ]

    # Files from which custom notification template definitions are read.
    # The last component may use a wildcard matcher, e.g. 'templates/*.tmpl'.
    templates:
      [ - <filepath> ... ]

    # The root node of the routing tree.
    route: <route>

    # A list of notification receivers.
    receivers:
      - <receiver> ...

    # A list of inhibition rules.
    inhibit_rules:
      [ - <inhibit_rule> ... ]
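
To make the top-level layout more concrete, here is a minimal sketch of a complete configuration file that uses only the sections described above. The SMTP host, the addresses, the template path and the receiver name are placeholders invented for this illustration, not values taken from the reference.

```yaml
# Hypothetical minimal configuration; every host, address and name is a placeholder.
global:
  resolve_timeout: 5m
  smtp_smarthost: 'smtp.example.org:587'
  smtp_from: 'alertmanager@example.org'

# Custom notification templates (optional).
templates:
  - 'templates/*.tmpl'

# The root route has no matchers, so it matches every alert.
route:
  receiver: 'team-email'

receivers:
  - name: 'team-email'
    email_configs:
      - to: 'oncall@example.org'
```

The receiver named in route must appear in the receivers list; the email integration here picks up its SMTP settings from the global section.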
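
2. <route>

A route block defines a node in a routing tree and its children. Its optional configuration parameters are inherited from its parent node if not set.

Every alert enters the routing tree at the configured top-level route, which must match all alerts (i.e. not have any configured matchers). It then traverses the child nodes. If continue is set to false, it stops after the first matching child. If continue is true on a matching node, the alert will continue matching against subsequent siblings. If an alert does not match any children of a node (no matching child nodes, or none exist), the alert is handled based on the configuration parameters of the current node.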

    [ receiver: <string> ]
    # The labels by which incoming alerts are grouped together. For example,
    # multiple alerts coming in for cluster=A and alertname=LatencyHigh would
    # be batched into a single group.
    #
    # To aggregate by all possible labels use the special value '...' as the sole label name, for example:
    # group_by: ['...']
    # This effectively disables aggregation entirely, passing through all
    # alerts as-is. This is unlikely to be what you want, unless you have
    # a very low alert volume or your upstream notification system performs
    # its own grouping.
    [ group_by: '[' <labelname>, ... ']' ]

    # Whether an alert should continue matching subsequent sibling nodes.
    [ continue: <boolean> | default = false ]

    # A set of equality matchers an alert has to fulfill to match the node.
    match:
      [ <labelname>: <labelvalue>, ... ]

    # A set of regex-matchers an alert has to fulfill to match the node.
    match_re:
      [ <labelname>: <regex>, ... ]

    # How long to initially wait to send a notification for a group
    # of alerts. Allows to wait for an inhibiting alert to arrive or collect
    # more initial alerts for the same group. (Usually ~0s to few minutes.)
    [ group_wait: <duration> | default = 30s ]

    # How long to wait before sending a notification about new alerts that
    # are added to a group of alerts for which an initial notification has
    # already been sent. (Usually ~5m or more.)
    [ group_interval: <duration> | default = 5m ]

    # How long to wait before sending a notification again if it has already
    # been sent successfully for an alert. (Usually ~3h or more).
    [ repeat_interval: <duration> | default = 4h ]

    # Zero or more child routes.
    routes:
      [ - <route> ... ]

Example:

    # The root route with all parameters, which are inherited by the child
    # routes if they are not overwritten.
    route:
      receiver: 'default-receiver'
      group_wait: 30s
      group_interval: 5m
      repeat_interval: 4h
      group_by: [cluster, alertname]
      # All alerts that do not match the following child routes
      # will remain at the root node and be dispatched to 'default-receiver'.
      routes:
      # All alerts with service=mysql or service=cassandra
      # are dispatched to the database pager.
      - receiver: 'database-pager'
        group_wait: 10s
        match_re:
          service: mysql|cassandra
      # All alerts with the team=frontend label match this sub-route.
      # They are grouped by product and environment rather than cluster
      # and alertname.
      - receiver: 'frontend-pager'
        group_by: [product, environment]
        match:
          team: frontend
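
The effect of continue is easier to see in a small tree. The following sketch assumes three hypothetical receivers ('default-receiver', 'audit-log' and 'database-pager') and a severity label on the alerts; none of these names come from the reference itself.

```yaml
# Hypothetical routing tree illustrating continue; receiver names are placeholders.
route:
  receiver: 'default-receiver'
  routes:
    # Checked first. Because continue is true, a matching alert is
    # also tested against the sibling routes that follow.
    - receiver: 'audit-log'
      match:
        severity: critical
      continue: true
    # Checked second. continue defaults to false, so matching stops here.
    - receiver: 'database-pager'
      match_re:
        service: mysql|cassandra
```

Under these assumptions, an alert with severity=critical and service=mysql is delivered to both 'audit-log' and 'database-pager', a critical alert for any other service reaches only 'audit-log', and an alert that matches neither child is handled by 'default-receiver'.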
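
3. <inhibit_rule>

An inhibition rule mutes an alert (target) matching a set of matchers when an alert (source) exists that matches another set of matchers. Both target and source alerts must have the same label values for the label names in the equal list.

Semantically, a missing label and a label with an empty value are the same thing. Therefore, if all the label names listed in equal are missing from both the source and target alerts, the inhibition rule will apply.

To prevent an alert from inhibiting itself, an inhibition rule will never inhibit an alert that matches both the target and the source side of the rule. However, we recommend choosing target and source matchers in a way that alerts never match both sides. It is much easier to reason about and does not trigger this special case.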

    # Matchers that have to be fulfilled in the alerts to be muted.
    target_match:
      [ <labelname>: <labelvalue>, ... ]
    target_match_re:
      [ <labelname>: <regex>, ... ]

    # Matchers for which one or more alerts have to exist for the
    # inhibition to take effect.
    source_match:
      [ <labelname>: <labelvalue>, ... ]
    source_match_re:
      [ <labelname>: <regex>, ... ]

    # Labels that must have an equal value in the source and target
    # alert for the inhibition to take effect.
    [ equal: '[' <labelname>, ... ']' ]
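
As an illustration of these fields, the sketch below mutes warning-level alerts while a critical alert with the same alertname, cluster and service is firing. The severity label and its values are assumptions about how the alerts are labeled, not something the schema prescribes.

```yaml
# Hypothetical inhibition rule; the label names and values are assumptions.
inhibit_rules:
  - source_match:
      severity: 'critical'   # an alert like this must exist...
    target_match:
      severity: 'warning'    # ...for alerts like this to be muted
    # Source and target must agree on all of these labels.
    equal: ['alertname', 'cluster', 'service']
```

Because a missing label is treated like an empty one, a source and target that both lack the service label still count as equal on it. The two matcher sets are disjoint, so no alert can match both sides.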

4. <http_config>

A http_config allows configuring the HTTP client that the receiver uses to communicate with HTTP-based API services.

    # Note that `basic_auth`, `bearer_token` and `bearer_token_file` options are
    # mutually exclusive.

    # Sets the `Authorization` header with the configured username and password.
    # password and password_file are mutually exclusive.
    basic_auth:
      [ username: <string> ]
      [ password: <secret> ]
      [ password_file: <string> ]

    # Sets the `Authorization` header with the configured bearer token.
    [ bearer_token: <secret> ]

    # Sets the `Authorization` header with the bearer token read from the configured file.
    [ bearer_token_file: <filepath> ]

    # Configures the TLS settings.
    tls_config:
      [ <tls_config> ]

    # Optional proxy URL.
    [ proxy_url: <string> ]
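
A short sketch of how an http_config might be attached to an integration follows, here a webhook receiver using basic authentication through a proxy. The receiver name, URLs, username and password file path are all hypothetical.

```yaml
# Hypothetical receiver; the URLs, user name and file path are placeholders.
receivers:
  - name: 'internal-webhook'
    webhook_configs:
      - url: 'https://hooks.example.org/alerts'
        http_config:
          # Only one of basic_auth, bearer_token and bearer_token_file may be set.
          basic_auth:
            username: 'alertmanager'
            # password and password_file are mutually exclusive.
            password_file: 'secrets/webhook-password'
          proxy_url: 'http://proxy.example.org:3128'
```

Reading the password from a file keeps the secret out of the main configuration file.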

5. <tls_config>

A tls_config allows configuring TLS connections.

    # CA certificate to validate the server certificate with.
    [ ca_file: <filepath> ]

    # Certificate and key files for client cert authentication to the server.
    [ cert_file: <filepath> ]
    [ key_file: <filepath> ]

    # ServerName extension to indicate the name of the server.
    # http://tools.ietf.org/html/rfc4366#section-3.1
    [ server_name: <string> ]

    # Disable validation of the server certificate.
    [ insecure_skip_verify: <boolean> | default = false ]
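
The fragment below sketches a tls_config for client certificate authentication against a server signed by a private CA. It is meant to be nested under an integration's http_config; the file paths and server name are invented for the example.

```yaml
# Hypothetical fragment; the paths and server name are placeholders.
http_config:
  tls_config:
    # CA used to validate the server certificate.
    ca_file: 'tls/ca.pem'
    # Client certificate and key presented to the server.
    cert_file: 'tls/client.pem'
    key_file: 'tls/client-key.pem'
    server_name: 'hooks.internal.example.org'
    # Left at the default so the server certificate is still validated.
    insecure_skip_verify: false
```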
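
6. <receiver>

A receiver is a named configuration of one or more notification integrations.

We are not actively adding new receivers; we recommend implementing custom notification integrations via the webhook receiver.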

    # The unique name of the receiver.
    name: <string>

    # Configurations for several notification integrations.
    email_configs:
      [ - <email_config>, ... ]
    hipchat_configs:
      [ - <hipchat_config>, ... ]
    pagerduty_configs:
      [ - <pagerduty_config>, ... ]
    pushover_configs:
      [ - <pushover_config>, ... ]
    slack_configs:
      [ - <slack_config>, ... ]
    opsgenie_configs:
      [ - <opsgenie_config>, ... ]
    webhook_configs:
      [ - <webhook_config>, ... ]
    victorops_configs:
      [ - <victorops_config>, ... ]
    wechat_configs:
      [ - <wechat_config>, ... ]
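
A single receiver can fan out to several integrations at once. The sketch below assumes that SMTP settings and slack_api_url are already defined in the global section; the receiver name, address, channel and URL are placeholders.

```yaml
# Hypothetical receiver combining three integrations; all values are placeholders.
receivers:
  - name: 'ops-team'
    email_configs:
      - to: 'ops@example.org'        # relies on global.smtp_* settings
    slack_configs:
      - channel: '#ops-alerts'       # relies on global.slack_api_url
    webhook_configs:
      - url: 'http://127.0.0.1:5001/'
```

A route that names 'ops-team' as its receiver sends each notification through all three integrations.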

7. <email_config>

    # Whether or not to notify about resolved alerts.
    [ send_resolved: <boolean> | default = false ]

    # The email address to send notifications to.
    to: <tmpl_string>

    # The sender address.
    [ from: <tmpl_string> | default = global.smtp_from ]

    # The SMTP host through which emails are sent.
    [ smarthost: <string> | default = global.smtp_smarthost ]

    # The hostname to identify to the SMTP server.
    [ hello: <string> | default = global.smtp_hello ]

    # SMTP authentication information.
    [ auth_username: <string> | default = global.smtp_auth_username ]
    [ auth_password: <secret> | default = global.smtp_auth_password ]
    [ auth_secret: <secret> | default = global.smtp_auth_secret ]
    [ auth_identity: <string> | default = global.smtp_auth_identity ]

    # The SMTP TLS requirement.
    [ require_tls: <boolean> | default = global.smtp_require_tls ]

    # TLS configuration.
    tls_config:
      [ <tls_config> ]

    # The HTML body of the email notification.
    [ html: <tmpl_string> | default = '{{ template "email.default.html" . }}' ]
    # The text body of the email notification.
    [ text: <tmpl_string> ]

    # Further headers email header key/value pairs. Overrides any headers
    # previously set by the notification implementation.
    [ headers: { <string>: <tmpl_string>, ... } ]
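
As a sketch, an email integration that overrides the global SMTP defaults and sets a custom subject line might look like the fragment below. The addresses, host and credentials are placeholders, and the subject template assumes the alerts are grouped by an alertname label.

```yaml
# Hypothetical fragment of a receiver; all values are placeholders.
email_configs:
  - to: 'oncall@example.org'
    from: 'alertmanager@example.org'
    smarthost: 'smtp.example.org:587'
    auth_username: 'alertmanager'
    auth_password: 'example-password'
    require_tls: true
    send_resolved: true
    headers:
      # Overrides the Subject header set by the default implementation.
      Subject: '[{{ .Status }}] {{ .GroupLabels.alertname }}'
```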

8. <hipchat_config>

HipChat notifications use a Build Your Own integration.

    # Whether or not to notify about resolved alerts.
    [ send_resolved: <boolean> | default = false ]

    # The HipChat Room ID.
    room_id: <tmpl_string>
    # The auth token.
    [ auth_token: <secret> | default = global.hipchat_auth_token ]
    # The URL to send API requests to.
    [ api_url: <string> | default = global.hipchat_api_url ]

    # See https://www.hipchat.com/docs/apiv2/method/send_room_notification
    # A label to be shown in addition to the sender's name.
    [ from: <tmpl_string> | default = '{{ template "hipchat.default.from" . }}' ]
    # The message body.
    [ message: <tmpl_string> | default = '{{ template "hipchat.default.message" . }}' ]
    # Whether this message should trigger a user notification.
    [ notify: <boolean> | default = false ]
    # Determines how the message is treated by the alertmanager and rendered inside HipChat. Valid values are 'text' and 'html'.
    [ message_format: <string> | default = 'text' ]
    # Background color for message.
    [ color: <tmpl_string> | default = '{{ if eq .Status "firing" }}red{{ else }}green{{ end }}' ]

    # The HTTP client's configuration.
    [ http_config: <http_config> | default = global.http_config ]

9. <pagerduty_config>

PagerDuty notifications are sent via the PagerDuty API. PagerDuty provides documentation on how to set up the integration.

    # Whether or not to notify about resolved alerts.
    [ send_resolved: <boolean> | default = true ]

    # The following two options are mutually exclusive.
    # The PagerDuty integration key (when using PagerDuty integration type `Events API v2`).
    routing_key: <tmpl_secret>
    # The PagerDuty integration key (when using PagerDuty integration type `Prometheus`).
    service_key: <tmpl_secret>

    # The URL to send API requests to
    [ url: <string> | default = global.pagerduty_url ]

    # The client identification of the Alertmanager.
    [ client: <tmpl_string> | default = '{{ template "pagerduty.default.client" . }}' ]
    # A backlink to the sender of the notification.
    [ client_url: <tmpl_string> | default = '{{ template "pagerduty.default.clientURL" . }}' ]

    # A description of the incident.
    [ description: <tmpl_string> | default = '{{ template "pagerduty.default.description" .}}' ]

    # Severity of the incident.
    [ severity: <tmpl_string> | default = 'error' ]

    # A set of arbitrary key/value pairs that provide further detail
    # about the incident.
    [ details: { <string>: <tmpl_string>, ... } | default = {
      firing:       '{{ template "pagerduty.default.instances" .Alerts.Firing }}'
      resolved:     '{{ template "pagerduty.default.instances" .Alerts.Resolved }}'
      num_firing:   '{{ .Alerts.Firing | len }}'
      num_resolved: '{{ .Alerts.Resolved | len }}'
    } ]

    # Images to attach to the incident.
    images:
      [ <image_config> ... ]

    # Links to attach to the incident.
    links:
      [ <link_config> ... ]

    # The HTTP client's configuration.
    [ http_config: <http_config> | default = global.http_config ]

9.1 <image_config>

The fields are documented in the PagerDuty API documentation.

    source: <tmpl_string>
    alt: <tmpl_string>
    text: <tmpl_string>

9.2 <link_config>

The fields are documented in the PagerDuty API documentation.

    href: <tmpl_string>
    text: <tmpl_string>
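
Putting the PagerDuty pieces together, the following sketch configures an Events API v2 integration with a fixed severity, a trimmed details map and one link. The integration key and the runbook URL are placeholders.

```yaml
# Hypothetical fragment of a receiver; the key and URL are placeholders.
pagerduty_configs:
  # routing_key and service_key are mutually exclusive, so only one is set.
  - routing_key: 'example-events-v2-integration-key'
    severity: 'critical'
    details:
      firing: '{{ template "pagerduty.default.instances" .Alerts.Firing }}'
      num_firing: '{{ .Alerts.Firing | len }}'
    links:
      - href: 'https://wiki.example.org/runbooks'
        text: 'Runbooks'
```

Setting details replaces the default map, so any default entries that are still wanted have to be listed again, as done here for firing and num_firing.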

10. <pushover_config>

Pushover notifications are sent via the Pushover API.

    # Whether or not to notify about resolved alerts.
    [ send_resolved: <boolean> | default = true ]

    # The recipient user’s user key.
    user_key: <secret>

    # Your registered application’s API token, see https://pushover.net/apps
    token: <secret>

    # Notification title.
    [ title: <tmpl_string> | default = '{{ template "pushover.default.title" . }}' ]

    # Notification message.
    [ message: <tmpl_string> | default = '{{ template "pushover.default.message" . }}' ]

    # A supplementary URL shown alongside the message.
    [ url: <tmpl_string> | default = '{{ template "pushover.default.url" . }}' ]

    # Priority, see https://pushover.net/api#priority
    [ priority: <tmpl_string> | default = '{{ if eq .Status "firing" }}2{{ else }}0{{ end }}' ]

    # How often the Pushover servers will send the same notification to the user.
    # Must be at least 30 seconds.
    [ retry: <duration> | default = 1m ]

    # How long your notification will continue to be retried for, unless the user
    # acknowledges the notification.
    [ expire: <duration> | default = 1h ]

    # The HTTP client's configuration.
    [ http_config: <http_config> | default = global.http_config ]

11. <slack_config>

Slack notifications are sent via Slack webhooks. The notification contains an attachment.

    # Whether or not to notify about resolved alerts.
    [ send_resolved: <boolean> | default = false ]

    # The Slack webhook URL.
    [ api_url: <secret> | default = global.slack_api_url ]

    # The channel or user to send notifications to.
    channel: <tmpl_string>

    # API request data as defined by the Slack webhook API.
    [ icon_emoji: <tmpl_string> ]
    [ icon_url: <tmpl_string> ]
    [ link_names: <boolean> | default = false ]
    [ username: <tmpl_string> | default = '{{ template "slack.default.username" . }}' ]
    # The following parameters define the attachment.
    actions:
      [ <action_config> ... ]
    [ callback_id: <tmpl_string> | default = '{{ template "slack.default.callbackid" . }}' ]
    [ color: <tmpl_string> | default = '{{ if eq .Status "firing" }}danger{{ else }}good{{ end }}' ]
    [ fallback: <tmpl_string> | default = '{{ template "slack.default.fallback" . }}' ]
    fields:
      [ <field_config> ... ]
    [ footer: <tmpl_string> | default = '{{ template "slack.default.footer" . }}' ]
    [ pretext: <tmpl_string> | default = '{{ template "slack.default.pretext" . }}' ]
    [ short_fields: <boolean> | default = false ]
    [ text: <tmpl_string> | default = '{{ template "slack.default.text" . }}' ]
    [ title: <tmpl_string> | default = '{{ template "slack.default.title" . }}' ]
    [ title_link: <tmpl_string> | default = '{{ template "slack.default.titlelink" . }}' ]
    [ image_url: <tmpl_string> ]
    [ thumb_url: <tmpl_string> ]

    # The HTTP client's configuration.
    [ http_config: <http_config> | default = global.http_config ]

11.1 <action_config>

The fields are documented in the Slack API documentation.

    type: <tmpl_string>
    text: <tmpl_string>
    url: <tmpl_string>
    [ style: <tmpl_string> | default = '' ]

11.2 <field_config>

The fields are documented in the Slack API documentation.

    title: <tmpl_string>
    value: <tmpl_string>
    [ short: <boolean> | default = slack_config.short_fields ]
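
The sketch below shows one way the attachment options, fields and actions could be combined. It assumes that slack_api_url is set in the global section and that the alerts carry a severity label plus summary, description and runbook_url annotations; the channel name and these label and annotation names are illustrative only.

```yaml
# Hypothetical fragment of a receiver; channel, labels and annotations are assumptions.
slack_configs:
  - channel: '#alerts'
    send_resolved: true
    title: '{{ .CommonLabels.alertname }}'
    text: '{{ .CommonAnnotations.summary }}'
    # Default for fields that do not set short themselves.
    short_fields: true
    fields:
      - title: 'Severity'
        value: '{{ .CommonLabels.severity }}'
      - title: 'Details'
        value: '{{ .CommonAnnotations.description }}'
        short: false          # overrides short_fields for this field
    actions:
      - type: 'button'
        text: 'Runbook'
        url: '{{ .CommonAnnotations.runbook_url }}'
```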

12. <opsgenie_config>

OpsGenie notifications are sent via the OpsGenie API.

    # Whether or not to notify about resolved alerts.
    [ send_resolved: <boolean> | default = true ]

    # The API key to use when talking to the OpsGenie API.
    [ api_key: <secret> | default = global.opsgenie_api_key ]

    # The host to send OpsGenie API requests to.
    [ api_url: <string> | default = global.opsgenie_api_url ]

    # Alert text limited to 130 characters.
    [ message: <tmpl_string> ]

    # A description of the incident.
    [ description: <tmpl_string> | default = '{{ template "opsgenie.default.description" . }}' ]

    # A backlink to the sender of the notification.
    [ source: <tmpl_string> | default = '{{ template "opsgenie.default.source" . }}' ]

    # A set of arbitrary key/value pairs that provide further detail
    # about the incident.
    [ details: { <string>: <tmpl_string>, ... } ]

    # Comma separated list of team responsible for notifications.
    [ teams: <tmpl_string> ]

    # Comma separated list of tags attached to the notifications.
    [ tags: <tmpl_string> ]

    # Additional alert note.
    [ note: <tmpl_string> ]

    # Priority level of alert. Possible values are P1, P2, P3, P4, and P5.
    [ priority: <tmpl_string> ]

    # The HTTP client's configuration.
    [ http_config: <http_config> | default = global.http_config ]
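
Since teams, tags and priority are template strings, they can be derived from alert labels. The following sketch assumes a severity label on the alerts; the API key and team names are placeholders.

```yaml
# Hypothetical fragment of a receiver; key, teams and the severity label are assumptions.
opsgenie_configs:
  - api_key: 'example-api-key'
    message: '{{ .CommonLabels.alertname }}'
    # Comma separated lists, as described in the schema.
    teams: 'database,platform'
    tags: 'prometheus,{{ .CommonLabels.severity }}'
    # Critical alerts become P1, everything else P3.
    priority: '{{ if eq .CommonLabels.severity "critical" }}P1{{ else }}P3{{ end }}'
```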

13. <victorops_config>

VictorOps notifications are sent via the VictorOps API.

    # Whether or not to notify about resolved alerts.
    [ send_resolved: <boolean> | default = true ]

    # The API key to use when talking to the VictorOps API.
    [ api_key: <secret> | default = global.victorops_api_key ]

    # The VictorOps API URL.
    [ api_url: <string> | default = global.victorops_api_url ]

    # A key used to map the alert to a team.
    routing_key: <tmpl_string>

    # Describes the behavior of the alert (CRITICAL, WARNING, INFO).
    [ message_type: <tmpl_string> | default = 'CRITICAL' ]

    # Contains summary of the alerted problem.
    [ entity_display_name: <tmpl_string> | default = '{{ template "victorops.default.entity_display_name" . }}' ]

    # Contains long explanation of the alerted problem.
    [ state_message: <tmpl_string> | default = '{{ template "victorops.default.state_message" . }}' ]

    # The monitoring tool the state message is from.
    [ monitoring_tool: <tmpl_string> | default = '{{ template "victorops.default.monitoring_tool" . }}' ]

    # The HTTP client's configuration.
    [ http_config: <http_config> | default = global.http_config ]

14. <webhook_config>

The webhook receiver allows configuring a generic receiver.

    # Whether or not to notify about resolved alerts.
    [ send_resolved: <boolean> | default = true ]

    # The endpoint to send HTTP POST requests to.
    url: <string>

    # The HTTP client's configuration.
    [ http_config: <http_config> | default = global.http_config ]

Alertmanager sends HTTP POST requests to the configured endpoint in the following JSON format:

    {
      "version": "4",
      "groupKey": <string>,    // key identifying the group of alerts (e.g. to deduplicate)
      "status": "<resolved|firing>",
      "receiver": <string>,
      "groupLabels": <object>,
      "commonLabels": <object>,
      "commonAnnotations": <object>,
      "externalURL": <string>,  // backlink to the Alertmanager.
      "alerts": [
        {
          "status": "<resolved|firing>",
          "labels": <object>,
          "annotations": <object>,
          "startsAt": "<rfc3339>",
          "endsAt": "<rfc3339>",
          "generatorURL": <string> // identifies the entity that caused the alert
        },
        ...
      ]
    }

There is a list of existing integrations that use this feature.

15. <wechat_config>

WeChat notifications are sent via the WeChat API.

    # Whether or not to notify about resolved alerts.
    [ send_resolved: <boolean> | default = false ]

    # The API key to use when talking to the WeChat API.
    [ api_secret: <secret> | default = global.wechat_api_secret ]

    # The WeChat API URL.
    [ api_url: <string> | default = global.wechat_api_url ]

    # The corp id for authentication.
    [ corp_id: <string> | default = global.wechat_api_corp_id ]

    # API request data as defined by the WeChat API.
    [ message: <tmpl_string> | default = '{{ template "wechat.default.message" . }}' ]
    [ agent_id: <string> | default = '{{ template "wechat.default.agent_id" . }}' ]
    [ to_user: <string> | default = '{{ template "wechat.default.to_user" . }}' ]
    [ to_party: <string> | default = '{{ template "wechat.default.to_party" . }}' ]
    [ to_tag: <string> | default = '{{ template "wechat.default.to_tag" . }}' ]
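
A minimal WeChat integration might be sketched as follows. The corp ID, secret, agent ID and party ID are placeholders; real values come from the WeChat Work administration console.

```yaml
# Hypothetical fragment of a receiver; all identifiers are placeholders.
wechat_configs:
  - corp_id: 'example-corp-id'
    api_secret: 'example-secret'
    agent_id: '1000002'
    # Deliver to a department (party) rather than to individual users.
    to_party: '2'
    send_resolved: true
```

If corp_id and api_secret are shared by every WeChat receiver, they can instead be set once as wechat_api_corp_id and wechat_api_secret in the global section.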