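Prometheus supports two types of rules, which can be configured and then evaluated at regular intervals: recording rules and alerting rules. To include rules in Prometheus, create a file containing the necessary rule statements and have Prometheus load it via the `rule_files` field in the Prometheus configuration. Rule files use the YAML format.

Rule files can be reloaded at runtime by sending the `SIGHUP` signal to the Prometheus process. The changes are only applied if all rule files are well-formatted.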
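As a minimal sketch of the configuration side, the excerpt below shows how a `prometheus.yml` might point at rule files. The file names are hypothetical. `rule_files` takes a list of paths, and glob patterns are accepted. `global.evaluation_interval` is the interval used by rule groups that do not set their own.

```yaml
# prometheus.yml (excerpt): a minimal sketch, file names are hypothetical
global:
  # Default evaluation interval for rule groups without their own `interval`.
  evaluation_interval: 15s

rule_files:
  # Each entry is a file path; glob patterns are also accepted.
  - "example.rules.yml"
  - "rules/*.yml"
```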

Checking rule syntax

To quickly check whether a rule file is syntactically correct without starting a Prometheus server, use Prometheus's `promtool` command-line tool:

```shell
promtool check rules /path/to/example.rules.yml
```

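The `promtool` binary ships in the Prometheus binary release package.
When the file is syntactically valid, the checker prints a textual representation of the parsed rules to standard output and exits with return status 0.
If there are any syntax errors or invalid input arguments, it prints an error message to standard error and exits with return status 1.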
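For illustration, the hypothetical file below contains two mistakes that `promtool check rules` should reject: a `record` name that is not a valid metric name, and a group name repeated within the same file. Running the checker on it would therefore report errors on standard error and exit with status 1. The exact error wording varies between Prometheus versions, so it is not reproduced here.

```yaml
# broken.rules.yml: a deliberately invalid file (hypothetical name)
groups:
  - name: example
    rules:
      # Invalid: '-' is not allowed in a metric name, so this record name is rejected.
      - record: job-http-inprogress-requests
        expr: sum by (job) (http_inprogress_requests)
  # Invalid: group names must be unique within a file.
  - name: example
    rules:
      - record: job:http_inprogress_requests:max
        expr: max by (job) (http_inprogress_requests)
```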

Recording rules

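Recording rules let you precompute expressions that are needed frequently or are computationally expensive, and save their results as a new set of time series. Querying the precomputed series is then much faster than evaluating the original expression every time it is needed. This is especially useful for dashboards (for example in Grafana), which query the same expressions repeatedly on every refresh.
Recording and alerting rules exist in a rule group. Rules within a group are run sequentially at the same regular interval. The name of a recording rule must be a valid metric name. The name of an alerting rule must be a valid label value.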
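The sketch below shows recording rules for the dashboard use case described above. It assumes a hypothetical counter `http_requests_total` with a hypothetical `code` label. Rules in a group are evaluated in order, so the third rule can read the series written by the first two in the same evaluation cycle. The names follow the `level:metric:operations` convention commonly used for recording rules.

```yaml
groups:
  - name: http_dashboard_rules   # hypothetical group name
    rules:
      # Per-job request rate, precomputed so dashboards need not re-run rate() over raw series.
      - record: job:http_requests:rate5m
        expr: sum by (job) (rate(http_requests_total[5m]))
      # Per-job rate of 5xx responses (the `code` label is an assumption).
      - record: job:http_requests_errors:rate5m
        expr: sum by (job) (rate(http_requests_total{code=~"5.."}[5m]))
      # Builds on the two series recorded above: per-job error ratio.
      - record: job:http_requests_errors:ratio_rate5m
        expr: job:http_requests_errors:rate5m / job:http_requests:rate5m
```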

The syntax of a rule file is:

```yaml
groups:
  [ - <rule_group> ]
```

A simple example rule file:

```yaml
groups:
  - name: example
    rules:
    - record: job:http_inprogress_requests:sum
      expr: sum by (job) (http_inprogress_requests)
```

The syntax of `<rule_group>` is:

```yaml
# The name of the group. Must be unique within a file.
name: <string>

# How often rules in the group are evaluated.
[ interval: <duration> | default = global.evaluation_interval ]

rules:
  [ - <rule> ... ]
```
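This sketch shows the group-level fields in a rule file with two groups. The first group overrides the evaluation interval. The second sets no `interval`, so it falls back to `global.evaluation_interval`. The group names are hypothetical and only need to differ within the file.

```yaml
groups:
  # Evaluated every 30s, regardless of global.evaluation_interval.
  - name: fast_rules
    interval: 30s
    rules:
      - record: job:http_inprogress_requests:sum
        expr: sum by (job) (http_inprogress_requests)
  # No `interval` set, so global.evaluation_interval applies.
  - name: slow_rules
    rules:
      - record: job:http_inprogress_requests:max
        expr: max by (job) (http_inprogress_requests)
```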

The syntax of a recording rule is:

```yaml
# The name of the time series to output to. Must be a valid metric name.
record: <string>

# The PromQL expression to evaluate. Every evaluation cycle this is
# evaluated at the current time, and the result recorded as a new set of
# time series with the metric name as given by 'record'.
expr: <string>

# Labels to add or overwrite before storing the result.
labels:
  [ <labelname>: <labelvalue> ]
```
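This sketch shows a recording rule that uses `labels`. The `team` label and its value are hypothetical. Every series stored under the `record` name gets `team="backend"`. If the expression result already carried a `team` label, this value would overwrite it.

```yaml
groups:
  - name: example
    rules:
      - record: job:http_inprogress_requests:sum
        expr: sum by (job) (http_inprogress_requests)
        labels:
          # Added to (or overwritten on) every resulting series before it is stored.
          team: backend
```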
The syntax of an alerting rule is:

```yaml
# The name of the alert. Must be a valid label value.
alert: <string>

# The PromQL expression to evaluate. Every evaluation cycle this is
# evaluated at the current time, and all resultant time series become
# pending/firing alerts.
expr: <string>

# Alerts are considered firing once they have been returned for this long.
# Alerts which have not yet fired for long enough are considered pending.
[ for: <duration> | default = 0s ]

# Labels to add or overwrite for each alert.
labels:
  [ <labelname>: <tmpl_string> ]

# Annotations to add to each alert.
annotations:
  [ <labelname>: <tmpl_string> ]
```
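This sketch of an alerting rule follows the common `InstanceDown` pattern. The group name, the `severity` label and the annotation texts are illustrative choices. `up` is the series Prometheus records for every scrape target: 1 if the last scrape succeeded, 0 otherwise. `{{ $labels.<name> }}` and `{{ $value }}` are the template variables available in alert labels and annotations.

```yaml
groups:
  - name: example-alerts
    rules:
      - alert: InstanceDown
        # One alert per target whose most recent scrape failed.
        expr: up == 0
        # Pending until the condition has held for 5 minutes, then firing.
        for: 5m
        labels:
          severity: page
        annotations:
          summary: "Instance {{ $labels.instance }} down"
          description: "{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 5 minutes."
```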