You can use promtool to test your rules.

    # For a single test file.
    ./promtool test rules test.yml

    # If you have multiple test files, say test1.yml, test2.yml, test3.yml
    ./promtool test rules test1.yml test2.yml test3.yml

Test file format

    # This is a list of rule files to consider for testing. Globs are supported.
    rule_files:
      [ - <file_name> ]

    [ evaluation_interval: <duration> | default = 1m ]

    # The order in which group names are listed below will be the order of evaluation of
    # rule groups (at a given evaluation time). The order is guaranteed only for the groups mentioned below.
    # All the groups need not be mentioned below.
    group_eval_order:
      [ - <group_name> ]

    # All the tests are listed here.
    tests:
      [ - <test_group> ]
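
As an illustration of the top-level fields, a test file might begin as in the sketch below. The rules/*.yml glob and the group names recording_rules and alerting_rules are hypothetical; they stand in for whatever your own rule files define.

```yaml
# Hypothetical layout: rule files live under rules/ and define two groups.
rule_files:
  - rules/*.yml            # globs are supported

evaluation_interval: 30s   # overrides the 1m default

# Evaluate the recording rules before the alerts that may depend on them.
group_eval_order:
  - recording_rules        # assumed group name
  - alerting_rules         # assumed group name

tests: []                  # test groups would be listed here
```

Groups that are not named in group_eval_order are still evaluated, but their relative order is not guaranteed.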

<test_group>

    # Series data
    interval: <duration>
    input_series:
      [ - <series> ]

    # Name of the test group
    [ name: <string> ]

    # Unit tests for the above data.

    # Unit tests for alerting rules. We consider the alerting rules from the input file.
    alert_rule_test:
      [ - <alert_test_case> ]

    # Unit tests for PromQL expressions.
    promql_expr_test:
      [ - <promql_test_case> ]

    # External labels accessible to the alert template.
    external_labels:
      [ <labelname>: <string> ... ]

<series>

    # This follows the usual series notation '<metric name>{<label name>=<label value>, ...}'
    # Examples:
    #     series_name{label1="value1", label2="value2"}
    #     go_goroutines{job="prometheus", instance="localhost:9090"}
    series: <string>

    # This uses expanding notation.
    # Expanding notation:
    #     'a+bxc' becomes 'a a+b a+(2*b) a+(3*b) … a+(c*b)'
    #     'a-bxc' becomes 'a a-b a-(2*b) a-(3*b) … a-(c*b)'
    # Examples:
    #     1. '-2+4x3' becomes '-2 2 6 10'
    #     2. '1-2x4' becomes '1 -1 -3 -5 -7'
    values: <string>
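
To make the expanding notation concrete, here is a small sketch of an input_series list. The metric names, label values, and numbers are invented for illustration; only the 'a+bxc' and 'a-bxc' forms described above are used.

```yaml
input_series:
  # Grows by 5 per interval: expands to 0 5 10 15 20
  - series: 'http_requests_total{job="api", instance="localhost:8080"}'
    values: '0+5x4'

  # Shrinks by 2 per interval: expands to 10 8 6 4
  - series: 'queue_depth{job="worker"}'
    values: '10-2x3'

  # Expanded segments and literal values can be mixed: 1 1 1 0 0
  - series: 'up{job="api", instance="localhost:8080"}'
    values: '1+0x2 0 0'
```

Note that 'a+bxc' yields c+1 samples, since the initial value a is included.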

<alert_test_case>
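Prometheus allows you to use the same alertname for different alerting rules. Hence, in this unit testing, you have to list the union of all the firing alerts for the alertname under a single <alert_test_case>.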

    # The time elapsed from time=0s when the alerts have to be checked.
    eval_time: <duration>

    # Name of the alert to be tested.
    alertname: <string>

    # List of expected alerts which are firing under the given alertname at
    # given evaluation time. If you want to test if an alerting rule should
    # not be firing, then you can mention the above fields and leave 'exp_alerts' empty.
    exp_alerts:
      [ - <alert> ]

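<alert>

    # These are the expanded labels and annotations of the expected alert.
    # Note: labels also include the labels of the sample associated with the
    # alert (same as what you see in /alerts, without series __name__ and alertname).
    exp_labels:
      [ <labelname>: <string> ]
    exp_annotations:
      [ <labelname>: <string> ]

<promql_test_case>

    # Expression to evaluate
    expr: <string>

    # The time elapsed from time=0s when the expression has to be evaluated.
    eval_time: <duration>

    # Expected samples at the given evaluation time.
    exp_samples:
      [ - <sample> ]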

<sample>

    # Labels of the sample in usual series notation '<metric name>{<label name>=<label value>, ...}'
    # Examples:
    #     series_name{label1="value1", label2="value2"}
    #     go_goroutines{job="prometheus", instance="localhost:9090"}
    labels: <string>

    # The expected value of the PromQL expression.
    value: <number>

Example

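This is an example input file for unit testing. test.yml is the test file, which follows the syntax above, and alerts.yml contains the alerting rules. With alerts.yml in the same directory, run ./promtool test rules test.yml.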
test.yml

    # This is the main input for unit testing.
    # Only this file is passed as command line argument.

    rule_files:
      - alerts.yml

    evaluation_interval: 1m

    tests:
      # Test 1.
      - interval: 1m
        # Series data.
        input_series:
          - series: 'up{job="prometheus", instance="localhost:9090"}'
            values: '0 0 0 0 0 0 0 0 0 0 0 0 0 0 0'
          - series: 'up{job="node_exporter", instance="localhost:9100"}'
            values: '1+0x6 0 0 0 0 0 0 0 0' # 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0
          - series: 'go_goroutines{job="prometheus", instance="localhost:9090"}'
            values: '10+10x2 30+20x5' # 10 20 30 30 50 70 90 110 130
          - series: 'go_goroutines{job="node_exporter", instance="localhost:9100"}'
            values: '10+10x7 10+30x4' # 10 20 30 40 50 60 70 80 10 40 70 100 130

        # Unit tests for alerting rules.
        alert_rule_test:
          # Unit test 1.
          - eval_time: 10m
            alertname: InstanceDown
            exp_alerts:
              # Alert 1.
              - exp_labels:
                  severity: page
                  instance: localhost:9090
                  job: prometheus
                exp_annotations:
                  summary: "Instance localhost:9090 down"
                  description: "localhost:9090 of job prometheus has been down for more than 5 minutes."

        # Unit tests for promql expressions.
        promql_expr_test:
          # Unit test 1.
          - expr: go_goroutines > 5
            eval_time: 4m
            exp_samples:
              # Sample 1.
              - labels: 'go_goroutines{job="prometheus",instance="localhost:9090"}'
                value: 50
              # Sample 2.
              - labels: 'go_goroutines{job="node_exporter",instance="localhost:9100"}'
                value: 50
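
A further PromQL test case could be added under promql_expr_test in test.yml. The fragment below is a sketch derived from the input series above and has not been run here: at 8m the two go_goroutines series hold 130 and 10 respectively, so only the first should pass a > 100 filter.

```yaml
promql_expr_test:
  # Additional case (illustrative): only localhost:9090 exceeds 100 at 8m.
  - expr: go_goroutines > 100
    eval_time: 8m
    exp_samples:
      - labels: 'go_goroutines{job="prometheus",instance="localhost:9090"}'
        value: 130
```

Samples that the expression filters out are simply left out of exp_samples.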

alerts.yml

    # This is the rules file.

    groups:
      - name: example
        rules:
          - alert: InstanceDown
            expr: up == 0
            for: 5m
            labels:
              severity: page
            annotations:
              summary: "Instance {{ $labels.instance }} down"
              description: "{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 5 minutes."

          - alert: AnotherInstanceDown
            expr: up == 0
            for: 10m
            labels:
              severity: page
            annotations:
              summary: "Instance {{ $labels.instance }} down"
              description: "{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 5 minutes."
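
The two rules above differ only in their for duration, which suggests two more alert test cases for test.yml. The fragment below is a sketch, not a verified run; the expected results are reasoned from the up series in test.yml, where localhost:9090 is down from 0m and localhost:9100 goes down at 7m.

```yaml
alert_rule_test:
  # InstanceDown has 'for: 5m', so at 3m the alert is still pending
  # and no firing alerts are expected.
  - eval_time: 3m
    alertname: InstanceDown
    exp_alerts: []

  # AnotherInstanceDown has 'for: 10m'. At 12m only localhost:9090 has been
  # down long enough; localhost:9100 has been down for just 5 minutes.
  - eval_time: 12m
    alertname: AnotherInstanceDown
    exp_alerts:
      - exp_labels:
          severity: page
          instance: localhost:9090
          job: prometheus
        exp_annotations:
          summary: "Instance localhost:9090 down"
          description: "localhost:9090 of job prometheus has been down for more than 5 minutes."
```

The expected description repeats the annotation text exactly as written in alerts.yml, since the test compares the expanded template output.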