You can use promtool to test your rules.

    # For a single test file.
    ./promtool test rules test.yml

    # If you have multiple test files, say test1.yml, test2.yml, test3.yml
    ./promtool test rules test1.yml test2.yml test3.yml

Test file format
    # This is a list of rule files to consider for testing. Globs are supported.
    rule_files:
      [ - <file_name> ]

    [ evaluation_interval: <duration> | default = 1m ]

    # The order in which group names are listed below will be the order of evaluation of
    # rule groups (at a given evaluation time). The order is guaranteed only for the groups mentioned below.
    # All the groups need not be mentioned below.
    group_eval_order:
      [ - <group_name> ]

    # All the tests are listed here.
    tests:
      [ - <test_group> ]
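As a rough illustration of the top-level format, here is a minimal sketch of a test file that uses a glob and an explicit group order. The rules/ directory and the group names recording_rules and alerting_rules are hypothetical and only serve the example.

```yaml
# Hypothetical layout: rule files live under rules/ and define two groups.
rule_files:
  - rules/*.yml             # glob: matches every .yml file in rules/

evaluation_interval: 30s    # overrides the 1m default

# At each evaluation time, recording_rules is evaluated before alerting_rules.
# Groups not listed here are still evaluated, but their order is not guaranteed.
group_eval_order:
  - recording_rules
  - alerting_rules

tests:
  - interval: 1m
    input_series:
      - series: 'up{job="prometheus", instance="localhost:9090"}'
        values: '1 1 1'
```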
<test_group>
    # Series data
    interval: <duration>
    input_series:
      [ - <series> ]

    # Name of the test group
    [ name: <string> ]

    # Unit tests for the above data.

    # Unit tests for alerting rules. We consider the alerting rules from the input file.
    alert_rule_test:
      [ - <alert_test_case> ]

    # Unit tests for PromQL expressions.
    promql_expr_test:
      [ - <promql_test_case> ]

    # External labels accessible to the alert template.
    external_labels:
      [ <labelname>: <string> ... ]
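The fragment below sketches what a named test group with external labels might look like. The group name and the cluster label are made up for illustration. The assumption is that a rule's annotation template reads the label through the template variable $externalLabels (for example {{ $externalLabels.cluster }}).

```yaml
tests:
  - name: prod-cluster-checks          # hypothetical test group name
    interval: 1m
    input_series:
      - series: 'up{job="prometheus", instance="localhost:9090"}'
        values: '0+0x10'               # eleven samples, all 0
    # Made available to alert templates; "cluster: prod" is a made-up label.
    external_labels:
      cluster: prod
    alert_rule_test: []                # alert test cases would go here
    promql_expr_test: []               # PromQL test cases would go here
```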
<series>
    # This follows the usual series notation '<metric name>{<label name>=<label value>, ...}'
    # Examples:
    #     series_name{label1="value1", label2="value2"}
    #     go_goroutines{job="prometheus", instance="localhost:9090"}
    series: <string>

    # This uses expanding notation.
    # Expanding notation:
    #     'a+bxc' becomes 'a a+b a+(2*b) a+(3*b) … a+(c*b)'
    #     'a-bxc' becomes 'a a-b a-(2*b) a-(3*b) … a-(c*b)'
    # Examples:
    #     1. '-2+4x3' becomes '-2 2 6 10'
    #     2. ' 1-2x4' becomes '1 -1 -3 -5 -7'
    values: <string>
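To make the expanding notation concrete, here is a small sketch of an input_series list with the expansion of each values string written out in a comment. The metric names and labels are hypothetical. Sample number i (counting from 0) is placed at time i times the test group's interval.

```yaml
input_series:
  # Constant series: 'a+0xc' repeats a for c+1 samples.
  - series: 'up{job="demo", instance="localhost:8080"}'
    values: '1+0x4'          # 1 1 1 1 1

  # Increasing series: start at -2, add 4 three times.
  - series: 'demo_gauge{job="demo"}'
    values: '-2+4x3'         # -2 2 6 10

  # Decreasing series followed by literal values; segments are space-separated.
  - series: 'demo_queue_length{job="demo"}'
    values: '10-2x4 0 0'     # 10 8 6 4 2 0 0
```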
<alert_test_case>
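Prometheus allows you to use the same alertname for different alerting rules. Hence, in this unit testing, you must list the union of all firing alerts for an alertname under a single <alert_test_case>.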
    # The time elapsed from time=0s when the alerts have to be checked.
    eval_time: <duration>

    # Name of the alert to be tested.
    alertname: <string>

    # List of expected alerts which are firing under the given alertname at
    # given evaluation time. If you want to test if an alerting rule should
    # not be firing, then you can mention the above fields and leave 'exp_alerts' empty.
    exp_alerts:
      [ - <alert> ]
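The "should not be firing" case can be sketched as follows. This assumes a rule named InstanceDown with expr up == 0 and for: 5m, as in the alerts.yml example in this document, and an input series in which up is 0 from time 0.

```yaml
alert_rule_test:
  # At 3m the condition up == 0 has held for less than the rule's `for: 5m`,
  # so the alert is still pending and no firing alert is expected.
  - eval_time: 3m
    alertname: InstanceDown
    exp_alerts: []
```

Pairing a check like this with a later eval_time that does expect the alert helps confirm that the for duration behaves as intended.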
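<alert>
    # These are the expanded labels and annotations of the expected alert.
    # Note: labels also include the labels of the sample associated with the alert.
    exp_labels:
      [ <labelname>: <string> ]
    exp_annotations:
      [ <labelname>: <string> ]
<promql_test_case>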
    # Expression to evaluate
    expr: <string>

    # The time elapsed from time=0s when the expression has to be evaluated.
    eval_time: <duration>

    # Expected samples at the given evaluation time.
    exp_samples:
      [ - <sample> ]
<sample>
    # Labels of the sample in usual series notation '<metric name>{<label name>=<label value>, ...}'
    # Examples:
    #     series_name{label1="value1", label2="value2"}
    #     go_goroutines{job="prometheus", instance="localhost:9090"}
    labels: <string>

    # The expected value of the PromQL expression.
    value: <number>
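A PromQL expression test ties an expression, an evaluation time, and the expected samples together. The fragment below is a minimal sketch of one test group. The metric http_requests_total{job="api"} is a hypothetical series used only for illustration.

```yaml
tests:
  - interval: 1m
    input_series:
      - series: 'http_requests_total{job="api"}'   # hypothetical metric
        values: '0+10x10'                          # 0 10 20 ... 100
    promql_expr_test:
      # At 5m the series holds its sixth sample: 0 + 5*10 = 50.
      - expr: http_requests_total
        eval_time: 5m
        exp_samples:
          - labels: 'http_requests_total{job="api"}'
            value: 50
```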
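Example
This is an example input file for unit testing. test.yml is the test file, which follows the syntax above, and alerts.yml contains the alerting rules. With alerts.yml in the same directory, run ./promtool test rules test.yml.
test.yml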
    # This is the main input for unit testing.
    # Only this file is passed as command line argument.

    rule_files:
      - alerts.yml

    evaluation_interval: 1m

    tests:
      # Test 1.
      - interval: 1m
        # Series data.
        input_series:
          - series: 'up{job="prometheus", instance="localhost:9090"}'
            values: '0 0 0 0 0 0 0 0 0 0 0 0 0 0 0'
          - series: 'up{job="node_exporter", instance="localhost:9100"}'
            values: '1+0x6 0 0 0 0 0 0 0 0' # 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0
          - series: 'go_goroutines{job="prometheus", instance="localhost:9090"}'
            values: '10+10x2 30+20x5' # 10 20 30 30 50 70 90 110 130
          - series: 'go_goroutines{job="node_exporter", instance="localhost:9100"}'
            values: '10+10x7 10+30x4' # 10 20 30 40 50 60 70 80 10 40 70 100 130

        # Unit tests for alerting rules.
        alert_rule_test:
          # Unit test 1.
          - eval_time: 10m
            alertname: InstanceDown
            exp_alerts:
              # Alert 1.
              - exp_labels:
                  severity: page
                  instance: localhost:9090
                  job: prometheus
                exp_annotations:
                  summary: "Instance localhost:9090 down"
                  description: "localhost:9090 of job prometheus has been down for more than 5 minutes."

        # Unit tests for promql expressions.
        promql_expr_test:
          # Unit test 1.
          - expr: go_goroutines > 5
            eval_time: 4m
            exp_samples:
              # Sample 1.
              - labels: 'go_goroutines{job="prometheus",instance="localhost:9090"}'
                value: 50
              # Sample 2.
              - labels: 'go_goroutines{job="node_exporter",instance="localhost:9100"}'
                value: 50
alerts.yml
    # This is the rules file.

    groups:
      - name: example
        rules:
          - alert: InstanceDown
            expr: up == 0
            for: 5m
            labels:
              severity: page
            annotations:
              summary: "Instance {{ $labels.instance }} down"
              description: "{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 5 minutes."

          - alert: AnotherInstanceDown
            expr: up == 0
            for: 10m
            labels:
              severity: page
            annotations:
              summary: "Instance {{ $labels.instance }} down"
              description: "{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 5 minutes."
