Prometheus - prometheus / metrics - 《Dev Ops》

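Startup flags:
./prometheus \
  --web.listen-address="0.0.0.0:9090" \
  --config.file="prometheus.yml" \
  --web.read-timeout=5m \
  --web.max-connections=512 \
  --storage.tsdb.retention=15d \
  --storage.tsdb.path="/data/devops/promethues/server/data" \
  --query.max-concurrency=20 \
  --query.timeout=2m

Flag meanings:
--web.listen-address      address and port to listen on
--config.file             configuration file to load
--web.read-timeout        timeout for reading a request; idle HTTP connections are closed after it
--web.max-connections     maximum number of simultaneous web connections
--storage.tsdb.retention  how long monitoring data is kept (newer releases name this flag --storage.tsdb.retention.time)
--storage.tsdb.path       data storage directory
--query.max-concurrency   maximum number of queries executed concurrently
--query.timeout           maximum time a query may run before it is aborted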

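Data types:
Gauge - a value that can go up or down
Counter - a cumulative count that only increases
Histogram/Summary - types for analyzing distributions (max/min, median, bucketed data)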

Metric types as they appear in the data scraped from an exporter:

# HELP go_gc_duration_seconds A summary of the pause duration of garbage collection cycles.
# TYPE go_gc_duration_seconds summary
go_gc_duration_seconds{quantile="0"} 1.2356e-05
go_gc_duration_seconds{quantile="0.25"} 3.6165e-05
go_gc_duration_seconds{quantile="0.5"} 5.8682e-05
go_gc_duration_seconds{quantile="0.75"} 0.000122074
go_gc_duration_seconds{quantile="1"} 0.008177793
go_gc_duration_seconds_sum 13.664656914
go_gc_duration_seconds_count 57586
# HELP go_goroutines Number of goroutines that currently exist.
# TYPE go_goroutines gauge
go_goroutines 249
# HELP go_memstats_lookups_total Total number of pointer lookups.
# TYPE go_memstats_lookups_total counter
go_memstats_lookups_total 0
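
To make the layout of this output concrete, here is a minimal Python sketch that reads the "# TYPE" lines and the samples from such text. It is a simplified illustration, not a full parser: it assumes no timestamps and no escaped characters in label values, and the names parse_exposition and SAMPLE_TEXT are made up for this example.

```python
import re

# Matches lines such as: name{label="v"} 1.5   or   name 42
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)$')
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')


def parse_exposition(text):
    """Return (types, samples) parsed from simplified exposition text.

    types   maps metric name -> declared type (from "# TYPE" lines)
    samples is a list of (name, labels_dict, float_value) tuples
    """
    types, samples = {}, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("# TYPE "):
            _, _, name, mtype = line.split(None, 3)
            types[name] = mtype
        elif line.startswith("#"):
            continue  # HELP lines and other comments
        else:
            m = SAMPLE_RE.match(line)
            if m:
                name, raw_labels, value = m.groups()
                labels = dict(LABEL_RE.findall(raw_labels or ""))
                samples.append((name, labels, float(value)))
    return types, samples


# A shortened copy of the exporter output shown above.
SAMPLE_TEXT = """\
# TYPE go_gc_duration_seconds summary
go_gc_duration_seconds{quantile="0.5"} 5.8682e-05
go_gc_duration_seconds_sum 13.664656914
go_gc_duration_seconds_count 57586
# TYPE go_goroutines gauge
go_goroutines 249
# TYPE go_memstats_lookups_total counter
go_memstats_lookups_total 0
"""

types, samples = parse_exposition(SAMPLE_TEXT)
print(types)
print(samples[0])
```

Note that a summary is exposed as several series: one per quantile, plus the _sum and _count series.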

Exporter collectors
Many exporters are available from the Prometheus project and the community, for example:
node_exporter
nginx_exporter
mysqld_exporter

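Pushgateway collector
Convenient, flexible, and simple: a script or batch job pushes its metrics to the Pushgateway, and Prometheus scrapes them from there.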
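
As a rough sketch of what a push looks like, the Python below builds the HTTP request a script could send to a Pushgateway. The Pushgateway accepts text-format metrics on the path /metrics/job/<job>/instance/<instance>. The address localhost:9091, the job name netstat_job, the host web01, and the helper build_push_request are assumptions for illustration only. The request is built but not sent.

```python
import urllib.request
from urllib.parse import quote


def build_push_request(base_url, job, instance, metrics):
    """Build (but do not send) an HTTP request that pushes metrics.

    metrics maps metric name -> numeric value; each is sent as an untyped sample.
    """
    url = "{}/metrics/job/{}/instance/{}".format(
        base_url.rstrip("/"), quote(job, safe=""), quote(instance, safe="")
    )
    # The body uses the text exposition format and must end with a newline.
    body = "".join("{} {}\n".format(name, value) for name, value in metrics.items())
    # PUT replaces every metric previously pushed for this job/instance group.
    return urllib.request.Request(url, data=body.encode("utf-8"), method="PUT")


# Hypothetical address, job, host, and value, for illustration only.
req = build_push_request(
    "http://localhost:9091", "netstat_job", "web01",
    {"count_netstat_wait_connections": 37},
)
print(req.get_method(), req.full_url)
# urllib.request.urlopen(req)  # uncomment to send to a running Pushgateway
```

When Prometheus scrapes the Pushgateway without honor_labels enabled, a pushed instance label is typically stored as exported_instance, which may be where the exported_instance label used later in these notes comes from.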

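Query syntax:
node_cpu    select a metric by name
node_cpu{mode='idle'}    idle CPU time
increase(node_cpu{mode='idle'}[1m])    increase in idle CPU time over one minute
sum(increase(node_cpu{mode='idle'}[1m]))    idle CPU increase summed across all machines
sum(increase(node_cpu{mode='idle'}[1m])) by (instance)    idle CPU increase summed per machine
(node_exporter 0.16 and later expose this metric as node_cpu_seconds_total.)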
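
Queries like these can also be sent to the Prometheus HTTP API at /api/v1/query. The Python sketch below only builds the request URL, with the expression URL-encoded in the query parameter. The server address localhost:9090 and the helper name instant_query_url are assumptions for illustration.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def instant_query_url(base_url, promql):
    """Return the URL for an instant query against /api/v1/query."""
    return "{}/api/v1/query?{}".format(
        base_url.rstrip("/"), urlencode({"query": promql})
    )


promql = "sum(increase(node_cpu{mode='idle'}[1m])) by (instance)"
url = instant_query_url("http://localhost:9090", promql)
print(url)

# Decoding the query string gives back the original expression.
decoded = parse_qs(urlsplit(url).query)["query"][0]
```

Fetching this URL from a running server returns a JSON document with the result of the expression.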

Labels:
Scraped data carries labels by default.
For example, to show the metric only for servers whose name starts with "web":
count_netstat_wait_connections{exported_instance=~"web.*"}
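
The =~ matcher compares the regular expression against the whole label value, so "web.*" selects names that start with web, not names that merely contain it. A small Python sketch of that behavior follows. The host names are invented, and Python's re module is only an approximation of the RE2 syntax Prometheus uses.

```python
import re


def matches_label(pattern, value):
    """Mimic PromQL's =~ operator, which matches against the whole label value."""
    return re.fullmatch(pattern, value) is not None


# Hypothetical server names.
hosts = ["web01", "web-api-2", "myweb01", "db01"]
selected = [h for h in hosts if matches_label("web.*", h)]
print(selected)
```

Here myweb01 is not selected, because the pattern has to match from the first character of the value.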

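Functions:
rate(<metric>[Nm]) - rate: the average per-second increase over N minutes. Formula: (value at end of window - value at start of window) / (N * 60)
rate(count_netstat_wait_connections[1m])    average per-second increase over 1 minute
increase(<metric>[Nm]) - increase: the growth of the data over N minutes. Formula: value at end of window - value at start of window
Use rate when the scrape frequency is high, and increase when the scrape frequency is low.
sum() - sum: adds up the values that share the same timestamp
Usually combined with by.
topk(N, <metric>) - the N highest values
topk can be applied to gauge data directly, and also to counters (a counter must first be wrapped in rate/increase).
count() - count: the number of series
count(count_netstat_wait_connections > 200)    number of servers with more than 200 connections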
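
The arithmetic behind these functions can be sketched in a few lines of Python. This is a simplified model only: unlike Prometheus it ignores counter resets and does no extrapolation at the window edges. The sample data and helper names are invented for illustration.

```python
def increase(samples, window_seconds):
    """Simplified increase(): last value minus first value inside the window.

    samples is a list of (timestamp_seconds, value) pairs sorted by time.
    """
    end = samples[-1][0]
    in_window = [v for t, v in samples if t >= end - window_seconds]
    return in_window[-1] - in_window[0]


def rate(samples, window_seconds):
    """Simplified rate(): the increase divided by the window length in seconds."""
    return increase(samples, window_seconds) / window_seconds


def topk(k, values):
    """Return the k (name, value) pairs with the highest values."""
    return sorted(values.items(), key=lambda kv: kv[1], reverse=True)[:k]


# Hypothetical counter samples scraped every 15 s on three servers.
series = {
    "web01": [(0, 100), (15, 130), (30, 160), (45, 190), (60, 220)],
    "web02": [(0, 50), (15, 50), (30, 56), (45, 56), (60, 62)],
    "db01": [(0, 10), (15, 310), (30, 610), (45, 910), (60, 1210)],
}

inc_1m = {host: increase(s, 60) for host, s in series.items()}
rate_1m = {host: rate(s, 60) for host, s in series.items()}

total = sum(inc_1m.values())                        # like sum(increase(...[1m]))
top2 = topk(2, rate_1m)                             # like topk(2, rate(...[1m]))
busy = sum(1 for v in inc_1m.values() if v > 100)   # like count(increase(...[1m]) > 100)

print(inc_1m, rate_1m, total, top2, busy)
```

With a one-minute window the rate is simply the increase divided by 60, which matches the formula above for N = 1.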