Overview

instance

- In Prometheus terms, an endpoint that can be scraped is called an instance.

job

- A collection of instances that serve the same purpose (for example, a process replicated for scalability or reliability) is called a job.

Example

    - job_name: 'pushgateway'
      honor_timestamps: true
      scrape_interval: 15s
      scrape_timeout: 10s
      metrics_path: /metrics
      scheme: http
      static_configs:
        - targets:
            - 172.20.70.205:9091
            - 172.20.70.205:9092
            - 172.20.70.215:9091

Automatically generated labels and time series

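When Prometheus scrapes a target, it automatically attaches labels to the scraped time series to identify that target:

- job: the configured job name the target belongs to.
- instance: the <host>:<port> part of the target's URL that was scraped.

For every scrape, Prometheus also stores a sample in each of the following time series:

- up{job="<job-name>", instance="<instance-id>"}: 1 if the instance is healthy (that is, reachable), 0 if the scrape failed.
  - Alert on `up == 0` to find instances whose scrapes are failing.
- scrape_duration_seconds{job="<job-name>", instance="<instance-id>"}: how long the scrape took, in seconds.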
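The `up == 0` alert mentioned above could be written as an alerting rule along the following lines. This is a minimal sketch, not a recommended production rule: the group name, alert name, `for` duration, and `severity` label are all assumptions chosen for illustration.

```yaml
# Hypothetical rule file, referenced from rule_files in prometheus.yml.
groups:
  - name: scrape-health            # assumed group name
    rules:
      - alert: TargetDown          # assumed alert name
        expr: up == 0              # the scrape of this job+instance failed
        for: 1m                    # assumed grace period; tune to the scrape interval
        labels:
          severity: warning        # assumed label
        annotations:
          summary: "Scrape of {{ $labels.instance }} (job {{ $labels.job }}) is failing"
```

The `for` clause keeps a single missed scrape from firing the alert immediately; with a 15s scrape interval, 1m corresponds to roughly four consecutive failures.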

Example

    scrape_duration_seconds{instance="172.20.70.205", job="blackbox-ssh"} 0.001817932
    scrape_duration_seconds{instance="172.20.70.205:3000", job="single-targets"} 0.005416658
    scrape_duration_seconds{instance="172.20.70.205:9091", job="pushgateway"} 0.002726714
    scrape_duration_seconds{instance="172.20.70.205:9092", job="pushgateway"} 0.000506256
    scrape_duration_seconds{instance="172.20.70.205:9100", job="single-targets"} 0.012790691
    scrape_duration_seconds{instance="172.20.70.205:9104", job="single-targets"} 0.021421043
    scrape_duration_seconds{instance="172.20.70.205:9115", job="blackbox-http-targets"} 0.00427973
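- Use: find the instances in a job that take the longest to scrape.
  - Common reasons a scrape is slow:
    - poor network quality
    - the target exposes too much metrics data
    - the Prometheus scraper has hit a bottleneck and needs to be scaled out
  - The five slowest job+instance pairs in the most recent scrape: `topk(5, scrape_duration_seconds)`
  - Scrapes that took longer than 3 seconds: `scrape_duration_seconds > 3`
- scrape_samples_post_metric_relabeling{job="<job-name>", instance="<instance-id>"}: the number of samples remaining after metric relabeling was applied.
  - What counts as a sample: put simply, one per unique label set (metric name included) in a scrape.
- scrape_samples_scraped{job="<job-name>", instance="<instance-id>"}: the number of samples the target exposed.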
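The queries and metrics in the list above can be combined into a small rule group. The sketch below is illustrative only: it reuses the 3-second threshold from the text, while the group name, rule names, and `for` duration are assumptions. The recording rule relies on both metrics carrying the same `job` and `instance` labels, so the subtraction matches one-to-one.

```yaml
groups:
  - name: scrape-cost                                # assumed group name
    rules:
      # Fires when a target repeatedly takes longer than 3 seconds to scrape.
      - alert: SlowScrape                            # assumed alert name
        expr: scrape_duration_seconds > 3
        for: 5m                                      # assumed; ignores one-off slow scrapes
        annotations:
          summary: "Scrape of {{ $labels.instance }} (job {{ $labels.job }}) took {{ $value }}s"
      # Samples dropped by metric relabeling, per job+instance.
      - record: instance:scrape_samples_dropped:diff # assumed rule name
        expr: scrape_samples_scraped - scrape_samples_post_metric_relabeling
```

A consistently non-zero value of the recorded series indicates that the target exposes samples that are scraped and then discarded.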

Example: `topk(5, scrape_samples_scraped)`

    scrape_samples_scraped{instance="172.20.70.205:9256", job="single-targets"} 1691
    scrape_samples_scraped{instance="172.20.70.215:9256", job="single-targets"} 1010
    scrape_samples_scraped{instance="172.20.70.205:9104", job="single-targets"} 816
    scrape_samples_scraped{instance="172.20.70.215:9100", job="single-targets"} 500
    scrape_samples_scraped{instance="172.20.70.205:9100", job="single-targets"} 500
- Use: break sample counts down by job+instance.

Ranked by job: `topk(5, sum(scrape_samples_scraped) by (job))`

    {job="single-targets"} 4957
    {job="redis_exporter_targets"} 299
    {job="pushgateway"} 102
    {job="blackbox-http-targets"} 72
    {job="blackbox-ssh"} 6
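If the per-job total above is queried often, it could be precomputed with a recording rule. This is a sketch under assumptions: the group name and the rule name (which follows the common `level:metric:operations` naming pattern) are made up for illustration.

```yaml
groups:
  - name: scrape-volume                          # assumed group name
    rules:
      # Total samples exposed per job; same result as
      # sum(scrape_samples_scraped) by (job).
      - record: job:scrape_samples_scraped:sum   # assumed rule name
        expr: sum by (job) (scrape_samples_scraped)
```

With such a rule loaded, the ranking becomes `topk(5, job:scrape_samples_scraped:sum)`.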
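- scrape_series_added{job="<job-name>", instance="<instance-id>"}: the approximate number of new series in this scrape. New in v2.10.
  - Use: track newly created series, which helps reveal write peaks.
  - In most cases, scraped samples are appended to existing series.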
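To watch for the write peaks described above, the new-series counts can be summed over a window. The following is a rough sketch: the one-hour window, group name, and rule name are assumptions, and because `scrape_series_added` is itself approximate, the result should be read as an estimate.

```yaml
groups:
  - name: series-churn                            # assumed group name
    rules:
      # Approximate number of new series created per job in the last hour.
      - record: job:scrape_series_added:sum1h     # assumed rule name
        expr: sum by (job) (sum_over_time(scrape_series_added[1h]))
```

A job whose value stays high over time is creating new series continuously rather than appending to existing ones.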

Prometheus special labels

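- __address__: the address of the endpoint being scraped
- __name__: the metric name
- instance: the final label identifying the endpoint; if relabeling does not set it, it defaults to the value of __address__
- job: the job name
- __metrics_path__: the HTTP path that is scraped, for example /metrics or /cadvisor/metrics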
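The sketch below shows where these special labels come into play in a scrape config. It reuses a job name and target that appear in the examples above; the relabeling rules themselves, including the `go_gc_.*` pattern, are hypothetical and only illustrate the mechanism.

```yaml
scrape_configs:
  - job_name: 'single-targets'              # becomes the job label
    metrics_path: /metrics                  # becomes __metrics_path__
    static_configs:
      - targets: ['172.20.70.205:9100']     # becomes __address__
    relabel_configs:
      # Before the scrape: set instance to the host part of __address__,
      # dropping the port. Without this rule, instance defaults to __address__.
      - source_labels: [__address__]
        regex: '([^:]+):\d+'
        target_label: instance
        replacement: '$1'
    metric_relabel_configs:
      # After the scrape: drop series whose metric name matches a pattern
      # (hypothetical pattern, chosen only for illustration).
      - source_labels: [__name__]
        regex: 'go_gc_.*'
        action: drop
```

Rules under `relabel_configs` act on the target's labels before it is scraped, while `metric_relabel_configs` acts on the scraped samples, which is why dropped series reduce scrape_samples_post_metric_relabeling but not scrape_samples_scraped.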