Monitoring - Prometheus Deployment - 《运维大世界》

title: Prometheus Deployment # title
tags: Prometheus # tags
date: 2020-08-26
categories: Monitoring # category

Day one of learning Prometheus systematically: follow along and deploy it with me!
See the official documentation for reference.

Download and set up the directories

$ wget https://github.com/prometheus/prometheus/releases/download/v2.20.1/prometheus-2.20.1.linux-amd64.tar.gz
$ tar zxf prometheus-2.20.1.linux-amd64.tar.gz -C /opt/
$ ln -sf /opt/prometheus-2.20.1.linux-amd64 /opt/prometheus
$ ln -sf /opt/prometheus-2.20.1.linux-amd64/prometheus /usr/local/bin/
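
To make a later version bump easier, the download URL above can be built from a version number. This is only a minimal sketch: the helper name `prom_release_url` and its `platform` argument are hypothetical, and the URL pattern is simply copied from the wget command above, so other versions or platforms are an assumption.

```shell
# Hypothetical helper: build the release tarball URL for a given version.
# The pattern mirrors the wget command in this post; it is an assumption
# that other versions and platforms follow the same naming scheme.
prom_release_url() {
    version="$1"
    platform="${2:-linux-amd64}"
    printf 'https://github.com/prometheus/prometheus/releases/download/v%s/prometheus-%s.%s.tar.gz\n' \
        "$version" "$version" "$platform"
}

# Prints the same URL that the wget command above downloads.
prom_release_url 2.20.1
```

The result could then be passed to wget, for example `wget "$(prom_release_url 2.20.1)"`.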

Examining the configuration file

$ cd /opt/prometheus
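$ cat prometheus.yml       # view the default configuration file; lines starting with # are comments
# The configuration file has four top-level blocks: global, alerting, rule_files, and scrape_configs.
# The global block usually needs no changes.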
# my global config
global:
# Scrape monitoring data every 15s
  scrape_interval:     15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).
# Alertmanager configuration
alerting:
  alertmanagers:
  - static_configs:
    - targets:
      # - alertmanager:9093
# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
# The rule_files block lists the locations of any alerting rules we want the Prometheus server to load.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"
# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
# scrape_configs controls which resources Prometheus monitors.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    static_configs:
    - targets: ['192.168.20.10:9090']   # replace localhost with this host's address here, otherwise some minor bugs appear

The configuration file supports many more options; reading the official configuration documentation is recommended.
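
Before starting or reloading Prometheus with an edited file, the configuration can be validated with `promtool check config`. A minimal sketch follows; the wrapper name `check_prom_config` and the `PROMTOOL` variable are hypothetical, and the path `/opt/prometheus/promtool` assumes that promtool was unpacked from the same tarball as above.

```shell
# Assumption: promtool sits next to the prometheus binary unpacked earlier.
PROMTOOL="${PROMTOOL:-/opt/prometheus/promtool}"

# Hypothetical wrapper: validate a Prometheus configuration file.
# Returns 0 if promtool accepts the file, 1 otherwise.
check_prom_config() {
    config="${1:-/opt/prometheus/prometheus.yml}"
    if "$PROMTOOL" check config "$config"; then
        echo "config OK: $config"
    else
        echo "config INVALID: $config" >&2
        return 1
    fi
}
```

Running `check_prom_config` with no argument checks the default file used in this post; a typo such as `job_nae` should be reported here instead of at reload time.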

Starting and stopping Prometheus

# Start Prometheus
$ prometheus --config.file="/opt/prometheus/prometheus.yml" --web.enable-lifecycle --storage.tsdb.path=/opt/prometheus/data --storage.tsdb.retention.time=30d &
# --web.enable-lifecycle  enables reloading the configuration and shutting down through the HTTP API
# --storage.tsdb.path=/opt/prometheus/data   sets the data storage directory; the default is data/ under the current directory
# --storage.tsdb.retention.time     sets how long data is retained; the default is 15d
$ pkill prometheus          # stop Prometheus
# Reload Prometheus
# Send a HUP signal to the Prometheus process from the command line
$ kill -HUP $(pgrep -x prometheus)    # pgrep -x matches only processes named exactly "prometheus"
# Reload through the HTTP API; requires starting with --web.enable-lifecycle
$ curl -XPOST http://192.168.20.10:9090/-/reload
# If the configuration file contains an error, the reload fails with an error message similar to the following
ERRO[0161] Error reloading config: couldn't load configuration (-config.file=prometheus.yml): unknown fields in scrape_config: job_nae  source=main.go:146

Output like the following indicates a successful start:

Prometheus Deployment - Figure 1

$ ss -lnput | grep 9090    # confirm that the port is listening
tcp    LISTEN     0      128      :::9090                 :::*                   users:(("prometheus",pid=16429,fd=11))

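Accessing the web UI

Open port 9090 on the Prometheus host in a browser and click as shown below to see information about the monitored host (by default, Prometheus monitors itself):
Prometheus Deployment - Figure 2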

Management API

Health check

$ curl  http://192.168.20.10:9090/-/healthy

Readiness check

$ curl  http://192.168.20.10:9090/-/ready
Prometheus is Ready.
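
The readiness endpoint is handy in scripts that must wait for Prometheus to finish starting. A minimal sketch, assuming a hypothetical helper `wait_for_prometheus`; the default address is the one used in this post, while the retry count and delay are arbitrary assumptions.

```shell
# Hypothetical helper: poll /-/ready until Prometheus reports ready.
# Arguments (all optional): base URL, number of attempts, delay in seconds.
wait_for_prometheus() {
    base_url="${1:-http://192.168.20.10:9090}"
    retries="${2:-30}"
    delay="${3:-1}"
    i=0
    while [ "$i" -lt "$retries" ]; do
        # -f makes curl exit non-zero on HTTP error responses,
        # so a server that is up but not yet ready counts as a failure.
        if curl -fsS -o /dev/null "$base_url/-/ready" 2>/dev/null; then
            echo "ready after $i retries"
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    echo "not ready after $retries attempts" >&2
    return 1
}
```

A start-up script could call `wait_for_prometheus` right after launching the server and continue only when it returns 0.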

Reloading the configuration file

$ curl -XPOST http://192.168.20.10:9090/-/reload
# Requires the --web.enable-lifecycle flag at startup.
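
The plain curl call above does not make it obvious whether the reload succeeded. One way to check is to inspect the HTTP status code. This is a sketch under assumptions: the helper name `reload_prometheus` is hypothetical, and it treats any status other than 200 as a failure without interpreting the specific code.

```shell
# Hypothetical helper: trigger a reload and report success based on the HTTP status.
# Argument (optional): base URL of the Prometheus server.
reload_prometheus() {
    base_url="${1:-http://192.168.20.10:9090}"
    # -w prints only the HTTP status code; the response body is discarded.
    status="$(curl -s -o /dev/null -w '%{http_code}' -X POST "$base_url/-/reload")"
    if [ "$status" = "200" ]; then
        echo "reload OK"
    else
        echo "reload failed (HTTP $status)" >&2
        return 1
    fi
}
```

On failure, the Prometheus log should contain the detailed reason, such as the configuration error shown earlier.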

Stopping Prometheus

$ curl -XPUT http://192.168.20.10:9090/-/quit
# or
$ curl -XPOST http://192.168.20.10:9090/-/quit
# This also requires the --web.enable-lifecycle flag at startup.