Introduction - First Steps with Prometheus - Prometheus Chinese Documentation

Downloading Prometheus
Configuring Prometheus
Starting Prometheus
Using the expression browser
Using the graphing interface
Monitoring other targets
Summary

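Welcome to Prometheus! Prometheus is a monitoring platform that collects metrics from monitored targets by scraping HTTP metrics endpoints on those targets. This guide shows you how to install and configure Prometheus and use it to monitor your first resource. You'll download, install, and run Prometheus. You'll also need an exporter, a tool that exposes time series data (that is, monitoring data) about hosts and services. Our first exporter is Prometheus itself, which provides a variety of host-level metrics such as memory usage and garbage collection.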
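To make the idea of an exporter concrete, here is a minimal sketch of a process that exposes metrics over HTTP in the Prometheus text exposition format. It uses only the Python standard library. The metric `demo_requests_total`, its values, the helper names, and port 8000 are all hypothetical. A real exporter would normally use a client library, such as the official Python `prometheus_client`, rather than writing the format by hand.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical sample data: metric name -> list of (labels, value) pairs.
SAMPLES = {
    "demo_requests_total": [({"code": "200"}, 42.0), ({"code": "500"}, 3.0)],
}


def render_exposition(samples):
    """Render samples as Prometheus text exposition format (simplified).

    Every metric is declared as a counter, and label values are not escaped,
    so values containing quotes, backslashes or newlines are not supported.
    """
    lines = []
    for name, series in samples.items():
        lines.append(f"# TYPE {name} counter")
        for labels, value in series:
            if labels:
                pairs = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
                lines.append(f"{name}{{{pairs}}} {value}")
            else:
                lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    """Serves SAMPLES on /metrics, the path Prometheus scrapes by default."""

    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_exposition(SAMPLES).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Block forever, answering scrapes on http://localhost:<port>/metrics."""
    HTTPServer(("localhost", port), MetricsHandler).serve_forever()
```

Calling `serve()` and listing `localhost:8000` under a job's `targets` would, under these assumptions, let Prometheus scrape the two sample series.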

Downloading Prometheus

Download the latest release of Prometheus for your platform, then extract it:

```shell
tar xvfz prometheus-*.tar.gz
cd prometheus-*
```

The Prometheus server is a single binary called `prometheus` (or `prometheus.exe` on Windows). You can run it with the `--help` flag to see help on its options.

```shell
./prometheus --help
usage: prometheus [<flags>]
The Prometheus monitoring server
. . .
```

Before starting Prometheus, let's configure it.

Configuring Prometheus

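Prometheus is configured in YAML. The Prometheus download includes a sample configuration file named `prometheus.yml`, which is a good starting point.
We've stripped most of the comments from the sample file here to make it more concise (comments are the lines prefixed with `#`).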

```yaml
global:
  scrape_interval:     15s
  evaluation_interval: 15s
rule_files:
  # - "first.rules"
  # - "second.rules"
scrape_configs:
  - job_name: prometheus
    static_configs:
      - targets: ['localhost:9090']
```

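The sample configuration file contains three blocks of configuration: `global`, `rule_files`, and `scrape_configs`.

The `global` block controls the Prometheus server's global configuration. It sets two options. The first, `scrape_interval`, controls how often Prometheus scrapes targets. You can override it for an individual scrape job. Here the global setting is to scrape every 15 seconds. The second, `evaluation_interval`, controls how often Prometheus evaluates rules. Prometheus uses rules to create new time series and to generate alerts.

The `rule_files` block specifies the location of any rules we want the Prometheus server to load. For now we have no rules.

The last block, `scrape_configs`, controls which resources Prometheus monitors. Prometheus exposes data about itself on an HTTP endpoint, so it can scrape and monitor its own health. The default configuration has a single job, named `prometheus`, which scrapes the time series data exposed by the Prometheus server itself. The job contains a single, statically configured target (`static_configs`): port 9090 on `localhost` (`localhost:9090`). Prometheus expects to find metrics on targets at the path `/metrics`, so this default job scrapes the URL `http://localhost:9090/metrics`.

The time series data returned describes the state and performance of the Prometheus server.

For a complete specification of configuration options, see the [configuration documentation](https://prometheus.io/docs/prometheus/latest/configuration/configuration/).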
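The scrape rules described above can be modeled in a few lines of Python. Per-job settings override `global`, and each target is scraped at `/metrics`. In this sketch, `CONFIG` is a hypothetical in-memory copy of the YAML shown earlier, and `effective_scrape_targets` is an illustrative helper only. Prometheus itself parses the YAML file. The fallbacks used match Prometheus's documented defaults: the `http` scheme, the `/metrics` path, and a global `scrape_interval` of `1m` when none is set.

```python
# Hypothetical Python mirror of the sample prometheus.yml shown above.
CONFIG = {
    "global": {"scrape_interval": "15s", "evaluation_interval": "15s"},
    "scrape_configs": [
        {"job_name": "prometheus",
         "static_configs": [{"targets": ["localhost:9090"]}]},
    ],
}


def effective_scrape_targets(config):
    """Yield (job_name, scrape_url, scrape_interval) for every static target.

    A job's own scrape_interval wins over the global one; scheme and
    metrics_path fall back to Prometheus's defaults ("http", "/metrics").
    """
    global_interval = config.get("global", {}).get("scrape_interval", "1m")
    for job in config.get("scrape_configs", []):
        scheme = job.get("scheme", "http")
        path = job.get("metrics_path", "/metrics")
        interval = job.get("scrape_interval", global_interval)
        for static in job.get("static_configs", []):
            for target in static.get("targets", []):
                yield job["job_name"], f"{scheme}://{target}{path}", interval
```

For the sample configuration, iterating `effective_scrape_targets(CONFIG)` yields the single tuple `("prometheus", "http://localhost:9090/metrics", "15s")`.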

Starting Prometheus

To start Prometheus with the configuration file above, change to the directory containing the Prometheus binary and run:

```shell
./prometheus --config.file=prometheus.yml
```

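Prometheus should now be running. You can browse to a status page about itself at `http://localhost:9090`. Give it about 30 seconds to collect data about itself from its own HTTP metrics endpoint.

You can also verify that Prometheus is serving metrics about itself by visiting `http://localhost:9090/metrics`.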
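The same check can be scripted. The sketch below downloads the metrics page with `urllib` and parses it with `parse_exposition`, a hypothetical and deliberately simplified helper. It assumes Prometheus is listening on `localhost:9090` as configured above. The parser skips comment lines and ignores the optional per-sample timestamp.

```python
import urllib.request


def fetch_metrics(url="http://localhost:9090/metrics", timeout=5.0):
    """Download the plain-text metrics page exposed by a target."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def parse_exposition(text):
    """Return (series, value) pairs from text exposition format (simplified).

    Skips blank lines and '#' comment lines (HELP/TYPE), and drops the
    optional trailing timestamp. `series` is kept as the raw string,
    e.g. 'promhttp_metric_handler_requests_total{code="200"}'.
    """
    pairs = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        if "{" in line:
            # The label block ends at the last '}' on the line.
            end = line.rindex("}") + 1
            series, rest = line[:end], line[end:].split()
        else:
            series, *rest = line.split()
        pairs.append((series, float(rest[0])))  # float() accepts NaN and +Inf
    return pairs
```

With the server running, `parse_exposition(fetch_metrics())` should include entries whose series string starts with `promhttp_metric_handler_requests_total`.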

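Using the expression browser

Let's explore the data that Prometheus has collected about itself. To use Prometheus's built-in expression browser, navigate to `http://localhost:9090/graph` and select the Graph tab.
As you can see at `http://localhost:9090/metrics`, one metric that Prometheus exposes about itself is `promhttp_metric_handler_requests_total` (the total number of requests the Prometheus server has served on its `/metrics` endpoint). Go ahead and enter it into the expression console: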

```
promhttp_metric_handler_requests_total
```

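This returns several different time series (along with the latest value recorded for each). All of them have the metric name `promhttp_metric_handler_requests_total` but different labels. These labels designate different request statuses: the `code` label holds the HTTP status code.
If we are only interested in requests that resulted in HTTP status code `200`, we can use this query to retrieve that information: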
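Outside the browser, the same expression can be evaluated through Prometheus's HTTP API at `/api/v1/query`, which returns JSON. Here is a minimal standard-library sketch, assuming the server from this guide runs on `localhost:9090`. The helper names are hypothetical. Only instant-vector results, the kind this selector returns, are handled.

```python
import json
import urllib.parse
import urllib.request


def instant_query_url(expr, base="http://localhost:9090"):
    """Build a GET URL for the instant-query endpoint of the HTTP API."""
    return f"{base}/api/v1/query?" + urllib.parse.urlencode({"query": expr})


def series_from_response(payload):
    """Extract (labels, latest value) pairs from an instant-vector response.

    The API returns sample values as strings, so they are converted to float.
    Scalar or string results have a different shape and are not handled.
    """
    if payload.get("status") != "success":
        raise RuntimeError(payload.get("error", "query failed"))
    return [(item["metric"], float(item["value"][1]))
            for item in payload["data"]["result"]]


def instant_query(expr, base="http://localhost:9090", timeout=5.0):
    """Run `expr` against a live server (urllib raises on HTTP errors)."""
    with urllib.request.urlopen(instant_query_url(expr, base), timeout=timeout) as resp:
        return series_from_response(json.load(resp))
```

Against a running server, `instant_query("promhttp_metric_handler_requests_total")` should return one entry per `code` label value. Each labels dict also carries `__name__` plus the `job` and `instance` labels that Prometheus attaches when scraping.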

```
promhttp_metric_handler_requests_total{code="200"}
```
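Conceptually, `{code="200"}` is an equality label matcher: it keeps only the series whose `code` label has exactly that value. Below is a small Python model of this behavior (the helper `match_equal` is hypothetical), applied to sample series whose values are invented for illustration.

```python
# Hypothetical series as (labels, latest value); the values are invented.
SERIES = [
    ({"__name__": "promhttp_metric_handler_requests_total", "code": "200"}, 120.0),
    ({"__name__": "promhttp_metric_handler_requests_total", "code": "500"}, 0.0),
    ({"__name__": "promhttp_metric_handler_requests_total", "code": "503"}, 0.0),
]


def match_equal(series, **matchers):
    """Keep series whose labels equal every given value, like {label="value"}.

    As in PromQL, a label that is absent is treated as the empty string,
    so match_equal(series, foo="") also keeps series without a `foo` label.
    """
    return [(labels, value) for labels, value in series
            if all(labels.get(name, "") == want for name, want in matchers.items())]
```

Here `match_equal(SERIES, code="200")` keeps only the first entry. A value that no series carries, such as `code="404"`, yields an empty list.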

To count the number of returned time series, you can write:

```
count(promhttp_metric_handler_requests_total)
```
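Note that `count()` counts series, not requests. It returns how many time series matched, regardless of their values, whereas `sum()` adds the values up. Here is a hypothetical Python model of the difference, using invented sample values.

```python
# Hypothetical series as (labels, latest value); the values are invented.
SERIES = [
    ({"code": "200"}, 120.0),
    ({"code": "500"}, 0.0),
    ({"code": "503"}, 0.0),
]


def promql_count(series):
    """Model of count(v): how many series are in the vector.

    PromQL returns an empty result (not 0) when v is empty; this sketch
    simply returns 0 in that case.
    """
    return len(series)


def promql_sum(series):
    """Model of sum(v): the sample values added together across series."""
    return sum(value for _, value in series)
```

On these samples, `promql_count(SERIES)` gives `3` while `promql_sum(SERIES)` gives `120.0`.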

For more about the expression language, see the expression language documentation.

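Using the graphing interface

To graph expressions, navigate to `http://localhost:9090/graph` and select the Graph tab.
For example, enter the following expression to graph the per-second rate of requests with status code 200 to Prometheus's own `/metrics` endpoint, averaged over the last minute: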

```
rate(promhttp_metric_handler_requests_total{code="200"}[1m])
```
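To see what `rate()` computes, here is a simplified Python model; the name `simple_rate` is hypothetical. It takes the counter samples that fall inside the window and adds up the increases between consecutive samples. Like Prometheus, it treats a drop in value as a counter reset. It then divides the total increase by the time covered. Prometheus's real `rate()` also extrapolates to the edges of the `[1m]` window, so its results can differ slightly.

```python
def simple_rate(samples):
    """Per-second increase of a counter from (timestamp_seconds, value) samples.

    Samples must be sorted by strictly increasing timestamp. Unlike PromQL's
    rate(), no extrapolation to the window boundaries is performed.
    """
    if len(samples) < 2:
        return None  # rate() needs at least two samples in the window
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        # A drop means the counter restarted from zero (e.g. a process restart),
        # so the whole current value counts as new increase.
        increase += (cur - prev) if cur >= prev else cur
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed
```

Suppose a counter is scraped every 15 seconds, giving samples `[(0, 100), (15, 130), (30, 160), (45, 190)]`. The increase is 90 over 45 seconds, so `simple_rate` returns `2.0` requests per second.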

You can experiment with the graph's time range parameters and other settings.

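Monitoring other targets

Collecting metrics from Prometheus alone doesn't show off much of what Prometheus can do. To get a better sense of its capabilities, we recommend exploring the documentation for other exporters. For newcomers, the guide to monitoring Linux or macOS host metrics with the node exporter is a good place to start.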

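Summary

In this guide, you installed Prometheus, configured a Prometheus instance to monitor resources, and learned some basics of working with time series data in Prometheus's expression browser. To continue learning about Prometheus, check out the Overview for some ideas about what to explore next.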