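Installation
Official download page: https://prometheus.io/download/
GitHub releases: https://github.com/prometheus/prometheus/releases
Installing from a binary release
Choose the download that matches the host's operating system and CPU architecture.
On Linux: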
$ wget https://github.com/prometheus/prometheus/releases/download/v2.21.0/prometheus-2.21.0.linux-amd64.tar.gz
$ tar -zxvf prometheus-2.21.0.linux-amd64.tar.gz
On macOS, choose the darwin-amd64 build:
$ wget https://github.com/prometheus/prometheus/releases/download/v2.17.1/prometheus-2.17.1.darwin-amd64.tar.gz
$ tar -zxvf prometheus-2.17.1.darwin-amd64.tar.gz -C ./
$ cd prometheus-2.17.1.darwin-amd64
$ tree
.
├── LICENSE
├── NOTICE
├── console_libraries
│ ├── menu.lib
│ └── prom.lib
├── consoles
│ ├── index.html.example
│ ├── node-cpu.html
│ ├── node-disk.html
│ ├── node-overview.html
│ ├── node.html
│ ├── prometheus-overview.html
│ └── prometheus.html
├── prometheus
├── prometheus.yml
├── promtool
└── tsdb
2 directories, 15 files
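The choice of download file described above can be sketched as a small script. The function names `prometheus_tarball_url`, `detect_os`, and `detect_arch` are hypothetical helpers introduced only for illustration; the URL pattern is copied from the wget commands above. Whether a given OS/architecture combination is actually published for a given version should be checked on the releases page.

```shell
#!/bin/sh
# Hypothetical helpers (not part of Prometheus): build the release tarball URL
# following the pattern used by the wget commands in this section.
prometheus_tarball_url() {
  version="$1"; os="$2"; arch="$3"
  echo "https://github.com/prometheus/prometheus/releases/download/v${version}/prometheus-${version}.${os}-${arch}.tar.gz"
}

# Lower-case kernel name as reported by uname, e.g. "linux" or "darwin".
detect_os() {
  uname -s | tr '[:upper:]' '[:lower:]'
}

# Map the machine name reported by uname to the suffix used in release file names.
detect_arch() {
  case "$(uname -m)" in
    x86_64|amd64)  echo amd64 ;;
    aarch64|arm64) echo arm64 ;;
    *) echo "unhandled architecture: $(uname -m)" >&2; return 1 ;;
  esac
}

# Print the URL for this host; 2.21.0 is the version from the Linux example.
prometheus_tarball_url 2.21.0 "$(detect_os)" "$(detect_arch)"
```

The printed URL can then be passed to wget and the archive unpacked with tar, as in the commands above.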
Installing with Docker
Docker Hub page: https://hub.docker.com/r/prom/prometheus/. Run the docker command below, then open http://localhost:9090/ to reach the Prometheus service running inside the container.
$ docker run --name prometheus -d -p 127.0.0.1:9090:9090 prom/prometheus
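After the container starts, it can take a moment before the server answers requests. The following is a minimal sketch of a readiness check. It assumes curl is installed and uses the /-/ready endpoint of the Prometheus server; the function name `wait_for_prometheus` and its defaults are made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: poll the readiness endpoint until it answers or we give up.
# $1 = base URL (default http://localhost:9090), $2 = number of attempts (default 30)
wait_for_prometheus() {
  base_url="${1:-http://localhost:9090}"
  attempts="${2:-30}"
  i=1
  while [ "$i" -le "$attempts" ]; do
    # -f makes curl return non-zero on HTTP errors such as 503 (not ready yet).
    if curl -fsS --max-time 2 -o /dev/null "$base_url/-/ready" 2>/dev/null; then
      echo "Prometheus is ready at $base_url"
      return 0
    fi
    # Wait one second between attempts, but not after the last one.
    if [ "$i" -lt "$attempts" ]; then
      sleep 1
    fi
    i=$((i + 1))
  done
  echo "Prometheus did not become ready at $base_url" >&2
  return 1
}
```

For example, `wait_for_prometheus http://localhost:9090 10` waits up to roughly ten seconds for the container started above.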
Official Dockerfile: https://github.com/prometheus/prometheus/blob/master/Dockerfile
Mounting a local configuration file
docker run \
-p 9090:9090 \
-v /tmp/prometheus.yml:/etc/prometheus/prometheus.yml \
prom/prometheus
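The bind mount above expects /tmp/prometheus.yml to exist on the host before the container starts. A minimal sketch of creating one is shown here. The contents are an assumption for illustration (a single job in which Prometheus scrapes itself), not the file shipped with the image; `CONFIG_PATH` is a variable introduced only for this example.

```shell
#!/bin/sh
# Path on the host that the docker command above mounts into the container.
CONFIG_PATH="${CONFIG_PATH:-/tmp/prometheus.yml}"

# Write a minimal configuration: one scrape job that targets Prometheus itself.
cat > "$CONFIG_PATH" <<'EOF'
global:
  scrape_interval: 15s        # how often targets are scraped
scrape_configs:
  - job_name: "prometheus"
    static_configs:
      - targets: ["localhost:9090"]
EOF

echo "wrote $CONFIG_PATH"
```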
Building from source
$ mkdir -p $GOPATH/src/github.com/prometheus
$ cd $GOPATH/src/github.com/prometheus
$ git clone https://github.com/prometheus/prometheus.git
$ cd prometheus
$ make build
$ ./prometheus --config.file=your_config.yml
Startup flags
Help output
$ ./prometheus -h
usage: prometheus [<flags>]
The Prometheus monitoring server
Flags:
-h, --help Show context-sensitive help (also try --help-long and --help-man).
--version Show application version.
--config.file="prometheus.yml"
Prometheus configuration file path.
--web.listen-address="0.0.0.0:9090"
Address to listen on for UI, API, and telemetry.
--web.read-timeout=5m Maximum duration before timing out read of the request, and closing idle connections.
--web.max-connections=512 Maximum number of simultaneous connections.
--web.external-url=<URL> The URL under which Prometheus is externally reachable (for example, if Prometheus is served via a
reverse proxy). Used for generating relative and absolute links back to Prometheus itself. If the URL
has a path portion, it will be used to prefix all HTTP endpoints served by Prometheus. If omitted,
relevant URL components will be derived automatically.
--web.route-prefix=<path> Prefix for the internal routes of web endpoints. Defaults to path of --web.external-url.
--web.user-assets=<path> Path to static asset directory, available at /user.
--web.enable-lifecycle Enable shutdown and reload via HTTP request.
--web.enable-admin-api Enable API endpoints for admin control actions.
--web.console.templates="consoles"
Path to the console template directory, available at /consoles.
--web.console.libraries="console_libraries"
Path to the console library directory.
--web.page-title="Prometheus Time Series Collection and Processing Server"
Document title of Prometheus instance.
--web.cors.origin=".*" Regex for CORS origin. It is fully anchored. Example: 'https?://(domain1|domain2)\.com'
--storage.tsdb.path="data/"
Base path for metrics storage.
--storage.tsdb.retention=STORAGE.TSDB.RETENTION
[DEPRECATED] How long to retain samples in storage. This flag has been deprecated, use
"storage.tsdb.retention.time" instead.
--storage.tsdb.retention.time=STORAGE.TSDB.RETENTION.TIME
How long to retain samples in storage. When this flag is set it overrides "storage.tsdb.retention". If
neither this flag nor "storage.tsdb.retention" nor "storage.tsdb.retention.size" is set, the retention
time defaults to 15d. Units Supported: y, w, d, h, m, s, ms.
--storage.tsdb.retention.size=STORAGE.TSDB.RETENTION.SIZE
[EXPERIMENTAL] Maximum number of bytes that can be stored for blocks. Units supported: KB, MB, GB, TB,
PB. This flag is experimental and can be changed in future releases.
--storage.tsdb.no-lockfile
Do not create lockfile in data directory.
--storage.tsdb.allow-overlapping-blocks
[EXPERIMENTAL] Allow overlapping blocks, which in turn enables vertical compaction and vertical query
merge.
--storage.tsdb.wal-compression
Compress the tsdb WAL.
--storage.remote.flush-deadline=<duration>
How long to wait flushing sample on shutdown or config reload.
--storage.remote.read-sample-limit=5e7
Maximum overall number of samples to return via the remote read interface, in a single query. 0 means no
limit. This limit is ignored for streamed response types.
--storage.remote.read-concurrent-limit=10
Maximum number of concurrent remote read calls. 0 means no limit.
--storage.remote.read-max-bytes-in-frame=1048576
Maximum number of bytes in a single frame for streaming remote read response types before marshalling.
Note that client might have limit on frame size as well. 1MB as recommended by protobuf by default.
--rules.alert.for-outage-tolerance=1h
Max time to tolerate prometheus outage for restoring "for" state of alert.
--rules.alert.for-grace-period=10m
Minimum duration between alert and restored "for" state. This is maintained only for alerts with
configured "for" time greater than grace period.
--rules.alert.resend-delay=1m
Minimum amount of time to wait before resending an alert to Alertmanager.
--alertmanager.notification-queue-capacity=10000
The capacity of the queue for pending Alertmanager notifications.
--alertmanager.timeout=10s
Timeout for sending alerts to Alertmanager.
--query.lookback-delta=5m The maximum lookback duration for retrieving metrics during expression evaluations and federation.
--query.timeout=2m Maximum time a query may take before being aborted.
--query.max-concurrency=20
Maximum number of queries executed concurrently.
--query.max-samples=50000000
Maximum number of samples a single query can load into memory. Note that queries will fail if they try
to load more samples than this into memory, so this also limits the number of samples a query can
return.
--log.level=info Only log messages with the given severity or above. One of: [debug, info, warn, error]
--log.format=logfmt Output format of log messages. One of: [logfmt, json]
Data storage
The default storage path is data/. Use the --storage.tsdb.path="data/" flag to change where local data is stored.
mkdir -p data
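As a sketch of how the storage flags combine, the script below creates a data directory and, if a prometheus binary is present in the current directory, starts it with that path and an explicit retention time. Both flags appear in the help output above; the `DATA_DIR` and `RETENTION` variables and the 30d value are assumptions chosen for the example.

```shell
#!/bin/sh
# Directory for the local TSDB and how long samples are kept (example values).
DATA_DIR="${DATA_DIR:-data}"
RETENTION="${RETENTION:-30d}"

mkdir -p "$DATA_DIR"

if [ -x ./prometheus ]; then
  # Runs in the foreground until interrupted.
  ./prometheus \
    --storage.tsdb.path="$DATA_DIR" \
    --storage.tsdb.retention.time="$RETENTION"
else
  echo "no ./prometheus binary in $(pwd); created $DATA_DIR only" >&2
fi
```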
Configuration file
When Prometheus is installed from the binary tarball, the extracted directory contains the default configuration file, prometheus.yml:
$ ls -l
total 161140
drwxr-xr-x 2 ubuntu ubuntu 4096 Sep 11 21:29 console_libraries
drwxr-xr-x 2 ubuntu ubuntu 4096 Sep 11 21:29 consoles
-rw-r--r-- 1 ubuntu ubuntu 11357 Sep 11 21:29 LICENSE
-rw-r--r-- 1 ubuntu ubuntu 3420 Sep 11 21:29 NOTICE
-rwxr-xr-x 1 ubuntu ubuntu 88471209 Sep 11 19:37 prometheus
-rw-r--r-- 1 ubuntu ubuntu 926 Sep 11 21:29 prometheus.yml
-rwxr-xr-x 1 ubuntu ubuntu 76493104 Sep 11 19:39 promtool
Loading a custom configuration file
./prometheus --config.file=your_config.yml
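Before starting the server with a custom file, it can help to validate the file with the promtool binary that ships in the same directory. The sketch below assumes both binaries are in the current directory; the wrapper function `start_with_config` is hypothetical.

```shell
#!/bin/sh
# Hypothetical wrapper: validate a configuration file, then start Prometheus with it.
start_with_config() {
  config="$1"
  if [ ! -f "$config" ]; then
    echo "configuration file not found: $config" >&2
    return 1
  fi
  # "promtool check config" exits non-zero if the file does not parse or validate.
  ./promtool check config "$config" || return 1
  ./prometheus --config.file="$config"
}
```

Calling `start_with_config your_config.yml` then only starts the server if the check passes.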
Default startup
Start the Prometheus server. By default it loads the prometheus.yml file in the current directory.
$ ./prometheus
Startup log
level=info ts=2020-10-11T06:36:14.389Z caller=main.go:310 msg="No time or size retention was set so using the default time retention" duration=15d
level=info ts=2020-10-11T06:36:14.389Z caller=main.go:346 msg="Starting Prometheus" version="(version=2.21.0, branch=HEAD, revision=e83ef207b6c2398919b69cd87d2693cfc2fb4127)"
level=info ts=2020-10-11T06:36:14.389Z caller=main.go:347 build_context="(go=go1.15.2, user=root@a4d9bea8479e, date=20200911-11:35:02)"
level=info ts=2020-10-11T06:36:14.389Z caller=main.go:348 host_details="(Linux 4.15.0-54-generic #58-Ubuntu SMP Mon Jun 24 10:55:24 UTC 2019 x86_64 VM-0-2-ubuntu (none))"
level=info ts=2020-10-11T06:36:14.389Z caller=main.go:349 fd_limits="(soft=1024, hard=1048576)"
level=info ts=2020-10-11T06:36:14.389Z caller=main.go:350 vm_limits="(soft=unlimited, hard=unlimited)"
level=info ts=2020-10-11T06:36:14.396Z caller=main.go:701 msg="Starting TSDB ..."
level=info ts=2020-10-11T06:36:14.397Z caller=web.go:523 component=web msg="Start listening for connections" address=0.0.0.0:9090
level=info ts=2020-10-11T06:36:14.402Z caller=head.go:644 component=tsdb msg="Replaying on-disk memory mappable chunks if any"
level=info ts=2020-10-11T06:36:14.402Z caller=head.go:658 component=tsdb msg="On-disk memory mappable chunks replay completed" duration=4.345µs
level=info ts=2020-10-11T06:36:14.403Z caller=head.go:664 component=tsdb msg="Replaying WAL, this may take a while"
level=info ts=2020-10-11T06:36:14.403Z caller=head.go:716 component=tsdb msg="WAL segment loaded" segment=0 maxSegment=0
level=info ts=2020-10-11T06:36:14.403Z caller=head.go:719 component=tsdb msg="WAL replay completed" checkpoint_replay_duration=32.293µs wal_replay_duration=194.417µs total_replay_duration=277.95µs
level=info ts=2020-10-11T06:36:14.404Z caller=main.go:721 fs_type=EXT4_SUPER_MAGIC
level=info ts=2020-10-11T06:36:14.404Z caller=main.go:724 msg="TSDB started"
level=info ts=2020-10-11T06:36:14.404Z caller=main.go:850 msg="Loading configuration file" filename=prometheus.yml
level=info ts=2020-10-11T06:36:14.409Z caller=main.go:881 msg="Completed loading of configuration file" filename=prometheus.yml totalDuration=4.560441ms remote_storage=2.148µs web_handler=676ns query_engine=1.153µs scrape=2.568956ms scrape_sd=36.388µs notify=1.581758ms notify_sd=47.157µs rules=1.829µs
level=info ts=2020-10-11T06:36:14.409Z caller=main.go:673 msg="Server is ready to receive web requests."
Expression Browser
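The --web.listen-address flag sets the address that serves the expression browser. For example, with --web.listen-address=:9090 the expression browser is available at http://serverip:9090/graph.
Once startup completes, in a local test environment you can open http://localhost:9090/graph for the expression browser and http://localhost:9090/metrics for the metrics Prometheus exposes about itself.
Enter an expression in the expression console, for example: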
promhttp_metric_handler_requests_total
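The same expression can also be evaluated from the command line through the HTTP API instead of the browser. This is a minimal sketch that assumes curl is installed and the server listens on localhost:9090; the function name `prom_query` and the `PROM_URL` variable are made up for this example.

```shell
#!/bin/sh
# Base URL of the Prometheus server (assumed default).
PROM_URL="${PROM_URL:-http://localhost:9090}"

# Hypothetical helper: run an instant query and print the JSON response.
prom_query() {
  # -G sends the --data-urlencode pair as a URL query string in a GET request,
  # so PromQL characters such as braces and quotes are escaped correctly.
  curl -fsS --max-time 10 -G "$PROM_URL/api/v1/query" \
    --data-urlencode "query=$1"
}
```

For example, `prom_query 'promhttp_metric_handler_requests_total'` returns the current value of each matching series as JSON.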
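Deployment optimization
Remote storage
By default Prometheus stores monitoring data on local disk, which is not ideal in a distributed architecture. It does, however, support integration with remote storage systems.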
Prometheus integrates with remote storage systems in two ways:
- Prometheus can write samples that it ingests to a remote URL in a standardized format.
- Prometheus can read (back) sample data from a remote URL in a standardized format.
The remote write and remote read features of Prometheus allow transparently sending and receiving samples. This is primarily intended for long term storage. It is recommended that you perform careful evaluation of any solution in this space to confirm it can handle your data volumes.
The remote storage systems currently supported are:
- AppOptics: write
- Chronix: write
- Cortex: read and write
- CrateDB: read and write
- Elasticsearch: write
- Gnocchi: write
- Graphite: write
- InfluxDB: read and write
- IRONdb: read and write
- M3DB: read and write
- OpenTSDB: write
- PostgreSQL/TimescaleDB: read and write
- SignalFx: write
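Remote storage is enabled through the remote_write and remote_read sections of the configuration file. The sketch below writes such a fragment to a separate file so it can be reviewed and merged into prometheus.yml by hand. The URLs are placeholders, since each backend documents its own endpoints, and the output file name is an assumption.

```shell
#!/bin/sh
# Hypothetical output file holding the fragment to merge into prometheus.yml.
SNIPPET_PATH="${SNIPPET_PATH:-remote-storage-snippet.yml}"

cat > "$SNIPPET_PATH" <<'EOF'
# Placeholder endpoints: replace with the URLs documented by your backend.
remote_write:
  - url: "http://remote-storage.example.com/api/v1/write"
remote_read:
  - url: "http://remote-storage.example.com/api/v1/read"
EOF

echo "wrote $SNIPPET_PATH"
```

Backends listed as "write" only need the remote_write section.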
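Federation
If Prometheus could only collect, store, and analyze data centrally, with no clustering support, the performance problems would be obvious. Prometheus therefore offers federation as a deployment model: one Prometheus server can scrape data from other Prometheus servers. For the implementation steps, refer to the official documentation.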
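On the server that collects from the others, federation is configured as an ordinary scrape job that targets the /federate endpoint. The sketch below follows the shape of the example in the official federation documentation and writes it to a file for review. The target host names and the output file name are placeholders.

```shell
#!/bin/sh
# Hypothetical output file holding the fragment to merge into prometheus.yml.
FEDERATE_SNIPPET="${FEDERATE_SNIPPET:-federation-snippet.yml}"

cat > "$FEDERATE_SNIPPET" <<'EOF'
scrape_configs:
  - job_name: "federate"
    scrape_interval: 15s
    honor_labels: true          # keep the labels set by the source servers
    metrics_path: "/federate"
    params:
      "match[]":                # selects which series to pull from each source
        - '{job="prometheus"}'
    static_configs:
      - targets:                # placeholder addresses of the source servers
          - "source-prometheus-1:9090"
          - "source-prometheus-2:9090"
EOF

echo "wrote $FEDERATE_SNIPPET"
```

At least one match[] selector is required; the /federate endpoint returns only the series that match.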