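Monitoring overview
- System-level monitoring
  - CPU
  - Memory
  - Network interfaces
  - Disk
  - Bandwidth utilization
  - Latency and packet loss
  - Infrastructure such as switches, routers, and firewalls
- Web monitoring
  - Page load time
  - URL status codes
  - API availability
- Business monitoring
  - Order and transaction volume
  - Active users
  - Availability
- Middleware monitoring
- Infrastructure monitoring
- Business-dimension monitoring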
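Prometheus
Introduction
- Website: https://prometheus.io/
- GitHub: https://github.com/prometheus/prometheus
Components
- Prometheus server: scrapes, stores, and queries metric data
- Prometheus targets: statically configured targets that the server scrapes
- Service discovery: discovers targets dynamically
- Prometheus alerting: sends alert notifications
- Pushgateway: a proxy service for metric collection (jobs push metrics to it, and Prometheus scrapes it)
- Data visualization and export: dashboards and data export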
Deployment options
- Official installation guide: https://prometheus.io/docs/prometheus/latest/installation/
Deploying with Docker
Host name and IP of the deployment node: prometheus-server1 10.168.56.111
1. Install Docker
```bash
root@prometheus-server1:~# curl -fsSL https://get.docker.com | bash -s docker --mirror Aliyun
```
2. Configure a registry mirror and restart Docker
```bash
tee /etc/docker/daemon.json <<-'EOF'
{
  "registry-mirrors": ["https://0b8hhs68.mirror.aliyuncs.com"],
  "data-root": "/data/docker"
}
EOF
systemctl restart docker
```
3. Run Prometheus in a container
```bash
root@prometheus-server1:~# docker run -p 9090:9090 prom/prometheus
```
- Access the web UI at http://10.168.56.111:9090
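Before opening the web UI, it can help to confirm that Prometheus has finished starting. The sketch below polls the `/-/ready` management endpoint. The helper name `wait_for_ready` and its retry defaults are illustrative assumptions, not part of the original setup.

```bash
# Poll the Prometheus readiness endpoint until it answers or retries run out.
# wait_for_ready is a hypothetical helper; adjust the URL to your host.
wait_for_ready() {
  local base_url="$1" tries="${2:-30}" delay="${3:-2}"
  local i=1
  while [ "$i" -le "$tries" ]; do
    # -f makes curl return non-zero on HTTP errors such as 503 (not ready yet)
    if curl -fsS -o /dev/null "${base_url}/-/ready"; then
      echo "ready after ${i} attempt(s)"
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "not ready after ${tries} attempt(s)" >&2
  return 1
}
```

A typical call would be `wait_for_ready http://localhost:9090`, run on the Docker host once the container is up.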
Deploying with the Kubernetes operator
- https://github.com/prometheus-operator/kube-prometheus
- Deploy the kube-prometheus release that matches your Kubernetes version
- Then expose the prometheus, grafana, and alertmanager services through NodePort
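One way to do the NodePort step is to patch the existing Services instead of editing the manifests. This is a minimal sketch that assumes the default `monitoring` namespace and the default Service names from the kube-prometheus manifests (`prometheus-k8s`, `grafana`, `alertmanager-main`). Check both with `kubectl -n monitoring get svc` first. The helper name `expose_nodeport` is hypothetical.

```bash
# Switch the kube-prometheus Services to type NodePort.
# Namespace and Service names are assumed defaults; verify them in your cluster.
expose_nodeport() {
  local ns="${1:-monitoring}"
  local svc
  for svc in prometheus-k8s grafana alertmanager-main; do
    kubectl -n "$ns" patch svc "$svc" -p '{"spec":{"type":"NodePort"}}'
  done
}
```

After running `expose_nodeport`, `kubectl -n monitoring get svc` shows the node ports that Kubernetes allocated.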
Installing with apt
```bash
apt update -y
apt search prometheus
apt install prometheus -y
```
Deploying from the binary release
1. Install Prometheus
```bash
root@prometheus-server1:~# cd /data/apps/
root@prometheus-server1:/data/apps# wget https://github.com/prometheus/prometheus/releases/download/v2.33.1/prometheus-2.33.1.linux-amd64.tar.gz
root@prometheus-server1:/data/apps# tar xf prometheus-2.33.1.linux-amd64.tar.gz
root@prometheus-server1:/data/apps# ln -sv prometheus-2.33.1.linux-amd64 prometheus
```
2. Create a systemd unit and enable it at boot
```bash
root@prometheus-server1:~# cd /data/apps/prometheus
root@prometheus-server1:/data/apps/prometheus# mkdir data
root@prometheus-server1:/data/apps/prometheus# tee /etc/systemd/system/prometheus.service <<-'EOF'
[Unit]
Description=prometheus
Documentation=https://prometheus.io/
After=network.target

[Service]
Type=simple
ExecStart=/data/apps/prometheus/prometheus --config.file=/data/apps/prometheus/prometheus.yml --storage.tsdb.path=/data/apps/prometheus/data --web.enable-lifecycle
Restart=on-failure
RestartSec=5s
SuccessExitStatus=0
LimitNOFILE=65536
StandardOutput=syslog
StandardError=syslog
SyslogIdentifier=prometheus

[Install]
WantedBy=multi-user.target
EOF
root@prometheus-server1:/data/apps/prometheus# systemctl enable prometheus.service
root@prometheus-server1:/data/apps/prometheus# systemctl start prometheus.service
root@prometheus-server1:/data/apps/prometheus# systemctl status prometheus.service
```
3. Access the web UI from a browser
![image.png](https://cdn.nlark.com/yuque/0/2022/png/2391625/1644247353236-8bf836f5-0b5e-4ded-b745-6e637829b504.png)
4. Reload the configuration at runtime (requires the --web.enable-lifecycle flag)
```bash
root@prometheus-server1:/data/apps/prometheus# curl -X POST http://localhost:9090/-/reload
```
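The reload call makes Prometheus re-read `prometheus.yml`, so it is worth validating the file first. The sketch below runs `promtool check config` (promtool ships in the same release tarball as the server) and only then posts to `/-/reload`. The helper name `safe_reload` and the `PROMTOOL` override variable are assumptions made for illustration. The default paths follow the `/data/apps/prometheus` layout used in these notes.

```bash
# Validate the config with promtool, then ask the running server to reload it.
# safe_reload and the PROMTOOL variable are hypothetical conveniences.
safe_reload() {
  local promtool="${PROMTOOL:-/data/apps/prometheus/promtool}"
  local config="${1:-/data/apps/prometheus/prometheus.yml}"
  local url="${2:-http://localhost:9090}"
  if ! "$promtool" check config "$config"; then
    echo "config check failed, not reloading" >&2
    return 1
  fi
  # The reload endpoint only works when --web.enable-lifecycle is set
  curl -fsS -X POST "${url}/-/reload"
}
```

Calling `safe_reload` with no arguments uses the defaults above. A failed check leaves the running configuration untouched.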
5. Deploy node_exporter and enable it at boot (on 10.168.56.111 and 10.168.56.112)
```bash
root@prometheus-server1:~# cd /data/apps/
root@prometheus-server1:/data/apps# wget https://github.com/prometheus/node_exporter/releases/download/v1.3.1/node_exporter-1.3.1.linux-amd64.tar.gz
root@prometheus-server1:/data/apps# tar xf node_exporter-1.3.1.linux-amd64.tar.gz
root@prometheus-server1:/data/apps# ln -sv node_exporter-1.3.1.linux-amd64 node_exporter
root@prometheus-server1:/data/apps# cat <<EOF > /etc/systemd/system/node_exporter.service
[Unit]
Description=Node Exporter
Wants=network-online.target
After=network-online.target

[Service]
ExecStart=/data/apps/node_exporter/node_exporter
StandardOutput=syslog
StandardError=syslog
SyslogIdentifier=node_exporter

[Install]
WantedBy=default.target
EOF
root@prometheus-server1:/data/apps# systemctl enable node_exporter.service
root@prometheus-server1:/data/apps# systemctl start node_exporter.service
root@prometheus-server1:/data/apps# systemctl status node_exporter.service
```
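A quick way to confirm that node_exporter is serving data is to read one metric from its `/metrics` endpoint on port 9100. The sketch below extracts `node_load1` (the 1-minute load average). The helper name `node_load1` is chosen here for illustration only.

```bash
# Print the 1-minute load average reported by a node_exporter instance.
# Argument: host:port of the exporter, for example 10.168.56.111:9100.
node_load1() {
  local target="$1"
  # Comment lines start with "#", so matching on field 1 skips HELP/TYPE lines
  curl -fsS "http://${target}/metrics" | awk '$1 == "node_load1" { print $2 }'
}
```

For example, `node_load1 10.168.56.112:9100` prints a single number if the exporter on that node is reachable.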
6. Configure Prometheus to scrape the nodes
```bash
root@prometheus-server1:~# vim /data/apps/prometheus/prometheus.yml
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
          # - alertmanager:9093
rule_files:
  # - "first_rules.yml"
scrape_configs:
  - job_name: "prometheus"
    static_configs:
      - targets: ["localhost:9090"]
  - job_name: "node_exporter"
    static_configs:
      - targets: ["10.168.56.111:9100","10.168.56.112:9100"]
root@prometheus-server1:~# curl -X POST http://localhost:9090/-/reload
```
- Check the targets (Status > Targets in the web UI)
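Target health can also be read from the command line through the HTTP API endpoint `/api/v1/targets`. The sketch below counts targets in a given health state by matching the compact JSON that the server returns. The helper name `count_targets` is hypothetical. The text matching is a shortcut, and a JSON tool such as jq would be more robust if it is installed.

```bash
# Count scrape targets with the given health ("up" or "down").
# Relies on the compact JSON emitted by /api/v1/targets.
count_targets() {
  local url="$1" health="$2"
  curl -fsS "${url}/api/v1/targets" \
    | grep -o "\"health\":\"${health}\"" \
    | wc -l | tr -d ' '
}
```

For example, `count_targets http://localhost:9090 down` prints the number of targets Prometheus currently reports as down.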
Blackbox Exporter
- What it probes
  - HTTP/HTTPS: URL and API availability checks
  - TCP: port listening checks
  - ICMP: host liveness checks
  - DNS: name resolution checks
1. Deploy blackbox exporter (on 10.168.56.112)
```bash
root@blackbox-exporter:~# mkdir -p /data/apps/ && cd /data/apps/
root@blackbox-exporter:/data/apps# wget https://github.com/prometheus/blackbox_exporter/releases/download/v0.19.0/blackbox_exporter-0.19.0.linux-amd64.tar.gz
root@blackbox-exporter:/data/apps# tar xf blackbox_exporter-0.19.0.linux-amd64.tar.gz
root@blackbox-exporter:/data/apps# ln -sv blackbox_exporter-0.19.0.linux-amd64 blackbox_exporter
root@blackbox-exporter:/data/apps# cat <<EOF > /etc/systemd/system/blackbox.service
[Unit]
Description=blackbox_exporter Exporter
Wants=network-online.target
After=network-online.target

[Service]
ExecStart=/data/apps/blackbox_exporter/blackbox_exporter --config.file=/data/apps/blackbox_exporter/blackbox.yml
StandardOutput=syslog
StandardError=syslog
SyslogIdentifier=blackbox_exporter

[Install]
WantedBy=default.target
EOF
root@blackbox-exporter:/data/apps# systemctl enable blackbox.service
root@blackbox-exporter:/data/apps# systemctl start blackbox.service
root@blackbox-exporter:/data/apps# systemctl status blackbox.service
```
2. Configure Prometheus to probe HTTP status, ICMP, and TCP ports through blackbox exporter
```bash
root@prometheus-server1:~# vim /data/apps/prometheus/prometheus.yml
...
scrape_configs:
  - job_name: "prometheus"
    static_configs:
      - targets: ["localhost:9090"]
  - job_name: "node_exporter"
    static_configs:
      - targets: ["10.168.56.111:9100","10.168.56.112:9100"]
  - job_name: 'http_status'
    metrics_path: /probe
    params:
      module: [http_2xx]
    static_configs:
      - targets: ['http://www.tianyancha.com','http://10.168.56.112:9090','http://www.baidu.com']
        labels:
          group: web
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: url
      - target_label: __address__
        replacement: 10.168.56.112:9115
  - job_name: "ping"
    metrics_path: /probe
    params:
      module: [icmp]
    static_configs:
      - targets: ['10.168.56.112','223.6.6.6']
        labels:
          instance: 'icmp-ping'
          group: 'icmp'
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: ip
      - target_label: __address__
        replacement: 10.168.56.112:9115
  - job_name: "tcp_connect"
    metrics_path: /probe
    params:
      module: [tcp_connect]
    static_configs:
      - targets: ['10.168.56.112:9100','10.168.56.111:80']
        labels:
          instance: 'tcp_connect'
          group: 'tcp'
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: ip
      - target_label: __address__
        replacement: 10.168.56.112:9115
```
- Check the probe status
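The relabeling rules copy each listed target into the `target` URL parameter and replace `__address__` with the exporter address, so Prometheus ends up requesting `/probe` on 10.168.56.112:9115. The sketch below builds that URL by hand and reads the `probe_success` sample, which is handy for debugging a probe outside Prometheus. The helper names `probe_url` and `probe_success` are assumptions. Prometheus itself URL-encodes the target, which this sketch skips for readability.

```bash
# Build the URL that results from the relabeling:
# exporter address + /probe, with module and target as query parameters.
probe_url() {
  local exporter="$1" module="$2" target="$3"
  printf 'http://%s/probe?module=%s&target=%s\n' "$exporter" "$module" "$target"
}

# Run one probe and print the probe_success value (1 = success, 0 = failure).
probe_success() {
  curl -fsS "$(probe_url "$@")" | awk '$1 == "probe_success" { print $2 }'
}
```

For example, `probe_success 10.168.56.112:9115 http_2xx http://www.baidu.com` mirrors what the `http_status` job asks the exporter to do.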
Grafana
- A graphical front end that can use Prometheus as a data source and display its data

Install Grafana (on node 10.168.56.112)

    apt-get install -y adduser libfontconfig1
    wget https://dl.grafana.com/enterprise/release/grafana-enterprise_7.5.13_amd64.deb
    dpkg -i grafana-enterprise_7.5.13_amd64.deb
    systemctl enable grafana-server.service
    systemctl start grafana-server.service
    systemctl status grafana-server.service

Open Grafana in a browser (it listens on port 3000 by default). The username and password are both admin, and Grafana asks you to change the password on first login.
- Add Prometheus as a data source
- Import the node-exporter dashboard: https://grafana.com/grafana/dashboards/11174
- Import the blackbox exporter dashboard: https://grafana.com/grafana/dashboards/13659
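The data source can also be added without the UI, through Grafana's HTTP API (`POST /api/datasources`). The following is a minimal sketch: the helper names, the data source name `Prometheus`, and the credentials passed in are placeholders to adapt, and the Prometheus URL assumes the server deployed on 10.168.56.111 above.

```bash
# Build the JSON body for Grafana's POST /api/datasources.
datasource_payload() {
  local name="$1" prom_url="$2"
  printf '{"name":"%s","type":"prometheus","url":"%s","access":"proxy","isDefault":true}\n' \
    "$name" "$prom_url"
}

# Create the data source. Arguments: Grafana base URL, user:password, Prometheus URL.
add_prometheus_datasource() {
  local grafana_url="$1" auth="$2" prom_url="$3"
  curl -fsS -X POST -u "$auth" -H 'Content-Type: application/json' \
    -d "$(datasource_payload Prometheus "$prom_url")" \
    "${grafana_url}/api/datasources"
}
```

A call might look like `add_prometheus_datasource http://10.168.56.112:3000 admin:<password> http://10.168.56.111:9090`, using the password set at first login.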