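Monitoring overview

  • Low-level system monitoring
    • CPU
    • Memory
    • Network interfaces
    • Disk
    • Bandwidth utilization
    • Latency and packet loss
    • Infrastructure such as switches, routers, and firewalls
  • Web monitoring
    • Page load speed
    • URL status codes
    • API availability
  • Business monitoring
    • Order and transaction volume
    • Active users
    • Availability
  • Middleware monitoring
    • Databases
    • MQ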

Monitoring plan

  • Infrastructure monitoring

  • Business-dimension monitoring

Prometheus

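Introduction

  • Official site: https://prometheus.io/
  • GitHub: https://github.com/prometheus/prometheus

Components

  • Prometheus server: scrapes, stores, and queries metric data
  • Prometheus targets: statically configured targets whose metrics are collected
  • Service discovery: discovers targets dynamically
  • Prometheus alerting: sends alert notifications
  • Push gateway: a proxy service that collects pushed metrics
  • Data visualization and export: visualizes and exports data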
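
To make the scrape step more concrete: the server pulls metrics over HTTP from each target, and the target answers in a line-oriented text format. The sketch below is only an illustration of reading that format. `metric_value` is a hypothetical helper name, and the sample lines are made up for the example rather than captured from a real target.

```bash
# metric_value NAME
# Reads Prometheus text exposition format on stdin and prints the value of
# every sample whose metric name is exactly NAME.
# Assumption: samples carry no trailing timestamp, so the value is the last field.
metric_value() {
  awk -v name="$1" '
    /^#/ { next }                 # skip HELP and TYPE comment lines
    NF >= 2 {
      metric = $1
      sub(/[{].*/, "", metric)    # strip the {label=...} part, if any
      if (metric == name) print $NF
    }'
}

# Made-up sample of what a target might return
sample='# HELP node_load1 1m load average.
# TYPE node_load1 gauge
node_load1 0.42
node_cpu_seconds_total{cpu="0",mode="idle"} 1234.5'

printf '%s\n' "$sample" | metric_value node_load1   # prints 0.42
```

Against a live target the same helper could be fed from curl, for example `curl -s http://10.168.56.111:9100/metrics | metric_value node_load1`, assuming node-exporter is already running on that host.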


Deployment methods

  1. Install Docker

    root@prometheus-server1:~# curl -fsSL https://get.docker.com | bash -s docker --mirror Aliyun

  2. Configure a registry mirror and restart Docker

```bash
tee /etc/docker/daemon.json <<-'EOF'
{
  "registry-mirrors": ["https://0b8hhs68.mirror.aliyuncs.com"],
  "data-root": "/data/docker"
}
EOF

systemctl restart docker
```

  3. Deploy Prometheus with Docker

```bash
root@prometheus-server1:~# docker run -p 9090:9090 prom/prometheus
```

  4. Access the web UI in a browser on port 9090

Binary deployment with systemd

  1. Install Prometheus

    root@prometheus-server1:~# cd /data/apps/
    root@prometheus-server1:/data/apps# wget https://github.com/prometheus/prometheus/releases/download/v2.33.1/prometheus-2.33.1.linux-amd64.tar.gz
    root@prometheus-server1:/data/apps# tar xf prometheus-2.33.1.linux-amd64.tar.gz 
    root@prometheus-server1:/data/apps# ln -sv prometheus-2.33.1.linux-amd64 prometheus
    
  2. Create a systemd unit and enable start on boot

```bash
root@prometheus-server1:~# cd /data/apps/prometheus
root@prometheus-server1:/data/apps/prometheus# mkdir data
root@prometheus-server1:/data/apps/prometheus# tee /etc/systemd/system/prometheus.service <<-'EOF'
[Unit]
Description="prometheus"
Documentation=https://prometheus.io/
After=network.target

[Service]
Type=simple
ExecStart=/data/apps/prometheus/prometheus --config.file=/data/apps/prometheus/prometheus.yml --storage.tsdb.path=/data/apps/prometheus/data --web.enable-lifecycle
Restart=on-failure
RestartSec=5s
SuccessExitStatus=0
LimitNOFILE=65536
StandardOutput=syslog
StandardError=syslog
SyslogIdentifier=prometheus

[Install]
WantedBy=multi-user.target
EOF

root@prometheus-server1:/data/apps/prometheus# systemctl enable prometheus.service
root@prometheus-server1:/data/apps/prometheus# systemctl start prometheus.service
root@prometheus-server1:/data/apps/prometheus# systemctl status prometheus.service
```

  3. Access the web UI

![image.png](https://cdn.nlark.com/yuque/0/2022/png/2391625/1644247353236-8bf836f5-0b5e-4ded-b745-6e637829b504.png#clientId=u8130affa-b051-4&crop=0&crop=0&crop=1&crop=1&from=paste&height=461&id=uf864154d&margin=%5Bobject%20Object%5D&name=image.png&originHeight=461&originWidth=1183&originalType=binary&ratio=1&rotation=0&showTitle=false&size=43726&status=done&style=none&taskId=uacafc351-6b01-49a5-bcb1-f08f9092e17&title=&width=1183)
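
Besides opening the web UI, the server can be checked from the command line through its `/-/ready` management endpoint. The following is a minimal sketch under stated assumptions: `wait_until_ready` is a hypothetical helper name, and the default URL and the retry count of 10 are illustrative choices, not values from this guide.

```bash
# wait_until_ready [BASE_URL] [ATTEMPTS]
# Polls the /-/ready endpoint once per second until it answers with a
# success status or the attempts run out.
# The body runs in a subshell, so "exit" only ends the function.
wait_until_ready() (
  base_url="${1:-http://localhost:9090}"
  attempts="${2:-10}"
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # -f makes curl return non-zero on an HTTP error status
    if curl -fsS -o /dev/null "$base_url/-/ready"; then
      echo "ready"
      exit 0
    fi
    i=$((i + 1))
    sleep 1
  done
  echo "not ready after $attempts attempts" >&2
  exit 1
)
```

Running `wait_until_ready` right after `systemctl start prometheus.service` gives a quick yes/no answer before moving on to the next step.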

  4. Reload the configuration at runtime (relies on the --web.enable-lifecycle flag set in the unit file)

```bash
root@prometheus-server1:/data/apps/prometheus# curl -X POST http://localhost:9090/-/reload
```
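
A reload with a broken configuration file is rejected by the server, so it can help to validate the file first. The sketch below wraps the reload in a check with `promtool`, which ships in the same release tarball as the `prometheus` binary. The function name `reload_prometheus` and the `PROMTOOL` override variable are assumptions made for this example; the default paths follow the layout used in this guide.

```bash
# reload_prometheus [CONFIG] [BASE_URL]
# Validates the config with promtool and only then asks the running
# server to reload it. The body runs in a subshell, so "exit" only ends
# the function and the variables do not leak into the caller.
reload_prometheus() (
  config="${1:-/data/apps/prometheus/prometheus.yml}"
  base_url="${2:-http://localhost:9090}"
  promtool="${PROMTOOL:-/data/apps/prometheus/promtool}"

  if ! "$promtool" check config "$config"; then
    echo "config check failed; not reloading" >&2
    exit 1
  fi

  # -f makes curl return non-zero on an HTTP error status
  curl -fsS -X POST "$base_url/-/reload"
)
```

Calling `reload_prometheus` with no arguments uses the paths from this guide; passing a file and URL targets a different instance.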
  5. Deploy node-exporter and enable start on boot (deployed on 10.168.56.111 and 10.168.56.112)

```bash
root@prometheus-server1:~# cd /data/apps/
root@prometheus-server1:/data/apps# wget https://github.com/prometheus/node_exporter/releases/download/v1.3.1/node_exporter-1.3.1.linux-amd64.tar.gz
root@prometheus-server1:/data/apps# tar xf node_exporter-1.3.1.linux-amd64.tar.gz
root@prometheus-server1:/data/apps# ln -sv node_exporter-1.3.1.linux-amd64 node_exporter
root@prometheus-server1:/data/apps# cat <<EOF > /etc/systemd/system/node_exporter.service
[Unit]
Description=Node Exporter
Wants=network-online.target
After=network-online.target

[Service]
ExecStart=/data/apps/node_exporter/node_exporter
StandardOutput=syslog
StandardError=syslog
SyslogIdentifier=node_exporter

[Install]
WantedBy=default.target
EOF

root@prometheus-server1:/data/apps# systemctl enable node_exporter.service
root@prometheus-server1:/data/apps# systemctl start node_exporter.service
root@prometheus-server1:/data/apps# systemctl status node_exporter.service
```

  6. Configure Prometheus to scrape the node-exporter nodes

```bash
root@prometheus-server1:~# vim /data/apps/prometheus/prometheus.yml
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
          # - alertmanager:9093
rule_files:
  # - "first_rules.yml"
scrape_configs:
  - job_name: "prometheus"
    static_configs:
      - targets: ["localhost:9090"]
  # node-exporter listens on port 9100 by default
  - job_name: "node_exporter"
    static_configs:
      - targets: ["10.168.56.111:9100","10.168.56.112:9100"]

root@prometheus-server1:~# curl -X POST http://localhost:9090/-/reload
```

  7. View the targets
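
The same target list is also available from the HTTP API at `/api/v1/targets`, which is handy on a host without a browser. The sketch below summarises the health of all active targets. `target_health_summary` is a hypothetical helper, the sample body is a trimmed, made-up response, and the helper assumes the JSON is compact (no spaces around the colon), since it matches the field with `grep` instead of a JSON parser.

```bash
# target_health_summary
# Reads the JSON body of GET /api/v1/targets on stdin and prints one
# "state=count" line per health state found (for example up, down, unknown).
target_health_summary() {
  grep -o '"health":"[a-z]*"' | cut -d'"' -f4 | sort | uniq -c | awk '{ print $2 "=" $1 }'
}

# Trimmed, made-up response body used only to demonstrate the helper
sample='{"status":"success","data":{"activeTargets":[{"health":"up"},{"health":"up"},{"health":"down"}]}}'
printf '%s\n' "$sample" | target_health_summary
# prints:
# down=1
# up=2
```

On the Prometheus host this could be used as `curl -s http://localhost:9090/api/v1/targets | target_health_summary`.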

blackbox Exporter

  • What it monitors
    • http/https: URL and API availability checks
    • tcp: port listening checks
    • icmp: host liveness checks
    • DNS: domain name resolution
  1. Deploy blackbox exporter (10.168.56.112)

```bash
root@blackbox-exporter:~# mkdir -p /data/apps/ && cd /data/apps/
root@blackbox-exporter:/data/apps# wget https://github.com/prometheus/blackbox_exporter/releases/download/v0.19.0/blackbox_exporter-0.19.0.linux-amd64.tar.gz
root@blackbox-exporter:/data/apps# tar xf blackbox_exporter-0.19.0.linux-amd64.tar.gz
root@blackbox-exporter:/data/apps# ln -sv blackbox_exporter-0.19.0.linux-amd64 blackbox_exporter
root@blackbox-exporter:/data/apps# cat <<EOF > /etc/systemd/system/blackbox.service
[Unit]
Description=blackbox_exporter Exporter
Wants=network-online.target
After=network-online.target

[Service]
ExecStart=/data/apps/blackbox_exporter/blackbox_exporter --config.file=/data/apps/blackbox_exporter/blackbox.yml
StandardOutput=syslog
StandardError=syslog
SyslogIdentifier=blackbox_exporter

[Install]
WantedBy=default.target
EOF

root@blackbox-exporter:/data/apps# systemctl enable blackbox.service
root@blackbox-exporter:/data/apps# systemctl start blackbox.service
root@blackbox-exporter:/data/apps# systemctl status blackbox.service
```

  2. Configure Prometheus to use blackbox for HTTP status, ICMP, and TCP port checks

```bash
root@prometheus-server1:~# vim /data/apps/prometheus/prometheus.yml
...
scrape_configs:
  - job_name: "prometheus"
    static_configs:
      - targets: ["localhost:9090"]
  - job_name: "node_exporter"
    static_configs:
      - targets: ["10.168.56.111:9100","10.168.56.112:9100"]
  - job_name: 'http_status'
    metrics_path: /probe
    params:
      module: [http_2xx]
    static_configs:
      - targets: ['http://www.tianyancha.com','http://10.168.56.112:9090','http://www.baidu.com']
        labels:
          group: web
    relabel_configs:
      # Pass each configured target to the exporter as the ?target= parameter
      - source_labels: [__address__]
        target_label: __param_target
      # Keep the probed URL as a readable label
      - source_labels: [__param_target]
        target_label: url
      # Scrape the blackbox exporter itself instead of the target
      - target_label: __address__
        replacement: 10.168.56.112:9115
  - job_name: "ping"
    metrics_path: /probe
    params:
      module: [icmp]
    static_configs:
      - targets: ['10.168.56.112','223.6.6.6']
        labels:
          instance: 'icmp-ping'
          group: 'icmp'
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: ip
      - target_label: __address__
        replacement: 10.168.56.112:9115
  - job_name: "tcp_connect"
    metrics_path: /probe
    params:
      module: [tcp_connect]
    static_configs:
      - targets: ['10.168.56.112:9100','10.168.56.111:80']
        labels:
          instance: 'tcp_connect'
          group: 'tcp'
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: ip
      - target_label: __address__
        replacement: 10.168.56.112:9115

root@prometheus-server1:~# curl -X POST http://localhost:9090/-/reload
```

  3. View the status
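
The relabeling in the jobs above boils down to one HTTP request per target against the exporter. The sketch below prints that request in its plain, unencoded form, which is useful for testing a module by hand. `probe_url` is a hypothetical helper, and it assumes the target contains no characters that would need URL-encoding (such as `&`).

```bash
# probe_url EXPORTER MODULE TARGET
# Prints the probe URL that corresponds to one blackbox target:
#   the configured target      -> ?target=  (via __param_target)
#   params.module              -> ?module=
#   the rewritten __address__  -> EXPORTER, the host:port actually scraped
probe_url() {
  printf 'http://%s/probe?module=%s&target=%s\n' "$1" "$2" "$3"
}

probe_url 10.168.56.112:9115 http_2xx http://www.baidu.com
probe_url 10.168.56.112:9115 icmp 223.6.6.6
```

Fetching such a URL and looking at the `probe_success` sample (1 for success, 0 for failure) shows whether a check works before Prometheus scrapes it, for example `curl -s "$(probe_url 10.168.56.112:9115 icmp 223.6.6.6)" | grep '^probe_success'`.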

Grafana

  • A graphical front end that can use Prometheus as a data source and display its data
  1. Install Grafana (on node 10.168.56.112)

    apt-get install -y adduser libfontconfig1
    wget https://dl.grafana.com/enterprise/release/grafana-enterprise_7.5.13_amd64.deb
    dpkg -i grafana-enterprise_7.5.13_amd64.deb
    systemctl enable grafana-server.service
    systemctl start grafana-server.service
    systemctl status grafana-server.service
    
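  2. Open Grafana in a browser (it listens on port 3000 by default); the username and password are both admin, and the password must be changed at first login

  3. Add the Prometheus data source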
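
As an alternative to clicking through the UI, Grafana can also load data sources from provisioning files at startup. The sketch below writes such a file. `write_prometheus_datasource` is a hypothetical helper, the data source name "Prometheus" is an arbitrary choice, and the Prometheus URL has to be supplied by the caller because this guide does not state the server's address.

```bash
# write_prometheus_datasource DIR PROMETHEUS_URL
# Writes a Grafana data source provisioning file named prometheus.yaml
# into DIR. On a .deb install Grafana reads such files from
# /etc/grafana/provisioning/datasources when it starts.
write_prometheus_datasource() {
  cat > "$1/prometheus.yaml" <<EOF
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: $2
    isDefault: true
EOF
}
```

A possible invocation is `write_prometheus_datasource /etc/grafana/provisioning/datasources http://<prometheus-host>:9090` followed by `systemctl restart grafana-server.service`, where `<prometheus-host>` stands for the address of the Prometheus server.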

  4. Import the node-exporter dashboard template: https://grafana.com/grafana/dashboards/11174

  5. Import the blackbox exporter dashboard template: https://grafana.com/grafana/dashboards/13659