Time synchronization is required on all hosts before you start.

Download from: GitHub or the TUNA mirror

    mkdir -p /app/
    tar xf prometheus-2.33.3.linux-amd64.tar.gz -C /app/
    ln -s /app/prometheus-2.33.3.linux-amd64/ /app/prometheus
    ln -s /app/prometheus/prometheus /bin/
    # start
    prometheus --config.file="/app/prometheus/prometheus.yml" &>/var/log/prometheus.log &
    # access in a browser at http://<server-ip>:9090
    # start at boot (or use systemctl instead, p80)
    # on CentOS 7, /etc/rc.d/rc.local must be executable: chmod +x /etc/rc.d/rc.local
    vim /etc/rc.local
    # add this line: config file path, listen on all interfaces on 9090 (allows external access), max connections
    prometheus --config.file="/app/prometheus/prometheus.yml" --web.listen-address="0.0.0.0:9090" --web.max-connections=512 &>>/var/log/prometheus.log &
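The systemctl route mentioned above can be sketched as a small helper that writes a unit file with the same flags as the rc.local line. This assumes a systemd-based distro; the unit name `prometheus.service`, its default location and the `WorkingDirectory` choice are assumptions made here, not taken from the notes. `WorkingDirectory` matters because Prometheus writes its TSDB to `data/` relative to the current directory by default.

```shell
# Hypothetical helper: write a systemd unit equivalent to the rc.local line above.
# The default target path is an assumption; pass another path to write elsewhere.
write_prometheus_unit() {
  local unit_file="${1:-/etc/systemd/system/prometheus.service}"
  cat > "$unit_file" <<'EOF'
[Unit]
Description=Prometheus server
After=network-online.target
Wants=network-online.target

[Service]
# data/ (the default TSDB path) is resolved relative to this directory
WorkingDirectory=/app/prometheus
ExecStart=/app/prometheus/prometheus --config.file=/app/prometheus/prometheus.yml --web.listen-address=0.0.0.0:9090 --web.max-connections=512
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF
}
```

Usage (as root): `write_prometheus_unit && systemctl daemon-reload && systemctl enable --now prometheus`. Remove the rc.local line first, otherwise two instances compete for port 9090.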

How it works / exporters


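Download the exporters you need (if one is not available elsewhere, get it from GitHub), e.g. node_exporter. Run it in the background with &, configure it to start at boot (p80), then add its job (group) and hosts to prometheus.yml.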
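Adding the hosts to prometheus.yml can be sketched as a small shell helper. It assumes `scrape_configs:` is the last top-level section of the file (as in the stock prometheus.yml from the tarball), so list items appended at the end land inside it. The function name is hypothetical; 9100 is node_exporter's default port.

```shell
# Hypothetical helper: append one static scrape job to a Prometheus config.
# Assumes scrape_configs: is the last top-level key, as in the stock file,
# and that the file ends with a newline.
add_scrape_job() {
  local config="$1" job="$2"
  shift 2
  {
    printf '  - job_name: "%s"\n' "$job"
    printf '    static_configs:\n'
    printf '      - targets:\n'
    local target
    for target in "$@"; do
      printf '          - "%s"\n' "$target"
    done
  } >> "$config"
}
```

Usage: `add_scrape_job /app/prometheus/prometheus.yml oldboy-prometheus-all-node_exporter-two localhost:9100 web01:9100`, then `/app/prometheus/promtool check config /app/prometheus/prometheus.yml`, then restart Prometheus or send it `kill -HUP` (it re-reads its config on SIGHUP).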

Web UI settings

Tick "Use local time" (required).
Optionally tick "Enable query history".

Queries (PromQL)

Example: node_cpu_seconds_total{mode!="idle",instance="gra.oldboylinux.cn:9100"}
=~ matches a regular expression
!~ does not match a regular expression
Functions: https://prometheus.io/docs/prometheus/2.33/querying/functions/

Queries support arithmetic (+, -, *, /); use parentheses to control precedence.
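As a sketch combining a regex matcher with parenthesized arithmetic, here is a per-instance CPU busy percentage sent to Prometheus' instant-query HTTP endpoint (`/api/v1/query`, parameter `query`) with curl. The helper names are hypothetical and `localhost:9090` assumes the server set up above. Note that PromQL regex matchers are fully anchored, so `web.*` matches label values that start with `web`.

```shell
# Hypothetical helper: CPU busy % per instance whose label matches a regex.
# The parentheses make "1 - idle share" happen before the multiplication by 100.
cpu_busy_expr() {
  local instance_regex="$1"
  printf '100 * (1 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle",instance=~"%s"}[1m])))' "$instance_regex"
}

# Send any PromQL expression to the instant-query endpoint; -G turns the
# URL-encoded data into a GET query string.
promql_query() {
  local server="$1" expr="$2"
  curl -sG "http://${server}/api/v1/query" --data-urlencode "query=${expr}"
}
```

Usage: `promql_query localhost:9090 "$(cpu_busy_expr 'gra.oldboylinux.cn:9100')"` returns the result as JSON.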

pushgateway

    tar xf pushgateway-1.4.1.linux-amd64.tar.gz -C /app/prometheus/
    ln -s /app/prometheus/pushgateway-1.4.1.linux-amd64/pushgateway /bin
    pushgateway &>>/var/log/pushgateway.log &
    ps -ef |grep push
    # then add a scrape job for the Pushgateway (port 9091) to prometheus.yml and restart prometheus

Push script:

    job_name=pushgateway
    instance_name=web01
    # df reports sizes in 1K blocks; $NF is the mount point column
    disk_sda_root_total=$(df |awk '$NF=="/"{print $2}')
    disk_sda_root_free=$(df |awk '$NF=="/"{print $4}')
    disk_sda_root_used=$(df |awk '$NF=="/"{print $3}')
    # send POST requests; curl reads the request body from the pipe (@-)
    echo "oldboy_disk_sda_total $disk_sda_root_total" |curl --data-binary @- http://$instance_name:9091/metrics/job/$job_name/instance/$instance_name
    echo "oldboy_disk_sda_free $disk_sda_root_free" |curl --data-binary @- http://$instance_name:9091/metrics/job/$job_name/instance/$instance_name
    echo "oldboy_disk_sda_used $disk_sda_root_used" |curl --data-binary @- http://$instance_name:9091/metrics/job/$job_name/instance/$instance_name
    # then add this script to a cron job
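The three pushes in the script above can also go out as one request, since the Pushgateway accepts several lines of the text exposition format in a single body. A sketch under the same assumptions as that script (Pushgateway on port 9091 of `$instance_name`, job `pushgateway`); `df -P /` is used so the root filesystem is always on the second line, and the function names are hypothetical.

```shell
# Same job/instance values as the script above.
job_name=pushgateway
instance_name=web01

# Print the root filesystem's size/used/free (1K blocks) as metric lines.
# df -P columns: Filesystem 1024-blocks Used Available Capacity Mounted-on
root_disk_metrics() {
  df -P / | awk 'NR == 2 {
    print "oldboy_disk_sda_total " $2
    print "oldboy_disk_sda_used " $3
    print "oldboy_disk_sda_free " $4
  }'
}

# POST all three lines in one request to the job/instance group.
push_root_disk_metrics() {
  root_disk_metrics |
    curl -s --data-binary @- \
      "http://${instance_name}:9091/metrics/job/${job_name}/instance/${instance_name}"
}
```

To schedule it, save the block as a script with `push_root_disk_metrics` as its last line and add a crontab entry such as `* * * * * /bin/bash /path/to/push_disk.sh >/dev/null 2>&1` (the path is a placeholder).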

Grafana

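Grafana ships with the Prometheus data source plugin; just add a data source with the URL http://localhost:9090.
Example query, machine uptime (seconds since boot):
(time() - node_boot_time_seconds{instance="localhost:9100", job="oldboy-prometheus-all-node_exporter-two"})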
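The same data source can also be added without the UI through Grafana's HTTP API (`POST /api/datasources`). A sketch assuming Grafana on its default port 3000 with the default `admin:admin` login still active; the helper names and the `GRAFANA_AUTH`/`GRAFANA_HOST` variables are hypothetical.

```shell
# Hypothetical helper: JSON body for a Prometheus data source.
# access "proxy" means Grafana's backend queries Prometheus (the UI's "Server" mode).
prometheus_datasource_json() {
  local name="$1" url="$2"
  printf '{"name":"%s","type":"prometheus","access":"proxy","url":"%s"}' "$name" "$url"
}

# Create the data source; credentials and port 3000 are assumed defaults.
add_grafana_datasource() {
  prometheus_datasource_json "${1:-Prometheus}" "${2:-http://localhost:9090}" |
    curl -s -u "${GRAFANA_AUTH:-admin:admin}" -H 'Content-Type: application/json' \
      --data-binary @- "http://${GRAFANA_HOST:-localhost:3000}/api/datasources"
}
```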

Alertmanager

Download: official website or GitHub

    tar xf alertmanager-0.24.0.linux-amd64.tar.gz -C /app/prometheus
    ln -s /app/prometheus/alertmanager-0.24.0.linux-amd64/ /app/prometheus/alertmanager
    ln -s /app/prometheus/alertmanager/alertmanager /bin/
    alertmanager --config.file=/app/prometheus/alertmanager/alertmanager.yml --web.listen-address=":9093"

Alertmanager configuration (/app/prometheus/alertmanager/alertmanager.yml)


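global:
  resolve_timeout:   5m         # mark an alert resolved if it has not been updated for this long
  smtp_from:          'lidao996@163.com'
  smtp_smarthost:     'smtp.163.com:465'
  smtp_hello:         '163.com'
  smtp_auth_username: 'lidao996@163.com'
  smtp_auth_password: 'MMNKQYUHMJON'   # 163 mail client authorization code, not the login password
  smtp_require_tls:   false
route:
  group_by: ['alertname']       # alerts with the same alertname go out in one mail
  group_wait: 30s               # wait before sending the first mail for a new group
  group_interval: 5m            # wait before mailing about new alerts added to an existing group
  repeat_interval: 1h           # re-send a still-firing alert after this long
  receiver: 'email'
receivers:
  - name: "email"
    email_configs:
    - to: 'youjiu_linux@qq.com'
      send_resolved: true       # also send a mail when the alert is resolved
# a firing critical alert mutes warning alerts with the same alertname, dev and instance
inhibit_rules:
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'dev', 'instance']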
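To check the e-mail receiver before any rule exists, a synthetic alert can be posted straight to Alertmanager's v2 API (`POST /api/v2/alerts`, which takes a JSON array of alerts). This is a sketch only: it assumes Alertmanager on `localhost:9093` as started above, and the alert name, labels, helper names and `AM_HOST` variable are made up for the test. With this config the mail should arrive after `group_wait` (30s); because no `endsAt` is sent, the alert resolves after `resolve_timeout` (5m) and, with `send_resolved: true`, a resolved mail follows later.

```shell
# Hypothetical helper: JSON array with one alert (labels + annotations only).
test_alert_json() {
  local name="$1" instance="$2"
  printf '[{"labels":{"alertname":"%s","instance":"%s","severity":"warning"},"annotations":{"summary":"manual test alert"}}]' "$name" "$instance"
}

# POST it to Alertmanager; AM_HOST defaults to the address used above.
send_test_alert() {
  test_alert_json "${1:-manual_test}" "${2:-web01}" |
    curl -s -H 'Content-Type: application/json' --data-binary @- \
      "http://${AM_HOST:-localhost:9093}/api/v2/alerts"
}
```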

Changes to the Prometheus server config (prometheus.yml)

alerting:
  alertmanagers:
    - static_configs:
        - targets:
          - "<hostname>:9093"
rule_files:
  - "/app/prometheus/prometheus_alert_rules.yml"



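Create the alert rules file referenced above:

vim /app/prometheus/prometheus_alert_rules.yml
groups:
- name: check_node_status
  rules:
  - alert: check_node_is_up
    expr: up{job="oldboy-prometheus-all-node_exporter-two"} == 0   # the last scrape of the target failed
    for: 15s        # the condition must hold this long before the alert fires
    labels:
      severity: 1
      team: node
    annotations:
      summary: " {{ $labels.instance }} has been down for more than 15 seconds!!! "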
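Before restarting either service it helps to validate the files. A sketch assuming `promtool` (shipped in the Prometheus tarball) and `amtool` (shipped in the Alertmanager tarball) are on `PATH`, e.g. symlinked into /bin like the other binaries; the helper name is hypothetical. `promtool check config` also checks the rule files listed under `rule_files`; `promtool check rules <file>` checks a rules file on its own.

```shell
# Hypothetical helper: validate the Prometheus config (and the rule files it
# references) and the Alertmanager config; stops at the first failure.
check_monitoring_configs() {
  local prom_cfg="${1:-/app/prometheus/prometheus.yml}"
  local am_cfg="${2:-/app/prometheus/alertmanager/alertmanager.yml}"
  promtool check config "$prom_cfg" || return 1
  amtool check-config "$am_cfg" || return 1
}
```

If it succeeds, both processes can pick up the new files without a restart: Prometheus and Alertmanager each re-read their config on SIGHUP.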