Time synchronization is required.

Download from: GitHub or the TUNA mirror

mkdir -p /app/
tar xf prometheus-2.33.3.linux-amd64.tar.gz -C /app/
ln -s /app/prometheus-2.33.3.linux-amd64/ /app/prometheus
ln -s /app/prometheus/prometheus /bin/
# Start
prometheus --config.file="/app/prometheus/prometheus.yml" &>/var/log/prometheus.log &
# Access in a browser
<server IP>:9090
# Start on boot via /etc/rc.local (or use systemctl, see p80)
vim /etc/rc.local
# Specify the config file, listen on all interfaces on :9090 (external access), and cap the number of connections
prometheus --config.file="/app/prometheus/prometheus.yml" --web.listen-address="0.0.0.0:9090" --web.max-connections=512 &>>/var/log/prometheus.log &
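As an alternative to rc.local, the same command can run as a systemd service. The sketch below writes a unit file to a path you pass in. The unit name, `Restart=on-failure` and `WorkingDirectory=/app/prometheus` are assumptions, not taken from the notes. The working directory matters because Prometheus stores its data in `data/` relative to where it starts.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write a systemd unit for Prometheus to the given path.
# Binary path and flags mirror the rc.local command above.
# WorkingDirectory is an assumption so that the default data/ dir stays under /app/prometheus.
write_prometheus_unit() {
  local unit_path="$1"
  cat > "$unit_path" <<'EOF'
[Unit]
Description=Prometheus server
After=network-online.target
Wants=network-online.target

[Service]
WorkingDirectory=/app/prometheus
ExecStart=/app/prometheus/prometheus \
  --config.file=/app/prometheus/prometheus.yml \
  --web.listen-address=0.0.0.0:9090 \
  --web.max-connections=512
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF
}
```

Usage (as root): `write_prometheus_unit /etc/systemd/system/prometheus.service && systemctl daemon-reload && systemctl enable --now prometheus`. Remove the rc.local line if you switch, so that two instances do not compete for port 9090.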

How it works / exporters

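Download the exporters you need. If one is missing, get it from GitHub (e.g. node_exporter) and run it in the background (node_exporter &). Configure it to start on boot (see p80), then add its job (group) and hosts to prometheus.yml.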
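The "job and hosts" entry in prometheus.yml is a `scrape_configs` item with `static_configs` targets. The helper below is hypothetical. It prints one such job so that you can paste it under the existing `scrape_configs:` key. It assumes node_exporter's default port, 9100.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print one scrape job (group) with its hosts for prometheus.yml.
# Usage: render_scrape_job <job_name> <host:port>...
render_scrape_job() {
  local job="$1"; shift
  printf '  - job_name: "%s"\n' "$job"
  printf '    static_configs:\n'
  printf '      - targets:\n'
  local target
  for target in "$@"; do
    printf '          - "%s"\n' "$target"   # one target line per host
  done
}
```

For example, `render_scrape_job oldboy-prometheus-all-node_exporter-two web01:9100 web02:9100` prints a job with two targets. After editing prometheus.yml, either restart Prometheus or send it SIGHUP (`kill -HUP $(pidof prometheus)`) so that it reloads the config.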

Web UI

Be sure to tick "Use local time" (otherwise timestamps are shown in UTC)
Optionally tick "Enable query history"

Queries (PromQL)

Example: node_cpu_seconds_total{mode!="idle",instance="gra.oldboylinux.cn:9100"}
=~   matches a regular expression
!~   does not match a regular expression
Functions: https://prometheus.io/docs/prometheus/2.33/querying/functions/

Queries support arithmetic (+, -, *, /); use parentheses to control precedence.
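Here is a sketch of arithmetic plus a function in one query: CPU busy percentage for one instance, sent through Prometheus's HTTP API (`GET /api/v1/query`). The 5m range and the `PROM_URL` default of `http://localhost:9090` are assumptions. Because `*` binds tighter than `-`, the expression means `100 - (avg(...) * 100)`.

```shell
#!/usr/bin/env bash
# Build a PromQL expression: 100 minus the average idle fraction (as a percent) = CPU busy %.
cpu_usage_query() {
  local instance="$1"
  printf '100 - avg(rate(node_cpu_seconds_total{mode="idle",instance="%s"}[5m])) * 100' "$instance"
}

# Send an instant query; -G with --data-urlencode puts the encoded query in the URL.
# PROM_URL is an assumption; override it for a remote server.
prom_query() {
  curl -sG "${PROM_URL:-http://localhost:9090}/api/v1/query" \
    --data-urlencode "query=$1"
}
```

Usage: `prom_query "$(cpu_usage_query gra.oldboylinux.cn:9100)"` returns JSON. The same expression can be pasted directly into the web UI.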

pushgateway

tar xf pushgateway-1.4.1.linux-amd64.tar.gz -C /app/prometheus/
ln -s /app/prometheus/pushgateway-1.4.1.linux-amd64/pushgateway /bin
pushgateway &>>/var/log/pushgateway.log &
ps -ef |grep push
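# Then add a pushgateway job (target <host>:9091) to prometheus.yml and restart Prometheus
# (honor_labels: true is recommended for this job so the pushed job/instance labels are kept)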

job_name=pushgateway
instance_name=web01
disk_sda_root_total=`df |awk '$NF=="/"{print $2}'`
disk_sda_root_free=`df |awk '$NF=="/"{print $4}'`
disk_sda_root_used=`df |awk '$NF=="/"{print $3}'`
# curl sends a POST request; "--data-binary @-" reads the request body from the pipe (stdin)
echo "oldboy_disk_sda_total $disk_sda_root_total" |curl --data-binary @- http://$instance_name:9091/metrics/job/$job_name/instance/$instance_name
echo "oldboy_disk_sda_free $disk_sda_root_free" |curl --data-binary @- http://$instance_name:9091/metrics/job/$job_name/instance/$instance_name
echo "oldboy_disk_sda_used $disk_sda_root_used" |curl --data-binary @- http://$instance_name:9091/metrics/job/$job_name/instance/$instance_name
# Then schedule this script as a cron job
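Pushgateway accepts several metric lines in a single request, so the three pushes can be combined. The function names are hypothetical. Using `df -P` is an assumption: it keeps each filesystem on one line. The formatter reads `df` output from stdin so that it can be checked without a real disk.

```shell
#!/usr/bin/env bash
# Turn df output (1K blocks) for the "/" mount into Pushgateway text-format lines.
format_disk_metrics() {
  awk '$NF == "/" {
    print "oldboy_disk_sda_total " $2
    print "oldboy_disk_sda_used " $3
    print "oldboy_disk_sda_free " $4
  }'
}

# Push all three metrics in one POST to http://<gateway>/metrics/job/<job>/instance/<instance>.
push_disk_metrics() {
  local gateway="$1" job="$2" instance="$3"
  df -P | format_disk_metrics |
    curl -s --data-binary @- "http://${gateway}/metrics/job/${job}/instance/${instance}"
}
```

With the values above this is `push_disk_metrics web01:9091 pushgateway web01`. A crontab entry such as `* * * * * /path/to/push_disk.sh` pushes once a minute; the script path is hypothetical.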

Grafana

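The Prometheus data source plugin is built in; just add a data source and enter http://localhost:9090.
# Machine uptime in seconds (current time minus boot time)
(time() - node_boot_time_seconds{instance="localhost:9100", job="oldboy-prometheus-all-node_exporter-two"})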
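The data source can also be added without the UI, using Grafana's HTTP API (`POST /api/datasources`). This is a sketch. `GRAFANA_URL` (default port 3000) and `GRAFANA_AUTH` (the stock `admin:admin` login) are assumptions, so change them to match your install.

```shell
#!/usr/bin/env bash
# Build the JSON body for a Prometheus data source (simple names/URLs only; no JSON escaping).
grafana_prom_datasource_json() {
  local name="$1" url="$2"
  printf '{"name":"%s","type":"prometheus","url":"%s","access":"proxy"}' "$name" "$url"
}

# Create the data source; GRAFANA_URL and GRAFANA_AUTH (user:password) are assumptions.
add_grafana_datasource() {
  curl -s -u "${GRAFANA_AUTH:-admin:admin}" \
    -H 'Content-Type: application/json' \
    -d "$(grafana_prom_datasource_json "$1" "$2")" \
    "${GRAFANA_URL:-http://localhost:3000}/api/datasources"
}
```

Usage: `add_grafana_datasource Prometheus http://localhost:9090`.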

Alertmanager

Download: official website or GitHub

tar xf alertmanager-0.24.0.linux-amd64.tar.gz -C /app/prometheus
ln -s /app/prometheus/alertmanager-0.24.0.linux-amd64/ /app/prometheus/alertmanager
ln -s /app/prometheus/alertmanager/alertmanager /bin/
alertmanager --config.file=/app/prometheus/alertmanager/alertmanager.yml --web.listen-address=":9093"

Alertmanager configuration (alertmanager.yml)


global:
  resolve_timeout:   5m
  smtp_from:          'lidao996@163.com'
  smtp_smarthost:     'smtp.163.com:465'
  smtp_hello:         '163.com'
  smtp_auth_username: 'lidao996@163.com'
  smtp_auth_password: 'MMNKQYUHMJON'
  smtp_require_tls:   false
route:
  group_by: ['alertname']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 1h
  receiver: 'email'
receivers:
  - name: "email"
    email_configs:
    - to: 'youjiu_linux@qq.com'
      send_resolved: true
inhibit_rules:
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'dev', 'instance']

Prometheus server configuration changes (prometheus.yml)

alerting:
  alertmanagers:
    - static_configs:
        - targets:
          - "<hostname>:9093"
rule_files:
  - "/app/prometheus/prometheus_alert_rules.yml"



vim  /app/prometheus/prometheus_alert_rules.yml
groups:
- name: check_node_status
  rules:
  - alert: check_node_is_up
    expr: up{job="oldboy-prometheus-all-node_exporter-two"} == 0 
    for: 15s
    labels:
      severity: 1
      team: node
    annotations:
      summary: "Node {{ $labels.instance }} has been down for more than 15 seconds!!!"
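To check the email route without stopping a node, you can post a hand-made alert to Alertmanager's v2 API (`POST /api/v2/alerts`). In this sketch the label values copy the rule above. The `AM_URL` default and the summary text are assumptions. With `group_wait: 30s` the mail should arrive about 30 seconds later. Because the alert has no `endsAt`, Alertmanager treats it as resolved after `resolve_timeout` (5m here).

```shell
#!/usr/bin/env bash
# Build a one-alert JSON array with the same labels the rule above would produce.
test_alert_json() {
  local instance="$1"
  printf '[{"labels":{"alertname":"check_node_is_up","severity":"1","team":"node","instance":"%s"},"annotations":{"summary":"manual test alert for %s"}}]' "$instance" "$instance"
}

# Post it to Alertmanager; AM_URL is an assumption.
send_test_alert() {
  curl -s -H 'Content-Type: application/json' \
    -d "$(test_alert_json "$1")" \
    "${AM_URL:-http://localhost:9093}/api/v2/alerts"
}
```

Usage: `send_test_alert web01:9100`.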