Synchronize the system clocks on all hosts first.
Download from: GitHub or the TUNA mirror
mkdir -p /app/
tar xf prometheus-2.33.3.linux-amd64.tar.gz -C /app/
ln -s /app/prometheus-2.33.3.linux-amd64/ /app/prometheus
ln -s /app/prometheus/prometheus /bin/
# Start Prometheus
prometheus --config.file="/app/prometheus/prometheus.yml" &>/var/log/prometheus.log &
# Then open ip:9090 in a browser
# Start at boot (or use systemctl, p80)
vim /etc/rc.local
# Specify the config file, allow external access on :9090, and set the maximum number of connections
prometheus --config.file="/app/prometheus/prometheus.yml" --web.listen-address="0.0.0.0:9090" --web.max-connections=512 &>>/var/log/prometheus.log &
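How it works / exporters

Download the exporters you need (if one is missing, look for it on GitHub).
node_exporter &
(configure it to start at boot, p80)
Then configure the job groups and hosts in prometheus.yml.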
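As a rough sketch of that last step, a scrape job in prometheus.yml might look like the following. The job name and targets are the ones used in queries elsewhere in these notes; the 15s interval is an assumption.

```yaml
scrape_configs:
  - job_name: "oldboy-prometheus-all-node_exporter-two"
    scrape_interval: 15s        # assumed value, adjust as needed
    static_configs:
      - targets:
          - "localhost:9100"
          - "gra.oldboylinux.cn:9100"
```

Restart Prometheus after editing the file so the new targets are picked up.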
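Web UI
Queries
Example: node_cpu_seconds_total{mode!="idle",instance="gra.oldboylinux.cn:9100"}
=~ matches a regular expression
!~ does not match a regular expression
Functions: https://prometheus.io/docs/prometheus/2.33/querying/functions/
Queries support arithmetic (+ - * /); use parentheses to control precedence.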
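To illustrate how matchers, functions and arithmetic combine, here is a sketch of a commonly used per-instance CPU utilisation query. It assumes node_exporter metrics; the 5m window is an arbitrary choice.

```promql
# Percentage of CPU time not spent idle, averaged over all cores of each instance
(1 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m]))) * 100

# Regex matcher: only the user and system modes
node_cpu_seconds_total{mode=~"user|system"}
```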
Pushgateway
tar xf pushgateway-1.4.1.linux-amd64.tar.gz -C /app/prometheus/
ln -s /app/prometheus/pushgateway-1.4.1.linux-amd64/pushgateway /bin/
pushgateway &>>/var/log/pushgateway.log &
ps -ef |grep push
# Then edit prometheus.yml and restart Prometheus
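The prometheus.yml change could look roughly like this. It is a sketch that assumes the Pushgateway runs on host web01 and listens on its default port 9091; add the job to the existing scrape_configs list rather than creating a second one.

```yaml
scrape_configs:
  - job_name: "pushgateway"
    honor_labels: true          # keep the job/instance labels sent by the pushing client
    static_configs:
      - targets: ["web01:9091"]
```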
job_name=pushgateway
instance_name=web01
disk_sda_root_total=`df |awk '$NF=="/"{print $2}'`
disk_sda_root_free=`df |awk '$NF=="/"{print $4}'`
disk_sda_root_used=`df |awk '$NF=="/"{print $3}'`
# Send POST requests; the data comes from the pipe
echo "oldboy_disk_sda_total $disk_sda_root_total" |curl --data-binary @- http://$instance_name:9091/metrics/job/$job_name/instance/$instance_name
echo "oldboy_disk_sda_free $disk_sda_root_free" |curl --data-binary @- http://$instance_name:9091/metrics/job/$job_name/instance/$instance_name
echo "oldboy_disk_sda_used $disk_sda_root_used" |curl --data-binary @- http://$instance_name:9091/metrics/job/$job_name/instance/$instance_name
# Then add this script to cron
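A possible variant of the script above sends all three samples in a single POST. This is only a sketch: the function names build_disk_metrics and push_disk_metrics are made up for illustration, and it assumes `df -P` (POSIX output format) so the columns stay on one line per filesystem.

```shell
#!/bin/sh
# Turn `df -P` output (read from stdin) into Prometheus text-format lines
# for the root filesystem. Metric names follow the notes above.
build_disk_metrics() {
    awk '$NF == "/" {
        printf "oldboy_disk_sda_total %s\n", $2
        printf "oldboy_disk_sda_used %s\n", $3
        printf "oldboy_disk_sda_free %s\n", $4
    }'
}

# Push the three samples in one request.
# Host, port, job and instance mirror the values used in the notes.
push_disk_metrics() {
    job_name=pushgateway
    instance_name=web01
    df -P | build_disk_metrics |
        curl --silent --show-error --data-binary @- \
            "http://$instance_name:9091/metrics/job/$job_name/instance/$instance_name"
}

# Uncomment to actually push (needs a reachable Pushgateway):
# push_disk_metrics
```

Keeping the formatting step separate from the curl call makes it easy to check the payload locally with `df -P | build_disk_metrics` before wiring the script into cron.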
Grafana
The Prometheus plugin is built in; just add a data source and enter http://localhost:9090.
(time() - node_boot_time_seconds{instance="localhost:9100", job="oldboy-prometheus-all-node_exporter-two"})
# Used to correct the machine uptime value
Alertmanager
Download: official website or GitHub
tar xf alertmanager-0.24.0.linux-amd64.tar.gz -C /app/prometheus/
ln -s /app/prometheus/alertmanager-0.24.0.linux-amd64/ /app/prometheus/alertmanager
ln -s /app/prometheus/alertmanager/alertmanager /bin/
alertmanager --config.file=/app/prometheus/alertmanager/alertmanager.yml --web.listen-address=":9093"
Alertmanager configuration file (alertmanager.yml)
global:
  resolve_timeout: 5m
  smtp_from: 'lidao996@163.com'
  smtp_smarthost: 'smtp.163.com:465'
  smtp_hello: '163.com'
  smtp_auth_username: 'lidao996@163.com'
  smtp_auth_password: 'MMNKQYUHMJON'
  smtp_require_tls: false
route:
  group_by: ['alertname']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 1h
  receiver: 'email'
receivers:
- name: "email"
  email_configs:
  - to: 'youjiu_linux@qq.com'
    send_resolved: true
inhibit_rules:
- source_match:
    severity: 'critical'
  target_match:
    severity: 'warning'
  equal: ['alertname', 'dev', 'instance']
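Changes to the Prometheus server configuration (prometheus.yml)
alerting:
  alertmanagers:
  - static_configs:
    - targets:
      - "hostname:9093"   # replace hostname with the Alertmanager host
rule_files:
  - "/app/prometheus/prometheus_alert_rules.yml"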
vim /app/prometheus/prometheus_alert_rules.yml
groups:
- name: check_node_status
  rules:
  - alert: check_node_is_up
    expr: up{job="oldboy-prometheus-all-node_exporter-two"} == 0
    for: 15s
    labels:
      severity: 1
      team: node
    annotations:
      summary: "Node {{ $labels.instance }} has been down for more than 15 seconds!"
