09 Collecting monitoring data with custom pushgateway scripts

1 What pushgateway is
2 Pros and cons of pushgateway
- (1) Pros
- (2) Cons
3 Deploying pushgateway
- Connecting Prometheus to pushgateway
4 pushgateway script examples
- (1) TCP connections
- (2) Network packet loss and latency

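1 What pushgateway is

Pushgateway is installed on either the client side or the server side. The monitored hosts run scripts written by the operations team, which format monitoring data as key/value metrics and send them to pushgateway; Prometheus then scrapes the collected metrics from pushgateway.
Unlike an exporter, pushgateway acts as an intermediary between Prometheus and the monitored hosts: it only passively receives the metrics that client-side scripts push to it, and it never probes clients or collects data on its own.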

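2 Pros and cons of pushgateway

(1) Pros

- Exporters cover many kinds of metrics, but we still need plenty of custom monitoring data.
- Exporters collect a large volume of data, much of which is never used in monitoring; with pushgateway, each custom metric collects only what is needed and saves collection resources.
- However rich exporter data is, it still cannot provide some of the forms of data we need; with pushgateway the collected data can take any form.
- Writing a new custom pushgateway script is simpler and faster than developing a brand-new exporter.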

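(2) Cons

- Data from many nodes is funneled into a single pushgateway, so if pushgateway goes down the impact is greater than with multiple separate targets.
- The up status that Prometheus reports for the scrape reflects only pushgateway itself; it says nothing about the health of each individual node.
- Pushgateway can persist all monitoring data pushed to it, so even after a monitored source has gone offline Prometheus keeps scraping the stale data; unwanted data has to be cleaned out of pushgateway manually.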
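
The manual cleanup mentioned in the last point can be done over HTTP. The following is a minimal sketch, assuming pushgateway is reachable at http://localhost:9091 and that metrics were pushed under a job and an instance label; the variable PUSHGATEWAY_URL and the function names are illustrative, not part of pushgateway.

```shell
#!/bin/bash
# Sketch: remove stale metrics from pushgateway.
# PUSHGATEWAY_URL and all function names are assumptions made for this example.
PUSHGATEWAY_URL="${PUSHGATEWAY_URL:-http://localhost:9091}"

# Build the URL of the metric group identified by a job and an instance label.
group_url() {
  local job="$1" instance="$2"
  printf '%s/metrics/job/%s/instance/%s\n' "$PUSHGATEWAY_URL" "$job" "$instance"
}

# Delete every metric in that group with an HTTP DELETE on the group URL.
delete_group() {
  curl -s -X DELETE "$(group_url "$1" "$2")"
}

# Wipe all groups at once; this only works when pushgateway was started
# with --web.enable-admin-api.
wipe_all() {
  curl -s -X PUT "$PUSHGATEWAY_URL/api/v1/admin/wipe"
}

# Example (not run here): delete_group pushgateway deepin-PC
```

Deleting a single group is usually the safer choice, since a wipe also removes data that other hosts are still pushing.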

3 Deploying pushgateway
```
wget -c https://github.com/prometheus/pushgateway/releases/download/v1.2.0/pushgateway-1.2.0.linux-amd64.tar.gz
tar xf pushgateway-1.2.0.linux-amd64.tar.gz -C /opt
cd /opt && mv pushgateway-1.2.0.linux-amd64 pushgateway
```
Starting it with systemd
```
cat <<EOF >/usr/lib/systemd/system/pushgateway.service
[Unit]
Description=pushgateway
After=network.target
[Service]
WorkingDirectory=/opt/pushgateway
ExecStart=/opt/pushgateway/pushgateway \
--web.enable-admin-api \
--persistence.file="pushfile.txt" \
--persistence.interval=10m
[Install]
WantedBy=multi-user.target
EOF
# Reload the unit files, then start the service and enable it at boot
systemctl daemon-reload
systemctl start pushgateway && systemctl enable pushgateway
```
By default pushgateway listens on port 9091. Its web UI shows the metrics that clients have pushed to it.
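
The same data is exposed as plain text on the /metrics endpoint that Prometheus scrapes, so it can also be checked from the command line. A minimal sketch, assuming pushgateway runs on localhost:9091 and exposes samples without timestamps; metric_values is a hypothetical helper name.

```shell
#!/bin/bash
# Sketch: print the sample values of one metric from text-format metrics on stdin.
# metric_values is an illustrative helper, not a pushgateway command.
metric_values() {
  local name="$1"
  awk -v name="$name" '
    /^#/ { next }                        # skip HELP and TYPE comment lines
    {
      split($1, parts, "{")              # strip the label set from the series name
      if (parts[1] == name) print $NF    # the last field is the sample value
    }'
}

# Example (not run here):
#   curl -s http://localhost:9091/metrics | metric_values count_netstat_wait_connetions
```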
Connecting Prometheus to pushgateway
Edit prometheus.yml and add the following to the scrape_configs section:
```
  - job_name: pushgateway
    static_configs:
      - targets: ["172.16.0.9:9091"]
```
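4 pushgateway script examples
(1) TCP connections
Pushgateway cannot collect monitoring data by itself; it only waits passively for data to be pushed to it, so users have to write their own collection scripts.
Example: collecting the instantaneous number of TCP connections in a wait state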

1. Custom collection script

```
mkdir -p /app/scripts/pushgateway

# The quoted 'EOF' keeps $variables and $(...) from being expanded while the file is written
cat <<'EOF' >/app/scripts/pushgateway/tcp_waiting_connection.sh
#!/bin/bash

# Get the hostname; the host must not be localhost
instance_name=$(hostname -f | cut -d '.' -f 1)
if [ "$instance_name" = "localhost" ]; then
  echo "Must FQDN hostname"
  exit 1
fi

# For waiting connections
label="count_netstat_wait_connetions"
count_netstat_wait_connetions=$(netstat -an | grep -i wait | wc -l)
echo "$label:$count_netstat_wait_connetions"
echo "$label $count_netstat_wait_connetions" | curl --data-binary @- "http://localhost:9091/metrics/job/pushgateway/instance/$instance_name"
EOF

chmod +x /app/scripts/pushgateway/tcp_waiting_connection.sh
```

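1) netstat -an | grep -i wait | wc -l is how this custom metric obtains its value.<br />
2) The script simply sends the key/value pair to pushgateway with an HTTP POST. The URL is composed as follows:
- http://localhost:9091/metrics: the pushgateway URL
- job/pushgateway: the first label pushed with the data, i.e. exported_job="pushgateway" (similar to the job defined in prometheus.yml)
- instance/$instance_name: the second label pushed with the data, i.e. exported_instance="deepin-PC"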
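
To make the URL layout above concrete, the sketch below splits a push URL into the label pairs it carries. It is only an illustration under the assumption that every path segment after /metrics/ alternates between a label name and a label value; grouping_labels is a hypothetical helper name.

```shell
#!/bin/bash
# Sketch: list the label pairs encoded in a pushgateway push URL.
# grouping_labels is an illustrative helper, not part of pushgateway.
grouping_labels() {
  local path="${1#*/metrics/}"   # drop everything up to and including /metrics/
  local parts i
  IFS='/' read -r -a parts <<< "$path"
  # Segments come in name/value pairs: job/<job>, instance/<instance>, ...
  for (( i = 0; i + 1 < ${#parts[@]}; i += 2 )); do
    printf '%s="%s"\n' "${parts[i]}" "${parts[i+1]}"
  done
}

# Example:
#   grouping_labels "http://localhost:9091/metrics/job/pushgateway/instance/deepin-PC"
```

As described above, Prometheus then shows these two labels as exported_job and exported_instance.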
2. Run the script on a schedule

```
crontab -e

* * * * * /app/scripts/pushgateway/tcp_waiting_connection.sh >/dev/null 2>&1
```

Prometheus here scrapes pushgateway every 15 seconds (the scrape_interval in the sample prometheus.yml), whereas cron can run a job at most once per minute. To run the script every 15 seconds:<br />
Method 1: sleep, with several cron entries

```
* * * * * /app/scripts/pushgateway/tcp_waiting_connection.sh >/dev/null 2>&1
* * * * * sleep 15; /app/scripts/pushgateway/tcp_waiting_connection.sh >/dev/null 2>&1
* * * * * sleep 30; /app/scripts/pushgateway/tcp_waiting_connection.sh >/dev/null 2>&1
* * * * * sleep 45; /app/scripts/pushgateway/tcp_waiting_connection.sh >/dev/null 2>&1
```
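
The staggered entries of Method 1 follow a simple pattern, so they can be generated for any interval that divides 60. The sketch below is one possible way to do that; cron_lines is a hypothetical helper name and the output still has to be pasted into crontab -e by hand.

```shell
#!/bin/bash
# Sketch: print one crontab entry per offset so that a command runs every
# INTERVAL seconds. cron_lines is an illustrative helper name.
cron_lines() {
  local interval="$1" cmd="$2" offset
  for (( offset = 0; offset < 60; offset += interval )); do
    if (( offset == 0 )); then
      printf '* * * * * %s\n' "$cmd"
    else
      printf '* * * * * sleep %d; %s\n' "$offset" "$cmd"
    fi
  done
}

# Example:
#   cron_lines 15 "/app/scripts/pushgateway/tcp_waiting_connection.sh >/dev/null 2>&1"
```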
Method 2: a for loop

```
cat <<'EOF' >/app/scripts/pushgateway/tcp_waiting_connection.sh
#!/bin/bash
# Push once every $time seconds, four times per cron run
time=15
for (( i=0; i<60; i=i+time )); do
  instance_name=$(hostname -f | cut -d '.' -f 1)
  if [ "$instance_name" = "localhost" ]; then
    echo "Must FQDN hostname"
    exit 1
  fi
  label="count_netstat_wait_connetions"
  count_netstat_wait_connetions=$(netstat -an | grep -i wait | wc -l)
  echo "$label:$count_netstat_wait_connetions"
  echo "$label $count_netstat_wait_connetions" | curl --data-binary @- "http://localhost:9091/metrics/job/pushgateway/instance/$instance_name"
  sleep $time
done
exit 0
EOF
```

Only one cron entry is then needed:

```
crontab -e

* * * * * /app/scripts/pushgateway/tcp_waiting_connection.sh >/dev/null 2>&1
```

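Note: the for (( ... )) loop is bash syntax. With the #!/bin/bash shebang, debug the script by running it through its full or relative path, or with bash /app/scripts/pushgateway/tcp_waiting_connection.sh. Running it with sh /app/scripts/pushgateway/tcp_waiting_connection.sh on a system where sh is not bash fails with: Syntax error: Bad for loop variable<br />
3. View the metric count_netstat_wait_connetions in Prometheus<br />
4. TCP waiting connection count: count_netstat_wait_connetions (implemented here with a custom script; node_exporter can provide it as well)<br />
TCP connections in the various wait states (close_wait, time_wait, and so on) are an important indicator when troubleshooting load (network load, server load, database load, etc.). When the number of wait-state TCP connections grows too large, it usually signals a problem with the system's network (traffic) load. The causes vary: network problems, request volume, DDoS traffic, the database, and CPU are all possible.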
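
Regarding the shebang note above: if the script has to run under a plain sh, the loop can be written with POSIX constructs only. This is a sketch of one such rewrite, not the author's script; run_every is a hypothetical function name and SLEEP_CMD is a made-up override that lets the loop be dry-run without waiting.

```shell
#!/bin/sh
# Sketch: POSIX-compatible replacement for the bash-only for (( ... )) loop.
# run_every INTERVAL COMMAND...: run COMMAND every INTERVAL seconds for one minute.
run_every() {
  interval="$1"
  shift
  elapsed=0
  while [ "$elapsed" -lt 60 ]; do
    "$@"
    elapsed=$((elapsed + interval))
    # SLEEP_CMD is a hypothetical override used only for dry runs
    ${SLEEP_CMD:-sleep} "$interval"
  done
}

# Example (not run here): run_every 15 /path/to/push_once.sh
# push_once.sh stands for a script that pushes a single sample.
```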
## (2) Network packet loss and latency
This mainly covers intranet traffic between servers, ping latency, and packet loss rate.

1. Custom collection script

```
#!/bin/bash

# instance_name was never set in the original script; derive it from the hostname
# and replace characters that are not valid in a metric name
instance_name=$(hostname -f | cut -d '.' -f 1 | tr -c 'a-zA-Z0-9_\n' '_')

lostpk=$(timeout 5 ping -q -A -s 500 -W 1000 -c 100 10.0.0.1 | grep transmitted | awk '{print $6}')
rrt=$(timeout 5 ping -a -A -s 500 -W 1000 -c 100 10.0.0.1 | grep transmitted | awk '{print $10}')

# Prometheus is time-series based and accepts only numeric values, not strings
value_lostpk=$(echo "$lostpk" | sed "s/%//g")
value_rrt=$(echo "$rrt" | sed "s/ms//g")

echo "lostpk_${instance_name}_to_prometheus: $value_lostpk"
echo "lostpk_${instance_name}_to_prometheus $value_lostpk" | curl --data-binary @- http://localhost:9091/metrics/job/pushgateway/instance/localhost:9091

echo "rrt_${instance_name}_to_prometheus: $value_rrt"
echo "rrt_${instance_name}_to_prometheus $value_rrt" | curl --data-binary @- http://localhost:9091/metrics/job/pushgateway/instance/localhost:9091
```

lostpk: packet loss rate
rrt: latency
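A plain ping www.baidu.com only checks connectivity. Passing a few options to send more and larger packets simulates real requests and makes it easier to judge the current state of the network:

-s size of each ping packet
-W timeout for waiting on a reply
-c number of packets to send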
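
Picking fixed awk fields out of the ping summary breaks when the summary gains an extra field (for example "+3 errors"). The sketch below matches on the text instead. It assumes the summary format printed by Linux iputils ping, and it reads the average round-trip time from the rtt line, which is a different value from the total time field used in the script above; both function names are hypothetical.

```shell
#!/bin/bash
# Sketch: extract numeric values from the ping summary on stdin.
# Assumes iputils-style output such as:
#   100 packets transmitted, 100 received, 0% packet loss, time 99012ms
#   rtt min/avg/max/mdev = 0.045/0.062/0.101/0.012 ms

# Print the packet loss percentage without the % sign.
ping_loss_percent() {
  sed -n 's/.* \([0-9.]*\)% packet loss.*/\1/p'
}

# Print the average round-trip time in milliseconds.
ping_avg_rtt_ms() {
  # Split on "/": field 5 is the avg value of min/avg/max/mdev
  awk -F'/' '/^rtt/ { print $5 }'
}

# Example (not run here):
#   out=$(ping -q -c 10 10.0.0.1)
#   printf '%s\n' "$out" | ping_loss_percent
```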

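What is monitored here is intranet latency. An intranet can be physical or logical. In cloud computing, a logical intranet built across regions actually travels over the public network, and users cannot see the true state of the underlying network: a single switch may connect a huge number of servers, and broadcast storms and similar events can affect it. Intranet packet loss is rare; if the intranet is dropping packets heavily, the network is effectively unusable.
For network monitoring, more specialized tools such as smokeping can collect the data. Smokeping has its own visualization, but its data can also be fed into pushgateway by a script so that visualization and alerting are handled in one place.
Note: https://blog.51cto.com/root/3033785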
