Prometheus Basics

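1. Prometheus service
- Prometheus flow diagram
1.1 Pull the image
1.2 Edit the configuration file
1.3 Start Prometheus
- Open ip:9090
1.4 Monitoring with node-exporter
2. Alertmanager service
- 2.1 Pull the Alertmanager image
3. Pushgateway service
3.4 Pushing data to Pushgateway: an example
- Push data with curl (POST)
- View the page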

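1. Prometheus service

Prometheus flow diagram

Prometheus Basics - Figure 1

Multi-dimensional data model (a time series is identified by a metric name and a set of key/value pairs)
A flexible query language, PromQL
No dependency on distributed storage; a single server node works on its own
Data is collected with a pull model over HTTP
Time series can also be pushed through the Pushgateway
Supports many chart types through Grafana
Prometheus monitors remote machines through exporters installed on them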
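To make the PromQL and HTTP points above concrete, here is a minimal sketch that sends an instant query to the Prometheus HTTP API with curl. It assumes a server reachable at localhost:9090; the function name `prom_query` and the `PROM_ADDR` variable are illustrative helpers, not part of Prometheus.

```shell
# Address of the Prometheus server; localhost:9090 is an assumption.
PROM_ADDR="${PROM_ADDR:-localhost:9090}"

# prom_query EXPR: run an instant PromQL query through /api/v1/query.
# -G sends the data as a GET query string; --data-urlencode escapes
# characters such as { } and " inside the expression.
prom_query() {
  curl -s -G "http://${PROM_ADDR}/api/v1/query" --data-urlencode "query=$1"
}
```

With a running server, `prom_query 'up{job="prometheus"}'` should return a JSON document describing the matching series.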

1.1 Pull the image

# Pull the image
docker pull prom/prometheus:v2.11.0

1.2 Edit the configuration file

# Create the directory
mkdir -p /etc/prometheus
# Edit /etc/prometheus/prometheus.yml
# my global config
global:
  scrape_interval:     15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).
# Alertmanager configuration
# alerting:
#  alertmanagers:
#  - static_configs:
#    - targets:
#      # - alertmanager:9093
# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  - "/etc/prometheus/*.rules"
  # - "second_rules.yml"
# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    static_configs:
    - targets: ['localhost:9090']
  - job_name: 'server'
    static_configs:
      - targets: ['172.31.243.137:9100']
alerting:  # Alerting configuration
  alertmanagers:
  - scheme: http
    static_configs:
    - targets: ["172.31.243.137:9093"]
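Before starting the server it can help to validate the file. The sketch below runs `promtool check config`, which ships in the same image, against the mounted directory. The function name `check_prom_config` is hypothetical, and the default directory is the /etc/prometheus path created above.

```shell
# check_prom_config [DIR]: validate prometheus.yml, and the rule files it
# references, with promtool from the Prometheus image. No server is started.
# DIR defaults to /etc/prometheus.
check_prom_config() {
  dir="${1:-/etc/prometheus}"
  docker run --rm \
    -v "${dir}:/etc/prometheus/" \
    --entrypoint /bin/promtool \
    prom/prometheus:v2.11.0 \
    check config /etc/prometheus/prometheus.yml
}
```

Calling `check_prom_config` with no argument checks the default location; a non-zero exit status means promtool rejected the configuration.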

1.3 Start Prometheus

# Start the container; /data/prometheus/ on the host is the local storage path
docker run -d -p 9090:9090 -v /etc/prometheus/:/etc/prometheus/ -v /data/prometheus/:/prometheus prom/prometheus:v2.11.0
# Check the listening port
netstat -nltup | grep 9090
tcp6       0      0 :::9090                 :::*                    LISTEN      3737/docker-proxy   
# Check the container
docker ps -a | grep prometheus
4af17fb8fe53        prom/prometheus:v2.11.0                    "/bin/prometheus -..."   About an hour ago   Up About an hour            0.0.0.0:9090->9090/tcp   kind_leakey

Open ip:9090 in a browser
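The same check can be done from the command line. This is a small sketch that probes the server's built-in health and readiness endpoints; `prom_health` is an illustrative name and localhost:9090 is an assumed address.

```shell
# prom_health [ADDR]: probe the health and readiness endpoints.
# -f makes curl return a non-zero status on HTTP errors, so the function
# fails if the server is not healthy or not yet ready to serve traffic.
prom_health() {
  addr="${1:-localhost:9090}"
  curl -fsS "http://${addr}/-/healthy" &&
    curl -fsS "http://${addr}/-/ready"
}
```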

1.4 Monitoring with node-exporter

vim node_export.yaml 
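# apps/v1 is used because extensions/v1beta1 DaemonSets are no longer served as of Kubernetes 1.16
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-exporter
  namespace: kube-system
  labels:
    name: node-exporter
spec:
  # apps/v1 requires an explicit selector that matches the pod template labels
  selector:
    matchLabels:
      name: node-exporter
  template:
    metadata:
      labels:
        name: node-exporter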
    spec:
      hostPID: true
      hostIPC: true
      hostNetwork: true
      containers:
      - name: node-exporter
        image: quay.io/prometheus/node-exporter:v0.18.1
        ports:
        - containerPort: 9100
        resources:
          requests:
            cpu: 0.15
        securityContext:
          privileged: true
        args:
        - --path.procfs
        - /host/proc
        - --path.sysfs
        - /host/sys
        - --collector.filesystem.ignored-mount-points
        - '^/(sys|proc|dev|host|etc)($|/)'
        volumeMounts:
        - name: dev
          mountPath: /host/dev
        - name: proc
          mountPath: /host/proc
        - name: sys
          mountPath: /host/sys
        - name: rootfs
          mountPath: /rootfs
      tolerations:
      - key: "node-role.kubernetes.io/master"
        operator: "Exists"
        effect: "NoSchedule"
      volumes:
        - name: proc
          hostPath:
            path: /proc
        - name: dev
          hostPath:
            path: /dev
        - name: sys
          hostPath:
            path: /sys
        - name: rootfs
          hostPath:
            path: /     
# Deploy the DaemonSet
kubectl apply -f ./node_export.yaml
[root@hf-aipaas-172-31-243-137 home]# netstat -nltup |grep 9100
tcp6       0      0 :::9100                 :::*                    LISTEN      17093/node_exporter
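Once the exporter is listening on port 9100, its metrics page can be inspected directly. The sketch below fetches /metrics and keeps only the sample lines for one metric family; `node_metrics` is a hypothetical helper and the default prefix `node_load` is just an example choice.

```shell
# node_metrics ADDR [PREFIX]: fetch the exporter's /metrics page and print
# only the sample lines whose metric name starts with PREFIX.
# Comment lines (# HELP, # TYPE) are dropped because they start with '#'.
node_metrics() {
  addr="$1"
  prefix="${2:-node_load}"
  curl -s "http://${addr}/metrics" | grep "^${prefix}"
}
```

For the host used in this guide the call would be `node_metrics 172.31.243.137:9100`.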

2. Alertmanager service

2.1 Pull the Alertmanager image

# Pull the image
docker pull quay.io/prometheus/alertmanager:v0.20.0

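2.2 Edit the configuration file

vim /etc/alertmanager/config.yml
# Global settings
global:
  resolve_timeout: 5m # Time after which an alert is declared resolved if it has not been updated; defaults to 5m
  smtp_smarthost: 'smtp.sina.com:25' # SMTP server used to send mail
  smtp_from: '******@sina.com' # Sender address
  smtp_auth_username: '******@sina.com' # SMTP account name
  smtp_auth_password: '******' # SMTP password or authorization code
  smtp_require_tls: false
# Notification templates
templates:
  - 'template/*.tmpl'
# Routing tree
route:
  group_by: ['alertname'] # Labels used to group alerts
  group_wait: 10s # How long to wait before sending the first notification for a new group
  group_interval: 1m    # How long to wait before notifying about new alerts added to a group that has already been notified
  repeat_interval: 1m  # How long to wait before re-sending a notification. Do not set this too low for email, or the SMTP server may reject mail that arrives too frequently
  receiver: 'mail-receiver'  # Default receiver; must match a name under receivers below
# Receivers
receivers:
  - name: 'mail-receiver'  # Receiver name
    email_configs:        # Email settings
    - to: 'ycli15@iflytek.com'  # Address that receives the alerts
      html: '{{ template "test.html" . }}' # Template for the email body
      headers: { Subject: "[WARN] Alert notification"} # Subject of the email
    #webhook_configs: # Webhook settings
    #- url: 'http://127.0.0.1:5001'
    #send_resolved: true
# An inhibition rule mutes alerts matching one set of matchers (the target) while an alert matching another set of matchers (the source) is firing. Both alerts must have the same values for the labels listed in equal.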
inhibit_rules:
  - source_match: 
     severity: 'critical' 
    target_match: 
     severity: 'warning' 
    equal: ['alertname', 'dev', 'instance']
# Template
vim /etc/alertmanager/template/test.tmpl
{{ define "test.html" }}
<table border="1">
        <tr>
                <td>Alert</td>
                <td>Instance</td>
                <td>Alert threshold</td>
                <td>Start time</td>
        </tr>
        {{ range $i, $alert := .Alerts }}
                <tr>
                        <td>{{ index $alert.Labels "alertname" }}</td>
                        <td>{{ index $alert.Labels "instance" }}</td>
                        <td>{{ index $alert.Annotations "value" }}</td>
                        <td>{{ $alert.StartsAt }}</td>
                </tr>
        {{ end }}
</table>
{{ end }}
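The configuration and the template it loads can be checked before the service is started. This sketch uses `amtool check-config`, which is bundled in the Alertmanager image; the function name `check_am_config` is hypothetical and the default directory is the /etc/alertmanager path used above.

```shell
# check_am_config [DIR]: validate config.yml and the templates it references
# with amtool from the Alertmanager image. No Alertmanager is started.
# DIR defaults to /etc/alertmanager.
check_am_config() {
  dir="${1:-/etc/alertmanager}"
  docker run --rm \
    -v "${dir}:/etc/alertmanager/" \
    --entrypoint /bin/amtool \
    quay.io/prometheus/alertmanager:v0.20.0 \
    check-config /etc/alertmanager/config.yml
}
```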

2.3 Start the Alertmanager service

docker run -d -p 9093:9093 -v /etc/alertmanager/:/etc/alertmanager/ quay.io/prometheus/alertmanager:v0.20.0 --config.file=/etc/alertmanager/config.yml

2.4 Configure Prometheus alerting rules

vim /etc/prometheus/alert.rules 
groups:
- name: example
  rules:
  # `up == 1` matches healthy targets, so this rule fires while an instance has
  # been up for more than 5 minutes, which exercises the alerting pipeline.
  # A production rule for unreachable instances would use `up == 0`.
  - alert: InstanceDown
    expr: up == 1
    for: 5m
    labels:
      severity: page
    annotations:
      summary: "Instance {{ $labels.instance }} down"
      description: "{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 5 minutes."
  # Fires while the docker0 interface on the node has been up for more than 10 minutes.
  - alert: NODE_NETWORK_UP_DOWN
    expr: node_network_up{device="docker0",instance="172.31.243.137:9100",job="server"}  > 0
    for: 10m
    annotations:
      summary: "Network interface {{ $labels.device }} is up on {{ $labels.instance }}"
      description: "{{ $labels.instance }}: node_network_up for {{ $labels.device }} is {{ $value }}"

Trigger the alert

View the Prometheus UI at ip:9090

View the alerts in Alertmanager
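The two views can also be read over HTTP, which is handy when no browser is available. A rough sketch, assuming Prometheus on localhost:9090 and Alertmanager on localhost:9093; `list_alerts` is an illustrative name.

```shell
# list_alerts [PROM_ADDR] [AM_ADDR]: print alerts from both components as JSON.
list_alerts() {
  prom_addr="${1:-localhost:9090}"
  am_addr="${2:-localhost:9093}"
  # Alerts as Prometheus sees them (pending or firing).
  curl -s "http://${prom_addr}/api/v1/alerts"
  # Alerts that have already been delivered to Alertmanager.
  curl -s "http://${am_addr}/api/v2/alerts"
}
```

An alert that shows up in the first response but not in the second is usually still pending, that is, its `for` duration has not elapsed yet.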

3. Pushgateway service

3.1 Pull the Pushgateway image

docker pull prom/pushgateway:v1.2.0

3.2 Install and deploy

docker run -d --name=pg -p 9091:9091 prom/pushgateway:v1.2.0

3.3 Configure Prometheus

Add a pushgateway job under scrape_configs.
Remember to restart the Prometheus container afterwards.


scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    static_configs:
    - targets: ['localhost:9090']
  - job_name: 'server'
    static_configs:
      - targets: ['172.31.243.137:9100']
  - job_name: pushgateway
    static_configs:
      - targets: ['172.31.243.137:9091']
        labels:
          instance: pushgateway
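The Prometheus container in this guide was started without a name, so a restart first has to find it. The sketch below looks the container up by image and restarts it; `reload_prometheus` is a hypothetical helper name.

```shell
# reload_prometheus [IMAGE]: restart every running container created from
# IMAGE so that the edited prometheus.yml is picked up.
reload_prometheus() {
  image="${1:-prom/prometheus:v2.11.0}"
  ids="$(docker ps -q --filter "ancestor=${image}")"
  if [ -z "$ids" ]; then
    echo "no running container found for ${image}" >&2
    return 1
  fi
  # $ids is left unquoted on purpose: it may hold several container IDs.
  docker restart $ids
}
```

A lighter alternative is `docker kill --signal=HUP` on the same container IDs, since Prometheus re-reads its configuration when it receives SIGHUP.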

View the Prometheus UI at ip:9090

3.4 Pushing data to Pushgateway: an example

Push data with curl (POST)

cat <<EOF | curl --data-binary @- http://172.31.243.137:9091/metrics/job/some_job/instance/some_instance
# TYPE some_metric counter
some_metric{label="val1"} 42
# TYPE another_metric gauge
# HELP another_metric Just an example.
another_metric 2398.283
EOF

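After a push it is useful to confirm that the samples arrived and, later, to clean them up, because the Pushgateway keeps pushed metrics until they are deleted. The following sketch assumes the gateway address used above; `group_url`, `show_pushed`, `delete_group` and `PGW_ADDR` are illustrative names.

```shell
# Address of the Pushgateway; taken from the example above.
PGW_ADDR="${PGW_ADDR:-172.31.243.137:9091}"

# group_url JOB INSTANCE: build the grouping-key URL used for push and delete.
group_url() {
  printf 'http://%s/metrics/job/%s/instance/%s\n' "$PGW_ADDR" "$1" "$2"
}

# show_pushed NAME: print the samples of metric NAME from the gateway's
# own /metrics page, which is what Prometheus scrapes.
show_pushed() {
  curl -s "http://${PGW_ADDR}/metrics" | grep "^$1"
}

# delete_group JOB INSTANCE: delete all metrics pushed under that group.
delete_group() {
  curl -s -X DELETE "$(group_url "$1" "$2")"
}
```

For the push shown earlier, `show_pushed some_metric` lists the stored sample and `delete_group some_job some_instance` removes the whole group.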