Basic setup

```bash
# configure hosts (append to /etc/hosts on both machines)
172.16.58.78 prome-master01
172.16.58.79 prome-node01

# on the master, generate an ssh key and copy it to the node
ssh-keygen
ssh-copy-id prome-node01
# verify ssh connectivity
ssh prome-node01
```

Install ansible on the master

```bash
yum install -y ansible

# disable strict host key checking
vim /etc/ansible/ansible.cfg
ssh_args = -o ControlMaster=auto -o ControlPersist=60s -o StrictHostKeyChecking=no

# playbooks need an inventory file
cat <<EOF > /opt/tgzs/host_file
prome-master01
prome-node01
EOF

# set up the syslog + logrotate service
ansible-playbook -i host_file init_syslog_logrotate.yaml

# run the service-deploy playbook
ansible-playbook -i host_file service_deploy.yaml -e "tgz=node_exporter-1.1.2.linux-amd64.tar.gz" -e "app=node_exporter"

# check that node_exporter is running
ansible -i host_file all -m shell -a "ps -ef |grep node_exporter|grep -v grep"
```

Playbook files (contents not shown here): init_syslog_logrotate.yaml, service_deploy.yaml
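Since the playbook contents are not shown above, here is a minimal sketch of what service_deploy.yaml might contain, given the `tgz` and `app` variables passed on the command line. The /opt/tgzs path, task names, and the nohup start are assumptions, not the author's actual playbook:

```shell
# Hypothetical sketch of service_deploy.yaml, written via heredoc as with the
# inventory file above. It unpacks the tarball and starts the binary.
cat <<'EOF' > service_deploy.yaml
- hosts: all
  tasks:
    - name: unpack the release tarball
      unarchive:
        src: "/opt/tgzs/{{ tgz }}"
        dest: /opt
        remote_src: yes
    - name: start the service in the background
      shell: |
        cd /opt/{{ tgz | regex_replace('.tar.gz$', '') }}
        nohup ./{{ app }} &>/dev/null &
EOF
```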

Access from a browser (one URL per host):

```bash
http://IP:9100/metrics
```

Fetch the data locally

```bash
[root@prome_master_01 tgzs]# curl -s localhost:9100/metrics |grep node_ |head -20
# HELP node_arp_entries ARP entries by device
# TYPE node_arp_entries gauge
node_arp_entries{device="eth0"} 3
# HELP node_boot_time_seconds Node boot time, in unixtime.
# TYPE node_boot_time_seconds gauge
node_boot_time_seconds 1.616987084e+09
# HELP node_context_switches_total Total number of context switches.
# TYPE node_context_switches_total counter
node_context_switches_total 2.105979e+06
# HELP node_cooling_device_cur_state Current throttle state of the cooling device
# TYPE node_cooling_device_cur_state gauge
node_cooling_device_cur_state{name="0",type="Processor"} 0
node_cooling_device_cur_state{name="1",type="Processor"} 0
node_cooling_device_cur_state{name="2",type="Processor"} 0
node_cooling_device_cur_state{name="3",type="Processor"} 0
# HELP node_cooling_device_max_state Maximum throttle state of the cooling device
# TYPE node_cooling_device_max_state gauge
node_cooling_device_max_state{name="0",type="Processor"} 0
node_cooling_device_max_state{name="1",type="Processor"} 0
node_cooling_device_max_state{name="2",type="Processor"} 0
```
Collectors enabled by default

(image: table of the collectors node_exporter enables by default)

Disabling a default collector

Use the `--no-collector.<name>` flag.

```bash
# before disabling
[root@prome_master_01 node_exporter]# curl -s localhost:9100/metrics |grep node_cpu
# HELP node_cpu_guest_seconds_total Seconds the CPUs spent in guests (VMs) for each mode.
# TYPE node_cpu_guest_seconds_total counter
node_cpu_guest_seconds_total{cpu="0",mode="nice"} 0
node_cpu_guest_seconds_total{cpu="0",mode="user"} 0
node_cpu_guest_seconds_total{cpu="1",mode="nice"} 0
node_cpu_guest_seconds_total{cpu="1",mode="user"} 0
node_cpu_guest_seconds_total{cpu="2",mode="nice"} 0
node_cpu_guest_seconds_total{cpu="2",mode="user"} 0
node_cpu_guest_seconds_total{cpu="3",mode="nice"} 0
node_cpu_guest_seconds_total{cpu="3",mode="user"} 0
# HELP node_cpu_seconds_total Seconds the CPUs spent in each mode.
# TYPE node_cpu_seconds_total counter
node_cpu_seconds_total{cpu="0",mode="idle"} 17691.27
node_cpu_seconds_total{cpu="0",mode="iowait"} 8.9
node_cpu_seconds_total{cpu="0",mode="irq"} 0
node_cpu_seconds_total{cpu="0",mode="nice"} 0.32
node_cpu_seconds_total{cpu="0",mode="softirq"} 0.28
node_cpu_seconds_total{cpu="0",mode="steal"} 2.7

# disable the cpu collector
./node_exporter --no-collector.cpu
# now returns nothing
curl -s localhost:9100/metrics |grep node_cpu
```

Disable all default collectors and enable only specific ones

Use `--collector.disable-defaults --collector.<name>`.

```bash
# enable only the meminfo collector
./node_exporter --collector.disable-defaults --collector.meminfo
# enable only meminfo and cpu
./node_exporter --collector.disable-defaults --collector.meminfo --collector.cpu
```

Why some collectors are disabled by default

- too resource-heavy
- too slow
- too much overhead

(image: list of collectors disabled by default)

Disabling the Go SDK metrics

- Use `--web.disable-exporter-metrics`.
- `promhttp_` metrics describe HTTP traffic on /metrics:

```bash
[root@prome_master_01 tgzs]# curl -s localhost:9100/metrics |grep promhttp_
# HELP promhttp_metric_handler_errors_total Total number of internal errors encountered by the promhttp metric handler.
# TYPE promhttp_metric_handler_errors_total counter
promhttp_metric_handler_errors_total{cause="encoding"} 0
promhttp_metric_handler_errors_total{cause="gathering"} 0
# HELP promhttp_metric_handler_requests_in_flight Current number of scrapes being served.
# TYPE promhttp_metric_handler_requests_in_flight gauge
promhttp_metric_handler_requests_in_flight 1
# HELP promhttp_metric_handler_requests_total Total number of scrapes by HTTP status code.
# TYPE promhttp_metric_handler_requests_total counter
promhttp_metric_handler_requests_total{code="200"} 8
promhttp_metric_handler_requests_total{code="500"} 0
promhttp_metric_handler_requests_total{code="503"} 0
```

- `go_` metrics describe the Go runtime:

```bash
# HELP go_goroutines Number of goroutines that currently exist.
# TYPE go_goroutines gauge
go_goroutines 7
# HELP go_info Information about the Go environment.
# TYPE go_info gauge
go_info{version="go1.15.8"} 1
# HELP go_memstats_alloc_bytes Number of bytes allocated and still in use.
# TYPE go_memstats_alloc_bytes gauge
go_memstats_alloc_bytes 2.781752e+06
```

- `process_` metrics describe the exporter process itself:

```bash
# HELP process_cpu_seconds_total Total user and system CPU time spent in seconds.
# TYPE process_cpu_seconds_total counter
process_cpu_seconds_total 0.54
# HELP process_max_fds Maximum number of open file descriptors.
# TYPE process_max_fds gauge
process_max_fds 1024
# HELP process_open_fds Number of open file descriptors.
# TYPE process_open_fds gauge
process_open_fds 9
# HELP process_resident_memory_bytes Resident memory size in bytes.
# TYPE process_resident_memory_bytes gauge
process_resident_memory_bytes 1.5720448e+07
```

Reporting node-local custom metrics (textfile collector)

- `--collector.textfile.directory=""` sets the local collection directory.
- Create a .prom file inside that directory:

```bash
# create the directory
mkdir ./text_file_dir

# prepare the .prom file
cat <<EOF > ./text_file_dir/test.prom
# HELP nyy_test_metric just test
# TYPE nyy_test_metric gauge
nyy_test_metric{method="post",code="200"} 1027
EOF

# start the service
./node_exporter --collector.textfile.directory=./text_file_dir

# check with curl
[root@prome_master_01 tgzs]# curl -s localhost:9100/metrics |grep nyy
# HELP nyy_test_metric just test
# TYPE nyy_test_metric gauge
nyy_test_metric{code="200",method="post"} 1027
```
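One refinement worth knowing: node_exporter may read a .prom file while it is being written, so textfile metrics are conventionally written to a temporary file first and then renamed, which is atomic on the same filesystem. A sketch using the demo directory and metric from above:

```shell
# Write the .prom file atomically: build it under a temp name, then mv it
# into place so node_exporter never sees a half-written file.
TEXTFILE_DIR=./text_file_dir
mkdir -p "$TEXTFILE_DIR"
TMP="$(mktemp "$TEXTFILE_DIR/test.prom.XXXXXX")"
cat <<EOF > "$TMP"
# HELP nyy_test_metric just test
# TYPE nyy_test_metric gauge
nyy_test_metric{method="post",code="200"} 1027
EOF
mv "$TMP" "$TEXTFILE_DIR/test.prom"
```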

Filtering metrics by collector via HTTP parameters

The principle: the handler filters collectors based on the `collect[]` query parameters of the HTTP request:

```go
func (h *handler) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	filters := r.URL.Query()["collect[]"]
	level.Debug(h.logger).Log("msg", "collect query:", "filters", filters)

	if len(filters) == 0 {
		// No filters, use the prepared unfiltered handler.
		h.unfilteredHandler.ServeHTTP(w, r)
		return
	}
	// To serve filtered metrics, we create a filtering handler on the fly.
	filteredHandler, err := h.innerHandler(filters...)
	if err != nil {
		level.Warn(h.logger).Log("msg", "Couldn't create filtered metrics handler:", "err", err)
		w.WriteHeader(http.StatusBadRequest)
		w.Write([]byte(fmt.Sprintf("Couldn't create filtered metrics handler: %s", err)))
		return
	}
	filteredHandler.ServeHTTP(w, r)
}
```

Access over HTTP

```bash
# only the cpu collector's metrics
http://IP:9100/metrics?collect[]=cpu
# only the cpu and meminfo collectors' metrics
http://IP:9100/metrics?collect[]=cpu&collect[]=meminfo
```

Prometheus configuration (add under the node_exporter scrape job):

```yaml
params:
  collect[]:
    - cpu
    - meminfo
```

Two ways to import the Grafana dashboard:

- import by URL
- import from a JSON file

Dashboard: https://grafana.com/grafana/dashboards/8919

Configure the data source

(images: Grafana data-source configuration screenshots)

Verify

(images: dashboard screenshots)

Install node_exporter manually

Upload the package to /server, then:

```bash
tar xf node_exporter-1.1.2.linux-amd64.tar.gz -C /opt
cd /opt/node_exporter-1.1.2.linux-amd64/
mv node_exporter /usr/local/bin/

cat /etc/systemd/system/node_exporter.service
[Unit]
Description=Node Exporter
After=network.target

[Service]
User=root
Group=root
Type=simple
ExecStart=/usr/local/bin/node_exporter

[Install]
WantedBy=multi-user.target

systemctl daemon-reload
systemctl start node_exporter
systemctl status node_exporter.service
systemctl enable node_exporter
curl localhost:9100/metrics
```
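The unit file shown above has to exist on disk before the systemctl commands will do anything. A heredoc sketch that recreates it (written to the current directory here so it is safe to run unprivileged; the real target is /etc/systemd/system/node_exporter.service, copied as root):

```shell
# Recreate the systemd unit from the listing above.
# Target path is the current directory for safety; install it to
# /etc/systemd/system/ before running systemctl daemon-reload.
cat <<'EOF' > ./node_exporter.service
[Unit]
Description=Node Exporter
After=network.target

[Service]
User=root
Group=root
Type=simple
ExecStart=/usr/local/bin/node_exporter

[Install]
WantedBy=multi-user.target
EOF
```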

Prometheus server configuration

```yaml
# [root@Server-d0449e6d-d612-49ba-80cc-070b579955d6 prometheus-2.25.2.linux-amd64]# cat prometheus.yml
- job_name: node_exporter
  honor_timestamps: true
  scrape_interval: 15s
  scrape_timeout: 10s
  metrics_path: /metrics
  scheme: http          # scrape over http (the default)
  static_configs:
    - targets:          # scrape the two nodes
        - 192.168.0.107:7100
        - 192.168.0.56:9100
```
(image: Prometheus targets page)

The targets page, explained

- job: the scrape group the target belongs to
- endpoint: the instance address
- state: whether the last scrape succeeded
- labels: the target's label set
- Last Scrape: time elapsed since the last scrape
- Scrape Duration: how long the last scrape took
- Error: the scrape error, if any