Prometheus Expressions
The PromQL expressions in this doc can be used to configure alerts.
Before these expressions can be used in alerts, monitoring must be enabled. For more information, refer to the documentation on enabling monitoring at the cluster level or at the project level.
For more information about querying Prometheus, refer to the official Prometheus documentation.
Cluster Metrics
- Cluster CPU Utilization
- Cluster Load Average
- Cluster Memory Utilization
- Cluster Disk Utilization
- Cluster Disk I/O
- Cluster Network Packets
- Cluster Network I/O
Node Metrics
- Node CPU Utilization
- Node Load Average
- Node Memory Utilization
- Node Disk Utilization
- Node Disk I/O
- Node Network Packets
- Node Network I/O
Etcd Metrics
- Etcd Has a Leader
- Number of Times the Leader Changes
- Number of Failed Proposals
- GRPC Client Traffic
- Peer Traffic
- DB Size
- Active Streams
- Raft Proposals
- RPC Rate
- Disk Operations
- Disk Sync Duration
Kubernetes Components Metrics
- API Server Request Latency
- API Server Request Rate
- Scheduling Failed Pods
- Controller Manager Queue Depth
- Scheduler E2E Scheduling Latency
- Scheduler Preemption Attempts
- Ingress Controller Connections
- Ingress Controller Request Process Time
Rancher Logging Metrics
- Fluentd Buffer Queue Rate
- Fluentd Input Rate
- Fluentd Output Errors Rate
- Fluentd Output Rate
Workload Metrics
- Workload CPU Utilization
- Workload Memory Utilization
- Workload Network Packets
- Workload Network I/O
- Workload Disk I/O
Pod Metrics
- Pod CPU Utilization
- Pod Memory Utilization
- Pod Network Packets
- Pod Network I/O
- Pod Disk I/O
Container Metrics
- Container CPU Utilization
- Container Memory Utilization
- Container Disk I/O
Cluster Metrics
Cluster CPU Utilization
CATALOG | EXPRESSION |
---|---|
Detail | 1 - (avg(irate(node_cpu_seconds_total{mode="idle"}[5m])) by (instance)) |
Summary | 1 - (avg(irate(node_cpu_seconds_total{mode="idle"}[5m]))) |
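Each CPU's idle counter grows by at most one second per second, so `irate()` of it is the fraction of time that CPU was idle; averaging over the CPUs and subtracting from 1 gives the busy fraction. The Python sketch below reproduces that arithmetic on made-up samples for one node. It mirrors how `irate()` uses only the last two samples in the range and treats a drop as a counter reset, but it is an illustration, not Prometheus's implementation.

```python
# Sketch of: 1 - avg(irate(node_cpu_seconds_total{mode="idle"}[5m])) by (instance)
# The samples below are made-up values for one node with two CPUs.


def irate(samples):
    """Per-second rate from the last two (timestamp, value) samples, as PromQL irate() does."""
    (t_prev, v_prev), (t_last, v_last) = samples[-2], samples[-1]
    # If the counter went down it was reset; the last value then counts as the whole increase.
    increase = v_last - v_prev if v_last >= v_prev else v_last
    return increase / (t_last - t_prev)


# Cumulative idle seconds per CPU, scraped every 15 seconds.
idle_counters = {
    "cpu0": [(0, 100.0), (15, 112.0), (30, 124.0)],  # idle 12 of the last 15 s
    "cpu1": [(0, 200.0), (15, 206.0), (30, 212.0)],  # idle 6 of the last 15 s
}

idle_fractions = [irate(samples) for samples in idle_counters.values()]
utilization = 1 - sum(idle_fractions) / len(idle_fractions)
print(f"{utilization:.2f}")  # 0.40
```

Because `irate()` looks only at the last two samples, it reacts quickly to spikes but is noisier than `rate()` over the same window.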
Cluster Load Average
CATALOG | SERIES | EXPRESSION |
---|---|---|
Detail | load1 | sum(node_load1) by (instance) / count(node_cpu_seconds_total{mode="system"}) by (instance) |
Detail | load5 | sum(node_load5) by (instance) / count(node_cpu_seconds_total{mode="system"}) by (instance) |
Detail | load15 | sum(node_load15) by (instance) / count(node_cpu_seconds_total{mode="system"}) by (instance) |
Summary | load1 | sum(node_load1) / count(node_cpu_seconds_total{mode="system"}) |
Summary | load5 | sum(node_load5) / count(node_cpu_seconds_total{mode="system"}) |
Summary | load15 | sum(node_load15) / count(node_cpu_seconds_total{mode="system"}) |
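The load expressions normalize the load average by the number of CPUs: `node_cpu_seconds_total` has one `{mode="system"}` series per CPU, so counting those series yields the CPU count. The Detail rows do this per node; a cluster-wide figure divides total load by total CPUs. The node names and readings in this Python sketch are hypothetical.

```python
# Sketch of the load-average arithmetic: 1-minute load divided by the number of CPUs.
# Node names, loads and CPU lists are hypothetical sample data.

load1 = {"node-a": 3.0, "node-b": 1.0}

# node_cpu_seconds_total has one {mode="system"} series per CPU,
# so counting those series per instance yields the CPU count.
system_mode_series = {
    "node-a": ["cpu0", "cpu1", "cpu2", "cpu3"],
    "node-b": ["cpu0", "cpu1"],
}

# Per-node value, as in the Detail rows.
per_node = {node: load1[node] / len(system_mode_series[node]) for node in load1}

# Cluster-wide value: total load divided by total CPUs across all nodes.
cluster = sum(load1.values()) / sum(len(cpus) for cpus in system_mode_series.values())

print(per_node)           # {'node-a': 0.75, 'node-b': 0.5}
print(round(cluster, 3))  # 0.667
```

A normalized value near 1.0 means, roughly, as much queued work as there are CPUs, which makes one threshold usable across nodes of different sizes.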
Cluster Memory Utilization
CATALOG | EXPRESSION |
---|---|
Detail | 1 - sum(node_memory_MemAvailable_bytes) by (instance) / sum(node_memory_MemTotal_bytes) by (instance) |
Summary | 1 - sum(node_memory_MemAvailable_bytes) / sum(node_memory_MemTotal_bytes) |
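The Summary row is a ratio of sums, not the mean of the Detail values, so nodes with more memory weigh more. That difference can matter when you pick an alert threshold. The Python sketch below compares the two on hypothetical node sizes.

```python
# Sketch comparing the Detail and Summary memory expressions on hypothetical data.
GIB = 1024 ** 3

mem_total = {"node-a": 16 * GIB, "node-b": 4 * GIB}
mem_available = {"node-a": 12 * GIB, "node-b": 1 * GIB}

# Detail: 1 - available / total, per instance.
detail = {n: 1 - mem_available[n] / mem_total[n] for n in mem_total}

# Summary: 1 - sum(available) / sum(total), so larger nodes weigh more.
summary = 1 - sum(mem_available.values()) / sum(mem_total.values())

# A plain mean of the per-node ratios gives a different number.
mean_of_ratios = sum(detail.values()) / len(detail)

print(detail)                    # {'node-a': 0.25, 'node-b': 0.75}
print(round(summary, 2))         # 0.35
print(round(mean_of_ratios, 2))  # 0.5
```

Here the small node is 75% full, yet the cluster Summary reports 35%; a per-node alert on the Detail expression would catch it, while a Summary-only alert would not.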
Cluster Disk Utilization
CATALOG | EXPRESSION |
---|---|
Detail | (sum(node_filesystem_size_bytes{device!="rootfs"}) by (instance) - sum(node_filesystem_free_bytes{device!="rootfs"}) by (instance)) / sum(node_filesystem_size_bytes{device!="rootfs"}) by (instance) |
Summary | (sum(node_filesystem_size_bytes{device!="rootfs"}) - sum(node_filesystem_free_bytes{device!="rootfs"})) / sum(node_filesystem_size_bytes{device!="rootfs"}) |
Cluster Disk I/O
CATALOG | SERIES | EXPRESSION |
---|---|---|
Detail | read | sum(rate(node_disk_read_bytes_total[5m])) by (instance) |
Detail | written | sum(rate(node_disk_written_bytes_total[5m])) by (instance) |
Summary | read | sum(rate(node_disk_read_bytes_total[5m])) |
Summary | written | sum(rate(node_disk_written_bytes_total[5m])) |
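`rate()` turns an ever-growing byte counter into bytes per second over the window, correcting for counter resets such as a node reboot; `sum(...) by (instance)` then adds up the per-device rates of each node. The Python sketch below is simplified: unlike Prometheus, it does not extrapolate to the edges of the `[5m]` window, and the counter values are made up.

```python
# Simplified sketch of rate() for a byte counter such as node_disk_read_bytes_total.
# Unlike Prometheus, it does not extrapolate to the edges of the [5m] window; it
# divides the reset-corrected increase by the time between the first and last sample.


def simple_rate(samples):
    """samples: list of (unix_seconds, counter_value), oldest first."""
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        # A drop means the counter restarted from zero (e.g. after a node reboot).
        increase += cur - prev if cur >= prev else cur
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed


# Hypothetical read-byte counter sampled every 60 s, with a reset after t=120.
read_bytes = [(0, 0.0), (60, 6_000.0), (120, 12_000.0), (180, 6_000.0), (240, 12_000.0)]
print(simple_rate(read_bytes))  # 100.0 (bytes per second)
```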
Cluster Network Packets
CATALOG | SERIES | EXPRESSION |
---|---|---|
Detail | receive-dropped | `sum(rate(node_network_receive_drop_total{device!~"lo\|veth.*\|docker.*\|flannel.*\|cali.*\|cbr.*"}[5m])) by (instance)` |
Detail | receive-errs | `sum(rate(node_network_receive_errs_total{device!~"lo\|veth.*\|docker.*\|flannel.*\|cali.*\|cbr.*"}[5m])) by (instance)` |
Detail | receive-packets | `sum(rate(node_network_receive_packets_total{device!~"lo\|veth.*\|docker.*\|flannel.*\|cali.*\|cbr.*"}[5m])) by (instance)` |
Detail | transmit-dropped | `sum(rate(node_network_transmit_drop_total{device!~"lo\|veth.*\|docker.*\|flannel.*\|cali.*\|cbr.*"}[5m])) by (instance)` |
Detail | transmit-errs | `sum(rate(node_network_transmit_errs_total{device!~"lo\|veth.*\|docker.*\|flannel.*\|cali.*\|cbr.*"}[5m])) by (instance)` |
Detail | transmit-packets | `sum(rate(node_network_transmit_packets_total{device!~"lo\|veth.*\|docker.*\|flannel.*\|cali.*\|cbr.*"}[5m])) by (instance)` |
Summary | receive-dropped | `sum(rate(node_network_receive_drop_total{device!~"lo\|veth.*\|docker.*\|flannel.*\|cali.*\|cbr.*"}[5m]))` |
Summary | receive-errs | `sum(rate(node_network_receive_errs_total{device!~"lo\|veth.*\|docker.*\|flannel.*\|cali.*\|cbr.*"}[5m]))` |
Summary | receive-packets | `sum(rate(node_network_receive_packets_total{device!~"lo\|veth.*\|docker.*\|flannel.*\|cali.*\|cbr.*"}[5m]))` |
Summary | transmit-dropped | `sum(rate(node_network_transmit_drop_total{device!~"lo\|veth.*\|docker.*\|flannel.*\|cali.*\|cbr.*"}[5m]))` |
Summary | transmit-errs | `sum(rate(node_network_transmit_errs_total{device!~"lo\|veth.*\|docker.*\|flannel.*\|cali.*\|cbr.*"}[5m]))` |
Summary | transmit-packets | `sum(rate(node_network_transmit_packets_total{device!~"lo\|veth.*\|docker.*\|flannel.*\|cali.*\|cbr.*"}[5m]))` |
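The network expressions exclude loopback and container/overlay interfaces with a negative regex matcher on the `device` label. Prometheus anchors regex matchers at both ends, so the pattern must match the whole label value: `wlo1` is kept even though it contains `lo`. The Python sketch below imitates that with `re.fullmatch`; the device names are made up, and Python's `re` and Prometheus's RE2 syntax agree for this simple pattern.

```python
import re

# Interface filter used by the network expressions. PromQL anchors regex matchers,
# so device!~"..." behaves like "not re.fullmatch(...)" rather than "not re.search(...)".
EXCLUDED = r"lo|veth.*|docker.*|flannel.*|cali.*|cbr.*"


def kept_by_filter(device: str) -> bool:
    """True if a series with this device label survives device!~EXCLUDED."""
    return re.fullmatch(EXCLUDED, device) is None


devices = ["lo", "eth0", "ens3", "veth1a2b", "docker0", "flannel.1", "cali12ab", "cbr0", "wlo1"]
print([d for d in devices if kept_by_filter(d)])  # ['eth0', 'ens3', 'wlo1']
```

If your nodes use other virtual interfaces (for example a different CNI plugin's prefix), extend the alternation the same way.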
Cluster Network I/O
CATALOG | SERIES | EXPRESSION |
---|---|---|
Detail | receive | `sum(rate(node_network_receive_bytes_total{device!~"lo\|veth.*\|docker.*\|flannel.*\|cali.*\|cbr.*"}[5m])) by (instance)` |
Detail | transmit | `sum(rate(node_network_transmit_bytes_total{device!~"lo\|veth.*\|docker.*\|flannel.*\|cali.*\|cbr.*"}[5m])) by (instance)` |
Summary | receive | `sum(rate(node_network_receive_bytes_total{device!~"lo\|veth.*\|docker.*\|flannel.*\|cali.*\|cbr.*"}[5m]))` |
Summary | transmit | `sum(rate(node_network_transmit_bytes_total{device!~"lo\|veth.*\|docker.*\|flannel.*\|cali.*\|cbr.*"}[5m]))` |
Node Metrics
Node CPU Utilization
CATALOG | EXPRESSION |
---|---|
Detail | avg(irate(node_cpu_seconds_total{mode!="idle", instance=~"$instance"}[5m])) by (mode) |
Summary | 1 - (avg(irate(node_cpu_seconds_total{mode="idle", instance=~"$instance"}[5m]))) |
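Expressions in the Node Metrics tables contain the placeholder `$instance` (the workload, pod and container tables use `$namespace`, `$podName` and `$containerName`). These look like dashboard template variables, and Prometheus does not expand them itself: left as-is, `instance=~"$instance"` is a regex in which `$` anchors the end of the string, so it matches no series. Before pasting such an expression into the Prometheus expression browser or a rule file, substitute a real value. The helper below is a hypothetical sketch of that substitution; `fill_instance` and the sample address are illustrative only.

```python
import re

# Hypothetical helper: fill the $instance placeholder with a concrete instance label value.
TEMPLATE = '1 - (avg(irate(node_cpu_seconds_total{mode="idle", instance=~"$instance"}[5m])))'


def fill_instance(expr: str, instance: str) -> str:
    # instance=~ is a regex matcher, so escape regex metacharacters such as "." in the value.
    # Backslashes are then doubled because PromQL double-quoted strings use Go-style escapes.
    pattern = re.escape(instance).replace("\\", "\\\\")
    return expr.replace("$instance", pattern)


print(fill_instance(TEMPLATE, "10.0.0.5:9100"))
```

Escaping keeps `.` from matching arbitrary characters; an unescaped address would usually still work, but could also match unintended instances.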
Node Load Average
CATALOG | SERIES | EXPRESSION |
---|---|---|
Detail | load1 | sum(node_load1{instance=~"$instance"}) / count(node_cpu_seconds_total{mode="system",instance=~"$instance"}) |
Detail | load5 | sum(node_load5{instance=~"$instance"}) / count(node_cpu_seconds_total{mode="system",instance=~"$instance"}) |
Detail | load15 | sum(node_load15{instance=~"$instance"}) / count(node_cpu_seconds_total{mode="system",instance=~"$instance"}) |
Summary | load1 | sum(node_load1{instance=~"$instance"}) / count(node_cpu_seconds_total{mode="system",instance=~"$instance"}) |
Summary | load5 | sum(node_load5{instance=~"$instance"}) / count(node_cpu_seconds_total{mode="system",instance=~"$instance"}) |
Summary | load15 | sum(node_load15{instance=~"$instance"}) / count(node_cpu_seconds_total{mode="system",instance=~"$instance"}) |
Node Memory Utilization
CATALOG | EXPRESSION |
---|---|
Detail | 1 - sum(node_memory_MemAvailable_bytes{instance=~"$instance"}) / sum(node_memory_MemTotal_bytes{instance=~"$instance"}) |
Summary | 1 - sum(node_memory_MemAvailable_bytes{instance=~"$instance"}) / sum(node_memory_MemTotal_bytes{instance=~"$instance"}) |
Node Disk Utilization
CATALOG | EXPRESSION |
---|---|
Detail | (sum(node_filesystem_size_bytes{device!="rootfs",instance=~"$instance"}) by (device) - sum(node_filesystem_free_bytes{device!="rootfs",instance=~"$instance"}) by (device)) / sum(node_filesystem_size_bytes{device!="rootfs",instance=~"$instance"}) by (device) |
Summary | (sum(node_filesystem_size_bytes{device!="rootfs",instance=~"$instance"}) - sum(node_filesystem_free_bytes{device!="rootfs",instance=~"$instance"})) / sum(node_filesystem_size_bytes{device!="rootfs",instance=~"$instance"}) |
Node Disk I/O
CATALOG | SERIES | EXPRESSION |
---|---|---|
Detail | read | sum(rate(node_disk_read_bytes_total{instance=~"$instance"}[5m])) |
Detail | written | sum(rate(node_disk_written_bytes_total{instance=~"$instance"}[5m])) |
Summary | read | sum(rate(node_disk_read_bytes_total{instance=~"$instance"}[5m])) |
Summary | written | sum(rate(node_disk_written_bytes_total{instance=~"$instance"}[5m])) |
Node Network Packets
CATALOG | SERIES | EXPRESSION |
---|---|---|
Detail | receive-dropped | `sum(rate(node_network_receive_drop_total{device!~"lo\|veth.*\|docker.*\|flannel.*\|cali.*\|cbr.*",instance=~"$instance"}[5m])) by (device)` |
Detail | receive-errs | `sum(rate(node_network_receive_errs_total{device!~"lo\|veth.*\|docker.*\|flannel.*\|cali.*\|cbr.*",instance=~"$instance"}[5m])) by (device)` |
Detail | receive-packets | `sum(rate(node_network_receive_packets_total{device!~"lo\|veth.*\|docker.*\|flannel.*\|cali.*\|cbr.*",instance=~"$instance"}[5m])) by (device)` |
Detail | transmit-dropped | `sum(rate(node_network_transmit_drop_total{device!~"lo\|veth.*\|docker.*\|flannel.*\|cali.*\|cbr.*",instance=~"$instance"}[5m])) by (device)` |
Detail | transmit-errs | `sum(rate(node_network_transmit_errs_total{device!~"lo\|veth.*\|docker.*\|flannel.*\|cali.*\|cbr.*",instance=~"$instance"}[5m])) by (device)` |
Detail | transmit-packets | `sum(rate(node_network_transmit_packets_total{device!~"lo\|veth.*\|docker.*\|flannel.*\|cali.*\|cbr.*",instance=~"$instance"}[5m])) by (device)` |
Summary | receive-dropped | `sum(rate(node_network_receive_drop_total{device!~"lo\|veth.*\|docker.*\|flannel.*\|cali.*\|cbr.*",instance=~"$instance"}[5m]))` |
Summary | receive-errs | `sum(rate(node_network_receive_errs_total{device!~"lo\|veth.*\|docker.*\|flannel.*\|cali.*\|cbr.*",instance=~"$instance"}[5m]))` |
Summary | receive-packets | `sum(rate(node_network_receive_packets_total{device!~"lo\|veth.*\|docker.*\|flannel.*\|cali.*\|cbr.*",instance=~"$instance"}[5m]))` |
Summary | transmit-dropped | `sum(rate(node_network_transmit_drop_total{device!~"lo\|veth.*\|docker.*\|flannel.*\|cali.*\|cbr.*",instance=~"$instance"}[5m]))` |
Summary | transmit-errs | `sum(rate(node_network_transmit_errs_total{device!~"lo\|veth.*\|docker.*\|flannel.*\|cali.*\|cbr.*",instance=~"$instance"}[5m]))` |
Summary | transmit-packets | `sum(rate(node_network_transmit_packets_total{device!~"lo\|veth.*\|docker.*\|flannel.*\|cali.*\|cbr.*",instance=~"$instance"}[5m]))` |
Node Network I/O
CATALOG | SERIES | EXPRESSION |
---|---|---|
Detail | receive | `sum(rate(node_network_receive_bytes_total{device!~"lo\|veth.*\|docker.*\|flannel.*\|cali.*\|cbr.*",instance=~"$instance"}[5m])) by (device)` |
Detail | transmit | `sum(rate(node_network_transmit_bytes_total{device!~"lo\|veth.*\|docker.*\|flannel.*\|cali.*\|cbr.*",instance=~"$instance"}[5m])) by (device)` |
Summary | receive | `sum(rate(node_network_receive_bytes_total{device!~"lo\|veth.*\|docker.*\|flannel.*\|cali.*\|cbr.*",instance=~"$instance"}[5m]))` |
Summary | transmit | `sum(rate(node_network_transmit_bytes_total{device!~"lo\|veth.*\|docker.*\|flannel.*\|cali.*\|cbr.*",instance=~"$instance"}[5m]))` |
Etcd Metrics
Etcd Has a Leader
max(etcd_server_has_leader)
Number of Times the Leader Changes
max(etcd_server_leader_changes_seen_total)
Number of Failed Proposals
sum(etcd_server_proposals_failed_total)
GRPC Client Traffic
CATALOG | SERIES | EXPRESSION |
---|---|---|
Detail | in | sum(rate(etcd_network_client_grpc_received_bytes_total[5m])) by (instance) |
Detail | out | sum(rate(etcd_network_client_grpc_sent_bytes_total[5m])) by (instance) |
Summary | in | sum(rate(etcd_network_client_grpc_received_bytes_total[5m])) |
Summary | out | sum(rate(etcd_network_client_grpc_sent_bytes_total[5m])) |
Peer Traffic
CATALOG | SERIES | EXPRESSION |
---|---|---|
Detail | in | sum(rate(etcd_network_peer_received_bytes_total[5m])) by (instance) |
Detail | out | sum(rate(etcd_network_peer_sent_bytes_total[5m])) by (instance) |
Summary | in | sum(rate(etcd_network_peer_received_bytes_total[5m])) |
Summary | out | sum(rate(etcd_network_peer_sent_bytes_total[5m])) |
DB Size
CATALOG | EXPRESSION |
---|---|
Detail | sum(etcd_debugging_mvcc_db_total_size_in_bytes) by (instance) |
Summary | sum(etcd_debugging_mvcc_db_total_size_in_bytes) |
Active Streams
CATALOG | SERIES | EXPRESSION |
---|---|---|
Detail | lease-watch | sum(grpc_server_started_total{grpc_service="etcdserverpb.Lease",grpc_type="bidi_stream"}) by (instance) - sum(grpc_server_handled_total{grpc_service="etcdserverpb.Lease",grpc_type="bidi_stream"}) by (instance) |
Detail | watch | sum(grpc_server_started_total{grpc_service="etcdserverpb.Watch",grpc_type="bidi_stream"}) by (instance) - sum(grpc_server_handled_total{grpc_service="etcdserverpb.Watch",grpc_type="bidi_stream"}) by (instance) |
Summary | lease-watch | sum(grpc_server_started_total{grpc_service="etcdserverpb.Lease",grpc_type="bidi_stream"}) - sum(grpc_server_handled_total{grpc_service="etcdserverpb.Lease",grpc_type="bidi_stream"}) |
Summary | watch | sum(grpc_server_started_total{grpc_service="etcdserverpb.Watch",grpc_type="bidi_stream"}) - sum(grpc_server_handled_total{grpc_service="etcdserverpb.Watch",grpc_type="bidi_stream"}) |
Raft Proposals
CATALOG | SERIES | EXPRESSION |
---|---|---|
Detail | applied | sum(increase(etcd_server_proposals_applied_total[5m])) by (instance) |
Detail | committed | sum(increase(etcd_server_proposals_committed_total[5m])) by (instance) |
Detail | pending | sum(increase(etcd_server_proposals_pending[5m])) by (instance) |
Detail | failed | sum(increase(etcd_server_proposals_failed_total[5m])) by (instance) |
Summary | applied | sum(increase(etcd_server_proposals_applied_total[5m])) |
Summary | committed | sum(increase(etcd_server_proposals_committed_total[5m])) |
Summary | pending | sum(increase(etcd_server_proposals_pending[5m])) |
Summary | failed | sum(increase(etcd_server_proposals_failed_total[5m])) |
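`increase()` is documented for counters, which only go up except when they reset. `etcd_server_proposals_pending` is a gauge in etcd (the current number of pending proposals), so in the `pending` rows every drop is treated as a counter reset and the result can show growth even while the queue shrinks. The simplified Python sketch below (no window-edge extrapolation, made-up values) shows the effect; for a gauge, the current value or `delta()` is usually the better fit.

```python
# Simplified increase(): sum of reset-corrected deltas across the window.
# Prometheus also extrapolates to the window edges; that is omitted here.


def simple_increase(values):
    total = 0.0
    for prev, cur in zip(values, values[1:]):
        # Any decrease is treated as a counter reset to zero.
        total += cur - prev if cur >= prev else cur
    return total


committed_total = [100, 110, 125, 140]  # a real counter: only goes up
pending = [5, 2, 4, 1]                  # a gauge: goes up and down

print(simple_increase(committed_total))  # 40.0
print(simple_increase(pending))          # 5.0, although the gauge fell from 5 to 1
```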
RPC Rate
CATALOG | SERIES | EXPRESSION |
---|---|---|
Detail | total | sum(rate(grpc_server_started_total{grpc_type="unary"}[5m])) by (instance) |
Detail | fail | sum(rate(grpc_server_handled_total{grpc_type="unary",grpc_code!="OK"}[5m])) by (instance) |
Summary | total | sum(rate(grpc_server_started_total{grpc_type="unary"}[5m])) |
Summary | fail | sum(rate(grpc_server_handled_total{grpc_type="unary",grpc_code!="OK"}[5m])) |
Disk Operations
CATALOG | SERIES | EXPRESSION |
---|---|---|
Detail | commit-called-by-backend | sum(rate(etcd_disk_backend_commit_duration_seconds_sum[1m])) by (instance) |
Detail | fsync-called-by-wal | sum(rate(etcd_disk_wal_fsync_duration_seconds_sum[1m])) by (instance) |
Summary | commit-called-by-backend | sum(rate(etcd_disk_backend_commit_duration_seconds_sum[1m])) |
Summary | fsync-called-by-wal | sum(rate(etcd_disk_wal_fsync_duration_seconds_sum[1m])) |
Disk Sync Duration
CATALOG | SERIES | EXPRESSION |
---|---|---|
Detail | wal | histogram_quantile(0.99, sum(rate(etcd_disk_wal_fsync_duration_seconds_bucket[5m])) by (instance, le)) |
Detail | db | histogram_quantile(0.99, sum(rate(etcd_disk_backend_commit_duration_seconds_bucket[5m])) by (instance, le)) |
Summary | wal | sum(histogram_quantile(0.99, sum(rate(etcd_disk_wal_fsync_duration_seconds_bucket[5m])) by (instance, le))) |
Summary | db | sum(histogram_quantile(0.99, sum(rate(etcd_disk_backend_commit_duration_seconds_bucket[5m])) by (instance, le))) |
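`histogram_quantile()` never sees individual observations; it finds the bucket that contains the target rank and interpolates linearly between that bucket's bounds, so the estimate is only as fine as the bucket layout. The Python sketch below is a simplified version of that estimate (it skips edge cases Prometheus handles, such as negative bucket bounds), run on made-up WAL fsync bucket counts. When reading the Summary rows, note that they add per-instance p99 values with `sum(...)`, and a sum of percentiles is not itself a percentile.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate quantile q (0 < q < 1) from cumulative (upper_bound, count) buckets.

    The last bucket must have upper bound +Inf and hold the total count (> 0).
    Simplified version of the linear interpolation PromQL uses; assumes non-negative values.
    """
    buckets = sorted(buckets)
    total = buckets[-1][1]
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                # The quantile falls in the +Inf bucket: report the highest finite bound.
                return prev_bound
            # Interpolate linearly inside the bucket that contains the rank.
            return prev_bound + (bound - prev_bound) * (rank - prev_count) / (count - prev_count)
        prev_bound, prev_count = bound, count
    return float("nan")


# Hypothetical cumulative counts for etcd_disk_wal_fsync_duration_seconds_bucket (seconds).
wal_fsync_buckets = [
    (0.001, 60), (0.002, 90), (0.004, 96), (0.008, 98), (0.016, 100), (math.inf, 100),
]
print(round(histogram_quantile(0.99, wal_fsync_buckets), 4))  # 0.012
```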
Kubernetes Components Metrics
API Server Request Latency
Average request latency in seconds; the division by 1e+06 converts from microseconds, the unit of apiserver_request_latencies_sum.
CATALOG | EXPRESSION |
---|---|
Detail | avg(apiserver_request_latencies_sum / apiserver_request_latencies_count) by (instance, verb) /1e+06 |
Summary | avg(apiserver_request_latencies_sum / apiserver_request_latencies_count) by (instance) /1e+06 |
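The `_sum` and `_count` series are cumulative counters, so dividing them directly averages over every request since the API server started, and a recent latency spike barely moves the result. The Python sketch below contrasts that with a windowed mean computed from the change in both counters over five minutes, which is the idea behind `rate(x_sum[5m]) / rate(x_count[5m])`. All values are made up.

```python
# Hypothetical cumulative counter values (latencies in microseconds), five minutes apart.
sum_then, count_then = 9_000_000_000.0, 100_000.0  # earlier scrape
sum_now, count_now = 9_600_000_000.0, 101_000.0    # latest scrape

MICROS_PER_SECOND = 1e6

# What sum / count gives: the mean over the whole lifetime of the process.
lifetime_mean_s = (sum_now / count_now) / MICROS_PER_SECOND

# Mean over the last 5 minutes only: change in sum divided by change in count.
recent_mean_s = ((sum_now - sum_then) / (count_now - count_then)) / MICROS_PER_SECOND

print(round(lifetime_mean_s, 4))  # 0.095
print(round(recent_mean_s, 4))    # 0.6
```

For alerting on current behaviour, the windowed form is usually the more responsive choice.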
API Server Request Rate
Per-second request rate over the last 5 minutes; the Detail row breaks it down by response code.
CATALOG | EXPRESSION |
---|---|
Detail | sum(rate(apiserver_request_count[5m])) by (instance, code) |
Summary | sum(rate(apiserver_request_count[5m])) by (instance) |
Scheduling Failed Pods
Number of pods whose scheduling has failed, that is, pods whose PodScheduled condition is false.
CATALOG | EXPRESSION |
---|---|
Detail | sum(kube_pod_status_scheduled{condition="false"}) |
Summary | sum(kube_pod_status_scheduled{condition="false"}) |
Controller Manager Queue Depth
CATALOG | SERIES | EXPRESSION |
---|---|---|
Detail | volumes | sum(volumes_depth) by (instance) |
Detail | deployment | sum(deployment_depth) by (instance) |
Detail | replicaset | sum(replicaset_depth) by (instance) |
Detail | service | sum(service_depth) by (instance) |
Detail | serviceaccount | sum(serviceaccount_depth) by (instance) |
Detail | endpoint | sum(endpoint_depth) by (instance) |
Detail | daemonset | sum(daemonset_depth) by (instance) |
Detail | statefulset | sum(statefulset_depth) by (instance) |
Detail | replicationmanager | sum(replicationmanager_depth) by (instance) |
Summary | volumes | sum(volumes_depth) |
Summary | deployment | sum(deployment_depth) |
Summary | replicaset | sum(replicaset_depth) |
Summary | service | sum(service_depth) |
Summary | serviceaccount | sum(serviceaccount_depth) |
Summary | endpoint | sum(endpoint_depth) |
Summary | daemonset | sum(daemonset_depth) |
Summary | statefulset | sum(statefulset_depth) |
Summary | replicationmanager | sum(replicationmanager_depth) |
Scheduler E2E Scheduling Latency
CATALOG | EXPRESSION |
---|---|
Detail | histogram_quantile(0.99, sum(scheduler_e2e_scheduling_latency_microseconds_bucket) by (le, instance)) / 1e+06 |
Summary | sum(histogram_quantile(0.99, sum(scheduler_e2e_scheduling_latency_microseconds_bucket) by (le, instance)) / 1e+06) |
Scheduler Preemption Attempts
Per-second rate of preemption attempts over the last 5 minutes; the Detail row reports it per scheduler instance.
CATALOG | EXPRESSION |
---|---|
Detail | sum(rate(scheduler_total_preemption_attempts[5m])) by (instance) |
Summary | sum(rate(scheduler_total_preemption_attempts[5m])) |
Ingress Controller Connections
CATALOG | SERIES | EXPRESSION |
---|---|---|
Detail | reading | sum(nginx_ingress_controller_nginx_process_connections{state="reading"}) by (instance) |
Detail | waiting | sum(nginx_ingress_controller_nginx_process_connections{state="waiting"}) by (instance) |
Detail | writing | sum(nginx_ingress_controller_nginx_process_connections{state="writing"}) by (instance) |
Detail | accepted | sum(ceil(increase(nginx_ingress_controller_nginx_process_connections_total{state="accepted"}[5m]))) by (instance) |
Detail | active | sum(ceil(increase(nginx_ingress_controller_nginx_process_connections_total{state="active"}[5m]))) by (instance) |
Detail | handled | sum(ceil(increase(nginx_ingress_controller_nginx_process_connections_total{state="handled"}[5m]))) by (instance) |
Summary | reading | sum(nginx_ingress_controller_nginx_process_connections{state="reading"}) |
Summary | waiting | sum(nginx_ingress_controller_nginx_process_connections{state="waiting"}) |
Summary | writing | sum(nginx_ingress_controller_nginx_process_connections{state="writing"}) |
Summary | accepted | sum(ceil(increase(nginx_ingress_controller_nginx_process_connections_total{state="accepted"}[5m]))) |
Summary | active | sum(ceil(increase(nginx_ingress_controller_nginx_process_connections_total{state="active"}[5m]))) |
Summary | handled | sum(ceil(increase(nginx_ingress_controller_nginx_process_connections_total{state="handled"}[5m]))) |
Ingress Controller Request Process Time
CATALOG | EXPRESSION |
---|---|
Detail | topk(10, histogram_quantile(0.95,sum by (le, host, path)(rate(nginx_ingress_controller_request_duration_seconds_bucket{host!="_"}[5m])))) |
Summary | topk(10, histogram_quantile(0.95,sum by (le, host)(rate(nginx_ingress_controller_request_duration_seconds_bucket{host!="_"}[5m])))) |
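`topk(10, ...)` keeps the ten series with the highest estimated 95th-percentile request time, so the result names the slowest host/path pairs (Detail) or hosts (Summary) rather than giving one cluster-wide figure. The Python sketch below mimics that selection with `heapq.nlargest` on made-up p95 values, using k=2 to keep it short; the host names are hypothetical.

```python
import heapq

# Hypothetical p95 latencies (seconds) per (host, path), standing in for the output
# of the inner histogram_quantile(0.95, ...) in the Detail row.
p95_by_route = {
    ("shop.example.com", "/cart"): 0.42,
    ("shop.example.com", "/"): 0.08,
    ("api.example.com", "/v1/orders"): 1.30,
    ("api.example.com", "/healthz"): 0.01,
}

# topk(2, ...) keeps the 2 series with the largest values; the rows above use k=10.
slowest = heapq.nlargest(2, p95_by_route.items(), key=lambda item: item[1])
for (host, path), seconds in slowest:
    print(host, path, seconds)
```

Because the set of top series can change from one evaluation to the next, an alert on a `topk()` expression may fire and resolve for different hosts over time.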
Rancher Logging Metrics
Fluentd Buffer Queue Rate
Rate of the buffer queue length over the last 5 minutes; the Detail row sums it per Fluentd instance.
CATALOG | EXPRESSION |
---|---|
Detail | sum(rate(fluentd_output_status_buffer_queue_length[5m])) by (instance) |
Summary | sum(rate(fluentd_output_status_buffer_queue_length[5m])) |
Fluentd Input Rate
CATALOG | EXPRESSION |
---|---|
Detail | sum(rate(fluentd_input_status_num_records_total[5m])) by (instance) |
Summary | sum(rate(fluentd_input_status_num_records_total[5m])) |
Fluentd Output Errors Rate
Per-second rate of output errors over the last 5 minutes; the Detail row breaks it down by output type.
CATALOG | EXPRESSION |
---|---|
Detail | sum(rate(fluentd_output_status_num_errors[5m])) by (type) |
Summary | sum(rate(fluentd_output_status_num_errors[5m])) |
Fluentd Output Rate
Per-second rate of output records over the last 5 minutes; the Detail row reports it per Fluentd instance.
CATALOG | EXPRESSION |
---|---|
Detail | sum(rate(fluentd_output_status_num_records_total[5m])) by (instance) |
Summary | sum(rate(fluentd_output_status_num_records_total[5m])) |
Workload Metrics
Workload CPU Utilization
CATALOG | SERIES | EXPRESSION |
---|---|---|
Detail | cfs throttled seconds | sum(rate(container_cpu_cfs_throttled_seconds_total{namespace="$namespace",pod_name=~"$podName",container_name!=""}[5m])) by (pod_name) |
Detail | user seconds | sum(rate(container_cpu_user_seconds_total{namespace="$namespace",pod_name=~"$podName",container_name!=""}[5m])) by (pod_name) |
Detail | system seconds | sum(rate(container_cpu_system_seconds_total{namespace="$namespace",pod_name=~"$podName",container_name!=""}[5m])) by (pod_name) |
Detail | usage seconds | sum(rate(container_cpu_usage_seconds_total{namespace="$namespace",pod_name=~"$podName",container_name!=""}[5m])) by (pod_name) |
Summary | cfs throttled seconds | sum(rate(container_cpu_cfs_throttled_seconds_total{namespace="$namespace",pod_name=~"$podName",container_name!=""}[5m])) |
Summary | user seconds | sum(rate(container_cpu_user_seconds_total{namespace="$namespace",pod_name=~"$podName",container_name!=""}[5m])) |
Summary | system seconds | sum(rate(container_cpu_system_seconds_total{namespace="$namespace",pod_name=~"$podName",container_name!=""}[5m])) |
Summary | usage seconds | sum(rate(container_cpu_usage_seconds_total{namespace="$namespace",pod_name=~"$podName",container_name!=""}[5m])) |
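The "usage seconds" rows take `rate()` of a CPU-seconds counter, so the result is CPU-seconds consumed per second, which is the average number of cores kept busy (0.35 means 350 millicores). The Python sketch below turns such values into millicores and a share of a CPU limit. The pod names, usage values and the 0.5-core limit are assumptions for illustration; the limit is not part of these expressions.

```python
# Hypothetical per-pod results of the "usage seconds" Detail row (cores in use on average).
usage_cores = {"web-7d9f-abcde": 0.35, "web-7d9f-fghij": 0.45}

# Assumption for illustration only: each pod has a 0.5-core CPU limit.
CPU_LIMIT_CORES = 0.5

for pod, cores in sorted(usage_cores.items()):
    # 1 core = 1000 millicores, the unit used in Kubernetes resource requests and limits.
    print(f"{pod}: {cores * 1000:.0f}m, {cores / CPU_LIMIT_CORES:.0%} of limit")

# The Summary row adds up all pods of the workload.
workload_total = sum(usage_cores.values())
print(f"workload: {workload_total:.2f} cores")
```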
Workload Memory Utilization
CATALOG | EXPRESSION |
---|---|
Detail | sum(container_memory_working_set_bytes{namespace="$namespace",pod_name=~"$podName", container_name!=""}) by (pod_name) |
Summary | sum(container_memory_working_set_bytes{namespace="$namespace",pod_name=~"$podName", container_name!=""}) |
Workload Network Packets
CATALOG | SERIES | EXPRESSION |
---|---|---|
Detail | receive-packets | sum(rate(container_network_receive_packets_total{namespace="$namespace",pod_name=~"$podName",container_name!=""}[5m])) by (pod_name) |
Detail | receive-dropped | sum(rate(container_network_receive_packets_dropped_total{namespace="$namespace",pod_name=~"$podName",container_name!=""}[5m])) by (pod_name) |
Detail | receive-errors | sum(rate(container_network_receive_errors_total{namespace="$namespace",pod_name=~"$podName",container_name!=""}[5m])) by (pod_name) |
Detail | transmit-packets | sum(rate(container_network_transmit_packets_total{namespace="$namespace",pod_name=~"$podName",container_name!=""}[5m])) by (pod_name) |
Detail | transmit-dropped | sum(rate(container_network_transmit_packets_dropped_total{namespace="$namespace",pod_name=~"$podName",container_name!=""}[5m])) by (pod_name) |
Detail | transmit-errors | sum(rate(container_network_transmit_errors_total{namespace="$namespace",pod_name=~"$podName",container_name!=""}[5m])) by (pod_name) |
Summary | receive-packets | sum(rate(container_network_receive_packets_total{namespace="$namespace",pod_name=~"$podName",container_name!=""}[5m])) |
Summary | receive-dropped | sum(rate(container_network_receive_packets_dropped_total{namespace="$namespace",pod_name=~"$podName",container_name!=""}[5m])) |
Summary | receive-errors | sum(rate(container_network_receive_errors_total{namespace="$namespace",pod_name=~"$podName",container_name!=""}[5m])) |
Summary | transmit-packets | sum(rate(container_network_transmit_packets_total{namespace="$namespace",pod_name=~"$podName",container_name!=""}[5m])) |
Summary | transmit-dropped | sum(rate(container_network_transmit_packets_dropped_total{namespace="$namespace",pod_name=~"$podName",container_name!=""}[5m])) |
Summary | transmit-errors | sum(rate(container_network_transmit_errors_total{namespace="$namespace",pod_name=~"$podName",container_name!=""}[5m])) |
Workload Network I/O
CATALOG | SERIES | EXPRESSION |
---|---|---|
Detail | receive | sum(rate(container_network_receive_bytes_total{namespace="$namespace",pod_name=~"$podName",container_name!=""}[5m])) by (pod_name) |
Detail | transmit | sum(rate(container_network_transmit_bytes_total{namespace="$namespace",pod_name=~"$podName",container_name!=""}[5m])) by (pod_name) |
Summary | receive | sum(rate(container_network_receive_bytes_total{namespace="$namespace",pod_name=~"$podName",container_name!=""}[5m])) |
Summary | transmit | sum(rate(container_network_transmit_bytes_total{namespace="$namespace",pod_name=~"$podName",container_name!=""}[5m])) |
Workload Disk I/O
CATALOG | SERIES | EXPRESSION |
---|---|---|
Detail | read | sum(rate(container_fs_reads_bytes_total{namespace="$namespace",pod_name=~"$podName",container_name!=""}[5m])) by (pod_name) |
Detail | write | sum(rate(container_fs_writes_bytes_total{namespace="$namespace",pod_name=~"$podName",container_name!=""}[5m])) by (pod_name) |
Summary | read | sum(rate(container_fs_reads_bytes_total{namespace="$namespace",pod_name=~"$podName",container_name!=""}[5m])) |
Summary | write | sum(rate(container_fs_writes_bytes_total{namespace="$namespace",pod_name=~"$podName",container_name!=""}[5m])) |
Pod Metrics
Pod CPU Utilization
CATALOG | SERIES | EXPRESSION |
---|---|---|
Detail | cfs throttled seconds | sum(rate(container_cpu_cfs_throttled_seconds_total{container_name!="POD",namespace="$namespace",pod_name="$podName", container_name!=""}[5m])) by (container_name) |
Detail | usage seconds | sum(rate(container_cpu_usage_seconds_total{container_name!="POD",namespace="$namespace",pod_name="$podName", container_name!=""}[5m])) by (container_name) |
Detail | system seconds | sum(rate(container_cpu_system_seconds_total{container_name!="POD",namespace="$namespace",pod_name="$podName", container_name!=""}[5m])) by (container_name) |
Detail | user seconds | sum(rate(container_cpu_user_seconds_total{container_name!="POD",namespace="$namespace",pod_name="$podName", container_name!=""}[5m])) by (container_name) |
Summary | cfs throttled seconds | sum(rate(container_cpu_cfs_throttled_seconds_total{container_name!="POD",namespace="$namespace",pod_name="$podName", container_name!=""}[5m])) |
Summary | usage seconds | sum(rate(container_cpu_usage_seconds_total{container_name!="POD",namespace="$namespace",pod_name="$podName", container_name!=""}[5m])) |
Summary | system seconds | sum(rate(container_cpu_system_seconds_total{container_name!="POD",namespace="$namespace",pod_name="$podName", container_name!=""}[5m])) |
Summary | user seconds | sum(rate(container_cpu_user_seconds_total{container_name!="POD",namespace="$namespace",pod_name="$podName", container_name!=""}[5m])) |
Pod Memory Utilization
CATALOG | EXPRESSION |
---|---|
Detail | sum(container_memory_working_set_bytes{container_name!="POD",namespace="$namespace",pod_name="$podName",container_name!=""}) by (container_name) |
Summary | sum(container_memory_working_set_bytes{container_name!="POD",namespace="$namespace",pod_name="$podName",container_name!=""}) |
Pod Network Packets
CATALOG | SERIES | EXPRESSION |
---|---|---|
Detail | receive-packets | sum(rate(container_network_receive_packets_total{namespace="$namespace",pod_name="$podName",container_name!=""}[5m])) |
Detail | receive-dropped | sum(rate(container_network_receive_packets_dropped_total{namespace="$namespace",pod_name="$podName",container_name!=""}[5m])) |
Detail | receive-errors | sum(rate(container_network_receive_errors_total{namespace="$namespace",pod_name="$podName",container_name!=""}[5m])) |
Detail | transmit-packets | sum(rate(container_network_transmit_packets_total{namespace="$namespace",pod_name="$podName",container_name!=""}[5m])) |
Detail | transmit-dropped | sum(rate(container_network_transmit_packets_dropped_total{namespace="$namespace",pod_name="$podName",container_name!=""}[5m])) |
Detail | transmit-errors | sum(rate(container_network_transmit_errors_total{namespace="$namespace",pod_name="$podName",container_name!=""}[5m])) |
Summary | receive-packets | sum(rate(container_network_receive_packets_total{namespace="$namespace",pod_name="$podName",container_name!=""}[5m])) |
Summary | receive-dropped | sum(rate(container_network_receive_packets_dropped_total{namespace="$namespace",pod_name="$podName",container_name!=""}[5m])) |
Summary | receive-errors | sum(rate(container_network_receive_errors_total{namespace="$namespace",pod_name="$podName",container_name!=""}[5m])) |
Summary | transmit-packets | sum(rate(container_network_transmit_packets_total{namespace="$namespace",pod_name="$podName",container_name!=""}[5m])) |
Summary | transmit-dropped | sum(rate(container_network_transmit_packets_dropped_total{namespace="$namespace",pod_name="$podName",container_name!=""}[5m])) |
Summary | transmit-errors | sum(rate(container_network_transmit_errors_total{namespace="$namespace",pod_name="$podName",container_name!=""}[5m])) |
Pod Network I/O
CATALOG | SERIES | EXPRESSION |
---|---|---|
Detail | receive | sum(rate(container_network_receive_bytes_total{namespace="$namespace",pod_name="$podName",container_name!=""}[5m])) |
Detail | transmit | sum(rate(container_network_transmit_bytes_total{namespace="$namespace",pod_name="$podName",container_name!=""}[5m])) |
Summary | receive | sum(rate(container_network_receive_bytes_total{namespace="$namespace",pod_name="$podName",container_name!=""}[5m])) |
Summary | transmit | sum(rate(container_network_transmit_bytes_total{namespace="$namespace",pod_name="$podName",container_name!=""}[5m])) |
Pod Disk I/O
CATALOG | SERIES | EXPRESSION |
---|---|---|
Detail | read | sum(rate(container_fs_reads_bytes_total{namespace="$namespace",pod_name="$podName",container_name!=""}[5m])) by (container_name) |
Detail | write | sum(rate(container_fs_writes_bytes_total{namespace="$namespace",pod_name="$podName",container_name!=""}[5m])) by (container_name) |
Summary | read | sum(rate(container_fs_reads_bytes_total{namespace="$namespace",pod_name="$podName",container_name!=""}[5m])) |
Summary | write | sum(rate(container_fs_writes_bytes_total{namespace="$namespace",pod_name="$podName",container_name!=""}[5m])) |
Container Metrics
Container CPU Utilization
SERIES | EXPRESSION |
---|---|
cfs throttled seconds | sum(rate(container_cpu_cfs_throttled_seconds_total{namespace="$namespace",pod_name="$podName",container_name="$containerName"}[5m])) |
usage seconds | sum(rate(container_cpu_usage_seconds_total{namespace="$namespace",pod_name="$podName",container_name="$containerName"}[5m])) |
system seconds | sum(rate(container_cpu_system_seconds_total{namespace="$namespace",pod_name="$podName",container_name="$containerName"}[5m])) |
user seconds | sum(rate(container_cpu_user_seconds_total{namespace="$namespace",pod_name="$podName",container_name="$containerName"}[5m])) |
Container Memory Utilization
sum(container_memory_working_set_bytes{namespace="$namespace",pod_name="$podName",container_name="$containerName"})
Container Disk I/O
SERIES | EXPRESSION |
---|---|
read | sum(rate(container_fs_reads_bytes_total{namespace="$namespace",pod_name="$podName",container_name="$containerName"}[5m])) |
write | sum(rate(container_fs_writes_bytes_total{namespace="$namespace",pod_name="$podName",container_name="$containerName"}[5m])) |