Prometheus Expressions


The PromQL expressions in this document can be used to configure alerts.

Before an expression can be used in an alert, monitoring must be enabled. For more information, refer to the documentation on enabling monitoring at the cluster level or at the project level.

For more information about querying Prometheus, refer to the official Prometheus documentation.
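As a rough illustration of how one of these expressions might be evaluated outside the alerting UI, the sketch below builds a request URL for the Prometheus HTTP API instant-query endpoint (`/api/v1/query`). The server address `http://localhost:9090` and the helper name `build_query_url` are assumptions made for illustration; the code only constructs the URL and does not contact a server.

```python
from urllib.parse import urlencode

# Assumed address of a Prometheus server; replace it with your own.
PROMETHEUS_URL = "http://localhost:9090"


def build_query_url(expr: str, base_url: str = PROMETHEUS_URL) -> str:
    """Return the instant-query URL for a PromQL expression."""
    # urlencode percent-encodes characters such as parentheses and quotes.
    return f"{base_url}/api/v1/query?{urlencode({'query': expr})}"


url = build_query_url("max(etcd_server_has_leader)")
print(url)
```

The resulting URL can be fetched with any HTTP client that is allowed to reach the Prometheus server.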

Cluster Metrics

Cluster CPU Utilization

CATALOG EXPRESSION
Detail 1 - (avg(irate(node_cpu_seconds_total{mode="idle"}[5m])) by (instance))
Summary 1 - (avg(irate(node_cpu_seconds_total{mode="idle"}[5m])))

Cluster Load Average

CATALOG SERIES EXPRESSION
Detail load1 sum(node_load1) by (instance) / count(node_cpu_seconds_total{mode="system"}) by (instance)
Detail load5 sum(node_load5) by (instance) / count(node_cpu_seconds_total{mode="system"}) by (instance)
Detail load15 sum(node_load15) by (instance) / count(node_cpu_seconds_total{mode="system"}) by (instance)
Summary load1 sum(node_load1) by (instance) / count(node_cpu_seconds_total{mode="system"})
Summary load5 sum(node_load5) by (instance) / count(node_cpu_seconds_total{mode="system"})
Summary load15 sum(node_load15) by (instance) / count(node_cpu_seconds_total{mode="system"})

Cluster Memory Utilization

CATALOG EXPRESSION
Detail 1 - sum(node_memory_MemAvailable_bytes) by (instance) / sum(node_memory_MemTotal_bytes) by (instance)
Summary 1 - sum(node_memory_MemAvailable_bytes) / sum(node_memory_MemTotal_bytes)
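The arithmetic behind these two expressions can be sketched as follows. The node sizes are hypothetical values chosen for illustration: the Detail form applies `1 - available / total` per instance, while the Summary form sums both metrics across all instances before dividing.

```python
GIB = 1024 ** 3


def memory_utilization(available_bytes: float, total_bytes: float) -> float:
    """Fraction of memory in use: 1 - available / total."""
    if total_bytes <= 0:
        raise ValueError("total memory must be positive")
    return 1 - available_bytes / total_bytes


# Hypothetical nodes as (MemAvailable, MemTotal) pairs in bytes.
nodes = {"node-a": (4 * GIB, 16 * GIB), "node-b": (24 * GIB, 32 * GIB)}

# Detail: one value per instance.
per_node = {name: memory_utilization(a, t) for name, (a, t) in nodes.items()}

# Summary: sum each metric over all instances, then divide.
cluster = memory_utilization(
    sum(a for a, _ in nodes.values()), sum(t for _, t in nodes.values())
)
print(per_node, cluster)
```

Because the Summary form divides the summed values, larger nodes weigh more than they would in a plain average of the per-node ratios.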

Cluster Disk Utilization

CATALOG EXPRESSION
Detail (sum(node_filesystem_size_bytes{device!="rootfs"}) by (instance) - sum(node_filesystem_free_bytes{device!="rootfs"}) by (instance)) / sum(node_filesystem_size_bytes{device!="rootfs"}) by (instance)
Summary (sum(node_filesystem_size_bytes{device!="rootfs"}) - sum(node_filesystem_free_bytes{device!="rootfs"})) / sum(node_filesystem_size_bytes{device!="rootfs"})

Cluster Disk I/O

CATALOG SERIES EXPRESSION
Detail read sum(rate(node_disk_read_bytes_total[5m])) by (instance)
Detail written sum(rate(node_disk_written_bytes_total[5m])) by (instance)
Summary read sum(rate(node_disk_read_bytes_total[5m]))
Summary written sum(rate(node_disk_written_bytes_total[5m]))

Cluster Network Packets

CATALOG SERIES EXPRESSION
Detail receive-dropped `sum(rate(node_network_receive_drop_total{device!~"lo|veth.*|docker.*|flannel.*|cali.*|cbr.*"}[5m])) by (instance)`
Detail receive-errs `sum(rate(node_network_receive_errs_total{device!~"lo|veth.*|docker.*|flannel.*|cali.*|cbr.*"}[5m])) by (instance)`
Detail receive-packets `sum(rate(node_network_receive_packets_total{device!~"lo|veth.*|docker.*|flannel.*|cali.*|cbr.*"}[5m])) by (instance)`
Detail transmit-dropped `sum(rate(node_network_transmit_drop_total{device!~"lo|veth.*|docker.*|flannel.*|cali.*|cbr.*"}[5m])) by (instance)`
Detail transmit-errs `sum(rate(node_network_transmit_errs_total{device!~"lo|veth.*|docker.*|flannel.*|cali.*|cbr.*"}[5m])) by (instance)`
Detail transmit-packets `sum(rate(node_network_transmit_packets_total{device!~"lo|veth.*|docker.*|flannel.*|cali.*|cbr.*"}[5m])) by (instance)`
Summary receive-dropped `sum(rate(node_network_receive_drop_total{device!~"lo|veth.*|docker.*|flannel.*|cali.*|cbr.*"}[5m]))`
Summary receive-errs `sum(rate(node_network_receive_errs_total{device!~"lo|veth.*|docker.*|flannel.*|cali.*|cbr.*"}[5m]))`
Summary receive-packets `sum(rate(node_network_receive_packets_total{device!~"lo|veth.*|docker.*|flannel.*|cali.*|cbr.*"}[5m]))`
Summary transmit-dropped `sum(rate(node_network_transmit_drop_total{device!~"lo|veth.*|docker.*|flannel.*|cali.*|cbr.*"}[5m]))`
Summary transmit-errs `sum(rate(node_network_transmit_errs_total{device!~"lo|veth.*|docker.*|flannel.*|cali.*|cbr.*"}[5m]))`
Summary transmit-packets `sum(rate(node_network_transmit_packets_total{device!~"lo|veth.*|docker.*|flannel.*|cali.*|cbr.*"}[5m]))`
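The `device!~` matcher in this section is meant to leave out loopback and virtual interfaces. The sketch below shows which device names would still be counted, assuming the pattern is the alternation `lo|veth.*|docker.*|flannel.*|cali.*|cbr.*`. The device names are hypothetical, and Python's `re` module stands in for the RE2 syntax that Prometheus uses, which should behave the same for a pattern this simple.

```python
import re

# Assumed pattern of the device!~"..." matcher.
EXCLUDED_DEVICES = r"lo|veth.*|docker.*|flannel.*|cali.*|cbr.*"


def is_counted(device: str) -> bool:
    """True if a device label value passes the negative regex matcher."""
    # PromQL regex matchers are fully anchored, so fullmatch is the analogue.
    return re.fullmatch(EXCLUDED_DEVICES, device) is None


# Hypothetical interface names on a Kubernetes node.
devices = ["lo", "eth0", "veth1a2b3c", "docker0", "flannel.1", "cali12345", "cbr0", "ens3"]
counted = [d for d in devices if is_counted(d)]
print(counted)  # ['eth0', 'ens3']
```

Because the match is anchored, a name such as `lo0` is not excluded by the `lo` alternative.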

Cluster Network I/O

CATALOG SERIES EXPRESSION
Detail receive `sum(rate(node_network_receive_bytes_total{device!~"lo|veth.*|docker.*|flannel.*|cali.*|cbr.*"}[5m])) by (instance)`
Detail transmit `sum(rate(node_network_transmit_bytes_total{device!~"lo|veth.*|docker.*|flannel.*|cali.*|cbr.*"}[5m])) by (instance)`
Summary receive `sum(rate(node_network_receive_bytes_total{device!~"lo|veth.*|docker.*|flannel.*|cali.*|cbr.*"}[5m]))`
Summary transmit `sum(rate(node_network_transmit_bytes_total{device!~"lo|veth.*|docker.*|flannel.*|cali.*|cbr.*"}[5m]))`

Node Metrics

Node CPU Utilization

CATALOG EXPRESSION
Detail avg(irate(node_cpu_seconds_total{mode!="idle", instance=~"$instance"}[5m])) by (mode)
Summary 1 - (avg(irate(node_cpu_seconds_total{mode="idle", instance=~"$instance"}[5m])))
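The node, workload, pod and container expressions contain placeholders such as `$instance`, `$namespace` and `$podName`, which need concrete values before the query can run. One way to fill them in, sketched here with Python's `string.Template`; the instance address `10.0.0.5:9100` is a hypothetical value.

```python
from string import Template

# Node CPU utilization (Summary) with its $instance placeholder.
NODE_CPU_SUMMARY = Template(
    '1 - (avg(irate(node_cpu_seconds_total{mode="idle", instance=~"$instance"}[5m])))'
)

# Hypothetical scrape target; substitute() raises KeyError if a placeholder is missing.
expr = NODE_CPU_SUMMARY.substitute(instance="10.0.0.5:9100")
print(expr)
```

Since `=~` is a regex matcher, the substituted value is interpreted as a regular expression, so a `.` in the address matches any character.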

Node Load Average

CATALOG SERIES EXPRESSION
Detail load1 sum(node_load1{instance=~"$instance"}) / count(node_cpu_seconds_total{mode="system",instance=~"$instance"})
Detail load5 sum(node_load5{instance=~"$instance"}) / count(node_cpu_seconds_total{mode="system",instance=~"$instance"})
Detail load15 sum(node_load15{instance=~"$instance"}) / count(node_cpu_seconds_total{mode="system",instance=~"$instance"})
Summary load1 sum(node_load1{instance=~"$instance"}) / count(node_cpu_seconds_total{mode="system",instance=~"$instance"})
Summary load5 sum(node_load5{instance=~"$instance"}) / count(node_cpu_seconds_total{mode="system",instance=~"$instance"})
Summary load15 sum(node_load15{instance=~"$instance"}) / count(node_cpu_seconds_total{mode="system",instance=~"$instance"})

Node Memory Utilization

CATALOG EXPRESSION
Detail 1 - sum(node_memory_MemAvailable_bytes{instance=~"$instance"}) / sum(node_memory_MemTotal_bytes{instance=~"$instance"})
Summary 1 - sum(node_memory_MemAvailable_bytes{instance=~"$instance"}) / sum(node_memory_MemTotal_bytes{instance=~"$instance"})

Node Disk Utilization

CATALOG EXPRESSION
Detail (sum(node_filesystem_size_bytes{device!="rootfs",instance=~"$instance"}) by (device) - sum(node_filesystem_free_bytes{device!="rootfs",instance=~"$instance"}) by (device)) / sum(node_filesystem_size_bytes{device!="rootfs",instance=~"$instance"}) by (device)
Summary (sum(node_filesystem_size_bytes{device!="rootfs",instance=~"$instance"}) - sum(node_filesystem_free_bytes{device!="rootfs",instance=~"$instance"})) / sum(node_filesystem_size_bytes{device!="rootfs",instance=~"$instance"})

Node Disk I/O

CATALOG SERIES EXPRESSION
Detail read sum(rate(node_disk_read_bytes_total{instance=~"$instance"}[5m]))
Detail written sum(rate(node_disk_written_bytes_total{instance=~"$instance"}[5m]))
Summary read sum(rate(node_disk_read_bytes_total{instance=~"$instance"}[5m]))
Summary written sum(rate(node_disk_written_bytes_total{instance=~"$instance"}[5m]))

Node Network Packets

CATALOG SERIES EXPRESSION
Detail receive-dropped `sum(rate(node_network_receive_drop_total{device!~"lo|veth.*|docker.*|flannel.*|cali.*|cbr.*",instance=~"$instance"}[5m])) by (device)`
Detail receive-errs `sum(rate(node_network_receive_errs_total{device!~"lo|veth.*|docker.*|flannel.*|cali.*|cbr.*",instance=~"$instance"}[5m])) by (device)`
Detail receive-packets `sum(rate(node_network_receive_packets_total{device!~"lo|veth.*|docker.*|flannel.*|cali.*|cbr.*",instance=~"$instance"}[5m])) by (device)`
Detail transmit-dropped `sum(rate(node_network_transmit_drop_total{device!~"lo|veth.*|docker.*|flannel.*|cali.*|cbr.*",instance=~"$instance"}[5m])) by (device)`
Detail transmit-errs `sum(rate(node_network_transmit_errs_total{device!~"lo|veth.*|docker.*|flannel.*|cali.*|cbr.*",instance=~"$instance"}[5m])) by (device)`
Detail transmit-packets `sum(rate(node_network_transmit_packets_total{device!~"lo|veth.*|docker.*|flannel.*|cali.*|cbr.*",instance=~"$instance"}[5m])) by (device)`
Summary receive-dropped `sum(rate(node_network_receive_drop_total{device!~"lo|veth.*|docker.*|flannel.*|cali.*|cbr.*",instance=~"$instance"}[5m]))`
Summary receive-errs `sum(rate(node_network_receive_errs_total{device!~"lo|veth.*|docker.*|flannel.*|cali.*|cbr.*",instance=~"$instance"}[5m]))`
Summary receive-packets `sum(rate(node_network_receive_packets_total{device!~"lo|veth.*|docker.*|flannel.*|cali.*|cbr.*",instance=~"$instance"}[5m]))`
Summary transmit-dropped `sum(rate(node_network_transmit_drop_total{device!~"lo|veth.*|docker.*|flannel.*|cali.*|cbr.*",instance=~"$instance"}[5m]))`
Summary transmit-errs `sum(rate(node_network_transmit_errs_total{device!~"lo|veth.*|docker.*|flannel.*|cali.*|cbr.*",instance=~"$instance"}[5m]))`
Summary transmit-packets `sum(rate(node_network_transmit_packets_total{device!~"lo|veth.*|docker.*|flannel.*|cali.*|cbr.*",instance=~"$instance"}[5m]))`

Node Network I/O

CATALOG SERIES EXPRESSION
Detail receive `sum(rate(node_network_receive_bytes_total{device!~"lo|veth.*|docker.*|flannel.*|cali.*|cbr.*",instance=~"$instance"}[5m])) by (device)`
Detail transmit `sum(rate(node_network_transmit_bytes_total{device!~"lo|veth.*|docker.*|flannel.*|cali.*|cbr.*",instance=~"$instance"}[5m])) by (device)`
Summary receive `sum(rate(node_network_receive_bytes_total{device!~"lo|veth.*|docker.*|flannel.*|cali.*|cbr.*",instance=~"$instance"}[5m]))`
Summary transmit `sum(rate(node_network_transmit_bytes_total{device!~"lo|veth.*|docker.*|flannel.*|cali.*|cbr.*",instance=~"$instance"}[5m]))`

Etcd Metrics

Etcd Has a Leader

max(etcd_server_has_leader)

Number of Times the Leader Changes

max(etcd_server_leader_changes_seen_total)

Number of Failed Proposals

sum(etcd_server_proposals_failed_total)

GRPC Client Traffic

CATALOG SERIES EXPRESSION
Detail in sum(rate(etcd_network_client_grpc_received_bytes_total[5m])) by (instance)
Detail out sum(rate(etcd_network_client_grpc_sent_bytes_total[5m])) by (instance)
Summary in sum(rate(etcd_network_client_grpc_received_bytes_total[5m]))
Summary out sum(rate(etcd_network_client_grpc_sent_bytes_total[5m]))

Peer Traffic

CATALOG SERIES EXPRESSION
Detail in sum(rate(etcd_network_peer_received_bytes_total[5m])) by (instance)
Detail out sum(rate(etcd_network_peer_sent_bytes_total[5m])) by (instance)
Summary in sum(rate(etcd_network_peer_received_bytes_total[5m]))
Summary out sum(rate(etcd_network_peer_sent_bytes_total[5m]))

DB Size

CATALOG EXPRESSION
Detail sum(etcd_debugging_mvcc_db_total_size_in_bytes) by (instance)
Summary sum(etcd_debugging_mvcc_db_total_size_in_bytes)

Active Streams

CATALOG SERIES EXPRESSION
Detail lease-watch sum(grpc_server_started_total{grpc_service="etcdserverpb.Lease",grpc_type="bidi_stream"}) by (instance) - sum(grpc_server_handled_total{grpc_service="etcdserverpb.Lease",grpc_type="bidi_stream"}) by (instance)
Detail watch sum(grpc_server_started_total{grpc_service="etcdserverpb.Watch",grpc_type="bidi_stream"}) by (instance) - sum(grpc_server_handled_total{grpc_service="etcdserverpb.Watch",grpc_type="bidi_stream"}) by (instance)
Summary lease-watch sum(grpc_server_started_total{grpc_service="etcdserverpb.Lease",grpc_type="bidi_stream"}) - sum(grpc_server_handled_total{grpc_service="etcdserverpb.Lease",grpc_type="bidi_stream"})
Summary watch sum(grpc_server_started_total{grpc_service="etcdserverpb.Watch",grpc_type="bidi_stream"}) - sum(grpc_server_handled_total{grpc_service="etcdserverpb.Watch",grpc_type="bidi_stream"})

Raft Proposals

CATALOG SERIES EXPRESSION
Detail applied sum(increase(etcd_server_proposals_applied_total[5m])) by (instance)
Detail committed sum(increase(etcd_server_proposals_committed_total[5m])) by (instance)
Detail pending sum(increase(etcd_server_proposals_pending[5m])) by (instance)
Detail failed sum(increase(etcd_server_proposals_failed_total[5m])) by (instance)
Summary applied sum(increase(etcd_server_proposals_applied_total[5m]))
Summary committed sum(increase(etcd_server_proposals_committed_total[5m]))
Summary pending sum(increase(etcd_server_proposals_pending[5m]))
Summary failed sum(increase(etcd_server_proposals_failed_total[5m]))

RPC Rate

CATALOG SERIES EXPRESSION
Detail total sum(rate(grpc_server_started_total{grpc_type="unary"}[5m])) by (instance)
Detail fail sum(rate(grpc_server_handled_total{grpc_type="unary",grpc_code!="OK"}[5m])) by (instance)
Summary total sum(rate(grpc_server_started_total{grpc_type="unary"}[5m]))
Summary fail sum(rate(grpc_server_handled_total{grpc_type="unary",grpc_code!="OK"}[5m]))

Disk Operations

CATALOG SERIES EXPRESSION
Detail commit-called-by-backend sum(rate(etcd_disk_backend_commit_duration_seconds_sum[1m])) by (instance)
Detail fsync-called-by-wal sum(rate(etcd_disk_wal_fsync_duration_seconds_sum[1m])) by (instance)
Summary commit-called-by-backend sum(rate(etcd_disk_backend_commit_duration_seconds_sum[1m]))
Summary fsync-called-by-wal sum(rate(etcd_disk_wal_fsync_duration_seconds_sum[1m]))

Disk Sync Duration

CATALOG SERIES EXPRESSION
Detail wal histogram_quantile(0.99, sum(rate(etcd_disk_wal_fsync_duration_seconds_bucket[5m])) by (instance, le))
Detail db histogram_quantile(0.99, sum(rate(etcd_disk_backend_commit_duration_seconds_bucket[5m])) by (instance, le))
Summary wal sum(histogram_quantile(0.99, sum(rate(etcd_disk_wal_fsync_duration_seconds_bucket[5m])) by (instance, le)))
Summary db sum(histogram_quantile(0.99, sum(rate(etcd_disk_backend_commit_duration_seconds_bucket[5m])) by (instance, le)))
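These expressions rely on `histogram_quantile`, which estimates a quantile from cumulative bucket counters (one series per `le` upper bound) by interpolating linearly inside the bucket that contains the target rank. The following is a simplified sketch of that idea, not Prometheus's actual implementation: it assumes `0 < q <= 1`, buckets sorted by upper bound and ending with `+Inf`, and a lower bound of 0 for the first bucket. The bucket values are hypothetical.

```python
import math


def histogram_quantile(q: float, buckets: list[tuple[float, float]]) -> float:
    """Estimate the q-quantile from cumulative (upper_bound, count) buckets."""
    if not 0 < q <= 1:
        raise ValueError("this sketch only handles 0 < q <= 1")
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    lower_bound, lower_count = 0.0, 0.0
    for upper_bound, count in buckets:
        if count >= rank:
            if math.isinf(upper_bound):
                # Rank falls in the +Inf bucket: report the highest finite bound.
                return lower_bound
            # Interpolate linearly between the bucket's lower and upper bounds.
            fraction = (rank - lower_count) / (count - lower_count)
            return lower_bound + (upper_bound - lower_bound) * fraction
        lower_bound, lower_count = upper_bound, count
    return lower_bound


# Hypothetical fsync duration buckets in seconds: (le, cumulative count).
fsync_buckets = [(0.001, 50), (0.002, 90), (0.004, 98), (0.008, 100), (math.inf, 100)]
p99 = histogram_quantile(0.99, fsync_buckets)
print(p99)  # roughly 0.006 seconds
```

With these numbers the 99th-percentile rank (99 of 100 observations) lands halfway through the 0.004 to 0.008 bucket, which is why the estimate is near 0.006.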

Kubernetes Components Metrics

API Server Request Latency

CATALOG EXPRESSION
Detail avg(apiserver_request_latencies_sum / apiserver_request_latencies_count) by (instance, verb) /1e+06
Summary avg(apiserver_request_latencies_sum / apiserver_request_latencies_count) by (instance) /1e+06

Both expressions give the average request latency.

API Server Request Rate

CATALOG EXPRESSION
Detail sum(rate(apiserver_request_count[5m])) by (instance, code)
Summary sum(rate(apiserver_request_count[5m])) by (instance)

Both expressions give the request rate over the past 5 minutes.

Scheduling Failed Pods

CATALOG EXPRESSION
Detail sum(kube_pod_status_scheduled{condition="false"})
Summary sum(kube_pod_status_scheduled{condition="false"})

Both expressions give the number of pods for which scheduling failed.

Controller Manager Queue Depth

CATALOG SERIES EXPRESSION
Detail volumes sum(volumes_depth) by (instance)
Detail deployment sum(deployment_depth) by (instance)
Detail replicaset sum(replicaset_depth) by (instance)
Detail service sum(service_depth) by (instance)
Detail serviceaccount sum(serviceaccount_depth) by (instance)
Detail endpoint sum(endpoint_depth) by (instance)
Detail daemonset sum(daemonset_depth) by (instance)
Detail statefulset sum(statefulset_depth) by (instance)
Detail replicationmanager sum(replicationmanager_depth) by (instance)
Summary volumes sum(volumes_depth)
Summary deployment sum(deployment_depth)
Summary replicaset sum(replicaset_depth)
Summary service sum(service_depth)
Summary serviceaccount sum(serviceaccount_depth)
Summary endpoint sum(endpoint_depth)
Summary daemonset sum(daemonset_depth)
Summary statefulset sum(statefulset_depth)
Summary replicationmanager sum(replicationmanager_depth)

Scheduler E2E Scheduling Latency

CATALOG EXPRESSION
Detail histogram_quantile(0.99, sum(scheduler_e2e_scheduling_latency_microseconds_bucket) by (le, instance)) / 1e+06
Summary sum(histogram_quantile(0.99, sum(scheduler_e2e_scheduling_latency_microseconds_bucket) by (le, instance)) / 1e+06)

Scheduler Preemption Attempts

CATALOG EXPRESSION
Detail sum(rate(scheduler_total_preemption_attempts[5m])) by (instance)
Summary sum(rate(scheduler_total_preemption_attempts[5m]))

The Detail expression gives the preemption attempt rate over the past 5 minutes, summed per scheduler instance.

Ingress Controller Connections

CATALOG SERIES EXPRESSION
Detail reading sum(nginx_ingress_controller_nginx_process_connections{state="reading"}) by (instance)
Detail waiting sum(nginx_ingress_controller_nginx_process_connections{state="waiting"}) by (instance)
Detail writing sum(nginx_ingress_controller_nginx_process_connections{state="writing"}) by (instance)
Detail accepted sum(ceil(increase(nginx_ingress_controller_nginx_process_connections_total{state="accepted"}[5m]))) by (instance)
Detail active sum(ceil(increase(nginx_ingress_controller_nginx_process_connections_total{state="active"}[5m]))) by (instance)
Detail handled sum(ceil(increase(nginx_ingress_controller_nginx_process_connections_total{state="handled"}[5m]))) by (instance)
Summary reading sum(nginx_ingress_controller_nginx_process_connections{state="reading"})
Summary waiting sum(nginx_ingress_controller_nginx_process_connections{state="waiting"})
Summary writing sum(nginx_ingress_controller_nginx_process_connections{state="writing"})
Summary accepted sum(ceil(increase(nginx_ingress_controller_nginx_process_connections_total{state="accepted"}[5m])))
Summary active sum(ceil(increase(nginx_ingress_controller_nginx_process_connections_total{state="active"}[5m])))
Summary handled sum(ceil(increase(nginx_ingress_controller_nginx_process_connections_total{state="handled"}[5m])))

Ingress Controller Request Process Time

CATALOG EXPRESSION
Detail topk(10, histogram_quantile(0.95,sum by (le, host, path)(rate(nginx_ingress_controller_request_duration_seconds_bucket{host!="_"}[5m]))))
Summary topk(10, histogram_quantile(0.95,sum by (le, host)(rate(nginx_ingress_controller_request_duration_seconds_bucket{host!="_"}[5m]))))

Rancher Logging Metrics

Fluentd Buffer Queue Rate

CATALOG EXPRESSION
Detail sum(rate(fluentd_output_status_buffer_queue_length[5m])) by (instance)
Summary sum(rate(fluentd_output_status_buffer_queue_length[5m]))

The Detail expression gives the rate of change of the buffer queue length over the past 5 minutes, summed per Fluentd instance.

Fluentd Input Rate

CATALOG EXPRESSION
Detail sum(rate(fluentd_input_status_num_records_total[5m])) by (instance)
Summary sum(rate(fluentd_input_status_num_records_total[5m]))

Fluentd Output Errors Rate

CATALOG EXPRESSION
Detail sum(rate(fluentd_output_status_num_errors[5m])) by (type)
Summary sum(rate(fluentd_output_status_num_errors[5m]))

The Detail expression gives the output error rate over the past 5 minutes, summed per output type.

Fluentd Output Rate

CATALOG EXPRESSION
Detail sum(rate(fluentd_output_status_num_records_total[5m])) by (instance)
Summary sum(rate(fluentd_output_status_num_records_total[5m]))

The Detail expression gives the output rate over the past 5 minutes, summed per Fluentd instance.

Workload Metrics

Workload CPU Utilization

CATALOG SERIES EXPRESSION
Detail cfs throttled seconds sum(rate(container_cpu_cfs_throttled_seconds_total{namespace="$namespace",pod_name=~"$podName",container_name!=""}[5m])) by (pod_name)
Detail user seconds sum(rate(container_cpu_user_seconds_total{namespace="$namespace",pod_name=~"$podName",container_name!=""}[5m])) by (pod_name)
Detail system seconds sum(rate(container_cpu_system_seconds_total{namespace="$namespace",pod_name=~"$podName",container_name!=""}[5m])) by (pod_name)
Detail usage seconds sum(rate(container_cpu_usage_seconds_total{namespace="$namespace",pod_name=~"$podName",container_name!=""}[5m])) by (pod_name)
Summary cfs throttled seconds sum(rate(container_cpu_cfs_throttled_seconds_total{namespace="$namespace",pod_name=~"$podName",container_name!=""}[5m]))
Summary user seconds sum(rate(container_cpu_user_seconds_total{namespace="$namespace",pod_name=~"$podName",container_name!=""}[5m]))
Summary system seconds sum(rate(container_cpu_system_seconds_total{namespace="$namespace",pod_name=~"$podName",container_name!=""}[5m]))
Summary usage seconds sum(rate(container_cpu_usage_seconds_total{namespace="$namespace",pod_name=~"$podName",container_name!=""}[5m]))

Workload Memory Utilization

CATALOG EXPRESSION
Detail sum(container_memory_working_set_bytes{namespace="$namespace",pod_name=~"$podName", container_name!=""}) by (pod_name)
Summary sum(container_memory_working_set_bytes{namespace="$namespace",pod_name=~"$podName", container_name!=""})

Workload Network Packets

CATALOG SERIES EXPRESSION
Detail receive-packets sum(rate(container_network_receive_packets_total{namespace="$namespace",pod_name=~"$podName",container_name!=""}[5m])) by (pod_name)
Detail receive-dropped sum(rate(container_network_receive_packets_dropped_total{namespace="$namespace",pod_name=~"$podName",container_name!=""}[5m])) by (pod_name)
Detail receive-errors sum(rate(container_network_receive_errors_total{namespace="$namespace",pod_name=~"$podName",container_name!=""}[5m])) by (pod_name)
Detail transmit-packets sum(rate(container_network_transmit_packets_total{namespace="$namespace",pod_name=~"$podName",container_name!=""}[5m])) by (pod_name)
Detail transmit-dropped sum(rate(container_network_transmit_packets_dropped_total{namespace="$namespace",pod_name=~"$podName",container_name!=""}[5m])) by (pod_name)
Detail transmit-errors sum(rate(container_network_transmit_errors_total{namespace="$namespace",pod_name=~"$podName",container_name!=""}[5m])) by (pod_name)
Summary receive-packets sum(rate(container_network_receive_packets_total{namespace="$namespace",pod_name=~"$podName",container_name!=""}[5m]))
Summary receive-dropped sum(rate(container_network_receive_packets_dropped_total{namespace="$namespace",pod_name=~"$podName",container_name!=""}[5m]))
Summary receive-errors sum(rate(container_network_receive_errors_total{namespace="$namespace",pod_name=~"$podName",container_name!=""}[5m]))
Summary transmit-packets sum(rate(container_network_transmit_packets_total{namespace="$namespace",pod_name=~"$podName",container_name!=""}[5m]))
Summary transmit-dropped sum(rate(container_network_transmit_packets_dropped_total{namespace="$namespace",pod_name=~"$podName",container_name!=""}[5m]))
Summary transmit-errors sum(rate(container_network_transmit_errors_total{namespace="$namespace",pod_name=~"$podName",container_name!=""}[5m]))

Workload Network I/O

CATALOG SERIES EXPRESSION
Detail receive sum(rate(container_network_receive_bytes_total{namespace="$namespace",pod_name=~"$podName",container_name!=""}[5m])) by (pod_name)
Detail transmit sum(rate(container_network_transmit_bytes_total{namespace="$namespace",pod_name=~"$podName",container_name!=""}[5m])) by (pod_name)
Summary receive sum(rate(container_network_receive_bytes_total{namespace="$namespace",pod_name=~"$podName",container_name!=""}[5m]))
Summary transmit sum(rate(container_network_transmit_bytes_total{namespace="$namespace",pod_name=~"$podName",container_name!=""}[5m]))

Workload Disk I/O

CATALOG SERIES EXPRESSION
Detail read sum(rate(container_fs_reads_bytes_total{namespace="$namespace",pod_name=~"$podName",container_name!=""}[5m])) by (pod_name)
Detail write sum(rate(container_fs_writes_bytes_total{namespace="$namespace",pod_name=~"$podName",container_name!=""}[5m])) by (pod_name)
Summary read sum(rate(container_fs_reads_bytes_total{namespace="$namespace",pod_name=~"$podName",container_name!=""}[5m]))
Summary write sum(rate(container_fs_writes_bytes_total{namespace="$namespace",pod_name=~"$podName",container_name!=""}[5m]))

Pod Metrics

Pod CPU Utilization

CATALOG SERIES EXPRESSION
Detail cfs throttled seconds sum(rate(container_cpu_cfs_throttled_seconds_total{container_name!="POD",namespace="$namespace",pod_name="$podName", container_name!=""}[5m])) by (container_name)
Detail usage seconds sum(rate(container_cpu_usage_seconds_total{container_name!="POD",namespace="$namespace",pod_name="$podName", container_name!=""}[5m])) by (container_name)
Detail system seconds sum(rate(container_cpu_system_seconds_total{container_name!="POD",namespace="$namespace",pod_name="$podName", container_name!=""}[5m])) by (container_name)
Detail user seconds sum(rate(container_cpu_user_seconds_total{container_name!="POD",namespace="$namespace",pod_name="$podName", container_name!=""}[5m])) by (container_name)
Summary cfs throttled seconds sum(rate(container_cpu_cfs_throttled_seconds_total{container_name!="POD",namespace="$namespace",pod_name="$podName", container_name!=""}[5m]))
Summary usage seconds sum(rate(container_cpu_usage_seconds_total{container_name!="POD",namespace="$namespace",pod_name="$podName", container_name!=""}[5m]))
Summary system seconds sum(rate(container_cpu_system_seconds_total{container_name!="POD",namespace="$namespace",pod_name="$podName", container_name!=""}[5m]))
Summary user seconds sum(rate(container_cpu_user_seconds_total{container_name!="POD",namespace="$namespace",pod_name="$podName", container_name!=""}[5m]))

Pod Memory Utilization

CATALOG EXPRESSION
Detail sum(container_memory_working_set_bytes{container_name!="POD",namespace="$namespace",pod_name="$podName",container_name!=""}) by (container_name)
Summary sum(container_memory_working_set_bytes{container_name!="POD",namespace="$namespace",pod_name="$podName",container_name!=""})

Pod Network Packets

CATALOG SERIES EXPRESSION
Detail receive-packets sum(rate(container_network_receive_packets_total{namespace="$namespace",pod_name="$podName",container_name!=""}[5m]))
Detail receive-dropped sum(rate(container_network_receive_packets_dropped_total{namespace="$namespace",pod_name="$podName",container_name!=""}[5m]))
Detail receive-errors sum(rate(container_network_receive_errors_total{namespace="$namespace",pod_name="$podName",container_name!=""}[5m]))
Detail transmit-packets sum(rate(container_network_transmit_packets_total{namespace="$namespace",pod_name="$podName",container_name!=""}[5m]))
Detail transmit-dropped sum(rate(container_network_transmit_packets_dropped_total{namespace="$namespace",pod_name="$podName",container_name!=""}[5m]))
Detail transmit-errors sum(rate(container_network_transmit_errors_total{namespace="$namespace",pod_name="$podName",container_name!=""}[5m]))
Summary receive-packets sum(rate(container_network_receive_packets_total{namespace="$namespace",pod_name="$podName",container_name!=""}[5m]))
Summary receive-dropped sum(rate(container_network_receive_packets_dropped_total{namespace="$namespace",pod_name="$podName",container_name!=""}[5m]))
Summary receive-errors sum(rate(container_network_receive_errors_total{namespace="$namespace",pod_name="$podName",container_name!=""}[5m]))
Summary transmit-packets sum(rate(container_network_transmit_packets_total{namespace="$namespace",pod_name="$podName",container_name!=""}[5m]))
Summary transmit-dropped sum(rate(container_network_transmit_packets_dropped_total{namespace="$namespace",pod_name="$podName",container_name!=""}[5m]))
Summary transmit-errors sum(rate(container_network_transmit_errors_total{namespace="$namespace",pod_name="$podName",container_name!=""}[5m]))

Pod Network I/O

CATALOG SERIES EXPRESSION
Detail receive sum(rate(container_network_receive_bytes_total{namespace="$namespace",pod_name="$podName",container_name!=""}[5m]))
Detail transmit sum(rate(container_network_transmit_bytes_total{namespace="$namespace",pod_name="$podName",container_name!=""}[5m]))
Summary receive sum(rate(container_network_receive_bytes_total{namespace="$namespace",pod_name="$podName",container_name!=""}[5m]))
Summary transmit sum(rate(container_network_transmit_bytes_total{namespace="$namespace",pod_name="$podName",container_name!=""}[5m]))

Pod Disk I/O

CATALOG SERIES EXPRESSION
Detail read sum(rate(container_fs_reads_bytes_total{namespace="$namespace",pod_name="$podName",container_name!=""}[5m])) by (container_name)
Detail write sum(rate(container_fs_writes_bytes_total{namespace="$namespace",pod_name="$podName",container_name!=""}[5m])) by (container_name)
Summary read sum(rate(container_fs_reads_bytes_total{namespace="$namespace",pod_name="$podName",container_name!=""}[5m]))
Summary write sum(rate(container_fs_writes_bytes_total{namespace="$namespace",pod_name="$podName",container_name!=""}[5m]))

Container Metrics

Container CPU Utilization

CATALOG EXPRESSION
cfs throttled seconds sum(rate(container_cpu_cfs_throttled_seconds_total{namespace="$namespace",pod_name="$podName",container_name="$containerName"}[5m]))
usage seconds sum(rate(container_cpu_usage_seconds_total{namespace="$namespace",pod_name="$podName",container_name="$containerName"}[5m]))
system seconds sum(rate(container_cpu_system_seconds_total{namespace="$namespace",pod_name="$podName",container_name="$containerName"}[5m]))
user seconds sum(rate(container_cpu_user_seconds_total{namespace="$namespace",pod_name="$podName",container_name="$containerName"}[5m]))

Container Memory Utilization

sum(container_memory_working_set_bytes{namespace="$namespace",pod_name="$podName",container_name="$containerName"})

Container Disk I/O

CATALOG EXPRESSION
read sum(rate(container_fs_reads_bytes_total{namespace="$namespace",pod_name="$podName",container_name="$containerName"}[5m]))
write sum(rate(container_fs_writes_bytes_total{namespace="$namespace",pod_name="$podName",container_name="$containerName"}[5m]))