Kubernetes Monitoring Metrics

Comparison

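node-exporter collects host-level metrics such as load average, filesystem usage, and memory information (loadavg, filesystem, meminfo). It covers the same ground as zabbix-agent in traditional host monitoring.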

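metrics-server and its predecessor Heapster provide resource usage metrics such as CPU and memory. metrics-server collects them from the kubelet on each node and serves them through the API server as the Metrics API, keeping only the latest values in memory. Heapster could additionally forward them to a storage backend such as InfluxDB or a cloud vendor's service. The core role of metrics-server today is to supply the metrics that components such as the HPA base their decisions on.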

kube-state-metrics focuses on the current state of Kubernetes resources such as Deployments and DaemonSets.

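For example:

  • How many replicas were scheduled, and how many are available now?
  • How many Pods are in the running/stopped/terminated state?
  • How many times has a Pod restarted?
  • How many Jobs are currently running?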

All of these metrics are provided by kube-state-metrics.
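
The questions above map onto metric names such as kube_deployment_status_replicas_available, kube_pod_status_phase, kube_pod_container_status_restarts_total, and kube_job_status_active. As a rough sketch of how such output can be consumed, the helper below sums every sample of one metric from Prometheus-format text on stdin. The function name sum_metric is hypothetical, and the sketch assumes the sample value is the last field on each line (no trailing timestamp).

```shell
# Hypothetical helper: read Prometheus-format text on stdin and print the sum
# of all samples of one metric (default: total container restarts).
sum_metric() {
    metric=${1:-kube_pod_container_status_restarts_total}
    awk -v name="$metric" '
        /^#/ { next }                                  # skip HELP/TYPE comment lines
        (index($0, name "{") == 1 || $1 == name) {     # exact metric name, with or without labels
            total += $NF                               # assumption: the value is the last field
        }
        END { printf "%d\n", total }'
}
```

Pipe the text served by the kube-state-metrics /metrics endpoint into it, for example "... | sum_metric kube_job_status_active". How that endpoint is reached depends on how kube-state-metrics is deployed in your cluster.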

kube-state-metrics was not folded into metrics-server because the two have fundamentally different concerns.

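  • metrics-server, like Heapster before it, only fetches and formats data that already exists and passes it on. Heapster wrote that data to specific storage sinks, and those sinks are the actual monitoring systems.
  • kube-state-metrics keeps a snapshot of the cluster's state in memory and continuously derives new metrics from it. It is not responsible for shipping those metrics anywhere; it only exposes them over HTTP for a collector such as Prometheus to scrape.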

Deploying metrics-server

Download the metrics-server deployment YAML file to the local machine:

    wget https://github.com/kubernetes-sigs/metrics-server/releases/download/v0.3.7/components.yaml

Pull the metrics-server image to the local machine:

    # docker pull zhaoqinchang/metrics-server:0.3.7
    0.3.7: Pulling from zhaoqinchang/metrics-server
    9ff2acc3204b: Pull complete
    9d14b55ff9a0: Pull complete
    Digest: sha256:c0efe772bb9e5c289db6cc4bc2002c268507d0226f2a3815f7213e00261c38e9
    Status: Downloaded newer image for zhaoqinchang/metrics-server:0.3.7
    docker.io/zhaoqinchang/metrics-server:0.3.7

Edit components.yaml so that it reads as follows:

    # cat components.yaml
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRole
    metadata:
      name: system:aggregated-metrics-reader
      labels:
        rbac.authorization.k8s.io/aggregate-to-view: "true"
        rbac.authorization.k8s.io/aggregate-to-edit: "true"
        rbac.authorization.k8s.io/aggregate-to-admin: "true"
    rules:
    - apiGroups: ["metrics.k8s.io"]
      resources: ["pods", "nodes"]
      verbs: ["get", "list", "watch"]
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRoleBinding
    metadata:
      name: metrics-server:system:auth-delegator
    roleRef:
      apiGroup: rbac.authorization.k8s.io
      kind: ClusterRole
      name: system:auth-delegator
    subjects:
    - kind: ServiceAccount
      name: metrics-server
      namespace: kube-system
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: RoleBinding
    metadata:
      name: metrics-server-auth-reader
      namespace: kube-system
    roleRef:
      apiGroup: rbac.authorization.k8s.io
      kind: Role
      name: extension-apiserver-authentication-reader
    subjects:
    - kind: ServiceAccount
      name: metrics-server
      namespace: kube-system
    ---
    # Note: apiregistration.k8s.io/v1beta1 was removed in Kubernetes 1.22;
    # newer clusters need apiregistration.k8s.io/v1 here.
    apiVersion: apiregistration.k8s.io/v1beta1
    kind: APIService
    metadata:
      name: v1beta1.metrics.k8s.io
    spec:
      service:
        name: metrics-server
        namespace: kube-system
      group: metrics.k8s.io
      version: v1beta1
      insecureSkipTLSVerify: true
      groupPriorityMinimum: 100
      versionPriority: 100
    ---
    apiVersion: v1
    kind: ServiceAccount
    metadata:
      name: metrics-server
      namespace: kube-system
    ---
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: metrics-server
      namespace: kube-system
      labels:
        k8s-app: metrics-server
    spec:
      selector:
        matchLabels:
          k8s-app: metrics-server
      template:
        metadata:
          name: metrics-server
          labels:
            k8s-app: metrics-server
        spec:
          serviceAccountName: metrics-server
          volumes:
          # mount in tmp so we can safely use from-scratch images and/or read-only containers
          - name: tmp-dir
            emptyDir: {}
          containers:
          - name: metrics-server
            image: zhaoqinchang/metrics-server:0.3.7  # changed to the image pulled above
            imagePullPolicy: IfNotPresent
            args:
            - --cert-dir=/tmp
            - --secure-port=4443
            command:  # added: the command section below
            - /metrics-server
            - --kubelet-preferred-address-types=InternalIP
            # skips verification of kubelet serving certificates; for test clusters only
            - --kubelet-insecure-tls
            ports:
            - name: main-port
              containerPort: 4443
              protocol: TCP
            securityContext:
              readOnlyRootFilesystem: true
              runAsNonRoot: true
              runAsUser: 1000
            volumeMounts:
            - name: tmp-dir
              mountPath: /tmp
          nodeSelector:
            kubernetes.io/os: linux
    ---
    apiVersion: v1
    kind: Service
    metadata:
      name: metrics-server
      namespace: kube-system
      labels:
        kubernetes.io/name: "Metrics-server"
        kubernetes.io/cluster-service: "true"
    spec:
      selector:
        k8s-app: metrics-server
      ports:
      - port: 443
        protocol: TCP
        targetPort: main-port
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRole
    metadata:
      name: system:metrics-server
    rules:
    - apiGroups:
      - ""
      resources:
      - pods
      - nodes
      - nodes/stats
      - namespaces
      - configmaps
      verbs:
      - get
      - list
      - watch
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRoleBinding
    metadata:
      name: system:metrics-server
    roleRef:
      apiGroup: rbac.authorization.k8s.io
      kind: ClusterRole
      name: system:metrics-server
    subjects:
    - kind: ServiceAccount
      name: metrics-server
      namespace: kube-system

Deploy metrics-server:

    # kubectl apply -f components.yaml
    clusterrole.rbac.authorization.k8s.io/system:aggregated-metrics-reader created
    clusterrolebinding.rbac.authorization.k8s.io/metrics-server:system:auth-delegator created
    rolebinding.rbac.authorization.k8s.io/metrics-server-auth-reader created
    apiservice.apiregistration.k8s.io/v1beta1.metrics.k8s.io created
    serviceaccount/metrics-server created
    deployment.apps/metrics-server created
    service/metrics-server created
    clusterrole.rbac.authorization.k8s.io/system:metrics-server created
    clusterrolebinding.rbac.authorization.k8s.io/system:metrics-server created
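
Once the objects are created, it can help to confirm that the Deployment has finished rolling out and that the aggregated API reports as available before querying it. A minimal sketch, assuming kubectl is already configured for the cluster; the function name metrics_server_ready and the 120-second timeout are illustrative choices, not part of the original setup.

```shell
# Hypothetical helper: succeeds once metrics-server has rolled out and its
# APIService reports Available=True.
metrics_server_ready() {
    # Block until the Deployment finishes rolling out (or the timeout expires).
    kubectl -n kube-system rollout status deployment/metrics-server --timeout=120s || return 1

    # Read the Available condition of the APIService created by components.yaml.
    status=$(kubectl get apiservice v1beta1.metrics.k8s.io \
        -o jsonpath='{.status.conditions[?(@.type=="Available")].status}')
    [ "$status" = "True" ]
}
```

For example, "metrics_server_ready && echo ready" prints "ready" only when both checks pass. The first metrics may still take a short while to appear after the pod starts.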

Check that metrics.k8s.io now appears in the list of API groups of the Kubernetes cluster:

    # kubectl api-versions | grep metrics
    metrics.k8s.io/v1beta1
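
With the group registered, the Metrics API can also be queried directly, which shows the raw JSON that kubectl top is built on. A small sketch, assuming kubectl is configured for the cluster; the function names raw_node_metrics and raw_pod_metrics are hypothetical.

```shell
# Hypothetical helper: print the raw node metrics JSON served by metrics-server.
raw_node_metrics() {
    kubectl get --raw "/apis/metrics.k8s.io/v1beta1/nodes"
}

# Hypothetical helper: same idea for the Pods of one namespace
# (defaults to kube-system when no argument is given).
raw_pod_metrics() {
    namespace=${1:-kube-system}
    kubectl get --raw "/apis/metrics.k8s.io/v1beta1/namespaces/${namespace}/pods"
}
```

Calling raw_node_metrics returns one item per node with its CPU and memory usage. An error at this point usually means the APIService is not yet available.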

Usage

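The kubectl top command displays resource usage for nodes and Pods. It relies on the resource Metrics API in the cluster to obtain the figures. It has two subcommands, node and pod, which show the resource usage of Node objects and Pod objects respectively.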

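The syntax for listing node usage is "kubectl top node [-l label | NAME]". The example below lists all nodes. For each node, the output shows current CPU usage in millicores and as a percentage, followed by memory usage and its percentage. When needed, you can name a specific node on the command line or filter nodes with a label selector.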

    [root@master metric]# kubectl top nodes
    NAME      CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
    master    282m         14%    1902Mi          51%
    node-02   70m          3%     1371Mi          37%
    node-03   121m         1%     892Mi           11%
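
Because the output is plain columns, it is easy to post-process. The sketch below reads kubectl top nodes output on stdin and prints the nodes whose MEMORY% is at or above a threshold. The function name nodes_over_memory and the default threshold of 50 are hypothetical choices for illustration.

```shell
# Hypothetical helper: read `kubectl top nodes` output on stdin and print the
# names of nodes whose MEMORY% is at or above the threshold (default 50).
nodes_over_memory() {
    threshold=${1:-50}
    awk -v limit="$threshold" '
        NR == 1 { next }            # skip the header row
        {
            pct = $5                # fifth column is MEMORY%, e.g. "51%"
            sub(/%/, "", pct)       # drop the percent sign
            if (pct + 0 >= limit + 0) print $1
        }'
}
```

Used as "kubectl top nodes | nodes_over_memory 50", it would print only master for the output shown above.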

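Pod usage is namespace-scoped, so the command is used slightly differently: you normally specify the namespace and, where useful, a label selector to narrow down the target Pods. For example, the following shows resource usage for the Pods in the kube-system namespace: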

    [root@master metric]# kubectl top pods -n kube-system
    NAME                              CPU(cores)   MEMORY(bytes)
    etcd-master                       32m          300Mi
    kube-apiserver-master             86m          342Mi
    kube-controller-manager-master    30m          48Mi
    kube-flannel-ds-l5ghn             5m           10Mi
    kube-flannel-ds-rqlm2             4m           12Mi
    kube-flannel-ds-v92r9             4m           14Mi
    kube-proxy-7vjcv                  18m          15Mi
    kube-proxy-xrz8f                  13m          21Mi
    kube-proxy-zpwn6                  1m           14Mi
    kube-scheduler-master             7m           17Mi
    metrics-server-5549c7694f-7vb66   2m           14Mi
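
The per-Pod figures can be aggregated in the same way. As a rough sketch, the helper below totals the MEMORY(bytes) column of kubectl top pods output read from stdin. The function name total_pod_memory_mi is hypothetical, and the sketch assumes every value is reported in Mi, as in the output above.

```shell
# Hypothetical helper: read `kubectl top pods` output on stdin and print the
# sum of the MEMORY(bytes) column, assuming every value carries an Mi suffix.
total_pod_memory_mi() {
    awk '
        NR == 1 { next }            # skip the header row
        {
            mem = $3                # third column, e.g. "300Mi"
            sub(/Mi$/, "", mem)     # strip the unit suffix
            total += mem
        }
        END { printf "%dMi\n", total }'
}
```

Used as "kubectl top pods -n kube-system | total_pod_memory_mi". Recent kubectl releases also accept --sort-by=cpu or --sort-by=memory on kubectl top, which may be enough when only the heaviest Pods are of interest.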

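kubectl top gives users a concise, fast way to check the resource usage of Node and Pod objects, and it is one of the commands used most often when operating and maintaining a cluster.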