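HPA prerequisites
Typically, the HPA controller retrieves metric values from a set of aggregated APIs (metrics.k8s.io, custom.metrics.k8s.io, and external.metrics.k8s.io). The metrics.k8s.io API is usually provided by Metrics Server, which has to be deployed separately. More information is available from the metrics-server project. Kubernetes can support Pod HPA only after metrics-server has been deployed successfully.
Installing metrics-server
To deploy metrics-server, download the high-availability.yaml manifest of the latest release from GitHub. The release used here is [metrics-server-helm-chart-3.8.2](https://github.com/kubernetes-sigs/metrics-server/releases/tag/metrics-server-helm-chart-3.8.2).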
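Modifying the configuration
The certificates issued when the cluster was deployed do not include the node IPs, so when Metrics Server connects to a node by IP, the request fails because the certificate does not cover that IP (error: x509: cannot validate certificate for 172.26.2.131 because it doesn't contain any IP SANs). Adding the --kubelet-insecure-tls flag skips certificate verification.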
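The x509 error above comes from a simple rule: a client that connects by IP address expects that IP to appear among the certificate's Subject Alternative Names (SANs). The following is a minimal Python sketch of that check, not Metrics Server's actual code. It assumes the SAN list is in the tuple form that Python's `ssl.SSLSocket.getpeercert()` returns. The helper name `ip_in_sans`, the host name `node-1`, and both sample SAN lists are hypothetical.

```python
import ipaddress


def ip_in_sans(ip, subject_alt_name):
    """Return True if `ip` matches one of the 'IP Address' SAN entries."""
    wanted = ipaddress.ip_address(ip)
    for kind, value in subject_alt_name:
        if kind != "IP Address":
            continue  # DNS names never satisfy a connection made by IP
        try:
            if ipaddress.ip_address(value) == wanted:
                return True
        except ValueError:
            continue  # ignore malformed entries
    return False


# Hypothetical SAN lists: one without IP entries, one that includes the node IP.
sans_dns_only = (("DNS", "node-1"),)
sans_with_ip = (("DNS", "node-1"), ("IP Address", "172.26.2.131"))

print(ip_in_sans("172.26.2.131", sans_dns_only))
print(ip_in_sans("172.26.2.131", sans_with_ip))
```

With --kubelet-insecure-tls this verification is skipped altogether, which is convenient on a test cluster. Reissuing the certificates so that they include the node IPs is the stricter alternative.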
###
- args:
  - --cert-dir=/tmp
  # Added flag: skip certificate verification
  - --kubelet-insecure-tls
  - --secure-port=4443
  - --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
  - --kubelet-use-node-status-port
  - --metric-resolution=15s
###
Verifying the deployment
The deployment can be verified with the following commands:
    $ kubectl get pods -n kube-system -l k8s-app=metrics-server
    NAME                              READY   STATUS    RESTARTS   AGE
    metrics-server-78647766bb-8rs4j   1/1     Running   0          115m
    metrics-server-78647766bb-pkxt9   1/1     Running   0          115m

    $ kubectl logs -n kube-system metrics-server-78647766bb-8rs4j
    I0303 04:24:38.304408 1 serving.go:342] Generated self-signed cert (/tmp/apiserver.crt, /tmp/apiserver.key)
    I0303 04:24:39.238923 1 requestheader_controller.go:169] Starting RequestHeaderAuthRequestController
    I0303 04:24:39.238954 1 shared_informer.go:240] Waiting for caches to sync for RequestHeaderAuthRequestController
    I0303 04:24:39.238982 1 configmap_cafile_content.go:201] "Starting controller" name="client-ca::kube-system::extension-apiserver-authentication::client-ca-file"
    I0303 04:24:39.239022 1 shared_informer.go:240] Waiting for caches to sync for client-ca::kube-system::extension-apiserver-authentication::client-ca-file
    I0303 04:24:39.239031 1 configmap_cafile_content.go:201] "Starting controller" name="client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file"
    I0303 04:24:39.239060 1 shared_informer.go:240] Waiting for caches to sync for client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file
    I0303 04:24:39.239398 1 dynamic_serving_content.go:131] "Starting controller" name="serving-cert::/tmp/apiserver.crt::/tmp/apiserver.key"
    I0303 04:24:39.239897 1 secure_serving.go:266] Serving securely on [::]:4443
    I0303 04:24:39.239974 1 tlsconfig.go:240] "Starting DynamicServingCertificateController"
    W0303 04:24:39.240080 1 shared_informer.go:372] The sharedIndexInformer has started, run more than once is not allowed
    I0303 04:24:39.339998 1 shared_informer.go:247] Caches are synced for RequestHeaderAuthRequestController
    I0303 04:24:39.340088 1 shared_informer.go:247] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::client-ca-file
    I0303 04:24:39.340158 1 shared_informer.go:247] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-fil

    $ kubectl get apiservice | grep metrics
    v1beta1.metrics.k8s.io   kube-system/metrics-server   True   9m

    $ kubectl get --raw "/apis/metrics.k8s.io/v1beta1/nodes"
    {"kind":"NodeMetricsList","apiVersion":"metrics.k8s.io/v1beta1","metadata":{},"items":[{"metadata":{"name":"dg-gpu-3090-1-116","creationTimestamp":"2022-03-03T06:22:48Z","labels":{"beta.kubernetes.io/arch":"amd64","beta.kubernetes.io/os":"linux","classify":"volcano-gpu-share","gpuDeviceType":"GeForce-RTX-3090","gpuDriverVersion":"470.63.01","kubernetes.io/arch":"amd64","kubernetes.io/hostname":"dg-gpu-3090-1-116","kubernetes.io/os":"linux"}},"timestamp":"2022-03-03T06:22:31Z","window":"10.085s","usage":{"cpu":"212655585n","memory":"27810752Ki"}},{"metadata":{"name":"gz-cs-gpu-3-131","creationTimestamp":"2022-03-03T06:22:48Z","labels":{"beta.kubernetes.io/arch":"amd64","beta.kubernetes.io/os":"linux","classify":"volcano-gpu-share","gpuDeviceType":"GeForce-GTX-1080-Ti","gpuDriverVersion":"470.63.01","kubernetes.io/arch":"amd64","kubernetes.io/hostname":"gz-cs-gpu-3-131","kubernetes.io/os":"linux","node-role.kubernetes.io/control-plane":"","node-role.kubernetes.io/master":"","node.kubernetes.io/exclude-from-external-load-balancers":""}},"timestamp":"2022-03-03T06:22:33Z","window":"10.087s","usage":{"cpu":"1629935462n","memory":"15271200Ki"}}]}
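The raw NodeMetricsList response reports CPU in nanocores (suffix `n`) and memory in kibibytes (suffix `Ki`), which are hard to read at a glance. The sketch below converts those quantities to cores and GiB. It is a minimal illustration that handles only a few common suffixes, not a full Kubernetes quantity parser. The function names and the trimmed `sample` document are hypothetical; the usage values in `sample` are copied from the output above.

```python
import json

# Multipliers for the suffixes handled by this sketch (not the full quantity grammar).
CPU_SUFFIXES = {"n": 1e-9, "u": 1e-6, "m": 1e-3}
MEM_SUFFIXES = {"Ki": 1024, "Mi": 1024**2, "Gi": 1024**3}


def cpu_to_cores(quantity):
    """Convert a CPU quantity such as '212655585n' or '50m' to cores."""
    for suffix, factor in CPU_SUFFIXES.items():
        if quantity.endswith(suffix):
            return float(quantity[: -len(suffix)]) * factor
    return float(quantity)  # no suffix: already expressed in cores


def memory_to_bytes(quantity):
    """Convert a memory quantity such as '27810752Ki' to bytes."""
    for suffix, factor in MEM_SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)  # no suffix: already expressed in bytes


def summarize(node_metrics_json):
    """Map each node name to (CPU cores, memory GiB)."""
    doc = json.loads(node_metrics_json)
    return {
        item["metadata"]["name"]: (
            cpu_to_cores(item["usage"]["cpu"]),
            memory_to_bytes(item["usage"]["memory"]) / 1024**3,
        )
        for item in doc["items"]
    }


# Trimmed-down version of the response shown above.
sample = (
    '{"kind":"NodeMetricsList","items":['
    '{"metadata":{"name":"dg-gpu-3090-1-116"},'
    '"usage":{"cpu":"212655585n","memory":"27810752Ki"}}]}'
)

for name, (cores, gib) in summarize(sample).items():
    print(f"{name}: {cores:.3f} cores, {gib:.1f} GiB")
```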
CPU HPA test demo
Next, create an nginx service and attach an HPA that scales on a target of 10% CPU utilization.
demo-hpa.yaml
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: hpa-demo
    spec:
      selector:
        matchLabels:
          app: nginx
      template:
        metadata:
          labels:
            app: nginx
        spec:
          containers:
          - name: nginx
            image: docker.io/library/nginx:1.11.3
            ports:
            - containerPort: 80
            resources:
              requests:
                memory: 50Mi
                cpu: 50m
    ---
    apiVersion: v1
    kind: Service
    metadata:
      labels:
        app: nginx
      name: hpa-demo
    spec:
      ports:
      - name: http
        port: 80
        protocol: TCP
        targetPort: 80
      selector:
        app: nginx
      type: ClusterIP
    ---
    apiVersion: autoscaling/v1
    kind: HorizontalPodAutoscaler
    metadata:
      name: hpa-demo
    spec:
      maxReplicas: 5
      minReplicas: 1
      scaleTargetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: hpa-demo
      targetCPUUtilizationPercentage: 10
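Note that the 10% CPU utilization is not calculated against a full core (1000m, that is, 1000 millicores) but against the value of resources.requests.cpu. Because requests.cpu is set to 50m above, 10% utilization is 50m × 10% = 5m, which corresponds to about 0.5% CPU usage in top on the node. For comparison, if requests.cpu were 5000m and the HPA target were also 10%, the threshold would be 5000m × 10% = 500m, which corresponds to 50% CPU usage in top.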
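The arithmetic above can be checked with a few lines of Python. This is only a worked example of the calculation; the function names are hypothetical, and "percent of one core" means the share of a single core that top would show.

```python
def cpu_threshold_millicores(request_millicores, target_percent):
    """CPU usage (in millicores) at which the HPA target is reached."""
    return request_millicores * target_percent / 100


def percent_of_one_core(millicores):
    """Express millicores as a percentage of a single core (1000m = 100%)."""
    return millicores * 100 / 1000


# The two cases from the text: requests.cpu of 50m and 5000m, target 10%.
for request in (50, 5000):
    threshold = cpu_threshold_millicores(request, 10)
    print(
        f"requests.cpu={request}m, target=10% -> {threshold:g}m "
        f"({percent_of_one_core(threshold):g}% of one core)"
    )
```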
Applying to the cluster
$ kubectl apply -f ./demo-hpa.yaml -n default
Test commands
Get the address of the hpa-demo Service and send requests to it:
    $ kubectl get service -n default
    NAME       TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)   AGE
    hpa-demo   ClusterIP   10.235.50.218   <none>        80/TCP    129m

    $ while true; do wget -q -O- http://10.235.50.218; done
After the script has been running for a while, check the HPA status:
    $ kubectl get hpa
    NAME       REFERENCE             TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
    hpa-demo   Deployment/hpa-demo   10%/10%   1         5         5          7m17s

    $ kubectl describe hpa hpa-demo
    Name:               hpa-demo
    Namespace:          default
    Labels:             <none>
    Annotations:        <none>
    CreationTimestamp:  Thu, 03 Mar 2022 14:38:07 +0800
    Reference:          Deployment/hpa-demo
    Metrics:            ( current / target )
      resource cpu on pods (as a percentage of request):  0% (0) / 10%
    Min replicas:       1
    Max replicas:       5
    Deployment pods:    3 current / 1 desired
    Conditions:
      Type            Status  Reason            Message
      ----            ------  ------            -------
      AbleToScale     True    SucceededRescale  the HPA controller was able to update the target scale to 1
      ScalingActive   True    ValidMetricFound  the HPA was able to successfully calculate a replica count from cpu resource utilization (percentage of request)
      ScalingLimited  True    TooFewReplicas    the desired replica count is less than the minimum replica count
    Events:
      Type     Reason                        Age  From                       Message
      ----     ------                        ---- ----                       -------
      Warning  FailedGetResourceMetric       17m  horizontal-pod-autoscaler  failed to get cpu utilization: unable to get metrics for resource cpu: no metrics returned from resource metrics API
      Warning  FailedComputeMetricsReplicas  17m  horizontal-pod-autoscaler  invalid metrics (1 invalid out of 1), first error is: failed to get cpu utilization: unable to get metrics for resource cpu: no metrics returned from resource metrics API
      Warning  FailedGetResourceMetric       17m  horizontal-pod-autoscaler  failed to get cpu utilization: did not receive metrics for any ready pods
      Warning  FailedComputeMetricsReplicas  17m  horizontal-pod-autoscaler  invalid metrics (1 invalid out of 1), first error is: failed to get cpu utilization: did not receive metrics for any ready pods
      Normal   SuccessfulRescale             15m  horizontal-pod-autoscaler  New size: 3; reason: cpu resource utilization (percentage of request) above target
      Normal   SuccessfulRescale             14m  horizontal-pod-autoscaler  New size: 4; reason: cpu resource utilization (percentage of request) above target
      Normal   SuccessfulRescale             13m  horizontal-pod-autoscaler  New size: 5; reason: cpu resource utilization (percentage of request) above target
      Normal   SuccessfulRescale             20s  horizontal-pod-autoscaler  New size: 3; reason: All metrics below target
      Normal   SuccessfulRescale             5s   horizontal-pod-autoscaler  New size: 1; reason: All metrics below targe
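The rescale events above follow from the replica formula in the Kubernetes HPA documentation: desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue), clamped to the configured minimum and maximum. The sketch below implements only that core formula, assuming the documented default tolerance of 0.1. It ignores stabilization windows, scaling rate limits, and pods that are not ready, so it will not reproduce the exact step sequence in the events. The function name and the utilization values in the examples are hypothetical.

```python
import math


def desired_replicas(current_replicas, current_util, target_util,
                     min_replicas, max_replicas, tolerance=0.1):
    """Core HPA replica calculation, clamped to [min_replicas, max_replicas]."""
    ratio = current_util / target_util
    # Within the tolerance band around 1.0 the replica count is left unchanged.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))


# Hypothetical readings against the 10% target with minReplicas=1, maxReplicas=5.
print(desired_replicas(1, 30, 10, 1, 5))  # load well above target
print(desired_replicas(3, 40, 10, 1, 5))  # result is capped at maxReplicas
print(desired_replicas(5, 0, 10, 1, 5))   # idle: falls back to minReplicas
```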
