1. Download & install
Download the helm chart package from https://github.com/prometheus-community/helm-charts/releases:
    wget https://github.com/prometheus-community/helm-charts/releases/download/kube-prometheus-stack-16.0.1/kube-prometheus-stack-16.0.1.tgz -O kube-prometheus-stack-16.0.1.tgz
Unpack the archive and enter the chart directory, which looks like this:
    ./
    ├── Chart.lock
    ├── charts
    │   ├── grafana
    │   ├── kube-state-metrics
    │   └── prometheus-node-exporter
    ├── Chart.yaml
    ├── CONTRIBUTING.md
    ├── crds
    │   ├── crd-alertmanagerconfigs.yaml
    │   ├── crd-alertmanagers.yaml
    │   └── ...
    ├── README.md
    ├── templates
    │   ├── alertmanager
    │   ├── ...
    │   └── prometheus-operator
    └── values.yaml
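The download and unpack steps can be collected into a small script. This is only a minimal sketch: the variable names and the fetch_and_unpack function are hypothetical, and it assumes the archive unpacks into a directory named after the chart (kube-prometheus-stack).

```shell
#!/usr/bin/env bash
# Chart coordinates taken from the wget command above; adjust for other releases.
CHART_NAME="kube-prometheus-stack"
CHART_VERSION="16.0.1"
ARCHIVE="${CHART_NAME}-${CHART_VERSION}.tgz"
URL="https://github.com/prometheus-community/helm-charts/releases/download/${CHART_NAME}-${CHART_VERSION}/${ARCHIVE}"

# Hypothetical helper: download the archive, unpack it, and enter the chart directory.
# Assumption: the archive unpacks into ./${CHART_NAME}.
fetch_and_unpack() {
  wget "$URL" -O "$ARCHIVE" || return 1
  tar -xzf "$ARCHIVE" || return 1
  cd "$CHART_NAME" || return 1
}
```

Source the script and call fetch_and_unpack; on success the shell is left inside the chart directory, ready for editing values.yaml.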
Edit values.yaml to suit your environment. Once it is ready, install the chart from inside the chart directory:
    # namespace: monitoring
    helm -n monitoring install prometheus .
After the installation completes, the following monitoring components are deployed and running:
    $ kubectl -n monitoring get statefulset
    alertmanager-prometheus-kube-prometheus-alertmanager   1/1   78s
    prometheus-prometheus-kube-prometheus-prometheus       1/1   78s
    $ kubectl -n monitoring get deployment
    prometheus-kube-prometheus-operator   1/1   1   1   102s
    $ kubectl -n monitoring get daemonset
    prometheus-prometheus-node-exporter   3   3   3   3   3   <none>   115s
In addition, the following custom resource definitions (CRDs) are installed:
    alertmanagerconfigs.monitoring.coreos.com
    alertmanagers.monitoring.coreos.com
    podmonitors.monitoring.coreos.com
    probes.monitoring.coreos.com
    prometheuses.monitoring.coreos.com
    prometheusrules.monitoring.coreos.com
    servicemonitors.monitoring.coreos.com
    thanosrulers.monitoring.coreos.com
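To confirm that all eight CRDs listed above are actually present, something like the following could be used. It is a sketch, not part of the chart: missing_crds is a hypothetical helper that reads CRD names on stdin and prints any expected name it does not find.

```shell
# Short names of the CRDs listed above (all in the monitoring.coreos.com group).
EXPECTED_CRDS="alertmanagerconfigs alertmanagers podmonitors probes prometheuses prometheusrules servicemonitors thanosrulers"

# Hypothetical helper: read installed CRD names on stdin (one per line)
# and print every expected CRD that is not among them.
missing_crds() {
  installed="$(cat)"
  for name in $EXPECTED_CRDS; do
    full="${name}.monitoring.coreos.com"
    printf '%s\n' "$installed" | grep -qxF "$full" || echo "$full"
  done
}

# Against a live cluster (strips the resource-type prefix printed by "-o name"):
#   kubectl get crd -o name | sed 's|.*/||' | missing_crds
```

Empty output means nothing is missing.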
Accessing the web consoles:
During testing, the prometheus and alertmanager services can be exposed as NodePort services. There are two ways to change them:
- Edit values.yaml, then run helm -n monitoring upgrade prometheus .
- Edit the service directly with kubectl -n monitoring edit svc prometheus-kube-prometheus-prometheus
Afterwards the services look like this:
    $ kubectl -n monitoring get svc
    alertmanager-operated                     ClusterIP   None            <none>   9093/TCP,9094/TCP,9094/UDP   4m18s
    prometheus-kube-prometheus-alertmanager   NodePort    10.233.25.216   <none>   9093:32565/TCP               4m20s
    prometheus-kube-prometheus-operator       ClusterIP   10.233.35.190   <none>   443/TCP                      4m20s
    prometheus-kube-prometheus-prometheus     NodePort    10.233.41.226   <none>   9090:30451/TCP               4m20s
    prometheus-operated                       ClusterIP   None            <none>   9090/TCP                     4m18s
    prometheus-prometheus-node-exporter       ClusterIP   10.233.25.195   <none>   9100/TCP                     4m20s
The consoles are then reachable at:
prometheus: http://${NODE_IP}:30451
alertmanager: http://${NODE_IP}:32565
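For the values.yaml route, the NodePort settings can also be kept in a separate override file instead of editing the chart's values.yaml in place. The sketch below assumes the chart exposes prometheus.service and alertmanager.service keys with type and nodePort fields (check the values.yaml of your chart version); the file name nodeport-values.yaml is hypothetical, and the ports simply reuse the ones from the service listing.

```shell
# Hypothetical override file; the ports reuse the NodePorts shown in the service listing.
cat > nodeport-values.yaml <<'EOF'
prometheus:
  service:
    type: NodePort
    nodePort: 30451
alertmanager:
  service:
    type: NodePort
    nodePort: 32565
EOF

# Apply on top of the chart's own values.yaml (run from the chart directory):
#   helm -n monitoring upgrade prometheus . -f nodeport-values.yaml
```

Keeping the overrides in their own file leaves the chart's values.yaml untouched, which makes it easier to see what was changed for testing.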
2. Selected settings in values.yaml
2.1 Monitoring Kubernetes core components:
Depending on your environment, you may need to adjust ports, TLS authentication and similar settings. For example, monitoring etcd requires:
    ## values.yaml
    ...
    kubeEtcd:
      ...
      serviceMonitor:
        scheme: https
        insecureSkipVerify: false
        ## etcd certificates. The paths are where the etcd-client-certs secret is mounted inside the prometheus container.
        caFile: /etc/prometheus/secrets/etcd-client-certs/ca.pem
        certFile: /etc/prometheus/secrets/etcd-client-certs/member-k8s.pem
        keyFile: /etc/prometheus/secrets/etcd-client-certs/member-k8s-key.pem
    ...
    prometheus:
      ...
        secrets:
        ## A secret holding the etcd certificates. It is mounted inside the prometheus container under /etc/prometheus/secrets/.
        - etcd-client-certs
    ...
The secret must, of course, be created beforehand:
    kubectl create secret generic etcd-client-certs -n monitoring \
      --from-file=/etc/ssl/etcd/ssl/ca.pem \
      --from-file=/etc/ssl/etcd/ssl/member-k8s.pem \
      --from-file=/etc/ssl/etcd/ssl/member-k8s-key.pem
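Because kubectl fails if any of the certificate files is missing, it can help to check them first. The following is a sketch only: check_cert_files and create_etcd_secret are hypothetical helpers, and the default directory is the one used in the command above.

```shell
# Hypothetical pre-flight check: succeed only if all three certificate files are readable.
check_cert_files() {
  dir="$1"
  for f in ca.pem member-k8s.pem member-k8s-key.pem; do
    if [ ! -r "${dir}/${f}" ]; then
      echo "missing or unreadable: ${dir}/${f}" >&2
      return 1
    fi
  done
}

# Hypothetical wrapper: create the secret only after the check passes.
create_etcd_secret() {
  dir="${1:-/etc/ssl/etcd/ssl}"
  check_cert_files "$dir" || return 1
  kubectl create secret generic etcd-client-certs -n monitoring \
    --from-file="${dir}/ca.pem" \
    --from-file="${dir}/member-k8s.pem" \
    --from-file="${dir}/member-k8s-key.pem"
}
```

Once Prometheus has picked up the secret, the mount can be inspected with kubectl -n monitoring exec prometheus-prometheus-kube-prometheus-prometheus-0 -c prometheus -- ls /etc/prometheus/secrets/etcd-client-certs. The pod name here is an assumption derived from the StatefulSet name shown earlier.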
2.2 Single-replica alertmanager:
(1) When alertmanager is deployed with a single replica during testing, the following setting can be used:
    ## ForceEnableClusterMode ensures Alertmanager does not deactivate the cluster mode when running with a single replica.
    ## Use case is e.g. spanning an Alertmanager cluster across Kubernetes clusters with a single replica in each.
    forceEnableClusterMode: true
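This ensures that after a single-replica (non-HA) deployment the alertmanager cluster status is reported as healthy. Without it, the cluster status is shown as disabled.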
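The cluster status can also be read from the Alertmanager HTTP API rather than the web console. The sketch below assumes the v2 status endpoint (/api/v2/status), whose JSON response carries a cluster.status field with the values ready, settling or disabled. The cluster_status helper is hypothetical and uses plain grep/cut; it relies on "status" appearing as a key only inside the cluster object.

```shell
# Hypothetical helper: read the JSON body of /api/v2/status on stdin
# and print the value of the first "status" key (assumed to be cluster.status).
cluster_status() {
  grep -o '"status" *: *"[^"]*"' | head -n 1 | cut -d '"' -f 4
}

# Against the alertmanager NodePort from the service listing:
#   curl -s "http://${NODE_IP}:32565/api/v2/status" | cluster_status
```

If jq is available, piping the same response through jq -r .cluster.status is a more robust way to get the value.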
