1. Download & Install

Download the Helm chart package from https://github.com/prometheus-community/helm-charts/releases

  wget https://github.com/prometheus-community/helm-charts/releases/download/kube-prometheus-stack-16.0.1/kube-prometheus-stack-16.0.1.tgz -O kube-prometheus-stack-16.0.1.tgz

Extract the archive and enter the chart directory:

  ./
  ├── Chart.lock
  ├── charts
  │   ├── grafana
  │   ├── kube-state-metrics
  │   └── prometheus-node-exporter
  ├── Chart.yaml
  ├── CONTRIBUTING.md
  ├── crds
  │   ├── crd-alertmanagerconfigs.yaml
  │   ├── crd-alertmanagers.yaml
  │   └── ...
  ├── README.md
  ├── templates
  │   ├── alertmanager
  │   ├── ...
  │   └── prometheus-operator
  └── values.yaml
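
The extraction step itself is not spelled out above. A minimal sketch, assuming the archive sits in the current directory and unpacks into a directory named after the chart (kube-prometheus-stack):

```shell
# Archive name taken from the wget command above.
archive="kube-prometheus-stack-16.0.1.tgz"

if [ -f "$archive" ]; then
  # Assumption: the archive unpacks into ./kube-prometheus-stack
  tar -xzf "$archive"
  cd kube-prometheus-stack || exit 1
  ls
else
  echo "archive not found: $archive" >&2
fi
```

The guard only avoids a confusing tar error when the download was skipped or saved under a different name.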

Edit values.yaml to suit your environment.

Once values.yaml has been modified, install the chart:

  # namespace: monitoring (must already exist; otherwise add --create-namespace)
  helm -n monitoring install prometheus .

After installation, the following monitoring components are deployed and running:

  $ kubectl -n monitoring get statefulset
  alertmanager-prometheus-kube-prometheus-alertmanager   1/1   78s
  prometheus-prometheus-kube-prometheus-prometheus       1/1   78s
  $ kubectl -n monitoring get deployment
  prometheus-kube-prometheus-operator   1/1   1   1   102s
  $ kubectl -n monitoring get daemonset
  prometheus-prometheus-node-exporter   3   3   3   3   3   <none>   115s

In addition, the following custom resource definitions (CRDs) are installed:

  alertmanagerconfigs.monitoring.coreos.com
  alertmanagers.monitoring.coreos.com
  podmonitors.monitoring.coreos.com
  probes.monitoring.coreos.com
  prometheuses.monitoring.coreos.com
  prometheusrules.monitoring.coreos.com
  servicemonitors.monitoring.coreos.com
  thanosrulers.monitoring.coreos.com

Accessing the web consoles:
During testing, you can expose the prometheus and alertmanager services as NodePort. There are two ways to make the change:

  1. Edit values.yaml, then run helm -n monitoring upgrade prometheus .
  2. Edit the service directly with kubectl -n monitoring edit svc prometheus-kube-prometheus-prometheus (and likewise prometheus-kube-prometheus-alertmanager).

    $ kubectl -n monitoring get svc
    alertmanager-operated                     ClusterIP   None            <none>   9093/TCP,9094/TCP,9094/UDP   4m18s
    prometheus-kube-prometheus-alertmanager   NodePort    10.233.25.216   <none>   9093:32565/TCP               4m20s
    prometheus-kube-prometheus-operator       ClusterIP   10.233.35.190   <none>   443/TCP                      4m20s
    prometheus-kube-prometheus-prometheus     NodePort    10.233.41.226   <none>   9090:30451/TCP               4m20s
    prometheus-operated                       ClusterIP   None            <none>   9090/TCP                     4m18s
    prometheus-prometheus-node-exporter       ClusterIP   10.233.25.195   <none>   9100/TCP                     4m20s

    prometheus: http://${NODE_IP}:30451
    alertmanager: http://${NODE_IP}:32565
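
For the values.yaml route, one option is to keep the NodePort settings in a small override file rather than editing the chart's values.yaml in place. The sketch below is an assumption-laden example: the file name values-nodeport.yaml is hypothetical, the port numbers are illustrative, and the key paths (prometheus.service, alertmanager.service) should be checked against the values.yaml shipped with your chart version.

```shell
# Write a small override file instead of editing values.yaml directly.
# File name and port numbers are illustrative assumptions.
cat > values-nodeport.yaml <<'EOF'
prometheus:
  service:
    type: NodePort
    nodePort: 30090
alertmanager:
  service:
    type: NodePort
    nodePort: 30903
EOF
```

The file can then be applied with helm -n monitoring upgrade prometheus . -f values-nodeport.yaml. With fixed ports like these, the console URLs use 30090 and 30903 instead of the randomly assigned ports shown in the service listing.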

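This completes the deployment of prometheus and its base components. 🎉

2. Selected settings in values.yaml

2.1 Monitoring configuration for Kubernetes core components:

Depending on your environment, you may need to adjust ports, TLS authentication, and similar settings. For example, monitoring etcd requires the following: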

  ## values.yaml
  ...
  kubeEtcd:
    ...
    serviceMonitor:
      scheme: https
      insecureSkipVerify: false
      ## etcd certificates. These paths are where the etcd-client-certs secret is mounted inside the prometheus container.
      caFile: /etc/prometheus/secrets/etcd-client-certs/ca.pem
      certFile: /etc/prometheus/secrets/etcd-client-certs/member-k8s.pem
      keyFile: /etc/prometheus/secrets/etcd-client-certs/member-k8s-key.pem
  ...
  prometheus:
    ...
      secrets:
        ## This secret holds the etcd certificates and is mounted inside the prometheus container under /etc/prometheus/secrets/.
        - etcd-client-certs
  ...

The secret must, of course, be created beforehand:

  kubectl create secret generic etcd-client-certs -n monitoring \
    --from-file=/etc/ssl/etcd/ssl/ca.pem \
    --from-file=/etc/ssl/etcd/ssl/member-k8s.pem \
    --from-file=/etc/ssl/etcd/ssl/member-k8s-key.pem
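
To make the nesting of the etcd settings explicit, here is a sketch of the same configuration as a standalone override file. The file name values-etcd.yaml is hypothetical, and placing secrets under prometheus.prometheusSpec is an assumption about the chart layout that should be verified against the values.yaml of your chart version.

```shell
# Hypothetical override file; key paths are assumptions to be checked
# against the chart's own values.yaml.
cat > values-etcd.yaml <<'EOF'
kubeEtcd:
  serviceMonitor:
    scheme: https
    insecureSkipVerify: false
    # Paths inside the prometheus container where the secret is mounted.
    caFile: /etc/prometheus/secrets/etcd-client-certs/ca.pem
    certFile: /etc/prometheus/secrets/etcd-client-certs/member-k8s.pem
    keyFile: /etc/prometheus/secrets/etcd-client-certs/member-k8s-key.pem
prometheus:
  prometheusSpec:
    # Secrets listed here are mounted under /etc/prometheus/secrets/<name>.
    secrets:
      - etcd-client-certs
EOF
```

It can be applied with helm -n monitoring upgrade prometheus . -f values-etcd.yaml once the etcd-client-certs secret exists in the monitoring namespace.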

2.2 Single-replica alertmanager:

(1) When deploying alertmanager with a single replica during testing, you can set the following:

  ## ForceEnableClusterMode ensures Alertmanager does not deactivate the cluster mode when running with a single replica.
  ## Use case is e.g. spanning an Alertmanager cluster across Kubernetes clusters with a single replica in each.
  forceEnableClusterMode: true

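This ensures that after a single-replica (non-HA) deployment, the alertmanager cluster status is reported as healthy. Without it, the status page shows the cluster as disabled.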
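
As a sketch of where this setting might sit, the override file below places it under alertmanager.alertmanagerSpec next to the replica count. The file name values-alertmanager.yaml is hypothetical, and the key path is an assumption to confirm against your chart's values.yaml.

```shell
# Hypothetical override file for a single-replica test deployment.
cat > values-alertmanager.yaml <<'EOF'
alertmanager:
  alertmanagerSpec:
    replicas: 1
    # Keep cluster mode enabled even with one replica.
    forceEnableClusterMode: true
EOF
```

Apply it with helm -n monitoring upgrade prometheus . -f values-alertmanager.yaml, then check the cluster status on the alertmanager status page.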