Clusters built with kubeadm issue certificates that are valid for one year by default. Renewing them is actually quick work; the real danger is only finding out after something has broken, because as professional brick-moving engineers we are busy every single day.

Given that, monitoring certificate expiry in the cluster is something you simply have to do. Prometheus is the reigning king of the cloud-native world, so being able to use it to watch certificate validity and alert in time would be ideal.

That is exactly what ssl_exporter is for. ssl_exporter is a Prometheus exporter that provides several kinds of SSL checks, including metrics for HTTPS certificate not-before/not-after times, file certificate not-before/not-after times, OCSP status, and more.

Let's use it to monitor the validity of the cluster certificates.

Installation

```yaml
apiVersion: v1
kind: Service
metadata:
  labels:
    name: ssl-exporter
  name: ssl-exporter
spec:
  ports:
  - name: ssl-exporter
    protocol: TCP
    port: 9219
    targetPort: 9219
  selector:
    app: ssl-exporter
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ssl-exporter
spec:
  replicas: 1
  selector:
    matchLabels:
      app: ssl-exporter
  template:
    metadata:
      name: ssl-exporter
      labels:
        app: ssl-exporter
    spec:
      initContainers:
      # Install kube ca cert as a root CA
      - name: ca
        image: alpine
        command:
        - sh
        - -c
        - |
          set -e
          apk add --update ca-certificates
          cp /var/run/secrets/kubernetes.io/serviceaccount/ca.crt /usr/local/share/ca-certificates/kube-ca.crt
          update-ca-certificates
          cp /etc/ssl/certs/* /ssl-certs
        volumeMounts:
        - name: ssl-certs
          mountPath: /ssl-certs
      containers:
      - name: ssl-exporter
        image: ribbybibby/ssl-exporter:v0.6.0
        ports:
        - name: tcp
          containerPort: 9219
        volumeMounts:
        - name: ssl-certs
          mountPath: /etc/ssl/certs
      volumes:
      - name: ssl-certs
        emptyDir: {}
```

Install it with kubectl apply -f . -n monitoring (the rest of this post assumes everything runs in the monitoring namespace).

Wait for the Pod to come up:

```
# kubectl get po -n monitoring -l app=ssl-exporter
NAME                            READY   STATUS    RESTARTS   AGE
ssl-exporter-7ff4759679-f4qbs   1/1     Running   0          21m
```

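Before wiring it into Prometheus, you can optionally sanity-check the exporter by port-forwarding it and probing the API server endpoint yourself (the pod name below is the one from the output above):

```bash
# Forward the exporter port to localhost
kubectl -n monitoring port-forward ssl-exporter-7ff4759679-f4qbs 9219:9219 &

# Probe the API server; the output should contain ssl_cert_not_after and ssl_tls_connect_success
curl -s "http://localhost:9219/probe?target=kubernetes.default.svc:443" | grep -E "ssl_cert_not_after|ssl_tls_connect_success"
```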
Next, configure Prometheus to scrape it.

Since my Prometheus is deployed through the Prometheus Operator, the new job is added as an additional scrape config.

First create a file named prometheus-additional.yaml with the following content:

```yaml
- job_name: ssl-exporter
  metrics_path: /probe
  static_configs:
  - targets:
    - kubernetes.default.svc:443
  relabel_configs:
  - source_labels: [__address__]
    target_label: __param_target
  - source_labels: [__param_target]
    target_label: instance
  - target_label: __address__
    replacement: ssl-exporter.monitoring:9219
```
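The relabel rules copy each entry in targets into the exporter's target query parameter and into the instance label, and then point the actual scrape at ssl-exporter.monitoring:9219, so the exporter performs the probe on Prometheus' behalf. Additional HTTPS endpoints can be monitored simply by appending them to targets; a sketch with a purely hypothetical hostname:

```yaml
  static_configs:
  - targets:
    - kubernetes.default.svc:443
    - my-ingress.example.com:443   # hypothetical extra HTTPS endpoint to probe
```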

Then create the secret from that file:

```bash
kubectl delete secret additional-config -n monitoring
kubectl -n monitoring create secret generic additional-config --from-file=prometheus-additional.yaml
```
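If you prefer not to delete and recreate the secret every time the file changes, an idempotent variant (same secret name and namespace assumed; on older kubectl versions the flag is just --dry-run) is:

```bash
kubectl -n monitoring create secret generic additional-config \
  --from-file=prometheus-additional.yaml \
  --dry-run=client -o yaml | kubectl apply -f -
```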

Then edit prometheus-prometheus.yaml and add the following to its spec:

```yaml
  additionalScrapeConfigs:
    name: additional-config
    key: prometheus-additional.yaml
```

The complete prometheus-prometheus.yaml then looks like this:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  labels:
    prometheus: k8s
  name: k8s
  namespace: monitoring
spec:
  alerting:
    alertmanagers:
    - name: alertmanager-main
      namespace: monitoring
      port: web
  baseImage: quay.io/prometheus/prometheus
  nodeSelector:
    kubernetes.io/os: linux
  podMonitorNamespaceSelector: {}
  podMonitorSelector: {}
  replicas: 2
  resources:
    requests:
      memory: 400Mi
  ruleSelector:
    matchLabels:
      prometheus: k8s
      role: alert-rules
  securityContext:
    fsGroup: 2000
    runAsNonRoot: true
    runAsUser: 1000
  additionalScrapeConfigs:
    name: additional-config
    key: prometheus-additional.yaml
  serviceAccountName: prometheus-k8s
  serviceMonitorNamespaceSelector: {}
  serviceMonitorSelector: {}
  version: v2.11.0
  storage:
    volumeClaimTemplate:
      spec:
        storageClassName: managed-nfs-storage
        resources:
          requests:
            storage: 10Gi
```

Re-apply prometheus-prometheus.yaml:

```bash
kubectl apply -f prometheus-prometheus.yaml
```

The new scrape job now shows up as healthy on the targets page of the Prometheus web UI.
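If the UI is not already exposed, a quick way to reach it (assuming the standard kube-prometheus service name prometheus-k8s) is:

```bash
# Expose the Prometheus UI on localhost:9090, then check http://localhost:9090/targets
kubectl -n monitoring port-forward svc/prometheus-k8s 9090:9090
```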

The expression (ssl_cert_not_after - time()) / 3600 / 24 then shows how many days remain before each certificate expires.

ssl_tls_connect_success shows whether the TLS connection itself is healthy (1 for success, 0 for failure).
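When several targets are probed, a convenient expression-browser query for the certificates closest to expiry (plain PromQL, no extra metrics assumed) is:

```
# The three probed certificates with the fewest days remaining
bottomk(3, (ssl_cert_not_after - time()) / 3600 / 24)
```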

Alerting

With ssl_exporter installed and the metrics coming in, the last step is to add some alerting rules so that the on-call engineers hear about problems quickly.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: monitoring-ssl-tls-rules
  namespace: monitoring
  labels:
    prometheus: k8s
    role: alert-rules
spec:
  groups:
  - name: check_ssl_validity
    rules:
    - alert: K8sCertificateExpiresWithin30Days
      expr: (ssl_cert_not_after - time()) / 3600 / 24 < 30
      for: 1h
      labels:
        severity: critical
      annotations:
        description: 'The cluster certificate expires in {{ printf "%.1f" $value }} days, please renew it as soon as possible'
        summary: "Kubernetes cluster certificate expiration warning"
  - name: ssl_connect_status
    rules:
    - alert: K8sCertificateConnectFailure
      expr: ssl_tls_connect_success == 0
      for: 1m
      labels:
        severity: critical
      annotations:
        summary: "Kubernetes cluster certificate connection failure"
        description: "TLS connection to {{ $labels.instance }} failed"
```
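Save this as, say, ssl-rules.yaml (the filename is arbitrary) and apply it like any other manifest; because its prometheus: k8s and role: alert-rules labels match the ruleSelector in the Prometheus resource above, the operator picks it up automatically:

```bash
kubectl apply -f ssl-rules.yaml
```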

Once the rules show up and evaluate normally on the Rules page, an alert will be delivered whenever either condition is met.