1. EFK

The most popular log-collection solution for Kubernetes is the Elasticsearch, Fluentd and Kibana (EFK) stack, which is also the approach currently recommended upstream.

  • Elasticsearch is a real-time, distributed, scalable search engine that supports full-text and structured search. It is typically used to index and search large volumes of log data, but it can also search many other kinds of documents.
  • Kibana is a powerful data-visualization dashboard for Elasticsearch; it lets you browse Elasticsearch log data through a web UI.
  • Fluentd is a popular open-source data collector. We will install Fluentd on the Kubernetes cluster nodes to tail the container log files, filter and transform the log data, and deliver it to the Elasticsearch cluster, where it is indexed and stored.

We will first configure and start a scalable Elasticsearch cluster, then create a Kibana application in the Kubernetes cluster, and finally run Fluentd as a DaemonSet so that one Pod runs on every Kubernetes worker node.

1.1 Deploying Elasticsearch

First, create a namespace:
efk-namespace.yaml

apiVersion: v1
kind: Namespace
metadata:
  name: kube-ops

Next we deploy a 3-node Elasticsearch cluster. The key point is to set discovery.zen.minimum_master_nodes = N/2 + 1, where N is the number of master-eligible nodes in the Elasticsearch cluster. With 3 nodes here, N = 3, so the value should be 2 (3/2 + 1, rounded down). That way, if one node is temporarily disconnected from the cluster, the remaining two nodes can elect a new master and the cluster keeps serving while the last node tries to rejoin. Keep this parameter in mind whenever you scale the Elasticsearch cluster.

(1) Create a headless service for Elasticsearch
elasticsearch-svc.yaml

apiVersion: v1
kind: Service
metadata:
  name: elasticsearch
  namespace: kube-ops
  labels:
    app: elasticsearch
spec:
  selector:
    app: elasticsearch
  clusterIP: None
  ports:
  - name: rest
    port: 9200
  - name: inter-node
    port: 9300

We define it as a headless service because the Elasticsearch Pods will be created by a StatefulSet, which will be associated with this service. Port 9200 is the REST API port and 9300 is the inter-node communication port.

Now create this resource object.

# kubectl apply -f elasticsearch-svc.yaml
service/elasticsearch created
# kubectl get svc -n kube-ops
NAME            TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)             AGE
elasticsearch   ClusterIP   None         <none>        9200/TCP,9300/TCP   9s

(2) Create the Elasticsearch Pods with a StatefulSet.
cat elasticsearch-elasticsearch.yaml

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: es-cluster
  namespace: kube-ops
spec:
  serviceName: elasticsearch
  replicas: 3
  selector:
    matchLabels:
      app: elasticsearch
  template:
    metadata:
      labels:
        app: elasticsearch
    spec:
      containers:
      - name: elasticsearch
        image: docker.elastic.co/elasticsearch/elasticsearch-oss:6.4.3
        resources:
          limits:
            cpu: 1000m
          requests:
            cpu: 100m
        ports:
        - containerPort: 9200
          name: rest
          protocol: TCP
        - containerPort: 9300
          name: inter-node
          protocol: TCP
        volumeMounts:
        - name: data
          mountPath: /usr/share/elasticsearch/data
        env:
        - name: cluster.name
          value: k8s-logs
        - name: node.name
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: discovery.zen.ping.unicast.hosts
          value: "es-cluster-0.elasticsearch,es-cluster-1.elasticsearch,es-cluster-2.elasticsearch"
        - name: discovery.zen.minimum_master_nodes
          value: "2"
        - name: ES_JAVA_OPTS
          value: "-Xms512m -Xmx512m"
      initContainers:
      - name: fix-permissions
        image: busybox
        command: ["sh", "-c", "chown -R 1000:1000 /usr/share/elasticsearch/data"]
        securityContext:
          privileged: true
        volumeMounts:
        - name: data
          mountPath: /usr/share/elasticsearch/data
      - name: increase-vm-max-map
        image: busybox
        command: ["sysctl", "-w", "vm.max_map_count=262144"]
        securityContext:
          privileged: true
      - name: increase-fd-ulimit
        image: busybox
        command: ["sh", "-c", "ulimit -n 65536"]
        securityContext:
          privileged: true
  volumeClaimTemplates:
  - metadata:
      name: data
      labels:
        app: elasticsearch
    spec:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: es-data-db
      resources:
        requests:
          storage: 20Gi

Explanation:
The Pod above defines two kinds of containers: regular containers and init containers. There are 3 init containers, and they run before all the other containers start.

  • The fix-permissions container changes the owner and group of the Elasticsearch data directory to 1000:1000 (the UID of the elasticsearch user). By default Kubernetes mounts the data directory as root, which would make the directory inaccessible to Elasticsearch.
  • The increase-vm-max-map container raises the operating system's limit on mmap counts, which by default may be too low and cause out-of-memory errors.
  • The increase-fd-ulimit container runs ulimit to raise the maximum number of open file descriptors.

Among the regular containers we define one named elasticsearch and expose ports 9200 and 9300; note that the port names must match the ones defined in the Service above. We declare the data-persistence directory with a volumeMount, and the VolumeClaims are defined further down. Finally, these are the environment variables set in the container:

  • cluster.name: the name of the Elasticsearch cluster, here k8s-logs.
  • node.name: the node name, taken from metadata.name. It resolves to es-cluster-[0,1,2], depending on the ordinal assigned to the Pod.
  • discovery.zen.ping.unicast.hosts: configures how nodes in the Elasticsearch cluster discover each other. We use unicast discovery, which specifies a static list of hosts for the cluster. Thanks to the headless service configured earlier, each Pod has a unique DNS name of the form es-cluster-[0,1,2].elasticsearch.kube-ops.svc.cluster.local, so we set this variable accordingly. Because everything runs in the same namespace, it can be shortened to es-cluster-[0,1,2].elasticsearch.
  • discovery.zen.minimum_master_nodes: set to (N/2) + 1, where N is the number of master-eligible nodes in the cluster. We have 3 Elasticsearch nodes, so this value is 2 (rounded down to the nearest integer).
  • ES_JAVA_OPTS: set to -Xms512m -Xmx512m, telling the JVM to use a minimum and maximum heap of 512 MB. You should tune these values according to your cluster's resource availability and needs.

We also need to create a StorageClass, because the data must be persisted.
elasticsearch-storage.yaml

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: es-data-db
provisioner: rookieops/nfs

Note: since we use NFS for storage here, the provisioner above must match the one configured in our nfs-client-provisioner.

Then create the resources:

# kubectl apply -f elasticsearch-storage.yaml
# kubectl apply -f elasticsearch-elasticsearch.yaml
# kubectl get pod -n kube-ops
NAME                             READY   STATUS    RESTARTS   AGE
dingtalk-hook-8497494dc6-s6qkh   1/1     Running   0          16m
es-cluster-0                     1/1     Running   0          10m
es-cluster-1                     1/1     Running   0          10m
es-cluster-2                     1/1     Running   0          9m20s
# kubectl get pvc -n kube-ops
NAME                STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
data-es-cluster-0   Bound    pvc-9f15c0f8-60a8-485d-b650-91fb8f5f8076   10Gi       RWO            es-data-db     18m
data-es-cluster-1   Bound    pvc-503828ec-d98e-4e94-9f00-eaf6c05f3afd   10Gi       RWO            es-data-db     11m
data-es-cluster-2   Bound    pvc-3d2eb82e-396a-4eb0-bb4e-2dd4fba8600e   10Gi       RWO            es-data-db     10m
# kubectl get svc -n kube-ops
NAME            TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)             AGE
dingtalk-hook   ClusterIP   10.68.122.48   <none>        5000/TCP            18m
elasticsearch   ClusterIP   None           <none>        9200/TCP,9300/TCP   19m
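
Before moving on, you can optionally confirm that the headless service publishes one DNS record per Pod. A quick sanity check from a throwaway Pod (this assumes the busybox:1.28 image can be pulled; it is not part of the manifests above):

# kubectl run -n kube-ops dns-check --rm -it --restart=Never --image=busybox:1.28 -- nslookup elasticsearch

The lookup should return the three Pod IPs, and names such as es-cluster-0.elasticsearch resolve the same way.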

Test:

# kubectl port-forward es-cluster-0 9200:9200 --namespace=kube-ops
Forwarding from 127.0.0.1:9200 -> 9200
Forwarding from [::1]:9200 -> 9200
Handling connection for 9200

If you see a result like the following, the service is healthy:

# curl http://localhost:9200/_cluster/state?pretty
{
  "cluster_name" : "k8s-logs",
  "compressed_size_in_bytes" : 337,
  "cluster_uuid" : "nzc4y-eDSuSaYU1TigFAWw",
  "version" : 3,
  "state_uuid" : "6Mvd-WTPT0e7WMJV23Vdiw",
  "master_node" : "KRyMrbS0RXSfRkpS0ZaarQ",
  "blocks" : { },
  "nodes" : {
    "XGP4TrkrQ8KNMpH3pQlaEQ" : {
      "name" : "es-cluster-2",
      "ephemeral_id" : "f-R_IyfoSYGhY27FmA41Tg",
      "transport_address" : "172.20.1.104:9300",
      "attributes" : { }
    },
    "KRyMrbS0RXSfRkpS0ZaarQ" : {
      "name" : "es-cluster-0",
      "ephemeral_id" : "FpTnJTR8S3ysmoZlPPDnSg",
      "transport_address" : "172.20.1.102:9300",
      "attributes" : { }
    },
    "Xzjk2n3xQUutvbwx2h7f4g" : {
      "name" : "es-cluster-1",
      "ephemeral_id" : "FKjRuegwToe6Fz8vgPmSNw",
      "transport_address" : "172.20.1.103:9300",
      "attributes" : { }
    }
  },
  "metadata" : {
    "cluster_uuid" : "nzc4y-eDSuSaYU1TigFAWw",
    "templates" : { },
    "indices" : { },
    "index-graveyard" : {
      "tombstones" : [ ]
    }
  },
  "routing_table" : {
    "indices" : { }
  },
  "routing_nodes" : {
    "unassigned" : [ ],
    "nodes" : {
      "KRyMrbS0RXSfRkpS0ZaarQ" : [ ],
      "XGP4TrkrQ8KNMpH3pQlaEQ" : [ ],
      "Xzjk2n3xQUutvbwx2h7f4g" : [ ]
    }
  },
  "snapshots" : {
    "snapshots" : [ ]
  },
  "restore" : {
    "snapshots" : [ ]
  },
  "snapshot_deletions" : {
    "snapshot_deletions" : [ ]
  }
}
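
Besides the full cluster state, the health endpoint gives a quicker summary. A minimal check while the port-forward above is still running (for this 3-node setup you would expect status "green" and number_of_nodes 3):

# curl http://localhost:9200/_cluster/health?pretty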

1.2 Deploying Kibana

Create the Kibana manifest:
kibana.yaml

apiVersion: v1
kind: Service
metadata:
  name: kibana
  namespace: kube-ops
  labels:
    app: kibana
spec:
  ports:
  - port: 5601
  type: NodePort
  selector:
    app: kibana
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: kibana
  namespace: kube-ops
  labels:
    app: kibana
spec:
  selector:
    matchLabels:
      app: kibana
  template:
    metadata:
      labels:
        app: kibana
    spec:
      containers:
      - name: kibana
        image: docker.elastic.co/kibana/kibana-oss:6.4.3
        resources:
          limits:
            cpu: 1000m
          requests:
            cpu: 100m
        env:
        - name: ELASTICSEARCH_URL
          value: http://elasticsearch:9200
        ports:
        - containerPort: 5601

Apply the manifest:

# kubectl apply -f kibana.yaml
service/kibana created
deployment.apps/kibana created
# kubectl get svc -n kube-ops
NAME            TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)             AGE
dingtalk-hook   ClusterIP   10.68.122.48   <none>        5000/TCP            47m
elasticsearch   ClusterIP   None           <none>        9200/TCP,9300/TCP   48m
kibana          NodePort    10.68.221.60   <none>        5601:26575/TCP      7m29s
[root@ecs-5704-0003 storage]# kubectl get pod -n kube-ops
NAME                             READY   STATUS    RESTARTS   AGE
dingtalk-hook-8497494dc6-s6qkh   1/1     Running   0          47m
es-cluster-0                     1/1     Running   0          41m
es-cluster-1                     1/1     Running   0          41m
es-cluster-2                     1/1     Running   0          40m
kibana-7fc9f8c964-68xbh          1/1     Running   0          7m41s

Kibana can now be opened through the NodePort and connects to Elasticsearch normally.
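
As a quick check from the command line, you can read the NodePort from the Service and hit Kibana's status API (the node IP below is a placeholder for any reachable worker node; /api/status is the standard Kibana 6.x status endpoint):

# kubectl get svc kibana -n kube-ops -o jsonpath='{.spec.ports[0].nodePort}'
# curl http://<node-ip>:26575/api/status

A JSON response reporting an overall state of "green" means Kibana is up and can talk to Elasticsearch.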

1.3 Deploying Fluentd

Fluentd is an efficient log aggregator written in Ruby that scales well. For most organizations Fluentd is fast enough and relatively light on resources. Another tool, Fluent Bit, is even lighter and uses fewer resources, but its plugin ecosystem is not as rich as Fluentd's. Overall Fluentd is more mature and more widely used, so we use Fluentd as the log collector here as well.

1.3.1 How it works

Fluentd scrapes log data from a given set of sources, processes it (converting it into a structured data format), and forwards it to other services such as Elasticsearch, object storage, and so on. Fluentd supports more than 300 log storage and analysis services, so it is very flexible in this respect. The main steps are:

  • First, Fluentd fetches data from multiple log sources
  • It structures and tags that data
  • Then it routes the data to the matching target services according to the tags

(Figure: Fluentd collects logs from multiple sources, structures and tags them, and routes them to the configured targets.)

1.3.2 Configuration walkthrough

For example, to collect all container logs on the Kubernetes nodes, we need a log-source configuration like this:

<source>
  @id fluentd-containers.log
  @type tail
  path /var/log/containers/*.log
  pos_file /var/log/fluentd-containers.log.pos
  time_format %Y-%m-%dT%H:%M:%S.%NZ
  tag raw.kubernetes.*
  format json
  read_from_head true
</source>

Some of the parameters above:

  • id: a unique identifier for this log source, which can be used for further filtering and routing of the structured log data.
  • type: a built-in Fluentd directive; tail means Fluentd keeps reading new data from the last read position by tailing the file, while http would collect data via GET requests.
  • path: a tail-specific parameter telling Fluentd to collect all logs under /var/log/containers, the directory Docker uses on Kubernetes nodes to store the stdout logs of running containers.
  • pos_file: a checkpoint file; if the Fluentd process restarts, it resumes collection from the position recorded in this file.
  • tag: a custom string used to match log sources to targets or filters; Fluentd routes log data by matching source/target tags.

That was the log-source configuration; next, let's look at how the log data is routed and sent to Elasticsearch:

<match **>
  @id elasticsearch
  @type elasticsearch
  @log_level info
  include_tag_key true
  type_name fluentd
  host "#{ENV['OUTPUT_HOST']}"
  port "#{ENV['OUTPUT_PORT']}"
  logstash_format true
  <buffer>
    @type file
    path /var/log/fluentd-buffers/kubernetes.system.buffer
    flush_mode interval
    retry_type exponential_backoff
    flush_thread_count 2
    flush_interval 5s
    retry_forever
    retry_max_interval 30
    chunk_limit_size "#{ENV['OUTPUT_BUFFER_CHUNK_LIMIT']}"
    queue_limit_length "#{ENV['OUTPUT_BUFFER_QUEUE_LIMIT']}"
    overflow_action block
  </buffer>
</match>
  • match: identifies a target tag, followed by a pattern matching log sources. We want to capture all logs and send them to Elasticsearch, so it is set to **.
  • id: a unique identifier for the target.
  • type: the identifier of a supported output plugin. We output to Elasticsearch, so it is set to elasticsearch, the Fluentd Elasticsearch plugin.
  • log_level: the log level to capture; we set it to info, meaning logs at that level and above (INFO, WARNING, ERROR) are routed to Elasticsearch.
  • host/port: the Elasticsearch address; authentication can also be configured here. Our Elasticsearch does not require authentication, so host and port are enough.
  • logstash_format: Elasticsearch builds inverted indexes over the log data for searching. With logstash_format set to true, Fluentd forwards the structured log data in logstash format (logstash-YYYY.MM.DD indices).
  • Buffer: Fluentd can buffer data when the target is unavailable, for example when the network fails or Elasticsearch is down. Buffer configuration also helps reduce disk I/O.
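
The tag-based routing also makes it easy to drop records you do not want to ship at all. As an illustration only (this filter is not part of the ConfigMap below), a grep filter placed before the Elasticsearch match could exclude, say, health-check noise:

<filter kubernetes.**>
  @type grep
  <exclude>
    key log
    pattern /healthz/
  </exclude>
</filter>

Any record whose log field matches the pattern is discarded before it reaches the <match **> block.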

1.3.3 Installation

Define the Fluentd configuration file through a ConfigMap:
fluentd-config.yaml

kind: ConfigMap
apiVersion: v1
metadata:
  name: fluentd-config
  namespace: kube-ops
  labels:
    addonmanager.kubernetes.io/mode: Reconcile
data:
  system.conf: |-
    <system>
      root_dir /tmp/fluentd-buffers/
    </system>
  containers.input.conf: |-
    <source>
      @id fluentd-containers.log
      @type tail
      path /var/log/containers/*.log
      pos_file /var/log/es-containers.log.pos
      time_format %Y-%m-%dT%H:%M:%S.%NZ
      localtime
      tag raw.kubernetes.*
      format json
      read_from_head true
    </source>
    # Detect exceptions in the log output and forward them as one log entry.
    <match raw.kubernetes.**>
      @id raw.kubernetes
      @type detect_exceptions
      remove_tag_prefix raw
      message log
      stream stream
      multiline_flush_interval 5
      max_bytes 500000
      max_lines 1000
    </match>
  system.input.conf: |-
    # Logs from systemd-journal for interesting services.
    <source>
      @id journald-docker
      @type systemd
      filters [{ "_SYSTEMD_UNIT": "docker.service" }]
      <storage>
        @type local
        persistent true
      </storage>
      read_from_head true
      tag docker
    </source>
    <source>
      @id journald-kubelet
      @type systemd
      filters [{ "_SYSTEMD_UNIT": "kubelet.service" }]
      <storage>
        @type local
        persistent true
      </storage>
      read_from_head true
      tag kubelet
    </source>
  forward.input.conf: |-
    # Takes the messages sent over TCP
    <source>
      @type forward
    </source>
  output.conf: |-
    # Enriches records with Kubernetes metadata
    <filter kubernetes.**>
      @type kubernetes_metadata
    </filter>
    <match **>
      @id elasticsearch
      @type elasticsearch
      @log_level info
      include_tag_key true
      host elasticsearch
      port 9200
      logstash_format true
      request_timeout 30s
      <buffer>
        @type file
        path /var/log/fluentd-buffers/kubernetes.system.buffer
        flush_mode interval
        retry_type exponential_backoff
        flush_thread_count 2
        flush_interval 5s
        retry_forever
        retry_max_interval 30
        chunk_limit_size 2M
        queue_limit_length 8
        overflow_action block
      </buffer>
    </match>

In the configuration above we collect the Docker container log directory as well as the logs of the docker and kubelet systemd units; after processing, the data is sent to the elasticsearch:9200 service.

Create the ConfigMap:

# kubectl apply -f fluentd-config.yaml
configmap/fluentd-config created
# kubectl get cm -n kube-ops
NAME             DATA   AGE
fluentd-config   5      115s

Run Fluentd as a DaemonSet with the following manifest:
fluentd-daemonset.yaml

apiVersion: v1
kind: ServiceAccount
metadata:
  name: fluentd-es
  namespace: kube-ops
  labels:
    k8s-app: fluentd-es
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: fluentd-es
  labels:
    k8s-app: fluentd-es
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
rules:
- apiGroups:
  - ""
  resources:
  - "namespaces"
  - "pods"
  verbs:
  - "get"
  - "watch"
  - "list"
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: fluentd-es
  labels:
    k8s-app: fluentd-es
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
subjects:
- kind: ServiceAccount
  name: fluentd-es
  namespace: kube-ops
  apiGroup: ""
roleRef:
  kind: ClusterRole
  name: fluentd-es
  apiGroup: ""
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluentd-es
  namespace: kube-ops
  labels:
    k8s-app: fluentd-es
    version: v2.0.4
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
spec:
  selector:
    matchLabels:
      k8s-app: fluentd-es
      version: v2.0.4
  template:
    metadata:
      labels:
        k8s-app: fluentd-es
        kubernetes.io/cluster-service: "true"
        version: v2.0.4
      # This annotation ensures that fluentd does not get evicted if the node
      # supports critical pod annotation based priority scheme.
      # Note that this does not guarantee admission on the nodes (#40573).
      annotations:
        scheduler.alpha.kubernetes.io/critical-pod: ''
    spec:
      serviceAccountName: fluentd-es
      containers:
      - name: fluentd-es
        image: cnych/fluentd-elasticsearch:v2.0.4
        env:
        - name: FLUENTD_ARGS
          value: --no-supervisor -q
        resources:
          limits:
            memory: 500Mi
          requests:
            cpu: 100m
            memory: 200Mi
        volumeMounts:
        - name: varlog
          mountPath: /var/log
        - name: varlibdockercontainers
          mountPath: /var/lib/docker/containers
          readOnly: true
        - name: config-volume
          mountPath: /etc/fluent/config.d
      nodeSelector:
        beta.kubernetes.io/fluentd-ds-ready: "true"
      tolerations:
      - key: node-role.kubernetes.io/master
        operator: Exists
        effect: NoSchedule
      terminationGracePeriodSeconds: 30
      volumes:
      - name: varlog
        hostPath:
          path: /var/log
      - name: varlibdockercontainers
        hostPath:
          path: /var/lib/docker/containers
      - name: config-volume
        configMap:
          name: fluentd-config

Because we defined a nodeSelector above:

nodeSelector:
  beta.kubernetes.io/fluentd-ds-ready: "true"

we first have to label the nodes:

# kubectl label nodes 172.16.0.33 beta.kubernetes.io/fluentd-ds-ready=true
node/172.16.0.33 labeled
# kubectl label nodes 172.16.0.52 beta.kubernetes.io/fluentd-ds-ready=true
node/172.16.0.52 labeled
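
A quick way to confirm the labels took effect (the DaemonSet will only schedule onto the nodes listed here):

# kubectl get nodes -l beta.kubernetes.io/fluentd-ds-ready=true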

Then apply the manifest and check the result:

# kubectl apply -f fluentd-daemonset.yaml
# kubectl get ds -n kube-ops
NAME         DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR                              AGE
fluentd-es   2         2         2       2            2           beta.kubernetes.io/fluentd-ds-ready=true   13h
# kubectl get pod -n kube-ops
NAME                             READY   STATUS    RESTARTS   AGE
dingtalk-hook-8497494dc6-s6qkh   1/1     Running   0          14h
es-cluster-0                     1/1     Running   0          14h
es-cluster-1                     1/1     Running   0          14h
es-cluster-2                     1/1     Running   0          14h
fluentd-es-h4lsp                 1/1     Running   0          26s
fluentd-es-ktx69                 1/1     Running   0          34s

First check the indices in Elasticsearch:

kubectl port-forward es-cluster-0 9200:9200 --namespace=kube-ops
# curl -XGET 'localhost:9200/_cat/indices?v&pretty'
health status index               uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   logstash-2020.06.23 HR62innTQi6HjObIzf6DHw   5   1         99            0      295kb        147.5kb
green  open   logstash-2020.06.22 8-IFAOj_SqiipqOXN6Soxw   5   1       6614            0      7.9mb          3.6mb
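
To confirm real documents are flowing in, you can pull a single record out of the logstash indices (the fields you see depend on your own workloads, so treat the output as illustrative):

# curl -s 'localhost:9200/logstash-*/_search?size=1&pretty'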

Then add the index pattern in Kibana.


To view the logs of a particular Pod, just add a filter condition on the Pod name.

1.4 Deploying log-pilot

log-pilot is a log-collection tool open-sourced by Alibaba. Its strength is that a single DaemonSet can collect not only the containers' stdout logs but also log files inside the containers; see the project documentation for more details. One drawback is that it does not yet support Filebeat 7.

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: log-pilot
  labels:
    app: log-pilot
  namespace: kube-ops
spec:
  selector:
    matchLabels:
      app: log-pilot
  updateStrategy:
    type: RollingUpdate
  template:
    metadata:
      labels:
        app: log-pilot
      annotations:
        scheduler.alpha.kubernetes.io/critical-pod: ''
    spec:
      tolerations:
      - key: node-role.kubernetes.io/master
        effect: NoSchedule
      containers:
      - name: log-pilot
        image: registry.cn-hangzhou.aliyuncs.com/acs/log-pilot:0.9.7-filebeat
        resources:
          limits:
            memory: 500Mi
          requests:
            cpu: 200m
            memory: 200Mi
        env:
        - name: "NODE_NAME"
          valueFrom:
            fieldRef:
              fieldPath: spec.nodeName
        # Log collection prefix
        - name: PILOT_LOG_PREFIX
          value: aliyun
        - name: "LOGGING_OUTPUT"
          value: "elasticsearch"
        # Make sure the cluster can reach ES over the network
        - name: "ELASTICSEARCH_HOSTS"
          value: "elasticsearch:9200"
        # ES credentials, if access control is enabled
        #- name: "ELASTICSEARCH_USER"
        #  value: "{es_username}"
        #- name: "ELASTICSEARCH_PASSWORD"
        #  value: "{es_password}"
        volumeMounts:
        - name: sock
          mountPath: /var/run/docker.sock
        - name: root
          mountPath: /host
          readOnly: true
        - name: varlib
          mountPath: /var/lib/filebeat
        - name: varlog
          mountPath: /var/log/filebeat
        - name: localtime
          mountPath: /etc/localtime
          readOnly: true
        livenessProbe:
          failureThreshold: 3
          exec:
            command:
            - /pilot/healthz
          initialDelaySeconds: 10
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 2
        securityContext:
          capabilities:
            add:
            - SYS_ADMIN
      terminationGracePeriodSeconds: 30
      volumes:
      - name: sock
        hostPath:
          path: /var/run/docker.sock
      - name: root
        hostPath:
          path: /
      - name: varlib
        hostPath:
          path: /var/lib/filebeat
          type: DirectoryOrCreate
      - name: varlog
        hostPath:
          path: /var/log/filebeat
          type: DirectoryOrCreate
      - name: localtime
        hostPath:
          path: /etc/localtime

Create a test Pod:

apiVersion: v1
kind: Pod
metadata:
  name: tomcat
spec:
  containers:
  - name: tomcat
    image: "tomcat:8.0"
    env:
    # 1. "stdout" is a reserved keyword meaning collect the standard output
    # 2. Collect the stdout logs into the "catalina" index in ES
    - name: aliyun_logs_catalina
      value: "stdout"
    # 1. Collect a log file inside the container; wildcards are supported
    # 2. Collect these logs into the "access" index in ES
    - name: aliyun_logs_access
      value: "/usr/local/tomcat/logs/catalina.*.log"
    # In-container file logs require an emptyDir volume on the log path
    volumeMounts:
    - name: tomcat-log
      mountPath: /usr/local/tomcat/logs
  volumes:
  - name: tomcat-log
    emptyDir: {}

When we list the indices again, we can see access- and catalina- indices:

# curl -XGET 'localhost:9200/_cat/indices?v&pretty'
health status index               uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   access-2020.06.23   0LS6STfpQ4yHt7makuSI1g   5   1         40            0    205.5kb        102.5kb
green  open   logstash-2020.06.23 HR62innTQi6HjObIzf6DHw   5   1         99            0      296kb          148kb
green  open   catalina-2020.06.23 dSFGcZlPS6-wieFKrOWV-g   5   1         40            0    227.1kb        133.3kb
green  open   .kibana             H-TAto8QTxmi-jI_4mIUrg   1   1          2            0     20.4kb         10.2kb
green  open   logstash-2020.06.22 8-IFAOj_SqiipqOXN6Soxw   5   1      43784            0     30.6mb         15.3mb

Then just add the index patterns in the Kibana UI.

2. EFK + Kafka

2.1 Deploying Kafka

2.1.1 Setting up ZooKeeper

Key points:

  • the Service must be a headless service
  • each Pod needs its own unique election ID (myid); see the sketch right after this list
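
The myid is not set by hand: the image's zkGenConfig.sh derives it from the StatefulSet ordinal in the Pod's hostname. Roughly, the idea is the following shell sketch (an assumption about how the script works, since myid must be an integer >= 1; the actual script in the image may differ in detail):

# executed inside each Pod before the server starts
ORD=${HOSTNAME##*-}                                 # zk-0 -> 0, zk-1 -> 1, zk-2 -> 2
echo $((ORD + 1)) > /var/lib/zookeeper/data/myid    # path illustrative; one unique ID per replica

Because the StatefulSet gives every replica a stable hostname, the ID stays the same across restarts.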

The manifest is as follows:
zookeeper.yaml

---
apiVersion: v1
kind: Service
metadata:
  name: zk-svc
  namespace: kube-ops
  labels:
    app: zk-svc
spec:
  ports:
  - port: 2888
    name: server
  - port: 3888
    name: leader-election
  clusterIP: None
  selector:
    app: zk
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: zk-cm
  namespace: kube-ops
data:
  jvm.heap: "1G"
  tick: "2000"
  init: "10"
  sync: "5"
  client.cnxns: "60"
  snap.retain: "3"
  purge.interval: "0"
---
apiVersion: policy/v1beta1
kind: PodDisruptionBudget
metadata:
  name: zk-pdb
  namespace: kube-ops
spec:
  selector:
    matchLabels:
      app: zk
  minAvailable: 2
---
apiVersion: apps/v1beta1
kind: StatefulSet
metadata:
  name: zk
  namespace: kube-ops
spec:
  serviceName: zk-svc
  replicas: 3
  template:
    metadata:
      labels:
        app: zk
    spec:
      #affinity:
      #  podAntiAffinity:
      #    #requiredDuringSchedulingIgnoredDuringExecution:
      #    preferredDuringSchedulingIgnoredDuringExecution:
      #    # cpu: "500m"
      #    - labelSelector:
      #        matchExpressions:
      #        - key: "app"
      #          operator: In
      #          values:
      #          - zk
      #      topologyKey: "kubernetes.io/hostname"
      containers:
      - name: k8szk
        imagePullPolicy: Always
        image: registry.cn-hangzhou.aliyuncs.com/rookieops/zookeeper:3.4.10
        resources:
          requests:
            memory: "2Gi"
            cpu: "500m"
        ports:
        - containerPort: 2181
          name: client
        - containerPort: 2888
          name: server
        - containerPort: 3888
          name: leader-election
        env:
        - name: ZK_REPLICAS
          value: "3"
        - name: ZK_HEAP_SIZE
          valueFrom:
            configMapKeyRef:
              name: zk-cm
              key: jvm.heap
        - name: ZK_TICK_TIME
          valueFrom:
            configMapKeyRef:
              name: zk-cm
              key: tick
        - name: ZK_INIT_LIMIT
          valueFrom:
            configMapKeyRef:
              name: zk-cm
              key: init
        - name: ZK_SYNC_LIMIT
          valueFrom:
            configMapKeyRef:
              name: zk-cm
              key: tick
        - name: ZK_MAX_CLIENT_CNXNS
          valueFrom:
            configMapKeyRef:
              name: zk-cm
              key: client.cnxns
        - name: ZK_SNAP_RETAIN_COUNT
          valueFrom:
            configMapKeyRef:
              name: zk-cm
              key: snap.retain
        - name: ZK_PURGE_INTERVAL
          valueFrom:
            configMapKeyRef:
              name: zk-cm
              key: purge.interval
        - name: ZK_CLIENT_PORT
          value: "2181"
        - name: ZK_SERVER_PORT
          value: "2888"
        - name: ZK_ELECTION_PORT
          value: "3888"
        command:
        - sh
        - -c
        - zkGenConfig.sh && zkServer.sh start-foreground
        readinessProbe:
          exec:
            command:
            - "zkOk.sh"
          initialDelaySeconds: 10
          timeoutSeconds: 5
        livenessProbe:
          exec:
            command:
            - "zkOk.sh"
          initialDelaySeconds: 10
          timeoutSeconds: 5
        volumeMounts:
        - name: datadir
          mountPath: /var/lib/zookeeper
      # securityContext:
      #   runAsUser: 1000
      #   fsGroup: 1000
  volumeClaimTemplates:
  - metadata:
      name: datadir
    spec:
      accessModes: ["ReadWriteOnce"]
      storageClassName: zk-data-db
      resources:
        requests:
          storage: 1Gi

Create the StorageClass:
zookeeper-storage.yaml

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: zk-data-db
provisioner: rookieops/nfs

Then apply the manifests:

# kubectl apply -f zookeeper-storage.yaml
# kubectl apply -f zookeeper.yaml
# kubectl get pod -n kube-ops
NAME   READY   STATUS    RESTARTS   AGE
zk-0   1/1     Running   0          12m
zk-1   1/1     Running   0          12m
zk-2   1/1     Running   0          11m

Then check the cluster status:

# for i in 0 1 2; do kubectl exec -n kube-ops zk-$i zkServer.sh status; done
ZooKeeper JMX enabled by default
Using config: /usr/bin/../etc/zookeeper/zoo.cfg
Mode: follower
ZooKeeper JMX enabled by default
Using config: /usr/bin/../etc/zookeeper/zoo.cfg
Mode: follower
ZooKeeper JMX enabled by default
Using config: /usr/bin/../etc/zookeeper/zoo.cfg
Mode: leader
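
You can also do a quick functional test by writing a znode on one member and reading it back from another (this assumes zkCli.sh is on the PATH inside the image, just like zkServer.sh above):

# kubectl exec -n kube-ops zk-0 -- zkCli.sh create /test-key hello
# kubectl exec -n kube-ops zk-2 -- zkCli.sh get /test-key

If the value written on zk-0 comes back from zk-2, replication across the ensemble is working.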

2.1.2 Setting up Kafka

Dockerfile:

FROM centos:centos7
LABEL "auth"="rookieops" \
      "mail"="rookieops@163.com"
ENV TIME_ZONE Asia/Shanghai
# install JAVA
ADD jdk-8u131-linux-x64.tar.gz /opt/
ENV JAVA_HOME /opt/jdk1.8.0_131
ENV PATH ${JAVA_HOME}/bin:${PATH}
# install kafka
ADD kafka_2.11-2.3.1.tgz /opt/
RUN mv /opt/kafka_2.11-2.3.1 /opt/kafka
WORKDIR /opt/kafka
EXPOSE 9092
CMD ["./bin/kafka-server-start.sh", "config/server.properties"]
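
Building and pushing the image is the usual two commands; the tag matches the one referenced by kafka.yaml below, and this assumes jdk-8u131-linux-x64.tar.gz and kafka_2.11-2.3.1.tgz sit next to the Dockerfile in the build context:

# docker build -t registry.cn-hangzhou.aliyuncs.com/rookieops/kafka:2.3.1-beta .
# docker push registry.cn-hangzhou.aliyuncs.com/rookieops/kafka:2.3.1-beta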

Create the StorageClass:
kafka-storage.yaml

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: kafka-data-db
provisioner: rookieops/nfs

Create the Kafka manifest:
kafka.yaml

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: kafka
  namespace: kube-ops
spec:
  serviceName: kafka-svc
  replicas: 3
  selector:
    matchLabels:
      app: kafka
  template:
    metadata:
      labels:
        app: kafka
    spec:
      affinity:
        podAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 1
            podAffinityTerm:
              labelSelector:
                matchExpressions:
                - key: "app"
                  operator: In
                  values:
                  - zk
              topologyKey: "kubernetes.io/hostname"
      terminationGracePeriodSeconds: 300
      containers:
      - name: kafka
        image: registry.cn-hangzhou.aliyuncs.com/rookieops/kafka:2.3.1-beta
        imagePullPolicy: Always
        resources:
          requests:
            cpu: 500m
            memory: 1Gi
          limits:
            cpu: 500m
            memory: 1Gi
        command:
        - "/bin/sh"
        - "-c"
        - "./bin/kafka-server-start.sh config/server.properties --override broker.id=${HOSTNAME##*-}"
        ports:
        - name: server
          containerPort: 9092
        volumeMounts:
        - name: config
          mountPath: /opt/kafka/config/server.properties
          subPath: server.properties
        - name: data
          mountPath: /data/kafka/logs
        # readinessProbe:
        #   exec:
        #     command:
        #     - "/bin/sh"
        #     - "-c"
        #     - "/opt/kafka/bin/kafka-broker-api-versions.sh --bootstrap-server=localhost:9092"
      volumes:
      - name: config
        configMap:
          name: kafka-config
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: kafka-data-db
      resources:
        requests:
          storage: 10Gi

Create the Kafka headless service:
kafka-svc.yaml

apiVersion: v1
kind: Service
metadata:
  name: kafka-svc
  namespace: kube-ops
  labels:
    app: kafka
spec:
  selector:
    app: kafka
  clusterIP: None
  ports:
  - name: server
    port: 9092

Create the Kafka ConfigMap:
kafka-config.yaml

apiVersion: v1
kind: ConfigMap
metadata:
  name: kafka-config
  namespace: kube-ops
data:
  server.properties: |
    broker.id=${HOSTNAME##*-}
    listeners=PLAINTEXT://:9092
    num.network.threads=3
    num.io.threads=8
    socket.send.buffer.bytes=102400
    socket.receive.buffer.bytes=102400
    socket.request.max.bytes=104857600
    log.dirs=/data/kafka/logs
    num.partitions=1
    num.recovery.threads.per.data.dir=1
    offsets.topic.replication.factor=1
    transaction.state.log.replication.factor=1
    transaction.state.log.min.isr=1
    log.retention.hours=168
    log.segment.bytes=1073741824
    log.retention.check.interval.ms=300000
    zookeeper.connect=zk-0.zk-svc.kube-ops.svc.cluster.local:2181,zk-1.zk-svc.kube-ops.svc.cluster.local:2181,zk-2.zk-svc.kube-ops.svc.cluster.local:2181
    zookeeper.connection.timeout.ms=6000
    group.initial.rebalance.delay.ms=0

Apply the manifests:

# kubectl apply -f kafka-storage.yaml
# kubectl apply -f kafka-svc.yaml
# kubectl apply -f kafka-config.yaml
# kubectl apply -f kafka.yaml
# kubectl get pod -n kube-ops
NAME      READY   STATUS    RESTARTS   AGE
kafka-0   1/1     Running   0          13m
kafka-1   1/1     Running   0          13m
kafka-2   1/1     Running   0          10m
zk-0      1/1     Running   0          77m
zk-1      1/1     Running   0          77m
zk-2      1/1     Running   0          76m

Test:

# kubectl exec -it -n kube-ops kafka-0 -- /bin/bash
$ cd /opt/kafka
$ ./bin/kafka-topics.sh --create --topic test --zookeeper zk-0.zk-svc.kube-ops.svc.cluster.local:2181,zk-1.zk-svc.kube-ops.svc.cluster.local:2181,zk-2.zk-svc.kube-ops.svc.cluster.local:2181 --partitions 3 --replication-factor 2
Created topic "test".
# consume
$ ./bin/kafka-console-consumer.sh --topic test --bootstrap-server localhost:9092

Then open a shell in another container to act as the producer:

# kubectl exec -it -n kube-ops kafka-1 -- /bin/bash
$ cd /opt/kafka
$ ./bin/kafka-console-producer.sh --topic test --broker-list localhost:9092
hello
nihao

You can see the messages arrive on the consumer side:

$ ./bin/kafka-console-consumer.sh --topic test --bootstrap-server localhost:9092
hello
nihao
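
To see how the partitions and replicas of the topic were placed across the brokers, kafka-topics.sh can describe it (run from any of the Kafka Pods):

$ ./bin/kafka-topics.sh --describe --topic test --zookeeper zk-0.zk-svc.kube-ops.svc.cluster.local:2181

Each of the 3 partitions should list a leader plus two replica broker ids, matching the --partitions 3 --replication-factor 2 used above.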

At this point the Kafka cluster is up and running, and you can see the data waiting to be consumed.

2.2 Deploying Logstash

The logs are now being sent to Kafka; next we use Logstash to consume them from Kafka and forward them to Elasticsearch.
Here I simply use a Deployment.

Define the ConfigMap:
logstash-config.yaml

apiVersion: v1
kind: ConfigMap
metadata:
  name: logstash-k8s-config
  namespace: kube-ops
data:
  containers.conf: |
    input {
      kafka {
        codec => "json"
        topics => ["test"]
        bootstrap_servers => ["kafka-0.kafka-svc.kube-ops:9092, kafka-1.kafka-svc.kube-ops:9092, kafka-2.kafka-svc.kube-ops:9092"]
        group_id => "logstash-g1"
      }
    }
    output {
      elasticsearch {
        hosts => ["es-cluster-0.elasticsearch.kube-ops:9200", "es-cluster-1.elasticsearch.kube-ops:9200", "es-cluster-2.elasticsearch.kube-ops:9200"]
        index => "logstash-%{+YYYY.MM.dd}"
      }
    }
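
While wiring this up, it can help to see what Logstash actually receives from Kafka before worrying about Elasticsearch. A temporary variant of the output section (for debugging only, not required for the final setup) just adds a stdout block so events also show up in kubectl logs:

output {
  stdout { codec => rubydebug }
  elasticsearch {
    hosts => ["es-cluster-0.elasticsearch.kube-ops:9200", "es-cluster-1.elasticsearch.kube-ops:9200", "es-cluster-2.elasticsearch.kube-ops:9200"]
    index => "logstash-%{+YYYY.MM.dd}"
  }
}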

Define the Deployment:
logstash.yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  name: logstash
  namespace: kube-ops
spec:
  replicas: 1
  selector:
    matchLabels:
      app: logstash
  template:
    metadata:
      labels:
        app: logstash
    spec:
      containers:
      - name: logstash
        image: registry.cn-hangzhou.aliyuncs.com/rookieops/logstash-kubernetes:7.1.1
        volumeMounts:
        - name: config
          mountPath: /opt/logstash/config/containers.conf
          subPath: containers.conf
        command:
        - "/bin/sh"
        - "-c"
        - "/opt/logstash/bin/logstash -f /opt/logstash/config/containers.conf"
      volumes:
      - name: config
        configMap:
          name: logstash-k8s-config

Then apply the configuration:

# kubectl apply -f logstash-config.yaml
# kubectl apply -f logstash.yaml

Then watch the status and check the logs:

# kubectl get pod -n kube-ops
NAME                             READY   STATUS    RESTARTS   AGE
dingtalk-hook-856c5dbbc9-srcm6   1/1     Running   0          3d20h
es-cluster-0                     1/1     Running   0          22m
es-cluster-1                     1/1     Running   0          22m
es-cluster-2                     1/1     Running   0          22m
fluentd-es-jvhqv                 1/1     Running   0          179m
fluentd-es-s7v6m                 1/1     Running   0          179m
kafka-0                          1/1     Running   0          3h6m
kafka-1                          1/1     Running   0          3h6m
kafka-2                          1/1     Running   0          3h6m
kibana-7fc9f8c964-dqr68          1/1     Running   0          5d2h
logstash-678c945764-lkl2n        1/1     Running   0          10m
zk-0                             1/1     Running   0          3d21h
zk-1                             1/1     Running   0          3d21h
zk-2                             1/1     Running   0          3d21h


2.3 Deploying Elasticsearch

(The Elasticsearch cluster deployed in section 1.1 is reused here.)

2.4 Deploying Kibana

(The Kibana deployment from section 1.2 is reused here.)


2.5 Deploying Fluentd

First we need to install the fluent-plugin-kafka plugin, because the stock fluentd image does not ship with it.
Installation steps:
(1) Start a container from the image with docker

# docker run -it registry.cn-hangzhou.aliyuncs.com/rookieops/fluentd-elasticsearch:v2.0.4 /bin/bash
$ gem install fluent-plugin-kafka --no-document

(2) Exit the container and commit it as a new image:

# docker commit c29b250d8df9 registry.cn-hangzhou.aliyuncs.com/rookieops/fluentd-elasticsearch:v2.0.4

(3) Push the image with the plugin installed to the registry:

# docker push registry.cn-hangzhou.aliyuncs.com/rookieops/fluentd-elasticsearch:v2.0.4
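
If you prefer a reproducible build over docker commit, the same result can be achieved with a two-line Dockerfile (an equivalent alternative to the steps above, not what was done here):

FROM registry.cn-hangzhou.aliyuncs.com/rookieops/fluentd-elasticsearch:v2.0.4
RUN gem install fluent-plugin-kafka --no-document

Build it under a new tag with docker build and reference that tag in the DaemonSet.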

Configure the Fluentd ConfigMap:
fluentd-config.yaml

kind: ConfigMap
apiVersion: v1
metadata:
  name: fluentd-config
  namespace: kube-ops
  labels:
    addonmanager.kubernetes.io/mode: Reconcile
data:
  system.conf: |-
    <system>
      root_dir /tmp/fluentd-buffers/
    </system>
  containers.input.conf: |-
    <source>
      @id fluentd-containers.log
      @type tail
      path /var/log/containers/*.log
      pos_file /var/log/es-containers.log.pos
      time_format %Y-%m-%dT%H:%M:%S.%NZ
      localtime
      tag raw.kubernetes.*
      format json
      read_from_head true
    </source>
    # Detect exceptions in the log output and forward them as one log entry.
    <match raw.kubernetes.**>
      @id raw.kubernetes
      @type detect_exceptions
      remove_tag_prefix raw
      message log
      stream stream
      multiline_flush_interval 5
      max_bytes 500000
      max_lines 1000
    </match>
  system.input.conf: |-
    # Logs from systemd-journal for interesting services.
    <source>
      @id journald-docker
      @type systemd
      filters [{ "_SYSTEMD_UNIT": "docker.service" }]
      <storage>
        @type local
        persistent true
      </storage>
      read_from_head true
      tag docker
    </source>
    <source>
      @id journald-kubelet
      @type systemd
      filters [{ "_SYSTEMD_UNIT": "kubelet.service" }]
      <storage>
        @type local
        persistent true
      </storage>
      read_from_head true
      tag kubelet
    </source>
  forward.input.conf: |-
    # Takes the messages sent over TCP
    <source>
      @type forward
    </source>
  output.conf: |-
    # Enriches records with Kubernetes metadata
    <filter kubernetes.**>
      @type kubernetes_metadata
    </filter>
    <match **>
      @id kafka
      @type kafka2
      @log_level info
      include_tag_key true
      brokers kafka-0.kafka-svc.kube-ops:9092,kafka-1.kafka-svc.kube-ops:9092,kafka-2.kafka-svc.kube-ops:9092
      logstash_format true
      request_timeout 30s
      <buffer>
        @type file
        path /var/log/fluentd-buffers/kubernetes.system.buffer
        flush_mode interval
        retry_type exponential_backoff
        flush_thread_count 2
        flush_interval 5s
        retry_forever
        retry_max_interval 30
        chunk_limit_size 2M
        queue_limit_length 8
        overflow_action block
      </buffer>
      # data type settings
      <format>
        @type json
      </format>
      # topic settings
      topic_key topic
      default_topic test
      # producer settings
      required_acks -1
      compression_codec gzip
    </match>

The Fluentd DaemonSet manifest:
fluentd-daemonset.yaml

apiVersion: v1
kind: ServiceAccount
metadata:
  name: fluentd-es
  namespace: kube-ops
  labels:
    k8s-app: fluentd-es
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: fluentd-es
  labels:
    k8s-app: fluentd-es
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
rules:
- apiGroups:
  - ""
  resources:
  - "namespaces"
  - "pods"
  verbs:
  - "get"
  - "watch"
  - "list"
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: fluentd-es
  labels:
    k8s-app: fluentd-es
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
subjects:
- kind: ServiceAccount
  name: fluentd-es
  namespace: kube-ops
  apiGroup: ""
roleRef:
  kind: ClusterRole
  name: fluentd-es
  apiGroup: ""
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluentd-es
  namespace: kube-ops
  labels:
    k8s-app: fluentd-es
    version: v2.0.4
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
spec:
  selector:
    matchLabels:
      k8s-app: fluentd-es
      version: v2.0.4
  template:
    metadata:
      labels:
        k8s-app: fluentd-es
        kubernetes.io/cluster-service: "true"
        version: v2.0.4
      # This annotation ensures that fluentd does not get evicted if the node
      # supports critical pod annotation based priority scheme.
      # Note that this does not guarantee admission on the nodes (#40573).
      annotations:
        scheduler.alpha.kubernetes.io/critical-pod: ''
    spec:
      serviceAccountName: fluentd-es
      containers:
      - name: fluentd-es
        image: registry.cn-hangzhou.aliyuncs.com/rookieops/fluentd-elasticsearch:v2.0.4
        command:
        - "/bin/sh"
        - "-c"
        - "/run.sh $FLUENTD_ARGS"
        env:
        - name: FLUENTD_ARGS
          value: --no-supervisor -q
        resources:
          limits:
            memory: 500Mi
          requests:
            cpu: 100m
            memory: 200Mi
        volumeMounts:
        - name: varlog
          mountPath: /var/log
        - name: varlibdockercontainers
          mountPath: /var/lib/docker/containers
          readOnly: true
        - name: config-volume
          mountPath: /etc/fluent/config.d
      nodeSelector:
        beta.kubernetes.io/fluentd-ds-ready: "true"
      tolerations:
      - key: node-role.kubernetes.io/master
        operator: Exists
        effect: NoSchedule
      terminationGracePeriodSeconds: 30
      volumes:
      - name: varlog
        hostPath:
          path: /var/log
      - name: varlibdockercontainers
        hostPath:
          path: /var/lib/docker/containers
      - name: config-volume
        configMap:
          name: fluentd-config

Apply the manifests:

# kubectl apply -f fluentd-daemonset.yaml
# kubectl apply -f fluentd-config.yaml
# kubectl get pod -n kube-ops
NAME                             READY   STATUS    RESTARTS   AGE
dingtalk-hook-856c5dbbc9-srcm6   1/1     Running   0          3d17h
fluentd-es-jvhqv                 1/1     Running   0          19m
fluentd-es-s7v6m                 1/1     Running   0          19m
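
To verify that the new Fluentd Pods are actually producing into Kafka, you can consume a few messages from the test topic directly (reusing the console consumer from section 2.1; --max-messages just makes it exit after a handful of records):

# kubectl exec -it -n kube-ops kafka-0 -- /opt/kafka/bin/kafka-console-consumer.sh --topic test --bootstrap-server localhost:9092 --max-messages 5

If JSON log records scroll by, the Fluentd -> Kafka leg is working, and Logstash (section 2.2) will pick them up from the same topic.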