Prerequisites

Since I had already deployed elasticsearch:6.8.6 and kibana:6.8.6 on the test server 148, only Fluentd needs to be deployed here. My Fluentd runs on Kubernetes; once stability testing passes, I will migrate the docker-compose-deployed ES and Kibana onto k8s as well. Below I walk through installing Fluentd for log collection, along with the pitfalls I hit myself.

Download the configuration files

The corresponding k8s config files can be fetched from the official download address: link. I list them here as a backup. Note that you first need to create the corresponding namespace: logging.
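Assuming kubectl is already pointed at the target cluster, the namespace can be created up front and verified (a minimal sketch of that first step):

```shell
# Create the namespace that all the fluentd-es objects below live in.
kubectl create namespace logging

# Confirm it exists before applying the manifests.
kubectl get namespace logging
```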

ConfigMap

```yaml
kind: ConfigMap
apiVersion: v1
metadata:
  name: fluentd-es-config-v0.2.1
  namespace: logging
  labels:
    addonmanager.kubernetes.io/mode: Reconcile
data:
  system.conf: |-
    <system>
      root_dir /tmp/fluentd-buffers/
    </system>
  containers.input.conf: |-
    <source>
      @id fluentd-containers.log
      @type tail
      path /var/log/containers/*.log
      pos_file /var/log/es-containers.log.pos
      tag raw.kubernetes.*
      read_from_head true
      <parse>
        @type multi_format
        <pattern>
          format json
          time_key time
          time_format %Y-%m-%dT%H:%M:%S.%NZ
        </pattern>
        <pattern>
          format /^(?<time>.+) (?<stream>stdout|stderr) [^ ]* (?<log>.*)$/
          time_format %Y-%m-%dT%H:%M:%S.%N%:z
        </pattern>
      </parse>
    </source>

    # Detect exceptions in the log output and forward them as one log entry.
    <match raw.kubernetes.**>
      @id raw.kubernetes
      @type detect_exceptions
      remove_tag_prefix raw
      message log
      stream stream
      multiline_flush_interval 5
      max_bytes 500000
      max_lines 1000
    </match>

    # Concatenate multi-line logs
    <filter **>
      @id filter_concat
      @type concat
      key message
      multiline_end_regexp /\n$/
      separator ""
    </filter>

    # Enriches records with Kubernetes metadata
    <filter kubernetes.**>
      @id filter_kubernetes_metadata
      @type kubernetes_metadata
    </filter>

    # Fixes json fields in Elasticsearch
    <filter kubernetes.**>
      @id filter_parser
      @type parser
      key_name log
      reserve_data true
      remove_key_name_field true
      <parse>
        @type multi_format
        <pattern>
          format json
        </pattern>
        <pattern>
          format none
        </pattern>
      </parse>
    </filter>
  system.input.conf: |-
    <source>
      @id minion
      @type tail
      format /^(?<time>[^ ]* [^ ,]*)[^\[]*\[[^\]]*\]\[(?<severity>[^ \]]*) *\] (?<message>.*)$/
      time_format %Y-%m-%d %H:%M:%S
      path /var/log/salt/minion
      pos_file /var/log/salt.pos
      tag salt
    </source>

    <source>
      @id startupscript.log
      @type tail
      format syslog
      path /var/log/startupscript.log
      pos_file /var/log/es-startupscript.log.pos
      tag startupscript
    </source>

    <source>
      @id docker.log
      @type tail
      format /^time="(?<time>[^"]*)" level=(?<severity>[^ ]*) msg="(?<message>[^"]*)"( err="(?<error>[^"]*)")?( statusCode=(?<status_code>\d+))?/
      path /var/log/docker.log
      pos_file /var/log/es-docker.log.pos
      tag docker
    </source>

    <source>
      @id etcd.log
      @type tail
      # Not parsing this, because it doesn't have anything particularly useful to
      # parse out of it (like severities).
      format none
      path /var/log/etcd.log
      pos_file /var/log/es-etcd.log.pos
      tag etcd
    </source>

    <source>
      @id kubelet.log
      @type tail
      format multiline
      multiline_flush_interval 5s
      format_firstline /^\w\d{4}/
      format1 /^(?<severity>\w)(?<time>\d{4} [^\s]*)\s+(?<pid>\d+)\s+(?<source>[^ \]]+)\] (?<message>.*)/
      time_format %m%d %H:%M:%S.%N
      path /var/log/kubelet.log
      pos_file /var/log/es-kubelet.log.pos
      tag kubelet
    </source>

    <source>
      @id kube-proxy.log
      @type tail
      format multiline
      multiline_flush_interval 5s
      format_firstline /^\w\d{4}/
      format1 /^(?<severity>\w)(?<time>\d{4} [^\s]*)\s+(?<pid>\d+)\s+(?<source>[^ \]]+)\] (?<message>.*)/
      time_format %m%d %H:%M:%S.%N
      path /var/log/kube-proxy.log
      pos_file /var/log/es-kube-proxy.log.pos
      tag kube-proxy
    </source>

    <source>
      @id kube-apiserver.log
      @type tail
      format multiline
      multiline_flush_interval 5s
      format_firstline /^\w\d{4}/
      format1 /^(?<severity>\w)(?<time>\d{4} [^\s]*)\s+(?<pid>\d+)\s+(?<source>[^ \]]+)\] (?<message>.*)/
      time_format %m%d %H:%M:%S.%N
      path /var/log/kube-apiserver.log
      pos_file /var/log/es-kube-apiserver.log.pos
      tag kube-apiserver
    </source>

    <source>
      @id kube-controller-manager.log
      @type tail
      format multiline
      multiline_flush_interval 5s
      format_firstline /^\w\d{4}/
      format1 /^(?<severity>\w)(?<time>\d{4} [^\s]*)\s+(?<pid>\d+)\s+(?<source>[^ \]]+)\] (?<message>.*)/
      time_format %m%d %H:%M:%S.%N
      path /var/log/kube-controller-manager.log
      pos_file /var/log/es-kube-controller-manager.log.pos
      tag kube-controller-manager
    </source>

    <source>
      @id kube-scheduler.log
      @type tail
      format multiline
      multiline_flush_interval 5s
      format_firstline /^\w\d{4}/
      format1 /^(?<severity>\w)(?<time>\d{4} [^\s]*)\s+(?<pid>\d+)\s+(?<source>[^ \]]+)\] (?<message>.*)/
      time_format %m%d %H:%M:%S.%N
      path /var/log/kube-scheduler.log
      pos_file /var/log/es-kube-scheduler.log.pos
      tag kube-scheduler
    </source>

    <source>
      @id glbc.log
      @type tail
      format multiline
      multiline_flush_interval 5s
      format_firstline /^\w\d{4}/
      format1 /^(?<severity>\w)(?<time>\d{4} [^\s]*)\s+(?<pid>\d+)\s+(?<source>[^ \]]+)\] (?<message>.*)/
      time_format %m%d %H:%M:%S.%N
      path /var/log/glbc.log
      pos_file /var/log/es-glbc.log.pos
      tag glbc
    </source>

    <source>
      @id cluster-autoscaler.log
      @type tail
      format multiline
      multiline_flush_interval 5s
      format_firstline /^\w\d{4}/
      format1 /^(?<severity>\w)(?<time>\d{4} [^\s]*)\s+(?<pid>\d+)\s+(?<source>[^ \]]+)\] (?<message>.*)/
      time_format %m%d %H:%M:%S.%N
      path /var/log/cluster-autoscaler.log
      pos_file /var/log/es-cluster-autoscaler.log.pos
      tag cluster-autoscaler
    </source>

    <source>
      @id journald-docker
      @type systemd
      matches [{ "_SYSTEMD_UNIT": "docker.service" }]
      <storage>
        @type local
        persistent true
        path /var/log/journald-docker.pos
      </storage>
      read_from_head true
      tag docker
    </source>

    <source>
      @id journald-container-runtime
      @type systemd
      matches [{ "_SYSTEMD_UNIT": "{{ fluentd_container_runtime_service }}.service" }]
      <storage>
        @type local
        persistent true
        path /var/log/journald-container-runtime.pos
      </storage>
      read_from_head true
      tag container-runtime
    </source>

    <source>
      @id journald-kubelet
      @type systemd
      matches [{ "_SYSTEMD_UNIT": "kubelet.service" }]
      <storage>
        @type local
        persistent true
        path /var/log/journald-kubelet.pos
      </storage>
      read_from_head true
      tag kubelet
    </source>

    <source>
      @id journald-node-problem-detector
      @type systemd
      matches [{ "_SYSTEMD_UNIT": "node-problem-detector.service" }]
      <storage>
        @type local
        persistent true
        path /var/log/journald-node-problem-detector.pos
      </storage>
      read_from_head true
      tag node-problem-detector
    </source>

    <source>
      @id kernel
      @type systemd
      matches [{ "_TRANSPORT": "kernel" }]
      <storage>
        @type local
        persistent true
        path /var/log/kernel.pos
      </storage>
      <entry>
        fields_strip_underscores true
        fields_lowercase true
      </entry>
      read_from_head true
      tag kernel
    </source>
  forward.input.conf: |-
    # Takes the messages sent over TCP
    <source>
      @id forward
      @type forward
    </source>
  monitoring.conf: |-
    # Prometheus Exporter Plugin
    # input plugin that exports metrics
    <source>
      @id prometheus
      @type prometheus
    </source>

    <source>
      @id monitor_agent
      @type monitor_agent
    </source>

    # input plugin that collects metrics from MonitorAgent
    <source>
      @id prometheus_monitor
      @type prometheus_monitor
      <labels>
        host ${hostname}
      </labels>
    </source>

    # input plugin that collects metrics for output plugin
    <source>
      @id prometheus_output_monitor
      @type prometheus_output_monitor
      <labels>
        host ${hostname}
      </labels>
    </source>

    # input plugin that collects metrics for in_tail plugin
    <source>
      @id prometheus_tail_monitor
      @type prometheus_tail_monitor
      <labels>
        host ${hostname}
      </labels>
    </source>
  output.conf: |-
    <match **>
      @id elasticsearch
      @type elasticsearch
      @log_level info
      type_name _doc
      include_tag_key true
      host elasticsearch-logging # set this to the IP of our own ES host on server 148, i.e. 192.168.1.148
      port 9200
      logstash_format true
      <buffer>
        @type file
        path /var/log/fluentd-buffers/kubernetes.system.buffer
        flush_mode interval
        retry_type exponential_backoff
        flush_thread_count 2
        flush_interval 5s
        retry_forever
        retry_max_interval 30
        chunk_limit_size 2M
        total_limit_size 500M
        overflow_action block
      </buffer>
    </match>
```
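To get a feel for the parsing patterns above before deploying anything, you can approximate two of them with POSIX tools. This is a rough local sketch only (Fluentd actually uses Ruby regexes, and both sample log lines below are made up): the first expression mirrors the non-JSON fallback `<pattern>` in containers.input.conf, the second mirrors the klog-style `format1` shared by the kubelet/kube-* sources.

```shell
# A CRI-style container log line (sample data, not from a real node).
line='2021-03-01T12:00:00.000000000+08:00 stdout F hello from my pod'

# Same shape as /^(?<time>.+) (?<stream>stdout|stderr) [^ ]* (?<log>.*)$/;
# capture group 3 corresponds to the <log> field.
echo "$line" | sed -E 's/^(.+) (stdout|stderr) [^ ]* (.*)$/\3/'
# -> hello from my pod

# A klog line, as parsed by format1 for kubelet/kube-proxy/etc.
kline='I0301 12:00:00.000000    1234 server.go:145] Starting kubelet'
echo "$kline" | sed -E 's/^([A-Z])([0-9]{4} [^ ]*) +([0-9]+) +([^] ]+)\] (.*)$/severity=\1 pid=\3 source=\4 msg=\5/'
# -> severity=I pid=1234 source=server.go:145 msg=Starting kubelet
```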

DaemonSet + rbac

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: fluentd-es
  namespace: logging
  labels:
    k8s-app: fluentd-es
    addonmanager.kubernetes.io/mode: Reconcile
---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: fluentd-es
  labels:
    k8s-app: fluentd-es
    addonmanager.kubernetes.io/mode: Reconcile
rules:
- apiGroups:
  - ""
  resources:
  - "namespaces"
  - "pods"
  verbs:
  - "get"
  - "watch"
  - "list"
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: fluentd-es
  labels:
    k8s-app: fluentd-es
    addonmanager.kubernetes.io/mode: Reconcile
subjects:
- kind: ServiceAccount
  name: fluentd-es
  namespace: logging
  apiGroup: ""
roleRef:
  kind: ClusterRole
  name: fluentd-es
  apiGroup: ""
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluentd-es-v3.1.1
  namespace: logging
  labels:
    k8s-app: fluentd-es
    version: v3.1.1
    addonmanager.kubernetes.io/mode: Reconcile
spec:
  selector:
    matchLabels:
      k8s-app: fluentd-es
      version: v3.1.1
  template:
    metadata:
      labels:
        k8s-app: fluentd-es
        version: v3.1.1
    spec:
      securityContext:
        seccompProfile:
          type: RuntimeDefault
      priorityClassName: system-node-critical
      serviceAccountName: fluentd-es
      containers:
      - name: fluentd-es
        image: quay.io/fluentd_elasticsearch/fluentd:v3.1.0
        env:
        - name: FLUENTD_ARGS
          value: --no-supervisor -q
        resources:
          limits:
            memory: 500Mi
          requests:
            cpu: 100m
            memory: 200Mi
        volumeMounts:
        - name: varlog
          mountPath: /var/log
        - name: varlibdockercontainers
          mountPath: /var/lib/docker/containers
          readOnly: true
        - name: config-volume
          mountPath: /etc/fluent/config.d
        ports:
        - containerPort: 24231
          name: prometheus
          protocol: TCP
        livenessProbe:
          tcpSocket:
            port: prometheus
          initialDelaySeconds: 5
          timeoutSeconds: 10
        readinessProbe:
          tcpSocket:
            port: prometheus
          initialDelaySeconds: 5
          timeoutSeconds: 10
      terminationGracePeriodSeconds: 30
      volumes:
      - name: varlog
        hostPath:
          path: /var/log # do not replace this with a custom host path of your own
      - name: varlibdockercontainers
        hostPath:
          path: /var/lib/docker/containers # do not replace this with a custom host path of your own
      - name: config-volume
        configMap:
          name: fluentd-es-config-v0.2.1
```

One thing to note is that the es host in the ConfigMap must be set to the IP of the machine hosting Elasticsearch. Since our Elasticsearch is deployed outside of Kubernetes, a little extra configuration is needed for in-cluster services to reach it: add an Endpoints object for Elasticsearch so the Service can locate our external Elasticsearch instance. Applying the two files below is all it takes.
The other thing to note is that the hostPath volumes mounted on every node must point at exactly those paths; otherwise the corresponding logs cannot be collected.
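Once the ES host and the hostPath entries check out, the ConfigMap and DaemonSet can be applied and verified. The filenames below are placeholders for wherever you saved the YAML above:

```shell
kubectl apply -f fluentd-es-configmap.yaml
kubectl apply -f fluentd-es-ds.yaml

# One fluentd-es pod should come up per node.
kubectl -n logging get pods -l k8s-app=fluentd-es -o wide
```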

Create the Elasticsearch Endpoints

```yaml
kind: Endpoints
apiVersion: v1
metadata:
  name: elasticsearch-logging
  namespace: logging
  labels:
    k8s-app: elasticsearch-logging
    kubernetes.io/name: "Elasticsearch"
subsets:
- addresses:
  - ip: 192.168.1.148
  ports:
  - port: 9200
---
apiVersion: v1
kind: Service
metadata:
  name: elasticsearch-logging
  namespace: logging
  labels:
    k8s-app: elasticsearch-logging
    addonmanager.kubernetes.io/mode: Reconcile
    kubernetes.io/name: "Elasticsearch"
spec:
  ports:
  - port: 9200
    protocol: TCP
    targetPort: db
  clusterIP: None
```

Check whether the Service can find the Endpoints

```shell
kubectl -n logging describe svc elasticsearch-logging
```
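Beyond `describe`, you can confirm end-to-end reachability by curling the headless Service from a throwaway pod inside the cluster. This is just one quick way to do it; the curlimages/curl image is an arbitrary convenient choice:

```shell
# Runs a one-off pod in the logging namespace and deletes it afterwards.
# A JSON banner from Elasticsearch means the Service -> Endpoints -> external
# ES path works.
kubectl -n logging run es-check --rm -it --restart=Never \
  --image=curlimages/curl -- curl -s http://elasticsearch-logging:9200
```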

After applying the files above, a pod runs on every node, tailing the host's /var/log/containers for each pod's log output and shipping it to our Elasticsearch for storage, which also lays the groundwork for analysis in Kibana later.

Completing log indexing

View the created indices

From the index list we can see that logs from the last two days have already been indexed, and in Logstash format. Next, create a matching index pattern from the Management menu in Kibana's left sidebar.

Create an index pattern

Creating the index pattern hit a pitfall too: after clicking Create, the page just spun and blocked. The browser's network tab showed an `error forbidden` response, and the logs indicated the index was read-only, status code 403. Online advice says Elasticsearch flips indices into this read-only mode once disk usage crosses its flood-stage watermark (95% by default; 85% is only the low watermark), but my disk usage was normal. Running a single command on the Elasticsearch host to clear the read-only flag globally fixed it:

```shell
curl -XPUT -H "Content-Type: application/json" \
  http://127.0.0.1:9200/_all/_settings \
  -d '{"index.blocks.read_only_allow_delete": null}'
# {"acknowledged":true}
```
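If you want to rule the watermarks in or out before clearing the flag, the thresholds in effect can be read back from the cluster settings API (run on the ES host):

```shell
# Show the disk-based shard allocation thresholds, including defaults
# (cluster.routing.allocation.disk.watermark.low/high/flood_stage).
curl -s 'http://127.0.0.1:9200/_cluster/settings?include_defaults=true&flat_settings=true' \
  | grep watermark
```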


View the logs

Discover is the first item in Kibana's left sidebar; there is plenty to explore in there, so I won't go through it all. And with that, centralized log collection for our servers is complete.

Reference: article link