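Overview

Filebeat ships with a number of prebuilt modules that simplify parsing common log formats, but the default fields do not always cover real needs. For example, we may want to record extra Nginx fields such as the request time, the upstream response time, and the Host header.
Any field added to the Nginx log format must also be defined in Filebeat's nginx module.
The nginx module is located at /usr/share/filebeat/module/nginx. Its directory structure is shown below.
Directory structure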

  ├── access
  │   ├── config
  │   │   └── nginx-access.yml
  │   ├── ingest
  │   │   └── default.json    # default parsing pipeline
  │   ├── machine_learning
  │   │   └── ....json
  │   └── manifest.yml
  └── module.yml

Default parsing pipeline
What we need to modify is the entry under patterns. Note that the pattern is stored JSON-escaped.

  {
    "description": "Pipeline for parsing Nginx access logs. Requires the geoip and user_agent plugins.",
    "processors": [
      {
        "grok": {
          "field": "message",
          "patterns": [
            "\"?%{IP_LIST:nginx.access.remote_ip_list} - %{DATA:user.name} \\[%{HTTPDATE:nginx.access.time}\\] \"%{GREEDYDATA:nginx.access.info}\" %{NUMBER:http.response.status_code:long} %{NUMBER:http.response.body.bytes:long} \"%{DATA:http.request.referrer}\" \"%{DATA:user_agent.original}\""
          ],
          "pattern_definitions": {
            "IP_LIST": "%{IP}(\"?,?\\s*%{IP})*"
          },
          "ignore_missing": true
        }
      }
    ]
  }
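Because the pattern lives inside a JSON string, every `"` is written as `\"` and every `\` as `\\`. The following is a minimal Python sketch of that relationship, using only the standard library. The shortened pattern fragment is there purely for illustration.

```python
import json

# A fragment of the pattern as it appears in default.json (JSON-escaped).
escaped = r'"\"?%{IP_LIST:nginx.access.remote_ip_list} - %{DATA:user.name} \\[%{HTTPDATE:nginx.access.time}\\]"'

# Decoding the JSON string yields the raw grok pattern.
raw = json.loads(escaped)
print(raw)

# Encoding the raw pattern again reproduces the escaped form.
assert json.dumps(raw) == escaped
```

Working on the raw form and letting a JSON library handle the escaping is one way to avoid hand-escaping mistakes.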

Change the Nginx log format

Before

  log_format main '$remote_addr - $remote_user [$time_local] "$request" '
                  '$status $body_bytes_sent "$http_referer" '
                  '"$http_user_agent" "$http_x_forwarded_for" ';

After

  log_format main '$remote_addr - $remote_user [$time_local] "$request" '
                  '$status $body_bytes_sent "$http_referer" '
                  '"$http_user_agent" "$http_x_forwarded_for" '
                  '"$host" $request_time $upstream_response_time';

Here we added three fields: $host, $request_time, and $upstream_response_time. A log line in the new format looks like this:

  192.168.1.112 - - [25/Apr/2019:18:22:01 +0800] "GET /help/show/20 HTTP/1.1" 200 7474 "http://t.dailian.iliexiang.com/help?cat_id=2" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.103 Safari/537.36" "-" "t.dailian.iliexiang.com" 0.063 0.021
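To see how the extended line breaks down into fields, here is a rough Python sketch that splits the sample line with a plain regular expression. The regex is a hand-written approximation and not the grok pattern Filebeat actually runs. The group names simply mirror the nginx variables and are chosen here for illustration.

```python
import re

# Hand-written approximation of the extended access-log layout (assumption:
# fields appear in exactly the order of the log_format shown above).
LINE_RE = re.compile(
    r'(?P<remote_addr>\S+) - (?P<remote_user>\S+) \[(?P<time_local>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<body_bytes_sent>\d+) '
    r'"(?P<http_referer>[^"]*)" "(?P<http_user_agent>[^"]*)" '
    r'"(?P<http_x_forwarded_for>[^"]*)" "(?P<host>[^"]*)" '
    # nginx writes "-" for an unset variable, so allow it for the upstream time.
    r'(?P<request_time>[\d.]+) (?P<upstream_response_time>[\d.]+|-)'
)

sample = (
    '192.168.1.112 - - [25/Apr/2019:18:22:01 +0800] "GET /help/show/20 HTTP/1.1" '
    '200 7474 "http://t.dailian.iliexiang.com/help?cat_id=2" '
    '"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_4) AppleWebKit/537.36 '
    '(KHTML, like Gecko) Chrome/73.0.3683.103 Safari/537.36" "-" '
    '"t.dailian.iliexiang.com" 0.063 0.021'
)

match = LINE_RE.match(sample)
fields = match.groupdict()
print(fields["host"], fields["request_time"], fields["upstream_response_time"])
```

Note that a request served without an upstream logs `-` for $upstream_response_time. The sketch accepts that value, whereas a strict numeric pattern would not.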

Update the patterns

Supported patterns: grok-patterns

  vim /usr/share/filebeat/module/nginx/access/ingest/default.json

Before

  "\"?%{IP_LIST:nginx.access.remote_ip_list} - %{DATA:user.name} \\[%{HTTPDATE:nginx.access.time}\\] \"%{GREEDYDATA:nginx.access.info}\" %{NUMBER:http.response.status_code:long} %{NUMBER:http.response.body.bytes:long} \"%{DATA:http.request.referrer}\" \"%{DATA:user_agent.original}\""

After

  "\"?%{IP_LIST:nginx.access.remote_ip_list} - %{DATA:user.name} \\[%{HTTPDATE:nginx.access.time}\\] \"%{GREEDYDATA:nginx.access.info}\" %{NUMBER:http.response.status_code:long} %{NUMBER:http.response.body.bytes:long} \"%{DATA:http.request.referrer}\" \"%{DATA:user_agent.original}\" \"%{DATA:nginx.access.x_forwarded_for}\" \"%{DATA:nginx.access.host}\" %{NUMBER:nginx.access.request_time:float} %{NUMBER:nginx.access.upstream_response_time:float}"
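Hand-escaping a pattern this long is error-prone. One possible approach is to keep the pattern in raw form and let a JSON library add the escapes when writing it into the pipeline definition. The sketch below works on an in-memory stand-in for default.json. The helper name `set_grok_pattern` is hypothetical and not part of Filebeat.

```python
import json

# Raw (unescaped) grok pattern, as you would type it into a grok debugger.
RAW_PATTERN = (
    r'"?%{IP_LIST:nginx.access.remote_ip_list} - %{DATA:user.name} '
    r'\[%{HTTPDATE:nginx.access.time}\] "%{GREEDYDATA:nginx.access.info}" '
    r'%{NUMBER:http.response.status_code:long} %{NUMBER:http.response.body.bytes:long} '
    r'"%{DATA:http.request.referrer}" "%{DATA:user_agent.original}" '
    r'"%{DATA:nginx.access.x_forwarded_for}" "%{DATA:nginx.access.host}" '
    r'%{NUMBER:nginx.access.request_time:float} '
    r'%{NUMBER:nginx.access.upstream_response_time:float}'
)


def set_grok_pattern(pipeline: dict, pattern: str) -> dict:
    """Replace the patterns of the first grok processor (hypothetical helper)."""
    for processor in pipeline["processors"]:
        if "grok" in processor:
            processor["grok"]["patterns"] = [pattern]
            return pipeline
    raise ValueError("no grok processor found")


# Minimal stand-in for the contents of default.json.
pipeline = {
    "processors": [
        {"grok": {"field": "message", "patterns": ["old"], "ignore_missing": True}}
    ]
}

# json.dumps adds the \" and \\ escapes automatically.
print(json.dumps(set_grok_pattern(pipeline, RAW_PATTERN), indent=2))
```

In practice the stand-in dictionary would be replaced by `json.load` on the real file, with a backup taken first.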

Debugging tool: use the Grok Debugger in Kibana

  http://192.168.1.21:5601/app/kibana#/dev_tools/grokdebugger?_g=()

The custom pattern must be entered here as well, otherwise IP_LIST is not recognized.
Custom Patterns

  IP_LIST %{IP}("?,?\s*%{IP})*
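The IP_LIST definition matches one IP address followed by any number of further addresses, optionally separated by a comma and whitespace. The same shape can be tried out in Python as a rough sketch. The IPv4-only expression below is a simplified stand-in for grok's %{IP}, not an equivalent.

```python
import re

# Simplified stand-in for grok's %{IP} (assumption: IPv4 only, no range check).
IPV4 = r"\d{1,3}(?:\.\d{1,3}){3}"
# Same shape as the IP_LIST definition: one IP, then optional ", IP" repeats.
IP_LIST = re.compile(rf'{IPV4}(?:"?,?\s*{IPV4})*')

print(IP_LIST.match("10.0.0.1, 192.168.1.112").group(0))
print(IP_LIST.match("192.168.1.112 - -").group(0))
```

The first call consumes both addresses, while the second stops after the single address because no further IP follows.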

Update the fields

Edit the field definitions

  vim /etc/filebeat/fields.yml

In /etc/filebeat/fields.yml, find the nginx section and add the four new fields captured by the pattern above:

  - name: x_forwarded_for
    type: keyword
    description: >
      Forwarded IP.
  - name: host
    type: keyword
    description: >
      Server hostname.
  - name: request_time
    type: float
    description: >
      URL request time.
  - name: upstream_response_time
    type: float
    description: >
      Upstream response time.

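Apply the changes

First check that the configuration file is valid:

  # filebeat test config

Then delete the old pipeline from the Kibana Dev Tools console. By default Filebeat does not overwrite an ingest pipeline that already exists in Elasticsearch, so the modified default.json only takes effect once the old pipeline is gone:

  # list all pipelines
  GET _ingest/pipeline
  DELETE _ingest/pipeline/filebeat-7.0.0-nginx-access-default

Finally restart Filebeat so that it loads the pipeline again:

  # systemctl restart filebeat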
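The DELETE call can also be issued from a script instead of the Kibana console. Below is a minimal sketch with Python's standard library. The Elasticsearch address `http://localhost:9200` is an assumption, and the helper `build_delete_request` is hypothetical. The request is only built, not sent, so running the sketch is harmless.

```python
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: adjust to your cluster
PIPELINE_ID = "filebeat-7.0.0-nginx-access-default"


def build_delete_request(base_url: str, pipeline_id: str) -> urllib.request.Request:
    """Build (but do not send) the DELETE request for one ingest pipeline."""
    url = f"{base_url}/_ingest/pipeline/{pipeline_id}"
    return urllib.request.Request(url, method="DELETE")


request = build_delete_request(ES_URL, PIPELINE_ID)
print(request.get_method(), request.full_url)

# To actually send it, uncomment the next lines (this removes the pipeline):
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode())
```

The pipeline ID contains the Filebeat version, so it needs to match the version that created the pipeline.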

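Simulate the pipeline

Use the pipeline ID that matches your Filebeat version, and a message in the new log format, for example the sample line from above:

  POST _ingest/pipeline/filebeat-7.0.0-nginx-access-default/_simulate
  {
    "docs": [
      {
        "_source": {
          "message": "192.168.1.112 - - [25/Apr/2019:18:22:01 +0800] \"GET /help/show/20 HTTP/1.1\" 200 7474 \"http://t.dailian.iliexiang.com/help?cat_id=2\" \"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.103 Safari/537.36\" \"-\" \"t.dailian.iliexiang.com\" 0.063 0.021"
        }
      }
    ]
  }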
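The message in the request body is a JSON string, so every quote in the log line has to be escaped. A small sketch that builds the body from a raw log line and leaves the escaping to the JSON library is shown below. The helper name `build_simulate_body` is hypothetical.

```python
import json


def build_simulate_body(log_line: str) -> str:
    """Wrap one raw log line in a _simulate request body (hypothetical helper)."""
    return json.dumps({"docs": [{"_source": {"message": log_line}}]}, indent=2)


# The sample access-log line in the new format, copied unescaped.
raw_line = (
    '192.168.1.112 - - [25/Apr/2019:18:22:01 +0800] "GET /help/show/20 HTTP/1.1" '
    '200 7474 "http://t.dailian.iliexiang.com/help?cat_id=2" '
    '"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_4) AppleWebKit/537.36 '
    '(KHTML, like Gecko) Chrome/73.0.3683.103 Safari/537.36" "-" '
    '"t.dailian.iliexiang.com" 0.063 0.021'
)

body = build_simulate_body(raw_line)
print(body)
```

The printed body can be pasted under the POST line in the Kibana console.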