一、初步了解

1.1 Elasticsearch SQL是什么?

2018年6月,Elasticsearch 6.3 版本 X-Pack插件中引入新的SQL查询功能,虽然一直到现在,有经验的专业人员仍然更加喜欢使用Elasticsearch DSL 来进行搜索,但是毫无疑问,Elasticsearch SQL 的查询语法对于新用户更加友好,毕竟大部分程序员并不想从头学习一门新的语言。下面是两条等价的查询语句:

  1. # Elasticsearch SQL 查询语句
  2. POST /_sql?format=txt
  3. {
  4. "query": "SELECT AVG(DistanceKilometers) AS AvgDistance, OriginCountry FROM kibana_sample_data_flights WHERE DestWeather='Sunny' AND AvgTicketPrice>=300 GROUP BY OriginCountry ORDER BY OriginCountry ASC LIMIT 5"
  5. }
  6. # Elasticsearch DSL 查询语句
  7. GET kibana_sample_data_flights/_search
  8. {
  9. "size" : 0,
  10. "query" : {
  11. "bool" : {
  12. "must": [
  13. {
  14. "term" : {
  15. "DestWeather" : {
  16. "value" : "Sunny"
  17. }
  18. }
  19. },
  20. {
  21. "range" : {
  22. "AvgTicketPrice" : {
  23. "gte": 300
  24. }
  25. }
  26. }
  27. ]
  28. }
  29. },
  30. "aggregations" : {
  31. "groupby" : {
  32. "composite" : {
  33. "size": 5,
  34. "sources" : [
  35. {
  36. "a1" : {
  37. "terms" : {
  38. "field" : "OriginCountry",
  39. "order" : "asc"
  40. }
  41. }
  42. }
  43. ]
  44. },
  45. "aggregations" : {
  46. "a2" : {
  47. "avg" : {
  48. "field" : "DistanceKilometers"
  49. }
  50. }
  51. }
  52. }
  53. }
  54. }

1.2 Elasticsearch SQL不是什么?

Elasticsearch SQL提供了一个符合ANSI SQL规范的只读接口,即通过Elasticsearch SQL, 我们只能进行搜索,无法实现创建、更新、删除Elasticsearch的索引或者索引中的文档。
Elasticsearch SQL 的查询功能,相比较于其他关系型数据库的SQL查询,也有许多独特之处,例如: Elasticsearch SQL 不支持 join 查询语法,但却实现了更多的聚合分析函数。更多不同之处,下文将会详细介绍。

1.3 环境说明

本文测试环境为 Elasticsearch Cloud 免费试用的Elasticsearch集群,其中Elasticsearch 和 Kibana 版本是7.14.0,文中查询的索引test_logs即是Kibana示例索引kibana_sample_data_logs。

二、基本使用

2.1 概念对应

因为Elasticsearch SQL在设计之初就遵循着最小惊讶原则(principle of least surprise),所以我们可以在Elasticsearch SQL中查找到许多熟悉的SQL概念。

SQL中概念 Elasticsearch对应的概念
database cluster instance
table index
schema mapping
row document
column field

2.2 连接方式

2.2.1 SQL CLI脚本

  1. # 进入安装目录,存在可执行脚本./bin/elasticsearch-sql-cli
  2. ./bin/elasticsearch-sql-cli https://username:password@host:port
  3. # 进入sql终端,执行sql语句
  4. show tables;

2.2.2 JDBC连接

2.2.3 RESTful API调用

2.3 基础查询

下面的基础查询,默认在Kibana中操作。

2.3.1 简单示例

(1)SHOW TABLES, 获取集群的所有索引,包含别名信息

  1. POST _sql?format=txt
  2. {
  3. "query":"SHOW TABLES"
  4. }
  5. name | type | kind
  6. -----------------------+---------------+---------------
  7. kibana_sample_data_logs|TABLE |INDEX
  8. test_logs |VIEW |ALIAS

(2)SHOW COLUMNS IN test_logs,查看索引 test_logs的字段信息(语句DESC test_logs作用相同)

  1. POST _sql?format=txt
  2. {
  3. "query":"SHOW COLUMNS IN test_logs"
  4. }
  5. POST _sql?format=txt
  6. {
  7. "query":"DESC test_logs"
  8. }
  9. column | type | mapping
  10. ------------------+---------------+---------------
  11. @timestamp |TIMESTAMP |datetime
  12. agent |VARCHAR |text
  13. agent.keyword |VARCHAR |keyword
  14. bytes |BIGINT |long
  15. clientip |VARCHAR |ip
  16. event |STRUCT |object

(3)SHOW FUNCTIONS, 查看当前支持的函数

  1. POST _sql?format=txt
  2. {
  3. "query":"SHOW FUNCTIONS"
  4. }
  5. name | type
  6. -----------------+---------------
  7. AVG |AGGREGATE
  8. COUNT |AGGREGATE
  9. FIRST |AGGREGATE
  10. FIRST_VALUE |AGGREGATE
  11. LAST |AGGREGATE

(4)SELECT FROM,获取索引中的部分数据

  1. POST _sql?format=txt
  2. {
  3. "query":"SELECT clientip, url, response FROM test_logs ORDER BY timestamp DESC LIMIT 3"
  4. }
  5. clientip | url | response
  6. ---------------+-----------------------------------------------------------------------------------------+---------------
  7. 118.151.35.151 |https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-6.3.2.deb |200
  8. 99.76.103.49 |https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-6.3.2.deb |200
  9. 41.77.243.255 |https://elastic-elastic-elastic.org/people/type:astronauts/name:michael-s-hopkins/profile|200

上面通过4条语句,简单介绍了Elasticsearch SQL的基本查询,接下来将介绍更多的实用技巧。

2.3.2 全文搜索

Elasticsearch的全文检索功能,在Ealsticsearch SQL中可以通过两个语法实现: match, query。

(1)WHERE MATCH,实现全文搜索,效果同mulit_match。

  1. # 查询 url 字段中包含 metricbeat的文档, score()函数返回相关性评分
  2. POST _sql?format=txt
  3. {
  4. "query":"SELECT SCORE(), clientip, url, response FROM test_logs WHERE MATCH(url, 'metricbeat') ORDER BY timestamp DESC LIMIT 3"
  5. }
  6. SCORE() | clientip | url | response
  7. ---------------+---------------+----------------------------------------------------------------------------------+---------------
  8. 2.7142088 |74.184.0.64 |https://artifacts.elastic.co/downloads/beats/metricbeat/metricbeat-6.3.2-i686.rpm |200
  9. 2.7142088 |232.20.97.5 |https://artifacts.elastic.co/downloads/beats/metricbeat/metricbeat-6.3.2-amd64.deb|200
  10. 2.7142088 |9.16.178.89 |https://artifacts.elastic.co/downloads/beats/metricbeat/metricbeat-6.3.2-amd64.deb|404

(2)WHERE QUERY,实现全文搜索,效果同query_string。

  1. POST _sql?format=txt
  2. {
  3. "query":"SELECT SCORE(), clientip, url, response FROM test_logs WHERE QUERY('url:metricbeat') ORDER BY timestamp DESC LIMIT 3"
  4. }
  5. SCORE() | clientip | url | response
  6. ---------------+---------------+----------------------------------------------------------------------------------+---------------
  7. 2.7142088 |74.184.0.64 |https://artifacts.elastic.co/downloads/beats/metricbeat/metricbeat-6.3.2-i686.rpm |200
  8. 2.7142088 |232.20.97.5 |https://artifacts.elastic.co/downloads/beats/metricbeat/metricbeat-6.3.2-amd64.deb|200
  9. 2.7142088 |9.16.178.89 |https://artifacts.elastic.co/downloads/beats/metricbeat/metricbeat-6.3.2-amd64.deb|404

2.3.3 深层分页

实际使用过程中,对于全文搜索后,得到的大量返回文档,经常需要分页展示。在Elasticsearch DSL中我们一般会使用 from + size / scroll / search_after 等方式来实现,Elasticsearch SQL中存在 fetch_size ,指定当前查询返回的文档数量,用于实现分页功能。

Elasticsearch SQL 中fetch_size,可以类比为Elasticsearch DSL中的 scroll(游标)

(1)第1次查询,指定 fetch_size = 10,表示获取 test_logs中的一个文档,如果返回值中有 cursor,表示还有更多满足条件的文档;

  1. POST _sql?format=json
  2. {
  3. "query":"SELECT clientip, host, response FROM test_logs ORDER BY timestamp DESC",
  4. "fetch_size": 10
  5. }
  6. {
  7. "columns" : [
  8. {
  9. "name" : "clientip",
  10. "type" : "ip"
  11. },
  12. {
  13. "name" : "host",
  14. "type" : "text"
  15. },
  16. {
  17. "name" : "response",
  18. "type" : "text"
  19. }
  20. ],
  21. "rows" : [
  22. [
  23. "118.151.35.151",
  24. "artifacts.elastic.co",
  25. "200"
  26. ]
  27. ...
  28. ],
  29. "cursor" : "g+azAwFaAXN4RkdsdVkyeDFaR1ZmWTI5dWRHVjRkRjkxZFdsa0RYRjFaWEo1UVc1a1JtVjBZMmdCRm5ScVMzSjBRbEZ0VkRSNWJpMVBZMEYwUmpZME9GRUFBQUFBQUJVU2hCWkJaMlJ5Y0hOTU1WTnhRMUppUlZwVWFtNTViM2Qz/////w8DAWYIY2xpZW50aXABAmlwAAABZgRob3N0AQR0ZXh0AAABZghyZXNwb25zZQEEdGV4dAAAAQc="
  30. }

(2)第2次查询,只需要添加前一次的 cursor

  1. POST /_sql?format=json
  2. {
  3. "cursor" : "g+azAwFaAXN4RkdsdVkyeDFaR1ZmWTI5dWRHVjRkRjkxZFdsa0RYRjFaWEo1UVc1a1JtVjBZMmdCRm5ScVMzSjBRbEZ0VkRSNWJpMVBZMEYwUmpZME9GRUFBQUFBQUJVU2hCWkJaMlJ5Y0hOTU1WTnhRMUppUlZwVWFtNTViM2Qz/////w8DAWYIY2xpZW50aXABAmlwAAABZgRob3N0AQR0ZXh0AAABZghyZXNwb25zZQEEdGV4dAAAAQc="
  4. }
  5. {
  6. "rows" : [
  7. [
  8. "41.77.243.255",
  9. "elastic-elastic-elastic.org",
  10. "200"
  11. ],
  12. ...
  13. ],
  14. "cursor" : "g+azAwFaAXN4RkdsdVkyeDFaR1ZmWTI5dWRHVjRkRjkxZFdsa0RYRjFaWEo1UVc1a1JtVjBZMmdCRmpoSlRHMTFURUZVVVRoVFgxZ3dRMjV2TWxwcE9HY0FBQUFBQUJVZ2loWnlla3hUY1ZaYU0xTnlWeTF6UVVkRlVYQkNUWFJC/////w8DAWYIY2xpZW50aXABAmlwAAABZgRob3N0AQR0ZXh0AAABZghyZXNwb25zZQEEdGV4dAAAAQc="
  15. }

(3)最后一次查询,返回结果中将不存在 cursor ,表示已经到了最后一页

  1. POST /_sql?format=json
  2. {
  3. "cursor": "g+azAwFaAXN4RkdsdVkyeDFaR1ZmWTI5dWRHVjRkRjkxZFdsa0RYRjFaWEo1UVc1a1JtVjBZMmdCRmpoSlRHMTFURUZVVVRoVFgxZ3dRMjV2TWxwcE9HY0FBQUFBQUJVZ2loWnlla3hUY1ZaYU0xTnlWeTF6UVVkRlVYQkNUWFJC/////w8DAWYIY2xpZW50aXABAmlwAAABZgRob3N0AQR0ZXh0AAABZghyZXNwb25zZQEEdGV4dAAAAQc="
  4. }
  5. {
  6. "columns" : [
  7. {
  8. "name" : "clientip",
  9. "type" : "ip"
  10. },
  11. {
  12. "name" : "host",
  13. "type" : "text"
  14. },
  15. {
  16. "name" : "response",
  17. "type" : "text"
  18. }
  19. ],
  20. "rows" : [
  21. [
  22. "118.151.35.151",
  23. "artifacts.elastic.co",
  24. "200"
  25. ]
  26. ]
  27. }

需要注意的是,

a. 如果查询指定返回结构是 txt / csv 等,将无法获取到 carsor;
b. 如果查询语句中包含有 LIMIT,fetch_size将无法工作,即无法分页。

2.4 SQL与DSL

2.4.1 SQL转换为DSL

Elasticsearch SQL提供了 translate接口,可以将SQL语法转换为对应的DSL语法。
通过查看DSL语法,可以更好的学习lasticsearch DSL;也可以通过转化结果,反过来判断SQL查询的正确性或者优化SQL查询。

  1. POST _sql/translate
  2. {
  3. "query":"SELECT clientip, url, response FROM test_logs WHERE MATCH(url, 'metricbeat') ORDER BY timestamp DESC LIMIT 10"
  4. }
  5. {
  6. "size" : 10,
  7. "query" : {
  8. "match" : {
  9. "url" : {
  10. "query" : "metricbeat",
  11. "operator" : "OR",
  12. "prefix_length" : 0,
  13. "max_expansions" : 50,
  14. "fuzzy_transpositions" : true,
  15. "lenient" : false,
  16. "zero_terms_query" : "NONE",
  17. "auto_generate_synonyms_phrase_query" : true,
  18. "boost" : 1.0
  19. }
  20. }
  21. },
  22. "_source" : false,
  23. "fields" : [
  24. {
  25. "field" : "clientip"
  26. },
  27. {
  28. "field" : "url"
  29. },
  30. {
  31. "field" : "response"
  32. }
  33. ],
  34. "sort" : [
  35. {
  36. "timestamp" : {
  37. "order" : "desc",
  38. "missing" : "_first",
  39. "unmapped_type" : "date"
  40. }
  41. }
  42. ]
  43. }

2.4.2 SQL混合DSL语句

如果遇到过滤条件逻辑关系非常复杂,我们可以在SQL中 添加Elasticsearch DSL过滤条件。

  1. POST /_sql?format=txt
  2. {
  3. "query": "SELECT clientip, url, response FROM test_logs WHERE MATCH(url, 'metricbeat') ORDER BY timestamp DESC LIMIT 3",
  4. "filter": {
  5. "range": {
  6. "response": {
  7. "gte": 200,
  8. "lt": 300
  9. }
  10. }
  11. }
  12. }
  13. clientip | url | response
  14. ---------------+----------------------------------------------------------------------------------+---------------
  15. 74.184.0.64 |https://artifacts.elastic.co/downloads/beats/metricbeat/metricbeat-6.3.2-i686.rpm |200
  16. 232.20.97.5 |https://artifacts.elastic.co/downloads/beats/metricbeat/metricbeat-6.3.2-amd64.deb|200
  17. 148.192.209.125|https://www.elastic.co/downloads/beats/metricbeat |200


注意此时filter中的过滤条件,与query中的WHERR 是逻辑AND关系

  1. POST /_sql/translate
  2. {
  3. "query": "SELECT clientip, url, response FROM test_logs WHERE MATCH(url, 'metricbeat') ORDER BY timestamp DESC LIMIT 10",
  4. "filter": {
  5. "range": {
  6. "response": {
  7. "gte": 200,
  8. "lt": 300
  9. }
  10. }
  11. }
  12. }
  13. {
  14. "size" : 3,
  15. "query" : {
  16. "bool" : {
  17. "must" : [
  18. {
  19. "match" : {
  20. "url" : {
  21. "query" : "metricbeat",
  22. "operator" : "OR",
  23. "prefix_length" : 0,
  24. "max_expansions" : 50,
  25. "fuzzy_transpositions" : true,
  26. "lenient" : false,
  27. "zero_terms_query" : "NONE",
  28. "auto_generate_synonyms_phrase_query" : true,
  29. "boost" : 1.0
  30. }
  31. }
  32. }
  33. ],
  34. "filter" : [
  35. {
  36. "range" : {
  37. "response" : {
  38. "from" : 200,
  39. "to" : 300,
  40. "include_lower" : true,
  41. "include_upper" : false,
  42. "boost" : 1.0
  43. }
  44. }
  45. }
  46. ],
  47. "adjust_pure_negative" : true,
  48. "boost" : 1.0
  49. }
  50. },
  51. "_source" : false,
  52. "fields" : [
  53. {
  54. "field" : "clientip"
  55. },
  56. {
  57. "field" : "url"
  58. },
  59. {
  60. "field" : "response"
  61. }
  62. ],
  63. "sort" : [
  64. {
  65. "timestamp" : {
  66. "order" : "desc",
  67. "missing" : "_first",
  68. "unmapped_type" : "date"
  69. }
  70. }
  71. ]
  72. }

2.4.3 SQL与DSL的区别

Elasticsearch SQL

2.5 聚合分析

Elasticsearch的聚合分析功能非常强大,在实际生产环境中,通过聚合分析再结合图表展示,我们可以实现许多实用功能,例如:查看网站访问IP分布情况,某些服务的P99延迟等等。

2.5.1 复合聚合

Elasticsearch SQL中提供了group by 语法用于桶聚合分析,对于 group by 语法,Elasticsearch内部将转换为复合聚合(composite aggregation)。
例如:下面的查询根据 返回状态码分类,统计对应的http 请求的body字节大小平均值 和请求次数。

  1. POST _sql/translate
  2. {
  3. "query":"SELECT response, count(*) AS count_docs, avg(bytes) AS avg_bytes FROM test_logs GROUP BY response"
  4. }
  5. {
  6. "size" : 0,
  7. "_source" : false,
  8. "aggregations" : {
  9. "groupby" : {
  10. "composite" : {
  11. "size" : 1000,
  12. "sources" : [
  13. {
  14. "565ba6a7" : {
  15. "terms" : {
  16. "field" : "response.keyword",
  17. "missing_bucket" : true,
  18. "order" : "asc"
  19. }
  20. }
  21. }
  22. ]
  23. },
  24. "aggregations" : {
  25. "945f09f1" : {
  26. "avg" : {
  27. "field" : "bytes"
  28. }
  29. }
  30. }
  31. }
  32. }
  33. }
  34. POST _sql?format=txt
  35. {
  36. "query":"SELECT response, count(*) AS count_docs, avg(bytes) AS avg_bytes FROM test_logs GROUP BY response"
  37. }
  38. response | count_docs | avg_bytes
  39. ---------------+---------------+-----------------
  40. 200 |12832 |5897.852711970075
  41. 404 |801 |5049.242197253433
  42. 503 |441 |0.0

2.5.2 多层聚合

既然group by会被转化为复合聚合,我们可以通过 group by 多个字段,实现多层聚合。
例如:我们对测试数据(访问日志),先通过访问域名(host)分类,然后再根据返回状态码分类,统计对应的http 请求的body字节大小平均值 和请求次数。

  1. POST _sql?format=txt
  2. {
  3. "query":"SELECT host, response, count(*) AS count_docs, avg(bytes) AS avg_bytes FROM test_logs GROUP BY host, response"
  4. }
  5. host | response | count_docs | avg_bytes
  6. -------------------------------+---------------+---------------+-----------------
  7. artifacts.elastic.co |200 |5950 |6449.373109243697
  8. artifacts.elastic.co |404 |330 |6111.981818181818
  9. artifacts.elastic.co |503 |208 |0.0
  10. cdn.elastic-elastic-elastic.org|200 |2082 |5771.631123919308
  11. cdn.elastic-elastic-elastic.org|404 |106 |5527.292452830188
  12. cdn.elastic-elastic-elastic.org|503 |67 |0.0
  13. elastic-elastic-elastic.org |200 |415 |4775.802409638554
  14. elastic-elastic-elastic.org |404 |125 |2397.616
  15. elastic-elastic-elastic.org |503 |12 |0.0
  16. www.elastic.co |200 |4385 |5315.617331812999
  17. www.elastic.co |404 |240 |4757.891666666666
  18. www.elastic.co |503 |154 |0.0

2.5.3 直方图

在我们的使用场景中,经常需要对日期进行直方图聚合,Elasticsearch SQL中存在histogram函数,可以实现。

  1. POST _sql?format=txt
  2. {
  3. "query":"SELECT host, HISTOGRAM(timestamp, INTERVAL 1 MONTH) as h, count(*) AS count_docs, avg(bytes) AS avg_bytes FROM test_logs GROUP BY host, h"
  4. }
  5. host | h | count_docs | avg_bytes
  6. -------------------------------+------------------------+---------------+------------------
  7. artifacts.elastic.co |2021-08-01T00:00:00.000Z|1077 |6146.069637883008
  8. artifacts.elastic.co |2021-09-01T00:00:00.000Z|3150 |6234.906666666667
  9. artifacts.elastic.co |2021-10-01T00:00:00.000Z|2261 |6250.088898717382
  10. cdn.elastic-elastic-elastic.org|2021-08-01T00:00:00.000Z|374 |5678.467914438503
  11. cdn.elastic-elastic-elastic.org|2021-09-01T00:00:00.000Z|1125 |5569.506666666667
  12. cdn.elastic-elastic-elastic.org|2021-10-01T00:00:00.000Z|756 |5572.734126984127
  13. elastic-elastic-elastic.org |2021-08-01T00:00:00.000Z|54 |4193.055555555556
  14. elastic-elastic-elastic.org |2021-09-01T00:00:00.000Z|228 |4558.456140350877
  15. elastic-elastic-elastic.org |2021-10-01T00:00:00.000Z|270 |3762.6185185185186
  16. www.elastic.co |2021-08-01T00:00:00.000Z|821 |5346.6077953714985
  17. www.elastic.co |2021-09-01T00:00:00.000Z|2340 |5128.597008547009
  18. www.elastic.co |2021-10-01T00:00:00.000Z|1618 |4981.702101359703

2.5.4 聚合分页

在实际使用场景中,如果分桶过多,可能会出现消耗过多资源、查询响应超时、经典的桶太多报错(too many buckets)等情况。幸运的是,我们可以通过聚合分页,避免上面的查询问题,而通过Elasticsearch SQL的group by 和 fetch_size结合使用,聚合分页将变得十分简洁。

  1. # 第1次查询,指定fetch_size=10
  2. POST _sql?format=json
  3. {
  4. "query":"SELECT host, response, count(*) AS count_docs, avg(bytes) AS avg_bytes FROM test_logs GROUP BY host, response",
  5. "fetch_size": 10
  6. }
  7. {
  8. "columns" : [
  9. {
  10. "name" : "host",
  11. "type" : "text"
  12. },
  13. {
  14. "name" : "response",
  15. "type" : "text"
  16. },
  17. {
  18. "name" : "count_docs",
  19. "type" : "long"
  20. },
  21. {
  22. "name" : "avg_bytes",
  23. "type" : "double"
  24. }
  25. ],
  26. "rows" : [
  27. [
  28. "artifacts.elastic.co",
  29. "200",
  30. 5950,
  31. 6449.373109243697
  32. ],
  33. ...
  34. ],
  35. "cursor" : "g+azAwFaAWMBCXRlc3RfbG9nc8YBAQEJY29tcG9zaXRlB2dyb3VwYnkBA2F2Zwg5NDVmMDlmMQAA/wEFYnl0ZXMAAAD/AAD/AgAIYzdhOThkYTABDGhvc3Qua2V5d29yZAAAAQAAAAg1NjViYTZhNwEQcmVzcG9uc2Uua2V5d29yZAAAAQAACgEKAghjN2E5OGRhMAAOd3d3LmVsYXN0aWMuY28INTY1YmE2YTcAAzIwMAACAQAAAAAA/////w8AAAAAAAAAAAAAAAABWgMAAgIAAAAAAAAAAAoA/////w8EAWsIYzdhOThkYTAAAAFrCDU2NWJhNmE3AAABawg1NjViYTZhNwEAAW0IOTQ1ZjA5ZjEFdmFsdWUAAQZET1VCTEUBDwA="
  36. }
  37. # 第2次查询(也即最后1次查询)
  38. POST _sql?format=json
  39. {
  40. "cursor" : "g+azAwFaAWMBCXRlc3RfbG9nc8YBAQEJY29tcG9zaXRlB2dyb3VwYnkBA2F2Zwg5NDVmMDlmMQAA/wEFYnl0ZXMAAAD/AAD/AgAIYzdhOThkYTABDGhvc3Qua2V5d29yZAAAAQAAAAg1NjViYTZhNwEQcmVzcG9uc2Uua2V5d29yZAAAAQAACgEKAghjN2E5OGRhMAAOd3d3LmVsYXN0aWMuY28INTY1YmE2YTcAAzIwMAACAQAAAAAA/////w8AAAAAAAAAAAAAAAABWgMAAgIAAAAAAAAAAAoA/////w8EAWsIYzdhOThkYTAAAAFrCDU2NWJhNmE3AAABawg1NjViYTZhNwEAAW0IOTQ1ZjA5ZjEFdmFsdWUAAQZET1VCTEUBDwA="
  41. }

2.5.4 管道聚合

实际使用过程中,我们经常需要对聚合结果,再次过滤,而通过管道聚合可以轻松实现。在Elasticsearch SQL中,提供了having 语法,实现相同的管道聚合功能。

例如:我们先根据host,response分类,然后计算出每个分类的请求次数,还需要过滤count_docs <= 1000的分类。

  1. POST _sql?format=txt
  2. {
  3. "query":"SELECT host, response, count(*) AS count_docs, avg(bytes) AS avg_bytes FROM test_logs GROUP BY host, response HAVING count_docs > 1000"
  4. }
  5. host | response | count_docs | avg_bytes
  6. -------------------------------+---------------+---------------+-----------------
  7. artifacts.elastic.co |200 |5950 |6449.373109243697
  8. cdn.elastic-elastic-elastic.org|200 |2082 |5771.631123919308
  9. www.elastic.co |200 |4385 |5315.617331812999

2.6 常见错误

2.6.1 数组类型字段无法展示

  1. # tags字段值是一个字符串类型
  2. GET test_logs/_search
  3. {
  4. "size": 1,
  5. "_source": ["tags", "url"]
  6. }
  7. {
  8. "took" : 3,
  9. "timed_out" : false,
  10. "_shards" : {
  11. "total" : 1,
  12. "successful" : 1,
  13. "skipped" : 0,
  14. "failed" : 0
  15. },
  16. "hits" : {
  17. "total" : {
  18. "value" : 10000,
  19. "relation" : "gte"
  20. },
  21. "max_score" : 1.0,
  22. "hits" : [
  23. {
  24. "_index" : "kibana_sample_data_logs",
  25. "_type" : "_doc",
  26. "_id" : "YcHfn3sBLoQsaVEnrdd5",
  27. "_score" : 1.0,
  28. "_source" : {
  29. "url" : "https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-6.3.2.deb_1",
  30. "tags" : [
  31. "success",
  32. "info"
  33. ]
  34. }
  35. }
  36. ]
  37. }
  38. }
  39. # 无法select tags字段
  40. POST _sql?format=txt
  41. {
  42. "query":"SELECT tags,url FROM test_log"
  43. }
  44. {
  45. "error" : {
  46. "root_cause" : [
  47. {
  48. "type" : "verification_exception",
  49. "reason" : "Found 1 problem\nline 1:22: Unknown index [test_log]"
  50. }
  51. ],
  52. "type" : "verification_exception",
  53. "reason" : "Found 1 problem\nline 1:22: Unknown index [test_log]"
  54. },
  55. "status" : 400
  56. }

2.6.2

三、SQL语言

3.1 解析过程

内容:初步描述语法树的生成流程与源码位置

3.2 组成部分

  1. SELECT [TOP [ count ] ] select_expr [, ...]
  2. [ FROM table_name ]
  3. [ WHERE condition ]
  4. [ GROUP BY grouping_element [, ...] ]
  5. [ HAVING condition]
  6. [ ORDER BY expression [ ASC | DESC ] [, ...] ]
  7. [ LIMIT [ count ] ]
  8. [ PIVOT ( aggregation_expr FOR column IN ( value [ [ AS ] alias ] [, ...] ) ) ]

上面是Elasticsearch SQL查询语句的语法规范,接下来将展示一些常见的组成部分,更加详细的信息可以查看官方文档。

3.2.1 查询语句

查询语句 语句描述 示例
DESC 描述具体索引字段列表 DESC test_logs
SHOW 返回集群中索引列表,函数列表,索引字段列表 SHOW TABLES
SHOW FUNCTIONS
SHOW COLUMNS FROM test_logs
SELECT 指定查询返回的列 SELECT url FROM test_logs
FROM 指定查询索引 SELECT url FROM test_logs
WHERE 指定查询的过滤条件 SELECT url FROM test_logs WHERE response = 200
GROUP BY 指定聚合 SELECT host, response, count(*) AS count_docs FROM test_logs GROUP BY host
ORDER BY 指定排序 SELECT SCORE(), url FROM test_logs WHERE MATCH(url, ‘filebeat’) ORDER BY SCORE() DESC
LIMIT 指定返回数量 SELECT SCORE(), url FROM test_logs WHERE MATCH(url, ‘filebeat’) ORDER BY SCORE() DESC LIMIT 10
HAVING 指定管道聚合 SELECT host, response, count(*) AS count_docs, avg(bytes) AS avg_bytes FROM test_logs GROUP BY host, response HAVING count_docs > 1000”

注意,上面的查询语句组成部分有序,如果组成顺序不正确,查询语句解析阶段将会报错。

3.3 支持函数

函数类型 函数名称 函数描述
数学计算 ABS,LOG,RANDOM,SIN,EXP, … 一些数值计算函数
全文搜索 MATCH, QUERY, SCOER MATCH函数将转化为 multi_match query
QUERY函数将转化为 query_string
SCORE函数将返回 文档的相关性评分
直方图聚合 HISTOGRAM 直方图聚合,一般用于日期分组
聚合度量函数 COUNT, COUNT(ALL), COUNT(DISTINCT) 统计数量
AVG, MIN, MAX, SUM 数值计算:平均值、最值、求和
FIRST,LAST 第一个或者最后一个值
PERCENTILE 百分位计算
类型转化 CAST 字段类型转化,例如CAST(‘123’ AS INT)

四、局限与展望

4.1 局限

内容:(1)只能查询,无法添加、更新、删除;(2)查询的局限(注意部分是DSL也存在的问题)

简述 详细描述 示例
不支持join查询 可以使用nested 类型字段
(实际)不支持子查询

4.2 展望

内容:可能会有哪些改进,对比其他引擎的sql语言,例如:Solr SQL, Flink SQL等

五、参考

官方文档SQL Elasticsearch SQL 源码 Elasticsearch SQL 社区问题

中文社区: 玩转 Elasticsearch 的 SQL 功能

官方博客: Elasticsearch SQL 简介(附练习示例)— 第 1 部分

官方博客: Elasticsearch SQL 简介(附示例示例)- 第 2 部分

中国社区官方博客: Elasticsearch:Elasticsearch SQL介绍及实例 (一)

中国社区官方博客: Elasticsearch:Elasticsearch SQL介绍及实例(二)