一、初步了解
1.1 Elasticsearch SQL是什么?
2018年6月,Elasticsearch 6.3 版本 X-Pack插件中引入新的SQL查询功能,虽然一直到现在,有经验的专业人员仍然更加喜欢使用Elasticsearch DSL 来进行搜索,但是毫无疑问,Elasticsearch SQL 的查询语法对于新用户更加友好,毕竟大部分程序员并不想从头学习一门新的语言。下面是两条等价的查询语句:
# Elasticsearch SQL 查询语句POST /_sql?format=txt{"query": "SELECT AVG(DistanceKilometers) AS AvgDistance, OriginCountry FROM kibana_sample_data_flights WHERE DestWeather='Sunny' AND AvgTicketPrice>=300 GROUP BY OriginCountry ORDER BY OriginCountry ASC LIMIT 5"}# Elasticsearch DSL 查询语句GET kibana_sample_data_flights/_search{"size" : 0,"query" : {"bool" : {"must": [{"term" : {"DestWeather" : {"value" : "Sunny"}}},{"range" : {"AvgTicketPrice" : {"gte": 300}}}]}},"aggregations" : {"groupby" : {"composite" : {"size": 5,"sources" : [{"a1" : {"terms" : {"field" : "OriginCountry","order" : "asc"}}}]},"aggregations" : {"a2" : {"avg" : {"field" : "DistanceKilometers"}}}}}}
1.2 Elasticsearch SQL不是什么?
Elasticsearch SQL提供了一个符合ANSI SQL规范的只读接口,即通过Elasticsearch SQL, 我们只能进行搜索,无法实现创建、更新、删除Elasticsearch的索引或者索引中的文档。
Elasticsearch SQL 的查询功能,相比较于其他关系型数据库的SQL查询,也有许多独特之处,例如: Elasticsearch SQL 不支持 join 查询语法,但却实现了更多的聚合分析函数。更多不同之处,下文将会详细介绍。
1.3 环境说明
本文测试环境为 Elasticsearch Cloud 免费试用的Elasticsearch集群,其中Elasticsearch 和 Kibana 版本是7.14.0,文中查询的索引test_logs即是Kibana示例索引kibana_sample_data_logs。
二、基本使用
2.1 概念对应
因为Elasticsearch SQL在设计之初就遵循着最小惊讶原则(principle of least surprise),所以我们可以在Elasticsearch SQL中查找到许多熟悉的SQL概念。
| SQL中概念 | Elasticsearch对应的概念 |
|---|---|
| database | cluster instance |
| table | index |
| schema | mapping |
| row | document |
| column | field |
2.2 连接方式
2.2.1 SQL CLI脚本
# 进入安装目录,存在可执行脚本./bin/elasticsearch-sql-cli./bin/elasticsearch-sql-cli https://username:password@host:port# 进入sql终端,执行sql语句show tables;
2.2.2 JDBC连接
2.2.3 RESTful API调用
2.3 基础查询
下面的基础查询,默认在Kibana中操作。
2.3.1 简单示例
(1)SHOW TABLES, 获取集群的所有索引,包含别名信息
POST _sql?format=txt{"query":"SHOW TABLES"}name | type | kind-----------------------+---------------+---------------kibana_sample_data_logs|TABLE |INDEXtest_logs |VIEW |ALIAS
(2)SHOW COLUMNS IN test_logs,查看索引 test_logs的字段信息(语句DESC test_logs作用相同)
POST _sql?format=txt{"query":"SHOW COLUMNS IN test_logs"}POST _sql?format=txt{"query":"DESC test_logs"}column | type | mapping------------------+---------------+---------------@timestamp |TIMESTAMP |datetimeagent |VARCHAR |textagent.keyword |VARCHAR |keywordbytes |BIGINT |longclientip |VARCHAR |ipevent |STRUCT |object
(3)SHOW FUNCTIONS, 查看当前支持的函数
POST _sql?format=txt{"query":"SHOW FUNCTIONS"}name | type-----------------+---------------AVG |AGGREGATECOUNT |AGGREGATEFIRST |AGGREGATEFIRST_VALUE |AGGREGATELAST |AGGREGATE
(4)SELECT FROM,获取索引中的部分数据
POST _sql?format=txt{"query":"SELECT clientip, url, response FROM test_logs ORDER BY timestamp DESC LIMIT 3"}clientip | url | response---------------+-----------------------------------------------------------------------------------------+---------------118.151.35.151 |https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-6.3.2.deb |20099.76.103.49 |https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-6.3.2.deb |20041.77.243.255 |https://elastic-elastic-elastic.org/people/type:astronauts/name:michael-s-hopkins/profile|200
上面通过4条语句,简单介绍了Elasticsearch SQL的基本查询,接下来将介绍更多的实用技巧。
2.3.2 全文搜索
Elasticsearch的全文检索功能,在Ealsticsearch SQL中可以通过两个语法实现: match, query。
(1)WHERE MATCH,实现全文搜索,效果同mulit_match。
# 查询 url 字段中包含 metricbeat的文档, score()函数返回相关性评分POST _sql?format=txt{"query":"SELECT SCORE(), clientip, url, response FROM test_logs WHERE MATCH(url, 'metricbeat') ORDER BY timestamp DESC LIMIT 3"}SCORE() | clientip | url | response---------------+---------------+----------------------------------------------------------------------------------+---------------2.7142088 |74.184.0.64 |https://artifacts.elastic.co/downloads/beats/metricbeat/metricbeat-6.3.2-i686.rpm |2002.7142088 |232.20.97.5 |https://artifacts.elastic.co/downloads/beats/metricbeat/metricbeat-6.3.2-amd64.deb|2002.7142088 |9.16.178.89 |https://artifacts.elastic.co/downloads/beats/metricbeat/metricbeat-6.3.2-amd64.deb|404
(2)WHERE QUERY,实现全文搜索,效果同query_string。
POST _sql?format=txt{"query":"SELECT SCORE(), clientip, url, response FROM test_logs WHERE QUERY('url:metricbeat') ORDER BY timestamp DESC LIMIT 3"}SCORE() | clientip | url | response---------------+---------------+----------------------------------------------------------------------------------+---------------2.7142088 |74.184.0.64 |https://artifacts.elastic.co/downloads/beats/metricbeat/metricbeat-6.3.2-i686.rpm |2002.7142088 |232.20.97.5 |https://artifacts.elastic.co/downloads/beats/metricbeat/metricbeat-6.3.2-amd64.deb|2002.7142088 |9.16.178.89 |https://artifacts.elastic.co/downloads/beats/metricbeat/metricbeat-6.3.2-amd64.deb|404
2.3.3 深层分页
实际使用过程中,对于全文搜索后,得到的大量返回文档,经常需要分页展示。在Elasticsearch DSL中我们一般会使用 from + size / scroll / search_after 等方式来实现,Elasticsearch SQL中存在 fetch_size ,指定当前查询返回的文档数量,用于实现分页功能。
Elasticsearch SQL 中fetch_size,可以类比为Elasticsearch DSL中的 scroll(游标)
(1)第1次查询,指定 fetch_size = 10,表示获取 test_logs中的一个文档,如果返回值中有 cursor,表示还有更多满足条件的文档;
POST _sql?format=json{"query":"SELECT clientip, host, response FROM test_logs ORDER BY timestamp DESC","fetch_size": 10}{"columns" : [{"name" : "clientip","type" : "ip"},{"name" : "host","type" : "text"},{"name" : "response","type" : "text"}],"rows" : [["118.151.35.151","artifacts.elastic.co","200"]...],"cursor" : "g+azAwFaAXN4RkdsdVkyeDFaR1ZmWTI5dWRHVjRkRjkxZFdsa0RYRjFaWEo1UVc1a1JtVjBZMmdCRm5ScVMzSjBRbEZ0VkRSNWJpMVBZMEYwUmpZME9GRUFBQUFBQUJVU2hCWkJaMlJ5Y0hOTU1WTnhRMUppUlZwVWFtNTViM2Qz/////w8DAWYIY2xpZW50aXABAmlwAAABZgRob3N0AQR0ZXh0AAABZghyZXNwb25zZQEEdGV4dAAAAQc="}
(2)第2次查询,只需要添加前一次的 cursor
POST /_sql?format=json{"cursor" : "g+azAwFaAXN4RkdsdVkyeDFaR1ZmWTI5dWRHVjRkRjkxZFdsa0RYRjFaWEo1UVc1a1JtVjBZMmdCRm5ScVMzSjBRbEZ0VkRSNWJpMVBZMEYwUmpZME9GRUFBQUFBQUJVU2hCWkJaMlJ5Y0hOTU1WTnhRMUppUlZwVWFtNTViM2Qz/////w8DAWYIY2xpZW50aXABAmlwAAABZgRob3N0AQR0ZXh0AAABZghyZXNwb25zZQEEdGV4dAAAAQc="}{"rows" : [["41.77.243.255","elastic-elastic-elastic.org","200"],...],"cursor" : "g+azAwFaAXN4RkdsdVkyeDFaR1ZmWTI5dWRHVjRkRjkxZFdsa0RYRjFaWEo1UVc1a1JtVjBZMmdCRmpoSlRHMTFURUZVVVRoVFgxZ3dRMjV2TWxwcE9HY0FBQUFBQUJVZ2loWnlla3hUY1ZaYU0xTnlWeTF6UVVkRlVYQkNUWFJC/////w8DAWYIY2xpZW50aXABAmlwAAABZgRob3N0AQR0ZXh0AAABZghyZXNwb25zZQEEdGV4dAAAAQc="}
(3)最后一次查询,返回结果中将不存在 cursor ,表示已经到了最后一页
POST /_sql?format=json{"cursor": "g+azAwFaAXN4RkdsdVkyeDFaR1ZmWTI5dWRHVjRkRjkxZFdsa0RYRjFaWEo1UVc1a1JtVjBZMmdCRmpoSlRHMTFURUZVVVRoVFgxZ3dRMjV2TWxwcE9HY0FBQUFBQUJVZ2loWnlla3hUY1ZaYU0xTnlWeTF6UVVkRlVYQkNUWFJC/////w8DAWYIY2xpZW50aXABAmlwAAABZgRob3N0AQR0ZXh0AAABZghyZXNwb25zZQEEdGV4dAAAAQc="}{"columns" : [{"name" : "clientip","type" : "ip"},{"name" : "host","type" : "text"},{"name" : "response","type" : "text"}],"rows" : [["118.151.35.151","artifacts.elastic.co","200"]]}
需要注意的是,
a. 如果查询指定返回结构是 txt / csv 等,将无法获取到 carsor;
b. 如果查询语句中包含有 LIMIT,fetch_size将无法工作,即无法分页。
2.4 SQL与DSL
2.4.1 SQL转换为DSL
Elasticsearch SQL提供了 translate接口,可以将SQL语法转换为对应的DSL语法。
通过查看DSL语法,可以更好的学习lasticsearch DSL;也可以通过转化结果,反过来判断SQL查询的正确性或者优化SQL查询。
POST _sql/translate{"query":"SELECT clientip, url, response FROM test_logs WHERE MATCH(url, 'metricbeat') ORDER BY timestamp DESC LIMIT 10"}{"size" : 10,"query" : {"match" : {"url" : {"query" : "metricbeat","operator" : "OR","prefix_length" : 0,"max_expansions" : 50,"fuzzy_transpositions" : true,"lenient" : false,"zero_terms_query" : "NONE","auto_generate_synonyms_phrase_query" : true,"boost" : 1.0}}},"_source" : false,"fields" : [{"field" : "clientip"},{"field" : "url"},{"field" : "response"}],"sort" : [{"timestamp" : {"order" : "desc","missing" : "_first","unmapped_type" : "date"}}]}
2.4.2 SQL混合DSL语句
如果遇到过滤条件逻辑关系非常复杂,我们可以在SQL中 添加Elasticsearch DSL过滤条件。
POST /_sql?format=txt{"query": "SELECT clientip, url, response FROM test_logs WHERE MATCH(url, 'metricbeat') ORDER BY timestamp DESC LIMIT 3","filter": {"range": {"response": {"gte": 200,"lt": 300}}}}clientip | url | response---------------+----------------------------------------------------------------------------------+---------------74.184.0.64 |https://artifacts.elastic.co/downloads/beats/metricbeat/metricbeat-6.3.2-i686.rpm |200232.20.97.5 |https://artifacts.elastic.co/downloads/beats/metricbeat/metricbeat-6.3.2-amd64.deb|200148.192.209.125|https://www.elastic.co/downloads/beats/metricbeat |200
注意此时filter中的过滤条件,与query中的WHERR 是逻辑AND关系
POST /_sql/translate{"query": "SELECT clientip, url, response FROM test_logs WHERE MATCH(url, 'metricbeat') ORDER BY timestamp DESC LIMIT 10","filter": {"range": {"response": {"gte": 200,"lt": 300}}}}{"size" : 3,"query" : {"bool" : {"must" : [{"match" : {"url" : {"query" : "metricbeat","operator" : "OR","prefix_length" : 0,"max_expansions" : 50,"fuzzy_transpositions" : true,"lenient" : false,"zero_terms_query" : "NONE","auto_generate_synonyms_phrase_query" : true,"boost" : 1.0}}}],"filter" : [{"range" : {"response" : {"from" : 200,"to" : 300,"include_lower" : true,"include_upper" : false,"boost" : 1.0}}}],"adjust_pure_negative" : true,"boost" : 1.0}},"_source" : false,"fields" : [{"field" : "clientip"},{"field" : "url"},{"field" : "response"}],"sort" : [{"timestamp" : {"order" : "desc","missing" : "_first","unmapped_type" : "date"}}]}
2.4.3 SQL与DSL的区别
Elasticsearch SQL
2.5 聚合分析
Elasticsearch的聚合分析功能非常强大,在实际生产环境中,通过聚合分析再结合图表展示,我们可以实现许多实用功能,例如:查看网站访问IP分布情况,某些服务的P99延迟等等。
2.5.1 复合聚合
Elasticsearch SQL中提供了group by 语法用于桶聚合分析,对于 group by 语法,Elasticsearch内部将转换为复合聚合(composite aggregation)。
例如:下面的查询根据 返回状态码分类,统计对应的http 请求的body字节大小平均值 和请求次数。
POST _sql/translate{"query":"SELECT response, count(*) AS count_docs, avg(bytes) AS avg_bytes FROM test_logs GROUP BY response"}{"size" : 0,"_source" : false,"aggregations" : {"groupby" : {"composite" : {"size" : 1000,"sources" : [{"565ba6a7" : {"terms" : {"field" : "response.keyword","missing_bucket" : true,"order" : "asc"}}}]},"aggregations" : {"945f09f1" : {"avg" : {"field" : "bytes"}}}}}}POST _sql?format=txt{"query":"SELECT response, count(*) AS count_docs, avg(bytes) AS avg_bytes FROM test_logs GROUP BY response"}response | count_docs | avg_bytes---------------+---------------+-----------------200 |12832 |5897.852711970075404 |801 |5049.242197253433503 |441 |0.0
2.5.2 多层聚合
既然group by会被转化为复合聚合,我们可以通过 group by 多个字段,实现多层聚合。
例如:我们对测试数据(访问日志),先通过访问域名(host)分类,然后再根据返回状态码分类,统计对应的http 请求的body字节大小平均值 和请求次数。
POST _sql?format=txt{"query":"SELECT host, response, count(*) AS count_docs, avg(bytes) AS avg_bytes FROM test_logs GROUP BY host, response"}host | response | count_docs | avg_bytes-------------------------------+---------------+---------------+-----------------artifacts.elastic.co |200 |5950 |6449.373109243697artifacts.elastic.co |404 |330 |6111.981818181818artifacts.elastic.co |503 |208 |0.0cdn.elastic-elastic-elastic.org|200 |2082 |5771.631123919308cdn.elastic-elastic-elastic.org|404 |106 |5527.292452830188cdn.elastic-elastic-elastic.org|503 |67 |0.0elastic-elastic-elastic.org |200 |415 |4775.802409638554elastic-elastic-elastic.org |404 |125 |2397.616elastic-elastic-elastic.org |503 |12 |0.0www.elastic.co |200 |4385 |5315.617331812999www.elastic.co |404 |240 |4757.891666666666www.elastic.co |503 |154 |0.0
2.5.3 直方图
在我们的使用场景中,经常需要对日期进行直方图聚合,Elasticsearch SQL中存在histogram函数,可以实现。
POST _sql?format=txt{"query":"SELECT host, HISTOGRAM(timestamp, INTERVAL 1 MONTH) as h, count(*) AS count_docs, avg(bytes) AS avg_bytes FROM test_logs GROUP BY host, h"}host | h | count_docs | avg_bytes-------------------------------+------------------------+---------------+------------------artifacts.elastic.co |2021-08-01T00:00:00.000Z|1077 |6146.069637883008artifacts.elastic.co |2021-09-01T00:00:00.000Z|3150 |6234.906666666667artifacts.elastic.co |2021-10-01T00:00:00.000Z|2261 |6250.088898717382cdn.elastic-elastic-elastic.org|2021-08-01T00:00:00.000Z|374 |5678.467914438503cdn.elastic-elastic-elastic.org|2021-09-01T00:00:00.000Z|1125 |5569.506666666667cdn.elastic-elastic-elastic.org|2021-10-01T00:00:00.000Z|756 |5572.734126984127elastic-elastic-elastic.org |2021-08-01T00:00:00.000Z|54 |4193.055555555556elastic-elastic-elastic.org |2021-09-01T00:00:00.000Z|228 |4558.456140350877elastic-elastic-elastic.org |2021-10-01T00:00:00.000Z|270 |3762.6185185185186www.elastic.co |2021-08-01T00:00:00.000Z|821 |5346.6077953714985www.elastic.co |2021-09-01T00:00:00.000Z|2340 |5128.597008547009www.elastic.co |2021-10-01T00:00:00.000Z|1618 |4981.702101359703
2.5.4 聚合分页
在实际使用场景中,如果分桶过多,可能会出现消耗过多资源、查询响应超时、经典的桶太多报错(too many buckets)等情况。幸运的是,我们可以通过聚合分页,避免上面的查询问题,而通过Elasticsearch SQL的group by 和 fetch_size结合使用,聚合分页将变得十分简洁。
# 第1次查询,指定fetch_size=10POST _sql?format=json{"query":"SELECT host, response, count(*) AS count_docs, avg(bytes) AS avg_bytes FROM test_logs GROUP BY host, response","fetch_size": 10}{"columns" : [{"name" : "host","type" : "text"},{"name" : "response","type" : "text"},{"name" : "count_docs","type" : "long"},{"name" : "avg_bytes","type" : "double"}],"rows" : [["artifacts.elastic.co","200",5950,6449.373109243697],...],"cursor" : "g+azAwFaAWMBCXRlc3RfbG9nc8YBAQEJY29tcG9zaXRlB2dyb3VwYnkBA2F2Zwg5NDVmMDlmMQAA/wEFYnl0ZXMAAAD/AAD/AgAIYzdhOThkYTABDGhvc3Qua2V5d29yZAAAAQAAAAg1NjViYTZhNwEQcmVzcG9uc2Uua2V5d29yZAAAAQAACgEKAghjN2E5OGRhMAAOd3d3LmVsYXN0aWMuY28INTY1YmE2YTcAAzIwMAACAQAAAAAA/////w8AAAAAAAAAAAAAAAABWgMAAgIAAAAAAAAAAAoA/////w8EAWsIYzdhOThkYTAAAAFrCDU2NWJhNmE3AAABawg1NjViYTZhNwEAAW0IOTQ1ZjA5ZjEFdmFsdWUAAQZET1VCTEUBDwA="}# 第2次查询(也即最后1次查询)POST _sql?format=json{"cursor" : "g+azAwFaAWMBCXRlc3RfbG9nc8YBAQEJY29tcG9zaXRlB2dyb3VwYnkBA2F2Zwg5NDVmMDlmMQAA/wEFYnl0ZXMAAAD/AAD/AgAIYzdhOThkYTABDGhvc3Qua2V5d29yZAAAAQAAAAg1NjViYTZhNwEQcmVzcG9uc2Uua2V5d29yZAAAAQAACgEKAghjN2E5OGRhMAAOd3d3LmVsYXN0aWMuY28INTY1YmE2YTcAAzIwMAACAQAAAAAA/////w8AAAAAAAAAAAAAAAABWgMAAgIAAAAAAAAAAAoA/////w8EAWsIYzdhOThkYTAAAAFrCDU2NWJhNmE3AAABawg1NjViYTZhNwEAAW0IOTQ1ZjA5ZjEFdmFsdWUAAQZET1VCTEUBDwA="}
2.5.4 管道聚合
实际使用过程中,我们经常需要对聚合结果,再次过滤,而通过管道聚合可以轻松实现。在Elasticsearch SQL中,提供了having 语法,实现相同的管道聚合功能。
例如:我们先根据host,response分类,然后计算出每个分类的请求次数,还需要过滤count_docs <= 1000的分类。
POST _sql?format=txt{"query":"SELECT host, response, count(*) AS count_docs, avg(bytes) AS avg_bytes FROM test_logs GROUP BY host, response HAVING count_docs > 1000"}host | response | count_docs | avg_bytes-------------------------------+---------------+---------------+-----------------artifacts.elastic.co |200 |5950 |6449.373109243697cdn.elastic-elastic-elastic.org|200 |2082 |5771.631123919308www.elastic.co |200 |4385 |5315.617331812999
2.6 常见错误
2.6.1 数组类型字段无法展示
# tags字段值是一个字符串类型GET test_logs/_search{"size": 1,"_source": ["tags", "url"]}{"took" : 3,"timed_out" : false,"_shards" : {"total" : 1,"successful" : 1,"skipped" : 0,"failed" : 0},"hits" : {"total" : {"value" : 10000,"relation" : "gte"},"max_score" : 1.0,"hits" : [{"_index" : "kibana_sample_data_logs","_type" : "_doc","_id" : "YcHfn3sBLoQsaVEnrdd5","_score" : 1.0,"_source" : {"url" : "https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-6.3.2.deb_1","tags" : ["success","info"]}}]}}# 无法select tags字段POST _sql?format=txt{"query":"SELECT tags,url FROM test_log"}{"error" : {"root_cause" : [{"type" : "verification_exception","reason" : "Found 1 problem\nline 1:22: Unknown index [test_log]"}],"type" : "verification_exception","reason" : "Found 1 problem\nline 1:22: Unknown index [test_log]"},"status" : 400}
2.6.2
三、SQL语言
3.1 解析过程
3.2 组成部分
SELECT [TOP [ count ] ] select_expr [, ...][ FROM table_name ][ WHERE condition ][ GROUP BY grouping_element [, ...] ][ HAVING condition][ ORDER BY expression [ ASC | DESC ] [, ...] ][ LIMIT [ count ] ][ PIVOT ( aggregation_expr FOR column IN ( value [ [ AS ] alias ] [, ...] ) ) ]
上面是Elasticsearch SQL查询语句的语法规范,接下来将展示一些常见的组成部分,更加详细的信息可以查看官方文档。
3.2.1 查询语句
| 查询语句 | 语句描述 | 示例 |
|---|---|---|
| DESC | 描述具体索引字段列表 | DESC test_logs |
| SHOW | 返回集群中索引列表,函数列表,索引字段列表 | SHOW TABLES SHOW FUNCTIONS SHOW COLUMNS FROM test_logs |
| SELECT | 指定查询返回的列 | SELECT url FROM test_logs |
| FROM | 指定查询索引 | SELECT url FROM test_logs |
| WHERE | 指定查询的过滤条件 | SELECT url FROM test_logs WHERE response = 200 |
| GROUP BY | 指定聚合 | SELECT host, response, count(*) AS count_docs FROM test_logs GROUP BY host |
| ORDER BY | 指定排序 | SELECT SCORE(), url FROM test_logs WHERE MATCH(url, ‘filebeat’) ORDER BY SCORE() DESC |
| LIMIT | 指定返回数量 | SELECT SCORE(), url FROM test_logs WHERE MATCH(url, ‘filebeat’) ORDER BY SCORE() DESC LIMIT 10 |
| HAVING | 指定管道聚合 | SELECT host, response, count(*) AS count_docs, avg(bytes) AS avg_bytes FROM test_logs GROUP BY host, response HAVING count_docs > 1000” |
| … |
注意,上面的查询语句组成部分有序,如果组成顺序不正确,查询语句解析阶段将会报错。
3.3 支持函数
| 函数类型 | 函数名称 | 函数描述 |
|---|---|---|
| 数学计算 | ABS,LOG,RANDOM,SIN,EXP, … | 一些数值计算函数 |
| 全文搜索 | MATCH, QUERY, SCOER | MATCH函数将转化为 multi_match query QUERY函数将转化为 query_string SCORE函数将返回 文档的相关性评分 |
| 直方图聚合 | HISTOGRAM | 直方图聚合,一般用于日期分组 |
| 聚合度量函数 | COUNT, COUNT(ALL), COUNT(DISTINCT) | 统计数量 |
| AVG, MIN, MAX, SUM | 数值计算:平均值、最值、求和 | |
| FIRST,LAST | 第一个或者最后一个值 | |
| PERCENTILE | 百分位计算 | |
| 类型转化 | CAST | 字段类型转化,例如CAST(‘123’ AS INT) |
| … |
四、局限与展望
4.1 局限
内容:(1)只能查询,无法添加、更新、删除;(2)查询的局限(注意部分是DSL也存在的问题)
| 简述 | 详细描述 | 示例 |
|---|---|---|
| 不支持join查询 | 可以使用nested 类型字段 | |
| (实际)不支持子查询 |
4.2 展望
内容:可能会有哪些改进,对比其他引擎的sql语言,例如:Solr SQL, Flink SQL等
五、参考
官方文档SQL Elasticsearch SQL 源码 Elasticsearch SQL 社区问题
中文社区: 玩转 Elasticsearch 的 SQL 功能
官方博客: Elasticsearch SQL 简介(附练习示例)— 第 1 部分
官方博客: Elasticsearch SQL 简介(附示例示例)- 第 2 部分
