一、初步了解
1.1 Elasticsearch SQL是什么?
2018年6月,Elasticsearch 6.3 版本 X-Pack插件中引入新的SQL查询功能,虽然一直到现在,有经验的专业人员仍然更加喜欢使用Elasticsearch DSL 来进行搜索,但是毫无疑问,Elasticsearch SQL 的查询语法对于新用户更加友好,毕竟大部分程序员并不想从头学习一门新的语言。下面是两条等价的查询语句:
# Elasticsearch SQL 查询语句
POST /_sql?format=txt
{
"query": "SELECT AVG(DistanceKilometers) AS AvgDistance, OriginCountry FROM kibana_sample_data_flights WHERE DestWeather='Sunny' AND AvgTicketPrice>=300 GROUP BY OriginCountry ORDER BY OriginCountry ASC LIMIT 5"
}
# Elasticsearch DSL 查询语句
GET kibana_sample_data_flights/_search
{
"size" : 0,
"query" : {
"bool" : {
"must": [
{
"term" : {
"DestWeather" : {
"value" : "Sunny"
}
}
},
{
"range" : {
"AvgTicketPrice" : {
"gte": 300
}
}
}
]
}
},
"aggregations" : {
"groupby" : {
"composite" : {
"size": 5,
"sources" : [
{
"a1" : {
"terms" : {
"field" : "OriginCountry",
"order" : "asc"
}
}
}
]
},
"aggregations" : {
"a2" : {
"avg" : {
"field" : "DistanceKilometers"
}
}
}
}
}
}
1.2 Elasticsearch SQL不是什么?
Elasticsearch SQL提供了一个符合ANSI SQL规范的只读接口,即通过Elasticsearch SQL, 我们只能进行搜索,无法实现创建、更新、删除Elasticsearch的索引或者索引中的文档。
Elasticsearch SQL 的查询功能,相比较于其他关系型数据库的SQL查询,也有许多独特之处,例如: Elasticsearch SQL 不支持 join 查询语法,但却实现了更多的聚合分析函数。更多不同之处,下文将会详细介绍。
1.3 环境说明
本文测试环境为 Elasticsearch Cloud 免费试用的Elasticsearch集群,其中Elasticsearch 和 Kibana 版本是7.14.0,文中查询的索引test_logs即是Kibana示例索引kibana_sample_data_logs。
二、基本使用
2.1 概念对应
因为Elasticsearch SQL在设计之初就遵循着最小惊讶原则(principle of least surprise),所以我们可以在Elasticsearch SQL中查找到许多熟悉的SQL概念。
SQL中概念 | Elasticsearch对应的概念 |
---|---|
database | cluster instance |
table | index |
schema | mapping |
row | document |
column | field |
2.2 连接方式
2.2.1 SQL CLI脚本
# 进入安装目录,存在可执行脚本./bin/elasticsearch-sql-cli
./bin/elasticsearch-sql-cli https://username:password@host:port
# 进入sql终端,执行sql语句
show tables;
2.2.2 JDBC连接
2.2.3 RESTful API调用
2.3 基础查询
下面的基础查询,默认在Kibana中操作。
2.3.1 简单示例
(1)SHOW TABLES, 获取集群的所有索引,包含别名信息
POST _sql?format=txt
{
"query":"SHOW TABLES"
}
name | type | kind
-----------------------+---------------+---------------
kibana_sample_data_logs|TABLE |INDEX
test_logs |VIEW |ALIAS
(2)SHOW COLUMNS IN test_logs,查看索引 test_logs的字段信息(语句DESC test_logs作用相同)
POST _sql?format=txt
{
"query":"SHOW COLUMNS IN test_logs"
}
POST _sql?format=txt
{
"query":"DESC test_logs"
}
column | type | mapping
------------------+---------------+---------------
@timestamp |TIMESTAMP |datetime
agent |VARCHAR |text
agent.keyword |VARCHAR |keyword
bytes |BIGINT |long
clientip |VARCHAR |ip
event |STRUCT |object
(3)SHOW FUNCTIONS, 查看当前支持的函数
POST _sql?format=txt
{
"query":"SHOW FUNCTIONS"
}
name | type
-----------------+---------------
AVG |AGGREGATE
COUNT |AGGREGATE
FIRST |AGGREGATE
FIRST_VALUE |AGGREGATE
LAST |AGGREGATE
(4)SELECT FROM,获取索引中的部分数据
POST _sql?format=txt
{
"query":"SELECT clientip, url, response FROM test_logs ORDER BY timestamp DESC LIMIT 3"
}
clientip | url | response
---------------+-----------------------------------------------------------------------------------------+---------------
118.151.35.151 |https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-6.3.2.deb |200
99.76.103.49 |https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-6.3.2.deb |200
41.77.243.255 |https://elastic-elastic-elastic.org/people/type:astronauts/name:michael-s-hopkins/profile|200
上面通过4条语句,简单介绍了Elasticsearch SQL的基本查询,接下来将介绍更多的实用技巧。
2.3.2 全文搜索
Elasticsearch的全文检索功能,在Ealsticsearch SQL中可以通过两个语法实现: match, query。
(1)WHERE MATCH,实现全文搜索,效果同mulit_match。
# 查询 url 字段中包含 metricbeat的文档, score()函数返回相关性评分
POST _sql?format=txt
{
"query":"SELECT SCORE(), clientip, url, response FROM test_logs WHERE MATCH(url, 'metricbeat') ORDER BY timestamp DESC LIMIT 3"
}
SCORE() | clientip | url | response
---------------+---------------+----------------------------------------------------------------------------------+---------------
2.7142088 |74.184.0.64 |https://artifacts.elastic.co/downloads/beats/metricbeat/metricbeat-6.3.2-i686.rpm |200
2.7142088 |232.20.97.5 |https://artifacts.elastic.co/downloads/beats/metricbeat/metricbeat-6.3.2-amd64.deb|200
2.7142088 |9.16.178.89 |https://artifacts.elastic.co/downloads/beats/metricbeat/metricbeat-6.3.2-amd64.deb|404
(2)WHERE QUERY,实现全文搜索,效果同query_string。
POST _sql?format=txt
{
"query":"SELECT SCORE(), clientip, url, response FROM test_logs WHERE QUERY('url:metricbeat') ORDER BY timestamp DESC LIMIT 3"
}
SCORE() | clientip | url | response
---------------+---------------+----------------------------------------------------------------------------------+---------------
2.7142088 |74.184.0.64 |https://artifacts.elastic.co/downloads/beats/metricbeat/metricbeat-6.3.2-i686.rpm |200
2.7142088 |232.20.97.5 |https://artifacts.elastic.co/downloads/beats/metricbeat/metricbeat-6.3.2-amd64.deb|200
2.7142088 |9.16.178.89 |https://artifacts.elastic.co/downloads/beats/metricbeat/metricbeat-6.3.2-amd64.deb|404
2.3.3 深层分页
实际使用过程中,对于全文搜索后,得到的大量返回文档,经常需要分页展示。在Elasticsearch DSL中我们一般会使用 from + size / scroll / search_after 等方式来实现,Elasticsearch SQL中存在 fetch_size ,指定当前查询返回的文档数量,用于实现分页功能。
Elasticsearch SQL 中fetch_size,可以类比为Elasticsearch DSL中的 scroll(游标)
(1)第1次查询,指定 fetch_size = 10,表示获取 test_logs中的一个文档,如果返回值中有 cursor,表示还有更多满足条件的文档;
POST _sql?format=json
{
"query":"SELECT clientip, host, response FROM test_logs ORDER BY timestamp DESC",
"fetch_size": 10
}
{
"columns" : [
{
"name" : "clientip",
"type" : "ip"
},
{
"name" : "host",
"type" : "text"
},
{
"name" : "response",
"type" : "text"
}
],
"rows" : [
[
"118.151.35.151",
"artifacts.elastic.co",
"200"
]
...
],
"cursor" : "g+azAwFaAXN4RkdsdVkyeDFaR1ZmWTI5dWRHVjRkRjkxZFdsa0RYRjFaWEo1UVc1a1JtVjBZMmdCRm5ScVMzSjBRbEZ0VkRSNWJpMVBZMEYwUmpZME9GRUFBQUFBQUJVU2hCWkJaMlJ5Y0hOTU1WTnhRMUppUlZwVWFtNTViM2Qz/////w8DAWYIY2xpZW50aXABAmlwAAABZgRob3N0AQR0ZXh0AAABZghyZXNwb25zZQEEdGV4dAAAAQc="
}
(2)第2次查询,只需要添加前一次的 cursor
POST /_sql?format=json
{
"cursor" : "g+azAwFaAXN4RkdsdVkyeDFaR1ZmWTI5dWRHVjRkRjkxZFdsa0RYRjFaWEo1UVc1a1JtVjBZMmdCRm5ScVMzSjBRbEZ0VkRSNWJpMVBZMEYwUmpZME9GRUFBQUFBQUJVU2hCWkJaMlJ5Y0hOTU1WTnhRMUppUlZwVWFtNTViM2Qz/////w8DAWYIY2xpZW50aXABAmlwAAABZgRob3N0AQR0ZXh0AAABZghyZXNwb25zZQEEdGV4dAAAAQc="
}
{
"rows" : [
[
"41.77.243.255",
"elastic-elastic-elastic.org",
"200"
],
...
],
"cursor" : "g+azAwFaAXN4RkdsdVkyeDFaR1ZmWTI5dWRHVjRkRjkxZFdsa0RYRjFaWEo1UVc1a1JtVjBZMmdCRmpoSlRHMTFURUZVVVRoVFgxZ3dRMjV2TWxwcE9HY0FBQUFBQUJVZ2loWnlla3hUY1ZaYU0xTnlWeTF6UVVkRlVYQkNUWFJC/////w8DAWYIY2xpZW50aXABAmlwAAABZgRob3N0AQR0ZXh0AAABZghyZXNwb25zZQEEdGV4dAAAAQc="
}
(3)最后一次查询,返回结果中将不存在 cursor ,表示已经到了最后一页
POST /_sql?format=json
{
"cursor": "g+azAwFaAXN4RkdsdVkyeDFaR1ZmWTI5dWRHVjRkRjkxZFdsa0RYRjFaWEo1UVc1a1JtVjBZMmdCRmpoSlRHMTFURUZVVVRoVFgxZ3dRMjV2TWxwcE9HY0FBQUFBQUJVZ2loWnlla3hUY1ZaYU0xTnlWeTF6UVVkRlVYQkNUWFJC/////w8DAWYIY2xpZW50aXABAmlwAAABZgRob3N0AQR0ZXh0AAABZghyZXNwb25zZQEEdGV4dAAAAQc="
}
{
"columns" : [
{
"name" : "clientip",
"type" : "ip"
},
{
"name" : "host",
"type" : "text"
},
{
"name" : "response",
"type" : "text"
}
],
"rows" : [
[
"118.151.35.151",
"artifacts.elastic.co",
"200"
]
]
}
需要注意的是,
a. 如果查询指定返回结构是 txt / csv 等,将无法获取到 carsor;
b. 如果查询语句中包含有 LIMIT,fetch_size将无法工作,即无法分页。
2.4 SQL与DSL
2.4.1 SQL转换为DSL
Elasticsearch SQL提供了 translate接口,可以将SQL语法转换为对应的DSL语法。
通过查看DSL语法,可以更好的学习lasticsearch DSL;也可以通过转化结果,反过来判断SQL查询的正确性或者优化SQL查询。
POST _sql/translate
{
"query":"SELECT clientip, url, response FROM test_logs WHERE MATCH(url, 'metricbeat') ORDER BY timestamp DESC LIMIT 10"
}
{
"size" : 10,
"query" : {
"match" : {
"url" : {
"query" : "metricbeat",
"operator" : "OR",
"prefix_length" : 0,
"max_expansions" : 50,
"fuzzy_transpositions" : true,
"lenient" : false,
"zero_terms_query" : "NONE",
"auto_generate_synonyms_phrase_query" : true,
"boost" : 1.0
}
}
},
"_source" : false,
"fields" : [
{
"field" : "clientip"
},
{
"field" : "url"
},
{
"field" : "response"
}
],
"sort" : [
{
"timestamp" : {
"order" : "desc",
"missing" : "_first",
"unmapped_type" : "date"
}
}
]
}
2.4.2 SQL混合DSL语句
如果遇到过滤条件逻辑关系非常复杂,我们可以在SQL中 添加Elasticsearch DSL过滤条件。
POST /_sql?format=txt
{
"query": "SELECT clientip, url, response FROM test_logs WHERE MATCH(url, 'metricbeat') ORDER BY timestamp DESC LIMIT 3",
"filter": {
"range": {
"response": {
"gte": 200,
"lt": 300
}
}
}
}
clientip | url | response
---------------+----------------------------------------------------------------------------------+---------------
74.184.0.64 |https://artifacts.elastic.co/downloads/beats/metricbeat/metricbeat-6.3.2-i686.rpm |200
232.20.97.5 |https://artifacts.elastic.co/downloads/beats/metricbeat/metricbeat-6.3.2-amd64.deb|200
148.192.209.125|https://www.elastic.co/downloads/beats/metricbeat |200
注意此时filter中的过滤条件,与query中的WHERR 是逻辑AND关系
POST /_sql/translate
{
"query": "SELECT clientip, url, response FROM test_logs WHERE MATCH(url, 'metricbeat') ORDER BY timestamp DESC LIMIT 10",
"filter": {
"range": {
"response": {
"gte": 200,
"lt": 300
}
}
}
}
{
"size" : 3,
"query" : {
"bool" : {
"must" : [
{
"match" : {
"url" : {
"query" : "metricbeat",
"operator" : "OR",
"prefix_length" : 0,
"max_expansions" : 50,
"fuzzy_transpositions" : true,
"lenient" : false,
"zero_terms_query" : "NONE",
"auto_generate_synonyms_phrase_query" : true,
"boost" : 1.0
}
}
}
],
"filter" : [
{
"range" : {
"response" : {
"from" : 200,
"to" : 300,
"include_lower" : true,
"include_upper" : false,
"boost" : 1.0
}
}
}
],
"adjust_pure_negative" : true,
"boost" : 1.0
}
},
"_source" : false,
"fields" : [
{
"field" : "clientip"
},
{
"field" : "url"
},
{
"field" : "response"
}
],
"sort" : [
{
"timestamp" : {
"order" : "desc",
"missing" : "_first",
"unmapped_type" : "date"
}
}
]
}
2.4.3 SQL与DSL的区别
Elasticsearch SQL
2.5 聚合分析
Elasticsearch的聚合分析功能非常强大,在实际生产环境中,通过聚合分析再结合图表展示,我们可以实现许多实用功能,例如:查看网站访问IP分布情况,某些服务的P99延迟等等。
2.5.1 复合聚合
Elasticsearch SQL中提供了group by 语法用于桶聚合分析,对于 group by 语法,Elasticsearch内部将转换为复合聚合(composite aggregation)。
例如:下面的查询根据 返回状态码分类,统计对应的http 请求的body字节大小平均值 和请求次数。
POST _sql/translate
{
"query":"SELECT response, count(*) AS count_docs, avg(bytes) AS avg_bytes FROM test_logs GROUP BY response"
}
{
"size" : 0,
"_source" : false,
"aggregations" : {
"groupby" : {
"composite" : {
"size" : 1000,
"sources" : [
{
"565ba6a7" : {
"terms" : {
"field" : "response.keyword",
"missing_bucket" : true,
"order" : "asc"
}
}
}
]
},
"aggregations" : {
"945f09f1" : {
"avg" : {
"field" : "bytes"
}
}
}
}
}
}
POST _sql?format=txt
{
"query":"SELECT response, count(*) AS count_docs, avg(bytes) AS avg_bytes FROM test_logs GROUP BY response"
}
response | count_docs | avg_bytes
---------------+---------------+-----------------
200 |12832 |5897.852711970075
404 |801 |5049.242197253433
503 |441 |0.0
2.5.2 多层聚合
既然group by会被转化为复合聚合,我们可以通过 group by 多个字段,实现多层聚合。
例如:我们对测试数据(访问日志),先通过访问域名(host)分类,然后再根据返回状态码分类,统计对应的http 请求的body字节大小平均值 和请求次数。
POST _sql?format=txt
{
"query":"SELECT host, response, count(*) AS count_docs, avg(bytes) AS avg_bytes FROM test_logs GROUP BY host, response"
}
host | response | count_docs | avg_bytes
-------------------------------+---------------+---------------+-----------------
artifacts.elastic.co |200 |5950 |6449.373109243697
artifacts.elastic.co |404 |330 |6111.981818181818
artifacts.elastic.co |503 |208 |0.0
cdn.elastic-elastic-elastic.org|200 |2082 |5771.631123919308
cdn.elastic-elastic-elastic.org|404 |106 |5527.292452830188
cdn.elastic-elastic-elastic.org|503 |67 |0.0
elastic-elastic-elastic.org |200 |415 |4775.802409638554
elastic-elastic-elastic.org |404 |125 |2397.616
elastic-elastic-elastic.org |503 |12 |0.0
www.elastic.co |200 |4385 |5315.617331812999
www.elastic.co |404 |240 |4757.891666666666
www.elastic.co |503 |154 |0.0
2.5.3 直方图
在我们的使用场景中,经常需要对日期进行直方图聚合,Elasticsearch SQL中存在histogram函数,可以实现。
POST _sql?format=txt
{
"query":"SELECT host, HISTOGRAM(timestamp, INTERVAL 1 MONTH) as h, count(*) AS count_docs, avg(bytes) AS avg_bytes FROM test_logs GROUP BY host, h"
}
host | h | count_docs | avg_bytes
-------------------------------+------------------------+---------------+------------------
artifacts.elastic.co |2021-08-01T00:00:00.000Z|1077 |6146.069637883008
artifacts.elastic.co |2021-09-01T00:00:00.000Z|3150 |6234.906666666667
artifacts.elastic.co |2021-10-01T00:00:00.000Z|2261 |6250.088898717382
cdn.elastic-elastic-elastic.org|2021-08-01T00:00:00.000Z|374 |5678.467914438503
cdn.elastic-elastic-elastic.org|2021-09-01T00:00:00.000Z|1125 |5569.506666666667
cdn.elastic-elastic-elastic.org|2021-10-01T00:00:00.000Z|756 |5572.734126984127
elastic-elastic-elastic.org |2021-08-01T00:00:00.000Z|54 |4193.055555555556
elastic-elastic-elastic.org |2021-09-01T00:00:00.000Z|228 |4558.456140350877
elastic-elastic-elastic.org |2021-10-01T00:00:00.000Z|270 |3762.6185185185186
www.elastic.co |2021-08-01T00:00:00.000Z|821 |5346.6077953714985
www.elastic.co |2021-09-01T00:00:00.000Z|2340 |5128.597008547009
www.elastic.co |2021-10-01T00:00:00.000Z|1618 |4981.702101359703
2.5.4 聚合分页
在实际使用场景中,如果分桶过多,可能会出现消耗过多资源、查询响应超时、经典的桶太多报错(too many buckets)等情况。幸运的是,我们可以通过聚合分页,避免上面的查询问题,而通过Elasticsearch SQL的group by 和 fetch_size结合使用,聚合分页将变得十分简洁。
# 第1次查询,指定fetch_size=10
POST _sql?format=json
{
"query":"SELECT host, response, count(*) AS count_docs, avg(bytes) AS avg_bytes FROM test_logs GROUP BY host, response",
"fetch_size": 10
}
{
"columns" : [
{
"name" : "host",
"type" : "text"
},
{
"name" : "response",
"type" : "text"
},
{
"name" : "count_docs",
"type" : "long"
},
{
"name" : "avg_bytes",
"type" : "double"
}
],
"rows" : [
[
"artifacts.elastic.co",
"200",
5950,
6449.373109243697
],
...
],
"cursor" : "g+azAwFaAWMBCXRlc3RfbG9nc8YBAQEJY29tcG9zaXRlB2dyb3VwYnkBA2F2Zwg5NDVmMDlmMQAA/wEFYnl0ZXMAAAD/AAD/AgAIYzdhOThkYTABDGhvc3Qua2V5d29yZAAAAQAAAAg1NjViYTZhNwEQcmVzcG9uc2Uua2V5d29yZAAAAQAACgEKAghjN2E5OGRhMAAOd3d3LmVsYXN0aWMuY28INTY1YmE2YTcAAzIwMAACAQAAAAAA/////w8AAAAAAAAAAAAAAAABWgMAAgIAAAAAAAAAAAoA/////w8EAWsIYzdhOThkYTAAAAFrCDU2NWJhNmE3AAABawg1NjViYTZhNwEAAW0IOTQ1ZjA5ZjEFdmFsdWUAAQZET1VCTEUBDwA="
}
# 第2次查询(也即最后1次查询)
POST _sql?format=json
{
"cursor" : "g+azAwFaAWMBCXRlc3RfbG9nc8YBAQEJY29tcG9zaXRlB2dyb3VwYnkBA2F2Zwg5NDVmMDlmMQAA/wEFYnl0ZXMAAAD/AAD/AgAIYzdhOThkYTABDGhvc3Qua2V5d29yZAAAAQAAAAg1NjViYTZhNwEQcmVzcG9uc2Uua2V5d29yZAAAAQAACgEKAghjN2E5OGRhMAAOd3d3LmVsYXN0aWMuY28INTY1YmE2YTcAAzIwMAACAQAAAAAA/////w8AAAAAAAAAAAAAAAABWgMAAgIAAAAAAAAAAAoA/////w8EAWsIYzdhOThkYTAAAAFrCDU2NWJhNmE3AAABawg1NjViYTZhNwEAAW0IOTQ1ZjA5ZjEFdmFsdWUAAQZET1VCTEUBDwA="
}
2.5.4 管道聚合
实际使用过程中,我们经常需要对聚合结果,再次过滤,而通过管道聚合可以轻松实现。在Elasticsearch SQL中,提供了having 语法,实现相同的管道聚合功能。
例如:我们先根据host,response分类,然后计算出每个分类的请求次数,还需要过滤count_docs <= 1000的分类。
POST _sql?format=txt
{
"query":"SELECT host, response, count(*) AS count_docs, avg(bytes) AS avg_bytes FROM test_logs GROUP BY host, response HAVING count_docs > 1000"
}
host | response | count_docs | avg_bytes
-------------------------------+---------------+---------------+-----------------
artifacts.elastic.co |200 |5950 |6449.373109243697
cdn.elastic-elastic-elastic.org|200 |2082 |5771.631123919308
www.elastic.co |200 |4385 |5315.617331812999
2.6 常见错误
2.6.1 数组类型字段无法展示
# tags字段值是一个字符串类型
GET test_logs/_search
{
"size": 1,
"_source": ["tags", "url"]
}
{
"took" : 3,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 10000,
"relation" : "gte"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "kibana_sample_data_logs",
"_type" : "_doc",
"_id" : "YcHfn3sBLoQsaVEnrdd5",
"_score" : 1.0,
"_source" : {
"url" : "https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-6.3.2.deb_1",
"tags" : [
"success",
"info"
]
}
}
]
}
}
# 无法select tags字段
POST _sql?format=txt
{
"query":"SELECT tags,url FROM test_log"
}
{
"error" : {
"root_cause" : [
{
"type" : "verification_exception",
"reason" : "Found 1 problem\nline 1:22: Unknown index [test_log]"
}
],
"type" : "verification_exception",
"reason" : "Found 1 problem\nline 1:22: Unknown index [test_log]"
},
"status" : 400
}
2.6.2
三、SQL语言
3.1 解析过程
3.2 组成部分
SELECT [TOP [ count ] ] select_expr [, ...]
[ FROM table_name ]
[ WHERE condition ]
[ GROUP BY grouping_element [, ...] ]
[ HAVING condition]
[ ORDER BY expression [ ASC | DESC ] [, ...] ]
[ LIMIT [ count ] ]
[ PIVOT ( aggregation_expr FOR column IN ( value [ [ AS ] alias ] [, ...] ) ) ]
上面是Elasticsearch SQL查询语句的语法规范,接下来将展示一些常见的组成部分,更加详细的信息可以查看官方文档。
3.2.1 查询语句
查询语句 | 语句描述 | 示例 |
---|---|---|
DESC | 描述具体索引字段列表 | DESC test_logs |
SHOW | 返回集群中索引列表,函数列表,索引字段列表 | SHOW TABLES SHOW FUNCTIONS SHOW COLUMNS FROM test_logs |
SELECT | 指定查询返回的列 | SELECT url FROM test_logs |
FROM | 指定查询索引 | SELECT url FROM test_logs |
WHERE | 指定查询的过滤条件 | SELECT url FROM test_logs WHERE response = 200 |
GROUP BY | 指定聚合 | SELECT host, response, count(*) AS count_docs FROM test_logs GROUP BY host |
ORDER BY | 指定排序 | SELECT SCORE(), url FROM test_logs WHERE MATCH(url, ‘filebeat’) ORDER BY SCORE() DESC |
LIMIT | 指定返回数量 | SELECT SCORE(), url FROM test_logs WHERE MATCH(url, ‘filebeat’) ORDER BY SCORE() DESC LIMIT 10 |
HAVING | 指定管道聚合 | SELECT host, response, count(*) AS count_docs, avg(bytes) AS avg_bytes FROM test_logs GROUP BY host, response HAVING count_docs > 1000” |
… |
注意,上面的查询语句组成部分有序,如果组成顺序不正确,查询语句解析阶段将会报错。
3.3 支持函数
函数类型 | 函数名称 | 函数描述 |
---|---|---|
数学计算 | ABS,LOG,RANDOM,SIN,EXP, … | 一些数值计算函数 |
全文搜索 | MATCH, QUERY, SCOER | MATCH函数将转化为 multi_match query QUERY函数将转化为 query_string SCORE函数将返回 文档的相关性评分 |
直方图聚合 | HISTOGRAM | 直方图聚合,一般用于日期分组 |
聚合度量函数 | COUNT, COUNT(ALL), COUNT(DISTINCT) | 统计数量 |
AVG, MIN, MAX, SUM | 数值计算:平均值、最值、求和 | |
FIRST,LAST | 第一个或者最后一个值 | |
PERCENTILE | 百分位计算 | |
类型转化 | CAST | 字段类型转化,例如CAST(‘123’ AS INT) |
… |
四、局限与展望
4.1 局限
内容:(1)只能查询,无法添加、更新、删除;(2)查询的局限(注意部分是DSL也存在的问题)
简述 | 详细描述 | 示例 |
---|---|---|
不支持join查询 | 可以使用nested 类型字段 | |
(实际)不支持子查询 |
4.2 展望
内容:可能会有哪些改进,对比其他引擎的sql语言,例如:Solr SQL, Flink SQL等
五、参考
官方文档SQL Elasticsearch SQL 源码 Elasticsearch SQL 社区问题
中文社区: 玩转 Elasticsearch 的 SQL 功能
官方博客: Elasticsearch SQL 简介(附练习示例)— 第 1 部分
官方博客: Elasticsearch SQL 简介(附示例示例)- 第 2 部分