概述
:::info Elasticsearch SQL是一个X-Pack组件,允许针对Elasticsearch实时执行类似SQL的查询。无论是使用REST接口、命令行还是JDBC,任何客户端都可以使用SQL在Elasticsearch中本机搜索和聚合数据。人们可以将Elasticsearch SQL视为一个翻译器,它能够理解SQL和Elasticsearch,并且通过利用Elasticsearch功能,可以方便地实时读取和处理数据。 :::
从SQL开始
#1 新建数据测试
PUT /library/_bulk?refresh
{"index":{"_id": "Leviathan Wakes"}}
{"name": "Leviathan Wakes", "author": "James S.A. Corey", "release_date": "2011-06-02", "page_count": 561}
{"index":{"_id": "Hyperion"}}
{"name": "Hyperion", "author": "Dan Simmons", "release_date": "1989-05-26", "page_count": 482}
{"index":{"_id": "Dune"}}
{"name": "Dune", "author": "Frank Herbert", "release_date": "1965-06-01", "page_count": 604}
# 查询
POST /_sql?format=txt
{
"query": "SELECT * FROM library WHERE release_date < '2000-01-01'"
}
# 结果
author | name | page_count | release_date
---------------+---------------+---------------+------------------------
Dan Simmons |Hyperion |482 |1989-05-26T00:00:00.000Z
Frank Herbert |Dune |604 |1965-06-01T00:00:00.000Z
公约和术语
SQL语义和ES映射关系
SQL | ES | 说明 |
---|---|---|
column | field | 列 VS 字段 |
row | document | 行 VS 一条文档 |
table | index | 表 VS 索引名称 |
schema | implicit | 在 RDBMS 中,schema主要是表的命名空间,通常用作安全边界。弹性搜索不提供等效的概念。但是,当启用安全性时,弹性搜索会自动应用安全执行,以便角色只看到允许使用的数据(用 SQL 行话、其模式)). |
catalog or database | cluster instance | 在 SQL 中,可互换使用catalog或database,并代表一组架构,即多个表。在弹性搜索中,可用的一组指数被分组在一个cluster中。语义也略有不同:database本质上是另一个命名空间(这可能对数据存储方式产生一些影响),而弹性搜索cluster是一个运行时实例,或者更确切地说,是至少一组弹性搜索实例(通常运行分布式)。在实践中,这意味着在 SQL 中,一个实例中可能具有多个目录,而在弹性搜索中,则只能有一个目录. |
cluster | cluster | 统上,在 SQL 中,集群是指包含多个catalog或database的单个 RDMBS 实例(见上文)。同一个词也可以在弹性搜索中重复使用,但是它的语义澄清了一点。 虽然 RDBMS 往往只有一个运行实例,但在单台机器(未分布式)上,弹性搜索走相反的路,默认情况下,是分布式和多实例。 此外,弹性搜索cluster可以以联合方式连接到其他cluster因此cluster意味着: 单个聚类::多个弹性搜索实例通常分布在计算机之间,在同一命名空间内运行。多个组集::多个组集,每个组都有自己的命名空间,在联网设置中相互连接(参见跨集群搜索)). |
SQL REST API
1:数据返回格式
:::info
csv,json,tsv,txt,yaml,cbor,smile
:::
其中CSV格式接受格式 URL 查询属性,即delimiter,该属性指示应使用哪个字符来分离 CSV 值。它默认逗号,不能采取以下任何值: 双报价”运输返回\r和新行\n。选项卡\t也不能使用,需要改用tsv格式。
例如:(以下参数‘delimiter’ 在7.1.1版本下并不支持)
POST /_sql?format=csv&delimiter=%3b
{
"query": "SELECT * FROM library ORDER BY page_count DESC",
"fetch_size": 5
}
2:SQL和DSL混用
POST /_sql?format=txt
{
"query": "SELECT * FROM library ORDER BY page_count DESC",
"filter": {
"range": {
"page_count": {
"gte" : 100,
"lte" : 200
}
}
},
"fetch_size": 5
}
:::info 有一个功能很有用但是不常用,当指定routing时,MySQL并不能指定routing的查询,这时可用DSL支持。 :::
POST /_sql?format=txt
{
"query": "SELECT * FROM library",
"filter": {
"terms": {
"_routing": ["abc"]
}
}
}
3:使用分页,并在后续查询中,只展示结果(7.1.1中并不支持)
# 第一次查询
POST /_sql?format=json
{
"query": "SELECT * FROM library ORDER BY page_count DESC",
"fetch_size": 5,
"columnar": true
}
#结果
{
"columns": [
{"name": "author", "type": "text"},
{"name": "name", "type": "text"},
{"name": "page_count", "type": "short"},
{"name": "release_date", "type": "datetime"}
],
"values": [
["Peter F. Hamilton", "Vernor Vinge", "Frank Herbert", "Alastair Reynolds", "James S.A. Corey"],
["Pandora's Star", "A Fire Upon the Deep", "Dune", "Revelation Space", "Leviathan Wakes"],
[768, 613, 604, 585, 561],
["2004-03-02T00:00:00.000Z", "1992-06-01T00:00:00.000Z", "1965-06-01T00:00:00.000Z", "2000-03-15T00:00:00.000Z", "2011-06-02T00:00:00.000Z"]
],
"cursor": "sDXF1ZXJ5QW5kRmV0Y2gBAAAAAAAAAAEWWWdrRlVfSS1TbDYtcW9lc1FJNmlYdw==:BAFmBmF1dGhvcgFmBG5hbWUBZgpwYWdlX2NvdW50AWYMcmVsZWFzZV9kYXRl+v///w8="
}
# 第二次查询
POST /_sql?format=json
{
"cursor": "sDXF1ZXJ5QW5kRmV0Y2gBAAAAAAAAAAEWWWdrRlVfSS1TbDYtcW9lc1FJNmlYdw==:BAFmBmF1dGhvcgFmBG5hbWUBZgpwYWdlX2NvdW50AWYMcmVsZWFzZV9kYXRl+v///w8=",
"columnar": true
}
# 结果 ,没有字段数据了
{
"values": [
["Dan Simmons", "Iain M. Banks", "Neal Stephenson", "Frank Herbert", "Frank Herbert"],
["Hyperion", "Consider Phlebas", "Snow Crash", "God Emperor of Dune", "Children of Dune"],
[482, 471, 470, 454, 408],
["1989-05-26T00:00:00.000Z", "1987-04-23T00:00:00.000Z", "1992-06-01T00:00:00.000Z", "1981-05-28T00:00:00.000Z", "1976-04-21T00:00:00.000Z"]
],
"cursor": "46ToAwFzQERYRjFaWEo1UVc1a1JtVjBZMmdCQUFBQUFBQUFBQUVXWjBaNlFXbzNOV0pVY21Wa1NUZDJhV2t3V2xwblp3PT3/////DwQBZgZhdXRob3IBBHRleHQAAAFmBG5hbWUBBHRleHQAAAFmCnBhZ2VfY291bnQBBGxvbmcBAAFmDHJlbGVhc2VfZGF0ZQEIZGF0ZXRpbWUBAAEP"
}
4:参数传递
例如,通过将值集成到查询字符串本身中,可以在查询状态或HAVING”语句中使用值:
:::
POST /_sql?format=txt
{
"query": "SELECT YEAR(release_date) AS year FROM library WHERE page_count > 300 AND author = 'Frank Herbert' GROUP BY year HAVING COUNT(*) > 0"
}
:::info 或者可以通过提取单独的参数列表中的值和在查询字符串中使用问号占位符?) 来完成 :::
POST /_sql?format=txt
{
"query": "SELECT YEAR(release_date) AS year FROM library WHERE page_count > ? AND author = ? GROUP BY year HAVING COUNT(*) > ?",
"params": [300, "Frank Herbert", 0]
}
5:在运行时字段中使用(7.1.1暂不支持runtime_field)
POST _sql?format=txt
{
"runtime_mappings": {
"release_day_of_week": {
"type": "keyword",
"script": """
emit(doc['release_date'].value.dayOfWeekEnum.toString())
"""
}
},
"query": """
SELECT * FROM library WHERE page_count > 300 AND author = 'Frank Herbert'
"""
}
SQL Translte API(SQL翻译成DSL)
:::info 对于一时想不起DSL如何写,有时会有帮助 :::
POST /_sql/translate
{
"query": "SELECT id,name FROM entity_sku where id = 1 ORDER BY id DESC",
"fetch_size": 10
}
#输出
{
"size" : 10,
"query" : {
"term" : {
"id" : {
"value" : 1,
"boost" : 1.0
}
}
},
"_source" : false,
"stored_fields" : "_none_",
"docvalue_fields" : [
{
"field" : "id"
},
{
"field" : "name"
}
],
"sort" : [
{
"id" : {
"order" : "desc",
"missing" : "_first",
"unmapped_type" : "long"
}
}
]
}
SQL CLI
#1 本机无密码
./bin/elasticsearch-sql-cli
#2 指定ES服务器,无密码
./bin/elasticsearch-sql-cli https://some.server:9200
#3 指定ES服务且有密码
./bin/elasticsearch-sql-cli https://sql_user:strongpassword@some.server:9200
#4 登录控制台之后,进行相应的查询:
sql> SELECT * FROM library WHERE page_count > 500 ORDER BY page_count DESC;
author | name | page_count | release_date
-----------------+--------------------+---------------+---------------
Peter F. Hamilton|Pandora's Star |768 |1078185600000
Vernor Vinge |A Fire Upon the Deep|613 |707356800000
Frank Herbert |Dune |604 |-144720000000