概述

:::info 您可以使用搜索API搜索和聚合存储在Elasticsearch数据流或索引中的数据。API的查询请求主体参数接受在查询DSL中编写的查询。
关于Search API见:速查手册06-REST APIs
关于聚合见:速查手册14-聚合
关于QUERY DSL见:速查手册07-QUERY DSL ::: :::info 下面针对搜索数据操作做一些探究。 :::

折叠搜索结果

聚合结果

  1. GET my-index-000001/_search
  2. {
  3. "query": {
  4. "match_all": {}
  5. },
  6. "collapse": {
  7. "field": "test2.long_2" #1
  8. },
  9. "sort": [
  10. {
  11. "test2.long_2": { #2
  12. "order": "desc"
  13. }
  14. }
  15. ],
  16. "from": 0 #3
  17. }
  18. explain:
  19. #1 折叠的字段:test2.long_2
  20. #2 使用test2.long_2排序
  21. #3 定义第一个折叠时的偏移量

:::tips

  • 折叠仅适用于keyword或者number类型
  • 返回中hits.total.value 代表聚合前的数据量,聚合后的数据总数未知
  • 如果想知道折叠后的结果,可通过cardinality聚合实现,前提:cardinality和折叠,使用同一个字段,eg:例2 #1 #2 ::: 例2: ```java GET test/_search { “_source”: [ “id”, “is_lastest”, “second”, “name” ], “aggs”: { “totalSize”: {
    "cardinality": {                #1
      "field": "second_id"
    }
    
    } }, “query”: { “terms”: {
    "second_id": [
      "2208",
      "196158"
    ]
    
    } }, “collapse”: { #6 第一级 “field”: “second_id”, #2 “inner_hits”: { #3
    "name": "by_location",
    "collapse": {                     #4  第二级
      "field": "name"
    },
    "_source": [
      "id",
      "is_lastest",
      "second",
      "name"
    ],
    "size": 3
    
    } }, “sort”: [ {
    "second_id": {
      "order": "desc"
    }
    
    } ], “from”: 0 }

输出结果: { “took” : 42, “timed_out” : false, “_shards” : { “total” : 1, “successful” : 1, “skipped” : 0, “failed” : 0 }, “hits” : { “total” : { “value” : 62, “relation” : “eq” }, “max_score” : null, “hits” : [ { “_index” : “test”, “_type” : “_doc”, “_id” : “53421”, “_score” : null, “_source” : { “name” : “xh2107091117”, “id” : 45897 }, “fields” : { “second_id” : [ 196158 ] }, “sort” : [ 196158 ], “inner_hits” : { “by_location” : { “hits” : { “total” : { “value” : 3, “relation” : “eq” }, “max_score” : null, “hits” : [ { “_index” : “test”, “_type” : “_doc”, “_id” : “53421”, “_score” : 1.0, “_source” : { “name” : “xh2107091117”, “id” : 45897 }, “fields” : { “name” : [ “xh2107091117” ] } }, { “_index” : “test”, “_type” : “_doc”, “_id” : “53371”, “_score” : 1.0, “_source” : { “name” : “testName”, “id” : 45868 }, “fields” : { “name” : [ “testName” ] } }, { “_index” : “test”, “_type” : “_doc”, “_id” : “53390”, “_score” : 1.0, “_source” : { “name” : “text”, “id” : 45873 }, “fields” : { “name” : [ “text” ] } } ] } } } }, { “_index” : “test”, “_type” : “_doc”, “_id” : “50735”, “_score” : null, “_source” : { “name” : “apk”, “id” : 380 }, “fields” : { “second_id” : [ 2208 ] }, “sort” : [ 2208 ], “inner_hits” : { “by_location” : { “hits” : { “total” : { “value” : 59, “relation” : “eq” }, “max_score” : null, “hits” : [ { “_index” : “test”, “_type” : “_doc”, “_id” : “50735”, “_score” : 1.0, “_source” : { “name” : “apk”, “id” : 380 }, “fields” : { “name” : [ “apk” ] } }, { “_index” : “test”, “_type” : “_doc”, “_id” : “44901”, “_score” : 1.0, “_source” : { “name” : “testName2”, “id” : 43066 }, “fields” : { “name” : [ “testName2” ] } }, { “_index” : “test”, “_type” : “_doc”, “_id” : “51399”, “_score” : 1.0, “_source” : { “name” : “testName3”, “id” : 43066 }, “fields” : { “name” : [ “testName3” ] } } ] } } } } ] }, “aggregations” : { “totalSize” : { “value” : 1 } } }

<a name="CfETO"></a>
### 折叠后的topN
:::info
如果想获取折叠后,同值字段的前几条数据,可通过inner_hits获取。见例2  #3
:::
<a name="B2Zw4"></a>
### 和search_after搭配使用
例3
```java
GET /my-index-000001/_search
{
  "query": {
    "match": {
      "message": "GET /search"
    }
  },
  "collapse": {
    "field": "user.id"
  },
  "sort": [ "user.id" ],
  "search_after": ["dd5ce1ad"]
}

二级折叠不支持inner_hits

这句的意思是,看例2#4和#6处,这里的collapse位于#6的下一级,如果这里再定义inner_hits,例如:
例4

"collapse": {
    "field": "second_id",
    "inner_hits": {
      "name": "by_location",
      "collapse": {
        "field": "name",
        "inner_hits": {   #1
          "name": "by_location"
        }
      },
      "_source": [
        "id",
        "is_lastest",
        "second",
        "name"
      ],
      "size": 3
    }
  },
# 由于#1处,inner_hits处于第二级collapse下,则报错如下
{
  "error": {
    "root_cause": [
      {
        "type": "parsing_exception",
        "reason": "Invalid token in the inner collapse",
        "line": 28,
        "col": 25
      }
    ],
    "type": "x_content_parse_exception",
    "reason": "[28:25] [collapse] failed to parse field [inner_hits]",
    "caused_by": {
      "type": "x_content_parse_exception",
      "reason": "[28:25] [inner_hits] failed to parse field [collapse]",
      "caused_by": {
        "type": "parsing_exception",
        "reason": "Invalid token in the inner collapse",
        "line": 28,
        "col": 25
      }
    }
  },
  "status": 400
}

筛选搜索结果

post_filter

GET /shirts/_search
{
  "query": {
    "bool": {
      "filter": {
        "term": { "brand": "gucci" } 
      }
    }
  },
  "aggs": {
    "colors": {
      "terms": { "field": "color" } 
    },
    "color_red": {
      "filter": {
        "term": { "color": "red" } 
      },
      "aggs": {
        "models": {
          "terms": { "field": "model" } 
        }
      }
    }
  },
  "post_filter": { 
    "term": { "color": "red" }
  }
}
# post_filter是对聚合后的结果再次进行过滤,性能消耗较大,且不适用缓存,看场景而用。

query rescorer

:::info 可用于优化match_phase ::: query rescorer仅对query和post_filter阶段返回的Top-K结果执行第二个查询。每个分片上将检查的文档数量可以通过window_size参数控制,该参数默认为10。 默认情况下,原始查询和rescore查询的分数线性组合,以生成每个文档的最终_score。可以分别使用query_weight和rescore_query_weight来控制原始查询和rescore查询的相对重要性。两者都默认为

POST /_search
{
   "query" : {
      "match" : {
         "message" : {
            "operator" : "or",
            "query" : "the quick brown"
         }
      }
   },
   "rescore" : {
      "window_size" : 50,
      "query" : {
         "rescore_query" : {
            "match_phrase" : {
               "message" : {
                  "query" : "the quick brown",
                  "slop" : 2
               }
            }
         },
         "query_weight" : 0.7,
         "rescore_query_weight" : 1.2
      }
   }
}

可以使用score_mode控制分数组合的方式:

分数模式 描述
total 添加原始分数和rescore查询分数。默认值。
multiply 将原始分数乘以rescore查询分数。对function query重新调整有用。
avg 平均原始分数和rescore查询分数。
max 取最大原始分数和rescore查询分数。
min 取最初得分和rescore查询得分的分钟。

Multiple Rescores

也可以按顺序执行多个重新扫描:

POST /_search
{
   "query" : {
      "match" : {
         "message" : {
            "operator" : "or",
            "query" : "the quick brown"
         }
      }
   },
   "rescore" : [ {
      "window_size" : 100,
      "query" : {
         "rescore_query" : {
            "match_phrase" : {
               "message" : {
                  "query" : "the quick brown",
                  "slop" : 2
               }
            }
         },
         "query_weight" : 0.7,
         "rescore_query_weight" : 1.2
      }
   }, {
      "window_size" : 10,
      "query" : {
         "score_mode": "multiply",
         "rescore_query" : {
            "function_score" : {
               "script_score": {
                  "script": {
                    "source": "Math.log10(doc.likes.value + 2)"
                  }
               }
            }
         }
      }
   } ]
}

突出显示(高亮)

简单示例

GET /_search
{
  "query": {
    "match": { "content": "kimchy" }
  },
  "highlight": {
    "fields": {
      "content": {}
    }
  }
}
GET ljh_text/_search
{
  "query": {
    "match": {
      "content": {
        "query": "父亲"
      }
    }
  },
  "rescore": {
    "window_size": 50,
    "query": {
      "rescore_query": {
        "match_phrase": {
          "content": {
            "query": "父亲",
            "slop": 1
          }
        }
      },
      "rescore_query_weight": 10
    }
  },
  "_source": false,
  "highlight": {
    "order": "score",
    "tags_schema":"styled",
    "fields": {
      "content": {
         "type": "unified",
        "fragment_size": 150,
        "number_of_fragments": 3,
        "highlight_query": {
          "bool": {
            "must": {
              "match": {
                "content": {
                  "query": "父亲"
                }
              }
            },
            "should": {
              "match_phrase": {
                "content": {
                  "query": "父亲",
                  "slop": 1,
                  "boost": 10.0
                }
              }
            },
            "minimum_should_match": 0
          }
        }
      }
    }
  }
}

参数使用示例

以上示例参数解释

参数 含义 备注
#1 no_match_size 即使字段中没有关键字命中,也可以返回一段文字,该参数表示从开始多少个字符被返回。
#2 encoder 转义HTML,默认default,不转义 比如设置为html,
会被转义成:<div>
#3 number_of_fragment 返回结果最多可以包含几段不连续的文字。默认是5。
#4 fragment_size 返回的每个片段,文本长度,默认150
#5 pre_tags,post_tags 标记 highlight 的开始,结束 标签
#6 order 文本中,片段是否按分数排序 如果有需求要求更匹配的段落优先展示,可设置
highlight—>type 使用的高亮模式:unified, plain, or fvh. 默认为 unified。
highlight->tags_schema 设置为使用内置标记模式的样式。 设置后,返回数据变成”””包着的数据

高亮优化

高亮查询渲染所有高亮数据,性能消耗较大,可通过rescore优化

GET ljh_text/_search
{
  "query": {
    "match": {
      "content": {
        "query": "父亲"
      }
    }
  },
  "rescore": {
    "window_size": 50,
    "query": {
      "rescore_query": {
        "match_phrase": {
          "content": {
            "query": "父亲",
            "slop": 1
          }
        }
      },
      "rescore_query_weight": 10
    }
  },
  "_source": false,
  "highlight": {
    "order": "score",
    "fields": {
      "content": {
        "fragment_size": 150,
        "number_of_fragments": 3,
        "highlight_query": {
          "bool": {
            "must": {
              "match": {
                "content": {
                  "query": "父亲"
                }
              }
            },
            "should": {
              "match_phrase": {
                "content": {
                  "query": "父亲",
                  "slop": 1,
                  "boost": 10.0
                }
              }
            },
            "minimum_should_match": 0
          }
        }
      }
    }
  }
}

长时间运行的搜索

:::info Elasticsearch 7.7 版本带来一个新的特性,search 过程允许异步执行,客户端发送完 search 请求后,Elasticsearch 服务端给客户端返回一个 id,以后客户端拿这个 id 来获取 search进度,并且支持返回“部分”结果,这对于 UI 交互相关的查询请求非常友好,例如绘图过程可以逐步的显示出来。使用示例 :::

近实时搜索

写入流程

image.png :::info 1:数据写入Lucene内存和translog
2:经过设置的refresh时间,默认1秒,Lucene内存转segment,reopen,此时可搜索
3:经过设置的flush时间,默认30分钟,内存中的segment写磁盘,历史translog清空,完成一次刷新。
由于经过1秒才从Lucene内存转到可搜索的segment,所以es是近实时的搜索 :::


分页搜索结果

方式1:from+size

GET /_search
{
  "from": 5,
  "size": 20,
  "query": {
    "match": {
      "user.id": "kimchy"
    }
  }
}

方式2:scroll_id滚动搜索

:::tips

不再建议使用卷轴 API 进行深度填充。如果需要在超过 10,000 次点击时保持索引状态,请使用具有时间点 (PIT)search_after参数。 :::

POST /my-index-000001/_search?scroll=1m
{
  "size": 100,
  "query": {
    "match": {
      "message": "foo"
    }
  }
}

POST /_search/scroll                                                               
{
  "scroll" : "1m",                                                                 
  "scroll_id" : "DXF1ZXJ5QW5kRmV0Y2gBAAAAAAAAAD4WYm9laVYtZndUQlNsdDcwakFMNjU1QQ==" 
}

# 清除滚动
DELETE /_search/scroll
{
  "scroll_id" : "DXF1ZXJ5QW5kRmV0Y2gBAAAAAAAAAD4WYm9laVYtZndUQlNsdDcwakFMNjU1QQ=="
}

方式3:search_after

GET /_search
{
  "size": 10000,
  "query": {
    "match" : {
      "user.id" : "elkbee"
    }
  },
  "pit": {
    "id":  "46ToAwMDaWR5BXV1aWQyKwZub2RlXzMAAAAAAAAAACoBYwADaWR4BXV1aWQxAgZub2RlXzEAAAAAAAAAAAEBYQADaWR5BXV1aWQyKgZub2RlXzIAAAAAAAAAAAwBYgACBXV1aWQyAAAFdXVpZDEAAQltYXRjaF9hbGw_gAAAAA==", 
    "keep_alive": "1m"
  },
  "sort": [ 
    {"@timestamp": {"order": "asc", "format": "strict_date_optional_time_nanos"}},
    {"_shard_doc": "desc"}
  ]
}

返回
{
  "pit_id" : "46ToAwMDaWR5BXV1aWQyKwZub2RlXzMAAAAAAAAAACoBYwADaWR4BXV1aWQxAgZub2RlXzEAAAAAAAAAAAEBYQADaWR5BXV1aWQyKgZub2RlXzIAAAAAAAAAAAwBYgACBXV1aWQyAAAFdXVpZDEAAQltYXRjaF9hbGw_gAAAAA==", 
  "took" : 17,
  "timed_out" : false,
  "_shards" : ...,
  "hits" : {
    "total" : ...,
    "max_score" : null,
    "hits" : [
      ...
      {
        "_index" : "my-index-000001",
        "_id" : "FaslK3QBySSL_rrj9zM5",
        "_score" : null,
        "_source" : ...,
        "sort" : [                                
          "2021-05-20T05:30:04.832Z",
          4294967298                              
        ]
      }
    ]
  }
}

再次请求
GET /_search
{
  "size": 10000,
  "query": {
    "match" : {
      "user.id" : "elkbee"
    }
  },
  "pit": {
    "id":  "46ToAwMDaWR5BXV1aWQyKwZub2RlXzMAAAAAAAAAACoBYwADaWR4BXV1aWQxAgZub2RlXzEAAAAAAAAAAAEBYQADaWR5BXV1aWQyKgZub2RlXzIAAAAAAAAAAAwBYgACBXV1aWQyAAAFdXVpZDEAAQltYXRjaF9hbGw_gAAAAA==", 
    "keep_alive": "1m"
  },
  "sort": [
    {"@timestamp": {"order": "asc", "format": "strict_date_optional_time_nanos"}}
  ],
  "search_after": [                                
    "2021-05-20T05:30:04.832Z",
    4294967298
  ],
  "track_total_hits": false                        
}


检索内部命中 inner_hits

:::info 用于join的父子类型和nested嵌套查询时,返回具体匹配的那个对象
inner_hits中支持的参数:from,size,sort,name :::

PUT test
{
  "mappings": {
    "properties": {
      "comments": {
        "type": "nested"
      }
    }
  }
}

PUT test/_doc/1?refresh
{
  "title": "Test title",
  "comments": [
    {
      "author": "kimchy",
      "number": 1
    },
    {
      "author": "nik9000",
      "number": 2
    }
  ]
}

POST test/_search
{
  "query": {
    "nested": {
      "path": "comments",
      "query": {
        "match": { "comments.number": 2 }
      },
      "inner_hits": {} 
    }
  }
}


检索所选字段


跨群集搜索


搜索多个数据流和索引


搜索分片路由


搜索模板


对搜索结果排序

参考:↓
博文:https://www.cnblogs.com/gmhappy/p/11864048.html
官网:https://www.elastic.co/guide/en/elasticsearch/reference/7.x/search-your-data.html