ES: Implementing Search with DSL and RestClient - 《JAVA综合》

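Common query DSL and the corresponding RestClient code:
Compound queries (function_score / bool):
- function_score: the condition on each score function must be a filter
- bool query: combines several kinds of conditions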

Elasticsearch compared with a MySQL database: an index corresponds to a table, each document corresponds to a row, and each field corresponds to a column.

// In a test class
private RestHighLevelClient client;

@BeforeEach
void setUp() {
    this.client = new RestHighLevelClient(RestClient.builder(
            HttpHost.create("http://121.36.164.132:9200")
    ));
}

@AfterEach
void tearDown() throws IOException {
    this.client.close();
}

// Alternatively, register the client as a bean in the application's startup class
@Bean
public RestHighLevelClient client() {
    return new RestHighLevelClient(RestClient.builder(
            HttpHost.create("http://121.36.164.132:9200")
    ));
}

Basic queries:

Query everything: match_all, a query with no conditions

Corresponding Java code:

@Test
void testMatchAll() throws IOException {
    // 1. Create the request
    SearchRequest request = new SearchRequest("hotel");
    // 2. Build the query conditions
    request.source().query(QueryBuilders.matchAllQuery());
    // 3. Send the request
    SearchResponse response = client.search(request, RequestOptions.DEFAULT);
    // 4. Parse the response
    handleResponse(response);
}

Single-field query: match, which searches exactly one field

@Test
void testMatch() throws IOException {
    // 1. Create the request
    SearchRequest request = new SearchRequest("hotel");
    // 2. Match the keyword against the "all" field
    request.source().query(QueryBuilders.matchQuery("all", "外滩"));
    // 3. Send the request
    SearchResponse response = client.search(request, RequestOptions.DEFAULT);
    // 4. Parse the response
    handleResponse(response);
}

Multi-field query: multi_match

@Test
void testMultiMatch() throws IOException {
    // 1. Create the request
    SearchRequest request = new SearchRequest("hotel");
    // 2. Match the keyword against both "name" and "business"
    request.source().query(QueryBuilders.multiMatchQuery(
            "外滩如家", "name", "business"
    ));
    // 3. Send the request
    SearchResponse response = client.search(request, RequestOptions.DEFAULT);
    // 4. Parse the response
    handleResponse(response);
}

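Exact queries:

A term query should target a keyword field (or another non-analyzed type): it is an exact match, so the search value is not tokenized.

A range query uses these operators: gte (greater than or equal), gt (greater than), lte (less than or equal), lt (less than).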
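To make the keyword requirement for term queries concrete, here is a toy model in plain Java. It is only a sketch: the class `TermVsMatchSketch` and its methods are hypothetical, and the whitespace-and-lowercase "analyzer" stands in for whatever analyzer the index really uses.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Toy model of indexed terms; not how Elasticsearch is implemented.
final class TermVsMatchSketch {

    // A keyword field indexes the whole value as a single term.
    static Set<String> indexAsKeyword(String value) {
        Set<String> terms = new HashSet<>();
        terms.add(value);
        return terms;
    }

    // A text field indexes the tokens produced by an analyzer
    // (assumption: lowercase + split on whitespace).
    static Set<String> analyze(String value) {
        return new HashSet<>(Arrays.asList(value.toLowerCase(Locale.ROOT).split("\\s+")));
    }

    // term: the search value is compared with the indexed terms as-is.
    static boolean termQuery(Set<String> indexedTerms, String value) {
        return indexedTerms.contains(value);
    }

    // match: the search text is analyzed first; sharing any token is a hit.
    static boolean matchQuery(Set<String> indexedTerms, String text) {
        for (String token : analyze(text)) {
            if (indexedTerms.contains(token)) {
                return true;
            }
        }
        return false;
    }
}
```

In this model, a term lookup for "Home Inn" succeeds against the keyword terms of "Home Inn" but fails against the analyzed tokens of "Home Inn Shanghai Bund", while a match lookup on the same text field succeeds.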

@Test
void testJingZhun() throws IOException {
    // 1. Create the request
    SearchRequest request = new SearchRequest("hotel");
    // 2. Build the query conditions
    // term: exact match on the keyword field "city"
    // request.source().query(QueryBuilders.termQuery("city", "上海"));
    // range: 1000 <= price <= 2000
    request.source().query(QueryBuilders.rangeQuery("price")
            .gte(1000).lte(2000));
    // 3. Send the request
    SearchResponse response = client.search(request, RequestOptions.DEFAULT);
    // 4. Parse the response
    handleResponse(response);
}

Geo coordinate queries:

// geo_bounding_box query
GET /indexName/_search
{
  "query": {
    "geo_bounding_box": {
      "FIELD": {
        "top_left": { // top-left corner
          "lat": 31.1,
          "lon": 121.5
        },
        "bottom_right": { // bottom-right corner
          "lat": 30.9,
          "lon": 121.7
        }
      }
    }
  }
}
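What the bounding box above selects can be sketched as a simple containment check. This is an illustration only: `GeoBoundingBoxSketch` is a hypothetical helper, and it assumes the box does not cross the 180° meridian.

```java
// Illustrative containment test for a geo_bounding_box; not Elasticsearch code.
final class GeoBoundingBoxSketch {

    // The top-left corner has the larger latitude and the smaller longitude;
    // the bottom-right corner has the smaller latitude and the larger longitude.
    static boolean contains(double topLeftLat, double topLeftLon,
                            double bottomRightLat, double bottomRightLon,
                            double lat, double lon) {
        boolean latInside = lat <= topLeftLat && lat >= bottomRightLat;
        boolean lonInside = lon >= topLeftLon && lon <= bottomRightLon;
        return latInside && lonInside;
    }
}
```

With the corners from the DSL, `contains(31.1, 121.5, 30.9, 121.7, 31.0, 121.6)` is true, while a point at latitude 31.2 falls outside the box.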

Nearby search: search within a circle centered on a point (commonly used), geo_distance

            // Sort by distance from the user's location
            String location = params.getLocation();
            if (location != null && !location.equals("")) {
                request.source().sort(SortBuilders
                        .geoDistanceSort("location", new GeoPoint(location))
                        .order(SortOrder.ASC)
                        .unit(DistanceUnit.KILOMETERS)
                );
            }
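To show what "distance from a center point" means for the nearby search and the distance sort above, here is a small plain-Java sketch. It uses the haversine formula with a mean Earth radius of 6371 km, which is an assumption for illustration and not necessarily the exact computation Elasticsearch performs; `GeoDistanceSketch` is a hypothetical class.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative great-circle distance and distance sort; not Elasticsearch code.
final class GeoDistanceSketch {
    static final double EARTH_RADIUS_KM = 6371.0; // assumed mean radius

    // Haversine distance in kilometers between two lat/lon points (degrees).
    static double haversineKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a));
    }

    // geo_distance idea: keep points within radiusKm of the center.
    static boolean withinKm(double centerLat, double centerLon,
                            double lat, double lon, double radiusKm) {
        return haversineKm(centerLat, centerLon, lat, lon) <= radiusKm;
    }

    // Distance sort idea: nearest first. Each point is {lat, lon}.
    static List<double[]> sortByDistance(double centerLat, double centerLon,
                                         List<double[]> points) {
        List<double[]> sorted = new ArrayList<>(points);
        sorted.sort((double[] p, double[] q) -> Double.compare(
                haversineKm(centerLat, centerLon, p[0], p[1]),
                haversineKm(centerLat, centerLon, q[0], q[1])));
        return sorted;
    }
}
```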

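Compound queries (function_score / bool):

function_score: the condition on each score function must be a filter

Original query: the query part. Documents are searched with this condition and scored with the BM25 algorithm, giving the original score (query score).
Filter condition: the filter part. Only documents that match it are re-scored.
Score function: documents that match the filter are run through this function to produce the function score. There are four functions:
- weight: the function result is a constant
- field_value_factor: uses the value of a field in the document as the function result
- random_score: uses a random number as the function result
- script_score: a custom scoring algorithm
Boost mode: how the function score and the original relevance score are combined, including:
- multiply: multiply the two scores
- replace: replace the query score with the function score
- others, such as sum, avg, max, min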
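The boost modes listed above amount to simple arithmetic on two numbers. A minimal sketch, assuming the scores are already known; `BoostModeSketch` is a hypothetical helper, not part of the Elasticsearch client.

```java
// Illustrative arithmetic for boost_mode; not Elasticsearch code.
final class BoostModeSketch {

    // Combines the original query score with the function score.
    static double combine(String boostMode, double queryScore, double functionScore) {
        switch (boostMode) {
            case "multiply":
                return queryScore * functionScore;
            case "replace":
                return functionScore;
            case "sum":
                return queryScore + functionScore;
            case "avg":
                return (queryScore + functionScore) / 2;
            case "max":
                return Math.max(queryScore, functionScore);
            case "min":
                return Math.min(queryScore, functionScore);
            default:
                throw new IllegalArgumentException("unknown boost_mode: " + boostMode);
        }
    }
}
```

For example, with a query score of 2.0 and a weight of 10, `combine("multiply", 2.0, 10.0)` gives 20.0 and `combine("replace", 2.0, 10.0)` gives 10.0.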

# function_score: score function
GET /hotel/_search
{
  "query": {
    "function_score": {
      "query": {
        "term": {
          "name": {
            "value": "如家"
          }
        }
      },
      "functions": [
        {
          "filter": {
            "term": {
              "city": "上海"
            }
          },
          "weight": 10
        }
      ],
      "boost_mode": "multiply"
    }
  }
}

// function_score
            FunctionScoreQueryBuilder functionScoreQuery = QueryBuilders.functionScoreQuery(
                    // Original query, used for relevance scoring
                    boolQuery,
                    // Array of score functions
                    new FunctionScoreQueryBuilder.FilterFunctionBuilder[]{
                            // One score function element
                            new FunctionScoreQueryBuilder.FilterFunctionBuilder(
                                    // Filter condition
                                    QueryBuilders.termQuery("isAD", true),
                                    // Score function
                                    ScoreFunctionBuilders.weightFactorFunction(10)
                            )
                    }
            );
// boolQuery was already populated with the other conditions earlier.

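function_score execution flow
1. Search documents with the original query and compute a relevance score for each; this is the original score (query score).
2. Check the documents against the filter condition.
3. For the documents that match the filter, apply the score function to get the function score.
4. Combine the original score and the function score according to the boost mode to get the final score; documents with higher final scores rank first.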
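The four steps above can be simulated in plain Java. This is a sketch under stated assumptions: the `Doc` type and its query scores are made up for illustration, the filter is the `isAD` flag with a weight function, the boost mode is multiply, and documents that do not match the filter keep their query score (function score treated as 1).

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Illustrative simulation of the function_score flow; not Elasticsearch code.
final class FunctionScoreFlowSketch {

    static final class Doc {
        final String name;
        final double queryScore; // step 1: score from the original query
        final boolean isAD;      // field tested by the filter
        double finalScore;

        Doc(String name, double queryScore, boolean isAD) {
            this.name = name;
            this.queryScore = queryScore;
            this.isAD = isAD;
        }
    }

    static List<Doc> rank(List<Doc> hits, double weight) {
        List<Doc> ranked = new ArrayList<>(hits);
        for (Doc doc : ranked) {
            // Steps 2-3: only documents matching the filter get the weight.
            double functionScore = doc.isAD ? weight : 1.0;
            // Step 4: boost_mode = multiply.
            doc.finalScore = doc.queryScore * functionScore;
        }
        // Higher final score ranks first.
        ranked.sort(Comparator.comparingDouble((Doc d) -> d.finalScore).reversed());
        return ranked;
    }
}
```

With a weight of 10, an advertised document whose query score is 1.0 ends up at 10.0 and ranks ahead of a non-advertised document whose query score is 3.0.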

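Bool query: combining several kinds of conditions

Clause types:

must: every subquery must match, similar to "AND"
should: subqueries match optionally, similar to "OR"
must_not: must not match; does not contribute to the score, similar to "NOT"

filter: must match; does not contribute to the score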
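How the four clause types interact can be sketched with booleans. In this toy model each array holds whether a document matched each subquery of that clause type, and the "score" is just a count of matching scoring clauses; real relevance scores are computed differently. `BoolQuerySketch` is a hypothetical helper. It also models the default that should clauses become required only when there is no must or filter clause.

```java
// Illustrative model of bool clause semantics; not Elasticsearch code.
final class BoolQuerySketch {
    static final int EXCLUDED = -1;

    // Returns EXCLUDED if the document is filtered out, otherwise a toy score:
    // one point per must clause plus one per matching should clause.
    static int score(boolean[] must, boolean[] should, boolean[] mustNot, boolean[] filter) {
        for (boolean matched : must) {
            if (!matched) return EXCLUDED;   // every must clause has to match
        }
        for (boolean matched : filter) {
            if (!matched) return EXCLUDED;   // filter has to match, adds no score
        }
        for (boolean matched : mustNot) {
            if (matched) return EXCLUDED;    // must_not excludes, adds no score
        }
        int shouldMatches = 0;
        for (boolean matched : should) {
            if (matched) shouldMatches++;
        }
        boolean shouldIsRequired = must.length == 0 && filter.length == 0 && should.length > 0;
        if (shouldIsRequired && shouldMatches == 0) return EXCLUDED;
        return must.length + shouldMatches;
    }
}
```

A document that matches the single must clause, one of two should clauses, the filter, and no must_not clause gets a toy score of 2; the filter and must_not clauses only decide inclusion.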

GET /hotel/_search
{
"query": {
  "bool": {
    "must": [
      {"term": {"city": "上海" }}
    ],
    "should": [
      {"term": {"brand": "皇冠假日" }},
      {"term": {"brand": "华美达" }}
    ],
    "must_not": [
      { "range": { "price": { "lte": 500 } }}
    ],
    "filter": [
      { "range": {"score": { "gte": 45 } }}
    ]
  }
}
}

```java
@Test
void testBool() throws IOException {
    // 1. Create the request
    SearchRequest request = new SearchRequest("hotel");
    // 2. Build the bool query
    request.source().query(QueryBuilders.boolQuery()
            .must(QueryBuilders.termQuery("name", "如家"))
            .mustNot(QueryBuilders.termQuery("city", "上海"))
            .should(QueryBuilders.termQuery("brand", "皇冠假日"))
            .should(QueryBuilders.termQuery("brand", "华美达"))
    );
    // 3. Send the request
    SearchResponse response = client.search(request, RequestOptions.DEFAULT);
    // 4. Parse the response
    handleResponse(response);
}
```

### Sorting / Pagination
#### Sorting
desc: descending; asc: ascending<br />![image.png](https://cdn.nlark.com/yuque/0/2022/png/27094793/1652060923032-d2731b67-40da-4dc3-99c5-1526e80ec617.png#clientId=u2d11328c-52a7-4&crop=0&crop=0&crop=1&crop=1&from=paste&height=228&id=u59d52356&margin=%5Bobject%20Object%5D&name=image.png&originHeight=313&originWidth=837&originalType=binary&ratio=1&rotation=0&showTitle=false&size=24796&status=done&style=none&taskId=u6e2f33cb-d031-4d4c-adb0-702eb1086ff&title=&width=608.7272727272727)<br />Sorting by geographic coordinates<br />![image.png](https://cdn.nlark.com/yuque/0/2022/png/27094793/1652061014227-d410c104-1bf5-47ed-a72a-57460b7ff5df.png#clientId=u2d11328c-52a7-4&crop=0&crop=0&crop=1&crop=1&from=paste&height=321&id=u36c91781&margin=%5Bobject%20Object%5D&name=image.png&originHeight=441&originWidth=970&originalType=binary&ratio=1&rotation=0&showTitle=false&size=74646&status=done&style=none&taskId=u40edd744-a990-43f1-9199-d954a3d5533&title=&width=705.4545454545455)
#### Pagination
![image.png](https://cdn.nlark.com/yuque/0/2022/png/27094793/1652061038258-fdd1e6bf-40b6-4a6e-be92-fabc49afccca.png#clientId=u2d11328c-52a7-4&crop=0&crop=0&crop=1&crop=1&from=paste&height=228&id=u42c1f48c&margin=%5Bobject%20Object%5D&name=image.png&originHeight=314&originWidth=688&originalType=binary&ratio=1&rotation=0&showTitle=false&size=26079&status=done&style=none&taskId=u0ac8bdae-aed0-46f7-9c78-ef72ed32ee1&title=&width=500.3636363636364)<br />Deep pagination (for background):<br />Common ways to implement paginated queries, with their pros and cons:

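- from + size:
   - Pros: supports jumping to an arbitrary page
   - Cons: suffers from the deep pagination problem; by default from + size cannot exceed 10000
   - Use case: searches with random page access, such as Baidu, JD, Google, and Taobao
- search_after:
   - Pros: no upper limit on total results (the size of a single query must not exceed 10000)
   - Cons: can only page forward one page at a time; no random page access
   - Use case: searches that do not need random page access, such as scrolling down on a phone
- scroll:
   - Pros: no upper limit on total results (the size of a single query must not exceed 10000)
   - Cons: consumes extra memory, and the results are not real-time
   - Use case: retrieving and migrating large volumes of data. Not recommended since ES 7.1; use search_after instead.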
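The from + size limit is easy to check with a little arithmetic. A minimal sketch, assuming 1-based page numbers and the default window of 10000; `PaginationSketch` and its method names are hypothetical.

```java
// Illustrative from/size arithmetic; not Elasticsearch code.
final class PaginationSketch {
    // Default upper bound on from + size.
    static final int DEFAULT_MAX_RESULT_WINDOW = 10_000;

    // Offset of the first hit on a 1-based page.
    static int from(int page, int size) {
        if (page < 1 || size < 1) {
            throw new IllegalArgumentException("page and size must be >= 1");
        }
        return Math.multiplyExact(page - 1, size);
    }

    // True if the page can be fetched with from + size under the default limit.
    static boolean fitsDefaultWindow(int page, int size) {
        return (long) from(page, size) + size <= DEFAULT_MAX_RESULT_WINDOW;
    }
}
```

With 10 hits per page, page 1000 still fits (from = 9990, from + size = 10000), but page 1001 does not, which is where search_after becomes the practical option.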
### Highlighting

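- Highlighting marks keywords, so **the search condition must contain keywords**; it cannot be something like a range query.
- By default, **the highlighted field must be the same as the field being searched**; otherwise nothing is highlighted.
- To highlight a field other than the searched one, add the property require_field_match=false.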
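As a rough illustration of the rules above, the following plain-Java helper assembles the request body for a match query with highlighting and adds `require_field_match` only when the highlighted field differs from the searched one. `HighlightDslSketch` is hypothetical, and it does no JSON escaping, so it is only suitable for simple field names and keywords.

```java
// Illustrative builder for a highlight request body; not Elasticsearch code.
final class HighlightDslSketch {

    static String build(String searchField, String keyword, String highlightField) {
        boolean sameField = searchField.equals(highlightField);
        StringBuilder dsl = new StringBuilder();
        // The query must carry a keyword for highlighting to have anything to mark.
        dsl.append("{\"query\":{\"match\":{\"")
           .append(searchField).append("\":\"").append(keyword).append("\"}},");
        dsl.append("\"highlight\":{\"fields\":{\"").append(highlightField).append("\":{");
        if (!sameField) {
            // Needed when highlighting a field other than the one searched.
            dsl.append("\"require_field_match\":false");
        }
        dsl.append("}}}}");
        return dsl.toString();
    }
}
```

For instance, searching the `all` field while highlighting `name` yields a body whose `highlight.fields.name` object contains `"require_field_match":false`; searching and highlighting `name` leaves that object empty.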

![image.png](https://cdn.nlark.com/yuque/0/2022/png/27094793/1652061135052-fa4a1e48-c079-400d-a4e7-9e8c7b31f562.png#clientId=u2d11328c-52a7-4&crop=0&crop=0&crop=1&crop=1&from=paste&height=332&id=u5d3f8018&margin=%5Bobject%20Object%5D&name=image.png&originHeight=457&originWidth=1027&originalType=binary&ratio=1&rotation=0&showTitle=false&size=42186&status=done&style=none&taskId=u831d9f56-882a-4f4d-b9bb-856fd112d9d&title=&width=746.9090909090909)
### Handling the response in code: parse it layer by layer
Each level of the DSL response maps one-to-one to the code.<br />![image.png](https://cdn.nlark.com/yuque/0/2022/png/27094793/1652061279437-296572ea-deb7-498d-a251-eeb394bfc8e7.png#clientId=u2d11328c-52a7-4&crop=0&crop=0&crop=1&crop=1&from=paste&height=455&id=ud0161a42&margin=%5Bobject%20Object%5D&name=image.png&originHeight=625&originWidth=440&originalType=binary&ratio=1&rotation=0&showTitle=false&size=43373&status=done&style=none&taskId=u909160fe-82b7-47e9-8c48-110e2510a2b&title=&width=320)
```java
private PageResult handleResponse(SearchResponse response) {
    // Parse the result
    SearchHits searchHits = response.getHits();
    // Total number of hits
    long total = searchHits.getTotalHits().value;
    // Array of hits
    SearchHit[] hits = searchHits.getHits();
    List<HotelDoc> hotelDocList = new ArrayList<>();
    for (SearchHit hit : hits) {
        // Hotel document as a JSON string
        String hitSourceAsString = hit.getSourceAsString();
        // Deserialize into a HotelDoc
        HotelDoc hotelDoc = JSON.parseObject(hitSourceAsString, HotelDoc.class);
        // Distance: the first sort value when sorting by geo distance
        Object[] sortValues = hit.getSortValues();
        if (sortValues != null && sortValues.length > 0) {
            hotelDoc.setDistance(sortValues[0]);
        }
        hotelDocList.add(hotelDoc);
    }
    return new PageResult(total, hotelDocList);
}
```

Highlight DSL:
`highlight` sits at the same level as `_source`.

```java
// Parse the highlight result (inside the loop over hits)
Map<String, HighlightField> highMap = hit.getHighlightFields();
if (!CollectionUtils.isEmpty(highMap)) {
    HighlightField name = highMap.get("name");
    if (name != null) {
        // Replace the plain name with the first highlighted fragment
        hotelDoc.setName(name.getFragments()[0].string());
    }
}
```