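When a user types characters into the search box, we should suggest search terms related to what has been typed so far.
This feature, suggesting complete terms from the letters the user has entered, is auto-completion.
Because the suggestions have to be inferred from pinyin letters, we need pinyin analysis.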
Prerequisites: Elasticsearch, Kibana, the IK analyzer plugin, and the pinyin analyzer plugin are installed.

# Custom analyzer
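By default, the pinyin analyzer converts every Chinese character to pinyin separately and adds one abbreviation built from the first letters of all characters.
What we want instead is one pinyin token per term, plus the first-letter abbreviation of that term. The pinyin analyzer therefore has to be customized, which gives us a custom analyzer.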
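An analyzer in Elasticsearch consists of three parts:
- character filters: process the text before the tokenizer, for example by deleting or replacing characters
- tokenizer: cuts the text into terms according to some rule, for example `keyword` (no splitting at all) or `ik_smart`
- token filters: further process the terms emitted by the tokenizer, for example case conversion, synonym handling, or pinyin conversion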
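
To make the three stages concrete, here is a toy pipeline in plain Java. It is only a sketch of the idea and not how Elasticsearch implements analyzers; the class `ToyAnalyzer` and its method names are invented for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class ToyAnalyzer {
    // Stage 1, character filter: rewrites the raw text before tokenizing.
    static String stripPunctuation(String text) {
        return text.replaceAll("[,.!?]", " ");
    }

    // Stage 2, tokenizer: cuts the text into terms (here: on whitespace).
    static List<String> whitespaceTokenizer(String text) {
        List<String> terms = new ArrayList<>();
        for (String part : text.trim().split("\\s+")) {
            if (!part.isEmpty()) {
                terms.add(part);
            }
        }
        return terms;
    }

    // Stage 3, token filter: post-processes each term (here: lowercasing).
    static List<String> lowercaseFilter(List<String> terms) {
        List<String> out = new ArrayList<>();
        for (String term : terms) {
            out.add(term.toLowerCase(Locale.ROOT));
        }
        return out;
    }

    // The analyzer is simply the three stages chained in order.
    static List<String> analyze(String text) {
        return lowercaseFilter(whitespaceTokenizer(stripPunctuation(text)));
    }

    public static void main(String[] args) {
        // prints [hello, elasticsearch, world]
        System.out.println(analyze("Hello, Elasticsearch World!"));
    }
}
```

In the custom analyzer of this article, the tokenizer slot is filled by `ik_max_word` and the token filter slot by the pinyin filter.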
## Custom analyzer syntax
For the options of the `py` filter, see the plugin's documentation:
https://github.com/medcl/elasticsearch-analysis-pinyin
```json
PUT /test
{
  "settings": {
    "analysis": {
      "analyzer": { # custom analyzers
        "my_analyzer": { # analyzer name
          "tokenizer": "ik_max_word",
          "filter": "py"
        }
      },
      "filter": { # custom token filters
        "py": { # filter name
          "type": "pinyin", # filter type, here the pinyin filter
          "keep_full_pinyin": false, # no per-character pinyin, work on whole terms
          "keep_joined_full_pinyin": true, # emit the joined full pinyin of the term
          "keep_original": true, # keep the original Chinese term
          "limit_first_letter_length": 16,
          "remove_duplicated_term": true,
          "none_chinese_pinyin_tokenize": false
        }
      }
    }
  },
  "mappings": { # create the mapping
    "properties": {
      "name": {
        "type": "text",
        "analyzer": "my_analyzer", # analyzer used when building the inverted index
        "search_analyzer": "ik_smart" # analyzer used at search time
      }
    }
  }
}
```
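## Why the pinyin analyzer must not be used at search time
The index stores both the Chinese terms and their pinyin. If the query text were also run through the pinyin analyzer, a Chinese query would be turned into pinyin and would then match every homophone, that is, every word with the same pinyin but different characters. This is why the mapping sets `search_analyzer` to `ik_smart`.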
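
The homophone problem can be reproduced with a toy model. The sketch below assumes two indexed words, 狮子 (lion) and 虱子 (louse), which share the pinyin `shizi`. The two-entry pinyin table and the class `HomophoneDemo` are made up for this example, and the Chinese strings are written as Unicode escapes only to keep the source ASCII-safe.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class HomophoneDemo {
    static final String LION = "\u72ee\u5b50";  // "lion", pinyin shizi
    static final String LOUSE = "\u8671\u5b50"; // "louse", pinyin shizi

    // Hypothetical two-entry pinyin table, just enough for this demo.
    static final Map<String, String> PINYIN = new HashMap<>();
    static {
        PINYIN.put(LION, "shizi");
        PINYIN.put(LOUSE, "shizi");
    }

    // Pinyin-style analysis: keep the original term and add its pinyin.
    static List<String> pinyinTerms(String word) {
        List<String> terms = new ArrayList<>();
        terms.add(word);
        String py = PINYIN.get(word);
        if (py != null) {
            terms.add(py);
        }
        return terms;
    }

    // A document matches when any of its index-time terms equals a query term.
    static List<String> search(List<String> docs, List<String> queryTerms) {
        List<String> hits = new ArrayList<>();
        for (String doc : docs) {
            for (String term : pinyinTerms(doc)) {
                if (queryTerms.contains(term)) {
                    hits.add(doc);
                    break;
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList(LION, LOUSE);
        // Query analyzed with pinyin: both homophones match (prints 2).
        System.out.println(search(docs, pinyinTerms(LION)).size());
        // Query kept as plain Chinese: only the exact word matches (prints 1).
        System.out.println(search(docs, Arrays.asList(LION)).size());
        // A query typed as pinyin still finds both, which is what we want (prints 2).
        System.out.println(search(docs, Arrays.asList("shizi")).size());
    }
}
```

Using the pinyin filter only at index time keeps pinyin input working while Chinese queries stay precise.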
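# Auto-completion queries
Elasticsearch provides the Completion Suggester query for auto-completion. It matches and returns the terms that start with the user's input. To keep completion queries efficient, the field type in the document is constrained:
- The field used for completion must be of type `completion`.
- The field's content is usually an array of the terms that should be offered as completions.

For example, take an index like this: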
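```json
# create the index
PUT /test2
{
  "mappings": {
    "properties": {
      "title": {
        "type": "completion"
      }
    }
  }
}

# insert sample data
POST /test2/_doc
{
  "title": ["Sony", "WH-1000XM3"]
}

POST /test2/_doc
{
  "title": ["SK-II", "PITERA"]
}

POST /test2/_doc
{
  "title": ["Nintendo", "switch"]
}

# auto-completion query
GET /test2/_search
{
  "suggest": {
    "title_suggest": { # a name of your own choosing
      "text": "s", # the keyword the user wants completed
      "completion": {
        "field": "title", # the field to run the completion on
        "skip_duplicates": true, # skip duplicate suggestions
        "size": 10 # return the first 10 results
      }
    }
  }
}
```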
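
The prefix matching behind this query can be modelled in a few lines of plain Java. This is only a conceptual sketch: Elasticsearch keeps completion fields in its own in-memory structure and returns the original text, whereas the hypothetical `PrefixSuggester` below stores lowercased terms in a sorted set and ignores weights.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.TreeSet;

public class PrefixSuggester {
    // Sorted set: every term sharing a prefix sits in one contiguous run,
    // and duplicates collapse (similar in spirit to skip_duplicates).
    private final TreeSet<String> terms = new TreeSet<>();

    void add(String... inputs) {
        for (String input : inputs) {
            terms.add(input.toLowerCase(Locale.ROOT));
        }
    }

    List<String> suggest(String prefix, int size) {
        String p = prefix.toLowerCase(Locale.ROOT);
        List<String> out = new ArrayList<>();
        // tailSet jumps to the first term >= prefix; stop at the first non-match.
        for (String term : terms.tailSet(p)) {
            if (!term.startsWith(p) || out.size() >= size) {
                break;
            }
            out.add(term);
        }
        return out;
    }

    public static void main(String[] args) {
        PrefixSuggester suggester = new PrefixSuggester();
        suggester.add("Sony", "WH-1000XM3");
        suggester.add("SK-II", "PITERA");
        suggester.add("Nintendo", "switch");
        // prints [sk-ii, sony, switch]
        System.out.println(suggester.suggest("s", 10));
    }
}
```

The array form of the `title` field matters here: each array element becomes its own completion entry, so a document can be found through any of its terms.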
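# Implementing auto-completion for the hotel search box
This section builds on the hotel index created earlier with the RestClient index operations.
Our hotel index has no pinyin analyzer configured yet, so its configuration must change. The mapping of existing fields cannot be modified once an index has been created, so the index has to be deleted and created again.
In addition, we need a new field for auto-completion that collects values such as brand, business, and city to serve as suggestions.
To summarize, we need to:
- Modify the hotel index structure and configure the custom pinyin analyzers
- Change the `name` and `all` fields of the index to use the custom analyzer
- Add a new field `suggestion` of type `completion` that uses a custom analyzer
- Add a `suggestion` field to the `HotelDoc` class, containing brand and business
- Re-import the data into the hotel index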
## Modify the hotel mapping
If you have forgotten what the fields in this JSON mean: the `settings` part is explained earlier in this article, and the `mappings` part is covered in the article on index operations.

```json
# hotel index
PUT /hotel
{
  "settings": {
    "analysis": {
      "analyzer": {
        "text_anlyzer": {
          "tokenizer": "ik_max_word",
          "filter": "py"
        },
        "completion_analyzer": {
          "tokenizer": "keyword",
          "filter": "py"
        }
      },
      "filter": {
        "py": {
          "type": "pinyin",
          "keep_full_pinyin": false,
          "keep_joined_full_pinyin": true,
          "keep_original": true,
          "limit_first_letter_length": 16,
          "remove_duplicated_term": true,
          "none_chinese_pinyin_tokenize": false
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "id": {
        "type": "keyword"
      },
      "name": {
        "type": "text",
        "analyzer": "text_anlyzer",
        "search_analyzer": "ik_smart",
        "copy_to": "all"
      },
      "address": {
        "type": "keyword",
        "index": false
      },
      "price": {
        "type": "integer"
      },
      "score": {
        "type": "integer"
      },
      "brand": {
        "type": "keyword",
        "copy_to": "all"
      },
      "city": {
        "type": "keyword"
      },
      "starName": {
        "type": "keyword"
      },
      "business": {
        "type": "keyword",
        "copy_to": "all"
      },
      "location": {
        "type": "geo_point"
      },
      "pic": {
        "type": "keyword",
        "index": false
      },
      "all": {
        "type": "text",
        "analyzer": "text_anlyzer",
        "search_analyzer": "ik_smart"
      },
      "suggestion": {
        "type": "completion",
        "analyzer": "completion_analyzer"
      }
    }
  }
}
```
## Modify the HotelDoc entity
`HotelDoc` needs a new field for auto-completion. Its content can be the hotel's brand, city, business district, and similar information. As required for completion fields, it is best stored as a collection of these values.

So we add a `suggestion` field of type `List<String>` to `HotelDoc`. The constructor below fills it with `brand` and `business`, splitting `business` on `/` when it holds several values.

```java
package cn.itcast.hotel.pojo;

import lombok.Data;
import lombok.NoArgsConstructor;

import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

@Data
@NoArgsConstructor
public class HotelDoc {
    private Long id;
    private String name;
    private String address;
    private Integer price;
    private Integer score;
    private String brand;
    private String city;
    private String starName;
    private String business;
    private String location;
    private String pic;
    private Object distance;
    private Boolean isAD;
    // Terms offered by auto-completion
    private List<String> suggestion;

    public HotelDoc(Hotel hotel) {
        this.id = hotel.getId();
        this.name = hotel.getName();
        this.address = hotel.getAddress();
        this.price = hotel.getPrice();
        this.score = hotel.getScore();
        this.brand = hotel.getBrand();
        this.city = hotel.getCity();
        this.starName = hotel.getStarName();
        this.business = hotel.getBusiness();
        this.location = hotel.getLatitude() + ", " + hotel.getLongitude();
        this.pic = hotel.getPic();
        // Assemble the suggestion list
        if (this.business.contains("/")) {
            // business holds several values, so split it
            String[] arr = this.business.split("/");
            this.suggestion = new ArrayList<>();
            this.suggestion.add(this.brand);
            Collections.addAll(this.suggestion, arr);
        } else {
            this.suggestion = Arrays.asList(this.brand, this.business);
        }
    }
}
```
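
The suggestion assembly can also be pulled out into a small helper and checked on its own. The sketch below is a null-tolerant variant under the assumption that `brand` or `business` may be missing in some rows; the class `SuggestionAssembler` and its method are hypothetical and not part of the original project.

```java
import java.util.ArrayList;
import java.util.List;

public class SuggestionAssembler {
    // Builds the completion terms from a brand and a "/"-separated business value.
    static List<String> buildSuggestion(String brand, String business) {
        List<String> suggestion = new ArrayList<>();
        if (brand != null && !brand.isEmpty()) {
            suggestion.add(brand);
        }
        if (business != null) {
            // split also covers the single-value case: "A" yields ["A"]
            for (String part : business.split("/")) {
                if (!part.isEmpty()) {
                    suggestion.add(part);
                }
            }
        }
        return suggestion;
    }

    public static void main(String[] args) {
        // prints [BrandX, AreaA, AreaB]
        System.out.println(buildSuggestion("BrandX", "AreaA/AreaB"));
        // prints [BrandX]
        System.out.println(buildSuggestion("BrandX", null));
    }
}
```

Because `split` handles one value and many values alike, the `if`/`else` on `contains("/")` is no longer needed in this variant.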
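## Re-import the data and test
Run the earlier bulk-import method to import the data again; see [RestClient document operations](https://www.yuque.com/shifeng-wl7di/bbpx3m/tggcbk?view=doc_embed&inner=wqQWQ).

Once the import has finished, query the index to inspect the data, then try an auto-completion query.

# Auto-completion with RestClient
The code has two parts: building the request and parsing the response.

```java
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.search.suggest.Suggest;
import org.elasticsearch.search.suggest.SuggestBuilder;
import org.elasticsearch.search.suggest.SuggestBuilders;
import org.elasticsearch.search.suggest.completion.CompletionSuggestion;
import org.junit.jupiter.api.Test;

import java.io.IOException;

// Test method; `client` is the RestHighLevelClient field of the enclosing test class.
@Test
void testSuggest() throws IOException {
    // 1. Prepare the request
    SearchRequest request = new SearchRequest("hotel");
    // 2. Request parameters
    SuggestBuilder suggestBuilder = new SuggestBuilder().addSuggestion(
            "my_suggestion",
            SuggestBuilders.completionSuggestion("suggestion")
                    .prefix("b")
                    .skipDuplicates(true)
                    .size(10));
    request.source().suggest(suggestBuilder);
    // 3. Send the request
    SearchResponse response = client.search(request, RequestOptions.DEFAULT);
    // 4. Process the result
    Suggest suggest = response.getSuggest();
    // 4.1 Look up the completion result by its name
    CompletionSuggestion mySuggestion = suggest.getSuggestion("my_suggestion");
    // 4.2 Iterate over the options
    for (CompletionSuggestion.Entry.Option option : mySuggestion.getOptions()) {
        // 4.3 The option text is the completed term
        String text = option.getText().string();
        System.out.println(text);
    }
}
```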
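
To relate the Java builder calls to the query DSL, the sketch below builds the JSON body that the request roughly corresponds to. It is an illustration only: the class `SuggestBodySketch` is hypothetical, the exact serialization produced by the client may differ in key order, and the inputs are assumed to contain no characters that need JSON escaping.

```java
public class SuggestBodySketch {
    // Builds a completion-suggest body; inputs are assumed to need no JSON escaping.
    static String suggestBody(String name, String field, String prefix, int size) {
        return String.format(
                "{\"suggest\":{\"%s\":{\"prefix\":\"%s\",\"completion\":"
                        + "{\"field\":\"%s\",\"skip_duplicates\":true,\"size\":%d}}}}",
                name, prefix, field, size);
    }

    public static void main(String[] args) {
        // Same parameters as the test method: name, field, prefix, size
        System.out.println(suggestBody("my_suggestion", "suggestion", "b", 10));
    }
}
```

The suggestion name chosen in the request (`my_suggestion`) is the key under which the results come back, which is why the same name is passed to `getSuggestion` when parsing the response.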
