Inserting test data

Every document in a _bulk body needs its own action line. The last document below is indexed without an explicit _id, so Elasticsearch generates one.

  POST nba/_bulk
  {"index":{"_index":"nba","_type":"_doc","_id":"1"}}
  {"countryEn":"United States","teamName":"老鹰","birthDay":831182400000,"country":"美国","teamCityEn":"Atlanta","code":"jaylen_adams","displayAffiliation":"United States","displayName":"杰伦 亚当斯","schoolType":"College","teamConference":"东部","teamConferenceEn":"Eastern","weight":"86.2 公斤","teamCity":"亚特兰大","playYear":1,"jerseyNo":"10","teamNameEn":"Hawks","draft":2018,"displayNameEn":"Jaylen Adams","heightValue":1.88,"birthDayStr":"1996-05-04","position":"后卫","age":23,"playerId":"1629121"}
  {"index":{"_index":"nba","_type":"_doc","_id":"2"}}
  {"countryEn":"New Zealand","teamName":"雷霆","birthDay":743140800000,"country":"新西兰","teamCityEn":"Oklahoma City","code":"steven_adams","displayAffiliation":"Pittsburgh/New Zealand","displayName":"斯蒂文 亚当斯","schoolType":"College","teamConference":"西部","teamConferenceEn":"Western","weight":"120.2 公斤","teamCity":"俄克拉荷马城","playYear":6,"jerseyNo":"12","teamNameEn":"Thunder","draft":2013,"displayNameEn":"Steven Adams","heightValue":2.13,"birthDayStr":"1993-07-20","position":"中锋","age":26,"playerId":"203500"}
  {"index":{"_index":"nba","_type":"_doc","_id":"5"}}
  {"countryEn":"United States","teamName":"马刺","birthDay":490593600000,"country":"美国","teamCityEn":"New Orleans","code":"lamarcus_aldridge","displayAffiliation":"Texas/United States","displayName":"拉马库斯 阿尔德里奇","schoolType":"College","teamConference":"西部","teamConferenceEn":"Western","weight":"117.9 公斤","teamCity":"圣安东尼奥","playYear":13,"jerseyNo":"12","teamNameEn":"Spurs","draft":2006,"displayNameEn":"LaMarcus Aldridge","heightValue":2.11,"birthDayStr":"1985-07-19","position":"中锋-前锋","age":34,"playerId":"200746"}
  {"index":{"_index":"nba","_type":"_doc","_id":"6"}}
  {"countryEn":"Canada","teamName":"鹈鹕","birthDay":887000400000,"country":"加拿大","teamCityEn":"New Orleans","code":"nickeil_alexander-walker","displayAffiliation":"Virginia Tech/Canada","displayName":"Nickeil Alexander-Walker","schoolType":"College","teamConference":"西部","teamConferenceEn":"Western","weight":"92.5 公斤","teamCity":"新奥尔良","playYear":0,"jerseyNo":"","teamNameEn":"Pelicans","draft":2019,"displayNameEn":"Nickeil Alexander-Walker","heightValue":1.96,"birthDayStr":"1998-02-09","position":"后卫","age":21,"playerId":"1629638"}
  {"index":{"_index":"nba","_type":"_doc"}}
  {"countryEn":"United States","teamName":"尼克斯","birthDay":727074000000,"country":"美国","teamCityEn":"New York","code":"kadeem_allen","displayAffiliation":"Arizona/United States","displayName":"卡迪姆 艾伦","schoolType":"College","teamConference":"东部","teamConferenceEn":"Eastern","weight":"90.7 公斤","teamCity":"纽约","playYear":2,"jerseyNo":"0","teamNameEn":"Knicks","draft":2017,"displayNameEn":"Kadeem Allen","heightValue":1.9,"birthDayStr":"1993-01-15","position":"后卫","age":26,"playerId":"1628443"}

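Single-field search

A document is a hit as soon as any term produced by analyzing the query matches the field. The order of the words in the query affects neither which documents are returned nor their final scores.
The two forms below are equivalent: the first is more concise, the second is easier to extend with additional parameters.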

  # Swapping the word order ("York New") returns the same hits with the same scores.
  GET /nba/_search
  {
    "query": {
      "match": {
        "teamCityEn": "New York"
      }
    }
  }

  GET /nba/_search
  {
    "query": {
      "match": {
        "teamCityEn": {
          "query": "New York"
        }
      }
    }
  }
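
To make the order independence concrete, here is a minimal Python sketch of the hit test. It is not how Elasticsearch is implemented: the analyze helper is a hypothetical stand-in for the analyzer (lowercase, split on whitespace), and the sketch models only which documents are hits, not their scores.

```python
def analyze(text):
    # Hypothetical, simplified analyzer: lowercase and split on whitespace.
    return set(text.lower().split())


def match_or(field_value, query):
    # Default behaviour: a hit if at least one query term occurs in the field.
    return bool(analyze(field_value) & analyze(query))


# teamCityEn values taken from the test data above.
cities = ["Atlanta", "Oklahoma City", "New Orleans", "New York"]

hits_new_york = [c for c in cities if match_or(c, "New York")]
hits_york_new = [c for c in cities if match_or(c, "York New")]

print(hits_new_york)
print(hits_new_york == hits_york_new)
```

Because the query is reduced to a set of terms before comparison, "New York" and "York New" select the same documents. "New Orleans" is also a hit, since it shares the term "new".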

operator

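The operator parameter accepts "and" or "or". The default is "or": a document matches if it contains any one of the analyzed query terms. To require every term to match, use "and".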

  GET /nba/_search
  {
    "query": {
      "match": {
        "teamCityEn": {
          "query": "New York",
          "operator": "and"
        }
      }
    }
  }
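
The difference between the two operators can be sketched in a few lines of Python. This is a simplified model under stated assumptions: terms are produced by lowercasing and splitting on whitespace, and the function name match is made up for illustration.

```python
def match(field_value, query, operator="or"):
    # Simplified analysis (an assumption): lowercase and split on whitespace.
    field_terms = set(field_value.lower().split())
    query_terms = set(query.lower().split())
    if operator == "and":
        # Every query term must be present in the field.
        return query_terms <= field_terms
    # "or": a single shared term is enough.
    return bool(query_terms & field_terms)


print(match("New Orleans", "New York"))          # "or": shares the term "new"
print(match("New Orleans", "New York", "and"))   # "and": "york" is missing
print(match("New York", "New York", "and"))      # "and": both terms present
```

With "and", the New Orleans documents drop out of the results for the query "New York", and only the New York document remains.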

minimum_should_match

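The "and" operator is too coarse, because every term must match. When only some of the terms need to match, use minimum_should_match.
This option sets the minimum number of terms that must match. In the example below, at least 2 of the 3 terms New, York and States must match.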

  GET /nba/_search
  {
    "query": {
      "match": {
        "teamCityEn": {
          "query": "New York States",
          "minimum_should_match": 2
        }
      }
    }
  }

Besides an absolute number, the option also accepts a percentage, as in the following request:

  GET /nba/_search
  {
    "query": {
      "match": {
        "teamCityEn": {
          "query": "New York States",
          "minimum_should_match": "50%"
        }
      }
    }
  }

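This requires at least 50% of the 3 terms to match. There is a pitfall here: 50% of 3 terms is 1.5, and Elasticsearch rounds the result down to 1, so a document that matches just 1 of the analyzed terms appears in the results.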
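
The rounding can be checked with a short Python sketch. The helper required_matches is hypothetical and handles only the two forms used in this article, a positive integer and a positive percentage. Negative values and the combined forms that Elasticsearch also accepts are left out.

```python
import math


def required_matches(num_terms, minimum_should_match):
    # Percentage form, e.g. "50%": take that share of the terms and round down.
    if isinstance(minimum_should_match, str) and minimum_should_match.endswith("%"):
        percent = int(minimum_should_match[:-1])
        return math.floor(num_terms * percent / 100)
    # Integer form: the value is the required number of terms.
    return int(minimum_should_match)


print(required_matches(3, 2))      # → 2
print(required_matches(3, "50%"))  # → 1, because floor(1.5) is 1
```

If at least 2 of the 3 terms must match, a percentage of 50% is therefore not enough. Use the integer 2 or a percentage whose result rounds down to 2, such as "67%".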

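Searching multiple fields with different content

The following example queries several fields, each with its own query text and its own weight (boost).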

  GET /nba/_search
  {
    "query": {
      "bool": {
        "should": [
          {
            "match": {
              "teamCityEn": {
                "query": "New York",
                "boost": 2
              }
            }
          },
          {
            "match": {
              "displayName": {
                "query": "亚当斯",
                "boost": 5
              }
            }
          }
        ]
      }
    }
  }
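
As a rough illustration of how the weights act, the sketch below models a bool/should score as the sum of the matching clauses' scores, each multiplied by its boost. Treating boost as a plain multiplier is a simplifying assumption, the raw scores of 1.0 are invented for the example, and bool_should_score is a hypothetical helper.

```python
def bool_should_score(matching_clauses):
    # matching_clauses: (raw_score, boost) pairs for the clauses that matched.
    return sum(raw_score * boost for raw_score, boost in matching_clauses)


# Invented raw scores of 1.0 for each clause; boosts taken from the query above.
city_only = bool_should_score([(1.0, 2)])            # matches only teamCityEn
name_only = bool_should_score([(1.0, 5)])            # matches only displayName
both = bool_should_score([(1.0, 2), (1.0, 5)])       # matches both clauses

print(city_only, name_only, both)  # → 2.0 5.0 7.0
```

Under this model a document that matches only displayName outranks one that matches only the city, and a document that matches both ranks highest.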

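Searching multiple fields with the same content

This is the case closest to what we do with an ordinary search engine.
A single search input often has to be matched against several fields. For example, the input "亚当斯" should find similar content in three fields: displayName, teamName and country.

Basic search

Without any further tuning, either of the following two forms can be used.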

  GET /nba/_search
  {
    "query": {
      "bool": {
        "should": [
          {
            "match": {
              "teamName": {
                "query": "亚当斯"
              }
            }
          },
          {
            "match": {
              "displayName": {
                "query": "亚当斯"
              }
            }
          },
          {
            "match": {
              "country": {
                "query": "亚当斯"
              }
            }
          }
        ]
      }
    }
  }

  # Form 2, covered in detail later
  GET /nba/_search
  {
    "query": {
      "multi_match": {
        "query": "亚当斯",
        "fields": [
          "displayName",
          "teamName",
          "country"
        ]
      }
    }
  }

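The first form is bool-should-match; the second is the much more concise multi_match. Both return the same documents, but the scores can differ: bool/should adds up the scores of all matching clauses, whereas multi_match defaults to the best_fields type, which scores each document by its best-matching field.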

dis_max

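In the bool/should form above, the scores of all three field clauses are combined into the final score. A document that scores very high on one field but low on the other two can therefore rank below a document with mediocre matches on every field.
Usually, though, we want the document whose single best field matches best, and a match in any one of the 3 fields is enough. This is what dis_max is for: it scores a document by its highest-scoring clause.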

  GET /nba/_search
  {
    "query": {
      "dis_max": {
        "queries": [
          {
            "match": {
              "teamName": {
                "query": "亚当斯"
              }
            }
          },
          {
            "match": {
              "displayName": {
                "query": "亚当斯"
              }
            }
          },
          {
            "match": {
              "country": {
                "query": "亚当斯"
              }
            }
          }
        ]
      }
    }
  }
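
The effect on ranking can be illustrated with a small Python comparison. The per-field scores are invented for the example and both helper functions are hypothetical; the sketch only contrasts summing the clause scores with taking their maximum.

```python
def bool_should_score(field_scores):
    # bool/should: add up the scores of all clauses.
    return sum(field_scores.values())


def dis_max_score(field_scores):
    # dis_max: keep only the score of the best clause.
    return max(field_scores.values())


# Invented per-field scores for two documents.
strong_single = {"teamName": 0.0, "displayName": 3.0, "country": 0.0}
weak_everywhere = {"teamName": 1.25, "displayName": 1.25, "country": 1.25}

# Summing ranks the weak-everywhere document first (3.75 against 3.0).
sum_prefers_weak = bool_should_score(weak_everywhere) > bool_should_score(strong_single)
# dis_max ranks the strong single-field match first (3.0 against 1.25).
dis_max_prefers_strong = dis_max_score(strong_single) > dis_max_score(weak_everywhere)

print(sum_prefers_weak, dis_max_prefers_strong)  # → True True
```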

tie_breaker

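The query above considers only the highest-scoring clause and ignores how well the other fields match. To let the other fields contribute as well, add tie_breaker: the final score is the best clause's score plus tie_breaker times the scores of the other matching clauses.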

  GET /nba/_search
  {
    "query": {
      "dis_max": {
        "tie_breaker": 0.7,
        "queries": [
          {
            "match": {
              "teamName": {
                "query": "亚当斯"
              }
            }
          },
          {
            "match": {
              "displayName": {
                "query": "亚当斯"
              }
            }
          },
          {
            "match": {
              "country": {
                "query": "亚当斯"
              }
            }
          }
        ]
      }
    }
  }
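
The scoring rule can be written down as a short Python function. The clause scores are invented and dis_max_score is a hypothetical helper; the values 0.0, 0.5 and 1.0 are used instead of 0.7 only to keep the arithmetic exact.

```python
def dis_max_score(clause_scores, tie_breaker=0.0):
    # Assumes at least one clause score is given.
    ordered = sorted(clause_scores, reverse=True)
    best, others = ordered[0], ordered[1:]
    # Best clause counts in full; the rest are scaled by tie_breaker.
    return best + tie_breaker * sum(others)


scores = [3.0, 1.0, 0.5]  # invented scores of the three match clauses

print(dis_max_score(scores, 0.0))  # → 3.0, pure dis_max
print(dis_max_score(scores, 0.5))  # → 3.75
print(dis_max_score(scores, 1.0))  # → 4.5, same as adding all clauses
```

A tie_breaker of 0 gives plain dis_max, a value of 1 gives the plain sum of all clauses, and values in between interpolate between the two.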

boost

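You can also combine the two with boost: give one field a higher weight, let dis_max pick the highest-scoring clause, and use tie_breaker to take the other fields into account.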

  GET /nba/_search
  {
    "query": {
      "dis_max": {
        "tie_breaker": 0.7,
        "queries": [
          {
            "match": {
              "teamName": {
                "query": "亚当斯"
              }
            }
          },
          {
            "match": {
              "displayName": {
                "query": "亚当斯",
                "boost": 3
              }
            }
          },
          {
            "match": {
              "country": {
                "query": "亚当斯"
              }
            }
          }
        ]
      }
    }
  }
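
To see how the three settings interact, the following Python sketch applies the boost to each clause first and then the dis_max rule with tie_breaker. It rests on assumptions: boost is modelled as a plain multiplier, the raw scores are invented, boosted_dis_max is a hypothetical helper, and a tie_breaker of 0.5 is used instead of 0.7 to keep the numbers exact.

```python
def boosted_dis_max(clauses, tie_breaker):
    # clauses maps a field name to (raw_score, boost).
    boosted = sorted((raw * boost for raw, boost in clauses.values()), reverse=True)
    # Best boosted clause in full, plus tie_breaker times the remaining clauses.
    return boosted[0] + tie_breaker * sum(boosted[1:])


# Invented raw scores; a clause that does not match contributes 0.0.
without_boost = {"teamName": (2.0, 1), "displayName": (1.0, 1), "country": (0.0, 1)}
with_boost = {"teamName": (2.0, 1), "displayName": (1.0, 3), "country": (0.0, 1)}

print(boosted_dis_max(without_boost, 0.5))  # → 2.5, teamName is the best clause
print(boosted_dis_max(with_boost, 0.5))     # → 4.0, displayName is now the best clause
```

In this model the boost of 3 turns displayName into the best clause, so it counts in full while teamName is scaled down by tie_breaker.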