Geo Distance Aggregation

Original: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-geodistance-aggregation.html

Contributors: @于永超, ApacheCN (Apache中文网)

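A multi-bucket aggregation that works on geo_point fields and is conceptually very similar to the range aggregation. The user defines an origin point and a set of distance ranges. The aggregation computes the distance of each document value from the origin and uses the ranges to decide which bucket the document belongs to: a document belongs to a bucket if the distance between the document and the origin falls within that bucket's distance range.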

    PUT /museums
    {
      "mappings": {
        "doc": {
          "properties": {
            "location": {
              "type": "geo_point"
            }
          }
        }
      }
    }

    POST /museums/doc/_bulk?refresh
    {"index":{"_id":1}}
    {"location": "52.374081,4.912350", "name": "NEMO Science Museum"}
    {"index":{"_id":2}}
    {"location": "52.369219,4.901618", "name": "Museum Het Rembrandthuis"}
    {"index":{"_id":3}}
    {"location": "52.371667,4.914722", "name": "Nederlands Scheepvaartmuseum"}
    {"index":{"_id":4}}
    {"location": "51.222900,4.405200", "name": "Letterenhuis"}
    {"index":{"_id":5}}
    {"location": "48.861111,2.336389", "name": "Musée du Louvre"}
    {"index":{"_id":6}}
    {"location": "48.860000,2.327000", "name": "Musée d'Orsay"}

    POST /museums/_search?size=0
    {
      "aggs" : {
        "rings_around_amsterdam" : {
          "geo_distance" : {
            "field" : "location",
            "origin" : "52.3760, 4.894",
            "ranges" : [
              { "to" : 100000 },
              { "from" : 100000, "to" : 300000 },
              { "from" : 300000 }
            ]
          }
        }
      }
    }

Response:

    {
      ...
      "aggregations": {
        "rings_around_amsterdam" : {
          "buckets": [
            {
              "key": "*-100000.0",
              "from": 0.0,
              "to": 100000.0,
              "doc_count": 3
            },
            {
              "key": "100000.0-300000.0",
              "from": 100000.0,
              "to": 300000.0,
              "doc_count": 1
            },
            {
              "key": "300000.0-*",
              "from": 300000.0,
              "doc_count": 2
            }
          ]
        }
      }
    }
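
The bucket counts in this response can be reproduced outside Elasticsearch. The following is a minimal sketch, not Elasticsearch's implementation: it assumes a haversine great-circle distance on a sphere with a mean Earth radius of 6371008.8 m, and it assumes the same range semantics as the range aggregation (`from` inclusive, `to` exclusive). The helper names `haversine_m` and `bucket_counts` are made up for illustration.

```python
import math

EARTH_RADIUS_M = 6371008.8  # assumed mean Earth radius in meters


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def bucket_counts(origin, points, ranges):
    """Count points per range; `from` is inclusive, `to` is exclusive (assumed)."""
    counts = [0] * len(ranges)
    for lat, lon in points:
        d = haversine_m(origin[0], origin[1], lat, lon)
        for i, r in enumerate(ranges):
            # A missing bound means the range is open on that side.
            if r.get("from", 0.0) <= d < r.get("to", math.inf):
                counts[i] += 1
    return counts


ORIGIN = (52.3760, 4.894)  # (lat, lon), as in the request above
MUSEUMS = [
    (52.374081, 4.912350),  # NEMO Science Museum
    (52.369219, 4.901618),  # Museum Het Rembrandthuis
    (52.371667, 4.914722),  # Nederlands Scheepvaartmuseum
    (51.222900, 4.405200),  # Letterenhuis
    (48.861111, 2.336389),  # Musée du Louvre
    (48.860000, 2.327000),  # Musée d'Orsay
]
RANGES = [{"to": 100000}, {"from": 100000, "to": 300000}, {"from": 300000}]

counts = bucket_counts(ORIGIN, MUSEUMS, RANGES)
print(counts)  # [3, 1, 2]
```

The three Amsterdam museums lie within a couple of kilometers of the origin, Letterenhuis in Antwerp is roughly 130 km away, and the two Paris museums are roughly 430 km away, which matches the doc_count values 3, 1 and 2.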

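The specified field must be of type geo_point (which can only be set explicitly in the mappings). It can also hold an array of geo_point values, in which case all of them are taken into account during aggregation. The origin accepts all formats supported by the geo_point type: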

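  • Object format: { "lat" : 52.3760, "lon" : 4.894 } - this is the safest format, as it is the most explicit about the lat (latitude) and lon (longitude) values
  • String format: "52.3760, 4.894" - the first number is the lat (latitude), the second is the lon (longitude)
  • Array format: [4.894, 52.3760] - this follows the GeoJSON standard: the first number is the lon (longitude), the second is the lat (latitude)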
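
The ordering difference between the string and array formats is easy to get wrong, so here is a small illustrative sketch that normalizes the three formats listed above to a (lat, lon) pair. `parse_origin` is a hypothetical helper, not part of any Elasticsearch client, and it handles only these three formats.

```python
def parse_origin(origin):
    """Return (lat, lon) from an object, string, or array origin."""
    if isinstance(origin, dict):
        # Object format: explicit keys, no ordering ambiguity.
        return float(origin["lat"]), float(origin["lon"])
    if isinstance(origin, str):
        # String format: "lat, lon".
        lat, lon = (float(part) for part in origin.split(","))
        return lat, lon
    if isinstance(origin, (list, tuple)):
        # Array format: [lon, lat], following GeoJSON ordering.
        lon, lat = origin
        return float(lat), float(lon)
    raise TypeError(f"unsupported origin format: {origin!r}")


as_object = parse_origin({"lat": 52.3760, "lon": 4.894})
as_string = parse_origin("52.3760, 4.894")
as_array = parse_origin([4.894, 52.3760])
print(as_object == as_string == as_array)  # True
```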

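By default the distance unit is m (meters), but the following are also accepted: mi (miles), in (inches), yd (yards), km (kilometers), cm (centimeters), mm (millimeters).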

    POST /museums/_search?size=0
    {
      "aggs" : {
        "rings" : {
          "geo_distance" : {
            "field" : "location",
            "origin" : "52.3760, 4.894",
            "unit" : "km",
            "ranges" : [
              { "to" : 100 },
              { "from" : 100, "to" : 300 },
              { "from" : 300 }
            ]
          }
        }
      }
    }

With "unit" set to "km", the distances are computed in kilometers.
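
To see that the kilometer ranges above describe the same rings as the meter ranges in the first example, the bounds can be converted by hand. This is a sketch only: the conversion table uses the standard definitions of each unit, and `ranges_to_meters` is a hypothetical helper name.

```python
# Meters per unit, from the standard definitions of each unit.
METERS_PER_UNIT = {
    "m": 1.0,
    "km": 1000.0,
    "cm": 0.01,
    "mm": 0.001,
    "mi": 1609.344,
    "yd": 0.9144,
    "in": 0.0254,
}


def ranges_to_meters(ranges, unit="m"):
    """Convert the from/to bounds of each range from `unit` to meters."""
    factor = METERS_PER_UNIT[unit]
    return [{bound: value * factor for bound, value in r.items()} for r in ranges]


km_ranges = [{"to": 100}, {"from": 100, "to": 300}, {"from": 300}]
meter_ranges = ranges_to_meters(km_ranges, "km")
print(meter_ranges)
```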

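There are two distance calculation modes: arc (the default) and plane. The arc calculation is the most accurate; plane is the fastest but the least accurate. Consider using plane when the search context is "narrow" and spans a small geographical area (about 5 km). plane returns larger error margins for searches across very large areas (for example, cross-continent searches). The distance calculation type is set with the distance_type parameter: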

    POST /museums/_search?size=0
    {
      "aggs" : {
        "rings" : {
          "geo_distance" : {
            "field" : "location",
            "origin" : "52.3760, 4.894",
            "unit" : "km",
            "distance_type" : "plane",
            "ranges" : [
              { "to" : 100 },
              { "from" : 100, "to" : 300 },
              { "from" : 300 }
            ]
          }
        }
      }
    }
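
The trade-off between the two modes can be illustrated with two textbook formulas. This sketch is an assumption-laden stand-in: it uses the haversine formula for the spherical ("arc") distance and an equirectangular approximation for the planar one, which are not necessarily the formulas Elasticsearch uses internally. It only shows that a flat approximation drifts further from the spherical result as the distance grows.

```python
import math

EARTH_RADIUS_M = 6371008.8  # assumed mean Earth radius in meters


def arc_m(lat1, lon1, lat2, lon2):
    """Spherical (haversine) distance in meters."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def plane_m(lat1, lon1, lat2, lon2):
    """Flat (equirectangular) approximation in meters; cheaper, less accurate."""
    x = math.radians(lon2 - lon1) * math.cos(math.radians((lat1 + lat2) / 2))
    y = math.radians(lat2 - lat1)
    return EARTH_RADIUS_M * math.hypot(x, y)


origin = (52.3760, 4.894)
near = (52.374081, 4.912350)  # NEMO Science Museum, about 1 km away
far = (48.861111, 2.336389)   # Musée du Louvre, several hundred km away

near_err = abs(arc_m(*origin, *near) - plane_m(*origin, *near))
far_err = abs(arc_m(*origin, *far) - plane_m(*origin, *far))
print(f"near error: {near_err:.6f} m, far error: {far_err:.3f} m")
```

Under these assumptions the two formulas are practically indistinguishable for the nearby museum, while the difference for the Paris museum is orders of magnitude larger in absolute terms.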

Keyed Response

Setting the keyed flag to true associates a unique string key with each bucket and returns the ranges as a hash rather than an array:

    POST /museums/_search?size=0
    {
      "aggs" : {
        "rings_around_amsterdam" : {
          "geo_distance" : {
            "field" : "location",
            "origin" : "52.3760, 4.894",
            "ranges" : [
              { "to" : 100000 },
              { "from" : 100000, "to" : 300000 },
              { "from" : 300000 }
            ],
            "keyed": true
          }
        }
      }
    }

Response:

    {
      ...
      "aggregations": {
        "rings_around_amsterdam" : {
          "buckets": {
            "*-100000.0": {
              "from": 0.0,
              "to": 100000.0,
              "doc_count": 3
            },
            "100000.0-300000.0": {
              "from": 100000.0,
              "to": 300000.0,
              "doc_count": 1
            },
            "300000.0-*": {
              "from": 300000.0,
              "doc_count": 2
            }
          }
        }
      }
    }
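
The relationship between the array form and the keyed form can be made concrete with a few lines of client-side code. This is only an illustration of the two response shapes shown on this page; `to_keyed` is a hypothetical helper, and Elasticsearch builds the keyed response itself when "keyed" is true.

```python
def to_keyed(buckets):
    """Turn a list of buckets into a dict indexed by each bucket's key."""
    return {
        bucket["key"]: {name: value for name, value in bucket.items() if name != "key"}
        for bucket in buckets
    }


array_buckets = [
    {"key": "*-100000.0", "from": 0.0, "to": 100000.0, "doc_count": 3},
    {"key": "100000.0-300000.0", "from": 100000.0, "to": 300000.0, "doc_count": 1},
    {"key": "300000.0-*", "from": 300000.0, "doc_count": 2},
]

keyed = to_keyed(array_buckets)
print(keyed["300000.0-*"]["doc_count"])  # 2
```

With the keyed form a client can look up a ring directly by its key instead of scanning the array.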

It is also possible to customize the key for each range:

    POST /museums/_search?size=0
    {
      "aggs" : {
        "rings_around_amsterdam" : {
          "geo_distance" : {
            "field" : "location",
            "origin" : "52.3760, 4.894",
            "ranges" : [
              { "to" : 100000, "key": "first_ring" },
              { "from" : 100000, "to" : 300000, "key": "second_ring" },
              { "from" : 300000, "key": "third_ring" }
            ],
            "keyed": true
          }
        }
      }
    }

Response:

    {
      ...
      "aggregations": {
        "rings_around_amsterdam" : {
          "buckets": {
            "first_ring": {
              "from": 0.0,
              "to": 100000.0,
              "doc_count": 3
            },
            "second_ring": {
              "from": 100000.0,
              "to": 300000.0,
              "doc_count": 1
            },
            "third_ring": {
              "from": 300000.0,
              "doc_count": 2
            }
          }
        }
      }
    }
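
The naming rule visible in the two keyed responses (a custom key when one is given, otherwise a "from-to" string with * for an open bound) can be sketched as follows. `range_key` is a hypothetical helper that imitates only the key strings shown on this page; how Elasticsearch formats other values, such as very large or fractional bounds, is not covered here.

```python
def range_key(rng):
    """Return the custom key if present, else a default "from-to" style key."""
    if "key" in rng:
        return rng["key"]
    # An open bound is rendered as "*"; numbers are shown as floats, e.g. 100000.0.
    lower = str(float(rng["from"])) if "from" in rng else "*"
    upper = str(float(rng["to"])) if "to" in rng else "*"
    return f"{lower}-{upper}"


default_keys = [
    range_key({"to": 100000}),
    range_key({"from": 100000, "to": 300000}),
    range_key({"from": 300000}),
]
custom_key = range_key({"from": 300000, "key": "third_ring"})
print(default_keys, custom_key)
```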