An aggregation summarizes your data as metrics, statistics, or other analytics. Aggregations help you answer questions like:

  • What’s the average load time for my website?
  • Who are my most valuable customers based on transaction volume?
  • What would be considered a large file on my network?
  • How many products are in each product category?

Elasticsearch organizes aggregations into three categories:

  • Metric aggregations that calculate metrics, such as a sum or average, from field values.
  • Bucket aggregations that group documents into buckets, also called bins, based on field values, ranges, or other criteria.
  • Pipeline aggregations that take input from other aggregations instead of documents or fields.

Run an aggregation

You can run aggregations as part of a search by specifying the search API’s aggs parameter. The following search runs a terms aggregation on my-field:

    GET /my-index-000001/_search
    {
      "aggs": {
        "my-agg-name": {
          "terms": {
            "field": "my-field"
          }
        }
      }
    }

Aggregation results are in the response’s aggregations object:

    {
      "took": 78,
      "timed_out": false,
      "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
      },
      "hits": {
        "total": {
          "value": 5,
          "relation": "eq"
        },
        "max_score": 1.0,
        "hits": [...]
      },
      "aggregations": {    // aggregation results
        "my-agg-name": {
          "doc_count_error_upper_bound": 0,
          "sum_other_doc_count": 0,
          "buckets": []
        }
      }
    }
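
As a minimal sketch of reading the aggregations object on the client side, the Python snippet below pulls the bucket list out of a response that has already been parsed into a dictionary. The helper name `get_buckets` and the sample dictionary (including its single bucket) are illustrative assumptions, not part of any Elasticsearch client library.

```python
from typing import Any, Dict, List


def get_buckets(response: Dict[str, Any], agg_name: str) -> List[Dict[str, Any]]:
    """Return the bucket list for one named aggregation, or [] if it is absent."""
    aggregations = response.get("aggregations", {})
    return aggregations.get(agg_name, {}).get("buckets", [])


# Hypothetical, trimmed-down response; the bucket values are made up for illustration.
sample_response = {
    "hits": {"total": {"value": 5, "relation": "eq"}},
    "aggregations": {
        "my-agg-name": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [{"key": "foo", "doc_count": 5}],
        }
    },
}

for bucket in get_buckets(sample_response, "my-agg-name"):
    print(bucket["key"], bucket["doc_count"])
```

Returning an empty list for a missing aggregation keeps the calling loop simple; a stricter client might prefer to raise an error instead.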

Change an aggregation’s scope

Use the query parameter to limit the documents on which an aggregation runs:

    GET /my-index-000001/_search
    {
      "query": {
        "range": {
          "@timestamp": {
            "gte": "now-1d/d",
            "lt": "now/d"
          }
        }
      },
      "aggs": {
        "my-agg-name": {
          "terms": {
            "field": "my-field"
          }
        }
      }
    }

Return only aggregation results

By default, searches containing an aggregation return both search hits and aggregation results. To return only aggregation results, set size to 0:

    GET /my-index-000001/_search
    {
      "size": 0,
      "aggs": {
        "my-agg-name": {
          "terms": {
            "field": "my-field"
          }
        }
      }
    }

Run multiple aggregations

You can specify multiple aggregations in the same request:

    GET /my-index-000001/_search
    {
      "aggs": {
        "my-first-agg-name": {
          "terms": {
            "field": "my-field"
          }
        },
        "my-second-agg-name": {
          "avg": {
            "field": "my-other-field"
          }
        }
      }
    }

Run sub-aggregations

Bucket aggregations support bucket or metric sub-aggregations. For example, a terms aggregation with an avg sub-aggregation calculates an average value for each bucket of documents. There is no level or depth limit for nesting sub-aggregations.

    GET /my-index-000001/_search
    {
      "aggs": {
        "my-agg-name": {
          "terms": {
            "field": "my-field"
          },
          "aggs": {    // sub-aggregation
            "my-sub-agg-name": {
              "avg": {
                "field": "my-other-field"
              }
            }
          }
        }
      }
    }

The response nests sub-aggregation results under their parent aggregation:

    {
      ...
      "aggregations": {
        "my-agg-name": {    // results for the parent aggregation
          "doc_count_error_upper_bound": 0,
          "sum_other_doc_count": 0,
          "buckets": [
            {
              "key": "foo",
              "doc_count": 5,
              "my-sub-agg-name": {    // results for the sub-aggregation
                "value": 75.0
              }
            }
          ]
        }
      }
    }
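
Because each sub-aggregation result sits inside its parent bucket, a client usually walks the buckets to collect the values. The following Python sketch assumes the response is already parsed into a dictionary shaped like the one above; the helper name `sub_agg_values` is hypothetical.

```python
from typing import Any, Dict


def sub_agg_values(
    response: Dict[str, Any], agg_name: str, sub_agg_name: str
) -> Dict[Any, Any]:
    """Map each parent bucket key to the value of its single-value sub-aggregation."""
    buckets = response["aggregations"][agg_name]["buckets"]
    return {bucket["key"]: bucket[sub_agg_name]["value"] for bucket in buckets}


# Sample data copied from the response shown above.
nested_response = {
    "aggregations": {
        "my-agg-name": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
                {
                    "key": "foo",
                    "doc_count": 5,
                    "my-sub-agg-name": {"value": 75.0},
                }
            ],
        }
    }
}

averages = sub_agg_values(nested_response, "my-agg-name", "my-sub-agg-name")
print(averages)
```

This sketch only handles sub-aggregations that return a single `value`, such as avg; a nested bucket aggregation would need a recursive walk instead.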

Add custom metadata

Use the meta object to associate custom metadata with an aggregation:

    GET /my-index-000001/_search
    {
      "aggs": {
        "my-agg-name": {
          "terms": {
            "field": "my-field"
          },
          "meta": {    // the meta object
            "my-metadata-field": "foo"
          }
        }
      }
    }

The response returns the meta object in place:

    {
      ...
      "aggregations": {
        "my-agg-name": {
          "meta": {
            "my-metadata-field": "foo"
          },
          "doc_count_error_upper_bound": 0,
          "sum_other_doc_count": 0,
          "buckets": []
        }
      }
    }

Return the aggregation type

By default, aggregation results include the aggregation’s name but not its type. To return the aggregation type, use the typed_keys query parameter.

    GET /my-index-000001/_search?typed_keys
    {
      "aggs": {
        "my-agg-name": {
          "histogram": {
            "field": "my-field",
            "interval": 1000
          }
        }
      }
    }

The response returns the aggregation type as a prefix to the aggregation’s name.

⚠️ Some aggregations return a different aggregation type from the type in the request. For example, the terms, significant terms, and percentiles aggregations return different aggregation types depending on the data type of the aggregated field.

    {
      ...
      "aggregations": {
        "histogram#my-agg-name": {    // the aggregation type, histogram, followed by a # separator and the aggregation's name, my-agg-name
          "buckets": []
        }
      }
    }
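
A client that requests typed_keys has to split each key back into its type and name. The Python sketch below shows one way to do that, assuming the `type#name` layout shown above; the function name `split_typed_key` is hypothetical.

```python
from typing import Optional, Tuple


def split_typed_key(key: str) -> Tuple[Optional[str], str]:
    """Split 'type#name' into (type, name); return (None, key) if there is no '#'."""
    # partition splits on the first '#', so any later '#' stays in the name.
    agg_type, separator, name = key.partition("#")
    if not separator:
        return None, key
    return agg_type, name


typed = split_typed_key("histogram#my-agg-name")
untyped = split_typed_key("my-agg-name")
print(typed)
print(untyped)
```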

Use scripts in an aggregation

When a field doesn’t exactly match the aggregation you need, you should aggregate on a runtime field:

    GET /my-index-000001/_search?size=0
    {
      "runtime_mappings": {
        "message.length": {
          "type": "long",
          "script": "emit(doc['message.keyword'].value.length())"
        }
      },
      "aggs": {
        "message_length": {
          "histogram": {
            "interval": 10,
            "field": "message.length"
          }
        }
      }
    }

Scripts calculate field values dynamically, which adds a little overhead to the aggregation. In addition to the time spent calculating, some aggregations, like terms and filters, can’t use some of their optimizations with runtime fields. In total, the performance cost of using a runtime field varies from aggregation to aggregation.

Aggregation caches

For faster responses, Elasticsearch caches the results of frequently run aggregations in the shard request cache. To get cached results, use the same preference string for each search. If you don’t need search hits, set size to 0 to avoid filling the cache.

Elasticsearch routes searches with the same preference string to the same shards. If the shards’ data doesn’t change between searches, the shards return cached aggregation results.

Limits for long values

When running aggregations, Elasticsearch uses double values to hold and represent numeric data. As a result, aggregations on long numbers greater than 2^53 are approximate.
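
The loss of precision is a property of 64-bit floating point rather than of Elasticsearch itself. The short Python sketch below illustrates it, on the assumption that Python's float is an IEEE 754 double (as it is in CPython); the variable names are illustrative.

```python
# A double has a 53-bit significand, so not every integer above 2**53 is representable.
LIMIT = 2 ** 53

exact_long = LIMIT + 1              # an integer that still fits in a 64-bit long
as_double = float(exact_long)       # the same value after conversion to a double

round_trip = int(as_double)         # what survives the conversion
lost_precision = round_trip != exact_long
below_limit_ok = int(float(LIMIT - 1)) == LIMIT - 1

print(exact_long, round_trip, lost_precision, below_limit_ok)
```

Here the odd value just above 2^53 rounds to a neighboring representable double, while the value just below the limit converts exactly.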