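As is well known, Elasticsearch is a distributed full-text search engine, and indexing and searching are its basic functions. Its aggregations feature is also powerful: it supports complex analysis and statistics over the data with good performance. An aggregation gives an overview of the data; it analyzes and summarizes the whole data set rather than finding individual documents. The aggregations that Elasticsearch provides fall into four main families: bucket, metric, pipeline, and matrix aggregations.

Bucket Aggregation: groups documents that satisfy particular criteria into buckets, similar to GROUP BY in SQL.

Metric Aggregation: provides mathematical operations for statistical analysis of document fields. Besides computing on field values, it can also compute on values produced by a script (Painless script). Most metrics are numeric computations that output a single value; some output several values.

Pipeline Aggregation: aggregates the results of other aggregations a second time.

Matrix Aggregation: operates on multiple fields and produces a result matrix.

The following sections walk through the aggregations Elasticsearch provides: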

Metric aggregations

1. max

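A single-value metric aggregation that computes the maximum. The values can be extracted from a specific numeric field in the documents or generated by a supplied script. The example below computes the highest price in the sales index; size is set to 0 so that only the aggregation result is returned, without the original documents:

curl -X POST "localhost:9200/sales/_search?size=0&pretty" -H 'Content-Type: application/json' -d'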
{
    "aggs" : {
        "max_price" : { 
                "max" : { "field" : "price" } 
        }
    }
}'

The response is as follows:

{
  "took" : 23,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 5,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "max_price" : {
      "value" : 200.0
    }
  }
}

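Note that the aggregation name (max_price) also serves as the key in the response, through which the aggregation result can be retrieved.

When a document has no value for price (NULL), the missing parameter defines how it is treated. In the example below, documents without a price value are treated as if their price were 10: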

curl -X POST "localhost:9200/sales/_search?pretty" -H 'Content-Type: application/json' -d'
{
    "aggs" : {
        "max_price" : {
            "max" : {
                "field" : "price",
                "missing": 10 
            }
        }
    }
}'
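The effect of missing can be pictured with a short Python sketch. It is only an illustration of the semantics described above: the sample documents and the function name max_with_missing are made up here and are not part of Elasticsearch.

```python
from typing import Optional

def max_with_missing(docs: list, field: str, missing: Optional[float] = None) -> Optional[float]:
    """Hypothetical helper that mimics a max aggregation: documents without
    a value for `field` are skipped unless `missing` supplies a substitute."""
    values = []
    for doc in docs:
        value = doc.get(field)
        if value is None:
            if missing is None:
                continue  # no substitute configured: the document is ignored
            value = missing  # treat the document as if it had this value
        values.append(value)
    return max(values) if values else None

# Made-up sample documents: one with a price, one with NULL, one without the field
docs = [{"price": 5}, {"price": None}, {"name": "no price"}]
print(max_with_missing(docs, "price"))              # 5
print(max_with_missing(docs, "price", missing=10))  # 10
```

Without missing, the two documents lacking a price do not contribute; with missing set to 10 they do, which here changes the maximum.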

2. min

A single-value metric aggregation that computes the minimum. Usage is the same as max. For example:

curl -X POST "localhost:9200/sales/_search?size=0&pretty" -H 'Content-Type: application/json' -d'
{
  "aggs": {
    "min_price": {
      "min": {
        "field": "price"
      }
    }
  }
}'

3. avg

A single-value metric aggregation that computes the average. Usage is the same as max. For example:

curl -X POST "localhost:9200/sales/_search?size=0&pretty" -H 'Content-Type: application/json' -d'
{
  "aggs": {
    "avg_price": {
      "avg": {
        "field": "price"
      }
    }
  }
}'

4. sum

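A single-value metric aggregation that computes a sum. Usage is the same as max. The example below sums the price of the documents whose user field is kimchy: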

curl -X POST "localhost:9200/twitter/_search?size=0&pretty" -H 'Content-Type: application/json' -d'
{
  "query": {
    "term": {
      "user": "kimchy"
    }
  },
  "aggs": {
    "sum_prices": {
      "sum": {
        "field": "price"
      }
    }
  }
}'

5. cardinality

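A single-value metric aggregation that counts distinct values. It behaves like DISTINCT in SQL: duplicates are removed from the set and the size of the deduplicated set is returned (the count is approximate). Do not run it on text fields. Usage is the same as max. For example: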

curl -X POST "localhost:9200/twitter/_search?size=0&pretty" -H 'Content-Type: application/json' -d'
{
    "aggs" : {
        "user_count" : {
            "cardinality" : {
                "field" : "user.keyword"
            }
        }
    }
}'

6. weighted_avg

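A single-value metric aggregation that computes a weighted average, using the formula ∑(value * weight) / ∑(weight). If the documents have a "grade" field holding a score from 0 to 100 and a "weight" field holding an arbitrary numeric weight, the weighted average can be computed as follows: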

curl -X POST "localhost:9200/exams/_search?pretty" -H 'Content-Type: application/json' -d'
{
    "size": 0,
    "aggs" : {
        "weighted_grade": {
            "weighted_avg": {
                "value": {
                    "field": "grade"
                },
                "weight": {
                    "field": "weight"
                }
            }
        }
    }
}'
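As a rough illustration of the weighted average, the Python sketch below computes ∑(value * weight) / ∑(weight) over a few made-up exam documents. The function name weighted_avg and the sample data are assumptions for this example only.

```python
def weighted_avg(docs, value_field="grade", weight_field="weight"):
    """Hypothetical helper: sum(value * weight) / sum(weight) over the documents."""
    weighted_sum = sum(d[value_field] * d[weight_field] for d in docs)
    total_weight = sum(d[weight_field] for d in docs)
    return weighted_sum / total_weight

# Made-up exam documents: the grade of 90 counts twice as much as the grade of 60
exams = [{"grade": 90, "weight": 2}, {"grade": 60, "weight": 1}]
print(weighted_avg(exams))  # 80.0
```

A plain average of the two grades would be 75; the heavier weight on 90 pulls the result up to 80.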

7. value_count

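A single-value metric aggregation that counts the values of a field. Usage is the same as max. The example below counts how many values of the type field exist in the sales index: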

curl -X POST "localhost:9200/sales/_search?size=0&pretty" -H 'Content-Type: application/json' -d'
{
    "aggs" : {
        "types_count" : { 
                "value_count" : { "field" : "type" } 
        }
    }
}'

8. stats

A multi-value metric aggregation for basic statistics; it returns the five metrics count, max, min, avg, and sum in one request. Usage is the same as max. For example:

curl -X POST "localhost:9200/twitter/_search?size=0&pretty" -H 'Content-Type: application/json' -d'
{
    "aggs" : {
        "price_stats" : {
            "stats" : {
                "field" : "price"
            }
        }
    }
}'

9. extended_stats

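A multi-value metric aggregation for extended statistics. It works like stats but returns four more results: the sum of squares, the variance, the standard deviation, and the interval of the mean plus/minus two standard deviations. Usage is the same as stats. For example: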

curl -X POST "localhost:9200/twitter/_search?size=0&pretty" -H 'Content-Type: application/json' -d'
{
    "aggs" : {
        "price_stats" : {
            "extended_stats" : {
                "field" : "price"
            }
        }
    }
}'

The response is as follows:

{
  ...
  {
    "aggregations": {
      "price_stats": {
        "count": 2,
        "min": 50,
        "max": 100,
        "avg": 75,
        "sum": 150,
        "sum_of_squares": 12500,
        "variance": 625,
        "std_deviation": 25,
        "std_deviation_bounds": {
          "upper": 125,
          "lower": 25
        }
      }
    }
  }
}
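The numbers in this response can be checked by hand. The Python sketch below assumes the two indexed prices were 50 and 100 (consistent with count, min, and max above) and uses the population variance; the function name extended_stats is chosen here for illustration.

```python
import math

def extended_stats(values, sigma=2.0):
    """Hypothetical helper reproducing the extended_stats fields for a list of numbers."""
    count = len(values)
    total = sum(values)
    avg = total / count
    # Population variance: mean of squared deviations from the average
    variance = sum((v - avg) ** 2 for v in values) / count
    std = math.sqrt(variance)
    return {
        "count": count,
        "min": min(values),
        "max": max(values),
        "avg": avg,
        "sum": total,
        "sum_of_squares": sum(v * v for v in values),
        "variance": variance,
        "std_deviation": std,
        # Mean plus/minus `sigma` standard deviations
        "std_deviation_bounds": {"upper": avg + sigma * std, "lower": avg - sigma * std},
    }

stats = extended_stats([50, 100])  # assumed input values
print(stats["variance"], stats["std_deviation"], stats["std_deviation_bounds"])
# 625.0 25.0 {'upper': 125.0, 'lower': 25.0}
```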

10. percentiles

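A multi-value metric aggregation for percentile statistics. Percentile is a statistical term: if a set of data is sorted from smallest to largest and the cumulative percentage is computed, the value at a given percentage is called that percentile. For example, the 95th percentile is the value that is greater than 95% of the observed values.

The field must be numeric. For example: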

curl -X GET "localhost:9200/latency/_search?size=0&pretty" -H 'Content-Type: application/json' -d'
{
    "aggs" : {
        "load_time_outlier" : {
            "percentiles" : {
                "field" : "load_time" 
            }
        }
    }
}'

By default, the percentiles aggregation returns the percentiles [1, 5, 25, 50, 75, 95, 99], as follows:

{
  ...
  {
    "aggregations": {
      "load_time_outlier": {
        "values": {
          "1.0": 5,
          "5.0": 25,
          "25.0": 165,
          "50.0": 445,
          "75.0": 725,
          "95.0": 945,
          "99.0": 985
        }
      }
    }
  }
}

The percents parameter selects the percentiles of interest; each requested value must be between 0 and 100.

curl -X GET "localhost:9200/latency/_search?size=0&pretty" -H 'Content-Type: application/json' -d'
{
    "aggs" : {
        "load_time_outlier" : {
            "percentiles" : {
                "field" : "load_time" ,
                "percents" : [95, 99, 99.9]
            }
        }
    }
}'

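11. percentile_ranks

A multi-value metric aggregation that is the inverse of percentiles: it takes concrete values and returns, for each one, the percentage of observed values that fall below it.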

curl -X GET "localhost:9200/latency/_search?size=0&pretty" -H 'Content-Type: application/json' -d'
{
    "aggs" : {
        "load_time_ranks" : {
            "percentile_ranks" : {
                "field" : "load_time" ,
                "values" : [500, 600]
            }
        }
    }
}'

The response is as follows:

{
  ...
  {
    "aggregations": {
      "load_time_ranks": {
        "values": {
          "500": 55.00000000000001,
          "600": 64.0
        }
      }
    }
  }
}
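To make the idea of a percentile rank concrete, here is a small Python sketch that counts exactly. It is a simplification: Elasticsearch computes percentiles with an approximation algorithm, so real results (such as 55.00000000000001 above) can differ slightly. The sample load times, the function name, and the choice to count ties as "at or below" are all assumptions of this example.

```python
def percentile_rank(values, threshold):
    """Hypothetical helper: percentage of values that are at or below `threshold`."""
    at_or_below = sum(1 for v in values if v <= threshold)
    return 100.0 * at_or_below / len(values)

# Made-up page load times in milliseconds
load_times = [120, 300, 450, 500, 520, 610, 700, 800, 900, 1000]
print(percentile_rank(load_times, 500))  # 40.0
print(percentile_rank(load_times, 600))  # 50.0
```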

12. top_hits

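top_hits is usually used as a sub-aggregation so that the top matching documents can be collected per bucket; the sort parameter controls their order. In the example below, the sales index is grouped by the type field, the document with the latest date is taken for each type, and only the date and price fields are included in each document's _source.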

curl -X POST "localhost:9200/sales/_search?size=0&pretty" -H 'Content-Type: application/json' -d'
{
    "aggs": {
        "top_tags": {
            "terms": {
                "field": "type",
                "size": 3
            },
            "aggs": {
                "top_sales_hits": {
                    "top_hits": {
                        "sort": [
                            {
                                "date": { "order": "desc" }
                            }
                        ],
                        "_source": {
                            "includes": [ "date", "price" ]
                        },
                        "size" : 1
                    }
                }
            }
        }
    }
}'
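The request above is roughly equivalent to the following Python sketch, which groups documents by type, keeps the one with the latest date, and trims it to the requested fields. The sample documents and the function name latest_per_type are hypothetical.

```python
def latest_per_type(docs, fields=("date", "price")):
    """Hypothetical helper: for each type keep the document with the greatest
    date and return only the requested fields (like _source includes)."""
    latest = {}
    for doc in docs:
        best = latest.get(doc["type"])
        # ISO-8601 date strings sort chronologically when compared as strings
        if best is None or doc["date"] > best["date"]:
            latest[doc["type"]] = doc
    return {t: {f: d[f] for f in fields} for t, d in latest.items()}

# Made-up sales documents
sales = [
    {"type": "hat", "date": "2015-01-01", "price": 200, "id": 1},
    {"type": "hat", "date": "2015-03-01", "price": 150, "id": 2},
    {"type": "bag", "date": "2015-02-01", "price": 100, "id": 3},
]
print(latest_per_type(sales))
```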

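Bucket aggregations

Unlike metric aggregations, bucket aggregations do not compute metrics over fields; they create buckets of documents. Each bucket is associated with a criterion: the aggregation walks through the documents, and every document that meets a bucket's criterion is placed in that bucket. Bucketing is comparable to GROUP BY in SQL. Besides the buckets themselves, a bucket aggregation also computes and returns the number of documents in each bucket.

Bucket aggregations also support sub-aggregations. The maximum number of buckets allowed in a single response is limited by the dynamic cluster setting search.max_buckets, which defaults to 10000; a request that tries to return more buckets than the limit fails with an exception.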

1. terms

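Groups documents by term; buckets are built dynamically, one per unique value. A terms aggregation should run on a keyword field or on any other field type suited to bucket aggregations.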

curl -X GET "localhost:9200/twitter/_search?pretty" -H 'Content-Type: application/json' -d'
{
    "size" : 0,
    "aggs" : {
        "user_group" : {
            "terms" : { "field" : "user.keyword" } 
        }
    }
}'

The aggregation result is as follows (abridged):


{
  "aggregations" : {
    "user_group" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "kimchy",
          "doc_count" : 2
        },
        {
          "key" : "Iverson",
          "doc_count" : 1
        }
      ]
    }
  }
}

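By default, the terms aggregation returns the top 10 buckets ordered by document count. If there are more than 10 buckets, the document count of the buckets that were not returned is reported in sum_other_doc_count. The number of buckets to return can be changed with the size parameter inside the aggregation: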

curl -X GET "localhost:9200/twitter/_search?pretty" -H 'Content-Type: application/json' -d'
{
    "size" : 0,
    "aggs" : {
        "user_group" : {
            "terms" : {
              "field" : "user.keyword" ,
              "size" : 15
            }
        }
    }
}'

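Meaning of the size parameter:

size defines how many term buckets are returned out of the full list. By default, the coordinating node asks each shard for its own top buckets, merges the buckets returned by all shards, and returns the overall top size buckets to the client. The returned list can therefore be inaccurate: because data may be unevenly distributed across shards, a bucket that one shard drops may have a larger total count than a bucket that another shard returns.

To mitigate this, Elasticsearch provides the shard_size parameter, which sets how many top buckets are requested from each shard. Once every shard has returned its top shard_size buckets, the top size buckets are taken from the merged result. Requesting more buckets per shard improves the accuracy of the result to some extent. The default shard_size is (size * 1.5 + 10).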

curl -X POST "localhost:9200/twitter/_search?size=0&pretty" -H 'Content-Type: application/json' -d'
{
    "aggs" : {
        "sales_over_time" : {
            "terms" : {
                "field" : "user.keyword",
                "shard_size" : 15,
                "size": 10
            }
        }
    }
}'
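Why a larger shard_size helps can be seen in a toy simulation. The Python sketch below is a simplified model of the merge step, not Elasticsearch's implementation; the per-shard term counts and the function name terms_agg are made up for illustration.

```python
from collections import Counter

def terms_agg(shards, size, shard_size):
    """Hypothetical model: each shard reports only its top `shard_size` terms,
    the coordinator sums what it received and keeps the top `size`."""
    merged = Counter()
    for shard in shards:
        for term, count in Counter(shard).most_common(shard_size):
            merged[term] += count
    return merged.most_common(size)

# Made-up per-shard counts: "b" is second on both shards but first overall (8 + 8 = 16)
shards = [{"a": 10, "b": 8}, {"c": 9, "b": 8}]
print(terms_agg(shards, size=1, shard_size=1))  # [('a', 10)]
print(terms_agg(shards, size=1, shard_size=2))  # [('b', 16)]
```

With shard_size equal to 1, each shard drops "b", so the coordinator never sees the term with the highest total. Asking each shard for one more bucket recovers it.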

2. filter

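A filter aggregation narrows the current aggregation context to a specific set of documents and is usually combined with other aggregations. For example: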

curl -X POST "localhost:9200/sales/_search?size=0&pretty" -H 'Content-Type: application/json' -d'
{
    "aggs" : {
        "t_shirts" : {
            "filter" : { "term": { "type": "t-shirt" } },
            "aggs" : {
                "avg_price" : { "avg" : { "field" : "price" } }
            }
        }
    }
}'

The response is as follows:

{
  "aggregations": {
    "t_shirts": {
      "doc_count": 3,
      "avg_price": {
        "value": 128.33333333333334
      }
    }
  }
}

3. filters

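A multi-filter aggregation in which each bucket is associated with a filter, so documents that match different filter conditions are placed in different buckets. In the example below, filters contains two match queries, and statistics are computed separately for the results of each query: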

curl -X GET "localhost:9200/twitter/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "size": 0,
  "aggs" : {
    "pre_avg_price" : {
      "filters" : {
        "filters" : {
          "java" :   { "match" : { "type" : "java"   }},
          "python" : { "match" : { "type" : "python" }}
        }
      },
      "aggs" : {
        "avg_price" : {
          "avg" : { "field" : "price"   }
        }
      }
    }
  }
}'

The response is as follows:

{
  "aggregations" : {
    "pre_avg_price" : {
      "buckets" : {
        "java" : {
          "doc_count" : 10,
          "avg_price" : {
            "value" : 24
          }
        },
        "python" : {
          "doc_count" : 6,
          "avg_price" : {
            "value" : 38
          }
        }
      }
    }
  }
}

4. range

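A range aggregation lets the user define a set of ranges, each of which represents a bucket; it is mainly used to show how the data is distributed. Each range is defined by the from and to parameters, and includes the from value but excludes the to value. For example: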

curl -X GET "localhost:9200/_search?pretty" -H 'Content-Type: application/json' -d'
{
    "aggs" : {
        "price_ranges" : {
            "range" : {
                "field" : "price",
                "ranges" : [
                    { "to" : 100.0 },
                    { "from" : 100.0, "to" : 200.0 },
                    { "from" : 200.0 }
                ]
            }
        }
    }
}'

The response is as follows:

{
  "aggregations": {
    "price_ranges": {
      "buckets": [
        {
          "key": "*-100.0",
          "to": 100,
          "doc_count": 2
        },
        {
          "key": "100.0-200.0",
          "from": 100,
          "to": 200,
          "doc_count": 2
        },
        {
          "key": "200.0-*",
          "from": 200,
          "doc_count": 3
        }
      ]
    }
  }
}
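The "from inclusive, to exclusive" rule is easy to get wrong at the boundaries, so here is a minimal Python sketch of it. The price list is invented to reproduce the doc_count values above, and range_buckets is a hypothetical helper, not an Elasticsearch API.

```python
def range_buckets(values, ranges):
    """Hypothetical helper: count values per range, including `from` and excluding `to`."""
    buckets = []
    for r in ranges:
        lo, hi = r.get("from"), r.get("to")
        count = sum(
            1 for v in values
            if (lo is None or v >= lo) and (hi is None or v < hi)
        )
        # An open-ended bound is written as "*", as in the response keys
        key = f"{'*' if lo is None else lo}-{'*' if hi is None else hi}"
        buckets.append({"key": key, "doc_count": count})
    return buckets

# Made-up prices; note that 100 and 200 sit exactly on range boundaries
prices = [10, 50, 100, 150, 200, 250, 300]
ranges = [{"to": 100.0}, {"from": 100.0, "to": 200.0}, {"from": 200.0}]
for bucket in range_buckets(prices, ranges):
    print(bucket)
```

The value 100 lands in the second bucket and 200 in the third, because each range excludes its upper bound.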

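By default the buckets are returned in the order in which the ranges were defined, and each key is derived from the range bounds. To assign custom keys, give each range a key; setting keyed to true additionally returns the buckets as an object keyed by those names instead of an array: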

curl -X GET "localhost:9200/_search?pretty" -H 'Content-Type: application/json' -d'
{
    "aggs" : {
        "price_ranges" : {
            "range" : {
                "field" : "price",
                "keyed" : true,
                "ranges" : [
                    { "key" : "cheap", "to" : 100 },
                    { "key" : "average", "from" : 100, "to" : 200 },
                    { "key" : "expensive", "from" : 200 }
                ]
            }
        }
    }
}'

The response is as follows:

{
  "aggregations": {
    "price_ranges": {
      "buckets": {
        "cheap": {
          "to": 100,
          "doc_count": 2
        },
        "average": {
          "from": 100,
          "to": 200,
          "doc_count": 2
        },
        "expensive": {
          "from": 200,
          "doc_count": 3
        }
      }
    }
  }
}

5. date_range

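date_range is a range aggregation dedicated to date fields. It differs from range in that from and to can be date math expressions, and the date format of the from and to fields in the response can be specified. As with range, each bucket includes the from value and excludes the to value.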

curl -X POST "localhost:9200/twitter/_search?size=0&pretty" -H 'Content-Type: application/json' -d'
{
    "aggs": {
        "post_date_range": {
            "date_range": {
                "field": "post_date",
                "format": "yyyy-MM-dd",
                "ranges": [
                    { "to": "now-10M/M" }, 
                    { "from": "now-10M/M" } 
                ]
            }
        }
    }
}'

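The example above creates two range buckets: the first holds documents whose post_date is earlier than 10 months ago (rounded down to the start of the month), and the second holds documents whose post_date is on or after that point. The response is as follows: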

{
  "aggregations" : {
    "post_date_range" : {
      "buckets" : [
        {
          "key" : "*-2020-09-01",
          "to" : 1.5989184E12,
          "to_as_string" : "2020-09-01",
          "doc_count" : 14
        },
        {
          "key" : "2020-09-01-*",
          "from" : 1.5989184E12,
          "from_as_string" : "2020-09-01",
          "doc_count" : 1
        }
      ]
    }
  }
}

As with range, a key can be set on each range to name its bucket, and keyed returns the buckets as an object keyed by those names:

curl -X POST "localhost:9200/sales/_search?size=0&pretty" -H 'Content-Type: application/json' -d'
{
    "aggs": {
        "post_date_range": {
            "date_range": {
                "field": "post_date",
                "format": "yyyy-MM-dd",
                "ranges": [
                    { "from": "2015-01-01",  "to": "2015-03-01", "key": "quarter_01" },
                    { "from": "2015-03-01",  "to": "2015-06-01", "key": "quarter_02" }
                ],
                "keyed": true
            }
        }
    }
}'

6. ip_range

Just as date fields have the dedicated date_range aggregation, ip fields have a dedicated range aggregation as well.

curl -X GET "localhost:9200/ip_addresses/_search?pretty" -H 'Content-Type: application/json' -d'
{
    "size": 10,
    "aggs" : {
        "ip_ranges_buckets" : {
            "ip_range" : {
                "field" : "ip",
                "ranges" : [
                    { "to" : "10.0.0.5" },
                    { "from" : "10.0.0.5" }
                ]
            }
        }
    }
}'

The response is as follows:

{
  "aggregations": {
    "ip_ranges_buckets": {
      "buckets": [
        {
          "key": "*-10.0.0.5",
          "to": "10.0.0.5",
          "doc_count": 10
        },
        {
          "key": "10.0.0.5-*",
          "from": "10.0.0.5",
          "doc_count": 260
        }
      ]
    }
  }
}

7. histogram

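A histogram aggregation applies to numeric fields and dynamically builds fixed-size (interval) buckets over the values. Suppose the documents have a numeric field named price; the aggregation can be configured to build buckets with an interval of 50. When the aggregation runs, each document's price is evaluated and rounded down to the nearest bucket key.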

curl -X POST "localhost:9200/sales/_search?size=0&pretty" -H 'Content-Type: application/json' -d'
{
    "aggs" : {
        "prices_histogram" : {
            "histogram" : {
                "field" : "price",
                "interval" : 50
            }
        }
    }
}'

The response is as follows:

{
  "aggregations": {
        "prices_histogram" : {
            "buckets": [
                {
                    "key": 0.0,
                    "doc_count": 1
                },
                {
                    "key": 50.0,
                    "doc_count": 1
                },
                {
                    "key": 100.0,
                    "doc_count": 0
                },
                {
                    "key": 150.0,
                    "doc_count": 2
                },
                {
                    "key": 200.0,
                    "doc_count": 3
                }
            ]
        }
    }
}
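The bucket a value falls into follows from bucket_key = floor((value - offset) / interval) * interval + offset. The Python sketch below applies that formula to a made-up list of prices chosen to reproduce the counts above; the function name histogram is used here only for illustration.

```python
import math
from collections import Counter

def histogram(values, interval, offset=0.0):
    """Hypothetical helper: round each value down to its bucket key and count
    documents per key, emitting empty buckets in between as well."""
    counts = Counter(
        math.floor((v - offset) / interval) * interval + offset for v in values
    )
    lo, hi = min(counts), max(counts)
    steps = round((hi - lo) / interval)
    # Looking up a key that is absent from a Counter yields 0
    return [(lo + i * interval, counts[lo + i * interval]) for i in range(steps + 1)]

# Made-up prices; nothing falls in [100, 150), so that bucket is empty
prices = [30, 60, 150, 199, 200, 210, 249]
print(histogram(prices, interval=50))
# [(0.0, 1), (50.0, 1), (100.0, 0), (150.0, 2), (200.0, 3)]
```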

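In addition, the histogram aggregation supports the following parameters:

interval: the bucket interval
min_doc_count: the minimum number of documents a bucket must contain; buckets whose doc_count is lower than this value are not returned
extended_bounds: min and max force the histogram to start and end at the given values, even when no documents fall there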

8. date_histogram

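date_histogram is a histogram aggregation over time. It applies only to date fields and builds a histogram of documents by date. Because Elasticsearch represents dates internally as long values, the plain histogram aggregation also works on dates, but less accurately, since time-based intervals are not always of fixed length; date_histogram can instead bucket by date expressions.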

curl -X POST "localhost:9200/twitter/_search?size=0&pretty" -H 'Content-Type: application/json' -d'
{
    "aggs" : {
        "sales_over_time" : {
            "date_histogram" : {
                "field" : "post_date",
                "interval" : "month"
            }
        }
    }
}'

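The example above buckets documents by month on the post_date field; the first bucket covers the earliest post_date and the last bucket covers the latest.

By default, date_histogram also returns buckets that contain no documents; the min_doc_count parameter sets the minimum number of documents a bucket must contain. Valid values for interval are year, month, week, day, hour, minute, and second. Besides these calendar units, a custom interval can be given; for example, 90m buckets by 90 minutes. From Elasticsearch 7.2 onward, interval is deprecated in favor of calendar_interval and fixed_interval.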

curl -X POST "localhost:9200/twitter/_search?size=0&pretty" -H 'Content-Type: application/json' -d'
{
    "aggs" : {
        "sales_over_time" : {
            "date_histogram" : {
                "field" : "post_date",
                "interval" : "90m"
            }
        }
    }
}'
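The reason a fixed-width numeric histogram is a poor fit for calendar units is that a month is not a fixed number of milliseconds. The Python sketch below measures two months using only the standard library; the function name month_length_ms is made up for this example.

```python
from datetime import datetime, timezone

def month_length_ms(year: int, month: int) -> int:
    """Hypothetical helper: length of a calendar month in milliseconds (UTC)."""
    start = datetime(year, month, 1, tzinfo=timezone.utc)
    # First day of the following month; December rolls over to January
    end = datetime(year + month // 12, month % 12 + 1, 1, tzinfo=timezone.utc)
    return int((end - start).total_seconds() * 1000)

print(month_length_ms(2020, 1))  # 2678400000 (31 days)
print(month_length_ms(2020, 2))  # 2505600000 (29 days, leap year)
```

No single numeric interval matches both months, which is why monthly bucketing has to follow the calendar while an interval such as 90m can be a fixed width.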

9. auto_date_histogram

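Similar to date_histogram, but instead of an interval for the bucket width, it takes a target number of buckets and picks the interval automatically to best meet that target. The number of buckets returned is always less than or equal to the target. If not specified, the target defaults to 10 buckets. For example: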

curl -X POST "localhost:9200/twitter/_search?size=0&pretty" -H 'Content-Type: application/json' -d'
{
    "aggs" : {
        "twitter_over_time" : {
            "auto_date_histogram" : {
                "field" : "post_date",
                "buckets" : 10
            }
        }
    }
}'

10. nested

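A special single-bucket aggregation for aggregating nested documents. It requires the path parameter to point at the nested documents; any type of aggregation can then be defined over them.

Suppose the following mapping, in which the resellers field is of type nested: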

curl -X PUT "localhost:9200/products?pretty" -H 'Content-Type: application/json' -d'
{
    "mappings": {
        "properties" : {
            "resellers" : { 
                "type" : "nested",
                "properties" : {
                    "reseller" : { "type" : "text" },
                    "price" : { "type" : "double" }
                }
            }
        }
    }
}'

The following request returns the lowest price at which the products can be purchased:

curl -X GET "localhost:9200/products/_search?pretty" -H 'Content-Type: application/json' -d'
{
    "aggs" : {
        "resellers_buckets" : {
            "nested" : {
                "path" : "resellers"
            },
            "aggs" : {
                "min_price" : { "min" : { "field" : "resellers.price" } }
            }
        }
    }
}'

The response is as follows; doc_count is the number of nested documents:

{
  "aggregations" : {
    "resellers_buckets" : {
      "doc_count" : 2,
      "min_price" : {
        "value" : 350.0
      }
    }
  }
}

11. composite

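composite is a multi-bucket aggregation, similar to GROUP BY over multiple columns in SQL, that aggregates over several fields efficiently. The sources parameter controls the sources used to build the composite buckets. The order in which sources are defined matters because it also determines the order in which keys are returned. Each source name must be unique. A source can be one of the following three types:

Terms: equivalent to a simple terms aggregation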

curl -X GET "localhost:9200/twitter/_search?pretty" -H 'Content-Type: application/json' -d'
{
    "aggs" : {
        "my_buckets": {
            "composite" : {
                "sources" : [
                    { "user_source": { "terms" : { "field": "user.keyword" } } }
                ]
            }
        }
     }
}'

Histogram: for numeric fields; the interval parameter defines the bucket width

curl -X GET "localhost:9200/_search?pretty" -H 'Content-Type: application/json' -d'
{
    "aggs" : {
        "my_buckets": {
            "composite" : {
                "sources" : [
                    { "histo": { "histogram" : { "field": "price", "interval": 5 } } }
                ]
            }
        }
    }
}'

Date Histogram: for date fields

curl -X GET "localhost:9200/_search?pretty" -H 'Content-Type: application/json' -d'
{
    "aggs" : {
        "my_buckets": {
            "composite" : {
                "sources" : [
                    { "date": { "date_histogram" : { "field": "timestamp", "interval": "1d" } } }
                ]
            }
        }
    }
}'

The sources parameter accepts an array, so several fields can be aggregated together:

curl -X GET "localhost:9200/twitter/_search?pretty" -H 'Content-Type: application/json' -d'
{
    "aggs" : {
        "my_buckets": {
            "composite" : {
                "sources" : [
                    { "user": { "terms": {"field": "user.keyword" } } },
                    { "date": { "date_histogram": { "field": "post_date", "interval": "year" } } }
                ]
            }
        }
    }
}'
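Conceptually, the multi-source request above groups documents by the pair (user, year). The Python sketch below shows that grouping on a few made-up tweets. It is a simplification: the year is kept as a string rather than an epoch-millisecond key, and sorting the buckets by key is an assumption of this sketch. The function name composite_user_year is hypothetical.

```python
from collections import Counter

def composite_user_year(docs):
    """Hypothetical helper: one bucket per distinct (user, year) combination."""
    counts = Counter((d["user"], d["post_date"][:4]) for d in docs)
    return [
        {"key": {"user": user, "date": year}, "doc_count": count}
        for (user, year), count in sorted(counts.items())
    ]

# Made-up documents
tweets = [
    {"user": "kimchy", "post_date": "2019-05-01"},
    {"user": "kimchy", "post_date": "2020-01-15"},
    {"user": "kimchy", "post_date": "2020-03-02"},
    {"user": "Iverson", "post_date": "2020-07-09"},
]
for bucket in composite_user_year(tweets):
    print(bucket)
```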

12. missing

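A missing aggregation places the documents that lack a field, or whose value for it is NULL, into a single bucket. It is often used alongside other aggregations to report on all the documents that cannot be placed in any other bucket because the field value is missing.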

curl -X POST "localhost:9200/twitter/_search?size=0&pretty" -H 'Content-Type: application/json' -d'
{
    "aggs" : {
        "twitter_without_a_user" : {
            "missing" : { "field" : "user.keyword" }
        }
    }
}'

The response is as follows:

{
    "aggregations" : {
        "twitter_without_a_user" : {
            "doc_count" : 2
        }
    }
}

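Pipeline aggregations

Pipeline aggregations run a second round of aggregation over the results of other aggregations. The pipeline result is written into the original result, and pipeline aggregations fall into two kinds depending on where it lands:

Sibling: the result sits at the same level as the existing aggregation result, for example max_bucket, min_bucket, avg_bucket, sum_bucket, stats_bucket, extended_stats_bucket, and percentiles_bucket
Parent: the result is embedded inside the existing aggregation result, for example derivative and cumulative_sum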

The example below uses the min_bucket aggregation, with buckets_path pointing at the aggregation to process, to find the job bucket with the lowest average salary.

curl -X POST "localhost:9200/employees/_search?pretty" -H 'Content-Type: application/json' -d'
{
    "size": 0,
    "aggs" : {
        "jobs" : {
            "terms" : {
                "field" : "job.keyword",
                "size" : 10
            },
            "aggs": {
                "avg_salary": {
                    "avg": {
                        "field": "salary"
                    }
                }
            }
        },
        "min_salary_by_jobs": {
            "min_bucket": {
                "buckets_path": "jobs>avg_salary" 
            }
        }
    }
}'
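The two stages of this request (average salary per job, then the minimum across those averages) can be mirrored in a few lines of Python. This is only a sketch of the logic on made-up employee records; the function name min_bucket_of_avg is hypothetical, and the returned shape imitates the value and keys fields of a min_bucket result.

```python
from collections import defaultdict

def min_bucket_of_avg(employees):
    """Hypothetical helper: average salary per job, then the lowest average
    together with the job(s) that produced it."""
    salaries = defaultdict(list)
    for e in employees:
        salaries[e["job"]].append(e["salary"])
    # First stage: the avg_salary sub-aggregation inside each jobs bucket
    avg_by_job = {job: sum(s) / len(s) for job, s in salaries.items()}
    # Second stage: the sibling pipeline step over jobs>avg_salary
    lowest = min(avg_by_job.values())
    keys = [job for job, avg in avg_by_job.items() if avg == lowest]
    return {"value": lowest, "keys": keys}

# Made-up employee records
employees = [
    {"job": "dev", "salary": 100}, {"job": "dev", "salary": 200},
    {"job": "qa", "salary": 90}, {"job": "qa", "salary": 110},
    {"job": "ops", "salary": 120},
]
print(min_bucket_of_avg(employees))  # {'value': 100.0, 'keys': ['qa']}
```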