Similar to SQL's GROUP BY, AVG, SUM, etc., but the Aggregations API also provides more sophisticated statistical-analysis interfaces.
| Aggregation type | Description |
|---|---|
| Bucket Aggregation | A set of documents that meet specific criteria; equivalent to SQL's GROUP BY |
| Metric Aggregation | Mathematical calculations and statistics over document fields, computed on top of buckets; equivalent to SQL's COUNT, AVG, SUM, etc. |
| Pipeline Aggregation | A second-level aggregation over the results of other aggregations |
| Matrix Aggregation | Operates on multiple fields and produces a result matrix |
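No Pipeline Aggregation example appears below, so a minimal sketch may help. It assumes the `employee` index with `job` and `sal` fields used in the later examples:

```json
GET employee/_search
{
  "size": 0,
  "aggs": {
    "jobs": {
      "terms": { "field": "job" },
      "aggs": {
        "avg_sal": { "avg": { "field": "sal" } }
      }
    },
    "avg_of_avg_sal": {
      "avg_bucket": { "buckets_path": "jobs>avg_sal" }
    }
  }
}
```

`avg_bucket` runs after the `jobs` terms aggregation completes and averages each bucket's `avg_sal` value, i.e. it aggregates over other aggregations' results.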
Bucket aggregations
terms grouping + top_hits per bucket

```json
GET employee/_search
{
  "size": 0,
  "aggs": {
    "job_analysis": {
      "terms": { "field": "job" },
      "aggs": {
        "age_top_1": {
          "top_hits": {
            "size": 1,
            "sort": [{ "age": { "order": "desc" } }]
          }
        }
      }
    }
  }
}
```
Filter first, then group

```json
{
  "size": 10,
  "query": {
    "bool": {
      "filter": [{ "term": { "classid": 187010 } }]
    }
  },
  "aggs": {
    "lesson_aggs": {
      "terms": { "field": "lessonstatus" }
    }
  }
}
```
range buckets (`from` is inclusive and `to` is exclusive, so adjacent ranges should share boundary values to avoid gaps — the second range starts at 5000, not 5001)

```json
GET employee/_search
{
  "size": 0,
  "aggs": {
    "sal_range_info": {
      "range": {
        "field": "sal",
        "ranges": [
          { "to": 5000 },
          { "from": 5000, "to": 8000 },
          { "from": 8000 }
        ]
      }
    }
  }
}
```
Group by script

```json
GET /community_user_bigtable/_doc/_search
{
  "size": 0,
  "_source": false,
  "query": {
    "bool": {
      "filter": [{ "term": { "classid": 187010 } }]
    }
  },
  "aggs": {
    "class_aggs": {
      "terms": { "script": "doc['lessonstatus'].value/10==2" },
      "aggs": {
        "test": {
          "terms": { "script": "doc['lessonstatus'].value%10" }
        }
      }
    }
  }
}
```
Grouping with filter sub-aggregations

```json
GET /community_user_bigtable/_doc/_search
{
  "size": 0,
  "_source": false,
  "aggs": {
    "class_aggs": {
      "terms": {
        "field": "classid",
        "order": [{ "isFillPaperSum": "desc" }]
      },
      "aggs": {
        "isFillPaperSum": { "sum": { "field": "isfillpaper" } },
        "loginFilter": {
          "filter": { "range": { "data_logintime": { "gte": 0 } } }
        },
        "lessonStatusFilter": {
          "filter": { "range": { "lessonstatus": { "gte": 50, "lte": 60 } } }
        }
      }
    }
  }
}
```
order: sort buckets after grouping

```json
{
  "aggs": {
    "genders": {
      "terms": {
        "field": "gender",
        "order": { "_count": "asc" }
      }
    }
  }
}
```
Buckets can also be ordered by a sub-aggregation

```json
{
  "aggs": {
    "genders": {
      "terms": {
        "field": "gender",
        "order": { "avg_balance": "desc" }
      },
      "aggs": {
        "avg_balance": { "avg": { "field": "balance" } }
      }
    }
  }
}
```
Order by a field inside a sub-aggregation

```json
{
  "aggs": {
    "genders": {
      "terms": {
        "field": "gender",
        "order": { "balance_stats.avg": "desc" }
      },
      "aggs": {
        "balance_stats": { "stats": { "field": "balance" } }
      }
    }
  }
}
```
The histogram aggregation supports ordering (`order`) but does not support `size`

```json
{
  "size": 0,
  "aggs": {
    "ipo_year_range": {
      "histogram": {
        "field": "ipo_year",
        "interval": 10,
        "order": { "_key": "asc" }
      },
      "aggs": {
        "max_market_cap": { "max": { "field": "market_cap" } }
      }
    }
  }
}
```
Metric aggregations (max/min/avg/sum)
stats summary (swap `stats` for `avg`, `sum`, etc. to get a single metric)

```json
GET employee/_search
{
  "size": 0,
  "aggs": {
    "job_and_salary_info": {
      "terms": { "field": "job" },
      "aggs": {
        "sal_info": {
          "stats": { "field": "sal" }
        }
      }
    }
  }
}
```
sum/avg… a field directly (swap `sum` for `min`, etc.)

```shell
curl 'http://10.202.11.117:9200/testindex/orders/_search?pretty' -d '{
  "size": 0,
  "aggs": {
    "return_expires_in": {
      "sum": { "field": "expires_in" }
    }
  }
}'
```
Percentile ranks after grouping

```json
{
  "size": 0,
  "aggs": {
    "states": {
      "terms": { "field": "gender" },
      "aggs": {
        "banlances": {
          "percentile_ranks": {
            "field": "balance",
            "values": [20000, 40000]
          }
        }
      }
    }
  }
}
```
Aggregate after a nested filter (the `filtered` query here is pre-5.0 syntax)

```json
{
  "size": 0,
  "query": {
    "filtered": {
      "filter": {
        "nested": {
          "path": "nna_risks",
          "filter": { "exists": { "field": "nna_risks.ina_id" } }
        }
      }
    }
  },
  "aggs": {
    "level0": {
      "terms": { "script": "doc['inp_type'].value" }
    }
  }
}
```
Filtering inside aggregations (filter / filters)

```json
{
  "aggs": {
    "t_shirts": {
      "filter": { "term": { "SOURCETYPE": "0" } },
      "aggs": {
        "term": { "avg": { "field": "TITLE.KEYWORD" } }
      }
    }
  }
}
```

```json
{
  "size": 0,
  "aggs": {
    "messages": {
      "filters": {
        "filters": {
          "errors": { "match": { "SOURCETYPE": "0" } },
          "warnings": { "match": { "SOURCETYPE": "1" } }
        }
      }
    }
  }
}
```
Interpreting the response
1. Some ES aggregations can lose accuracy.
2. The loss comes from each shard producing a partial intermediate result that is then merged; it is ES's trade-off between real-time responsiveness and accuracy.
3. Accuracy can be improved by increasing `shard_size`, among other methods.
1) `doc_count_error_upper_bound`: an upper bound on the count error for buckets that were not returned in this aggregation but may potentially exist.
2) `sum_other_doc_count`: the number of documents not counted into the returned buckets. This is easy to understand: by default, terms only returns the top 10 buckets by count, so when there are many distinct values, some documents naturally go uncounted.
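The points above can be sketched as a query: `shard_size` makes each shard return more candidate terms before merging, and `show_term_doc_count_error` exposes the per-bucket error bound in the response (index and field names reuse the `employee` examples above):

```json
GET employee/_search
{
  "size": 0,
  "aggs": {
    "job_terms": {
      "terms": {
        "field": "job",
        "size": 10,
        "shard_size": 100,
        "show_term_doc_count_error": true
      }
    }
  }
}
```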
Other references
https://www.cnblogs.com/duanxz/p/6528161.html
https://www.jianshu.com/p/1b430a637971
