Copyright notice: this is an original article by the author; reproduction without permission is prohibited. Contact for discussion: 351605040. https://blog.csdn.net/Arvinzr/article/details/79228229
In real-world development you will inevitably run into business scenarios where you need to aggregate, obtain the per-bucket counts, and then filter on those counts.

1. Using minDocCount. The code is below; adapt it to your own business scenario.
// Working example (Elasticsearch 2.x transport client API)
SearchRequestBuilder search = transportClient.prepareSearch("bigdata_idx_2").setTypes("captureCompare");
// Filter aggregation restricting the time range (note: not attached in this snippet;
// add it via subAggregation(...) if you need the time filter inside the aggregation)
FilterAggregationBuilder sub = AggregationBuilders.filter("channel_longitudeC")
        .filter(QueryBuilders.rangeQuery("fcmp_time").from(startTime).to(endTime));
// Group by fcmp_fobj_id; the order may be a compound of several criteria
TermsBuilder tb = AggregationBuilders.terms("fcmp_fobj_id").field("fcmp_fobj_id")
        .valueType(Terms.ValueType.STRING)
        .order(Terms.Order.compound(
                Terms.Order.aggregation("channel_longitudeC", false) // first sort by count, descending
                // further Terms.Order entries could break ties, e.g. by a sum of another field
        ));
// Sub-aggregation producing the value we sort on
// (remember to set the field to count via .field(...) before executing)
ValueCountBuilder sb = AggregationBuilders.count("channel_longitudeC");
// Attach the sub-aggregation and keep only buckets with at least 400 documents
tb.subAggregation(sb).minDocCount(400);
// Add the terms aggregation to the main request body
// search.setPostFilter()
search.addAggregation(tb);
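If it is unclear what `minDocCount(400)` buys you, the semantics can be sketched without a cluster: group documents by the field, count each group, and drop every bucket whose count is below the threshold. A minimal in-memory equivalent (the sample values and the threshold are made up for illustration):

```java
import java.util.*;
import java.util.stream.*;

public class MinDocCountSketch {
    // Keeps only groups whose document count reaches minDocCount,
    // mirroring what the terms aggregation does server-side.
    static Map<String, Long> termsWithMinDocCount(List<String> fieldValues, long minDocCount) {
        return fieldValues.stream()
                .collect(Collectors.groupingBy(v -> v, Collectors.counting()))
                .entrySet().stream()
                .filter(e -> e.getValue() >= minDocCount)
                .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue));
    }

    public static void main(String[] args) {
        List<String> ids = List.of("a", "a", "a", "b", "b", "c");
        // Buckets "a" (3) and "b" (2) survive a threshold of 2; "c" (1) is dropped
        System.out.println(termsWithMinDocCount(ids, 2));
    }
}
```

With `minDocCount(400)` in the real request, Elasticsearch applies exactly this bucket-level filter before returning results, so low-frequency groups never reach the client.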
2. Slightly more complex: in another scenario you need to aggregate and, at the same time, return the other fields of the matching documents. This is what the Top Hits aggregation is for.
Roughly the SQL equivalent of: select *, count(*) from XXX group by a ......
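The shape of that result can be sketched in plain Java before looking at the Elasticsearch version: one representative record per group plus the group's count, which is what terms + top_hits(size=1) returns. The `Doc` record and its values below are hypothetical, chosen only to mirror the fields used later:

```java
import java.util.*;
import java.util.stream.*;

public class GroupByTopHitSketch {
    record Doc(String idNumb, long fcapTime) {}  // hypothetical document shape
    record Bucket(Doc topHit, long docCount) {}  // one representative + count per key

    // Group by idNumb; keep the first doc of each group as the "top hit"
    static Map<String, Bucket> groupWithTopHit(List<Doc> docs) {
        return docs.stream().collect(Collectors.groupingBy(
                Doc::idNumb,
                Collectors.collectingAndThen(Collectors.toList(),
                        l -> new Bucket(l.get(0), l.size()))));  // stands in for top_hits size=1
    }

    public static void main(String[] args) {
        List<Doc> docs = List.of(new Doc("id1", 1L), new Doc("id1", 2L), new Doc("id2", 3L));
        groupWithTopHit(docs).forEach((k, b) ->
                System.out.println(k + " count=" + b.docCount() + " top=" + b.topHit()));
    }
}
```

In the real aggregation, which document becomes the top hit is controlled by the top_hits sort (score by default), not simply insertion order as in this sketch.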
SearchResponse response = null;
SearchRequestBuilder responsebuilder = transportClient.prepareSearch("syrk_bigdata_capturecmp_passer_idx")
        .setTypes("captureCompare").setFrom(0).setSize(100000);
AggregationBuilder aggregation = AggregationBuilders
        .terms("agg")
        .field("idNumb")
        .subAggregation(
                AggregationBuilders.topHits("top").setFrom(0)
                        .setSize(1)) // one representative hit per bucket
        .size(100000);
long start = System.currentTimeMillis();
response = responsebuilder.setQuery(QueryBuilders.boolQuery()
        .must(QueryBuilders.rangeQuery("fcapTime").from(Long.valueOf(startTime)).to(Long.valueOf(endTime))))
        .addSort("idNumb", SortOrder.ASC)
        .addAggregation(aggregation) // .setSearchType(SearchType.DFS_QUERY_THEN_FETCH)
        .setExplain(true).execute().actionGet();
SearchHits hits = response.getHits(); // do NOT read the final result from these outer hits
Terms agg = response.getAggregations().get("agg");
long end = System.currentTimeMillis();
System.out.println("ES run time: " + (end - start) + "ms");
/** Delete today's data before inserting, to avoid duplicates **/
SyrkRegionFcapperPasserStatistics temp = new SyrkRegionFcapperPasserStatistics();
temp.setDate(Long.valueOf(startTime));
try {
    syrkRegionFcapperPasserStatisticsService.deletePasser(temp);
    for (Terms.Bucket entry : agg.getBuckets()) {
        String key = (String) entry.getKey();  // bucket key
        long docCount = entry.getDocCount();   // doc count for this bucket
        // Ask for the top_hits of each bucket
        TopHits topHits = entry.getAggregations().get("top");
        for (SearchHit hit : topHits.getHits().getHits()) {
            compareUuid = (String) hit.getSource().get("idNumb");
        }
        /** read the data and write it to MySQL **/
    }
    logger.info("All analysis data has been inserted: date is " + startTime);
} catch (Exception e) {
    logger.error("Analysis result data failed, date is " + startTime, e);
}
Take each group's total from the bucket's docCount; take the other field values from the hits inside the top_hits sub-aggregation.
Remember: never read the result from the outermost hits. The outer hits and the aggregation's hits will not match in number, so iterating over the outer hits would produce inconsistent data.
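The mismatch this warning refers to can be made concrete without a cluster: the outer hit count is the number of matching documents, while the number of terms buckets is the number of distinct keys. A tiny sketch with made-up data:

```java
import java.util.*;
import java.util.stream.*;

public class HitsVsBuckets {
    public static void main(String[] args) {
        // Six matching "documents", but only three distinct idNumb values
        List<String> idNumbs = List.of("a", "a", "b", "b", "b", "c");

        long outerHits = idNumbs.size();                    // analogue of response.getHits()
        long buckets = idNumbs.stream().distinct().count(); // analogue of agg.getBuckets().size()

        // Iterating outer hits visits duplicates; iterating buckets visits each key once
        System.out.println("outer hits = " + outerHits + ", buckets = " + buckets);
    }
}
```

So the per-group count must come from docCount, and the per-group fields from the top_hits inside each bucket, exactly as the loop above does.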