Overview


Because the project uses database and table sharding, cross-table join queries are not possible. Each side therefore has to be queried separately, and the differences are then counted by comparing the two result collections in memory.

Implementation


The original approach

Originally, after both sides were queried, the comparison was done directly against a List:

    // Query the Qlsxinfo records (full item codes) for this area
    List<Map<String, Object>> qlFullCodeList = statisticalAnalysisMapper.getJcQlsxinfoByProvincialAreaCode(
            areaList.get(i).getAreaCode(), noWorkItemQuery.getServiceCode(), beginTime, endTime);
    itemVOs.setWorkItemNum(qlFullCodeList.size());
    // Get the item codes from the Feature table by area code
    List<Map<String, Object>> serviceCodeList = statisticalAnalysisMapper.getArchiveFeatureByProvincialServiceCode(
            areaList.get(i).getAreaCode(), noWorkItemQuery.getServiceCode(), beginTime, endTime);
    itemVOs.setProduceWorkItemNum(serviceCodeList.size());
    // Count the records that appear in the first list but not in the second
    List<Map<String, Object>> reduce1 = qlFullCodeList.stream()
            .filter(item -> !serviceCodeList.contains(item))
            .collect(Collectors.toList());
    itemVOs.setNoWorkItemNum(reduce1.size());

With a data volume ranging from several million up to over a hundred million rows, it took roughly three to four minutes to filter out the differing records.

After the improvement

After changing the lookup collection from a List to a Set, performance improved noticeably. The key change is to put the list that will be searched into a HashSet and call contains on that set instead:

    List<Map<String, Object>> finalServiceCodeList = new ArrayList<>();
    Set<Map<String, Object>> maps = new HashSet<>(finalServiceCodeList);

Applied to the original query code:

    List<Map<String, Object>> qlFullCodeList = statisticalAnalysisMapper.getJcQlsxinfoByProvincialAreaCode(
            cityList.get(i).getAreaCode(), noWorkItemQuery.getServiceCode(), beginTime, endTime);
    itemVO.setWorkItemNum(qlFullCodeList.size());
    // Get the item codes from the Feature table by area code
    List<Map<String, Object>> serviceCodeList = statisticalAnalysisMapper.getArchiveFeatureByProvincialServiceCode(
            cityList.get(i).getAreaCode(), noWorkItemQuery.getServiceCode(), beginTime, endTime);
    itemVO.setProduceWorkItemNum(serviceCodeList.size());
    // Count the difference: build a HashSet so each contains() lookup is O(1) on average
    Set<Map<String, Object>> maps = new HashSet<>(serviceCodeList);
    List<Map<String, Object>> reduce1 = qlFullCodeList.stream()
            .filter(item -> !maps.contains(item))
            .collect(Collectors.toList());

With the same data volume (several million to over a hundred million rows), filtering with a Set takes about one minute. That is already a reasonable improvement; further optimizations can be explored later.

  • Reason: `List.contains` scans the list linearly, so filtering n records against a list of m records costs O(n·m) comparisons; `HashSet.contains` is a constant-time hash lookup on average, which brings the comparison down to roughly O(n + m).
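
To make the gap concrete, here is a small self-contained sketch (hypothetical data sizes and code values, not the project's real tables) that counts the difference between two lists of codes first via `List.contains` and then via `HashSet.contains`:

    import java.util.HashSet;
    import java.util.List;
    import java.util.Set;
    import java.util.stream.Collectors;
    import java.util.stream.IntStream;

    public class ListVsSetDemo {
        public static void main(String[] args) {
            // Hypothetical data: 30,000 codes on each side, half of them overlapping
            List<String> left = IntStream.range(0, 30_000)
                    .mapToObj(i -> "code-" + i)
                    .collect(Collectors.toList());
            List<String> right = IntStream.range(15_000, 45_000)
                    .mapToObj(i -> "code-" + i)
                    .collect(Collectors.toList());

            // List-based filtering: each contains() is a linear scan, so this is O(n * m)
            long t1 = System.currentTimeMillis();
            List<String> diffViaList = left.stream()
                    .filter(item -> !right.contains(item))
                    .collect(Collectors.toList());
            long listMillis = System.currentTimeMillis() - t1;

            // Set-based filtering: contains() is O(1) on average, so this is O(n + m)
            long t2 = System.currentTimeMillis();
            Set<String> rightSet = new HashSet<>(right);
            List<String> diffViaSet = left.stream()
                    .filter(item -> !rightSet.contains(item))
                    .collect(Collectors.toList());
            long setMillis = System.currentTimeMillis() - t2;

            System.out.println("List filter: " + diffViaList.size() + " diffs in " + listMillis + " ms");
            System.out.println("Set filter:  " + diffViaSet.size() + " diffs in " + setMillis + " ms");
        }
    }

The absolute timings depend on the machine, but once the lists reach tens of thousands of entries the Set-based version is consistently orders of magnitude faster.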

Deduplicating a List of objects by two of its fields combined (id and name here, as an example):

    // Dedupe by the composite key id + ";" + name
    // (getName() is illustrative -- use whichever second field applies)
    List<User> disList = list.stream()
            .collect(Collectors.collectingAndThen(
                    Collectors.toCollection(() ->
                            new TreeSet<>(Comparator.comparing((User o) -> o.getId() + ";" + o.getName()))),
                    ArrayList::new));
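
For completeness, a self-contained sketch of the same technique with a minimal, hypothetical `User` class (the real entity class will differ):

    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.Comparator;
    import java.util.List;
    import java.util.TreeSet;
    import java.util.stream.Collectors;

    public class DedupDemo {
        // Minimal stand-in for the real entity class
        static class User {
            private final Long id;
            private final String name;
            User(Long id, String name) { this.id = id; this.name = name; }
            Long getId() { return id; }
            String getName() { return name; }
            @Override public String toString() { return id + "/" + name; }
        }

        public static void main(String[] args) {
            List<User> list = Arrays.asList(
                    new User(1L, "a"),
                    new User(1L, "a"),   // duplicate of the first entry by (id, name)
                    new User(1L, "b"),   // same id, different name -> kept
                    new User(2L, "a"));

            // The TreeSet keeps only the first element for each composite key;
            // collectingAndThen copies the survivors back into an ArrayList
            List<User> disList = list.stream()
                    .collect(Collectors.collectingAndThen(
                            Collectors.toCollection(() ->
                                    new TreeSet<>(Comparator.comparing((User o) -> o.getId() + ";" + o.getName()))),
                            ArrayList::new));

            System.out.println(disList); // [1/a, 1/b, 2/a]
        }
    }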

END


Done~