问题描述

在某生产环境中需要查询一天的操作数据,研发同事反馈该查询SQL正常情况下能3秒内返回查询结果,当查询2018-08-31时查询执行超过10分钟,查询SQL(已屏蔽业务表名)为:

  1. SELECT COUNT(1)
  2. FROM OBM
  3. LEFT JOIN OTLR ON OTLR.OUTBOUND_NO = obm.OUTBOUND_NO AND OTLR.YN =0
  4. LEFT JOIN RTM ON RTM.RECEIPT_NO = OBM.OUTBOUND_NO AND RTM.yn =0
  5. LEFT JOIN RTMM ON RTMM.RECEIPT_NO = RTM.RECEIPT_NO AND RECEIPT_STATUS = '404' AND RTMM.yn =0
  6. LEFT JOIN PBD ON PBD.OUTBOUND_NO = OTLR.SUB_ORDER_NO AND PBD.yn = 0
  7. LEFT JOIN PBM ON PBM.BATCH_NO = PBD.BATCH_NO AND PBM.yn =0
  8. LEFT JOIN PTD ON PTD.OUTBOUND_NO = OBM.OUTBOUND_NO AND PTD.yn =0
  9. LEFT JOIN OCB ON OCB.OUTBOUND_NO = OTLR.SUB_ORDER_NO AND OCB.yn =0
  10. LEFT JOIN OLSM ON OLSM.OUTBOUND_NO = OTLR.SUB_ORDER_NO AND OLSM.yn =0
  11. WHERE OBM.OUTBOUND_TYPE IN(10,18) AND 1 = 1
  12. AND (OBM.INTERVAL_ENDTIME>=cast('2018/8/31/00:00:00' as datetime)
  13. AND OBM.INTERVAL_ENDTIME<=cast('2018/8/31/23:59:59' as datetime))

问题分析

针对该问题,我们选取2018-09-01来进行对比,发现两条SQL执行计划相同,当实际执行中查询2018-09-01的数据仅需要1.96秒,而查询2018-08-31的数据五分钟没有返回结果,因此我们推断两天的数据差异导致查询性能异常,将查询SQL进行分解测试,找出异常部分。

首先查询OBM表中满足条件的数据,返回结果:16891,查询时间:0.67秒,查询SQL为:

  1. SELECT COUNT(1)
  2. FROM OBM
  3. #LEFT JOIN OTLR ON OTLR.OUTBOUND_NO = obm.OUTBOUND_NO AND OTLR.YN =0
  4. #LEFT JOIN RTM ON RTM.RECEIPT_NO = OBM.OUTBOUND_NO AND RTM.yn =0
  5. #LEFT JOIN RTMM ON RTMM.RECEIPT_NO = RTM.RECEIPT_NO AND RECEIPT_STATUS = '404' AND RTMM.yn =0
  6. #LEFT JOIN PBD ON PBD.OUTBOUND_NO = OTLR.SUB_ORDER_NO AND PBD.yn = 0
  7. #LEFT JOIN PBM ON PBM.BATCH_NO = PBD.BATCH_NO AND PBM.yn =0
  8. #LEFT JOIN PTD ON PTD.OUTBOUND_NO = OBM.OUTBOUND_NO AND PTD.yn =0
  9. #LEFT JOIN OCB ON OCB.OUTBOUND_NO = OTLR.SUB_ORDER_NO AND OCB.yn =0
  10. #LEFT JOIN OLSM ON OLSM.OUTBOUND_NO = OTLR.SUB_ORDER_NO AND OLSM.yn =0
  11. WHERE OBM.OUTBOUND_TYPE IN(10,18) AND 1 = 1
  12. AND (OBM.INTERVAL_ENDTIME>=cast('2018/8/31/00:00:00' as datetime)
  13. AND OBM.INTERVAL_ENDTIME<=cast('2018/8/31/23:59:59' as datetime))

尝试关联OTLRRTM两个表,返回结果:24601,查询时间:1.00秒,查询SQL为:

  1. SELECT COUNT(1)
  2. FROM OBM
  3. LEFT JOIN OTLR ON OTLR.OUTBOUND_NO = obm.OUTBOUND_NO AND OTLR.YN =0
  4. LEFT JOIN RTM ON RTM.RECEIPT_NO = OBM.OUTBOUND_NO AND RTM.yn =0
  5. #LEFT JOIN RTMM ON RTMM.RECEIPT_NO = RTM.RECEIPT_NO AND RECEIPT_STATUS = '404' AND RTMM.yn =0
  6. #LEFT JOIN PBD ON PBD.OUTBOUND_NO = OTLR.SUB_ORDER_NO AND PBD.yn = 0
  7. #LEFT JOIN PBM ON PBM.BATCH_NO = PBD.BATCH_NO AND PBM.yn =0
  8. #LEFT JOIN PTD ON PTD.OUTBOUND_NO = OBM.OUTBOUND_NO AND PTD.yn =0
  9. #LEFT JOIN OCB ON OCB.OUTBOUND_NO = OTLR.SUB_ORDER_NO AND OCB.yn =0
  10. #LEFT JOIN OLSM ON OLSM.OUTBOUND_NO = OTLR.SUB_ORDER_NO AND OLSM.yn =0
  11. WHERE OBM.OUTBOUND_TYPE IN(10,18) AND 1 = 1
  12. AND (OBM.INTERVAL_ENDTIME>=cast('2018/8/31/00:00:00' as datetime)
  13. AND OBM.INTERVAL_ENDTIME<=cast('2018/8/31/23:59:59' as datetime))

按照上面测试方法,我们继续测试关联更多表后查询性能发现:

  1. 首先关联```OTLR``````RTM```两个表,返回结果:24601,查询时间:1.00秒,
  2. 再尝试关联```RTMM``````PBD```两个表,返回结果:25198,查询时间:1.44
  3. 再尝试关联```PBM```表,返回结果:27230,查询时间:1.56
  4. 再尝试关联```PTD```表,返回结果:6241432,查询时间:8.52
  5. 尝试关联所有表,返回结果:290400318,查询时间:582

通过对比查询,发现查询2018-09-01返回81000条记录,而查询2018-08-31返回290400318条记录,记录数差距3585倍,所以查询出现”性能问题”。

解决办法

该问题表面上是”性能问题”,但实际上是”SQL正确性问题”,该SQL没有正确使用关联条件或该SQL需求存在问题,虽然查询SQL能执行成功并返回数据,但不代表该返回数据”合理正确”,应重新检查该SQL需求合理性和关联条件正确性。