问题描述
在某生产环境中需要查询一天的操作数据,研发同事反馈该查询SQL正常情况下能3秒内返回查询结果,当查询2018-08-31
时查询执行超过10分钟,查询SQL(已屏蔽业务表名)为:
SELECT COUNT(1)
FROM OBM
LEFT JOIN OTLR ON OTLR.OUTBOUND_NO = obm.OUTBOUND_NO AND OTLR.YN =0
LEFT JOIN RTM ON RTM.RECEIPT_NO = OBM.OUTBOUND_NO AND RTM.yn =0
LEFT JOIN RTMM ON RTMM.RECEIPT_NO = RTM.RECEIPT_NO AND RECEIPT_STATUS = '404' AND RTMM.yn =0
LEFT JOIN PBD ON PBD.OUTBOUND_NO = OTLR.SUB_ORDER_NO AND PBD.yn = 0
LEFT JOIN PBM ON PBM.BATCH_NO = PBD.BATCH_NO AND PBM.yn =0
LEFT JOIN PTD ON PTD.OUTBOUND_NO = OBM.OUTBOUND_NO AND PTD.yn =0
LEFT JOIN OCB ON OCB.OUTBOUND_NO = OTLR.SUB_ORDER_NO AND OCB.yn =0
LEFT JOIN OLSM ON OLSM.OUTBOUND_NO = OTLR.SUB_ORDER_NO AND OLSM.yn =0
WHERE OBM.OUTBOUND_TYPE IN(10,18) AND 1 = 1
AND (OBM.INTERVAL_ENDTIME>=cast('2018/8/31/00:00:00' as datetime)
AND OBM.INTERVAL_ENDTIME<=cast('2018/8/31/23:59:59' as datetime))
问题分析
针对该问题,我们选取2018-09-01
来进行对比,发现两条SQL执行计划相同,当实际执行中查询2018-09-01
的数据仅需要1.96秒,而查询2018-08-31
的数据五分钟没有返回结果,因此我们推断两天的数据差异导致查询性能异常,将查询SQL进行分解测试,找出异常部分。
首先查询OBM
表中满足条件的数据,返回结果:16891,查询时间:0.67秒,查询SQL为:
SELECT COUNT(1)
FROM OBM
#LEFT JOIN OTLR ON OTLR.OUTBOUND_NO = obm.OUTBOUND_NO AND OTLR.YN =0
#LEFT JOIN RTM ON RTM.RECEIPT_NO = OBM.OUTBOUND_NO AND RTM.yn =0
#LEFT JOIN RTMM ON RTMM.RECEIPT_NO = RTM.RECEIPT_NO AND RECEIPT_STATUS = '404' AND RTMM.yn =0
#LEFT JOIN PBD ON PBD.OUTBOUND_NO = OTLR.SUB_ORDER_NO AND PBD.yn = 0
#LEFT JOIN PBM ON PBM.BATCH_NO = PBD.BATCH_NO AND PBM.yn =0
#LEFT JOIN PTD ON PTD.OUTBOUND_NO = OBM.OUTBOUND_NO AND PTD.yn =0
#LEFT JOIN OCB ON OCB.OUTBOUND_NO = OTLR.SUB_ORDER_NO AND OCB.yn =0
#LEFT JOIN OLSM ON OLSM.OUTBOUND_NO = OTLR.SUB_ORDER_NO AND OLSM.yn =0
WHERE OBM.OUTBOUND_TYPE IN(10,18) AND 1 = 1
AND (OBM.INTERVAL_ENDTIME>=cast('2018/8/31/00:00:00' as datetime)
AND OBM.INTERVAL_ENDTIME<=cast('2018/8/31/23:59:59' as datetime))
尝试关联OTLR
和RTM
两个表,返回结果:24601,查询时间:1.00秒,查询SQL为:
SELECT COUNT(1)
FROM OBM
LEFT JOIN OTLR ON OTLR.OUTBOUND_NO = obm.OUTBOUND_NO AND OTLR.YN =0
LEFT JOIN RTM ON RTM.RECEIPT_NO = OBM.OUTBOUND_NO AND RTM.yn =0
#LEFT JOIN RTMM ON RTMM.RECEIPT_NO = RTM.RECEIPT_NO AND RECEIPT_STATUS = '404' AND RTMM.yn =0
#LEFT JOIN PBD ON PBD.OUTBOUND_NO = OTLR.SUB_ORDER_NO AND PBD.yn = 0
#LEFT JOIN PBM ON PBM.BATCH_NO = PBD.BATCH_NO AND PBM.yn =0
#LEFT JOIN PTD ON PTD.OUTBOUND_NO = OBM.OUTBOUND_NO AND PTD.yn =0
#LEFT JOIN OCB ON OCB.OUTBOUND_NO = OTLR.SUB_ORDER_NO AND OCB.yn =0
#LEFT JOIN OLSM ON OLSM.OUTBOUND_NO = OTLR.SUB_ORDER_NO AND OLSM.yn =0
WHERE OBM.OUTBOUND_TYPE IN(10,18) AND 1 = 1
AND (OBM.INTERVAL_ENDTIME>=cast('2018/8/31/00:00:00' as datetime)
AND OBM.INTERVAL_ENDTIME<=cast('2018/8/31/23:59:59' as datetime))
按照上面测试方法,我们继续测试关联更多表后查询性能发现:
首先关联```OTLR```和```RTM```两个表,返回结果:24601,查询时间:1.00秒,
再尝试关联```RTMM```和```PBD```两个表,返回结果:25198,查询时间:1.44秒
再尝试关联```PBM```表,返回结果:27230,查询时间:1.56秒
再尝试关联```PTD```表,返回结果:6241432,查询时间:8.52秒
尝试关联所有表,返回结果:290400318,查询时间:582秒
通过对比查询,发现查询2018-09-01
返回81000条记录,而查询2018-08-31
返回290400318条记录,记录数差距3585倍,所以查询出现”性能问题”。
解决办法
该问题表面上是”性能问题”,但实际上是”SQL正确性问题”,该SQL没有正确使用关联条件或该SQL需求存在问题,虽然查询SQL能执行成功并返回数据,但不代表该返回数据”合理正确”,应重新检查该SQL需求合理性和关联条件正确性。