案例一

2011 年,一税务局的朋友请求优化下面 SQL。

  1. select *
  2. from (select t.zxid
  3. t.gh
  4. t.xm
  5. t.bm
  6. t.fzjgdm
  7. select count(a.session_id)
  8. from test_v a
  9. where to_char(t.zxid) = a.ZCRYZH slzl
  10. select count(a.session_id)
  11. from test_v a
  12. where to_char(t.zxid) = a.ZCRYZH
  13. and a.myd = '0' 无评价,
  14. select count(a.session_id)
  15. from test_v a
  16. where to_char(t.zxid) = a.ZCRYZH
  17. and a.myd = '1' 满意,
  18. select count(a.session_id)
  19. from test_v a
  20. where to_char(t.zxid) = a.ZCRYZH
  21. and a.myd = '2' 较满意,
  22. select count(a.session_id)
  23. from test_v a
  24. where to_char(t.zxid) = a.ZCRYZH
  25. and a.myd = '3' 一般,
  26. select count(a.session_id)
  27. from test_v a
  28. where to_char(t.zxid) = a.ZCRYZH
  29. and a.myd = '4' 较不满意,
  30. select count(a.session_id)
  31. from test_v a
  32. where to_char(t.zxid) = a.ZCRYZH
  33. and a.myd = '5' 不满意
  34. from CC_ZXJBXX t
  35. WHERE t.yxbz = 'Y')
  36. where slzl <> 0;

该 SQL 有 7 个标量子查询,在 5.5 节中讲到,标量子查询类似嵌套循环,如果主表返回数据很多并且主表连接列基数很高,会导致子查询被多次扫描。该 SQL 竟然有 7 个标量子查询,而且每个标量子查询除了过滤条件不一样,其他都一样,显然我们可以将标量子查询等价改写为外连接,从而优化 SQL,等价改写之后的写法如下。

  1. SELECT T.ZXID
  2. T.GH
  3. T.XM
  4. T.BM
  5. T.FZJGDM
  6. SUM(1) SLZL
  7. SUMDECODE(A.MYD, '0', 1, 0)) 无评价,
  8. SUMDECODE(A.MYD, '1', 1, 0)) 满意,
  9. SUMDECODE(A.MYD, '2', 1, 0)) 较满意,
  10. SUMDECODE(A.MYD, '3', 1, 0)) 一般,
  11. SUMDECODE(A.MYD, '4', 1, 0)) 较不满意,
  12. SUMDECODE(A.MYD, '5', 1, 0)) 不满意
  13. FROM CC_ZXJBXX T test_v A
  14. where A.ZCRYZH = T.ZXID
  15. and T.YXBZ = 'Y'
  16. GROUP BY T.ZXID T.GH T.XM T.BM T.FZJGDM

SQL 改写之后,因为两表只有关联条件,没有过滤条件,所以两表关联走 HASH 连接,test_v 也只需要被扫描一次,从而大大提升 SQL 性能。

decode(条件,值1,返回值1,值2,返回值2,…值n,返回值n,缺省值)这个是decode的表达式,具体的含义解释为:

  1. IF 条件=值1 THEN
  2.     RETURN(翻译值1)
  3. ELSIF 条件=值2 THEN
  4.     RETURN(翻译值2)
  5.     ......
  6. ELSIF 条件=值n THEN
  7.     RETURN(翻译值n)
  8. ELSE
  9.     RETURN(缺省值)
  10. END IF

例如:

  1. select t.id,
  2. t.name,
  3. t.age,
  4. decode(t.sex, '1', '男生', '2', '女生', '其他') as sex
  5. from STUDENT2 t

上述 SQL 其实是一个典型的报表开发初学者在刚开始工作的时候编写的,强烈建议大家要加强 SQL 编程技能。

案例二

本案例发生在 2017 年,是一个比较经典的标量子查询改写优化案例。SQL 和执行计划如下。

  1. SELECT A.LXR_ID,
  2. A.SR,
  3. (SELECT C.JGID || '@' || C.DLS_BM || '@' || C.DLS_MC
  4. FROM KHGL_DLSJBXX C, KHGL_ZJKJ ZJKJ
  5. WHERE C.JGID = ZJKJ.JGID
  6. AND EXISTS (SELECT 1
  7. FROM LXR_YH YH
  8. WHERE YH.KH_ID = ZJKJ.KJ_ID
  9. AND YH.KHLX = '2'
  10. AND YH.LXR_ID = A.LXR_ID)) AS ZJJGXX
  11. FROM LXR_JBXX A
  12. WHERE A.STATUS = '1'
  13. AND A.GRDM = :v1
  14. AND EXISTS (SELECT 1
  15. FROM LXR_YH YH, KHGL_GRDLXX GRDL
  16. WHERE YH.FZGS_DM = :v2
  17. AND YH.KHLX = '2'
  18. AND YH.LXR_ID = A.LXR_ID
  19. AND GRDL.GRDL_ID = YH.KH_ID
  20. AND GRDL.STATUS = '1')
  21. AND ROWNUM < 21;
  1. Execution Plan
  2. ----------------------------------------------------------
  3. Plan hash value: 704492369
  4. ---------------------------------------------------------------------------------------------------------
  5. | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
  6. ---------------------------------------------------------------------------------------------------------
  7. | 0 | SELECT STATEMENT | | 1 | 41 | 12 (0)| 00:00:01 |
  8. |* 1 | FILTER | | | | | |
  9. |* 2 | HASH JOIN | | 28114 | 3761K| 156 (2)| 00:00:02 |
  10. | 3 | TABLE ACCESS FULL | KHGL_DLSJBXX | 15342 | 1063K| 89 (2)| 00:00:02 |
  11. | 4 | TABLE ACCESS FULL | KHGL_ZJKJ | 28114 | 1812K| 66 (0)| 00:00:01 |
  12. |* 5 | INDEX RANGE SCAN | LXR_YH_ID_LX | 1 | 68 | 4 (0)| 00:00:01 |
  13. |* 6 | COUNT STOPKEY | | | | | |
  14. | 7 | NESTED LOOPS SEMI | | 1 | 41 | 12 (0)| 00:00:01 |
  15. |* 8 | TABLE ACCESS BY INDEX ROWID | LXR_JBXX | 1 | 39 | 5 (0)| 00:00:01 |
  16. |* 9 | INDEX RANGE SCAN | IDX_LXR_JBXX_GRDM | 1 | | 3 (0)| 00:00:01 |
  17. | 10 | VIEW PUSHED PREDICATE | VW_SQ_1 | 1 | 2 | 7 (0)| 00:00:01 |
  18. | 11 | NESTED LOOPS | | 1 | 110 | 7 (0)| 00:00:01 |
  19. | 12 | TABLE ACCESS BY INDEX ROWID| LXR_YH | 1 | 75 | 5 (0)| 00:00:01 |
  20. |* 13 | INDEX RANGE SCAN | IDX_KHGL_LXRYH_FZGSDM | 1 | | 4 (0)| 00:00:01 |
  21. |* 14 | INDEX RANGE SCAN | IDX_GRDLXX_XZQH_FZGS | 1 | 35 | 2 (0)| 00:00:01 |
  22. ---------------------------------------------------------------------------------------------------------
  23. Predicate Information (identified by operation id):
  24. ---------------------------------------------------
  25. 1 - filter( EXISTS (SELECT 0 FROM "LXR_YH" "YH" WHERE "YH"."LXR_ID"=:B1 AND "YH"."KHLX"='2'
  26. AND "YH"."KH_ID"=:B2))
  27. 2 - access("C"."JGID"="ZJKJ"."JGID")
  28. 5 - access("YH"."KH_ID"=:B1 AND "YH"."KHLX"='2' AND "YH"."LXR_ID"=:B2)
  29. 6 - filter(ROWNUM<21)
  30. 8 - filter("A"."STATUS"='1')
  31. 9 - access("A"."GRDM"=:V1)
  32. 13 - access("YH"."FZGS_DM"=:V2 AND "YH"."LXR_ID"="A"."LXR_ID" AND "YH"."KHLX"='2')
  33. 14 - access("GRDL"."GRDL_ID"="YH"."KH_ID" AND "GRDL"."STATUS"='1')
  34. filter("GRDL"."STATUS"='1')
  35. Statistics
  36. ----------------------------------------------------------
  37. 1 recursive calls
  38. 2 db block gets
  39. 103172 consistent gets
  40. 21144 physical reads
  41. 0 redo size
  42. 533 bytes sent via SQL*Net to client
  43. 472 bytes received via SQL*Net from client
  44. 2 SQL*Net roundtrips to/from client
  45. 0 sorts (memory)
  46. 0 sorts (disk)
  47. 1 rows processed

该 SQL 只返回 1 行数据,但是逻辑读为 103 172,显然 SQL 还能进一步优化。从执行计划中可以看到,Id=1 是 Filter,Filter 下面有两个儿子,这属于有害的 Filter。Id=3 和 Id=4 的两个表走的是全表扫描,并且这两个表 Id 前面没有*,也就是说这两个表没有过滤条件。SQL 的逻辑读绝大部分应该是由 Id=1 的 Filter,以及 Id=3 和 Id=4 这两个表贡献而来的。

Id=3 和 Id=4 这两个表来自于标量子查询。注意观察原始 SQL,在标量子查询中,Id=3 与 Id=4 这两个表与主表 LXR_JBXX 没有直接关联,主表是与标量子查询中的半连接进行关联的(YH.LXR_ID = A.LXR_ID)。

大家还记得标量子查询的原理吗?标量子查询类似嵌套循环,主表通过连接列传值给子查询。因为本案例 SQL 比较特殊,主表是与标量子查询中的半连接的表进行关联的,主表没有直接与标量子查询中 From 后面的表进行关联,这就导致了标量子查询中 From 后面的表没能通过连接列进行传值,从而导致 Id=3 和 Id=4 的表走了全表扫描,也导致了 SQL 使用了 Filter,进而使整个 SQL 运行缓慢。

为了消除 Filter,同时也为了能使 Id=3 和 Id=4 的两个表能走索引,需要对 SQL 进行等价改写,将标量子查询中的半连接改写为内连接就能使 Id=3 和 Id=4 的两个表使用索引了。在标量子查询章节中提到过,标量子查询可以等价改写为外连接。因为标量子查询中没有聚合函数,因此判断Id=3Id=4两表关联之后应该是返回1的关系,因为如果两表关联后返回n的关系,SQL会报错。那么现在只需要考虑将标量子查询的半连接等价改写为内连接即可。因为原始的 SQL 写的是半连接,没有写成内连接,因此我们判断标量子查询中的半连接应该是属于n的关系,将半连接改写为内连接,如果半连接属于n的关系,要先将半连接变成 1 的关系。所以原始 SQL 可以等价改写为下面 SQL:

  1. SELECT A.LXR_ID, A.SR, B.MSG AS ZJJGXX
  2. FROM LXR_JBXX A,
  3. (SELECT C.JGID || 『@』 || C.DLS_BM || 『@』 || C.DLS_MC AS MSG
  4. YH.LXR_ID
  5. FROM KHGL_DLSJBXX C
  6. KHGL_ZJKJ ZJKJ
  7. SELECT LXR_ID, KH_ID
  8. FROM LXR_YH
  9. WHERE KHLX = '2'
  10. GROUP BY LXR_ID, KH_ID YH --对连接列分组将 n 的关系变为 1 的关系
  11. WHERE C.JGID = ZJKJ.JGID
  12. AND YH.KH_ID = ZJKJ.KJ_ID) B
  13. WHERE A.LXR_ID = B.LXR_ID(+)
  14. AND A.STATUS = '1'
  15. AND A.GRDM = :v1
  16. AND EXISTS (SELECT 1
  17. FROM LXR_YH YH, KHGL_GRDLXX GRDL
  18. WHERE YH.FZGS_DM = :v2
  19. AND YH.KHLX = '2'
  20. AND YH.LXR_ID = A.LXR_ID
  21. AND GRDL.GRDL_ID = YH.KH_ID
  22. AND GRDL.STATUS = '1')
  23. AND ROWNUM < 21;

改写之后,SQL 的执行计划如下:

  1. Elapsed: 00:00:00.01
  2. Execution Plan
  3. ----------------------------------------------------------
  4. Plan hash value: 2638330795
  5. -------------------------------------------------------------------------------------------------------------
  6. | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
  7. -------------------------------------------------------------------------------------------------------------
  8. | 0 | SELECT STATEMENT | | 1 | 124 | 87 (3)| 00:00:02 |
  9. |* 1 | COUNT STOPKEY | | | | | |
  10. | 2 | NESTED LOOPS OUTER | | 1 | 124 | 87 (3)| 00:00:02 |
  11. | 3 | NESTED LOOPS SEMI | | 1 | 41 | 12 (0)| 00:00:01 |
  12. |* 4 | TABLE ACCESS BY INDEX ROWID | LXR_JBXX | 1 | 39 | 5 (0)| 00:00:01 |
  13. |* 5 | INDEX RANGE SCAN | IDX_LXR_JBXX_GRDM | 1 | | 3 (0)| 00:00:01 |
  14. | 6 | VIEW PUSHED PREDICATE | VW_SQ_1 | 1 | 2 | 7 (0)| 00:00:01 |
  15. | 7 | NESTED LOOPS | | 1 | 110 | 7 (0)| 00:00:01 |
  16. | 8 | TABLE ACCESS BY INDEX ROWID | LXR_YH | 1 | 75 | 5 (0)| 00:00:01 |
  17. |* 9 | INDEX RANGE SCAN | IDX_KHGL_LXRYH_FZGSDM | 1 | | 4 (0)| 00:00:01 |
  18. |* 10 | INDEX RANGE SCAN | IDX_GRDLXX_XZQH_FZGS | 1 | 35 | 2 (0)| 00:00:01 |
  19. | 11 | VIEW PUSHED PREDICATE | | 1 | 83 | 75 (3)| 00:00:01 |
  20. | 12 | NESTED LOOPS | | | | | |
  21. | 13 | NESTED LOOPS | | 1 | 203 | 75 (3)| 00:00:01 |
  22. |* 14 | HASH JOIN | | 1 | 132 | 74 (3)| 00:00:01 |
  23. | 15 | VIEW | | 1 | 66 | 7 (15)| 00:00:01 |
  24. | 16 | SORT GROUP BY | | 1 | 68 | 7 (15)| 00:00:01 |
  25. |* 17 | TABLE ACCESS BY INDEX ROWID| LXR_YH | 1 | 68 | 6 (0)| 00:00:01 |
  26. |* 18 | INDEX RANGE SCAN | IDX_KHGL_LXRYH_LXRID | 1 | | 4 (0)| 00:00:01 |
  27. | 19 | TABLE ACCESS FULL | KHGL_ZJKJ | 28114 | 1812K| 66 (0)| 00:00:01 |
  28. |* 20 | INDEX UNIQUE SCAN | KHGL_DLSJBXX_PK | 1 | | 0 (0)| 00:00:01 |
  29. | 21 | TABLE ACCESS BY INDEX ROWID | KHGL_DLSJBXX | 1 | 71 | 1 (0)| 00:00:01 |
  30. -------------------------------------------------------------------------------------------------------------
  31. Predicate Information (identified by operation id):
  32. ---------------------------------------------------
  33. 1 - filter(ROWNUM<21)
  34. 4 - filter("A"."STATUS"='1')
  35. 5 - access("A"."GRDM"=:V1)
  36. 9 - access("YH"."FZGS_DM"=:V2 AND "YH"."LXR_ID"="A"."LXR_ID" AND "YH"."KHLX"='2')
  37. 10 - access("GRDL"."GRDL_ID"="YH"."KH_ID" AND "GRDL"."STATUS"='1')
  38. filter("GRDL"."STATUS"='1')
  39. 14 - access("YH"."KH_ID"="ZJKJ"."KJ_ID")
  40. 17 - filter("KHLX"='2')
  41. 18 - access("LXR_ID"="A"."LXR_ID")
  42. 20 - access("C"."JGID"="ZJKJ"."JGID")
  43. Statistics
  44. ----------------------------------------------------------
  45. 0 recursive calls
  46. 1 db block gets
  47. 400 consistent gets
  48. 0 physical reads
  49. 0 redo size
  50. 533 bytes sent via SQL*Net to client
  51. 472 bytes received via SQL*Net from client
  52. 2 SQL*Net roundtrips to/from client
  53. 1 sorts (memory)
  54. 0 sorts (disk)
  55. 1 rows processed

对 SQL 进行等价改写之后,SQL 的逻辑读下降到 400,本次优化也就到此为止。

通过本案例,各位读者应该对 SQL 等价改写引起足够重视,同时也要掌握标量子查询等价改写为外连接,半连接等价改写为内连接,反连接改写为外连接等最基本的 SQL 改写技巧,另外,大家还要对表与表之间关系引起足够重视。