2016 年,在上周末优化班的时候,一个同学请求现场优化如下 SQL。

    1. with temp as
    2. (select sgd.detail_id id,
    3. wmsys.wm_concat(distinct(sg.gp_name)) groupnames,
    4. wmsys.wm_concat(distinct(su.user_name)) usernames
    5. from sgd
    6. left join sg
    7. on sg.id = sgd.gp_id
    8. left join sug
    9. on sg.id = sug.gp_id
    10. left join su
    11. on sug.user_id = su.id
    12. group by sgd.detail_id)
    13. select zh.id,
    14. zh.id detailid,
    15. zh.name detailname,
    16. zh.p_level hospitallevel,
    17. zh.type hospitaltype,
    18. dza.name region,
    19. temp.groupnames,
    20. temp.usernames,
    21. (case
    22. when gd.gp_id is null then
    23. 0
    24. else
    25. 1
    26. end) isalloted
    27. from zh
    28. left join dza
    29. on zh.area_id = dza.id
    30. left join temp
    31. on zh.id = temp.id
    32. left join (select gp_id, detail_id from sys_gp_detail where gp_Id = :0) gd
    33. on zh.id = gd.detail_id order by length(id),zh.id asc;

    该 SQL 返回 20 779 行数据,要执行 4 分 32 秒。该执行计划中全是 HASH JOIN,这里就不贴执行计划了。

    首先这条 SQL 最终返回 20 779 行数据,该 SQL 语句最后部分没有 GROUP BY,表与表之间关联全是外连接,主表 zh 没有过滤条件,因此判断 zh 表最多 20 779 行数据,因为它是外连接的主表,不管关联有没有关联上,zh 会返回表中全部数据,如果 zh 与 dza 是 1:n 关系,那么 zh 表总行数还将少于 20 779 行数据。同时也判定 dza,TEMP 数据量都不大,因为所有表关联完只返回 20779 行数据。既然都是小表,为什么最终要执行 4 分 32 秒呢?遇到此类问题,我们需要将 SQL 拆开,分步执行,这样就能判断 SQL 中哪一步是性能瓶颈。9.10 节中案例也是采用分步执行方法找到问题根本原因。

    SQL 语句中有个 with as 子句,对其单独执行,发现要执行两分钟左右。with as 子句中有两个列转行函数:wmsys.wm_concat,将其注释之后 with as 子句能秒出。现在我们定位到,SQL 性能问题是由 wmsys.wm_concat 导致。对于列转行,Oracle 还提供了 Listagg 分析函数,wmsys.wm_concat 从 Oracle11g 之后返回的是 Clob 类型,而 Listagg 返回的是 varchar2 类型。因此我们尝试对 with as 子句进行等价改写,利用分析函数 Listagg 代替 wmsys.wm_concat,以验证改写之后是否还会出现性能问题。with as 子句原始 SQL 如下。

    1. select sgd.detail_id id,
    2. wmsys.wm_concat(distinct(sg.gp_name)) groupnames,
    3. wmsys.wm_concat(distinct(su.user_name)) usernames
    4. from sgd
    5. left join sg on sg.id = sgd.gp_id
    6. left join sug on sg.id = sug.gp_id
    7. left join su on sug.user_id = su.id
    8. group by sgd.detail_id;

    因为 with as 子句中有两个 wmsys.wm_concat,而且 wmsys.wm_concat 中有 distinct,而 Listagg 不支持 distinct,所以我们只能一个一个去掉 wmsys.wm_concat。现在将 with as 子句中 wmsys.wm_concat(distinct(su.user_name))usernames 去掉,只保留 wmsys.wm_concat(distinct (sg.gp_name))groupnames。因为 usernames 关联了 su,sug,而现在只保留 groupnames,所以我们需要将 su,sug 去掉,去掉 usernames 的 SQL 如下。

    1. select sgd.detail_id id, wmsys.wm_concat(distinct(sg.gp_name)) groupnames
    2. from sys_gp_detail sgd
    3. left join sys_gp sg on sg.id = sgd.gp_id
    4. group by sgd.detail_id;

    其执行计划如下。

    1. 已用时间:00: 00: 58.04.
    2. 执行计划
    3. ----------------------------------------------------------
    4. Plan hash value: 3491823204
    5. -----------------------------------------------------------------------------------
    6. | Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)|
    7. -----------------------------------------------------------------------------------
    8. | 0 | SELECT STATEMENT | | 20584 | 824K| | 1308 (8)|
    9. | 1 | SORT GROUP BY | | 20584 | 824K| 15M| 1308 (8)|
    10. |* 2 | HASH JOIN RIGHT OUTER| | 313K| 12M| | 449 (6)|
    11. | 3 | TABLE ACCESS FULL | SYS_GP | 3 | 69 | | 3 (0)|
    12. | 4 | TABLE ACCESS FULL | SYS_GP_DETAIL | 313K| 5518K| | 438 (5)|
    13. -----------------------------------------------------------------------------------
    14. Predicate Information identified by operation id):
    15. ---------------------------------------------------
    16. 2 - access(「SG」.「ID」(+)=「SGD」.「GP_ID」)
    17. 统计信息
    18. ----------------------------------------------------------
    19. 1 recursive calls
    20. 249348 db block gets
    21. 44447 consistent gets
    22. 0 physical reads
    23. 0 redo size
    24. 9993548 bytes sent via SQL*Net to client
    25. 6067828 bytes received via SQL*Net from client
    26. 83118 SQL*Net roundtrips to/from client
    27. 1 sorts memory
    28. 0 sorts disk

    执行计划中的 db block gets 来自于 Clob。因为 Listagg 不支持 distinct,所以我们需要先去重,再采用 Listagg,Listagg 改写的 SQL 如下。

    1. select detail_id, listagg(gp_name, ',') within
    2. group(
    3. order by null)
    4. from (select sgd.detail_id, sg.gp_name
    5. from sys_gp_detail sgd
    6. left join sys_gp sg on sg.id = sgd.gp_id
    7. group by sgd.detail_id, sg.gp_name)
    8. group by detail_id;

    改写后的执行计划如下。

    1. 已用时间:00: 00: 01.12
    2. 执行计划
    3. ----------------------------------------------------------
    4. Plan hash value: 147456425
    5. -------------------------------------------------------------------------------------
    6. | Id | Operation | Name | Rows | Bytes |TempSpc| Cost(%CPU)|
    7. -------------------------------------------------------------------------------------
    8. | 0 | SELECT STATEMENT | | 20584 | 1547K| | 1467 (7)|
    9. | 1 | SORT GROUP BY | | 20584 | 1547K| | 1467 (7)|
    10. | 2 | VIEW | VM_NWVW_0 | 43666 | 3283K| | 1467 (7)|
    11. | 3 | HASH GROUP BY | | 43666 | 1748K| 15M| 1467 (7)|
    12. |* 4 | HASH JOIN RIGHT OUTER| | 313K| 12M| | 449 (6)|
    13. | 5 | TABLE ACCESS FULL | SYS_GP | 3 | 69 | | 3 (0)|
    14. | 6 | TABLE ACCESS FULL | SYS_GP_DETAIL | 313K| 5518K| | 438 (5)|
    15. -------------------------------------------------------------------------------------
    16. Predicate Information identified by operation id):
    17. ---------------------------------------------------
    18. 4 - access(「SG」.「ID」(+)=「SGD」.「GP_ID」)
    19. 统计信息
    20. ----------------------------------------------------------
    21. 1 recursive calls
    22. 0 db block gets
    23. 2775 consistent gets
    24. 0 physical reads
    25. 0 redo size
    26. 450516 bytes sent via SQL*Net to client
    27. 15595 bytes received via SQL*Net from client
    28. 1387 SQL*Net roundtrips to/from client
    29. 1 sorts memory
    30. 0 sorts disk
    31. 20779 rows processed

    使用 Listagg 改写之后,SQL 能在 1 秒执行完毕,而采用 wmsys.wm_concat 需要 58 秒,这说明采用 Listagg 代替 wmsys.wm_concat 能达到优化目的。

    下面我们改写另外一个 wmsys.wm_concat,改写的思路一模一样,先去重,再使用 Listagg。

    1. select detail_id, listagg(user_name, ',') within
    2. group(
    3. order by null)
    4. from (select sgd.detail_id id, su.user_name
    5. from sgd
    6. left join sg on sg.id = sgd.gp_id
    7. left join sug on sg.id = sug.gp_id
    8. left join su on sug.user_id = su.id
    9. group by sgd.detail_id, su.user_name)
    10. group by detail_id;

    最终的 with as 子句如下。

    1. select a.detail_id id , a.groupnames, b.usernames
    2. from (select detail_id, listagg(gp_name, ',') within
    3. group(
    4. order by null) groupnames
    5. from (select sgd.detail_id, sg.gp_name
    6. from sys_gp_detail sgd
    7. left join sys_gp sg on sg.id = sgd.gp_id
    8. group by sgd.detail_id, sg.gp_name)
    9. group by detail_id) a,
    10. (select detail_id, listagg(user_name, ',') within
    11. group(
    12. order by null) usernames
    13. from (select sgd.detail_id, su.user_name
    14. from sgd
    15. left join sg on sg.id = sgd.gp_id
    16. left join sug on sg.id = sug.gp_id
    17. left join su on sug.user_id = su.id
    18. group by sgd.detail_id, su.user_name)
    19. group by detail_id) b
    20. where a.. detail_id = b.detail_id;

    用改写后的 with as 子句替换原始 SQL 中的 with as 子句,最终 SQL 能在两秒左右执行完毕。

    在工作中尽量使用 Listagg 代替 wmsys.wm_concat。