当 where 条件中有多个谓词过滤条件,但是这些谓词过滤条件彼此是有关系的而不是相互独立的,这时我们可能需要收集扩展统计信息以便优化器能够估算出较为准确的行数(Rows)。

    我们创建一个表 T。

    1. create table t as
    2. select level as id, level || 'a' as a, level || level || 'b' as b
    3. from dual
    4. connect by level < 100;
    1. Table created.

    在 T 表中,知道 A 列的值就知道 B 列的值,A 和 B 这样的列就叫作相关列

    我们一直执行insert into t select * from t;直到 T 表中有 3244032 行数据。

    我们对 T 表收集统计信息。

    1. BEGIN
    2. DBMS_STATS.GATHER_TABLE_STATS(ownname => 'SCOTT',
    3. tabname => 'T',
    4. estimate_percent => 100,
    5. method_opt => 'for all columns size skewonly',
    6. no_invalidate => FALSE,
    7. degree => 1,
    8. cascade => TRUE);
    9. END;
    1. PL/SQL procedure successfully completed.

    我们查看 T 表的统计信息。

    1. select a.column_name,
    2. b.num_rows,
    3. a.num_distinct Cardinality,
    4. round(a.num_distinct / b.num_rows * 100, 2) selectivity,
    5. a.histogram,
    6. a.num_buckets
    7. from dba_tab_col_statistics a, dba_tables b
    8. where a.owner = b.owner
    9. and a.table_name = b.table_name
    10. and a.owner = 'SCOTT'
    11. and a.table_name = 'T';
    1. COLUMN_NAME NUM_ROWS CARDINALITY SELECTIVITY HISTOGRAM NUM_BUCKETS
    2. --------------- ---------- ----------- ----------- --------------- -----------
    3. ID 3244032 99 0 FREQUENCY 99
    4. A 3244032 99 0 FREQUENCY 99
    5. B 3244032 99 0 FREQUENCY 99

    我们创建两个索引。

    1. SQL> create index idx1 on t(a);
    2. Index created.
    3. SQL> create index idx2 on t(a,b);
    4. Index created.

    现有如下 SQL 及其执行计划。

    1. select * from t where a='1a' and b='11b';
    1. 32768 rows selected.
    2. Execution Plan
    3. ----------------------------------------------------------
    4. Plan hash value: 2303463401
    5. -----------------------------------------------------------------------------------
    6. | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
    7. -----------------------------------------------------------------------------------
    8. | 0 | SELECT STATEMENT | | 331 | 4303 | 84 (0)| 00:00:02 |
    9. | 1 | TABLE ACCESS BY INDEX ROWID| T | 331 | 4303 | 84 (0)| 00:00:02 |
    10. |* 2 | INDEX RANGE SCAN | IDX2 | 331 | | 3 (0)| 00:00:01 |
    11. -----------------------------------------------------------------------------------
    12. Predicate Information (identified by operation id):
    13. ---------------------------------------------------
    14. 2 - access("A"='1a' AND "B"='11b')
    15. Statistics
    16. ----------------------------------------------------------
    17. 0 recursive calls
    18. 0 db block gets
    19. 11854 consistent gets
    20. 78 physical reads
    21. 0 redo size
    22. 775996 bytes sent via SQL*Net to client
    23. 24444 bytes received via SQL*Net from client
    24. 2186 SQL*Net roundtrips to/from client
    25. 0 sorts (memory)
    26. 0 sorts (disk)
    27. 32768 rows processed

    优化器估算返回 331 行数据,但是实际上返回了 32 768 行数据。为什么优化器估算返回的行数与真实返回的行数有这么大差异呢?这是因为优化器不知道 A 与 B 的关系,所以在估算返回行数的时候采用的是扩展统计信息 - 图1

    1. select round(1/99/99*3244032) from dual;
    1. round(1/99/99*3244032)
    2. ---------------------
    3. 331

    因为 A 列的值可以决定 B 列的值,所以上述 SQL 可以去掉 B 列的过滤条件。

    1. select * from t where a='1a';
    1. 32768 rows selected.
    2. Execution Plan
    3. ----------------------------------------------------------
    4. Plan hash value: 1601196873
    5. --------------------------------------------------------------------------
    6. | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
    7. --------------------------------------------------------------------------
    8. | 0 | SELECT STATEMENT | | 32768 | 416K| 1775 (3)| 00:00:22 |
    9. |* 1 | TABLE ACCESS FULL| T | 32768 | 416K| 1775 (3)| 00:00:22 |
    10. --------------------------------------------------------------------------
    11. Predicate Information (identified by operation id):
    12. ---------------------------------------------------
    13. 1 - filter("A"='1a')
    14. Statistics
    15. ----------------------------------------------------------
    16. 0 recursive calls
    17. 0 db block gets
    18. 10118 consistent gets
    19. 0 physical reads
    20. 0 redo size
    21. 441776 bytes sent via SQL*Net to client
    22. 24444 bytes received via SQL*Net from client
    23. 2186 SQL*Net roundtrips to/from client
    24. 0 sorts (memory)
    25. 0 sorts (disk)
    26. 32768 rows processed

    这时优化器能正确地估算返回的 Rows。如果不想改写 SQL,怎么才能让优化器得到比较准确的 Rows 呢?在 Oracle11g 之前可以使用动态采样(至少 Level 4)。

    1. alter session set optimizer_dynamic_sampling=4;
    1. Session altered.
    1. select * from t where a='1a' and b='11b';
    1. 32768 rows selected.
    2. Execution Plan
    3. ----------------------------------------------------------
    4. Plan hash value: 1601196873
    5. --------------------------------------------------------------------------
    6. | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
    7. --------------------------------------------------------------------------
    8. | 0 | SELECT STATEMENT | | 33845 | 429K| 1778 (3)| 00:00:22 |
    9. |* 1 | TABLE ACCESS FULL| T | 33845 | 429K| 1778 (3)| 00:00:22 |
    10. --------------------------------------------------------------------------
    11. Predicate Information (identified by operation id):
    12. ---------------------------------------------------
    13. 1 - filter("A"='1a' AND "B"='11b')
    14. Note
    15. -----
    16. - dynamic sampling used for this statement (level=4)
    17. Statistics
    18. ----------------------------------------------------------
    19. 0 recursive calls
    20. 0 db block gets
    21. 10118 consistent gets
    22. 0 physical reads
    23. 0 redo size
    24. 441776 bytes sent via SQL*Net to client
    25. 24444 bytes received via SQL*Net from client
    26. 2186 SQL*Net roundtrips to/from client
    27. 0 sorts (memory)
    28. 0 sorts (disk)
    29. 32768 rows processed

    使用动态采样 Level4 采样之后,优化器估算返回 33 845 行数据,实际返回了 32 768 行数据,这已经比较精确了。在 Oracle11g 以后,我们可以使用扩展统计信息将相关的列组合成一个列。

    1. SELECT DBMS_STATS.CREATE_EXTENDED_STATS(USER, 'T', '(A, B)') FROM DUAL;
    1. DBMS_STATS.CREATE_EXTENDED_STATS(USER,'T','(A,B)')
    2. ----------------------------------------------------------------------------
    3. SYS_STUNA$6DVXJXTP05EH56DTIR0X

    现在我们对表重新收集统计信息。

    1. BEGIN
    2. DBMS_STATS.GATHER_TABLE_STATS(ownname => 'SCOTT',
    3. tabname => 'T',
    4. estimate_percent => 100,
    5. method_opt => 'for columns SYS_STUNA$6DVXJXTP05EH56DTIR0X size skewonly',
    6. no_invalidate => FALSE,
    7. degree => 1,
    8. cascade => TRUE);
    9. END;
    1. PL/SQL procedure successfully completed.

    我们查看 T 表的统计信息。

    1. select a.column_name,
    2. b.num_rows,
    3. a.num_distinct Cardinality,
    4. round(a.num_distinct / b.num_rows * 100, 2) selectivity,
    5. a.histogram,
    6. a.num_buckets
    7. from dba_tab_col_statistics a, dba_tables b
    8. where a.owner = b.owner
    9. and a.table_name = b.table_name
    10. and a.owner = 'SCOTT'
    11. and a.table_name = 'T';
    1. COLUMN_NAME NUM_ROWS CARDINALITY SELECTIVITY HISTOGRAM NUM_BUCKETS
    2. ------------------------------ ---------- ----------- ----------- ---------- --------
    3. ID 3244032 99 0 FREQUENCY 99
    4. A 3244032 99 0 FREQUENCY 99
    5. B 3244032 99 0 FREQUENCY 99
    6. SYS_STUNA$6DVXJXTP05EH56DTIR0X 3244032 99 0 FREQUENCY 99

    重新收集统计信息之后,扩展列 SYS_STUNA$6DVXJXTP05EH56DTIR0X 也收集了直方图。

    我们再次执行 SQL。

    1. select * from t where a='1a' and b='11b';
    1. 32768 rows selected.
    2. Execution Plan
    3. ----------------------------------------------------------
    4. Plan hash value: 1601196873
    5. --------------------------------------------------------------------------
    6. | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
    7. --------------------------------------------------------------------------
    8. | 0 | SELECT STATEMENT | | 32768 | 416K| 1778 (3)| 00:00:22 |
    9. |* 1 | TABLE ACCESS FULL| T | 32768 | 416K| 1778 (3)| 00:00:22 |
    10. --------------------------------------------------------------------------
    11. Predicate Information (identified by operation id):
    12. ---------------------------------------------------
    13. 1 - filter("A"='1a' AND "B"='11b')
    14. Statistics
    15. ----------------------------------------------------------
    16. 1 recursive calls
    17. 0 db block gets
    18. 10118 consistent gets
    19. 0 physical reads
    20. 0 redo size
    21. 441776 bytes sent via SQL*Net to client
    22. 24444 bytes received via SQL*Net from client
    23. 2186 SQL*Net roundtrips to/from client
    24. 0 sorts (memory)
    25. 0 sorts (disk)
    26. 32768 rows processed

    收集完扩展统计信息之后,优化器就能估算出较为准确的 Rows。

    需要注意的是,扩展统计信息只能用于等值查询,不能用于非等值查询。

    在本书的 SQL 优化案例赏析章节中,我们将会为各位读者分享一个经典的扩展统计信息优化案例。