当 where 条件中有多个谓词过滤条件,但是这些谓词过滤条件彼此是有关系的而不是相互独立的,这时我们可能需要收集扩展统计信息以便优化器能够估算出较为准确的行数(Rows)。
我们创建一个表 T。
create table t as
select level as id, level || 'a' as a, level || level || 'b' as b
from dual
connect by level < 100;
Table created.
在 T 表中,知道 A 列的值就知道 B 列的值,A 和 B 这样的列就叫作相关列。
我们一直执行insert into t select * from t;
直到 T 表中有 3244032 行数据。
我们对 T 表收集统计信息。
BEGIN
DBMS_STATS.GATHER_TABLE_STATS(ownname => 'SCOTT',
tabname => 'T',
estimate_percent => 100,
method_opt => 'for all columns size skewonly',
no_invalidate => FALSE,
degree => 1,
cascade => TRUE);
END;
PL/SQL procedure successfully completed.
我们查看 T 表的统计信息。
select a.column_name,
b.num_rows,
a.num_distinct Cardinality,
round(a.num_distinct / b.num_rows * 100, 2) selectivity,
a.histogram,
a.num_buckets
from dba_tab_col_statistics a, dba_tables b
where a.owner = b.owner
and a.table_name = b.table_name
and a.owner = 'SCOTT'
and a.table_name = 'T';
COLUMN_NAME NUM_ROWS CARDINALITY SELECTIVITY HISTOGRAM NUM_BUCKETS
--------------- ---------- ----------- ----------- --------------- -----------
ID 3244032 99 0 FREQUENCY 99
A 3244032 99 0 FREQUENCY 99
B 3244032 99 0 FREQUENCY 99
我们创建两个索引。
SQL> create index idx1 on t(a);
Index created.
SQL> create index idx2 on t(a,b);
Index created.
现有如下 SQL 及其执行计划。
select * from t where a='1a' and b='11b';
32768 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 2303463401
-----------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 331 | 4303 | 84 (0)| 00:00:02 |
| 1 | TABLE ACCESS BY INDEX ROWID| T | 331 | 4303 | 84 (0)| 00:00:02 |
|* 2 | INDEX RANGE SCAN | IDX2 | 331 | | 3 (0)| 00:00:01 |
-----------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("A"='1a' AND "B"='11b')
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
11854 consistent gets
78 physical reads
0 redo size
775996 bytes sent via SQL*Net to client
24444 bytes received via SQL*Net from client
2186 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
32768 rows processed
优化器估算返回 331 行数据,但是实际上返回了 32 768 行数据。为什么优化器估算返回的行数与真实返回的行数有这么大差异呢?这是因为优化器不知道 A 与 B 的关系,所以在估算返回行数的时候采用的是。
select round(1/99/99*3244032) from dual;
round(1/99/99*3244032)
---------------------
331
因为 A 列的值可以决定 B 列的值,所以上述 SQL 可以去掉 B 列的过滤条件。
select * from t where a='1a';
32768 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 1601196873
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 32768 | 416K| 1775 (3)| 00:00:22 |
|* 1 | TABLE ACCESS FULL| T | 32768 | 416K| 1775 (3)| 00:00:22 |
--------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("A"='1a')
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
10118 consistent gets
0 physical reads
0 redo size
441776 bytes sent via SQL*Net to client
24444 bytes received via SQL*Net from client
2186 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
32768 rows processed
这时优化器能正确地估算返回的 Rows。如果不想改写 SQL,怎么才能让优化器得到比较准确的 Rows 呢?在 Oracle11g 之前可以使用动态采样(至少 Level 4)。
alter session set optimizer_dynamic_sampling=4;
Session altered.
select * from t where a='1a' and b='11b';
32768 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 1601196873
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 33845 | 429K| 1778 (3)| 00:00:22 |
|* 1 | TABLE ACCESS FULL| T | 33845 | 429K| 1778 (3)| 00:00:22 |
--------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("A"='1a' AND "B"='11b')
Note
-----
- dynamic sampling used for this statement (level=4)
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
10118 consistent gets
0 physical reads
0 redo size
441776 bytes sent via SQL*Net to client
24444 bytes received via SQL*Net from client
2186 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
32768 rows processed
使用动态采样 Level4 采样之后,优化器估算返回 33 845 行数据,实际返回了 32 768 行数据,这已经比较精确了。在 Oracle11g 以后,我们可以使用扩展统计信息将相关的列组合成一个列。
SELECT DBMS_STATS.CREATE_EXTENDED_STATS(USER, 'T', '(A, B)') FROM DUAL;
DBMS_STATS.CREATE_EXTENDED_STATS(USER,'T','(A,B)')
----------------------------------------------------------------------------
SYS_STUNA$6DVXJXTP05EH56DTIR0X
现在我们对表重新收集统计信息。
BEGIN
DBMS_STATS.GATHER_TABLE_STATS(ownname => 'SCOTT',
tabname => 'T',
estimate_percent => 100,
method_opt => 'for columns SYS_STUNA$6DVXJXTP05EH56DTIR0X size skewonly',
no_invalidate => FALSE,
degree => 1,
cascade => TRUE);
END;
PL/SQL procedure successfully completed.
我们查看 T 表的统计信息。
select a.column_name,
b.num_rows,
a.num_distinct Cardinality,
round(a.num_distinct / b.num_rows * 100, 2) selectivity,
a.histogram,
a.num_buckets
from dba_tab_col_statistics a, dba_tables b
where a.owner = b.owner
and a.table_name = b.table_name
and a.owner = 'SCOTT'
and a.table_name = 'T';
COLUMN_NAME NUM_ROWS CARDINALITY SELECTIVITY HISTOGRAM NUM_BUCKETS
------------------------------ ---------- ----------- ----------- ---------- --------
ID 3244032 99 0 FREQUENCY 99
A 3244032 99 0 FREQUENCY 99
B 3244032 99 0 FREQUENCY 99
SYS_STUNA$6DVXJXTP05EH56DTIR0X 3244032 99 0 FREQUENCY 99
重新收集统计信息之后,扩展列 SYS_STUNA$6DVXJXTP05EH56DTIR0X 也收集了直方图。
我们再次执行 SQL。
select * from t where a='1a' and b='11b';
32768 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 1601196873
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 32768 | 416K| 1778 (3)| 00:00:22 |
|* 1 | TABLE ACCESS FULL| T | 32768 | 416K| 1778 (3)| 00:00:22 |
--------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("A"='1a' AND "B"='11b')
Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
10118 consistent gets
0 physical reads
0 redo size
441776 bytes sent via SQL*Net to client
24444 bytes received via SQL*Net from client
2186 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
32768 rows processed
收集完扩展统计信息之后,优化器就能估算出较为准确的 Rows。
需要注意的是,扩展统计信息只能用于等值查询,不能用于非等值查询。
在本书的 SQL 优化案例赏析章节中,我们将会为各位读者分享一个经典的扩展统计信息优化案例。