现在有如下两个表,a 表是远端表(1800 万),b 表是本地表(100 行)。
desc a@dblink
Name Null? Type
--------------------- -------- ------
ID NUMBER
NAME VARCHAR2(100)
ADDRESS VARCHAR2(100)
select count(*) from a@dblink;
COUNT(*)
----------
18550272
desc b
Name Null? Type
--------------------- -------- ------
ID NUMBER
NAME VARCHAR2(100)
ADDRESS VARCHAR2(100)
select count(*) from b;
COUNT(*)
----------
100
现有如下 SQL。
select * from a@dblink, b where a.id = b.id;
默认情况下,会将远端表 a 的数据传输到本地,然后再进行关联,autotrace 的执行计划如下。
set timi on
set autot trace
select * from a@dblink, b where a.id = b.id;
25600 rows selected.
Elapsed: 00:03:13.80
Execution Plan
----------------------------------------------------------
Plan hash value: 657970699
-------------------------------------------------------------------------------------
| Id | Operation |Name|Rows | Bytes | Cost (%CPU)| Time | Inst |IN-OUT|
-------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 82| 19188 | 6 (17)| 00:00:01 | | |
|* 1 | HASH JOIN | | 82| 19188 | 6 (17)| 00:00:01 | | |
| 2 | REMOTE |A | 82| 9594 | 2 (0)| 00:00:01 | DBLINK | R->S |
| 3 | TABLE ACCESS FULL|B | 100| 11700 | 3 (0)| 00:00:01 | | |
-------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("A"."ID"="B"."ID")
Remote SQL Information (identified by operation id):
----------------------------------------------------
2 - SELECT "ID","NAME","ADDRESS" FROM "A" "A" (accessing 'DBLINK' )
Statistics
----------------------------------------------------------
769 recursive calls
1 db block gets
15 consistent gets
91755 physical reads
212 redo size
1477532 bytes sent via SQL*Net to client
19185 bytes received via SQL*Net from client
1708 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
25600 rows processed
远端表 a 很大,对数据进行传输会耗费大量时间,本地表 b 表很小,而且 a 和 b 关联之后返回数据量很少,我们可以将本地表 b 传输到远端,在远端进行关联,然后再将结果集传回本地,这时需要使用hint:driving_site
,下面 SQL 就是将 b 传递到远端关联的示例。
select /*+ driving_site(a) */ * from a@dblink, b where a.id = b.id;
autotrace 的执行计划如下。
select /*+ driving_site(a) */ * from a@dblink, b where a.id = b.id;
25600 rows selected.
Elapsed: 00:00:06.08
Execution Plan
----------------------------------------------------------
Plan hash value: 4284963264
------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Inst |IN-OUT|
------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT REMOTE| | 20931 | 4783K| 25565 (2)| | |
|* 1 | HASH JOIN | | 20931 | 4783K| 25565 (2)| | |
| 2 | REMOTE | B | 82 | 9594 | 2 (0)| ! | R->S |
| 3 | TABLE ACCESS FULL | A | 19M| 2173M| 25466 (1)| ORCL | |
------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("A2"."ID"="A1"."ID")
Remote SQL Information (identified by operation id):
----------------------------------------------------
2 - SELECT "ID","NAME","ADDRESS" FROM "B" "A1" (accessing '!' )
Note
-----
- fully remote statement
Statistics
----------------------------------------------------------
6 recursive calls
0 db block gets
8 consistent gets
0 physical reads
0 redo size
1428836 bytes sent via SQL*Net to client
19185 bytes received via SQL*Net from client
1708 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
25600 rows processed
将本地小表传输到远端关联,再返回结果只需 6 秒,相比将大表传输到本地,在性能上有巨大提升。
现在我们在远端表 a 的连接列建立索引。
create index idx_id on a(id);
Index created.
因为 b 表只有 100 行数据,a 表有 1 800 万行数据,两表关联之后返回 2.5 万行数据,我们可以让 a 与 b 走嵌套循环,b 作为驱动表,a 作为被驱动表,而且走连接索引。
select /*+ index(a) use_nl(a,b) leading(b) */ * from a@dblink, b where a.id = b.id;
25600 rows selected.
Elapsed: 00:00:00.84
Execution Plan
----------------------------------------------------------
Plan hash value: 1489534455
-------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Inst |IN-OUT|
-------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 7614K| 1699M| 54680 (100)| | |
| 1 | NESTED LOOPS | | 7614K| 1699M| 54680 (100)| | |
| 2 | TABLE ACCESS FULL| B | 100 | 11700 | 3 (0)| | |
| 3 | REMOTE | A | 76146 | 8700K| 3 (0)| DBLINK | R->S |
-------------------------------------------------------------------------------
Remote SQL Information (identified by operation id):
----------------------------------------------------
3 - SELECT /*+ USE_NL ("A") INDEX ("A") */ "ID","NAME","ADDRESS" FROM "A" "A"
WHERE "ID"=:1 (accessing 'DBLINK' )
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
106 consistent gets
0 physical reads
0 redo size
349986 bytes sent via SQL*Net to client
19185 bytes received via SQL*Net from client
1708 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
25600 rows processed
强制 a 表走索引之后,这时我们只需将索引过滤之后的数据传输到本地,而无需将 a 表所有数据传到本地,性能得到极大提升,SQL 耗时不到 1 秒。
现在我们将 b 表传输到远端,强制 b 表作为嵌套循环驱动表。
select /*+ driving_site(a) use_nl(a,b) leading(b) */ * from a@dblink, b where a.id = b.id;
25600 rows selected.
Elapsed: 00:00:02.92
Execution Plan
----------------------------------------------------------
Plan hash value: 557259519
-------------------------------------------------------------------------------------
| Id | Operation |Name |Rows | Bytes | Cost (%CPU)| Inst |IN-OUT|
-------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT REMOTE | |20931| 4783K| 20182 (1)| | |
| 1 | NESTED LOOPS | | | | | | |
| 2 | NESTED LOOPS | |20931| 4783K| 20182 (1)| | |
| 3 | REMOTE |B | 82| 9594 | 2 (0)| ! | R->S |
|* 4 | INDEX RANGE SCAN |IDX_ID| 255| | 2 (0)| ORCL | |
| 5 | TABLE ACCESS BY INDEX ROWID|A | 255| 29835 | 246 (0)| ORCL | |
-------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
4 - access("A2"."ID"="A1"."ID")
Remote SQL Information (identified by operation id):
----------------------------------------------------
3 - SELECT /*+ USE_NL ("A1") */ "ID","NAME","ADDRESS" FROM "B" "A1" (accessing '!' )
Note
-----
- fully remote statement
Statistics
----------------------------------------------------------
6 recursive calls
0 db block gets
8 consistent gets
0 physical reads
0 redo size
426684 bytes sent via SQL*Net to client
19185 bytes received via SQL*Net from client
1708 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
25600 rows processed
该查询耗时 2.9 秒,主要开销耗费在网络传输上,首先我们要将 b 表传输到远端,然后将 a 与 b 的关联结果传输到本地,网络传输耗费了两次。我们可以设置 arraysize 减少网络交互次数,从而减少网络开销,如下所示。
set arraysize 1000
select /*+ driving_site(a) use_nl(a,b) leading(b) */ * from a@dblink, b where a.id = b.id;
25600 rows selected.
Elapsed: 00:00:00.29
Execution Plan
----------------------------------------------------------
Plan hash value: 557259519
-------------------------------------------------------------------------------------
| Id | Operation |Name |Rows | Bytes | Cost (%CPU)|Inst |IN-OUT|
-------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT REMOTE | |20931| 4783K| 20182 (1)| | |
| 1 | NESTED LOOPS | | | | | | |
| 2 | NESTED LOOPS | |20931| 4783K| 20182 (1)| | |
| 3 | REMOTE |B | 82| 9594 | 2 (0)| !| R->S |
|* 4 | INDEX RANGE SCAN |IDX_ID| 255| | 2 (0)| ORCL| |
| 5 | TABLE ACCESS BY INDEX ROWID|A | 255| 29835 | 246 (0)| ORCL| |
-------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
4 - access("A2"."ID"="A1"."ID")
Remote SQL Information (identified by operation id):
----------------------------------------------------
3 - SELECT /*+ USE_NL ("A1") */ "ID","NAME","ADDRESS" FROM "B" "A1" (accessing '!' )
Note
-----
- fully remote statement
Statistics
----------------------------------------------------------
3 recursive calls
0 db block gets
8 consistent gets
0 physical reads
0 redo size
137698 bytes sent via SQL*Net to client
694 bytes received via SQL*Net from client
27 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
25600 rows processed
注意观察执行计划中统计信息栏目 SQL*Net roundtrips
从 1 708 减少到 27。当需要将本地表传输到远端关联、再将关联结果传输到本地的时候,我们可以设置 arraysize 优化 SQL。
如果远端表 a 很大,本地表 b 也很大,两表关联返回数据量多,这时既不能将远端表 a 传到本地,也不能将本地表 b 传到远端,因为无论采用哪种方法,SQL 都很慢。我们可以在本地创建一个带有 dblink 的物化视图,将远端表 a 的数据刷新到本地,然后再进行关联。
如果 SQL 语句中有多个 dblink 源,最好在本地针对每个 dblink 源建立带有 dblink 的物化视图,因为多个 dblink 源之间进行数据传输,网络信息交换会导致严重性能问题。
有时候会使用 dblink 对数据进行迁移,如果要迁移的数据量很大,我们可以使用批量游标进行迁移。以下是使用批量游标迁移数据的示例(将 a@dblink 的数据迁移到 b)。
declare
cursor cur is
select id, name, address from a@dblink;
type cur_type is table of cur%rowtype index by binary_integer;
v_cur cur_type;
begin
open cur;
loop
fetch cur bulk collect
into v_cur limit 100000;
forall i in 1 .. v_cur.count
insert into b
(id, name, address)
values
(v_cur(i).id, v_cur(i).name, v_cur(i).address);
commit;
exit when cur%notfound or cur%notfound is null;
end loop;
close cur;
commit;
end;