现在有如下两个表,a 表是远端表(1800 万),b 表是本地表(100 行)。
desc a@dblink
Name Null? Type--------------------- -------- ------ID NUMBERNAME VARCHAR2(100)ADDRESS VARCHAR2(100)
select count(*) from a@dblink;
COUNT(*)----------18550272
desc b
Name Null? Type--------------------- -------- ------ID NUMBERNAME VARCHAR2(100)ADDRESS VARCHAR2(100)
select count(*) from b;
COUNT(*)----------100
现有如下 SQL。
select * from a@dblink, b where a.id = b.id;
默认情况下,会将远端表 a 的数据传输到本地,然后再进行关联,autotrace 的执行计划如下。
set timi onset autot traceselect * from a@dblink, b where a.id = b.id;
25600 rows selected.Elapsed: 00:03:13.80Execution Plan----------------------------------------------------------Plan hash value: 657970699-------------------------------------------------------------------------------------| Id | Operation |Name|Rows | Bytes | Cost (%CPU)| Time | Inst |IN-OUT|-------------------------------------------------------------------------------------| 0 | SELECT STATEMENT | | 82| 19188 | 6 (17)| 00:00:01 | | ||* 1 | HASH JOIN | | 82| 19188 | 6 (17)| 00:00:01 | | || 2 | REMOTE |A | 82| 9594 | 2 (0)| 00:00:01 | DBLINK | R->S || 3 | TABLE ACCESS FULL|B | 100| 11700 | 3 (0)| 00:00:01 | | |-------------------------------------------------------------------------------------Predicate Information (identified by operation id):---------------------------------------------------1 - access("A"."ID"="B"."ID")Remote SQL Information (identified by operation id):----------------------------------------------------2 - SELECT "ID","NAME","ADDRESS" FROM "A" "A" (accessing 'DBLINK' )Statistics----------------------------------------------------------769 recursive calls1 db block gets15 consistent gets91755 physical reads212 redo size1477532 bytes sent via SQL*Net to client19185 bytes received via SQL*Net from client1708 SQL*Net roundtrips to/from client0 sorts (memory)0 sorts (disk)25600 rows processed
远端表 a 很大,对数据进行传输会耗费大量时间,本地表 b 表很小,而且 a 和 b 关联之后返回数据量很少,我们可以将本地表 b 传输到远端,在远端进行关联,然后再将结果集传回本地,这时需要使用hint:driving_site,下面 SQL 就是将 b 传递到远端关联的示例。
select /*+ driving_site(a) */ * from a@dblink, b where a.id = b.id;
autotrace 的执行计划如下。
select /*+ driving_site(a) */ * from a@dblink, b where a.id = b.id;
25600 rows selected.Elapsed: 00:00:06.08Execution Plan----------------------------------------------------------Plan hash value: 4284963264------------------------------------------------------------------------------------| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Inst |IN-OUT|------------------------------------------------------------------------------------| 0 | SELECT STATEMENT REMOTE| | 20931 | 4783K| 25565 (2)| | ||* 1 | HASH JOIN | | 20931 | 4783K| 25565 (2)| | || 2 | REMOTE | B | 82 | 9594 | 2 (0)| ! | R->S || 3 | TABLE ACCESS FULL | A | 19M| 2173M| 25466 (1)| ORCL | |------------------------------------------------------------------------------------Predicate Information (identified by operation id):---------------------------------------------------1 - access("A2"."ID"="A1"."ID")Remote SQL Information (identified by operation id):----------------------------------------------------2 - SELECT "ID","NAME","ADDRESS" FROM "B" "A1" (accessing '!' )Note------ fully remote statementStatistics----------------------------------------------------------6 recursive calls0 db block gets8 consistent gets0 physical reads0 redo size1428836 bytes sent via SQL*Net to client19185 bytes received via SQL*Net from client1708 SQL*Net roundtrips to/from client0 sorts (memory)0 sorts (disk)25600 rows processed
将本地小表传输到远端关联,再返回结果只需 6 秒,相比将大表传输到本地,在性能上有巨大提升。
现在我们在远端表 a 的连接列建立索引。
create index idx_id on a(id);
Index created.
因为 b 表只有 100 行数据,a 表有 1 800 万行数据,两表关联之后返回 2.5 万行数据,我们可以让 a 与 b 走嵌套循环,b 作为驱动表,a 作为被驱动表,而且走连接索引。
select /*+ index(a) use_nl(a,b) leading(b) */ * from a@dblink, b where a.id = b.id;
25600 rows selected.Elapsed: 00:00:00.84Execution Plan----------------------------------------------------------Plan hash value: 1489534455-------------------------------------------------------------------------------| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Inst |IN-OUT|-------------------------------------------------------------------------------| 0 | SELECT STATEMENT | | 7614K| 1699M| 54680 (100)| | || 1 | NESTED LOOPS | | 7614K| 1699M| 54680 (100)| | || 2 | TABLE ACCESS FULL| B | 100 | 11700 | 3 (0)| | || 3 | REMOTE | A | 76146 | 8700K| 3 (0)| DBLINK | R->S |-------------------------------------------------------------------------------Remote SQL Information (identified by operation id):----------------------------------------------------3 - SELECT /*+ USE_NL ("A") INDEX ("A") */ "ID","NAME","ADDRESS" FROM "A" "A"WHERE "ID"=:1 (accessing 'DBLINK' )Statistics----------------------------------------------------------0 recursive calls0 db block gets106 consistent gets0 physical reads0 redo size349986 bytes sent via SQL*Net to client19185 bytes received via SQL*Net from client1708 SQL*Net roundtrips to/from client0 sorts (memory)0 sorts (disk)25600 rows processed
强制 a 表走索引之后,这时我们只需将索引过滤之后的数据传输到本地,而无需将 a 表所有数据传到本地,性能得到极大提升,SQL 耗时不到 1 秒。
现在我们将 b 表传输到远端,强制 b 表作为嵌套循环驱动表。
select /*+ driving_site(a) use_nl(a,b) leading(b) */ * from a@dblink, b where a.id = b.id;
25600 rows selected.Elapsed: 00:00:02.92Execution Plan----------------------------------------------------------Plan hash value: 557259519-------------------------------------------------------------------------------------| Id | Operation |Name |Rows | Bytes | Cost (%CPU)| Inst |IN-OUT|-------------------------------------------------------------------------------------| 0 | SELECT STATEMENT REMOTE | |20931| 4783K| 20182 (1)| | || 1 | NESTED LOOPS | | | | | | || 2 | NESTED LOOPS | |20931| 4783K| 20182 (1)| | || 3 | REMOTE |B | 82| 9594 | 2 (0)| ! | R->S ||* 4 | INDEX RANGE SCAN |IDX_ID| 255| | 2 (0)| ORCL | || 5 | TABLE ACCESS BY INDEX ROWID|A | 255| 29835 | 246 (0)| ORCL | |-------------------------------------------------------------------------------------Predicate Information (identified by operation id):---------------------------------------------------4 - access("A2"."ID"="A1"."ID")Remote SQL Information (identified by operation id):----------------------------------------------------3 - SELECT /*+ USE_NL ("A1") */ "ID","NAME","ADDRESS" FROM "B" "A1" (accessing '!' )Note------ fully remote statementStatistics----------------------------------------------------------6 recursive calls0 db block gets8 consistent gets0 physical reads0 redo size426684 bytes sent via SQL*Net to client19185 bytes received via SQL*Net from client1708 SQL*Net roundtrips to/from client0 sorts (memory)0 sorts (disk)25600 rows processed
该查询耗时 2.9 秒,主要开销耗费在网络传输上,首先我们要将 b 表传输到远端,然后将 a 与 b 的关联结果传输到本地,网络传输耗费了两次。我们可以设置 arraysize 减少网络交互次数,从而减少网络开销,如下所示。
set arraysize 1000select /*+ driving_site(a) use_nl(a,b) leading(b) */ * from a@dblink, b where a.id = b.id;
25600 rows selected.Elapsed: 00:00:00.29Execution Plan----------------------------------------------------------Plan hash value: 557259519-------------------------------------------------------------------------------------| Id | Operation |Name |Rows | Bytes | Cost (%CPU)|Inst |IN-OUT|-------------------------------------------------------------------------------------| 0 | SELECT STATEMENT REMOTE | |20931| 4783K| 20182 (1)| | || 1 | NESTED LOOPS | | | | | | || 2 | NESTED LOOPS | |20931| 4783K| 20182 (1)| | || 3 | REMOTE |B | 82| 9594 | 2 (0)| !| R->S ||* 4 | INDEX RANGE SCAN |IDX_ID| 255| | 2 (0)| ORCL| || 5 | TABLE ACCESS BY INDEX ROWID|A | 255| 29835 | 246 (0)| ORCL| |-------------------------------------------------------------------------------------Predicate Information (identified by operation id):---------------------------------------------------4 - access("A2"."ID"="A1"."ID")Remote SQL Information (identified by operation id):----------------------------------------------------3 - SELECT /*+ USE_NL ("A1") */ "ID","NAME","ADDRESS" FROM "B" "A1" (accessing '!' )Note------ fully remote statementStatistics----------------------------------------------------------3 recursive calls0 db block gets8 consistent gets0 physical reads0 redo size137698 bytes sent via SQL*Net to client694 bytes received via SQL*Net from client27 SQL*Net roundtrips to/from client0 sorts (memory)0 sorts (disk)25600 rows processed
注意观察执行计划中统计信息栏目 SQL*Net roundtrips 从 1 708 减少到 27。当需要将本地表传输到远端关联、再将关联结果传输到本地的时候,我们可以设置 arraysize 优化 SQL。
如果远端表 a 很大,本地表 b 也很大,两表关联返回数据量多,这时既不能将远端表 a 传到本地,也不能将本地表 b 传到远端,因为无论采用哪种方法,SQL 都很慢。我们可以在本地创建一个带有 dblink 的物化视图,将远端表 a 的数据刷新到本地,然后再进行关联。
如果 SQL 语句中有多个 dblink 源,最好在本地针对每个 dblink 源建立带有 dblink 的物化视图,因为多个 dblink 源之间进行数据传输,网络信息交换会导致严重性能问题。
有时候会使用 dblink 对数据进行迁移,如果要迁移的数据量很大,我们可以使用批量游标进行迁移。以下是使用批量游标迁移数据的示例(将 a@dblink 的数据迁移到 b)。
declarecursor cur isselect id, name, address from a@dblink;type cur_type is table of cur%rowtype index by binary_integer;v_cur cur_type;beginopen cur;loopfetch cur bulk collectinto v_cur limit 100000;forall i in 1 .. v_cur.countinsert into b(id, name, address)values(v_cur(i).id, v_cur(i).name, v_cur(i).address);commit;exit when cur%notfound or cur%notfound is null;end loop;close cur;commit;end;
