现在有如下两个表,a 表是远端表(1800 万),b 表是本地表(100 行)。

    1. desc a@dblink
    1. Name Null? Type
    2. --------------------- -------- ------
    3. ID NUMBER
    4. NAME VARCHAR2(100)
    5. ADDRESS VARCHAR2(100)
    1. select count(*) from a@dblink;
    1. COUNT(*)
    2. ----------
    3. 18550272
    1. desc b
    1. Name Null? Type
    2. --------------------- -------- ------
    3. ID NUMBER
    4. NAME VARCHAR2(100)
    5. ADDRESS VARCHAR2(100)
    1. select count(*) from b;
    1. COUNT(*)
    2. ----------
    3. 100

    现有如下 SQL。

    1. select * from a@dblink, b where a.id = b.id;

    默认情况下,会将远端表 a 的数据传输到本地,然后再进行关联,autotrace 的执行计划如下。

    1. set timi on
    2. set autot trace
    3. select * from a@dblink, b where a.id = b.id;
    1. 25600 rows selected.
    2. Elapsed: 00:03:13.80
    3. Execution Plan
    4. ----------------------------------------------------------
    5. Plan hash value: 657970699
    6. -------------------------------------------------------------------------------------
    7. | Id | Operation |Name|Rows | Bytes | Cost (%CPU)| Time | Inst |IN-OUT|
    8. -------------------------------------------------------------------------------------
    9. | 0 | SELECT STATEMENT | | 82| 19188 | 6 (17)| 00:00:01 | | |
    10. |* 1 | HASH JOIN | | 82| 19188 | 6 (17)| 00:00:01 | | |
    11. | 2 | REMOTE |A | 82| 9594 | 2 (0)| 00:00:01 | DBLINK | R->S |
    12. | 3 | TABLE ACCESS FULL|B | 100| 11700 | 3 (0)| 00:00:01 | | |
    13. -------------------------------------------------------------------------------------
    14. Predicate Information (identified by operation id):
    15. ---------------------------------------------------
    16. 1 - access("A"."ID"="B"."ID")
    17. Remote SQL Information (identified by operation id):
    18. ----------------------------------------------------
    19. 2 - SELECT "ID","NAME","ADDRESS" FROM "A" "A" (accessing 'DBLINK' )
    20. Statistics
    21. ----------------------------------------------------------
    22. 769 recursive calls
    23. 1 db block gets
    24. 15 consistent gets
    25. 91755 physical reads
    26. 212 redo size
    27. 1477532 bytes sent via SQL*Net to client
    28. 19185 bytes received via SQL*Net from client
    29. 1708 SQL*Net roundtrips to/from client
    30. 0 sorts (memory)
    31. 0 sorts (disk)
    32. 25600 rows processed

    远端表 a 很大,对数据进行传输会耗费大量时间,本地表 b 表很小,而且 a 和 b 关联之后返回数据量很少,我们可以将本地表 b 传输到远端,在远端进行关联,然后再将结果集传回本地,这时需要使用hint:driving_site,下面 SQL 就是将 b 传递到远端关联的示例。

    1. select /*+ driving_site(a) */ * from a@dblink, b where a.id = b.id;

    autotrace 的执行计划如下。

    1. select /*+ driving_site(a) */ * from a@dblink, b where a.id = b.id;
    1. 25600 rows selected.
    2. Elapsed: 00:00:06.08
    3. Execution Plan
    4. ----------------------------------------------------------
    5. Plan hash value: 4284963264
    6. ------------------------------------------------------------------------------------
    7. | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Inst |IN-OUT|
    8. ------------------------------------------------------------------------------------
    9. | 0 | SELECT STATEMENT REMOTE| | 20931 | 4783K| 25565 (2)| | |
    10. |* 1 | HASH JOIN | | 20931 | 4783K| 25565 (2)| | |
    11. | 2 | REMOTE | B | 82 | 9594 | 2 (0)| ! | R->S |
    12. | 3 | TABLE ACCESS FULL | A | 19M| 2173M| 25466 (1)| ORCL | |
    13. ------------------------------------------------------------------------------------
    14. Predicate Information (identified by operation id):
    15. ---------------------------------------------------
    16. 1 - access("A2"."ID"="A1"."ID")
    17. Remote SQL Information (identified by operation id):
    18. ----------------------------------------------------
    19. 2 - SELECT "ID","NAME","ADDRESS" FROM "B" "A1" (accessing '!' )
    20. Note
    21. -----
    22. - fully remote statement
    23. Statistics
    24. ----------------------------------------------------------
    25. 6 recursive calls
    26. 0 db block gets
    27. 8 consistent gets
    28. 0 physical reads
    29. 0 redo size
    30. 1428836 bytes sent via SQL*Net to client
    31. 19185 bytes received via SQL*Net from client
    32. 1708 SQL*Net roundtrips to/from client
    33. 0 sorts (memory)
    34. 0 sorts (disk)
    35. 25600 rows processed

    将本地小表传输到远端关联,再返回结果只需 6 秒,相比将大表传输到本地,在性能上有巨大提升。

    现在我们在远端表 a 的连接列建立索引。

    1. create index idx_id on a(id);
    1. Index created.

    因为 b 表只有 100 行数据,a 表有 1 800 万行数据,两表关联之后返回 2.5 万行数据,我们可以让 a 与 b 走嵌套循环,b 作为驱动表,a 作为被驱动表,而且走连接索引。

    1. select /*+ index(a) use_nl(a,b) leading(b) */ * from a@dblink, b where a.id = b.id;
    1. 25600 rows selected.
    2. Elapsed: 00:00:00.84
    3. Execution Plan
    4. ----------------------------------------------------------
    5. Plan hash value: 1489534455
    6. -------------------------------------------------------------------------------
    7. | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Inst |IN-OUT|
    8. -------------------------------------------------------------------------------
    9. | 0 | SELECT STATEMENT | | 7614K| 1699M| 54680 (100)| | |
    10. | 1 | NESTED LOOPS | | 7614K| 1699M| 54680 (100)| | |
    11. | 2 | TABLE ACCESS FULL| B | 100 | 11700 | 3 (0)| | |
    12. | 3 | REMOTE | A | 76146 | 8700K| 3 (0)| DBLINK | R->S |
    13. -------------------------------------------------------------------------------
    14. Remote SQL Information (identified by operation id):
    15. ----------------------------------------------------
    16. 3 - SELECT /*+ USE_NL ("A") INDEX ("A") */ "ID","NAME","ADDRESS" FROM "A" "A"
    17. WHERE "ID"=:1 (accessing 'DBLINK' )
    18. Statistics
    19. ----------------------------------------------------------
    20. 0 recursive calls
    21. 0 db block gets
    22. 106 consistent gets
    23. 0 physical reads
    24. 0 redo size
    25. 349986 bytes sent via SQL*Net to client
    26. 19185 bytes received via SQL*Net from client
    27. 1708 SQL*Net roundtrips to/from client
    28. 0 sorts (memory)
    29. 0 sorts (disk)
    30. 25600 rows processed

    强制 a 表走索引之后,这时我们只需将索引过滤之后的数据传输到本地,而无需将 a 表所有数据传到本地,性能得到极大提升,SQL 耗时不到 1 秒。

    现在我们将 b 表传输到远端,强制 b 表作为嵌套循环驱动表。

    1. select /*+ driving_site(a) use_nl(a,b) leading(b) */ * from a@dblink, b where a.id = b.id;
    1. 25600 rows selected.
    2. Elapsed: 00:00:02.92
    3. Execution Plan
    4. ----------------------------------------------------------
    5. Plan hash value: 557259519
    6. -------------------------------------------------------------------------------------
    7. | Id | Operation |Name |Rows | Bytes | Cost (%CPU)| Inst |IN-OUT|
    8. -------------------------------------------------------------------------------------
    9. | 0 | SELECT STATEMENT REMOTE | |20931| 4783K| 20182 (1)| | |
    10. | 1 | NESTED LOOPS | | | | | | |
    11. | 2 | NESTED LOOPS | |20931| 4783K| 20182 (1)| | |
    12. | 3 | REMOTE |B | 82| 9594 | 2 (0)| ! | R->S |
    13. |* 4 | INDEX RANGE SCAN |IDX_ID| 255| | 2 (0)| ORCL | |
    14. | 5 | TABLE ACCESS BY INDEX ROWID|A | 255| 29835 | 246 (0)| ORCL | |
    15. -------------------------------------------------------------------------------------
    16. Predicate Information (identified by operation id):
    17. ---------------------------------------------------
    18. 4 - access("A2"."ID"="A1"."ID")
    19. Remote SQL Information (identified by operation id):
    20. ----------------------------------------------------
    21. 3 - SELECT /*+ USE_NL ("A1") */ "ID","NAME","ADDRESS" FROM "B" "A1" (accessing '!' )
    22. Note
    23. -----
    24. - fully remote statement
    25. Statistics
    26. ----------------------------------------------------------
    27. 6 recursive calls
    28. 0 db block gets
    29. 8 consistent gets
    30. 0 physical reads
    31. 0 redo size
    32. 426684 bytes sent via SQL*Net to client
    33. 19185 bytes received via SQL*Net from client
    34. 1708 SQL*Net roundtrips to/from client
    35. 0 sorts (memory)
    36. 0 sorts (disk)
    37. 25600 rows processed

    该查询耗时 2.9 秒,主要开销耗费在网络传输上,首先我们要将 b 表传输到远端,然后将 a 与 b 的关联结果传输到本地,网络传输耗费了两次。我们可以设置 arraysize 减少网络交互次数,从而减少网络开销,如下所示。

    1. set arraysize 1000
    2. select /*+ driving_site(a) use_nl(a,b) leading(b) */ * from a@dblink, b where a.id = b.id;
    1. 25600 rows selected.
    2. Elapsed: 00:00:00.29
    3. Execution Plan
    4. ----------------------------------------------------------
    5. Plan hash value: 557259519
    6. -------------------------------------------------------------------------------------
    7. | Id | Operation |Name |Rows | Bytes | Cost (%CPU)|Inst |IN-OUT|
    8. -------------------------------------------------------------------------------------
    9. | 0 | SELECT STATEMENT REMOTE | |20931| 4783K| 20182 (1)| | |
    10. | 1 | NESTED LOOPS | | | | | | |
    11. | 2 | NESTED LOOPS | |20931| 4783K| 20182 (1)| | |
    12. | 3 | REMOTE |B | 82| 9594 | 2 (0)| !| R->S |
    13. |* 4 | INDEX RANGE SCAN |IDX_ID| 255| | 2 (0)| ORCL| |
    14. | 5 | TABLE ACCESS BY INDEX ROWID|A | 255| 29835 | 246 (0)| ORCL| |
    15. -------------------------------------------------------------------------------------
    16. Predicate Information (identified by operation id):
    17. ---------------------------------------------------
    18. 4 - access("A2"."ID"="A1"."ID")
    19. Remote SQL Information (identified by operation id):
    20. ----------------------------------------------------
    21. 3 - SELECT /*+ USE_NL ("A1") */ "ID","NAME","ADDRESS" FROM "B" "A1" (accessing '!' )
    22. Note
    23. -----
    24. - fully remote statement
    25. Statistics
    26. ----------------------------------------------------------
    27. 3 recursive calls
    28. 0 db block gets
    29. 8 consistent gets
    30. 0 physical reads
    31. 0 redo size
    32. 137698 bytes sent via SQL*Net to client
    33. 694 bytes received via SQL*Net from client
    34. 27 SQL*Net roundtrips to/from client
    35. 0 sorts (memory)
    36. 0 sorts (disk)
    37. 25600 rows processed

    注意观察执行计划中统计信息栏目 SQL*Net roundtrips 从 1 708 减少到 27。当需要将本地表传输到远端关联、再将关联结果传输到本地的时候,我们可以设置 arraysize 优化 SQL。

    如果远端表 a 很大,本地表 b 也很大,两表关联返回数据量多,这时既不能将远端表 a 传到本地,也不能将本地表 b 传到远端,因为无论采用哪种方法,SQL 都很慢。我们可以在本地创建一个带有 dblink 的物化视图,将远端表 a 的数据刷新到本地,然后再进行关联。

    如果 SQL 语句中有多个 dblink 源,最好在本地针对每个 dblink 源建立带有 dblink 的物化视图,因为多个 dblink 源之间进行数据传输,网络信息交换会导致严重性能问题。

    有时候会使用 dblink 对数据进行迁移,如果要迁移的数据量很大,我们可以使用批量游标进行迁移。以下是使用批量游标迁移数据的示例(将 a@dblink 的数据迁移到 b)。

    1. declare
    2. cursor cur is
    3. select id, name, address from a@dblink;
    4. type cur_type is table of cur%rowtype index by binary_integer;
    5. v_cur cur_type;
    6. begin
    7. open cur;
    8. loop
    9. fetch cur bulk collect
    10. into v_cur limit 100000;
    11. forall i in 1 .. v_cur.count
    12. insert into b
    13. (id, name, address)
    14. values
    15. (v_cur(i).id, v_cur(i).name, v_cur(i).address);
    16. commit;
    17. exit when cur%notfound or cur%notfound is null;
    18. end loop;
    19. close cur;
    20. commit;
    21. end;