调优技巧 - DBLINK 优化 - 《SQL优化入门》

现在有如下两个表，a 表是远端表（1800 万），b 表是本地表（100 行）。

desc a@dblink

Name          Null?    Type
 --------------------- -------- ------
 ID                     NUMBER
 NAME                   VARCHAR2(100)
 ADDRESS                VARCHAR2(100)

select count(*) from a@dblink;

COUNT(*)
----------
  18550272

desc b

 Name          Null?    Type
 --------------------- -------- ------
 ID                     NUMBER
 NAME                   VARCHAR2(100)
 ADDRESS                VARCHAR2(100)

select count(*) from b;

COUNT(*)
----------
       100

现有如下 SQL。

select * from a@dblink, b where a.id = b.id;

默认情况下，会将远端表 a 的数据传输到本地，然后再进行关联，autotrace 的执行计划如下。

 set timi on
 set autot trace
 select * from a@dblink, b where a.id = b.id;

25600 rows selected.
Elapsed: 00:03:13.80
Execution Plan
----------------------------------------------------------
Plan hash value: 657970699
-------------------------------------------------------------------------------------
| Id | Operation          |Name|Rows | Bytes | Cost (%CPU)| Time     | Inst   |IN-OUT|
-------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT   |    |   82| 19188 |     6  (17)| 00:00:01 |        |      |
|* 1 |  HASH JOIN         |    |   82| 19188 |     6  (17)| 00:00:01 |        |      |
|  2 |   REMOTE           |A   |   82|  9594 |     2   (0)| 00:00:01 | DBLINK | R->S |
|  3 |   TABLE ACCESS FULL|B   |  100| 11700 |     3   (0)| 00:00:01 |        |      |
-------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("A"."ID"="B"."ID")
Remote SQL Information (identified by operation id):
----------------------------------------------------
   2 - SELECT "ID","NAME","ADDRESS" FROM "A" "A" (accessing 'DBLINK' )
Statistics
----------------------------------------------------------
        769  recursive calls
          1  db block gets
         15  consistent gets
      91755  physical reads
        212  redo size
    1477532  bytes sent via SQL*Net to client
      19185  bytes received via SQL*Net from client
       1708  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
      25600  rows processed

远端表 a 很大，对数据进行传输会耗费大量时间，本地表 b 表很小，而且 a 和 b 关联之后返回数据量很少，我们可以将本地表 b 传输到远端，在远端进行关联，然后再将结果集传回本地，这时需要使用hint：driving_site，下面 SQL 就是将 b 传递到远端关联的示例。

select /*+ driving_site(a) */ * from a@dblink, b  where a.id = b.id;

autotrace 的执行计划如下。

select /*+ driving_site(a) */ * from a@dblink, b  where a.id = b.id;

25600 rows selected.
Elapsed: 00:00:06.08
Execution Plan
----------------------------------------------------------
Plan hash value: 4284963264
------------------------------------------------------------------------------------
| Id  | Operation              | Name | Rows  | Bytes | Cost (%CPU)| Inst   |IN-OUT|
------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT REMOTE|      | 20931 |  4783K| 25565   (2)|        |      |
|*  1 |  HASH JOIN             |      | 20931 |  4783K| 25565   (2)|        |      |
|   2 |   REMOTE               | B    |    82 |  9594 |     2   (0)|      ! | R->S |
|   3 |   TABLE ACCESS FULL    | A    |    19M|  2173M| 25466   (1)|   ORCL |      |
------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("A2"."ID"="A1"."ID")
Remote SQL Information (identified by operation id):
----------------------------------------------------
   2 - SELECT "ID","NAME","ADDRESS" FROM "B" "A1" (accessing '!' )
Note
-----
   - fully remote statement
Statistics
----------------------------------------------------------
          6  recursive calls
          0  db block gets
          8  consistent gets
          0  physical reads
          0  redo size
    1428836  bytes sent via SQL*Net to client
      19185  bytes received via SQL*Net from client
       1708  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
      25600  rows processed

将本地小表传输到远端关联，再返回结果只需 6 秒，相比将大表传输到本地，在性能上有巨大提升。

现在我们在远端表 a 的连接列建立索引。

create index idx_id on a(id);

Index created.

因为 b 表只有 100 行数据，a 表有 1 800 万行数据，两表关联之后返回 2.5 万行数据，我们可以让 a 与 b 走嵌套循环，b 作为驱动表，a 作为被驱动表，而且走连接索引。

select /*+ index(a) use_nl(a,b) leading(b) */ * from a@dblink, b  where a.id = b.id;

25600 rows selected.
Elapsed: 00:00:00.84
Execution Plan
----------------------------------------------------------
Plan hash value: 1489534455
-------------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Inst   |IN-OUT|
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |  7614K|  1699M| 54680 (100)|        |      |
|   1 |  NESTED LOOPS      |      |  7614K|  1699M| 54680 (100)|        |      |
|   2 |   TABLE ACCESS FULL| B    |   100 | 11700 |     3   (0)|        |      |
|   3 |   REMOTE           | A    | 76146 |  8700K|     3   (0)| DBLINK | R->S |
-------------------------------------------------------------------------------
Remote SQL Information (identified by operation id):
----------------------------------------------------
   3 - SELECT /*+ USE_NL ("A") INDEX ("A") */ "ID","NAME","ADDRESS" FROM "A" "A"
       WHERE "ID"=:1 (accessing 'DBLINK' )
Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
        106  consistent gets
          0  physical reads
          0  redo size
     349986  bytes sent via SQL*Net to client
      19185  bytes received via SQL*Net from client
       1708  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
      25600  rows processed

强制 a 表走索引之后，这时我们只需将索引过滤之后的数据传输到本地，而无需将 a 表所有数据传到本地，性能得到极大提升，SQL 耗时不到 1 秒。

现在我们将 b 表传输到远端，强制 b 表作为嵌套循环驱动表。

select /*+ driving_site(a) use_nl(a,b) leading(b) */ * from a@dblink, b  where a.id = b.id;

25600 rows selected.
Elapsed: 00:00:02.92
Execution Plan
----------------------------------------------------------
Plan hash value: 557259519
-------------------------------------------------------------------------------------
| Id | Operation                    |Name  |Rows | Bytes | Cost (%CPU)| Inst  |IN-OUT|
-------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT REMOTE      |      |20931|  4783K| 20182   (1)|       |      |
|  1 |  NESTED LOOPS                |      |     |       |            |       |      |
|  2 |   NESTED LOOPS               |      |20931|  4783K| 20182   (1)|       |      |
|  3 |    REMOTE                    |B     |   82|  9594 |     2   (0)|     ! | R->S |
|* 4 |    INDEX RANGE SCAN          |IDX_ID|  255|       |     2   (0)|  ORCL |      |
|  5 |   TABLE ACCESS BY INDEX ROWID|A     |  255| 29835 |   246   (0)|  ORCL |      |
-------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   4 - access("A2"."ID"="A1"."ID")
Remote SQL Information (identified by operation id):
----------------------------------------------------
   3 - SELECT /*+ USE_NL ("A1") */ "ID","NAME","ADDRESS" FROM "B" "A1" (accessing '!' )
Note
-----
   - fully remote statement
Statistics
----------------------------------------------------------
          6  recursive calls
          0  db block gets
          8  consistent gets
          0  physical reads
          0  redo size
     426684  bytes sent via SQL*Net to client
      19185  bytes received via SQL*Net from client
       1708  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
      25600  rows processed

该查询耗时 2.9 秒，主要开销耗费在网络传输上，首先我们要将 b 表传输到远端，然后将 a 与 b 的关联结果传输到本地，网络传输耗费了两次。我们可以设置 arraysize 减少网络交互次数，从而减少网络开销，如下所示。

 set arraysize 1000
 select /*+ driving_site(a) use_nl(a,b) leading(b) */ * from a@dblink, b  where a.id = b.id;

25600 rows selected.
Elapsed: 00:00:00.29
Execution Plan
----------------------------------------------------------
Plan hash value: 557259519
-------------------------------------------------------------------------------------
| Id | Operation                    |Name  |Rows | Bytes | Cost (%CPU)|Inst  |IN-OUT|
-------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT REMOTE      |      |20931|  4783K| 20182   (1)|      |      |
|  1 |  NESTED LOOPS                |      |     |       |            |      |      |
|  2 |   NESTED LOOPS               |      |20931|  4783K| 20182   (1)|      |      |
|  3 |    REMOTE                    |B     |   82|  9594 |     2   (0)|     !| R->S |
|* 4 |    INDEX RANGE SCAN          |IDX_ID|  255|       |     2   (0)|  ORCL|      |
|  5 |   TABLE ACCESS BY INDEX ROWID|A     |  255| 29835 |   246   (0)|  ORCL|      |
-------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   4 - access("A2"."ID"="A1"."ID")
Remote SQL Information (identified by operation id):
----------------------------------------------------
   3 - SELECT /*+ USE_NL ("A1") */ "ID","NAME","ADDRESS" FROM "B" "A1" (accessing '!' )
Note
-----
   - fully remote statement
Statistics
----------------------------------------------------------
          3  recursive calls
          0  db block gets
          8  consistent gets
          0  physical reads
          0  redo size
     137698  bytes sent via SQL*Net to client
        694  bytes received via SQL*Net from client
         27  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
      25600  rows processed

注意观察执行计划中统计信息栏目 SQL*Net roundtrips 从 1 708 减少到 27。当需要将本地表传输到远端关联、再将关联结果传输到本地的时候，我们可以设置 arraysize 优化 SQL。

如果远端表 a 很大，本地表 b 也很大，两表关联返回数据量多，这时既不能将远端表 a 传到本地，也不能将本地表 b 传到远端，因为无论采用哪种方法，SQL 都很慢。我们可以在本地创建一个带有 dblink 的物化视图，将远端表 a 的数据刷新到本地，然后再进行关联。

如果 SQL 语句中有多个 dblink 源，最好在本地针对每个 dblink 源建立带有 dblink 的物化视图，因为多个 dblink 源之间进行数据传输，网络信息交换会导致严重性能问题。

有时候会使用 dblink 对数据进行迁移，如果要迁移的数据量很大，我们可以使用批量游标进行迁移。以下是使用批量游标迁移数据的示例（将 a@dblink 的数据迁移到 b）。

declare
  cursor cur is
    select id, name, address from a@dblink;
  type cur_type is table of cur%rowtype index by binary_integer; 
  v_cur cur_type;
begin
  open cur;
  loop
    fetch cur bulk collect
      into v_cur limit 100000; 
    forall i in 1 .. v_cur.count
      insert into b
        (id, name, address)
      values
        (v_cur(i).id, v_cur(i).name, v_cur(i).address);
    commit;
    exit when cur%notfound or cur%notfound is null;
  end loop;
  close cur;
  commit;
end;