两表关联只返回主表的数据，而且只返回主表与子表没关联上的数据，这种连接就叫反连接。反连接一般就是指的 not in 和 not exists。

反连接等价改写

not in 与 not exists 一般情况下也可以进行等价改写。

not in 的写法如下。

select * from dept where deptno not in (select deptno from emp);

    DEPTNO DNAME           LOC
---------- --------------- ---------------------------------------
        40 OPERATIONS      BOSTON

not exists 的写法如下。

select *
    from dept
   where not exists (select null from emp where dept.deptno = emp.deptno);

    DEPTNO DNAME           LOC
---------- --------------- ---------------------------------------
        40 OPERATIONS      BOSTON

需要注意的是，not in 里面如果有 null，整个查询会返回空，而 in 里面有 null，查询不受 null 影响，例子如下。

select * from dept where deptno not in (10,null);

no rows selected

select * from dept where deptno in (10,null);

   DEPTNO DNAME           LOC
---------- --------------- -----------------------
        10 ACCOUNTING      NEW YORK

所以在将 not exists 等价改写为 not in 的时候，要注意 null。一般情况下，如果反连接采用 not in 写法，我们需要在 where 条件中剔除 null。

select *
   from dept
  where deptno not in (select deptno from emp where deptno is not null);

not in 与 not exists 除了可以相互等价改写以外，还可以等价地改写为外连接，例如，上面查询可以等价改写为如下写法。

 select d.*
    from dept d
    left join emp e on d.deptno = e.deptno
   where e.deptno is null;

    DEPTNO DNAME           LOC
---------- --------------- ------------------
        40 OPERATIONS      BOSTON

为什么反连接可以改写为「外连接 + 子表连接条件 is null」？我们再来回顾一下反连接定义：两表关联只返回主表的数据，而且只返回主表与子表没有关联上的数据。根据反连接定义，翻译为标准 SQL 写法就是「外连接 + 子表连接条件 is null」。与半连接改写为内连接不同的是，反连接改写为外连接不需要考虑两表之间的关系。

控制反连接执行计划

我们先来查看示例 SQL 的原始执行计划。

select * from dept where deptno not in (select deptno from emp);

Execution Plan
----------------------------------------------------------
Plan hash value: 2230682264
-------------------------------------------------------------------------------------
| Id | Operation                     | Name    | Rows  | Bytes | Cost(%CPU)|Time    |
-------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT              |         |     1 |    23 |    6  (17)|00:00:01|
|  1 |  MERGE JOIN ANTI NA           |         |     1 |    23 |    6  (17)|00:00:01|
|  2 |   SORT JOIN                   |         |     4 |    80 |    2   (0)|00:00:01|
|  3 |    TABLE ACCESS BY INDEX ROWID| DEPT    |     4 |    80 |    2   (0)|00:00:01|
|  4 |     INDEX FULL SCAN           | PK_DEPT |     4 |       |    1   (0)|00:00:01|
|* 5 |   SORT UNIQUE                 |         |    14 |    42 |    4  (25)|00:00:01|
|  6 |    TABLE ACCESS FULL          | EMP     |    14 |    42 |    3   (0)|00:00:01|
-------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   5 - access("DEPTNO"="DEPTNO")
       filter("DEPTNO"="DEPTNO")

原始执行计划中 DEPT 与 EMP 是采用排序合并连接进行关联的。

我们现在让 DEPT 与 EMP 使用嵌套循环进行关联，不指定驱动表。

select /*+ use_nl(dept,emp@a) */ *
    from dept
   where deptno not in (select /*+ qb_name(a) */
                         deptno
                          from emp);

Execution Plan
----------------------------------------------------------
Plan hash value: 1831344308
-------------------------------------------------------------------------------
| Id  | Operation              | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |      |     1 |    23 |    11   (0)| 00:00:01 |
|*  1 |  FILTER                |      |       |       |            |          |
|   2 |   NESTED LOOPS ANTI SNA|      |     1 |    23 |    11  (28)| 00:00:01 |
|   3 |    TABLE ACCESS FULL   | DEPT |     4 |    80 |     3   (0)| 00:00:01 |
|*  4 |    TABLE ACCESS FULL   | EMP  |     9 |    27 |     1   (0)| 00:00:01 |
|*  5 |   TABLE ACCESS FULL    | EMP  |     1 |     3 |     3   (0)| 00:00:01 |
-------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter( NOT EXISTS (SELECT /*+ QB_NAME ("A") */ 0 FROM "EMP"
              "EMP" WHERE "DEPTNO" IS NULL))
   4 - filter("DEPTNO"="DEPTNO")
   5 - filter("DEPTNO" IS NULL)

执行计划居然变成了 FILTER，我们指定的 HINT 被 CBO 忽略了。这究竟是什么原因呢？注意观察 FILTER 对应的谓词部分我们就能发现原因。因为子表 EMP 的连接列 DEPTNO 没有排除存在 null 的情况，所以 CBO 选择了 FILTER。现在我们给子查询加上语句where deptno is not null再看一下执行计划。

select /*+ use_nl(dept,emp@a) */ *
    from dept
   where deptno not in (select /*+ qb_name(a) */
                         deptno
                          from emp where deptno is not null);

Execution Plan
----------------------------------------------------------
Plan hash value: 1522491139
---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |    23 |     8   (0)| 00:00:01 |
|   1 |  NESTED LOOPS ANTI |      |     1 |    23 |     8   (0)| 00:00:01 |
|   2 |   TABLE ACCESS FULL| DEPT |     4 |    80 |     3   (0)| 00:00:01 |
|*  3 |   TABLE ACCESS FULL| EMP  |     9 |    27 |     1   (0)| 00:00:01 |
---------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   3 - filter("DEPTNO" IS NOT NULL AND "DEPTNO"="DEPTNO")

现在我们将 not in 改写为 not exists，加上 HINT，再查看执行计划。

select /*+ use_nl(dept,emp@a) */ *
    from dept
   where not exists
   (select /*+ qb_name(a) */ null from emp where emp.deptno = dept.deptno);

Execution Plan
----------------------------------------------------------
Plan hash value: 1522491139
---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |    23 |     8   (0)| 00:00:01 |
|   1 |  NESTED LOOPS ANTI |      |     1 |    23 |     8   (0)| 00:00:01 |
|   2 |   TABLE ACCESS FULL| DEPT |     4 |    80 |     3   (0)| 00:00:01 |
|*  3 |   TABLE ACCESS FULL| EMP  |     9 |    27 |     1   (0)| 00:00:01 |
---------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   3 - filter("EMP"."DEPTNO"="DEPT"."DEPTNO")

在执行计划中，DEPT 是嵌套循环的驱动表，EMP 是嵌套循环的被驱动表。现在我们让 DEPT 与 EMP 还进行嵌套循环连接，但是让 EMP 作为驱动表。

select /*+ use_nl(dept,emp@a) leading(emp@a) */ *
    from dept
   where not exists
   (select /*+ qb_name(a) */ null from emp where emp.deptno = dept.deptno);

Execution Plan
----------------------------------------------------------
Plan hash value: 1522491139
---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |    23 |     8   (0)| 00:00:01 |
|   1 |  NESTED LOOPS ANTI |      |     1 |    23 |     8   (0)| 00:00:01 |
|   2 |   TABLE ACCESS FULL| DEPT |     4 |    80 |     3   (0)| 00:00:01 |
|*  3 |   TABLE ACCESS FULL| EMP  |     9 |    27 |     1   (0)| 00:00:01 |
---------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   3 - filter("EMP"."DEPTNO"="DEPT"."DEPTNO")

注意观察执行计划，虽然我们使用了 leading（emp@a）强制让 EMP 作为驱动表，但是执行计划中驱动表还是 DEPT。这是为什么呢？因为反连接等价于「外连接 + 子表连接条件 is null」，大家是否还记得：当两表关联是外连接，使用嵌套循环进行关联的时候无法更改驱动表，驱动表会被固定为主表。

现在我们让 DEPT 与 EMP 进行 HASH 连接，而且让 EMP 作为驱动表。

select /*+ use_hash(dept,emp@a) leading(emp@a) */ *
    from dept
   where not exists
   (select /*+ qb_name(a) */ null from emp where emp.deptno = dept.deptno);

Execution Plan
----------------------------------------------------------
Plan hash value: 474461924
---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |    23 |     7  (15)| 00:00:01 |
|*  1 |  HASH JOIN ANTI    |      |     1 |    23 |     7  (15)| 00:00:01 |
|   2 |   TABLE ACCESS FULL| DEPT |     4 |    80 |     3   (0)| 00:00:01 |
|   3 |   TABLE ACCESS FULL| EMP  |    14 |    42 |     3   (0)| 00:00:01 |
---------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("EMP"."DEPTNO"="DEPT"."DEPTNO")

虽然 DEPT 与 EMP 采用的是 HASH 连接，但是驱动表还是 DEPT。为什么 leading（emp@a）失效了呢？因为两表关联如果是外连接，要改变 HASH 连接的驱动表必须使用 swap_join_inputs。现在我们使用 swap_join_inputs 来更改 HASH 连接的驱动表。

 select /*+ use_hash(dept,emp@a) swap_join_inputs(emp@a) */ *
    from dept
   where not exists
   (select /*+ qb_name(a) */ null from emp where emp.deptno = dept.deptno);

Execution Plan
----------------------------------------------------------
Plan hash value: 152508289
-----------------------------------------------------------------------------
| Id  | Operation            | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |      |     1 |    23 |     7  (15)| 00:00:01 |
|*  1 |  HASH JOIN RIGHT ANTI|      |     1 |    23 |     7  (15)| 00:00:01 |
|   2 |   TABLE ACCESS FULL  | EMP  |    14 |    42 |     3   (0)| 00:00:01 |
|   3 |   TABLE ACCESS FULL  | DEPT |     4 |    80 |     3   (0)| 00:00:01 |
-----------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("EMP"."DEPTNO"="DEPT"."DEPTNO")

读者思考

现有如下 SQL。

select * from a where a.id not in (select id from b where id is not null);

假设 a 有 1 000 万条，b 有 1 000 条，请问如何优化该 SQL？

假设 a 有 1 000 条，b 有 1 000 万条，请问如何优化该 SQL？

假设 a 有 100 万条，b 有 1 000 万条，请问如何优化该 SQL？

SQL优化入门

反连接（ANTI JOIN）

反连接等价改写

控制反连接执行计划

读者思考