表连接方式 - 标量子查询（SCALAR SUBQUERY） - 《SQL优化入门》

当一个子查询介于 select 与 from 之间，这种子查询就叫标量子查询，例子如下。

select e.ename,
       e.sal,
       (select d.dname from dept d where d.deptno = e.deptno) dname
  from emp e;

我们在测试账号 scott 中运行如下 SQL。

select /*+ gather_plan_statistics */ e.ename，
         e.sal，
         （select d.dname from dept d where d.deptno = e.deptno） dname
    from emp e;

select * from table（dbms_xplan.display_cursor(null,null,'ALLSTATS LAST')）;

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------
SQL_ID  ggmw3tv6xypx1， child number 0
-------------------------------------
select /*+ gather_plan_statistics */ e.ename，        e.sal，
（select d.dname from dept d where d.deptno = e.deptno） dname   from emp e
Plan hash value: 2981343222
-------------------------------------------------------------------------------------
| Id |Operation                   |Name   |Starts|E-Rows|A-Rows|   A-Time  | Buffers |
-------------------------------------------------------------------------------------
|  0 |SELECT STATEMENT            |       |     1|      |    14|00:00:00.01|       8 |
|  1 | TABLE ACCESS BY INDEX ROWID|DEPT   |     3|     1|     3|00:00:00.01|       5 |
|* 2 |  INDEX UNIQUE SCAN         |PK_DEPT|     3|     1|     3|00:00:00.01|       2 |
|  3 | TABLE ACCESS FULL          |EMP    |     1|    14|    14|00:00:00.01|       8 |
-------------------------------------------------------------------------------------
Predicate Information （identified by operation id）:
---------------------------------------------------
   2 - access（「D」.「DEPTNO」=:B1）

标量子查询类似一个天然的嵌套循环，而且驱动表固定为主表。大家是否还记得：嵌套循环被驱动表的连接列必须包含在索引中。同理，标量子查询中子查询的表连接列也必须包含在索引中。主表 EMP 通过连接列（DEPTNO）传值给子查询中的表（DEPT），执行计划中:B1 就表示传值，这个传值过程一共进行了 3 次，因为主表（EMP）的连接列（DEPTNO）基数等于 3。

select count(distinct deptno) from emp;

COUNT(DISTINCTDEPTNO)
---------------------
                    3

我们建议在工作中，尽量避免使用标量子查询，假如主表返回大量数据，主表的连接列基数很高，那么子查询中的表会被多次扫描，从而严重影响 SQL 性能。如果主表数据量小，或者主表的连接列基数很低，那么这个时候我们也可以使用标量子查询，但是记得要给子查询中表的连接列建立索引。

当 SQL 里面有标量子查询，我们可以将标量子查询等价改写为外连接，从而使它们可以进行 HASH 连接。为什么要将标量子查询改写为外连接而不是内连接呢？因为标量子查询是一个传值的过程，如果主表传值给子查询，子查询没有查询到数据，这个时候会显示 NULL。如果将标量子查询改写为内连接，会丢失没有关联上的数据。

现有如下标量子查询。

select d.dname,
         d.loc,
         (select max(e.sal) from emp e where e.deptno = d.deptno) max_sal
    from dept d;

DNAME          LOC              MAX_SAL
-------------- ------------- ----------
ACCOUNTING     NEW YORK            5000
RESEARCH       DALLAS              3000
SALES          CHICAGO             2850
OPERATIONS     BOSTON           ---NULL

我们可以将其等价改写为外连接：

select d.dname， d.loc， e.max_sal
    from dept d
   left join （select max(sal) max_sal,
                      deptno
                   from emp
               group by deptno）e
  on d.deptno = e.deptno；

DNAME          LOC              MAX_SAL
-------------- ------------- ----------
ACCOUNTING     NEW YORK            5000
RESEARCH       DALLAS              3000
SALES          CHICAGO             2850
OPERATIONS     BOSTON           ---NULL

当然了，如果主表的连接列是外键，而子查询的连接列是主键，我们就没必要改写为外连接了，因为外键不可能存 NULL 值，可以直接改写为内连接。例如本书中所用的标量子查询示例就可以改写为内连接，因为 DEPT 与 EMP 有主外键关系。

select e.ename, e.sal, d.dname
  from emp e
 inner join dept d on e.deptno = d.deptno;

在 Oracle12c 中，简单的标量子查询会被优化器等价改写为外连接。