版权声明:本文为CSDN博主「詩和遠方」的原创文章,遵循CC 4.0 BY-SA版权协议,转载请附上原文出处链接及本声明。
原文链接:https://blog.csdn.net/helloword4217/article/details/103895643
SQL的开发者都知道Full Join(同Full Outer Join)是左右两个表的记录都取,但若多个表之间Full Join,较容易出现意料之外的结果,数据可能会重复,这里做一下简单介绍。
数据准备
CREATE TABLE TA(ID INT,Attr NVARCHAR(100));
CREATE TABLE TB(ID INT,Attr NVARCHAR(100));
CREATE TABLE TC(ID INT,Attr NVARCHAR(100));
INSERT INTO TA VALUES(1,'a'),(2,'b'),(3,'c'),(4,NULL),(5,'e');
INSERT INTO TB VALUES(2,'aa'),(3,'bb'),(4,'cc'),(8,NULL);
INSERT INTO TC VALUES(4,'aaa'),(5,'bbb'),(6,'ccc'),(8,'ddd');
需求
将TA、TB、TC三个表数据集中到一个结果集,若ID值重复,Attr字段依次从TA、TB、TC中取第一个非空值。
注:以上需求无法简单地用Union实现
SQL写法
错误写法
一般会像用 Inner Join 或 Left Join 一样,直接多次 Full Join
SELECT
COALESCE(a.ID, b.ID, c.ID) AS ID
,COALESCE(a.Attr, b.Attr, c.Attr) AS Attr
FROM TA a
FULL OUTER JOIN TB b ON a.ID = b.ID
FULL OUTER JOIN TC c ON a.ID = c.ID
查询结果如下:
可以发现ID会有重复,所以这样写是不稳妥的,但为什么会错误呢,请看下图:
由于两个Full Join的条件都是与TA进行ID相等性连接,那么对TA中不存在且TB和TC都存在的ID,就会出现两条不同的记录。
正确写法
写法一:改进Join条件
SELECT
COALESCE(a.ID, b.ID, c.ID) AS ID
,COALESCE(a.Attr, b.Attr, c.Attr) AS Attr
FROM TA a
FULL OUTER JOIN TB b ON (a.ID = b.ID)
FULL OUTER JOIN TC c ON (a.ID = c.ID OR b.ID = c.ID)
此方法在表非常多的时候条件会比较复杂,易出错
写法二:每次Join两个表,结果再与后面的表Join
WITH cte_a AS
(
SELECT
COALESCE(a.ID, b.ID) AS ID
,COALESCE(a.Attr, b.Attr) AS Attr
FROM TA a
FULL OUTER JOIN TB b ON a.ID = b.ID
)
SELECT
COALESCE(a.ID, b.ID) AS ID
,COALESCE(a.Attr,b.Attr) AS Attr
FROM cte_a AS a
FULL OUTER JOIN TC b ON a.ID = b.ID
此方法在表非常多时,代码会比较冗长
写法三:构造全量ID,转换为Left Join
WITH cte_a AS
(
SELECT ID FROM TA UNION
SELECT ID FROM TB UNION
SELECT ID FROM TC
)
SELECT
COALESCE(b.ID, c.ID, d.ID) AS ID
,COALESCE(b.Attr, c.Attr, d.Attr) AS Attr
FROM cte_a AS a
LEFT JOIN TA b ON a.ID = b.ID
LEFT JOIN TB c ON a.ID = c.ID
LEFT JOIN TC d ON a.ID = d.ID
此方法使用Union去重,会存在Sort操作,必要时可以加上索引
————————————————
版权声明:本文为CSDN博主「詩和遠方」的原创文章,遵循CC 4.0 BY-SA版权协议,转载请附上原文出处链接及本声明。
原文链接:https://blog.csdn.net/helloword4217/article/details/103895643链接