方法介绍:distinct 关键字、group by 、row_number ()over(partition by 列 order by 列 desc)
需求:求出所有购入商品为两种或两种以上的购物人记录
思路:
1、userid A 和 product 甲 ,这组数据明显有重复,需要去重统计
方式一、distinct 关键字
SELECT
userid, product, amount
FROM
hql_10_shop
WHERE
userid IN (
SELECT
userid
FROM
hql_10_shop
GROUP BY userid
HAVING
count(DISTINCT product) >= 2
distinct 单列
select distinct(a) from tableA;
distinct 多列
--会报错,distinct 必须放在 select 语句的最前方
select a, distinct b,c from tableA;
--正确
select distinct b,a,c from tableA;
多列的时候,上述语句是针对abc的组合来distinct的,列出的是abc的所有不同组合,相当于下列语句
select a,b,c from tableA group by a,b,c;
那么,如果想要分别查询a,b,c三个字段的distinct值怎么办?一开始可能只想到下列三个语句
select distinct a from tableA;
select distinct b from tableA;
select distinct c from tableA;
麻烦,如何合并成一个语句呢,使用union
解决:
select distinct(a) from tableA
union all
select distinct(b) from tableA
union all
select distinct(c) from tableA
方式二、group by
属于分组去重,需要套用子查询
1、先对userid和product进行分组,那么userid和product都相同的记录只有一条,更细粒度
2、再对上面的结果表,根据userid分组
SELECT
userid, product, amount
FROM
hql_10_shop
WHERE
userid IN (
SELECT
userid
FROM
(
SELECT
userid
FROM
hql_10_shop
GROUP BY userid, product
) t1
GROUP BY
userid
HAVING
count(*) >= 2
)
方式三、row_number() over(partition by 列 order by 列 desc)
使用窗口函数,对相同的数据进行编号,只保留编号为1的,即可实现去重
SELECT
userid, product, amount
FROM
hql_10_shop
WHERE
userid IN
(
SELECT
userid
FROM
(
SELECT
userid, product, amount,
row_number() OVER (PARTITION BY userid,product ORDER BY amount DESC) cn
FROM
hql_10_shop
) t1
WHERE
cn = 1
GROUP BY userid
HAVING
count(*) >= 2
);
对userid和product分区 —t1表
根据t1表,保留cn=1的即可
根据userid分组,再having过滤出count(*)>=2的组
最终结果