手写HQL 第1题
表结构:uid,subject_id,score
求:找出所有科目成绩都大于某一学科平均成绩的学生
数据集如下
1001 01 901001 02 901001 03 901002 01 851002 02 851002 03 701003 01 701003 02 701003 03 85
1)建表语句
CREATE TABLE score(uid string,subject_id string,score int) ROW format delimited fields terminated BY '\t';
2)求出每个学科平均成绩
SELECT uid,score,AVG(score) over(partition BY subject_id) avg_scoreFROM score;t1
3)根据是否大于平均成绩记录flag,大于则记为0否则记为1
SELECT uid,if(score > avg_score, 0, 1) flagFROM t1;t2
4)根据学生id进行分组统计flag的和,和为0则是所有学科都大于平均成绩
SELECT uidFROM t2GROUP BY uidHAVING SUM(flag) = 0;
5)最终SQL
SELECT uidFROM (SELECT uid,if(score > avg_score, 0, 1) flagFROM (SELECT uid,score,AVG(score) over(partition BY subject_id) avg_scoreFROM score) t1) t2GROUP BY uidHAVING SUM(flag) = 0;
手写HQL 第2题
我们有如下的用户访问数据
| userId | visitDate | visitCount |
|---|---|---|
| u01 | 2017/1/21 | 5 |
| u02 | 2017/1/23 | 6 |
| u03 | 2017/1/22 | 8 |
| u04 | 2017/1/20 | 3 |
| u01 | 2017/1/23 | 6 |
| u01 | 2017/2/21 | 8 |
| U02 | 2017/1/23 | 6 |
| U01 | 2017/2/22 | 4 |
要求使用SQL统计出每个用户的累积访问次数,如下表所示:
| 用户id | 月份 | 小计 | 累积 |
|---|---|---|---|
| u01 | 2017-01 | 11 | 11 |
| u01 | 2017-02 | 12 | 23 |
| u02 | 2017-01 | 12 | 12 |
| u03 | 2017-01 | 8 | 8 |
| u04 | 2017-01 | 3 | 3 |
数据集
u01 2017/1/21 5u02 2017/1/23 6u03 2017/1/22 8u04 2017/1/20 3u01 2017/1/23 6u01 2017/2/21 8u02 2017/1/23 6u01 2017/2/22 4
1)创建表
CREATE TABLE action(userId string,visitDate string,visitCount int) ROW format delimited fields terminated BY "\t";
2)修改数据格式
SELECT userId,date_format(regexp_replace(visitDate, '/', '-'), 'yyyy-MM') mn,visitCountFROM action;t1
3)计算每人单月访问量
SELECT userId,mn,SUM(visitCount) mn_countFROM t1GROUP BY userId, mn;t2
4)按月累计访问量
SELECT userId,mn,mn_count,SUM(mn_count) over(partition BY userId ORDER BY mn)FROM t2;
5)最终SQL
SELECT userId,mn,mn_count,SUM(mn_count) over(partition BY userId ORDER BY mn)FROM (SELECT userId,mn,SUM(visitCount) mn_countFROM (SELECT userId,date_format(regexp_replace(visitDate, '/', '-'), 'yyyy-MM') mn,visitCountFROM action) t1GROUP BY userId, mn) t2;
手写HQL 第3题
有50W个京东店铺,每个顾客访客访问任何一个店铺的任何一个商品时都会产生一条访问日志,访问日志存储的表名为Visit,访客的用户id为user_id,被访问的店铺名称为shop,请统计:
1)每个店铺的UV(访客数)
2)每个店铺访问次数top3的访客信息。输出店铺名称、访客id、访问次数
数据集
u1 au2 bu1 bu1 au3 cu4 bu1 au2 cu5 bu4 bu6 cu2 cu1 bu2 au2 au3 au5 au5 au5 a
1)建表
CREATE TABLE visit(user_id string,shop string) ROW format delimited fields terminated BY '\t';
2)每个店铺的UV(访客数)
SELECT shop, COUNT(DISTINCT user_id)FROM visitGROUP BY shop;
3)每个店铺访问次数top3的访客信息。输出店铺名称、访客id、访问次数
(1)查询每个店铺被每个用户访问次数
SELECT shop, COUNT(DISTINCT user_id)FROM visitGROUP BY shop;
(2)计算每个店铺被用户访问次数排名
SELECT shop, COUNT(DISTINCT user_id)FROM visitGROUP BY shop;
(3)取每个店铺排名前3的
select shop,user_id,ctfrom t2where rk<=3;
(4)最终SQL
SELECT shop,user_id,ctFROM (SELECT shop,user_id,ct,rank() over(partition BY shop ORDER BY ct) rkFROM (SELECT shop,user_id,COUNT(*) ctFROM visitGROUP BY shop,user_id) t1) t2WHERE rk <= 3;
