题目1:每日留存率(一个user_register表,另一个是user_active表)统计指定日期新客数及这些新客的次日留存、七日留存

步骤:1.算每日新增客数2. 每日留存数量3.留存率

  1. SELECT register_date, interval, active_num/new_register_num as retention rate
  2. FROM (SELECT register_date, count(*) as new_register_num
  3. FROM user_register
  4. WHERE register_date ='2020-09-01'
  5. ) as t1
  6. JOIN (SELECT register_date,datediff(active_date,register_date) as interval ,count(*) as active_num
  7. FROM user_register
  8. JOIN user_active
  9. ON user_register.user_id=user_active.user_id
  10. WHERE register_date = '2020-09-01' and datediff(active_date,register_date)<=7
  11. GROUP BY active_date
  12. ORDER BY active_date
  13. )as t2
  14. ON t1.register_date=t2.register_date
  15. ORDER BY interval

image.png

题目2: 计算用户留存率,留存率=(昨天活跃用户∩今天活跃用户)/昨天活跃用户

用户活跃表:dws_user_active_t
ds bigint comment ‘日期,分区,yyyymmdd’,
device int comment ‘设备类型’,
user_id bigint comment ‘用户id’,
active_score double comment ‘活跃度’
计算留存率=(昨天活跃用户∩今天活跃用户)/昨天活跃用户

步骤:
1.昨天活跃用户2.昨天活跃用户∩今天活跃用户3./

  1. select t1.ds,'第二天仍活跃用户'/'昨天活跃用户' as '留存率'
  2. from (
  3. select ds,count(distinct user_id) as '昨天活跃用户'
  4. from dws_user_active_t
  5. where ds = '2020-12-11' and active_score > 0
  6. ) t1
  7. join (
  8. select d1.ds, count(distinct d2.user_id) as '第二天仍活跃用户'
  9. from dws_user_active_t d1
  10. join dws_user_active_t d2
  11. on datediff(d2.ds,d1.ds)=1 and d1.user_id = d2.user_id
  12. where d1.ds = '2020-12-11' and d1.active_score > 0 and d2.active_score > 0
  13. ) t2
  14. on t1.ds = t2.ds

优化-相关子查询

  1. select ds,
  2. count(distinct user_id) as '活跃用户',
  3. (
  4. select count(distinct t2.user_id) as '第二天仍活跃用户'
  5. from dws_user_active_t t2
  6. where datediff(ds,t.ds)=1 and t1.active_score > 0 and t2.active_score > 0
  7. ),
  8. '第二天仍活跃用户'/'昨天活跃用户' as '留存率'
  9. from dws_user_active_t t1
  10. where ds = '2020-12-11' and active_score > 0