实际工作中遇到的~


现在有一份用户发布内容的记录表。

运营小姐姐想知道,在凌晨00:10:00 内发过内容且前一天没有发过内容的用户有哪些。
这里前一天指的是,相对于凌晨10分之内发布内容的这条数据的前一天。

publish_table

user_id publish_time
10001 2020-5-26 13:13:02
10001 2020-5-28 00:02:02
10001 2020-5-29 00:02:02
10002 2020-5-29 00:12:02


SQL-试题2 两次发文的时间差 - 图1这道题的关键是在于确定在有凌晨发文记录的行为,并取得该行为的上一次行为时间。

答案如下:

  1. SELECT DISTINCT user_id from (
  2. SELECT t1.user_id
  3. ,t1.publish_time as x
  4. ,max(t2.publish_time) as y
  5. from publish_table t1 INNER JOIN publish_table t2
  6. on t1.user_id = t2.user_id
  7. where t2.publish_time < t1.publish_time
  8. GROUP BY t1.user_id,t1.publish_time
  9. ) t3
  10. where DATEDIFF(x,y)>1 and DATE_FORMAT(x,'%T') < '00:10:00';

下面是详解:

为了测试首先在数据库内创建测试数据集:将上方思维导图提到的可能都建立了一遍:

image.png

1、先看看有凌晨发文的都是谁

SELECT user_id,publish_time,DATE_FORMAT(publish_time,'%T') from publish_table 
where DATE_FORMAT(publish_time,'%T') < '00:10:00';

image.png
其实这里只有三条记录,其实用肉眼瞅一下都知道是10001是最后的那个人。但是你要想象这里有无数的数据。

2、找到每条记录的比自己小的发文时间

SELECT t1.user_id,t1.publish_time,t2.publish_time as x from publish_table t1 
INNER JOIN publish_table t2 
on t1.user_id = t2.user_id where t2.publish_time < t1.publish_time;

image.png

3、找到每条记录上一次的发文时间

上面第二步是为了这一步的铺垫。

    SELECT t1.user_id
                ,t1.publish_time as x
                ,max(t2.publish_time) as y 

            from publish_table t1 INNER JOIN publish_table t2 
            on t1.user_id = t2.user_id 
            where t2.publish_time < t1.publish_time
            GROUP BY t1.user_id,t1.publish_time ;

image.png

4、找出其中,相差超过1天的记录,且x的发文时间是在凌晨10分内

SELECT DISTINCT user_id from (
    SELECT t1.user_id
                ,t1.publish_time as x
                ,max(t2.publish_time) as y 

            from publish_table t1 INNER JOIN publish_table t2 
            on t1.user_id = t2.user_id 
            where t2.publish_time < t1.publish_time
            GROUP BY t1.user_id,t1.publish_time 

) t3
where DATEDIFF(x,y)>1 and DATE_FORMAT(x,'%T') < '00:10:00';

image.png

5、考虑性能的版本

用生成环境的数据测试了下,不优化的版本反而快了30%多。

-- 创建视图 table_A 为了将表进行缩小
CREATE VIEW table_A AS 
SELECT t1.user_id,t1.publish_time from publish_table t1 INNER JOIN (
SELECT user_id,publish_time,DATE_FORMAT(publish_time,'%T') from publish_table where DATE_FORMAT(publish_time,'%T') < '00:10:00')t2
on t1.user_id = t2.user_id;

-- 将原答案中的 publish_table 都换成 table_A;

SELECT DISTINCT user_id from (
    SELECT t1.user_id
                ,t1.publish_time as x
                ,max(t2.publish_time) as y 

            from table_A t1 INNER JOIN table_A t2 
            on t1.user_id = t2.user_id 
            where t2.publish_time < t1.publish_time
            GROUP BY t1.user_id,t1.publish_time 

) t3
where DATEDIFF(x,y)>1 and DATE_FORMAT(x,'%T') < '00:10:00';

具体能提升多少性能,咱也不懂,反正在本地测试从0.003s提升到了0.002s;
**