实际工作中遇到的~
现在有一份用户发布内容的记录表。
运营小姐姐想知道,在凌晨00:10:00 内发过内容,且前一天没有发过内容的用户有哪些。
这里前一天指的是,相对于凌晨10分之内发布内容的这条数据的前一天。
publish_table
| user_id | publish_time |
|---|---|
| 10001 | 2020-5-26 13:13:02 |
| 10001 | 2020-5-28 00:02:02 |
| 10001 | 2020-5-29 00:02:02 |
| 10002 | 2020-5-29 00:12:02 |
| … … |
… … |
这道题的关键是在于确定在有凌晨发文记录的行为,并取得该行为的上一次行为时间。
答案如下:
SELECT DISTINCT user_id from (SELECT t1.user_id,t1.publish_time as x,max(t2.publish_time) as yfrom publish_table t1 INNER JOIN publish_table t2on t1.user_id = t2.user_idwhere t2.publish_time < t1.publish_timeGROUP BY t1.user_id,t1.publish_time) t3where DATEDIFF(x,y)>1 and DATE_FORMAT(x,'%T') < '00:10:00';
下面是详解:
为了测试首先在数据库内创建测试数据集:将上方思维导图提到的可能都建立了一遍:

1、先看看有凌晨发文的都是谁
SELECT user_id,publish_time,DATE_FORMAT(publish_time,'%T') from publish_table
where DATE_FORMAT(publish_time,'%T') < '00:10:00';

其实这里只有三条记录,其实用肉眼瞅一下都知道是10001是最后的那个人。但是你要想象这里有无数的数据。
2、找到每条记录的比自己小的发文时间
SELECT t1.user_id,t1.publish_time,t2.publish_time as x from publish_table t1
INNER JOIN publish_table t2
on t1.user_id = t2.user_id where t2.publish_time < t1.publish_time;
3、找到每条记录上一次的发文时间
上面第二步是为了这一步的铺垫。
SELECT t1.user_id
,t1.publish_time as x
,max(t2.publish_time) as y
from publish_table t1 INNER JOIN publish_table t2
on t1.user_id = t2.user_id
where t2.publish_time < t1.publish_time
GROUP BY t1.user_id,t1.publish_time ;
4、找出其中,相差超过1天的记录,且x的发文时间是在凌晨10分内
SELECT DISTINCT user_id from (
SELECT t1.user_id
,t1.publish_time as x
,max(t2.publish_time) as y
from publish_table t1 INNER JOIN publish_table t2
on t1.user_id = t2.user_id
where t2.publish_time < t1.publish_time
GROUP BY t1.user_id,t1.publish_time
) t3
where DATEDIFF(x,y)>1 and DATE_FORMAT(x,'%T') < '00:10:00';

5、考虑性能的版本
用生成环境的数据测试了下,不优化的版本反而快了30%多。
-- 创建视图 table_A 为了将表进行缩小
CREATE VIEW table_A AS
SELECT t1.user_id,t1.publish_time from publish_table t1 INNER JOIN (
SELECT user_id,publish_time,DATE_FORMAT(publish_time,'%T') from publish_table where DATE_FORMAT(publish_time,'%T') < '00:10:00')t2
on t1.user_id = t2.user_id;
-- 将原答案中的 publish_table 都换成 table_A;
SELECT DISTINCT user_id from (
SELECT t1.user_id
,t1.publish_time as x
,max(t2.publish_time) as y
from table_A t1 INNER JOIN table_A t2
on t1.user_id = t2.user_id
where t2.publish_time < t1.publish_time
GROUP BY t1.user_id,t1.publish_time
) t3
where DATEDIFF(x,y)>1 and DATE_FORMAT(x,'%T') < '00:10:00';
具体能提升多少性能,咱也不懂,反正在本地测试从0.003s提升到了0.002s;
**
