统计今日总消息量
统计今日每小时消息量、发送和接收用户数
统计今日各地区发送消息数据量
统计今日发送消息和接收消息的用户数
统计今日发送消息最多的Top10用户
统计今日接收消息最多的Top10用户
统计发送人的手机型号分布情况
统计发送人的设备操作系统分布情况
问题:
1. 实际开发中,拿到业务需求指标,如何下手?
2. SQL层面如何编写查询语句?
解答:
1. 正确解读业务需求,避免歧义
2. 确定待查询的数据表—>from 表
3. 找出分析的维度—>group by 分组的字段
4. 找出计算的指标—>聚合的字段
5. 其他细节点(过滤、排序等)
查询类SQL编写思路举例 :
表: t_user(id, name, age, sex, city)
需求:统计每个城市男女人数与男女平均年龄
分组字段: 每个城市、 男女, 也就意味着同一个城市,性别相同的人应该分到同一组,因此这里需要根据两个字段进行分组
聚合字段: 人数、 平均年龄
count(id)就是统计每个分组中的条数—>人数
avg(age)就是统计每个分组中年龄的平均值—>平均年龄
需求指标统计 :
指标1:统计今日消息总量
create table if not exists tb_rs_total_msg_cnt
comment "今日消息总量" as
select dayinfo,
count(*) as total_msg_cnt
from db_msg.tb_msg_etl group by dayinfo;
指标2: 统计每小时消息量、 发送和接收用户数
create table if not exists tb_rs_hour_msg_cnt
comment "每小时消息量趋势"
as
select
dayinfo,
hourinfo,
count(*) as total_msg_cnt,
count(distinct sender_account) as sender_usr_cnt,
count(distinct receiver_account) as receiver_usr_cnt
from db_msg.tb_msg_etl
group by dayinfo,hourinfo;
指标3: 统计今日各地区发送消息总量 ** **Cast(字段名 as 转换的类型 ),类型转换
create table if not exists tb_rs_loc_cnt
comment "今日各地区发送消息总量"
as
select
dayinfo,
sender_gps,
cast(sender_lng as double) as longitude,
cast(sender_lat as double) as latitude,
count(*) as total_msg_cnt
from db_msg.tb_msg_etl
group by dayinfo,sender_gps,sender_lng,sender_lat;
指标4: 统计今日发送和接收用户人数
create table if not exists tb_rs_usr_cnt
comment "今日发送消息人数、接受消息人数"
as
select
dayinfo,
count(distinct sender_account) as sender_usr_cnt,
count(distinct receiver_account) as receiver_usr_cnt
from db_msg.tb_msg_etl
group by dayinfo;
指标5: 统计发送消息条数最多的Top10用户
create table if not exists tb_rs_susr_top10
comment "发送消息条数最多的Top10用户"
as
select
dayinfo,
sender_name as username,
count(*) as sender_msg_cnt
from db_msg.tb_msg_etl
group by dayinfo,sender_name
order by sender_msg_cnt desc
limit 10;
指标6: 统计接收消息条数最多的Top10用户
create table if not exists tb_rs_rusr_top10
comment "接受消息条数最多的Top10用户"
as
select
dayinfo,
receiver_name as username,
count(*) as receiver_msg_cnt
from db_msg.tb_msg_etl
group by dayinfo,receiver_name
order by receiver_msg_cnt desc
limit 10;
指标7: 统计发送人的手机型号分布情况
create table if not exists tb_rs_sender_phone
comment "发送人的手机型号分布"
as
select
dayinfo,
sender_phonetype,
count(distinct sender_account) as cnt
from tb_msg_etl
group by dayinfo,sender_phonetype;
指标8: 统计发送人的操作系统分布
create table if not exists tb_rs_sender_os
comment "发送人的OS分布"
as
select
dayinfo,
sender_os,
count(distinct sender_account) as cnt
from tb_msg_etl
group by dayinfo,sender_os;