今日内容:
- 1- presto的架构 (理解)
- 2- presto的相关时间函数 (知道)
- 3- presto的内存调整(理解)
- 4- Presto高级语法: grouping sets cube rollup 和 grouping() (主要掌握: grouping sets 和 grouping())
- 5- DWS层实现操作
- 销售主题日统计宽表 (操作)
1- Presto相关内容
1.1 Presto 基本介绍
Presto是一个大数据旗下的分布式SQL查询引擎, Presto可以独立提供计算分析操作, 不需要依赖于其他的计算引擎, 而HIVE仅仅是一个工具, 最终计算是依赖于MR或者其他的执行引擎
Presto可以对接多种数据源, 可以从不同的数据源中读取数据进行分析处理, 一条presto查询可以将多个数据源进行合并, 可以跨越多个不同的组织进行分析
Presto是完全基于内存的计算引擎, 这也导致Presto不能对海量大量的数据进行统计分析操作, 数据集一般 在 GB ~ PB左右(集群数量越多, 资源越多, 可以计算的数据量越高)
性能对比图表:
应用场景: ```properties 1) 适用于对数据源统一查询 (多数据源场景) 2) 适用于 GB TB 快速的数据查询操作, 数据量越大, 资源占用越高
- 销售主题日统计宽表 (操作)
不适用场景: 1) 多张数据量比较大的表Join操作 2) 不适合进行清洗 转换 维度退化的相关操作, 主要应用在数据分析上
<a name="PKJCD"></a>
### 1.2 Presto的集群安装操作
参考 <<presto安装手册(参考此手册安装presto)>> 以及视频, 完成安装操作 (本地环境)
<a name="D5NGS"></a>
### 1.3 使用DBeaver连接Presto
演示连接云平台的presto:<br />![image.png](https://cdn.nlark.com/yuque/0/2022/png/22609247/1651743440067-b5457719-c7d9-46c5-a9c5-cd597788914b.png#clientId=u919621ec-ffc4-4&crop=0&crop=0&crop=1&crop=1&from=paste&height=193&id=u53f8df0f&margin=%5Bobject%20Object%5D&name=image.png&originHeight=241&originWidth=564&originalType=binary&ratio=1&rotation=0&showTitle=false&size=19105&status=done&style=none&taskId=u19db7a66-5f8c-4074-b98b-3604da9af83&title=&width=451.2)<br />![image.png](https://cdn.nlark.com/yuque/0/2022/png/22609247/1651743472626-ab4107e4-42a4-48fa-964c-acec091eeb0e.png#clientId=u919621ec-ffc4-4&crop=0&crop=0&crop=1&crop=1&from=paste&height=502&id=ua24f3142&margin=%5Bobject%20Object%5D&name=image.png&originHeight=628&originWidth=609&originalType=binary&ratio=1&rotation=0&showTitle=false&size=39664&status=done&style=none&taskId=u4cc83126-660b-4efd-aa40-cbd8277405a&title=&width=487.2)<br />此处连接云端环境, 如果连接本地, 将ip地址更改为 192.168.88.80 即可<br />![image.png](https://cdn.nlark.com/yuque/0/2022/png/22609247/1651743600763-205d14f8-75d1-4ce7-b86f-29a9fa3e5d93.png#clientId=u919621ec-ffc4-4&crop=0&crop=0&crop=1&crop=1&from=paste&height=583&id=ub0872591&margin=%5Bobject%20Object%5D&name=image.png&originHeight=729&originWidth=1118&originalType=binary&ratio=1&rotation=0&showTitle=false&size=85950&status=done&style=none&taskId=u005f1bfe-5810-4203-9f72-47cdc23bdf4&title=&width=894.4)<br />![image.png](https://cdn.nlark.com/yuque/0/2022/png/22609247/1651743644100-b13773ee-6cc2-4456-9904-a33dba82f79f.png#clientId=u919621ec-ffc4-4&crop=0&crop=0&crop=1&crop=1&from=paste&height=499&id=u189e769e&margin=%5Bobject%20Object%5D&name=image.png&originHeight=624&originWidth=612&originalType=binary&ratio=1&rotation=0&showTitle=false&size=50150&status=done&style=none&taskId=u181bc20c-1d98-49b8-b5b6-918ab2e5638&title=&width=489.6)<br />![image.png](https://cdn.nlark.com/yuque/0/2022/png/22609247/1651743761775-13063293-01b8-40e3-a6d3-69f87debfc51.png#clientId=u919621ec-ffc4-4&crop=0&crop=0&crop=1&crop=1&from=paste&height=178&id=u5056cf61&margin=%5Bobject%20Object%5D&name=image.png&originHeight=222&originWidth=492&originalType=binary&ratio=1&rotation=0&showTitle=false&size=22800&status=done&style=none&taskId=u939dcc29-c337-4cf7-912b-e7c9ec78694&title=&width=393.6)
<a name="OWTNG"></a>
### 1.4 Presto的架构
![image.png](https://cdn.nlark.com/yuque/0/2022/png/22609247/1651762077050-87e03b09-f90e-44ea-90fd-8064c05accfa.png#clientId=ucd899c45-64d9-4&crop=0&crop=0&crop=1&crop=1&from=paste&height=451&id=ue0417916&margin=%5Bobject%20Object%5D&name=image.png&originHeight=564&originWidth=1146&originalType=binary&ratio=1&rotation=0&showTitle=false&size=144838&status=done&style=none&taskId=ua1aed146-5c51-4d1a-a685-0157778551e&title=&width=916.8)
```properties
整个presto是一个 M-S架构 (主从架构):
coordinator: 主节点
作用: 负责接收客户端发送的SQL, 对SQL进行编译, 形成执行计划, 根据执行计划, 分发给各个从节点进行执行操作
discovery service: 附属节点
作用: 一般内嵌在主节点中, 主要负责维护从节点列表, 当从节点启动后, 都需要到 discovery 节点进行注册操作
worker节点: 从节点
作用: 负责接收coordinator传递过来任务, 对任务进行具体处理工作(读取数据, 处理数据, 将处理后结果数据返回给coordinator)
1.5 Presto的相关时间函数
在presto中, 对于数据类型要求比较严格, 比如 数据字段类型为date类型,那么在基于这个字段进行过滤的时候, 编写的过滤条件上值必须也是date类型
日期转换操作:
date_format(timestamp,format): 将一个带有年月日 时分秒的日期对象 转换为字符串
date_parse(string,format) ---> timestamp: 将带有年月日 时分秒的日期字符串 转换为 日期对象
date(日期对象)--->date : 将带有年月日 时分秒的日期对象, 转换为仅包含年月日的日期对象
说明:
date类型: 表示只有年 月 日
timestamp类型: 表示 年 月 日 时 分 秒
format格式:
年:%Y
月:%m
日:%d
时:%H
分:%i
秒:%s
周几:%w(0..6)
-- timestamp '日期字符串数据' 直接转换为 日期对象 , 要求 后面的日期字符串的数据必须是标准格式的日期
-- 即使 日期字符串数据, 只有年月日 依然是可以转换的, 只不过 时分秒都是 0
select date_format( timestamp '2020-10-10 12:50:50' , '%Y/%m/%d %H:%i:%s' );
-- date_parse: 可以将指定日期格式数据转换为日期对象
select date_format( date_parse('2020-10-10 12:50:50','%Y-%m-%d %H:%i:%s') , '%Y/%m/%d %H:%i:%s' );
-- date '字符串' 直接转换为日期对象, 仅能处理标准的年月日的数据, 无法处理带有时分秒的操作
select date_format( date '2020-10-10' , '%Y/%m/%d %H:%i:%s' );
日期计算的操作:
date_add(unit,value,timestamp) → [same as input] : 根据 unit设置时间单位, 对timestamp的时间数据 进行 value计算 , 计算为 + 号 , 如果想计算减号, 可以让value为负数即可
date_diff(unit, timestamp1, timestamp2) → bigint : 根据 unit单位,时间timestamp2 - 时间timestamp1 , 得出差值
select date_add('month',-3, timestamp '2021-05-05 12:20:21'); -- 2021-02-05 12:20:21
select date_diff('month', timestamp '2020-07-30 12:20:21', timestamp '2021-05-01 12:20:21') -- 9
1.6 Presto的内存调整
内存参数信息描述图:
说明: node1和 node2和内部结构 和 node3是一致的, 这里只是简写
1) 各个节点 JVM内存推荐大小: 当前节点剩余内存 80%
2) 对于 memory.heap-headroom-per-node 第三方库的内存配置 : 建议 jvm内存的 15%左右, 默认为30%
3) 在配置的时候, 不要正正好后, 建议预留一点点, 以免出现问题,建议预留 5%~10%
4) 用户内存和系统内存之间比较, 一般是以 8/2原则或者 7/3原则
经验说明:
数据量在35TB, presto节点数量大约在30台左右(每台节点: 128GB内存 + 8核CPU)
在presto中特殊的SQL优化点:
1) 在进行group by 操作的时候,如果分组字段比较多, 将分组字段中, 通过distinct去重后, 值比较多的字段放置在前面, 比较少往后放置
好处: 可以已开启就被分为更多的组, 让更多的worker参与计算操作
比如说:
select from 表 group by uid,sex;
说明: 一个表中 uid的值有很多的不同, 但是 sex一般只有二个值, 此时将UID放置在最前面, 这样好处是presto在进行任务分配的时候, 可以让更多的worker参与进行计算操作
2) 在进行JOIN操作的时候, 要将大表放置join前面. 小表放置在Join后面
3) 尽量使用正则替换掉SQL中like查询操作
2- Presto高级语法
2.1 Grouping Sets、CUBE和ROLLUP
- Grouping sets:
- 当需要对表中各个字段进行分组操作的时候, 并且最终需要将各个分组的结果汇总在一个表的时候, 此时可以通过grouping sets 来简写, 将各个分组操作, 统一的放置在grouping sets(…) 中就OK了
- 使用grouping sets 可以一次性实现多个字段进行分别分组操作, 简化代码, 减少数据扫描. 提升效率 ```sql select store_id, null as group_id, sum(order_amount) as total_price from hive.bj59_yp_dwb_jiale.dwb_order_detail group by store_id
union all
select null as store_id, group_id, sum(order_amount) as total_price from hive.bj59_yp_dwb_jiale.dwb_order_detail group by group_id
union all select store_id, group_id, sum(order_amount) as total_price from hive.bj59_yp_dwb_jiale.dwb_order_detail group by store_id,group_id ;
— 以上代码, 同一个表扫描三次, 执行了三次分组 对效率不太好
— 思考, 是否可以只需要读取一次, 就完成三次分组操作呢? select store_id, group_id, sum(order_amount) as total_price from hive.bj59_yp_dwb_jiale.dwb_order_detail group by grouping sets(store_id , group_id , (store_id,group_id) );
- cube:
- CUBE操作会生成提供column所有可能的grouping sets结果
- 例如:
- 书写 store_id,group_id , 组合分组结果: 空 , store_id,group_id, ( store_id,group_id)
- 书写 store_id,group_id,goods_id. 组合分组结果: 空,store_id,group_id,goods_id, (store_id,group_id),(group_id,goods_id),(store_id,group_id,goods_id)
```sql
select
store_id,
group_id,
sum(order_amount) as total_price
from hive.bj59_yp_dwb_jiale.dwb_order_detail
group by cube(store_id,group_id)
相当于:
select
store_id,
null as group_id,
sum(order_amount) as total_price
from hive.bj59_yp_dwb_jiale.dwb_order_detail
group by store_id
union all
select
null as store_id,
group_id,
sum(order_amount) as total_price
from hive.bj59_yp_dwb_jiale.dwb_order_detail
group by group_id
union all
select
store_id,
group_id,
sum(order_amount) as total_price
from hive.bj59_yp_dwb_jiale.dwb_order_detail
group by group_id,store_id
union all
select
null as store_id,
null as group_id,
sum(order_amount) as total_price
from hive.bj59_yp_dwb_jiale.dwb_order_detail
group by ()
相当于:
select
store_id,
group_id,
sum(order_amount) as total_price
from hive.bj59_yp_dwb_jiale.dwb_order_detail
group by grouping sets( (), store_id, group_id, (store_id,group_id) )
- rollup:
- 先将rollup中所有的字段全部组合在一起, 进行分组, 然后从后往前逐一递减进行分组,直到为空
- 如果设置: rollup(store_id,group_id) , 分组情况: (store_id,group_id), store_id, ()
- 如果设置: rollup(store_id,group_id,goods_id), 分组情况: (store_id,group_id,goods_id), (store_id,group_id),store_id,() ```sql select store_id, group_id, sum(order_amount) as total_price from hive.bj59_yp_dwb_jiale.dwb_order_detail group by rollup(store_id,group_id)
- 先将rollup中所有的字段全部组合在一起, 进行分组, 然后从后往前逐一递减进行分组,直到为空
相当于: select store_id, group_id, sum(order_amount) as total_price from hive.bj59_yp_dwb_jiale.dwb_order_detail group by grouping sets((store_id,group_id),store_id,())
说明: 对于 cube 以及 rollup 虽然可以提供更多的分组的情况, 但是实际使用中, 一般不用, 建议使用 grouping sets 来自定义自己分组情况即可, 这样更加灵活, 更加好理解
<a name="uzToD"></a>
### 2.2 Grouping
grouping(判断分组的字段 ....) : 主要是用于判断结果是否按照某个或者某几个字段的分组操作<br />![image.png](https://cdn.nlark.com/yuque/0/2022/png/22609247/1651809103360-fb3c4acc-cff1-4a32-8480-f41cd083ff37.png#clientId=u6dc2da0b-3bb2-4&crop=0&crop=0&crop=1&crop=1&from=paste&height=385&id=ua821fa8c&margin=%5Bobject%20Object%5D&name=image.png&originHeight=481&originWidth=1202&originalType=binary&ratio=1&rotation=0&showTitle=false&size=31620&status=done&style=none&taskId=ud12bed75-999c-4144-a456-cabecdb50c5&title=&width=961.6)
```sql
select
store_id,
group_id,
sum(order_amount) as total_price,
grouping(store_id) as rn1, -- 如果只写一个, 返回结果只能 0 和 1
grouping(group_id) as rn2,
grouping(store_id,group_id) as rn3 -- 如果写了两个, 返回结果, 0 1 2 3 例如: 返回十进制 2, 二进制为 10 , 表示 按照group_id分组结果
from hive.bj59_yp_dwb_jiale.dwb_order_detail
group by grouping sets(store_id,group_id,(store_id,group_id));
3- DWS层实现操作
3.1 销售主题的日统计宽表
可分析的主要指标有:销售收入、平台收入、配送成交额、小程序成交额、安卓APP成交额、苹果APP成交额、PC商城成交额、订单量、参评单量、差评单量、配送单量、退款单量、小程序订单量、安卓APP订单量、苹果APP订单量、PC商城订单量。
维度有:日期、城市、商圈、店铺、品牌、大类、中类、小类。
维度组合:
日期: 日
日期 + 城市
日期 + 城市 + 商圈
日期 + 城市 + 商圈 + 店铺
日期 + 品牌
日期 + 大类
日期 + 大类 + 中类
日期 + 大类 + 中类 + 小类
16 * 8 = 128 个需求指标结果
分析, 当前需求统计的这些维度 和 指标, 需要涉及到那些表, 以及涉及到那些字段呢?
维度字段:
日期: dwb_order_detail.dt
城市: dwb_shop_detail: city_id 和 city_name
商圈: dwb_shop_detail: trade_area_id 和 trade_area_name
店铺: dwb_shop_detail: id 和 store_name
品牌: dwb_goods_detail: brand_id 和 brand_name
大类: dwb_goods_detail: max_class_id 和 max_class_name
中类: dwb_goods_detail: mid_class_id 和 mid_class_name
小类: dwb_goods_detail: min_class_id 和 min_class_name
指标字段:
订单量相关指标: dwb_order_detail.order_id
订单销售收入(销售收入, 小程序, 安卓, 苹果, pc端):dwb_order_detail.order_amount
平台收入: dwb_order_detail.plat_fee
配送费: wb_order_detail.delivery_fee
涉及表:
订单明细宽表(当前主题的事实表): dwb_order_detail (事实表)
店铺明细宽表: dwb_shop_detail (维度表)
商品明细宽表: dwb_goods_detail (维度表)
关联条件:
订单表 和 店铺表:
订单明细宽表.store_id = 店铺明细宽表.id
订单表 和 商品表:
订单明细宽表.goods_id = 商品明细宽表.id
思考: 当前这个是三种数仓模型那一种呢? 星型模型
是否需要过滤一些操作呢?
1- 保证必须是支付状态: is_pay = 1
2- 保证订单状态: order_state 不能是 1(已下单, 没有付款) 和 7 (已取消)
首先, 先创建DWS层库 和 表(销售主题日统计宽表) (在HIVE中重新建表)
-- 创建库:
create database if not exists bj59_yp_dws_jiale;
-- 创建表(指标字段 + 维度字段 + 经验字段 ):
DROP TABLE IF EXISTS bj59_yp_dws_jiale.dws_sale_daycount;
CREATE TABLE bj59_yp_dws_jiale.dws_sale_daycount(
--dt STRING,
province_id string COMMENT '省份id',
province_name string COMMENT '省份名称',
city_id string COMMENT '城市id',
city_name string COMMENT '城市name',
trade_area_id string COMMENT '商圈id',
trade_area_name string COMMENT '商圈名称',
store_id string COMMENT '店铺的id',
store_name string COMMENT '店铺名称',
brand_id string COMMENT '品牌id',
brand_name string COMMENT '品牌名称',
max_class_id string COMMENT '商品大类id',
max_class_name string COMMENT '大类名称',
mid_class_id string COMMENT '中类id',
mid_class_name string COMMENT '中类名称',
min_class_id string COMMENT '小类id',
min_class_name string COMMENT '小类名称',
-- 用于标记数据结果是按照那个维度来统计的一个经验字段
group_type string COMMENT '分组类型:store,trade_area,city,brand,min_class,mid_class,max_class, all',
-- =======日统计=======
-- 销售收入
sale_amt DECIMAL(38,2) COMMENT '销售收入',
-- 平台收入
plat_amt DECIMAL(38,2) COMMENT '平台收入',
-- 配送成交额
deliver_sale_amt DECIMAL(38,2) COMMENT '配送成交额',
-- 小程序成交额
mini_app_sale_amt DECIMAL(38,2) COMMENT '小程序成交额',
-- 安卓APP成交额
android_sale_amt DECIMAL(38,2) COMMENT '安卓APP成交额',
-- 苹果APP成交额
ios_sale_amt DECIMAL(38,2) COMMENT '苹果APP成交额',
-- PC商城成交额
pcweb_sale_amt DECIMAL(38,2) COMMENT 'PC商城成交额',
-- 成交单量
order_cnt BIGINT COMMENT '成交单量',
-- 参评单量
eva_order_cnt BIGINT COMMENT '参评单量comment=>cmt',
-- 差评单量
bad_eva_order_cnt BIGINT COMMENT '差评单量negtive-comment=>ncmt',
-- 配送成交单量
deliver_order_cnt BIGINT COMMENT '配送单量',
-- 退款单量
refund_order_cnt BIGINT COMMENT '退款单量',
-- 小程序成交单量
miniapp_order_cnt BIGINT COMMENT '小程序成交单量',
-- 安卓APP订单量
android_order_cnt BIGINT COMMENT '安卓APP订单量',
-- 苹果APP订单量
ios_order_cnt BIGINT COMMENT '苹果APP订单量',
-- PC商城成交单量
pcweb_order_cnt BIGINT COMMENT 'PC商城成交单量'
)
COMMENT '销售主题日统计宽表'
PARTITIONED BY(dt STRING)
ROW format delimited fields terminated BY '\t'
stored AS orc tblproperties ('orc.compress' = 'SNAPPY');
- SQL实现操作
-- 第一步: 对数据进行去重操作, 通过row_number实现
insert into hive.bj59_yp_dws_jiale.dws_sale_daycount
with t1 as (
select
-- 维度字段:
o.dt, -- 日期维度
s.province_id ,
s.province_name ,
s.city_id ,
s.city_name , -- 城市维度
s.trade_area_id ,
s.trade_area_name , -- 商圈维度
o.store_id ,
s.store_name , -- 店铺维度
g.brand_id ,
g.brand_name, -- 品牌维度
g.max_class_id ,
g.max_class_name , -- 大类
g.mid_class_id ,
g.mid_class_name , -- 中类
g.min_class_id ,
g.min_class_name , -- 小类
-- 指标字段
o.order_id , -- 订单id 计算订单量相关指标
o.order_amount, -- 订单销售收入
o.total_price , -- 商品销售收入
o.plat_fee , -- 平台分润
o.delivery_fee , -- 配送费用
-- 用于判断的字段
o.order_from , -- 渠道(小程序, 安卓, 苹果, pc)
o.evaluation_id , -- 评价表 id , 判断是否有参评
o.delievery_id , -- 配送表id, 判断是否有配送
o.geval_scores , -- 综合评分: 10分制
o.refund_id , -- 退款表id,判断是否有退款
-- 进行去重操作:
-- 计算 日期, 日期+城市, 日期+城市+商圈 , 日期+城市+商圈+店铺
row_number() over(partition by o.order_id) as order_rn,
-- 计算 日期 + 品牌 : 第一个去重计算 订单量 第二个去重 计算 销售额
row_number() over(partition by o.order_id,g.brand_id) as brand_rn,
row_number() over(partition by o.order_id,o.goods_id ,g.brand_id) as brand_goods_rn,
-- 计算 日期 + 大类
row_number() over(partition by o.order_id,g.max_class_id) as max_class_rn,
row_number() over(partition by o.order_id,o.goods_id ,g.max_class_id) as max_class_goods_rn,
-- 计算 日期 + 大类 + 中类
row_number() over(partition by o.order_id,g.max_class_id,g.mid_class_id) as mid_class_rn,
row_number() over(partition by o.order_id,o.goods_id ,g.max_class_id,g.mid_class_id) as mid_class_goods_rn,
-- 计算 日期 + 大类 + 中类 + 小类
row_number() over(partition by o.order_id,g.max_class_id,g.mid_class_id,g.min_class_id) as min_class_rn,
row_number() over(partition by o.order_id,o.goods_id ,g.max_class_id,g.mid_class_id,g.min_class_id) as min_class_goods_rn
from hive.bj59_yp_dwb_jiale.dwb_order_detail o
left join hive.bj59_yp_dwb_jiale.dwb_shop_detail s on o.store_id = s.id
left join hive.bj59_yp_dwb_jiale.dwb_goods_detail g on o.goods_id = g.id
where o.is_pay = 1 and o.order_state not in(1,7)
)
select
-- 维度字段
province_id,
province_name ,
city_id ,
city_name ,
trade_area_id ,
trade_area_name ,
store_id ,
store_name ,
brand_id ,
brand_name ,
max_class_id ,
max_class_name,
mid_class_id ,
mid_class_name,
min_class_id ,
min_class_name,
-- group_type字段的值, 需要根据不同的维度分组, 标上不同的值
case
when grouping(store_id) = 0 then 'store'
when grouping(trade_area_id) = 0 then 'trade_area'
when grouping(city_id) = 0 then 'city'
when grouping(brand_id) = 0 then 'brand'
when grouping(min_class_id) = 0 then 'min_class'
when grouping(mid_class_id) = 0 then 'mid_class'
when grouping(max_class_id) = 0 then 'max_class'
when grouping(dt) = 0 then 'all'
else 'other'
end as group_type,
-- 销售收入:
case
when grouping(store_id) = 0 then sum( if(order_rn = 1 and store_id is not null,coalesce(order_amount,0),0) )
when grouping(trade_area_id) = 0 then sum( if(order_rn = 1 and trade_area_id is not null,coalesce(order_amount,0),0) )
when grouping(city_id) = 0 then sum( if(order_rn = 1 and city_id is not null,coalesce(order_amount,0),0) )
when grouping(brand_id) = 0 then sum( if( brand_goods_rn = 1 and brand_id is not null,coalesce(total_price,0),0) )
when grouping(min_class_id) = 0 then sum( if( min_class_goods_rn = 1 and min_class_id is not null,coalesce(total_price,0),0) )
when grouping(mid_class_id) = 0 then sum( if( mid_class_goods_rn = 1 and mid_class_id is not null,coalesce(total_price,0),0) )
when grouping(max_class_id) = 0 then sum( if( max_class_goods_rn = 1 and max_class_id is not null,coalesce(total_price,0),0) )
when grouping(dt) = 0 then sum( if(order_rn = 1 and dt is not null,coalesce(order_amount,0),0) )
else NULL
end as sale_amt,
-- 平台收入:
case
when grouping(store_id) = 0 then sum( if(order_rn = 1 and store_id is not null,coalesce(plat_fee,0),0) )
when grouping(trade_area_id) = 0 then sum( if(order_rn = 1 and trade_area_id is not null,coalesce(plat_fee,0),0) )
when grouping(city_id) = 0 then sum( if(order_rn = 1 and city_id is not null,coalesce(plat_fee,0),0) )
when grouping(brand_id) = 0 then null
when grouping(min_class_id) = 0 then null
when grouping(mid_class_id) = 0 then null
when grouping(max_class_id) = 0 then null
when grouping(dt) = 0 then sum( if(order_rn = 1 and dt is not null,coalesce(plat_fee,0),0) )
else NULL
end as plat_amt,
-- 配送成交额
case
when grouping(store_id) = 0 then sum( if(order_rn = 1 and store_id is not null,coalesce(delivery_fee,0),0) )
when grouping(trade_area_id) = 0 then sum( if(order_rn = 1 and trade_area_id is not null,coalesce(delivery_fee,0),0) )
when grouping(city_id) = 0 then sum( if(order_rn = 1 and city_id is not null,coalesce(delivery_fee,0),0) )
when grouping(brand_id) = 0 then null
when grouping(min_class_id) = 0 then null
when grouping(mid_class_id) = 0 then null
when grouping(max_class_id) = 0 then null
when grouping(dt) = 0 then sum( if(order_rn = 1 and dt is not null,coalesce(delivery_fee,0),0) )
else NULL
end as deliver_sale_amt,
-- 小程序成交额:
case
when grouping(store_id) = 0
then sum(
if(
order_rn = 1 and store_id is not null and order_from = 'miniapp',
coalesce(order_amount,0),
0
)
)
when grouping(trade_area_id) = 0
then sum(
if(
order_rn = 1 and trade_area_id is not null and order_from = 'miniapp',
coalesce(order_amount,0),
0
)
)
when grouping(city_id) = 0
then sum(
if(
order_rn = 1 and city_id is not null and order_from = 'miniapp',
coalesce(order_amount,0),
0
)
)
when grouping(brand_id) = 0
then sum(
if(
brand_goods_rn = 1 and brand_id is not null and order_from = 'miniapp',
coalesce(total_price,0),
0
)
)
when grouping(min_class_id) = 0
then sum(
if(
min_class_goods_rn = 1 and min_class_id is not null and order_from = 'miniapp',
coalesce(total_price,0),
0
)
)
when grouping(mid_class_id) = 0
then sum(
if(
mid_class_goods_rn = 1 and mid_class_id is not null and order_from = 'miniapp',
coalesce(total_price,0),
0
)
)
when grouping(max_class_id) = 0
then sum(
if(
max_class_goods_rn = 1 and max_class_id is not null and order_from = 'miniapp',
coalesce(total_price,0),
0
)
)
when grouping(dt) = 0
then sum(
if(
order_rn = 1 and dt is not null and order_from = 'miniapp',
coalesce(order_amount,0),
0
)
)
else NULL
end as mini_app_sale_amt,
-- 安卓成交额
case
when grouping(store_id) = 0
then sum(
if(
order_rn = 1 and store_id is not null and order_from = 'android',
coalesce(order_amount,0),
0
)
)
when grouping(trade_area_id) = 0
then sum(
if(
order_rn = 1 and trade_area_id is not null and order_from = 'android',
coalesce(order_amount,0),
0
)
)
when grouping(city_id) = 0
then sum(
if(
order_rn = 1 and city_id is not null and order_from = 'android',
coalesce(order_amount,0),
0
)
)
when grouping(brand_id) = 0
then sum(
if(
brand_goods_rn = 1 and brand_id is not null and order_from = 'android',
coalesce(total_price,0),
0
)
)
when grouping(min_class_id) = 0
then sum(
if(
min_class_goods_rn = 1 and min_class_id is not null and order_from = 'android',
coalesce(total_price,0),
0
)
)
when grouping(mid_class_id) = 0
then sum(
if(
mid_class_goods_rn = 1 and mid_class_id is not null and order_from = 'android',
coalesce(total_price,0),
0
)
)
when grouping(max_class_id) = 0
then sum(
if(
max_class_goods_rn = 1 and max_class_id is not null and order_from = 'android',
coalesce(total_price,0),
0
)
)
when grouping(dt) = 0
then sum(
if(
order_rn = 1 and dt is not null and order_from = 'android',
coalesce(order_amount,0),
0
)
)
else NULL
end as android_sale_amt,
-- 苹果成交额
case
when grouping(store_id) = 0
then sum(
if(
order_rn = 1 and store_id is not null and order_from = 'ios',
coalesce(order_amount,0),
0
)
)
when grouping(trade_area_id) = 0
then sum(
if(
order_rn = 1 and trade_area_id is not null and order_from = 'ios',
coalesce(order_amount,0),
0
)
)
when grouping(city_id) = 0
then sum(
if(
order_rn = 1 and city_id is not null and order_from = 'ios',
coalesce(order_amount,0),
0
)
)
when grouping(brand_id) = 0
then sum(
if(
brand_goods_rn = 1 and brand_id is not null and order_from = 'ios',
coalesce(total_price,0),
0
)
)
when grouping(min_class_id) = 0
then sum(
if(
min_class_goods_rn = 1 and min_class_id is not null and order_from = 'ios',
coalesce(total_price,0),
0
)
)
when grouping(mid_class_id) = 0
then sum(
if(
mid_class_goods_rn = 1 and mid_class_id is not null and order_from = 'ios',
coalesce(total_price,0),
0
)
)
when grouping(max_class_id) = 0
then sum(
if(
max_class_goods_rn = 1 and max_class_id is not null and order_from = 'ios',
coalesce(total_price,0),
0
)
)
when grouping(dt) = 0
then sum(
if(
order_rn = 1 and dt is not null and order_from = 'ios',
coalesce(order_amount,0),
0
)
)
else NULL
end as ios_sale_amt,
-- PC成交额
case
when grouping(store_id) = 0
then sum(
if(
order_rn = 1 and store_id is not null and order_from = 'pcweb',
coalesce(order_amount,0),
0
)
)
when grouping(trade_area_id) = 0
then sum(
if(
order_rn = 1 and trade_area_id is not null and order_from = 'pcweb',
coalesce(order_amount,0),
0
)
)
when grouping(city_id) = 0
then sum(
if(
order_rn = 1 and city_id is not null and order_from = 'pcweb',
coalesce(order_amount,0),
0
)
)
when grouping(brand_id) = 0
then sum(
if(
brand_goods_rn = 1 and brand_id is not null and order_from = 'pcweb',
coalesce(total_price,0),
0
)
)
when grouping(min_class_id) = 0
then sum(
if(
min_class_goods_rn = 1 and min_class_id is not null and order_from = 'pcweb',
coalesce(total_price,0),
0
)
)
when grouping(mid_class_id) = 0
then sum(
if(
mid_class_goods_rn = 1 and mid_class_id is not null and order_from = 'pcweb',
coalesce(total_price,0),
0
)
)
when grouping(max_class_id) = 0
then sum(
if(
max_class_goods_rn = 1 and max_class_id is not null and order_from = 'pcweb',
coalesce(total_price,0),
0
)
)
when grouping(dt) = 0
then sum(
if(
order_rn = 1 and dt is not null and order_from = 'pcweb',
coalesce(order_amount,0),
0
)
)
else NULL
end as pcweb_sale_amt,
--- 订单量相关指标:
-- 成交单量:
case
when grouping(store_id) = 0
then count(
if(
order_rn = 1 and store_id is not null,
order_id,
NULL
)
)
when grouping(trade_area_id) = 0
then count(
if(
order_rn = 1 and trade_area_id is not null,
order_id,
NULL
)
)
when grouping(city_id) = 0
then count(
if(
order_rn = 1 and city_id is not null,
order_id,
NULL
)
)
when grouping(brand_id) = 0
then count(
if(
brand_rn = 1 and brand_id is not null,
order_id,
NULL
)
)
when grouping(min_class_id) = 0
then count(
if(
min_class_rn = 1 and min_class_id is not null,
order_id,
NULL
)
)
when grouping(mid_class_id) = 0
then count(
if(
mid_class_rn = 1 and mid_class_id is not null,
order_id,
NULL
)
)
when grouping(max_class_id) = 0
then count(
if(
max_class_rn = 1 and max_class_id is not null,
order_id,
NULL
)
)
when grouping(dt) = 0
then count(
if(
order_rn = 1 and dt is not null,
order_id,
NULL
)
)
else NULL
end as order_cnt,
-- 参评单量:
case
when grouping(store_id) = 0
then count(
if(
order_rn = 1 and store_id is not null and evaluation_id is not null,
order_id,
NULL
)
)
when grouping(trade_area_id) = 0
then count(
if(
order_rn = 1 and trade_area_id is not null and evaluation_id is not null,
order_id,
NULL
)
)
when grouping(city_id) = 0
then count(
if(
order_rn = 1 and city_id is not null and evaluation_id is not null,
order_id,
NULL
)
)
when grouping(brand_id) = 0
then count(
if(
brand_rn = 1 and brand_id is not null and evaluation_id is not null,
order_id,
NULL
)
)
when grouping(min_class_id) = 0
then count(
if(
min_class_rn = 1 and min_class_id is not null and evaluation_id is not null,
order_id,
NULL
)
)
when grouping(mid_class_id) = 0
then count(
if(
mid_class_rn = 1 and mid_class_id is not null and evaluation_id is not null ,
order_id,
NULL
)
)
when grouping(max_class_id) = 0
then count(
if(
max_class_rn = 1 and max_class_id is not null and evaluation_id is not null,
order_id,
NULL
)
)
when grouping(dt) = 0
then count(
if(
order_rn = 1 and dt is not null and evaluation_id is not null,
order_id,
NULL
)
)
else NULL
end as eva_order_cnt,
-- 差评单量:
case
when grouping(store_id) = 0
then count(
if(
order_rn = 1 and store_id is not null and evaluation_id is not null and geval_scores < 6,
order_id,
NULL
)
)
when grouping(trade_area_id) = 0
then count(
if(
order_rn = 1 and trade_area_id is not null and evaluation_id is not null and geval_scores < 6,
order_id,
NULL
)
)
when grouping(city_id) = 0
then count(
if(
order_rn = 1 and city_id is not null and evaluation_id is not null and geval_scores < 6,
order_id,
NULL
)
)
when grouping(brand_id) = 0
then count(
if(
brand_rn = 1 and brand_id is not null and evaluation_id is not null and geval_scores < 6,
order_id,
NULL
)
)
when grouping(min_class_id) = 0
then count(
if(
min_class_rn = 1 and min_class_id is not null and evaluation_id is not null and geval_scores < 6,
order_id,
NULL
)
)
when grouping(mid_class_id) = 0
then count(
if(
mid_class_rn = 1 and mid_class_id is not null and evaluation_id is not null and geval_scores < 6,
order_id,
NULL
)
)
when grouping(max_class_id) = 0
then count(
if(
max_class_rn = 1 and max_class_id is not null and evaluation_id is not null and geval_scores < 6,
order_id,
NULL
)
)
when grouping(dt) = 0
then count(
if(
order_rn = 1 and dt is not null and evaluation_id is not null and geval_scores < 6,
order_id,
NULL
)
)
else NULL
end as bad_eva_order_cnt,
-- 配送成交单量
case
when grouping(store_id) = 0
then count(
if(
order_rn = 1 and store_id is not null and delievery_id is not null,
order_id,
NULL
)
)
when grouping(trade_area_id) = 0
then count(
if(
order_rn = 1 and trade_area_id is not null and delievery_id is not null,
order_id,
NULL
)
)
when grouping(city_id) = 0
then count(
if(
order_rn = 1 and city_id is not null and delievery_id is not null,
order_id,
NULL
)
)
when grouping(brand_id) = 0
then count(
if(
brand_rn = 1 and brand_id is not null and delievery_id is not null,
order_id,
NULL
)
)
when grouping(min_class_id) = 0
then count(
if(
min_class_rn = 1 and min_class_id is not null and delievery_id is not null,
order_id,
NULL
)
)
when grouping(mid_class_id) = 0
then count(
if(
mid_class_rn = 1 and mid_class_id is not null and delievery_id is not null,
order_id,
NULL
)
)
when grouping(max_class_id) = 0
then count(
if(
max_class_rn = 1 and max_class_id is not null and delievery_id is not null,
order_id,
NULL
)
)
when grouping(dt) = 0
then count(
if(
order_rn = 1 and dt is not null and delievery_id is not null,
order_id,
NULL
)
)
else NULL
end as deliver_order_cnt,
-- 退款成交单量
case
when grouping(store_id) = 0
then count(
if(
order_rn = 1 and store_id is not null and refund_id is not null,
order_id,
NULL
)
)
when grouping(trade_area_id) = 0
then count(
if(
order_rn = 1 and trade_area_id is not null and refund_id is not null,
order_id,
NULL
)
)
when grouping(city_id) = 0
then count(
if(
order_rn = 1 and city_id is not null and refund_id is not null,
order_id,
NULL
)
)
when grouping(brand_id) = 0
then count(
if(
brand_rn = 1 and brand_id is not null and refund_id is not null,
order_id,
NULL
)
)
when grouping(min_class_id) = 0
then count(
if(
min_class_rn = 1 and min_class_id is not null and refund_id is not null,
order_id,
NULL
)
)
when grouping(mid_class_id) = 0
then count(
if(
mid_class_rn = 1 and mid_class_id is not null and refund_id is not null,
order_id,
NULL
)
)
when grouping(max_class_id) = 0
then count(
if(
max_class_rn = 1 and max_class_id is not null and refund_id is not null,
order_id,
NULL
)
)
when grouping(dt) = 0
then count(
if(
order_rn = 1 and dt is not null and refund_id is not null,
order_id,
NULL
)
)
else NULL
end as refund_order_cnt,
-- 小程序成交量
case
when grouping(store_id) = 0
then count(
if(
order_rn = 1 and store_id is not null and order_from = 'miniapp',
order_id,
NULL
)
)
when grouping(trade_area_id) = 0
then count(
if(
order_rn = 1 and trade_area_id is not null and order_from = 'miniapp',
order_id,
NULL
)
)
when grouping(city_id) = 0
then count(
if(
order_rn = 1 and city_id is not null and order_from = 'miniapp',
order_id,
NULL
)
)
when grouping(brand_id) = 0
then count(
if(
brand_rn = 1 and brand_id is not null and order_from = 'miniapp',
order_id,
NULL
)
)
when grouping(min_class_id) = 0
then count(
if(
min_class_rn = 1 and min_class_id is not null and order_from = 'miniapp',
order_id,
NULL
)
)
when grouping(mid_class_id) = 0
then count(
if(
mid_class_rn = 1 and mid_class_id is not null and order_from = 'miniapp',
order_id,
NULL
)
)
when grouping(max_class_id) = 0
then count(
if(
max_class_rn = 1 and max_class_id is not null and order_from = 'miniapp',
order_id,
NULL
)
)
when grouping(dt) = 0
then count(
if(
order_rn = 1 and dt is not null and order_from = 'miniapp',
order_id,
NULL
)
)
else NULL
end as miniapp_order_cnt,
-- 安卓成交量
case
when grouping(store_id) = 0
then count(
if(
order_rn = 1 and store_id is not null and order_from = 'android',
order_id,
NULL
)
)
when grouping(trade_area_id) = 0
then count(
if(
order_rn = 1 and trade_area_id is not null and order_from = 'android',
order_id,
NULL
)
)
when grouping(city_id) = 0
then count(
if(
order_rn = 1 and city_id is not null and order_from = 'android',
order_id,
NULL
)
)
when grouping(brand_id) = 0
then count(
if(
brand_rn = 1 and brand_id is not null and order_from = 'android',
order_id,
NULL
)
)
when grouping(min_class_id) = 0
then count(
if(
min_class_rn = 1 and min_class_id is not null and order_from = 'android',
order_id,
NULL
)
)
when grouping(mid_class_id) = 0
then count(
if(
mid_class_rn = 1 and mid_class_id is not null and order_from = 'android',
order_id,
NULL
)
)
when grouping(max_class_id) = 0
then count(
if(
max_class_rn = 1 and max_class_id is not null and order_from = 'android',
order_id,
NULL
)
)
when grouping(dt) = 0
then count(
if(
order_rn = 1 and dt is not null and order_from = 'android',
order_id,
NULL
)
)
else NULL
end as android_order_cnt,
-- 苹果成交量
case
when grouping(store_id) = 0
then count(
if(
order_rn = 1 and store_id is not null and order_from = 'ios',
order_id,
NULL
)
)
when grouping(trade_area_id) = 0
then count(
if(
order_rn = 1 and trade_area_id is not null and order_from = 'ios',
order_id,
NULL
)
)
when grouping(city_id) = 0
then count(
if(
order_rn = 1 and city_id is not null and order_from = 'ios',
order_id,
NULL
)
)
when grouping(brand_id) = 0
then count(
if(
brand_rn = 1 and brand_id is not null and order_from = 'ios',
order_id,
NULL
)
)
when grouping(min_class_id) = 0
then count(
if(
min_class_rn = 1 and min_class_id is not null and order_from = 'ios',
order_id,
NULL
)
)
when grouping(mid_class_id) = 0
then count(
if(
mid_class_rn = 1 and mid_class_id is not null and order_from = 'ios',
order_id,
NULL
)
)
when grouping(max_class_id) = 0
then count(
if(
max_class_rn = 1 and max_class_id is not null and order_from = 'ios',
order_id,
NULL
)
)
when grouping(dt) = 0
then count(
if(
order_rn = 1 and dt is not null and order_from = 'ios',
order_id,
NULL
)
)
else NULL
end as ios_order_cnt,
-- pc成交量
case
when grouping(store_id) = 0
then count(
if(
order_rn = 1 and store_id is not null and order_from = 'pcweb',
order_id,
NULL
)
)
when grouping(trade_area_id) = 0
then count(
if(
order_rn = 1 and trade_area_id is not null and order_from = 'pcweb',
order_id,
NULL
)
)
when grouping(city_id) = 0
then count(
if(
order_rn = 1 and city_id is not null and order_from = 'pcweb',
order_id,
NULL
)
)
when grouping(brand_id) = 0
then count(
if(
brand_rn = 1 and brand_id is not null and order_from = 'pcweb',
order_id,
NULL
)
)
when grouping(min_class_id) = 0
then count(
if(
min_class_rn = 1 and min_class_id is not null and order_from = 'pcweb',
order_id,
NULL
)
)
when grouping(mid_class_id) = 0
then count(
if(
mid_class_rn = 1 and mid_class_id is not null and order_from = 'pcweb',
order_id,
NULL
)
)
when grouping(max_class_id) = 0
then count(
if(
max_class_rn = 1 and max_class_id is not null and order_from = 'pcweb',
order_id,
NULL
)
)
when grouping(dt) = 0
then count(
if(
order_rn = 1 and dt is not null and order_from = 'pcweb',
order_id,
NULL
)
)
else NULL
end as pcweb_order_cnt,
dt
from t1
group by grouping sets(
dt,
(dt,province_id,province_name,city_id,city_name),
(dt,province_id,province_name,city_id,city_name,trade_area_id,trade_area_name),
(dt,province_id,province_name,city_id,city_name,trade_area_id,trade_area_name,store_id,store_name),
(dt,brand_id,brand_name),
(dt,max_class_id,max_class_name),
(dt,max_class_id,max_class_name,mid_class_id,mid_class_name),
(dt,max_class_id,max_class_name,mid_class_id,mid_class_name,min_class_id,min_class_name)
)
3.2 商品主题统计宽表
主要指标有:下单次数、下单件数、下单金额、被支付次数、被支付件数、被支付金额、被退款次数、被退款件数、被退款金额、被加入购物车次数、被加入购物车件数、被收藏次数、好评数、中评数、差评数。<br /> 维度有:商品、日期。
- 创建商品主题统计宽表(HIVE执行)
create table bj59_yp_dws_jiale.dws_sku_daycount ( dt STRING, sku_id string comment 'sku_id', sku_name string comment '商品名称', order_count bigint comment '被下单次数', order_num bigint comment '被下单件数', order_amount decimal(38,2) comment '被下单金额', payment_count bigint comment '被支付次数', payment_num bigint comment '被支付件数', payment_amount decimal(38,2) comment '被支付金额', refund_count bigint comment '被退款次数', refund_num bigint comment '被退款件数', refund_amount decimal(38,2) comment '被退款金额', cart_count bigint comment '被加入购物车次数', cart_num bigint comment '被加入购物车件数', favor_count bigint comment '被收藏次数', evaluation_good_count bigint comment '好评数', evaluation_mid_count bigint comment '中评数', evaluation_bad_count bigint comment '差评数' ) COMMENT '每日商品行为' --PARTITIONED BY(dt STRING) ROW format delimited fields terminated BY '\t' stored AS orc tblproperties ('orc.compress' = 'SNAPPY');