问题导读
1.DWS 层有哪些专业术语?
2.系统函数有哪些?
3.nvl 函数基本语法是什么?
4.DWS 层(业务)包含哪些内容?
**
一、数仓搭建 - DWS 层
1.1 业务术语
1)用户
用户以设备为判断标准,在移动统计中,每个独立设备认为是一个独立用户。Android
系统根据 IMEI 号,IOS 系统根据 OpenUDID 来标识一个独立用户,每部手机一个用户
2)新增用户
首次联网使用应用的用户。如果一个用户首次打开某 APP,那这个用户定义为新增用
户;卸载再安装的设备,不会被算作一次新增。新增用户包括日新增用户、周新增用户、月
新增用户
3)活跃用户
打开应用的用户即为活跃用户,不考虑用户的使用情况。每天一台设备打开多次会被计
为一个活跃用户
4)周(月)活跃用户
某个自然周(月)内启动过应用的用户,该周(月)内的多次启动只记一个活跃用户
5)月活跃率
月活跃用户与截止到该月累计的用户总和之间的比例
6)沉默用户
用户仅在安装当天(次日)启动一次,后续时间无再启动行为。该指标可以反映新增用
户质量和用户与 APP 的匹配程度
7)版本分布
不同版本的周内各天新增用户数,活跃用户数和启动次数。利于判断 APP 各个版本之
间的优劣和用户行为习惯
8)本周回流用户
上周未启动过应用,本周启动了应用的用户
9)连续 n 周活跃用户
连续 n 周,每周至少启动一次
10)忠诚用户
连续活跃 5 周以上的用户
11)连续活跃用户
连续 2 周及以上活跃的用户
12)近期流失用户
连续 n(2<= n <= 4)周没有启动应用的用户。(第 n+1 周没有启动过)
13)留存用户
某段时间内的新增用户,经过一段时间后,仍然使用应用的被认作是留存用户;这部分
用户占当时新增用户的比例即是留存率
例如,5 月份新增用户 200,这 200 人在 6 月份启动过应用的有 100 人,7 月份启动过应用的有 80 人,8 月份启动过应用的有 50 人;则 5 月份新增用户一个月后的留存率是 50%,二个月后的留存率是 40%,三个月后的留存率是 25%
14)用户新鲜度
每天启动应用的新老用户比例,即新增用户数占活跃用户数的比例
15)单次使用时长
每次启动使用的时间长度
16)日使用时长
累计一天内的使用时间长度
17)启动次数计算标准
IOS 平台应用退到后台就算一次独立的启动;Android 平台我们规定,两次启动之间的间隔小于 30 秒,被计算一次启动。用户在使用过程中,若因收发短信或接电话等退出应用30 秒又再次返回应用中,那这两次行为应该是延续而非独立的,所以可以被算作一次使用行为,即一次启动。业内大多使用 30 秒这个标准,但用户还是可以自定义此时间间隔
1.2 系统函数
1.2.1 collect_set 函数
1)创建原数据表
1. drop table if exists stud;
2. create table stud (name string, area string, course string, score int);
2)向原数据表中插入数据
1. insert into table stud values('zhang3','bj','math',88);
2. insert into table stud values('li4','bj','math',99);
3. insert into table stud values('wang5','sh','chinese',92);
4. insert into table stud values('zhao6','sh','chinese',54);
5. insert into table stud values('tian7','bj','chinese',91);
3)查询表中数据
1. select * from stud;
2.
3. stud.name stud.area stud.course stud.score
4. zhang3 bj math 88
5. li4 bj math 99
6. wang5 sh chinese 92
7. zhao6 sh chinese 54
8. tian7 bj chinese 91
4)把同一分组的不同行的数据聚合成一个集合
1. select course, collect_set(area), avg(score) from stud group by course;
2.
3. chinese ["sh","bj"] 79.0
4. math ["bj"] 93.5
5) 用下标可以取某一个
1. select course, collect_set(area)[0], avg(score) from
2.
3. stud group by course;
4. chinese sh 79.0
5. math bj 93.5
6.
复制代码
1.2.2 nvl 函数
1)基本语法
NVL(表达式 1,表达式 2)
如果表达式 1 为空值,NVL 返回值为表达式 2 的值,否则返回表达式 1 的值。 该函数的目的是把一个空值(null)转换成一个实际的值。其表达式的值可以是数字型、字符型和日期型。但是表达式 1 和表达式 2 的数据类型必须为同一个类型
1.2.3 日期处理函数
1)date_format 函数(根据格式整理日期)
1. hive (gmall)> select date_format('2020-03-10','yyyy-MM');
2. 2020-03
2)date_add 函数(加减日期)
1. hive (gmall)> select date_add('2020-03-10',-1);
2. 2020-03-09
3. hive (gmall)> select date_add('2020-03-10',1);
4. 2020-03-11
3)next_day 函数
(1)取当前天的下一个周一
1. hive (gmall)> select next_day('2020-03-12','MO');
2. 2020-03-16
3.
4. 说明:星期一到星期日的英文(Monday,Tuesday、Wednesday、Thursday、Friday、Saturday、Sunday)
(2)取当前周的周一
1. hive (gmall)> select date_add(next_day('2020-03-12','MO'),-7);
2. 2020-03-11
4)last_day 函数(求当月最后一天日期)
1. hive (gmall)> select last_day('2020-03-10');
2. 2020-03-31
1.3 DWS 层(用户行为)
1.3.1 每日设备行为
每日设备行为,主要按照 设备 id 统计
1)建表语句
1. drop table if exists dws_uv_detail_daycount;
2. create external table dws_uv_detail_daycount
3. (
4. `mid_id` string COMMENT '设备唯一标识',
5. `user_id` string COMMENT '用户标识',
6. `version_code` string COMMENT '程序版本号',
7. `version_name` string COMMENT '程序版本名',
8. `lang` string COMMENT '系统语言',
9. `source` string COMMENT '渠道号',
10. `os` string COMMENT '安卓系统版本',
11. `area` string COMMENT '区域',
12. `model` string COMMENT '手机型号',
13. `brand` string COMMENT '手机品牌',
14. `sdk_version` string COMMENT 'sdkVersion',
15. `gmail` string COMMENT 'gmail',
16. `height_width` string COMMENT '屏幕宽高',
17. `app_time` string COMMENT '客户端日志产生时的时间',
18. `network` string COMMENT '网络模式',
19. `lng` string COMMENT '经度',
20. `lat` string COMMENT '纬度',
21. `login_count` bigint COMMENT '活跃次数'
22. )
23. partitioned by(dt string)
24. stored as parquet
25. location '/warehouse/gmall/dws/dws_uv_detail_daycount';
2)数据装载
1. insert overwrite table dws_uv_detail_daycount partition(dt='2020-03-10')
2. select
3. mid_id,
4. concat_ws('|', collect_set(user_id)) user_id,
5. concat_ws('|', collect_set(version_code)) version_code,
6. concat_ws('|', collect_set(version_name)) version_name,
7. concat_ws('|', collect_set(lang))lang,
8. concat_ws('|', collect_set(source)) source,
9. concat_ws('|', collect_set(os)) os,
10. concat_ws('|', collect_set(area)) area,
11. concat_ws('|', collect_set(model)) model,
12. concat_ws('|', collect_set(brand)) brand,
13. concat_ws('|', collect_set(sdk_version)) sdk_version,
14. concat_ws('|', collect_set(gmail)) gmail,
15. concat_ws('|', collect_set(height_width)) height_width,
16. concat_ws('|', collect_set(app_time)) app_time,
17. concat_ws('|', collect_set(network)) network,
18. concat_ws('|', collect_set(lng)) lng,
19. concat_ws('|', collect_set(lat)) lat,
20. count(*) login_count
21. from dwd_start_log
22. where dt='2020-03-10'
23. group by mid_id;
3)查询加载结果
1. select * from dws_uv_detail_daycount where dt='2020-03-10';
1.4 DWS 层(业务)
DWS 层的宽表字段,是站在不同维度的视角去看事实表,重点关注事实表的度量值
1.4.1 每日会员行为
1)建表语句
1. drop table if exists dws_user_action_daycount;
2. create external table dws_user_action_daycount
3. (
4. user_id string comment '用户 id',
5. login_count bigint comment '登录次数',
6. cart_count bigint comment '加入购物车次数',
7. cart_amount double comment '加入购物车金额',
8. order_count bigint comment '下单次数',
9. order_amount decimal(16,2) comment '下单金额',
10. payment_count bigint comment '支付次数',
11. payment_amount decimal(16,2) comment '支付金额'
12. ) COMMENT '每日用户行为'
13. PARTITIONED BY (`dt` string)
14. stored as parquet
15. location '/warehouse/gmall/dws/dws_user_action_daycount/'
16. tblproperties ("parquet.compression"="lzo");
2)数据装载
1. with
2. tmp_login as
3. (
4. select
5. user_id,
6. count(*) login_count
7. from dwd_start_log
8. where dt='2020-03-10'
9. and user_id is not null
10. group by user_id
11. ),
12. tmp_cart as
13. (
14. select
15. user_id,
16. count(*) cart_count,
17. sum(cart_price*sku_num) cart_amount
18. from dwd_fact_cart_info
19. where dt='2020-03-10'
20. and user_id is not null
21. and date_format(create_time,'yyyy-MM-dd')='2020-03-10'
22. group by user_id
23. ),
24. tmp_order as
25. (
26. select
27. user_id,
28. count(*) order_count,
29. sum(final_total_amount) order_amount
30. from dwd_fact_order_info
31. where dt='2020-03-10'
32. group by user_id
33. ) ,
34. tmp_payment as
35. (
36. select
37. user_id,
38. count(*) payment_count,
39. sum(payment_amount) payment_amount
40. from dwd_fact_payment_info
41. where dt='2020-03-10'
42. group by user_id
43. )
44. insert overwrite table dws_user_action_daycount partition(dt='2020-03-10')
45. select
46. user_actions.user_id,
47. sum(user_actions.login_count),
48. sum(user_actions.cart_count),
49. sum(user_actions.cart_amount),
50. sum(user_actions.order_count),
51. sum(user_actions.order_amount),
52. sum(user_actions.payment_count),
53. sum(user_actions.payment_amount)
54. from
55. (
56. select
57. user_id,
58. login_count,
59. 0 cart_count,
60. 0 cart_amount,
61. 0 order_count,
62. 0 order_amount,
63. 0 payment_count,
64. 0 payment_amount
65. from
66. tmp_login
67. union all
68. select
69. user_id,
70. 0 login_count,
71. cart_count,
72. cart_amount,
73. 0 order_count,
74. 0 order_amount,
75. 0 payment_count,
76. 0 payment_amount
77. from
78. tmp_cart
79. union all
80. select
81. user_id,
82. 0 login_count,
83. 0 cart_count,
84. 0 cart_amount,
85. order_count,
86. order_amount,
87. 0 payment_count,
88. 0 payment_amount
89. from tmp_order
90. union all
91. select
92. user_id,
93. 0 login_count,
94. 0 cart_count,
95. 0 cart_amount,
96. 0 order_count,
97. 0 order_amount,
98. payment_count,
99. payment_amount
100. from tmp_payment
101. ) user_actions
102. group by user_id;
3)查询加载结果
hive (gmall)> select * from dws_user_action_daycount where dt=‘2020-03-10’;
1.4.2 每日商品行为
1)建表语句
1. drop table if exists dws_sku_action_daycount;
2. create external table dws_sku_action_daycount
3. (
4. sku_id string comment 'sku_id',
5. order_count bigint comment '被下单次数',
6. order_num bigint comment '被下单件数',
7. order_amount decimal(16,2) comment '被下单金额',
8. payment_count bigint comment '被支付次数',
9. payment_num bigint comment '被支付件数',
10. payment_amount decimal(16,2) comment '被支付金额',
11. refund_count bigint comment '被退款次数',
12. refund_num bigint comment '被退款件数',
13. refund_amount decimal(16,2) comment '被退款金额',
14. cart_count bigint comment '被加入购物车次数',
15. cart_num bigint comment '被加入购物车件数',
16. favor_count bigint comment '被收藏次数',
17. appraise_good_count bigint comment '好评数',
18. appraise_mid_count bigint comment '中评数',
19. appraise_bad_count bigint comment '差评数',
20. appraise_default_count bigint comment '默认评价数'
21. ) COMMENT '每日商品行为'
22. PARTITIONED BY (`dt` string)
23. stored as parquet
24. location '/warehouse/gmall/dws/dws_sku_action_daycount/'
25. tblproperties ("parquet.compression"="lzo");
26.
复制代码
2)数据装载
注意:如果是 23 点 59 下单,支付日期跨天。需要从订单详情里面取出支付时间是今天,订单时间是昨天或者今天的订单
1. with
2. tmp_order as
3. (
4. select
5. sku_id,
6. count(*) order_count,
7. sum(sku_num) order_num,
8. sum(total_amount) order_amount
9. from dwd_fact_order_detail
10. where dt='2020-03-10'
11. group by sku_id
12. ),
13. tmp_payment as
14. (
15. select
16. sku_id,
17. count(*) payment_count,
18. sum(sku_num) payment_num,
19. sum(total_amount) payment_amount
20. from dwd_fact_order_detail
21. where dt='2020-03-10'
22. and order_id in
23. (
24. select
25. id
26. from dwd_fact_order_info
27. where (dt='2020-03-10' or dt=date_add('2020-03-10',-1))
28. and date_format(payment_time,'yyyy-MM-dd')='2020-03-10'
29. )
30. group by sku_id
31. ),
32. tmp_refund as
33. (
34. select
35. sku_id,
36. count(*) refund_count,
37. sum(refund_num) refund_num,
38. sum(refund_amount) refund_amount
39. from dwd_fact_order_refund_info
40. where dt='2020-03-10'
41. group by sku_id
42. ),
43. tmp_cart as
44. (
45. select
46. sku_id,
47. count(*) cart_count,
48. sum(sku_num) cart_num
49. from dwd_fact_cart_info
50. where dt='2020-03-10'
51. and date_format(create_time,'yyyy-MM-dd')='2020-03-10'
52. group by sku_id
53. ),
54. tmp_favor as
55. (
56. select
57. sku_id,
58. count(*) favor_count
59. from dwd_fact_favor_info
60. where dt='2020-03-10'
61. and date_format(create_time,'yyyy-MM-dd')='2020-03-10'
62. group by sku_id
63. ),
64. tmp_appraise as
65. (
66. select
67. sku_id,
68. sum(if(appraise='1201',1,0)) appraise_good_count,
69. sum(if(appraise='1202',1,0)) appraise_mid_count,
70. sum(if(appraise='1203',1,0)) appraise_bad_count,
71. sum(if(appraise='1204',1,0)) appraise_default_count
72. from dwd_fact_comment_info
73. where dt='2020-03-10'
74. group by sku_id
75. )
76. insert overwrite table dws_sku_action_daycount partition(dt='2020-03-10')
77. select
78. sku_id,
79. sum(order_count),
80. sum(order_num),
81. sum(order_amount),
82. sum(payment_count),
83. sum(payment_num),
84. sum(payment_amount),
85. sum(refund_count),
86. sum(refund_num),
87. sum(refund_amount),
88. sum(cart_count),
89. sum(cart_num),
90. sum(favor_count),
91. sum(appraise_good_count),
92. sum(appraise_mid_count),
93. sum(appraise_bad_count),
94. sum(appraise_default_count)
95. from
96. (
97. select
98. sku_id,
99. order_count,
100. order_num,
101. order_amount,
102. 0 payment_count,
103. 0 payment_num,
104. 0 payment_amount,
105. 0 refund_count,
106. 0 refund_num,
107. 0 refund_amount,
108. 0 cart_count,
109. 0 cart_num,
110. 0 favor_count,
111. 0 appraise_good_count,
112. 0 appraise_mid_count,
113. 0 appraise_bad_count,
114. 0 appraise_default_count
115. from tmp_order
116. union all
117. select
118. sku_id,
119. 0 order_count,
120. 0 order_num,
121. 0 order_amount,
122. payment_count,
123. payment_num,
124. payment_amount,
125. 0 refund_count,
126. 0 refund_num,
127. 0 refund_amount,
128. 0 cart_count,
129. 0 cart_num,
130. 0 favor_count,
131. 0 appraise_good_count,
132. 0 appraise_mid_count,
133. 0 appraise_bad_count,
134. 0 appraise_default_count
135. from tmp_payment
136. union all
137. select
138. sku_id,
139. 0 order_count,
140. 0 order_num,
141. 0 order_amount,
142. 0 payment_count,
143. 0 payment_num,
144. 0 payment_amount,
145. refund_count,
146. refund_num,
147. refund_amount,
148. 0 cart_count,
149. 0 cart_num,
150. 0 favor_count,
151. 0 appraise_good_count,
152. 0 appraise_mid_count,
153. 0 appraise_bad_count,
154. 0 appraise_default_count
155. from tmp_refund
156. union all
157. select
158. sku_id,
159. 0 order_count,
160. 0 order_num,
161. 0 order_amount,
162. 0 payment_count,
163. 0 payment_num,
164. 0 payment_amount,
165. 0 refund_count,
166. 0 refund_num,
167. 0 refund_amount,
168. cart_count,
169. cart_num,
170. 0 favor_count,
171. 0 appraise_good_count,
172. 0 appraise_mid_count,
173. 0 appraise_bad_count,
174. 0 appraise_default_count
175. from tmp_cart
176. union all
177. select
178. sku_id,
179. 0 order_count,
180. 0 order_num,
181. 0 order_amount,
182. 0 payment_count,
183. 0 payment_num,
184. 0 payment_amount,
185. 0 refund_count,
186. 0 refund_num,
187. 0 refund_amount,
188. 0 cart_count,
189. 0 cart_num,
190. favor_count,
191. 0 appraise_good_count,
192. 0 appraise_mid_count,
193. 0 appraise_bad_count,
194. 0 appraise_default_count
195. from tmp_favor
196. union all
197. select
198. sku_id,
199. 0 order_count,
200. 0 order_num,
201. 0 order_amount,
202. 0 payment_count,
203. 0 payment_num,
204. 0 payment_amount,
205. 0 refund_count,
206. 0 refund_num,
207. 0 refund_amount,
208. 0 cart_count,
209. 0 cart_num,
210. 0 favor_count,
211. appraise_good_count,
212. appraise_mid_count,
213. appraise_bad_count,
214. appraise_default_count
215. from tmp_appraise
216. )tmp
217. group by sku_id;
218.
复制代码
3)查询加载结果
1. select * from dws_sku_action_daycount where dt='2020-03-10';
1.4.5 每日购买行为
1)建表语句
1. drop table if exists dws_sale_detail_daycount;
2. create external table dws_sale_detail_daycount
3. (
4. user_id string comment '用户 id',
5. sku_id string comment '商品 id',
6. user_gender string comment '用户性别',
7. user_age string comment '用户年龄',
8. user_level string comment '用户等级',
9. order_price decimal(10,2) comment '商品价格',
10. sku_name string comment '商品名称',
11. sku_tm_id string comment '品牌 id',
12. sku_category3_id string comment '商品三级品类 id',
13. sku_category2_id string comment '商品二级品类 id',
14. sku_category1_id string comment '商品一级品类 id',
15. sku_category3_name string comment '商品三级品类名称',
16. sku_category2_name string comment '商品二级品类名称',
17. sku_category1_name string comment '商品一级品类名称',
18. spu_id string comment '商品 spu',
19. sku_num int comment '购买个数',
20. order_count bigint comment '当日下单单数',
21. order_amount decimal(16,2) comment '当日下单金额'
22. ) COMMENT '每日购买行为'
23. PARTITIONED BY (`dt` string)
24. stored as parquet
25. location '/warehouse/gmall/dws/dws_sale_detail_daycount/'
26. tblproperties ("parquet.compression"="lzo");
2)数据装载
1. insert overwrite table dws_sale_detail_daycount partition(dt='2020-03-10')
2. select
3. op.user_id,
4. op.sku_id,
5. ui.gender,
6. months_between('2020-03-10', ui.birthday)/12 age,
7. ui.user_level,
8. si.price,
9. si.sku_name,
10. si.tm_id,
11. si.category3_id,
12. si.category2_id,
13. si.category1_id,
14. si.category3_name,
15. si.category2_name,
16. si.category1_name,
17. si.spu_id,
18. op.sku_num,
19. op.order_count,
20. op.order_amount
21. from
22. (
23. select
24. user_id,
25. sku_id,
26. sum(sku_num) sku_num,
27. count(*) order_count,
28. sum(total_amount) order_amount
29. from dwd_fact_order_detail
30. where dt='2020-03-10'
31. group by user_id, sku_id
32. )op
33. join
34. (
35. select
36. *
37. from dwd_dim_user_info_his
38. where end_date='9999-99-99'
39. )ui on op.user_id = ui.id
40. join
41. (
42. select
43. *
44. from dwd_dim_sku_info
45. where dt='2020-03-10'
46. )si on op.sku_id = si.id;
3)查询加载结果
1. select * from dws_sale_detail_daycount where dt='2020-03-10';
2.
复制代码
1.5 DWS 层数据导入脚本
1)vim dwd_to_dws.sh
在脚本中填写如下内容
1. #!/bin/bash
2. APP=gmall
3. hive=/opt/modules/hive/bin/hive
4. # 如果是输入的日期按照取输入日期;如果没输入日期取当前时间的前一天
5. if [ -n "$1" ] ;then
6. do_date=$1
7. else
8. do_date=`date -d "-1 day" +%F`
9. fi
10. sql="
11. insert overwrite table ${APP}.dws_uv_detail_daycount partition(dt='$do_date')
12. select
13. mid_id,
14. concat_ws('|', collect_set(user_id)) user_id,
15. concat_ws('|', collect_set(version_code)) version_code,
16. concat_ws('|', collect_set(version_name)) version_name,
17. concat_ws('|', collect_set(lang))lang,
18. concat_ws('|', collect_set(source)) source,
19. concat_ws('|', collect_set(os)) os,
20. concat_ws('|', collect_set(area)) area,
21. concat_ws('|', collect_set(model)) model,
22. concat_ws('|', collect_set(brand)) brand,
23. concat_ws('|', collect_set(sdk_version)) sdk_version,
24. concat_ws('|', collect_set(gmail)) gmail,
25. concat_ws('|', collect_set(height_width)) height_width,
26. concat_ws('|', collect_set(app_time)) app_time,
27. concat_ws('|', collect_set(network)) network,
28. concat_ws('|', collect_set(lng)) lng,
29. concat_ws('|', collect_set(lat)) lat,
30. count(*) login_count
31. from ${APP}.dwd_start_log
32. where dt='$do_date'
33. group by mid_id;
34. with
35. tmp_login as
36. (
37. select
38. user_id,
39. count(*) login_count
40. from ${APP}.dwd_start_log
41. where dt='$do_date'
42. and user_id is not null
43. group by user_id
44. ),
45. tmp_cart as
46. (
47. select
48. user_id,
49. count(*) cart_count,
50. sum(cart_price*sku_num) cart_amount
51. from ${APP}.dwd_fact_cart_info
52. where dt='$do_date'
53. and user_id is not null
54. and date_format(create_time,'yyyy-MM-dd')='$do_date'
55. group by user_id
56. ),
57. tmp_order as
58. (
59. select
60. user_id,
61. count(*) order_count,
62. sum(final_total_amount) order_amount
63. from ${APP}.dwd_fact_order_info
64. where dt='$do_date'
65. group by user_id
66. ) ,
67. tmp_payment as
68. (
69. select
70. user_id,
71. count(*) payment_count,
72. sum(payment_amount) payment_amount
73. from ${APP}.dwd_fact_payment_info
74. where dt='$do_date'
75. group by user_id
76. )
77.
78.
79. insert overwrite table ${APP}.dws_user_action_daycount partition(dt='$do_date')
80. select
81. user_actions.user_id,
82. sum(user_actions.login_count),
83. sum(user_actions.cart_count),
84. sum(user_actions.cart_amount),
85. sum(user_actions.order_count),
86. sum(user_actions.order_amount),
87. sum(user_actions.payment_count),
88. sum(user_actions.payment_amount)
89. from
90. (
91. select
92. user_id,
93. login_count,
94. 0 cart_count,
95. 0 cart_amount,
96. 0 order_count,
97. 0 order_amount,
98. 0 payment_count,
99. 0 payment_amount
100. from
101. tmp_login
102. union all
103. select
104. user_id,
105. 0 login_count,
106. cart_count,
107. cart_amount,
108. 0 order_count,
109. 0 order_amount,
110. 0 payment_count,
111. 0 payment_amount
112. from
113. tmp_cart
114. union all
115. select
116. user_id,
117. 0 login_count,
118. 0 cart_count,
119. 0 cart_amount,
120. order_count,
121. order_amount,
122. 0 payment_count,
123. 0 payment_amount
124. from tmp_order
125. union all
126. select
127. user_id,
128. 0 login_count,
129. 0 cart_count,
130. 0 cart_amount,
131. 0 order_count,
132. 0 order_amount,
133. payment_count,
134. payment_amount
135. from tmp_payment
136. ) user_actions
137. group by user_id;
138. with
139. tmp_order as
140. (
141. select
142. sku_id,
143. count(*) order_count,
144. sum(sku_num) order_num,
145. sum(total_amount) order_amount
146. from ${APP}.dwd_fact_order_detail
147. where dt='$do_date'
148. group by sku_id
149. ),
150. tmp_payment as
151. (
152. select
153. sku_id,
154. count(*) payment_count,
155. sum(sku_num) payment_num,
156. sum(total_amount) payment_amount
157. from ${APP}.dwd_fact_order_detail
158. where dt='$do_date'
159. and order_id in
160. (
161. select
162. id
163. from ${APP}.dwd_fact_order_info
164. where (dt='$do_date' or dt=date_add('$do_date',-1))
165. and date_format(payment_time,'yyyy-MM-dd')='$do_date'
166. )
167. group by sku_id
168. ),
169. tmp_refund as
170. (
171. select
172. sku_id,
173. count(*) refund_count,
174. sum(refund_num) refund_num,
175. sum(refund_amount) refund_amount
176. from ${APP}.dwd_fact_order_refund_info
177. where dt='$do_date'
178. group by sku_id
179. ),
180. tmp_cart as
181. (
182. select
183. sku_id,
184. count(*) cart_count,
185. sum(sku_num) cart_num
186. from ${APP}.dwd_fact_cart_info
187. where dt='$do_date'
188. and date_format(create_time,'yyyy-MM-dd')='$do_date'
189. group by sku_id
190. ),
191. tmp_favor as
192. (
193. select
194. sku_id,
195. count(*) favor_count
196. from ${APP}.dwd_fact_favor_info
197. where dt='$do_date'
198. and date_format(create_time,'yyyy-MM-dd')='$do_date'
199. group by sku_id
200. ),
201. tmp_appraise as
202. (
203. select
204. sku_id,
205. sum(if(appraise='1201',1,0)) appraise_good_count,
206. sum(if(appraise='1202',1,0)) appraise_mid_count,
207. sum(if(appraise='1203',1,0)) appraise_bad_count,
208. sum(if(appraise='1204',1,0)) appraise_default_count
209. from ${APP}.dwd_fact_comment_info
210. where dt='$do_date'
211. group by sku_id
212. )
213.
214.
215. insert overwrite table ${APP}.dws_sku_action_daycount partition(dt='$do_date')
216. select
217. sku_id,
218. sum(order_count),
219. sum(order_num),
220. sum(order_amount),
221. sum(payment_count),
222. sum(payment_num),
223. sum(payment_amount),
224. sum(refund_count),
225. sum(refund_num),
226. sum(refund_amount),
227. sum(cart_count),
228. sum(cart_num),
229. sum(favor_count),
230. sum(appraise_good_count),
231. sum(appraise_mid_count),
232. sum(appraise_bad_count),
233. sum(appraise_default_count)
234. from
235. (
236. select
237. sku_id,
238. order_count,
239. order_num,
240. order_amount,
241. 0 payment_count,
242. 0 payment_num,
243. 0 payment_amount,
244. 0 refund_count,
245. 0 refund_num,
246. 0 refund_amount,
247. 0 cart_count,
248. 0 cart_num,
249. 0 favor_count,
250. 0 appraise_good_count,
251. 0 appraise_mid_count,
252. 0 appraise_bad_count,
253. 0 appraise_default_count
254. from tmp_order
255. union all
256. select
257. sku_id,
258. 0 order_count,
259. 0 order_num,
260. 0 order_amount,
261. payment_count,
262. payment_num,
263. payment_amount,
264. 0 refund_count,
265. 0 refund_num,
266. 0 refund_amount,
267. 0 cart_count,
268. 0 cart_num,
269. 0 favor_count,
270. 0 appraise_good_count,
271. 0 appraise_mid_count,
272. 0 appraise_bad_count,
273. 0 appraise_default_count
274. from tmp_payment
275. union all
276. select
277. sku_id,
278. 0 order_count,
279. 0 order_num,
280. 0 order_amount,
281. 0 payment_count,
282. 0 payment_num,
283. 0 payment_amount,
284. refund_count,
285. refund_num,
286. refund_amount,
287. 0 cart_count,
288. 0 cart_num,
289. 0 favor_count,
290. 0 appraise_good_count,
291. 0 appraise_mid_count,
292. 0 appraise_bad_count,
293. 0 appraise_default_count
294. from tmp_refund
295. union all
296. select
297. sku_id,
298. 0 order_count,
299. 0 order_num,
300. 0 order_amount,
301. 0 payment_count,
302. 0 payment_num,
303. 0 payment_amount,
304. 0 refund_count,
305. 0 refund_num,
306. 0 refund_amount,
307. cart_count,
308. cart_num,
309. 0 favor_count,
310. 0 appraise_good_count,
311. 0 appraise_mid_count,
312. 0 appraise_bad_count,
313. 0 appraise_default_count
314. from tmp_cart
315. union all
316. select
317. sku_id,
318. 0 order_count,
319. 0 order_num,
320. 0 order_amount,
321. 0 payment_count,
322. 0 payment_num,
323. 0 payment_amount,
324. 0 refund_count,
325. 0 refund_num,
326. 0 refund_amount,
327. 0 cart_count,
328. 0 cart_num,
329. favor_count,
330. 0 appraise_good_count,
331. 0 appraise_mid_count,
332. 0 appraise_bad_count,
333. 0 appraise_default_count
334. from tmp_favor
335. union all
336. select
337. sku_id,
338. 0 order_count,
339. 0 order_num,
340. 0 order_amount,
341. 0 payment_count,
342. 0 payment_num,
343. 0 payment_amount,
344. 0 refund_count,
345. 0 refund_num,
346. 0 refund_amount,
347. 0 cart_count,
348. 0 cart_num,
349. 0 favor_count,
350. appraise_good_count,
351. appraise_mid_count,
352. appraise_bad_count,
353. appraise_default_count
354. from tmp_appraise
355. )tmp
356. group by sku_id;
357.
358.
359. insert overwrite table ${APP}.dws_sale_detail_daycount partition(dt='$do_date')
360. select
361. op.user_id,
362. op.sku_id,
363. ui.gender,
364. months_between('$do_date', ui.birthday)/12 age,
365. ui.user_level,
366. si.price,
367. si.sku_name,
368. si.tm_id,
369. si.category3_id,
370. si.category2_id,
371. si.category1_id,
372. si.category3_name,
373. si.category2_name,
374. si.category1_name,
375. si.spu_id,
376. op.sku_num,
377. op.order_count,
378. op.order_amount
379. from
380. (
381. select
382. user_id,
383. sku_id,
384. sum(sku_num) sku_num,
385. count(*) order_count,
386. sum(total_amount) order_amount
387. from ${APP}.dwd_fact_order_detail
388. where dt='$do_date'
389. group by user_id, sku_id
390. )op
391. join
392. (
393. select
394. *
395. from ${APP}.dwd_dim_user_info_his
396. where end_date='9999-99-99'
397. )ui on op.user_id = ui.id
398. join
399. (
400. select
401. *
402. from ${APP}.dwd_dim_sku_info
403. where dt='$do_date'
404. )si on op.sku_id = si.id;
405. "
406. $hive -e "$sql"
407.
复制代码
2)增加脚本执行权限
1. chmod 770 dwd_to_dws.sh
3)执行脚本导入数据
1. dwd_to_dws.sh 2020-03-11
4)查看导入数据
1. select * from dws_uv_detail_daycount where dt='2020-03-11';
2. select * from dws_user_action_daycount where dt='2020-03-11';
3. select * from dws_sku_action_daycount where dt='2020-03-11';
4. select * from dws_sale_detail_daycount where dt='2020-03-11';
5.
复制代码
二、数仓搭建-DWT 层
2.1 设备主题宽表
1)建表语句
1. drop table if exists dwt_uv_topic;
2. create external table dwt_uv_topic
3. (
4. `mid_id` string COMMENT '设备唯一标识',
5. `user_id` string COMMENT '用户标识',
6. `version_code` string COMMENT '程序版本号',
7. `version_name` string COMMENT '程序版本名',
8. `lang` string COMMENT '系统语言',
9. `source` string COMMENT '渠道号',
10. `os` string COMMENT '安卓系统版本',
11. `area` string COMMENT '区域',
12. `model` string COMMENT '手机型号',
13. `brand` string COMMENT '手机品牌',
14. `sdk_version` string COMMENT 'sdkVersion',
15. `gmail` string COMMENT 'gmail',
16. `height_width` string COMMENT '屏幕宽高',
17. `app_time` string COMMENT '客户端日志产生时的时间',
18. `network` string COMMENT '网络模式',
19. `lng` string COMMENT '经度',
20. `lat` string COMMENT '纬度',
21. `login_date_first` string comment '首次活跃时间',
22. `login_date_last` string comment '末次活跃时间',
23. `login_day_count` bigint comment '当日活跃次数',
24. `login_count` bigint comment '累积活跃天数'
25. )
26. stored as parquet
27. location '/warehouse/gmall/dwt/dwt_uv_topic';
2)数据装载
1. insert overwrite table dwt_uv_topic
2. select
3. nvl(new.mid_id,old.mid_id),
4. nvl(new.user_id,old.user_id),
5. nvl(new.version_code,old.version_code),
6. nvl(new.version_name,old.version_name),
7. nvl(new.lang,old.lang),
8. nvl(new.source,old.source),
9. nvl(new.os,old.os),
10. nvl(new.area,old.area),
11. nvl(new.model,old.model),
12. nvl(new.brand,old.brand),
13. nvl(new.sdk_version,old.sdk_version),
14. nvl(new.gmail,old.gmail),
15. nvl(new.height_width,old.height_width),
16. nvl(new.app_time,old.app_time),
17. nvl(new.network,old.network),
18. nvl(new.lng,old.lng),
19. nvl(new.lat,old.lat),
20. if(old.mid_id is null,'2020-03-10',old.login_date_first),
21. if(new.mid_id is not null,'2020-03-10',old.login_date_last),
22. if(new.mid_id is not null, new.login_count,0),
23. nvl(old.login_count,0)+if(new.login_count>0,1,0)
24. from
25. (
26. select
27. *
28. from dwt_uv_topic
29. )old
30. full outer join
31. (
32. select
33. *
34. from dws_uv_detail_daycount
35. where dt='2020-03-10'
36. )new
37. on old.mid_id=new.mid_id;
38.
复制代码
3)查询加载结果
1. select * from dwt_uv_topic limit 5;
2.
复制代码
2.2 会员主题宽表
宽表字段怎么来?维度关联的事实表度量值+开头、结尾+累积+累积一个时间段
1)建表语句
1. drop table if exists dwt_user_topic;
2. create external table dwt_user_topic
3. (
4. user_id string comment '用户 id',
5. login_date_first string comment '首次登录时间',
6. login_date_last string comment '末次登录时间',
7. login_count bigint comment '累积登录天数',
8. login_last_30d_count bigint comment '最近 30 日登录天数',
9. order_date_first string comment '首次下单时间',
10. order_date_last string comment '末次下单时间',
11. order_count bigint comment '累积下单次数',
12. order_amount decimal(16,2) comment '累积下单金额',
13. order_last_30d_count bigint comment '最近 30 日下单次数',
14. order_last_30d_amount bigint comment '最近 30 日下单金额',
15. payment_date_first string comment '首次支付时间',
16. payment_date_last string comment '末次支付时间',
17. payment_count decimal(16,2) comment '累积支付次数',
18. payment_amount decimal(16,2) comment '累积支付金额',
19. payment_last_30d_count decimal(16,2) comment '最近 30 日支付次数',
20. payment_last_30d_amount decimal(16,2) comment '最近 30 日支付金额'
21. )COMMENT '用户主题宽表'
22. stored as parquet
23. location '/warehouse/gmall/dwt/dwt_user_topic/'
24. tblproperties ("parquet.compression"="lzo");
2)数据装载
1. insert overwrite table dwt_user_topic
2. select
3. nvl(new.user_id,old.user_id),
4. if(old.login_date_first is null and
5. new.login_count>0,'2020-03-10',old.login_date_first),
6. if(new.login_count>0,'2020-03-10',old.login_date_last),
7. nvl(old.login_count,0)+if(new.login_count>0,1,0),
8. nvl(new.login_last_30d_count,0),
9. if(old.order_date_first is null and
10. new.order_count>0,'2020-03-10',old.order_date_first),
11. if(new.order_count>0,'2020-03-10',old.order_date_last),
12. nvl(old.order_count,0)+nvl(new.order_count,0),
13. nvl(old.order_amount,0)+nvl(new.order_amount,0),
14. nvl(new.order_last_30d_count,0),
15. nvl(new.order_last_30d_amount,0),
16. if(old.payment_date_first is null and
17. new.payment_count>0,'2020-03-10',old.payment_date_first),
18. if(new.payment_count>0,'2020-03-10',old.payment_date_last),
19. nvl(old.payment_count,0)+nvl(new.payment_count,0),
20. nvl(old.payment_amount,0)+nvl(new.payment_amount,0),
21. nvl(new.payment_last_30d_count,0),
22. nvl(new.payment_last_30d_amount,0)
23. from
24. dwt_user_topic old
25. full outer join
26. (
27. select
28. user_id,
29. sum(if(dt='2020-03-10',login_count,0)) login_count,
30. sum(if(dt='2020-03-10',order_count,0)) order_count,
31. sum(if(dt='2020-03-10',order_amount,0)) order_amount,
32. sum(if(dt='2020-03-10',payment_count,0)) payment_count,
33. sum(if(dt='2020-03-10',payment_amount,0)) payment_amount,
34. sum(if(login_count>0,1,0)) login_last_30d_count,
35. sum(order_count) order_last_30d_count,
36. sum(order_amount) order_last_30d_amount,
37. sum(payment_count) payment_last_30d_count,
38. sum(payment_amount) payment_last_30d_amount
39. from dws_user_action_daycount
40. where dt>=date_add( '2020-03-10',-30)
41. group by user_id
42. )new
43. on old.user_id=new.user_id;
3)查询加载结果
1. select * from dwt_user_topic limit 5;
2.
复制代码
2.3 商品主题宽表
1)建表语句
1. create external table dwt_sku_topic
2. (
3. sku_id string comment 'sku_id',
4. spu_id string comment 'spu_id',
5. order_last_30d_count bigint comment '最近 30 日被下单次数',
6. order_last_30d_num bigint comment '最近 30 日被下单件数',
7. order_last_30d_amount decimal(16,2) comment '最近 30 日被下单金额',
8. order_count bigint comment '累积被下单次数',
9. order_num bigint comment '累积被下单件数',
10. order_amount decimal(16,2) comment '累积被下单金额',
11. payment_last_30d_count bigint comment '最近 30 日被支付次数',
12. payment_last_30d_num bigint comment '最近 30 日被支付件数',
13. payment_last_30d_amount decimal(16,2) comment '最近 30 日被支付金额',
14. payment_count bigint comment '累积被支付次数',
15. payment_num bigint comment '累积被支付件数',
16. payment_amount decimal(16,2) comment '累积被支付金额',
17. refund_last_30d_count bigint comment '最近三十日退款次数',
18. refund_last_30d_num bigint comment '最近三十日退款件数',
19. refund_last_30d_amount decimal(10,2) comment '最近三十日退款金额',
20. refund_count bigint comment '累积退款次数',
21. refund_num bigint comment '累积退款件数',
22. refund_amount decimal(10,2) comment '累积退款金额',
23. cart_last_30d_count bigint comment '最近 30 日被加入购物车次数',
24. cart_last_30d_num bigint comment '最近 30 日被加入购物车件数',
25. cart_count bigint comment '累积被加入购物车次数',
26. cart_num bigint comment '累积被加入购物车件数',
27. favor_last_30d_count bigint comment '最近 30 日被收藏次数',
28. favor_count bigint comment '累积被收藏次数',
29. appraise_last_30d_good_count bigint comment '最近 30 日好评数',
30. appraise_last_30d_mid_count bigint comment '最近 30 日中评数',
31. appraise_last_30d_bad_count bigint comment '最近 30 日差评数',
32. appraise_last_30d_default_count bigint comment '最近 30 日默认评价数',
33. appraise_good_count bigint comment '累积好评数',
34. appraise_mid_count bigint comment '累积中评数',
35. appraise_bad_count bigint comment '累积差评数',
36. appraise_default_count bigint comment '累积默认评价数'
37. )COMMENT '商品主题宽表'
38. stored as parquet
39. location '/warehouse/gmall/dwt/dwt_sku_topic/'
40. tblproperties ("parquet.compression"="lzo");
2)数据装载
1. insert overwrite table dwt_sku_topic
2. select
3. nvl(new.sku_id,old.sku_id), sku_info.spu_id,
4. nvl(new.order_count30,0),
5. nvl(new.order_num30,0),
6. nvl(new.order_amount30,0),
7. nvl(old.order_count,0) + nvl(new.order_count,0),
8. nvl(old.order_num,0) + nvl(new.order_num,0),
9. nvl(old.order_amount,0) + nvl(new.order_amount,0),
10. nvl(new.payment_count30,0),
11. nvl(new.payment_num30,0),
12. nvl(new.payment_amount30,0),
13. nvl(old.payment_count,0) + nvl(new.payment_count,0),
14. nvl(old.payment_num,0) + nvl(new.payment_count,0),
15. nvl(old.payment_amount,0) + nvl(new.payment_count,0),
16. nvl(new.refund_count30,0),
17. nvl(new.refund_num30,0),
18. nvl(new.refund_amount30,0),
19. nvl(old.refund_count,0) + nvl(new.refund_count,0),
20. nvl(old.refund_num,0) + nvl(new.refund_num,0),
21. nvl(old.refund_amount,0) + nvl(new.refund_amount,0),
22. nvl(new.cart_count30,0),
23. nvl(new.cart_num30,0),
24. nvl(old.cart_count,0) + nvl(new.cart_count,0),
25. nvl(old.cart_num,0) + nvl(new.cart_num,0),
26. nvl(new.favor_count30,0),
27. nvl(old.favor_count,0) + nvl(new.favor_count,0),
28. nvl(new.appraise_good_count30,0),
29. nvl(new.appraise_mid_count30,0),
30. nvl(new.appraise_bad_count30,0),
31. nvl(new.appraise_default_count30,0) ,
32. nvl(old.appraise_good_count,0) + nvl(new.appraise_good_count,0),
33. nvl(old.appraise_mid_count,0) + nvl(new.appraise_mid_count,0),
34. nvl(old.appraise_bad_count,0) + nvl(new.appraise_bad_count,0),
35. nvl(old.appraise_default_count,0) + nvl(new.appraise_default_count,0)
36. from
37. (
38. select
39. sku_id,
40. spu_id,
41. order_last_30d_count,
42. order_last_30d_num,
43. order_last_30d_amount,
44. order_count,
45. order_num,
46. order_amount ,
47. payment_last_30d_count,
48. payment_last_30d_num,
49. payment_last_30d_amount,
50. payment_count,
51. payment_num,
52. payment_amount,
53. refund_last_30d_count,
54. refund_last_30d_num,
55. refund_last_30d_amount,
56. refund_count,
57. refund_num,
58. refund_amount,
59. cart_last_30d_count,
60. cart_last_30d_num,
61. cart_count,
62. cart_num,
63. favor_last_30d_count,
64. favor_count,
65. appraise_last_30d_good_count,
66. appraise_last_30d_mid_count,
67. appraise_last_30d_bad_count,
68. appraise_last_30d_default_count,
69. appraise_good_count,
70. appraise_mid_count,
71. appraise_bad_count,
72. appraise_default_count
73. from dwt_sku_topic
74. )old
75. full outer join
76. (
77. select
78. sku_id,
79. sum(if(dt='2020-03-10', order_count,0 )) order_count,
80. sum(if(dt='2020-03-10',order_num ,0 )) order_num,
81. sum(if(dt='2020-03-10',order_amount,0 )) order_amount ,
82. sum(if(dt='2020-03-10',payment_count,0 )) payment_count,
83. sum(if(dt='2020-03-10',payment_num,0 )) payment_num,
84. sum(if(dt='2020-03-10',payment_amount,0 )) payment_amount,
85. sum(if(dt='2020-03-10',refund_count,0 )) refund_count,
86. sum(if(dt='2020-03-10',refund_num,0 )) refund_num,
87. sum(if(dt='2020-03-10',refund_amount,0 )) refund_amount,
88. sum(if(dt='2020-03-10',cart_count,0 )) cart_count,
89. sum(if(dt='2020-03-10',cart_num,0 )) cart_num,
90. sum(if(dt='2020-03-10',favor_count,0 )) favor_count,
91. sum(if(dt='2020-03-10',appraise_good_count,0 )) appraise_good_count,
92. sum(if(dt='2020-03-10',appraise_mid_count,0 ) ) appraise_mid_count ,
93. sum(if(dt='2020-03-10',appraise_bad_count,0 )) appraise_bad_count,
94. sum(if(dt='2020-03-10',appraise_default_count,0 )) appraise_default_count,
95. sum(order_count) order_count30 ,
96. sum(order_num) order_num30,
97. sum(order_amount) order_amount30,
98. sum(payment_count) payment_count30,
99. sum(payment_num) payment_num30,
100. sum(payment_amount) payment_amount30,
101. sum(refund_count) refund_count30,
102. sum(refund_num) refund_num30,
103. sum(refund_amount) refund_amount30,
104. sum(cart_count) cart_count30,
105. sum(cart_num) cart_num30,
106. sum(favor_count) favor_count30,
107. sum(appraise_good_count) appraise_good_count30,
108. sum(appraise_mid_count) appraise_mid_count30,
109. sum(appraise_bad_count) appraise_bad_count30,
110. sum(appraise_default_count) appraise_default_count30
111. from dws_sku_action_daycount
112. where dt >= date_add ('2020-03-10', -30)
113. group by sku_id
114. )new
115. on new.sku_id = old.sku_id
116. left join
117. (select * from dwd_dim_sku_info where dt='2020-03-10') sku_info
118. on nvl(new.sku_id,old.sku_id)= sku_info.id;
119.
复制代码
3)查询加载结果
1. select * from dwt_sku_topic limit 5;
2.
复制代码
2.4 DWT 层数据导入脚本
1)vim dws_to_dwt.sh
在脚本中填写如下内容
1. #!/bin/bash
2. APP=gmall
3. hive=/opt/modules/hive/bin/hive
4. # 如果是输入的日期按照取输入日期;如果没输入日期取当前时间的前一天
5. if [ -n "$1" ] ;then
6. do_date=$1
7. else
8. do_date=`date -d "-1 day" +%F`
9. fi
10. sql="
11. insert overwrite table ${APP}.dwt_uv_topic
12. select
13. nvl(new.mid_id,old.mid_id),
14. nvl(new.user_id,old.user_id),
15. nvl(new.version_code,old.version_code),
16. nvl(new.version_name,old.version_name),
17. nvl(new.lang,old.lang),
18. nvl(new.source,old.source),
19. nvl(new.os,old.os),
20. nvl(new.area,old.area),
21. nvl(new.model,old.model),
22. nvl(new.brand,old.brand),
23. nvl(new.sdk_version,old.sdk_version),
24. nvl(new.gmail,old.gmail),
25. nvl(new.height_width,old.height_width),
26. nvl(new.app_time,old.app_time),
27. nvl(new.network,old.network),
28. nvl(new.lng,old.lng),
29. nvl(new.lat,old.lat),
30. nvl(old.login_date_first,'$do_date'),
31. if(new.login_count>0,'$do_date',old.login_date_last),
32. nvl(new.login_count,0),
33. nvl(new.login_count,0)+nvl(old.login_count,0)
34. from
35. (
36. select
37. *
38. from ${APP}.dwt_uv_topic
39. )old
40. full outer join
41. (
42. select
43. *
44. from ${APP}.dws_uv_detail_daycount
45. where dt='$do_date'
46. )new
47. on old.mid_id=new.mid_id;
48.
49.
50. insert overwrite table ${APP}.dwt_user_topic
51. select
52. nvl(new.user_id,old.user_id),
53. if(old.login_date_first is null and
54. new.login_count>0,'$do_date',old.login_date_first),
55. if(new.login_count>0,'$do_date',old.login_date_last),
56. nvl(old.login_count,0)+if(new.login_count>0,1,0),
57. nvl(new.login_last_30d_count,0),
58. if(old.order_date_first is null and
59. new.order_count>0,'$do_date',old.order_date_first),
60. if(new.order_count>0,'$do_date',old.order_date_last),
61. nvl(old.order_count,0)+nvl(new.order_count,0),
62. nvl(old.order_amount,0)+nvl(new.order_amount,0),
63. nvl(new.order_last_30d_count,0),
64. nvl(new.order_last_30d_amount,0),
65. if(old.payment_date_first is null and
66. new.payment_count>0,'$do_date',old.payment_date_first),
67. if(new.payment_count>0,'$do_date',old.payment_date_last),
68. nvl(old.payment_count,0)+nvl(new.payment_count,0),
69. nvl(old.payment_amount,0)+nvl(new.payment_amount,0),
70. nvl(new.payment_last_30d_count,0),
71. nvl(new.payment_last_30d_amount,0)
72. from
73. (
74. select
75. *
76. from ${APP}.dwt_user_topic
77. )old
78. full outer join
79. (
80. select
81. user_id,
82. sum(if(dt='$do_date',login_count,0)) login_count,
83. sum(if(dt='$do_date',order_count,0)) order_count,
84. sum(if(dt='$do_date',order_amount,0)) order_amount,
85. sum(if(dt='$do_date',payment_count,0)) payment_count,
86. sum(if(dt='$do_date',payment_amount,0)) payment_amount,
87. sum(if(order_count>0,1,0)) login_last_30d_count,
88. sum(order_count) order_last_30d_count,
89. sum(order_amount) order_last_30d_amount,
90. sum(payment_count) payment_last_30d_count,
91. sum(payment_amount) payment_last_30d_amount
92. from ${APP}.dws_user_action_daycount
93. where dt>=date_add( '$do_date',-30)
94. group by user_id
95. )new
96. on old.user_id=new.user_id;
97. with
98. sku_act as
99. (
100. select
101. sku_id,
102. sum(if(dt='$do_date', order_count,0 )) order_count,
103. sum(if(dt='$do_date',order_num ,0 )) order_num,
104. sum(if(dt='$do_date',order_amount,0 )) order_amount ,
105. sum(if(dt='$do_date',payment_count,0 )) payment_count,
106. sum(if(dt='$do_date',payment_num,0 )) payment_num,
107. sum(if(dt='$do_date',payment_amount,0 )) payment_amount,
108. sum(if(dt='$do_date',refund_count,0 )) refund_count,
109. sum(if(dt='$do_date',refund_num,0 )) refund_num,
110. sum(if(dt='$do_date',refund_amount,0 )) refund_amount,
111. sum(if(dt='$do_date',cart_count,0 )) cart_count,
112. sum(if(dt='$do_date',cart_num,0 )) cart_num,
113. sum(if(dt='$do_date',favor_count,0 )) favor_count,
114. sum(if(dt='$do_date',appraise_good_count,0 )) appraise_good_count,
115. sum(if(dt='$do_date',appraise_mid_count,0 ) ) appraise_mid_count ,
116. sum(if(dt='$do_date',appraise_bad_count,0 )) appraise_bad_count,
117. sum(if(dt='$do_date',appraise_default_count,0 )) appraise_default_count,
118. sum( order_count ) order_count30 ,
119. sum( order_num ) order_num30,
120. sum(order_amount ) order_amount30,
121. sum(payment_count ) payment_count30,
122. sum(payment_num ) payment_num30,
123. sum(payment_amount ) payment_amount30,
124. sum(refund_count ) refund_count30,
125. sum(refund_num ) refund_num30,
126. sum(refund_amount ) refund_amount30,
127. sum(cart_count ) cart_count30,
128. sum(cart_num ) cart_num30,
129. sum(favor_count ) favor_count30,
130. sum(appraise_good_count ) appraise_good_count30,
131. sum(appraise_mid_count ) appraise_mid_count30,
132. sum(appraise_bad_count ) appraise_bad_count30,
133. sum(appraise_default_count ) appraise_default_count30
134. from ${APP}.dws_sku_action_daycount
135. where dt>=date_add ( '$do_date',-30)
136. group by sku_id
137. ),
138. sku_topic
139. as
140. (
141. select
142. sku_id,
143. spu_id,
144. order_last_30d_count,
145. order_last_30d_num,
146. order_last_30d_amount,
147. order_count,
148. order_num,
149. order_amount ,
150. payment_last_30d_count,
151. payment_last_30d_num,
152. payment_last_30d_amount,
153. payment_count,
154. payment_num,
155. payment_amount,
156. refund_last_30d_count,
157. refund_last_30d_num,
158. refund_last_30d_amount ,
159. refund_count ,
160. refund_num ,
161. refund_amount ,
162. cart_last_30d_count ,
163. cart_last_30d_num ,
164. cart_count ,
165. cart_num ,
166. favor_last_30d_count ,
167. favor_count ,
168. appraise_last_30d_good_count ,
169. appraise_last_30d_mid_count ,
170. appraise_last_30d_bad_count ,
171. appraise_last_30d_default_count ,
172. appraise_good_count ,
173. appraise_mid_count ,
174. appraise_bad_count ,
175. appraise_default_count
176. from ${APP}.dwt_sku_topic
177. )
178.
179.
180. insert overwrite table ${APP}.dwt_sku_topic
181. select
182. nvl(sku_act.sku_id,sku_topic.sku_id) ,
183. sku_info.spu_id,
184. nvl (sku_act.order_count30,0) ,
185. nvl (sku_act.order_num30,0) ,
186. nvl (sku_act.order_amount30,0) ,
187. nvl(sku_topic.order_count,0)+ nvl (sku_act.order_count,0) ,
188. nvl(sku_topic.order_num,0)+ nvl (sku_act.order_num,0) ,
189. nvl(sku_topic.order_amount,0)+ nvl (sku_act.order_amount,0),
190. nvl (sku_act.payment_count30,0),
191. nvl (sku_act.payment_num30,0),
192. nvl (sku_act.payment_amount30,0),
193. nvl(sku_topic.payment_count,0)+ nvl (sku_act.payment_count,0) ,
194. nvl(sku_topic.payment_num,0)+ nvl (sku_act.payment_count,0) ,
195. nvl(sku_topic.payment_amount,0)+ nvl (sku_act.payment_count,0) ,
196. nvl (refund_count30,0),
197. nvl (sku_act.refund_num30,0),
198. nvl (sku_act.refund_amount30,0),
199. nvl(sku_topic.refund_count,0)+ nvl (sku_act.refund_count,0),
200. nvl(sku_topic.refund_num,0)+ nvl (sku_act.refund_num,0),
201. nvl(sku_topic.refund_amount,0)+ nvl (sku_act.refund_amount,0),
202. nvl(sku_act.cart_count30,0) ,
203. nvl(sku_act.cart_num30,0) ,
204. nvl(sku_topic.cart_count ,0)+ nvl (sku_act.cart_count,0),
205. nvl( sku_topic.cart_num ,0)+ nvl (sku_act.cart_num,0),
206. nvl(sku_act.favor_count30 ,0) ,
207. nvl (sku_topic.favor_count ,0)+ nvl (sku_act.favor_count,0),
208. nvl (sku_act.appraise_good_count30 ,0) ,
209. nvl (sku_act.appraise_mid_count30 ,0) ,
210. nvl (sku_act.appraise_bad_count30 ,0) ,
211. nvl (sku_act.appraise_default_count30 ,0) ,
212. nvl (sku_topic.appraise_good_count ,0)+ nvl
213. (sku_act.appraise_good_count,0) ,
214. nvl (sku_topic.appraise_mid_count ,0)+ nvl
215. (sku_act.appraise_mid_count,0) ,
216. nvl (sku_topic.appraise_bad_count ,0)+ nvl
217. (sku_act.appraise_bad_count,0) ,
218. nvl (sku_topic.appraise_default_count ,0)+ nvl
219. (sku_act.appraise_default_count,0)
220. from sku_act
221. full outer join sku_topic
222. on sku_act.sku_id =sku_topic.sku_id
223. left join
224. (select * from ${APP}.dwd_dim_sku_info where dt='$do_date') sku_info
225. on nvl(sku_topic.sku_id,sku_act.sku_id)= sku_info.id;
226. "
227. $hive -e "$sql"
228.
复制代码
2)增加脚本执行权限
1. chmod 770 dws_to_dwt.sh
3)执行脚本导入数据
1. dws_to_dwt.sh 2020-03-11
4)查看导入数据
1. select * from dwt_uv_topic limit 5;
2. select * from dwt_user_topic limit 5;
3. select * from dwt_sku_topic limit 5;