问题导读
1.DWS 层有哪些专业术语?

2.系统函数有哪些?
3.nvl 函数基本语法是什么?
4.DWS 层(业务)包含哪些内容?
**

一、数仓搭建 - DWS 层

1.1 业务术语

1)用户
用户以设备为判断标准,在移动统计中,每个独立设备认为是一个独立用户。Android
系统根据 IMEI 号,IOS 系统根据 OpenUDID 来标识一个独立用户,每部手机一个用户

2)新增用户
首次联网使用应用的用户。如果一个用户首次打开某 APP,那这个用户定义为新增用
户;卸载再安装的设备,不会被算作一次新增。新增用户包括日新增用户、周新增用户、月
新增用户

3)活跃用户
打开应用的用户即为活跃用户,不考虑用户的使用情况。每天一台设备打开多次会被计
为一个活跃用户

4)周(月)活跃用户
某个自然周(月)内启动过应用的用户,该周(月)内的多次启动只记一个活跃用户

5)月活跃率
月活跃用户与截止到该月累计的用户总和之间的比例

6)沉默用户
用户仅在安装当天(次日)启动一次,后续时间无再启动行为。该指标可以反映新增用
户质量和用户与 APP 的匹配程度

7)版本分布
不同版本的周内各天新增用户数,活跃用户数和启动次数。利于判断 APP 各个版本之
间的优劣和用户行为习惯

8)本周回流用户
上周未启动过应用,本周启动了应用的用户

9)连续 n 周活跃用户
连续 n 周,每周至少启动一次

10)忠诚用户
连续活跃 5 周以上的用户

11)连续活跃用户
连续 2 周及以上活跃的用户

12)近期流失用户
连续 n(2<= n <= 4)周没有启动应用的用户。(第 n+1 周没有启动过)

13)留存用户
某段时间内的新增用户,经过一段时间后,仍然使用应用的被认作是留存用户;这部分
用户占当时新增用户的比例即是留存率
例如,5 月份新增用户 200,这 200 人在 6 月份启动过应用的有 100 人,7 月份启动过应用的有 80 人,8 月份启动过应用的有 50 人;则 5 月份新增用户一个月后的留存率是 50%,二个月后的留存率是 40%,三个月后的留存率是 25%

14)用户新鲜度
每天启动应用的新老用户比例,即新增用户数占活跃用户数的比例

15)单次使用时长
每次启动使用的时间长度

16)日使用时长
累计一天内的使用时间长度

17)启动次数计算标准
IOS 平台应用退到后台就算一次独立的启动;Android 平台我们规定,两次启动之间的间隔小于 30 秒,被计算一次启动。用户在使用过程中,若因收发短信或接电话等退出应用30 秒又再次返回应用中,那这两次行为应该是延续而非独立的,所以可以被算作一次使用行为,即一次启动。业内大多使用 30 秒这个标准,但用户还是可以自定义此时间间隔

1.2 系统函数

1.2.1 collect_set 函数

1)创建原数据表

  1. 1. drop table if exists stud;
  2. 2. create table stud (name string, area string, course string, score int);

2)向原数据表中插入数据

  1. 1. insert into table stud values('zhang3','bj','math',88);
  2. 2. insert into table stud values('li4','bj','math',99);
  3. 3. insert into table stud values('wang5','sh','chinese',92);
  4. 4. insert into table stud values('zhao6','sh','chinese',54);
  5. 5. insert into table stud values('tian7','bj','chinese',91);

3)查询表中数据

  1. 1. select * from stud;
  2. 2.
  3. 3. stud.name stud.area stud.course stud.score
  4. 4. zhang3 bj math 88
  5. 5. li4 bj math 99
  6. 6. wang5 sh chinese 92
  7. 7. zhao6 sh chinese 54
  8. 8. tian7 bj chinese 91

4)把同一分组的不同行的数据聚合成一个集合

  1. 1. select course, collect_set(area), avg(score) from stud group by course;
  2. 2.
  3. 3. chinese ["sh","bj"] 79.0
  4. 4. math ["bj"] 93.5

5) 用下标可以取某一个

  1. 1. select course, collect_set(area)[0], avg(score) from
  2. 2.
  3. 3. stud group by course;
  4. 4. chinese sh 79.0
  5. 5. math bj 93.5
  6. 6.
  7. 复制代码

1.2.2 nvl 函数

1)基本语法

NVL(表达式 1,表达式 2)

如果表达式 1 为空值,NVL 返回值为表达式 2 的值,否则返回表达式 1 的值。 该函数的目的是把一个空值(null)转换成一个实际的值。其表达式的值可以是数字型、字符型和日期型。但是表达式 1 和表达式 2 的数据类型必须为同一个类型

1.2.3 日期处理函数

1)date_format 函数(根据格式整理日期)

  1. 1. hive (gmall)> select date_format('2020-03-10','yyyy-MM');
  2. 2. 2020-03

2)date_add 函数(加减日期)

  1. 1. hive (gmall)> select date_add('2020-03-10',-1);
  2. 2. 2020-03-09
  3. 3. hive (gmall)> select date_add('2020-03-10',1);
  4. 4. 2020-03-11

3)next_day 函数
(1)取当前天的下一个周一

  1. 1. hive (gmall)> select next_day('2020-03-12','MO');
  2. 2. 2020-03-16
  3. 3.
  4. 4. 说明:星期一到星期日的英文(MondayTuesdayWednesdayThursdayFridaySaturdaySunday

(2)取当前周的周一

  1. 1. hive (gmall)> select date_add(next_day('2020-03-12','MO'),-7);
  2. 2. 2020-03-11

4)last_day 函数(求当月最后一天日期)

  1. 1. hive (gmall)> select last_day('2020-03-10');
  2. 2. 2020-03-31

1.3 DWS 层(用户行为)

1.3.1 每日设备行为

每日设备行为,主要按照 设备 id 统计

数仓项目(九)数仓搭建 - DWS 层 - 图1

1)建表语句

  1. 1. drop table if exists dws_uv_detail_daycount;
  2. 2. create external table dws_uv_detail_daycount
  3. 3. (
  4. 4. `mid_id` string COMMENT '设备唯一标识',
  5. 5. `user_id` string COMMENT '用户标识',
  6. 6. `version_code` string COMMENT '程序版本号',
  7. 7. `version_name` string COMMENT '程序版本名',
  8. 8. `lang` string COMMENT '系统语言',
  9. 9. `source` string COMMENT '渠道号',
  10. 10. `os` string COMMENT '安卓系统版本',
  11. 11. `area` string COMMENT '区域',
  12. 12. `model` string COMMENT '手机型号',
  13. 13. `brand` string COMMENT '手机品牌',
  14. 14. `sdk_version` string COMMENT 'sdkVersion',
  15. 15. `gmail` string COMMENT 'gmail',
  16. 16. `height_width` string COMMENT '屏幕宽高',
  17. 17. `app_time` string COMMENT '客户端日志产生时的时间',
  18. 18. `network` string COMMENT '网络模式',
  19. 19. `lng` string COMMENT '经度',
  20. 20. `lat` string COMMENT '纬度',
  21. 21. `login_count` bigint COMMENT '活跃次数'
  22. 22. )
  23. 23. partitioned by(dt string)
  24. 24. stored as parquet
  25. 25. location '/warehouse/gmall/dws/dws_uv_detail_daycount';

2)数据装载

  1. 1. insert overwrite table dws_uv_detail_daycount partition(dt='2020-03-10')
  2. 2. select
  3. 3. mid_id,
  4. 4. concat_ws('|', collect_set(user_id)) user_id,
  5. 5. concat_ws('|', collect_set(version_code)) version_code,
  6. 6. concat_ws('|', collect_set(version_name)) version_name,
  7. 7. concat_ws('|', collect_set(lang))lang,
  8. 8. concat_ws('|', collect_set(source)) source,
  9. 9. concat_ws('|', collect_set(os)) os,
  10. 10. concat_ws('|', collect_set(area)) area,
  11. 11. concat_ws('|', collect_set(model)) model,
  12. 12. concat_ws('|', collect_set(brand)) brand,
  13. 13. concat_ws('|', collect_set(sdk_version)) sdk_version,
  14. 14. concat_ws('|', collect_set(gmail)) gmail,
  15. 15. concat_ws('|', collect_set(height_width)) height_width,
  16. 16. concat_ws('|', collect_set(app_time)) app_time,
  17. 17. concat_ws('|', collect_set(network)) network,
  18. 18. concat_ws('|', collect_set(lng)) lng,
  19. 19. concat_ws('|', collect_set(lat)) lat,
  20. 20. count(*) login_count
  21. 21. from dwd_start_log
  22. 22. where dt='2020-03-10'
  23. 23. group by mid_id;

3)查询加载结果

  1. 1. select * from dws_uv_detail_daycount where dt='2020-03-10';

1.4 DWS 层(业务)

DWS 层的宽表字段,是站在不同维度的视角去看事实表,重点关注事实表的度量值

数仓项目(九)数仓搭建 - DWS 层 - 图2

1.4.1 每日会员行为

1)建表语句

  1. 1. drop table if exists dws_user_action_daycount;
  2. 2. create external table dws_user_action_daycount
  3. 3. (
  4. 4. user_id string comment '用户 id',
  5. 5. login_count bigint comment '登录次数',
  6. 6. cart_count bigint comment '加入购物车次数',
  7. 7. cart_amount double comment '加入购物车金额',
  8. 8. order_count bigint comment '下单次数',
  9. 9. order_amount decimal(16,2) comment '下单金额',
  10. 10. payment_count bigint comment '支付次数',
  11. 11. payment_amount decimal(16,2) comment '支付金额'
  12. 12. ) COMMENT '每日用户行为'
  13. 13. PARTITIONED BY (`dt` string)
  14. 14. stored as parquet
  15. 15. location '/warehouse/gmall/dws/dws_user_action_daycount/'
  16. 16. tblproperties ("parquet.compression"="lzo");

2)数据装载

  1. 1. with
  2. 2. tmp_login as
  3. 3. (
  4. 4. select
  5. 5. user_id,
  6. 6. count(*) login_count
  7. 7. from dwd_start_log
  8. 8. where dt='2020-03-10'
  9. 9. and user_id is not null
  10. 10. group by user_id
  11. 11. ),
  12. 12. tmp_cart as
  13. 13. (
  14. 14. select
  15. 15. user_id,
  16. 16. count(*) cart_count,
  17. 17. sum(cart_price*sku_num) cart_amount
  18. 18. from dwd_fact_cart_info
  19. 19. where dt='2020-03-10'
  20. 20. and user_id is not null
  21. 21. and date_format(create_time,'yyyy-MM-dd')='2020-03-10'
  22. 22. group by user_id
  23. 23. ),
  24. 24. tmp_order as
  25. 25. (
  26. 26. select
  27. 27. user_id,
  28. 28. count(*) order_count,
  29. 29. sum(final_total_amount) order_amount
  30. 30. from dwd_fact_order_info
  31. 31. where dt='2020-03-10'
  32. 32. group by user_id
  33. 33. ) ,
  34. 34. tmp_payment as
  35. 35. (
  36. 36. select
  37. 37. user_id,
  38. 38. count(*) payment_count,
  39. 39. sum(payment_amount) payment_amount
  40. 40. from dwd_fact_payment_info
  41. 41. where dt='2020-03-10'
  42. 42. group by user_id
  43. 43. )
  44. 44. insert overwrite table dws_user_action_daycount partition(dt='2020-03-10')
  45. 45. select
  46. 46. user_actions.user_id,
  47. 47. sum(user_actions.login_count),
  48. 48. sum(user_actions.cart_count),
  49. 49. sum(user_actions.cart_amount),
  50. 50. sum(user_actions.order_count),
  51. 51. sum(user_actions.order_amount),
  52. 52. sum(user_actions.payment_count),
  53. 53. sum(user_actions.payment_amount)
  54. 54. from
  55. 55. (
  56. 56. select
  57. 57. user_id,
  58. 58. login_count,
  59. 59. 0 cart_count,
  60. 60. 0 cart_amount,
  61. 61. 0 order_count,
  62. 62. 0 order_amount,
  63. 63. 0 payment_count,
  64. 64. 0 payment_amount
  65. 65. from
  66. 66. tmp_login
  67. 67. union all
  68. 68. select
  69. 69. user_id,
  70. 70. 0 login_count,
  71. 71. cart_count,
  72. 72. cart_amount,
  73. 73. 0 order_count,
  74. 74. 0 order_amount,
  75. 75. 0 payment_count,
  76. 76. 0 payment_amount
  77. 77. from
  78. 78. tmp_cart
  79. 79. union all
  80. 80. select
  81. 81. user_id,
  82. 82. 0 login_count,
  83. 83. 0 cart_count,
  84. 84. 0 cart_amount,
  85. 85. order_count,
  86. 86. order_amount,
  87. 87. 0 payment_count,
  88. 88. 0 payment_amount
  89. 89. from tmp_order
  90. 90. union all
  91. 91. select
  92. 92. user_id,
  93. 93. 0 login_count,
  94. 94. 0 cart_count,
  95. 95. 0 cart_amount,
  96. 96. 0 order_count,
  97. 97. 0 order_amount,
  98. 98. payment_count,
  99. 99. payment_amount
  100. 100. from tmp_payment
  101. 101. ) user_actions
  102. 102. group by user_id;

3)查询加载结果
hive (gmall)> select * from dws_user_action_daycount where dt=‘2020-03-10’;

1.4.2 每日商品行为

1)建表语句

  1. 1. drop table if exists dws_sku_action_daycount;
  2. 2. create external table dws_sku_action_daycount
  3. 3. (
  4. 4. sku_id string comment 'sku_id',
  5. 5. order_count bigint comment '被下单次数',
  6. 6. order_num bigint comment '被下单件数',
  7. 7. order_amount decimal(16,2) comment '被下单金额',
  8. 8. payment_count bigint comment '被支付次数',
  9. 9. payment_num bigint comment '被支付件数',
  10. 10. payment_amount decimal(16,2) comment '被支付金额',
  11. 11. refund_count bigint comment '被退款次数',
  12. 12. refund_num bigint comment '被退款件数',
  13. 13. refund_amount decimal(16,2) comment '被退款金额',
  14. 14. cart_count bigint comment '被加入购物车次数',
  15. 15. cart_num bigint comment '被加入购物车件数',
  16. 16. favor_count bigint comment '被收藏次数',
  17. 17. appraise_good_count bigint comment '好评数',
  18. 18. appraise_mid_count bigint comment '中评数',
  19. 19. appraise_bad_count bigint comment '差评数',
  20. 20. appraise_default_count bigint comment '默认评价数'
  21. 21. ) COMMENT '每日商品行为'
  22. 22. PARTITIONED BY (`dt` string)
  23. 23. stored as parquet
  24. 24. location '/warehouse/gmall/dws/dws_sku_action_daycount/'
  25. 25. tblproperties ("parquet.compression"="lzo");
  26. 26.
  27. 复制代码

2)数据装载
注意:如果是 23 点 59 下单,支付日期跨天。需要从订单详情里面取出支付时间是今天,订单时间是昨天或者今天的订单

  1. 1. with
  2. 2. tmp_order as
  3. 3. (
  4. 4. select
  5. 5. sku_id,
  6. 6. count(*) order_count,
  7. 7. sum(sku_num) order_num,
  8. 8. sum(total_amount) order_amount
  9. 9. from dwd_fact_order_detail
  10. 10. where dt='2020-03-10'
  11. 11. group by sku_id
  12. 12. ),
  13. 13. tmp_payment as
  14. 14. (
  15. 15. select
  16. 16. sku_id,
  17. 17. count(*) payment_count,
  18. 18. sum(sku_num) payment_num,
  19. 19. sum(total_amount) payment_amount
  20. 20. from dwd_fact_order_detail
  21. 21. where dt='2020-03-10'
  22. 22. and order_id in
  23. 23. (
  24. 24. select
  25. 25. id
  26. 26. from dwd_fact_order_info
  27. 27. where (dt='2020-03-10' or dt=date_add('2020-03-10',-1))
  28. 28. and date_format(payment_time,'yyyy-MM-dd')='2020-03-10'
  29. 29. )
  30. 30. group by sku_id
  31. 31. ),
  32. 32. tmp_refund as
  33. 33. (
  34. 34. select
  35. 35. sku_id,
  36. 36. count(*) refund_count,
  37. 37. sum(refund_num) refund_num,
  38. 38. sum(refund_amount) refund_amount
  39. 39. from dwd_fact_order_refund_info
  40. 40. where dt='2020-03-10'
  41. 41. group by sku_id
  42. 42. ),
  43. 43. tmp_cart as
  44. 44. (
  45. 45. select
  46. 46. sku_id,
  47. 47. count(*) cart_count,
  48. 48. sum(sku_num) cart_num
  49. 49. from dwd_fact_cart_info
  50. 50. where dt='2020-03-10'
  51. 51. and date_format(create_time,'yyyy-MM-dd')='2020-03-10'
  52. 52. group by sku_id
  53. 53. ),
  54. 54. tmp_favor as
  55. 55. (
  56. 56. select
  57. 57. sku_id,
  58. 58. count(*) favor_count
  59. 59. from dwd_fact_favor_info
  60. 60. where dt='2020-03-10'
  61. 61. and date_format(create_time,'yyyy-MM-dd')='2020-03-10'
  62. 62. group by sku_id
  63. 63. ),
  64. 64. tmp_appraise as
  65. 65. (
  66. 66. select
  67. 67. sku_id,
  68. 68. sum(if(appraise='1201',1,0)) appraise_good_count,
  69. 69. sum(if(appraise='1202',1,0)) appraise_mid_count,
  70. 70. sum(if(appraise='1203',1,0)) appraise_bad_count,
  71. 71. sum(if(appraise='1204',1,0)) appraise_default_count
  72. 72. from dwd_fact_comment_info
  73. 73. where dt='2020-03-10'
  74. 74. group by sku_id
  75. 75. )
  76. 76. insert overwrite table dws_sku_action_daycount partition(dt='2020-03-10')
  77. 77. select
  78. 78. sku_id,
  79. 79. sum(order_count),
  80. 80. sum(order_num),
  81. 81. sum(order_amount),
  82. 82. sum(payment_count),
  83. 83. sum(payment_num),
  84. 84. sum(payment_amount),
  85. 85. sum(refund_count),
  86. 86. sum(refund_num),
  87. 87. sum(refund_amount),
  88. 88. sum(cart_count),
  89. 89. sum(cart_num),
  90. 90. sum(favor_count),
  91. 91. sum(appraise_good_count),
  92. 92. sum(appraise_mid_count),
  93. 93. sum(appraise_bad_count),
  94. 94. sum(appraise_default_count)
  95. 95. from
  96. 96. (
  97. 97. select
  98. 98. sku_id,
  99. 99. order_count,
  100. 100. order_num,
  101. 101. order_amount,
  102. 102. 0 payment_count,
  103. 103. 0 payment_num,
  104. 104. 0 payment_amount,
  105. 105. 0 refund_count,
  106. 106. 0 refund_num,
  107. 107. 0 refund_amount,
  108. 108. 0 cart_count,
  109. 109. 0 cart_num,
  110. 110. 0 favor_count,
  111. 111. 0 appraise_good_count,
  112. 112. 0 appraise_mid_count,
  113. 113. 0 appraise_bad_count,
  114. 114. 0 appraise_default_count
  115. 115. from tmp_order
  116. 116. union all
  117. 117. select
  118. 118. sku_id,
  119. 119. 0 order_count,
  120. 120. 0 order_num,
  121. 121. 0 order_amount,
  122. 122. payment_count,
  123. 123. payment_num,
  124. 124. payment_amount,
  125. 125. 0 refund_count,
  126. 126. 0 refund_num,
  127. 127. 0 refund_amount,
  128. 128. 0 cart_count,
  129. 129. 0 cart_num,
  130. 130. 0 favor_count,
  131. 131. 0 appraise_good_count,
  132. 132. 0 appraise_mid_count,
  133. 133. 0 appraise_bad_count,
  134. 134. 0 appraise_default_count
  135. 135. from tmp_payment
  136. 136. union all
  137. 137. select
  138. 138. sku_id,
  139. 139. 0 order_count,
  140. 140. 0 order_num,
  141. 141. 0 order_amount,
  142. 142. 0 payment_count,
  143. 143. 0 payment_num,
  144. 144. 0 payment_amount,
  145. 145. refund_count,
  146. 146. refund_num,
  147. 147. refund_amount,
  148. 148. 0 cart_count,
  149. 149. 0 cart_num,
  150. 150. 0 favor_count,
  151. 151. 0 appraise_good_count,
  152. 152. 0 appraise_mid_count,
  153. 153. 0 appraise_bad_count,
  154. 154. 0 appraise_default_count
  155. 155. from tmp_refund
  156. 156. union all
  157. 157. select
  158. 158. sku_id,
  159. 159. 0 order_count,
  160. 160. 0 order_num,
  161. 161. 0 order_amount,
  162. 162. 0 payment_count,
  163. 163. 0 payment_num,
  164. 164. 0 payment_amount,
  165. 165. 0 refund_count,
  166. 166. 0 refund_num,
  167. 167. 0 refund_amount,
  168. 168. cart_count,
  169. 169. cart_num,
  170. 170. 0 favor_count,
  171. 171. 0 appraise_good_count,
  172. 172. 0 appraise_mid_count,
  173. 173. 0 appraise_bad_count,
  174. 174. 0 appraise_default_count
  175. 175. from tmp_cart
  176. 176. union all
  177. 177. select
  178. 178. sku_id,
  179. 179. 0 order_count,
  180. 180. 0 order_num,
  181. 181. 0 order_amount,
  182. 182. 0 payment_count,
  183. 183. 0 payment_num,
  184. 184. 0 payment_amount,
  185. 185. 0 refund_count,
  186. 186. 0 refund_num,
  187. 187. 0 refund_amount,
  188. 188. 0 cart_count,
  189. 189. 0 cart_num,
  190. 190. favor_count,
  191. 191. 0 appraise_good_count,
  192. 192. 0 appraise_mid_count,
  193. 193. 0 appraise_bad_count,
  194. 194. 0 appraise_default_count
  195. 195. from tmp_favor
  196. 196. union all
  197. 197. select
  198. 198. sku_id,
  199. 199. 0 order_count,
  200. 200. 0 order_num,
  201. 201. 0 order_amount,
  202. 202. 0 payment_count,
  203. 203. 0 payment_num,
  204. 204. 0 payment_amount,
  205. 205. 0 refund_count,
  206. 206. 0 refund_num,
  207. 207. 0 refund_amount,
  208. 208. 0 cart_count,
  209. 209. 0 cart_num,
  210. 210. 0 favor_count,
  211. 211. appraise_good_count,
  212. 212. appraise_mid_count,
  213. 213. appraise_bad_count,
  214. 214. appraise_default_count
  215. 215. from tmp_appraise
  216. 216. )tmp
  217. 217. group by sku_id;
  218. 218.
  219. 复制代码

3)查询加载结果

  1. 1. select * from dws_sku_action_daycount where dt='2020-03-10';

1.4.5 每日购买行为

数仓项目(九)数仓搭建 - DWS 层 - 图3

1)建表语句

  1. 1. drop table if exists dws_sale_detail_daycount;
  2. 2. create external table dws_sale_detail_daycount
  3. 3. (
  4. 4. user_id string comment '用户 id',
  5. 5. sku_id string comment '商品 id',
  6. 6. user_gender string comment '用户性别',
  7. 7. user_age string comment '用户年龄',
  8. 8. user_level string comment '用户等级',
  9. 9. order_price decimal(10,2) comment '商品价格',
  10. 10. sku_name string comment '商品名称',
  11. 11. sku_tm_id string comment '品牌 id',
  12. 12. sku_category3_id string comment '商品三级品类 id',
  13. 13. sku_category2_id string comment '商品二级品类 id',
  14. 14. sku_category1_id string comment '商品一级品类 id',
  15. 15. sku_category3_name string comment '商品三级品类名称',
  16. 16. sku_category2_name string comment '商品二级品类名称',
  17. 17. sku_category1_name string comment '商品一级品类名称',
  18. 18. spu_id string comment '商品 spu',
  19. 19. sku_num int comment '购买个数',
  20. 20. order_count bigint comment '当日下单单数',
  21. 21. order_amount decimal(16,2) comment '当日下单金额'
  22. 22. ) COMMENT '每日购买行为'
  23. 23. PARTITIONED BY (`dt` string)
  24. 24. stored as parquet
  25. 25. location '/warehouse/gmall/dws/dws_sale_detail_daycount/'
  26. 26. tblproperties ("parquet.compression"="lzo");

2)数据装载

  1. 1. insert overwrite table dws_sale_detail_daycount partition(dt='2020-03-10')
  2. 2. select
  3. 3. op.user_id,
  4. 4. op.sku_id,
  5. 5. ui.gender,
  6. 6. months_between('2020-03-10', ui.birthday)/12 age,
  7. 7. ui.user_level,
  8. 8. si.price,
  9. 9. si.sku_name,
  10. 10. si.tm_id,
  11. 11. si.category3_id,
  12. 12. si.category2_id,
  13. 13. si.category1_id,
  14. 14. si.category3_name,
  15. 15. si.category2_name,
  16. 16. si.category1_name,
  17. 17. si.spu_id,
  18. 18. op.sku_num,
  19. 19. op.order_count,
  20. 20. op.order_amount
  21. 21. from
  22. 22. (
  23. 23. select
  24. 24. user_id,
  25. 25. sku_id,
  26. 26. sum(sku_num) sku_num,
  27. 27. count(*) order_count,
  28. 28. sum(total_amount) order_amount
  29. 29. from dwd_fact_order_detail
  30. 30. where dt='2020-03-10'
  31. 31. group by user_id, sku_id
  32. 32. )op
  33. 33. join
  34. 34. (
  35. 35. select
  36. 36. *
  37. 37. from dwd_dim_user_info_his
  38. 38. where end_date='9999-99-99'
  39. 39. )ui on op.user_id = ui.id
  40. 40. join
  41. 41. (
  42. 42. select
  43. 43. *
  44. 44. from dwd_dim_sku_info
  45. 45. where dt='2020-03-10'
  46. 46. )si on op.sku_id = si.id;

3)查询加载结果

  1. 1. select * from dws_sale_detail_daycount where dt='2020-03-10';
  2. 2.
  3. 复制代码

1.5 DWS 层数据导入脚本

1)vim dwd_to_dws.sh
在脚本中填写如下内容

  1. 1. #!/bin/bash
  2. 2. APP=gmall
  3. 3. hive=/opt/modules/hive/bin/hive
  4. 4. # 如果是输入的日期按照取输入日期;如果没输入日期取当前时间的前一天
  5. 5. if [ -n "$1" ] ;then
  6. 6. do_date=$1
  7. 7. else
  8. 8. do_date=`date -d "-1 day" +%F`
  9. 9. fi
  10. 10. sql="
  11. 11. insert overwrite table ${APP}.dws_uv_detail_daycount partition(dt='$do_date')
  12. 12. select
  13. 13. mid_id,
  14. 14. concat_ws('|', collect_set(user_id)) user_id,
  15. 15. concat_ws('|', collect_set(version_code)) version_code,
  16. 16. concat_ws('|', collect_set(version_name)) version_name,
  17. 17. concat_ws('|', collect_set(lang))lang,
  18. 18. concat_ws('|', collect_set(source)) source,
  19. 19. concat_ws('|', collect_set(os)) os,
  20. 20. concat_ws('|', collect_set(area)) area,
  21. 21. concat_ws('|', collect_set(model)) model,
  22. 22. concat_ws('|', collect_set(brand)) brand,
  23. 23. concat_ws('|', collect_set(sdk_version)) sdk_version,
  24. 24. concat_ws('|', collect_set(gmail)) gmail,
  25. 25. concat_ws('|', collect_set(height_width)) height_width,
  26. 26. concat_ws('|', collect_set(app_time)) app_time,
  27. 27. concat_ws('|', collect_set(network)) network,
  28. 28. concat_ws('|', collect_set(lng)) lng,
  29. 29. concat_ws('|', collect_set(lat)) lat,
  30. 30. count(*) login_count
  31. 31. from ${APP}.dwd_start_log
  32. 32. where dt='$do_date'
  33. 33. group by mid_id;
  34. 34. with
  35. 35. tmp_login as
  36. 36. (
  37. 37. select
  38. 38. user_id,
  39. 39. count(*) login_count
  40. 40. from ${APP}.dwd_start_log
  41. 41. where dt='$do_date'
  42. 42. and user_id is not null
  43. 43. group by user_id
  44. 44. ),
  45. 45. tmp_cart as
  46. 46. (
  47. 47. select
  48. 48. user_id,
  49. 49. count(*) cart_count,
  50. 50. sum(cart_price*sku_num) cart_amount
  51. 51. from ${APP}.dwd_fact_cart_info
  52. 52. where dt='$do_date'
  53. 53. and user_id is not null
  54. 54. and date_format(create_time,'yyyy-MM-dd')='$do_date'
  55. 55. group by user_id
  56. 56. ),
  57. 57. tmp_order as
  58. 58. (
  59. 59. select
  60. 60. user_id,
  61. 61. count(*) order_count,
  62. 62. sum(final_total_amount) order_amount
  63. 63. from ${APP}.dwd_fact_order_info
  64. 64. where dt='$do_date'
  65. 65. group by user_id
  66. 66. ) ,
  67. 67. tmp_payment as
  68. 68. (
  69. 69. select
  70. 70. user_id,
  71. 71. count(*) payment_count,
  72. 72. sum(payment_amount) payment_amount
  73. 73. from ${APP}.dwd_fact_payment_info
  74. 74. where dt='$do_date'
  75. 75. group by user_id
  76. 76. )
  77. 77.
  78. 78.
  79. 79. insert overwrite table ${APP}.dws_user_action_daycount partition(dt='$do_date')
  80. 80. select
  81. 81. user_actions.user_id,
  82. 82. sum(user_actions.login_count),
  83. 83. sum(user_actions.cart_count),
  84. 84. sum(user_actions.cart_amount),
  85. 85. sum(user_actions.order_count),
  86. 86. sum(user_actions.order_amount),
  87. 87. sum(user_actions.payment_count),
  88. 88. sum(user_actions.payment_amount)
  89. 89. from
  90. 90. (
  91. 91. select
  92. 92. user_id,
  93. 93. login_count,
  94. 94. 0 cart_count,
  95. 95. 0 cart_amount,
  96. 96. 0 order_count,
  97. 97. 0 order_amount,
  98. 98. 0 payment_count,
  99. 99. 0 payment_amount
  100. 100. from
  101. 101. tmp_login
  102. 102. union all
  103. 103. select
  104. 104. user_id,
  105. 105. 0 login_count,
  106. 106. cart_count,
  107. 107. cart_amount,
  108. 108. 0 order_count,
  109. 109. 0 order_amount,
  110. 110. 0 payment_count,
  111. 111. 0 payment_amount
  112. 112. from
  113. 113. tmp_cart
  114. 114. union all
  115. 115. select
  116. 116. user_id,
  117. 117. 0 login_count,
  118. 118. 0 cart_count,
  119. 119. 0 cart_amount,
  120. 120. order_count,
  121. 121. order_amount,
  122. 122. 0 payment_count,
  123. 123. 0 payment_amount
  124. 124. from tmp_order
  125. 125. union all
  126. 126. select
  127. 127. user_id,
  128. 128. 0 login_count,
  129. 129. 0 cart_count,
  130. 130. 0 cart_amount,
  131. 131. 0 order_count,
  132. 132. 0 order_amount,
  133. 133. payment_count,
  134. 134. payment_amount
  135. 135. from tmp_payment
  136. 136. ) user_actions
  137. 137. group by user_id;
  138. 138. with
  139. 139. tmp_order as
  140. 140. (
  141. 141. select
  142. 142. sku_id,
  143. 143. count(*) order_count,
  144. 144. sum(sku_num) order_num,
  145. 145. sum(total_amount) order_amount
  146. 146. from ${APP}.dwd_fact_order_detail
  147. 147. where dt='$do_date'
  148. 148. group by sku_id
  149. 149. ),
  150. 150. tmp_payment as
  151. 151. (
  152. 152. select
  153. 153. sku_id,
  154. 154. count(*) payment_count,
  155. 155. sum(sku_num) payment_num,
  156. 156. sum(total_amount) payment_amount
  157. 157. from ${APP}.dwd_fact_order_detail
  158. 158. where dt='$do_date'
  159. 159. and order_id in
  160. 160. (
  161. 161. select
  162. 162. id
  163. 163. from ${APP}.dwd_fact_order_info
  164. 164. where (dt='$do_date' or dt=date_add('$do_date',-1))
  165. 165. and date_format(payment_time,'yyyy-MM-dd')='$do_date'
  166. 166. )
  167. 167. group by sku_id
  168. 168. ),
  169. 169. tmp_refund as
  170. 170. (
  171. 171. select
  172. 172. sku_id,
  173. 173. count(*) refund_count,
  174. 174. sum(refund_num) refund_num,
  175. 175. sum(refund_amount) refund_amount
  176. 176. from ${APP}.dwd_fact_order_refund_info
  177. 177. where dt='$do_date'
  178. 178. group by sku_id
  179. 179. ),
  180. 180. tmp_cart as
  181. 181. (
  182. 182. select
  183. 183. sku_id,
  184. 184. count(*) cart_count,
  185. 185. sum(sku_num) cart_num
  186. 186. from ${APP}.dwd_fact_cart_info
  187. 187. where dt='$do_date'
  188. 188. and date_format(create_time,'yyyy-MM-dd')='$do_date'
  189. 189. group by sku_id
  190. 190. ),
  191. 191. tmp_favor as
  192. 192. (
  193. 193. select
  194. 194. sku_id,
  195. 195. count(*) favor_count
  196. 196. from ${APP}.dwd_fact_favor_info
  197. 197. where dt='$do_date'
  198. 198. and date_format(create_time,'yyyy-MM-dd')='$do_date'
  199. 199. group by sku_id
  200. 200. ),
  201. 201. tmp_appraise as
  202. 202. (
  203. 203. select
  204. 204. sku_id,
  205. 205. sum(if(appraise='1201',1,0)) appraise_good_count,
  206. 206. sum(if(appraise='1202',1,0)) appraise_mid_count,
  207. 207. sum(if(appraise='1203',1,0)) appraise_bad_count,
  208. 208. sum(if(appraise='1204',1,0)) appraise_default_count
  209. 209. from ${APP}.dwd_fact_comment_info
  210. 210. where dt='$do_date'
  211. 211. group by sku_id
  212. 212. )
  213. 213.
  214. 214.
  215. 215. insert overwrite table ${APP}.dws_sku_action_daycount partition(dt='$do_date')
  216. 216. select
  217. 217. sku_id,
  218. 218. sum(order_count),
  219. 219. sum(order_num),
  220. 220. sum(order_amount),
  221. 221. sum(payment_count),
  222. 222. sum(payment_num),
  223. 223. sum(payment_amount),
  224. 224. sum(refund_count),
  225. 225. sum(refund_num),
  226. 226. sum(refund_amount),
  227. 227. sum(cart_count),
  228. 228. sum(cart_num),
  229. 229. sum(favor_count),
  230. 230. sum(appraise_good_count),
  231. 231. sum(appraise_mid_count),
  232. 232. sum(appraise_bad_count),
  233. 233. sum(appraise_default_count)
  234. 234. from
  235. 235. (
  236. 236. select
  237. 237. sku_id,
  238. 238. order_count,
  239. 239. order_num,
  240. 240. order_amount,
  241. 241. 0 payment_count,
  242. 242. 0 payment_num,
  243. 243. 0 payment_amount,
  244. 244. 0 refund_count,
  245. 245. 0 refund_num,
  246. 246. 0 refund_amount,
  247. 247. 0 cart_count,
  248. 248. 0 cart_num,
  249. 249. 0 favor_count,
  250. 250. 0 appraise_good_count,
  251. 251. 0 appraise_mid_count,
  252. 252. 0 appraise_bad_count,
  253. 253. 0 appraise_default_count
  254. 254. from tmp_order
  255. 255. union all
  256. 256. select
  257. 257. sku_id,
  258. 258. 0 order_count,
  259. 259. 0 order_num,
  260. 260. 0 order_amount,
  261. 261. payment_count,
  262. 262. payment_num,
  263. 263. payment_amount,
  264. 264. 0 refund_count,
  265. 265. 0 refund_num,
  266. 266. 0 refund_amount,
  267. 267. 0 cart_count,
  268. 268. 0 cart_num,
  269. 269. 0 favor_count,
  270. 270. 0 appraise_good_count,
  271. 271. 0 appraise_mid_count,
  272. 272. 0 appraise_bad_count,
  273. 273. 0 appraise_default_count
  274. 274. from tmp_payment
  275. 275. union all
  276. 276. select
  277. 277. sku_id,
  278. 278. 0 order_count,
  279. 279. 0 order_num,
  280. 280. 0 order_amount,
  281. 281. 0 payment_count,
  282. 282. 0 payment_num,
  283. 283. 0 payment_amount,
  284. 284. refund_count,
  285. 285. refund_num,
  286. 286. refund_amount,
  287. 287. 0 cart_count,
  288. 288. 0 cart_num,
  289. 289. 0 favor_count,
  290. 290. 0 appraise_good_count,
  291. 291. 0 appraise_mid_count,
  292. 292. 0 appraise_bad_count,
  293. 293. 0 appraise_default_count
  294. 294. from tmp_refund
  295. 295. union all
  296. 296. select
  297. 297. sku_id,
  298. 298. 0 order_count,
  299. 299. 0 order_num,
  300. 300. 0 order_amount,
  301. 301. 0 payment_count,
  302. 302. 0 payment_num,
  303. 303. 0 payment_amount,
  304. 304. 0 refund_count,
  305. 305. 0 refund_num,
  306. 306. 0 refund_amount,
  307. 307. cart_count,
  308. 308. cart_num,
  309. 309. 0 favor_count,
  310. 310. 0 appraise_good_count,
  311. 311. 0 appraise_mid_count,
  312. 312. 0 appraise_bad_count,
  313. 313. 0 appraise_default_count
  314. 314. from tmp_cart
  315. 315. union all
  316. 316. select
  317. 317. sku_id,
  318. 318. 0 order_count,
  319. 319. 0 order_num,
  320. 320. 0 order_amount,
  321. 321. 0 payment_count,
  322. 322. 0 payment_num,
  323. 323. 0 payment_amount,
  324. 324. 0 refund_count,
  325. 325. 0 refund_num,
  326. 326. 0 refund_amount,
  327. 327. 0 cart_count,
  328. 328. 0 cart_num,
  329. 329. favor_count,
  330. 330. 0 appraise_good_count,
  331. 331. 0 appraise_mid_count,
  332. 332. 0 appraise_bad_count,
  333. 333. 0 appraise_default_count
  334. 334. from tmp_favor
  335. 335. union all
  336. 336. select
  337. 337. sku_id,
  338. 338. 0 order_count,
  339. 339. 0 order_num,
  340. 340. 0 order_amount,
  341. 341. 0 payment_count,
  342. 342. 0 payment_num,
  343. 343. 0 payment_amount,
  344. 344. 0 refund_count,
  345. 345. 0 refund_num,
  346. 346. 0 refund_amount,
  347. 347. 0 cart_count,
  348. 348. 0 cart_num,
  349. 349. 0 favor_count,
  350. 350. appraise_good_count,
  351. 351. appraise_mid_count,
  352. 352. appraise_bad_count,
  353. 353. appraise_default_count
  354. 354. from tmp_appraise
  355. 355. )tmp
  356. 356. group by sku_id;
  357. 357.
  358. 358.
  359. 359. insert overwrite table ${APP}.dws_sale_detail_daycount partition(dt='$do_date')
  360. 360. select
  361. 361. op.user_id,
  362. 362. op.sku_id,
  363. 363. ui.gender,
  364. 364. months_between('$do_date', ui.birthday)/12 age,
  365. 365. ui.user_level,
  366. 366. si.price,
  367. 367. si.sku_name,
  368. 368. si.tm_id,
  369. 369. si.category3_id,
  370. 370. si.category2_id,
  371. 371. si.category1_id,
  372. 372. si.category3_name,
  373. 373. si.category2_name,
  374. 374. si.category1_name,
  375. 375. si.spu_id,
  376. 376. op.sku_num,
  377. 377. op.order_count,
  378. 378. op.order_amount
  379. 379. from
  380. 380. (
  381. 381. select
  382. 382. user_id,
  383. 383. sku_id,
  384. 384. sum(sku_num) sku_num,
  385. 385. count(*) order_count,
  386. 386. sum(total_amount) order_amount
  387. 387. from ${APP}.dwd_fact_order_detail
  388. 388. where dt='$do_date'
  389. 389. group by user_id, sku_id
  390. 390. )op
  391. 391. join
  392. 392. (
  393. 393. select
  394. 394. *
  395. 395. from ${APP}.dwd_dim_user_info_his
  396. 396. where end_date='9999-99-99'
  397. 397. )ui on op.user_id = ui.id
  398. 398. join
  399. 399. (
  400. 400. select
  401. 401. *
  402. 402. from ${APP}.dwd_dim_sku_info
  403. 403. where dt='$do_date'
  404. 404. )si on op.sku_id = si.id;
  405. 405. "
  406. 406. $hive -e "$sql"
  407. 407.
  408. 复制代码

2)增加脚本执行权限

  1. 1. chmod 770 dwd_to_dws.sh

3)执行脚本导入数据

  1. 1. dwd_to_dws.sh 2020-03-11

4)查看导入数据

  1. 1. select * from dws_uv_detail_daycount where dt='2020-03-11';
  2. 2. select * from dws_user_action_daycount where dt='2020-03-11';
  3. 3. select * from dws_sku_action_daycount where dt='2020-03-11';
  4. 4. select * from dws_sale_detail_daycount where dt='2020-03-11';
  5. 5.
  6. 复制代码

二、数仓搭建-DWT 层

2.1 设备主题宽表

数仓项目(九)数仓搭建 - DWS 层 - 图4

1)建表语句

  1. 1. drop table if exists dwt_uv_topic;
  2. 2. create external table dwt_uv_topic
  3. 3. (
  4. 4. `mid_id` string COMMENT '设备唯一标识',
  5. 5. `user_id` string COMMENT '用户标识',
  6. 6. `version_code` string COMMENT '程序版本号',
  7. 7. `version_name` string COMMENT '程序版本名',
  8. 8. `lang` string COMMENT '系统语言',
  9. 9. `source` string COMMENT '渠道号',
  10. 10. `os` string COMMENT '安卓系统版本',
  11. 11. `area` string COMMENT '区域',
  12. 12. `model` string COMMENT '手机型号',
  13. 13. `brand` string COMMENT '手机品牌',
  14. 14. `sdk_version` string COMMENT 'sdkVersion',
  15. 15. `gmail` string COMMENT 'gmail',
  16. 16. `height_width` string COMMENT '屏幕宽高',
  17. 17. `app_time` string COMMENT '客户端日志产生时的时间',
  18. 18. `network` string COMMENT '网络模式',
  19. 19. `lng` string COMMENT '经度',
  20. 20. `lat` string COMMENT '纬度',
  21. 21. `login_date_first` string comment '首次活跃时间',
  22. 22. `login_date_last` string comment '末次活跃时间',
  23. 23. `login_day_count` bigint comment '当日活跃次数',
  24. 24. `login_count` bigint comment '累积活跃天数'
  25. 25. )
  26. 26. stored as parquet
  27. 27. location '/warehouse/gmall/dwt/dwt_uv_topic';

2)数据装载

  1. 1. insert overwrite table dwt_uv_topic
  2. 2. select
  3. 3. nvl(new.mid_id,old.mid_id),
  4. 4. nvl(new.user_id,old.user_id),
  5. 5. nvl(new.version_code,old.version_code),
  6. 6. nvl(new.version_name,old.version_name),
  7. 7. nvl(new.lang,old.lang),
  8. 8. nvl(new.source,old.source),
  9. 9. nvl(new.os,old.os),
  10. 10. nvl(new.area,old.area),
  11. 11. nvl(new.model,old.model),
  12. 12. nvl(new.brand,old.brand),
  13. 13. nvl(new.sdk_version,old.sdk_version),
  14. 14. nvl(new.gmail,old.gmail),
  15. 15. nvl(new.height_width,old.height_width),
  16. 16. nvl(new.app_time,old.app_time),
  17. 17. nvl(new.network,old.network),
  18. 18. nvl(new.lng,old.lng),
  19. 19. nvl(new.lat,old.lat),
  20. 20. if(old.mid_id is null,'2020-03-10',old.login_date_first),
  21. 21. if(new.mid_id is not null,'2020-03-10',old.login_date_last),
  22. 22. if(new.mid_id is not null, new.login_count,0),
  23. 23. nvl(old.login_count,0)+if(new.login_count>0,1,0)
  24. 24. from
  25. 25. (
  26. 26. select
  27. 27. *
  28. 28. from dwt_uv_topic
  29. 29. )old
  30. 30. full outer join
  31. 31. (
  32. 32. select
  33. 33. *
  34. 34. from dws_uv_detail_daycount
  35. 35. where dt='2020-03-10'
  36. 36. )new
  37. 37. on old.mid_id=new.mid_id;
  38. 38.
  39. 复制代码

3)查询加载结果

  1. 1. select * from dwt_uv_topic limit 5;
  2. 2.
  3. 复制代码

2.2 会员主题宽表

宽表字段怎么来?维度关联的事实表度量值+开头、结尾+累积+累积一个时间段

数仓项目(九)数仓搭建 - DWS 层 - 图5

1)建表语句

  1. 1. drop table if exists dwt_user_topic;
  2. 2. create external table dwt_user_topic
  3. 3. (
  4. 4. user_id string comment '用户 id',
  5. 5. login_date_first string comment '首次登录时间',
  6. 6. login_date_last string comment '末次登录时间',
  7. 7. login_count bigint comment '累积登录天数',
  8. 8. login_last_30d_count bigint comment '最近 30 日登录天数',
  9. 9. order_date_first string comment '首次下单时间',
  10. 10. order_date_last string comment '末次下单时间',
  11. 11. order_count bigint comment '累积下单次数',
  12. 12. order_amount decimal(16,2) comment '累积下单金额',
  13. 13. order_last_30d_count bigint comment '最近 30 日下单次数',
  14. 14. order_last_30d_amount bigint comment '最近 30 日下单金额',
  15. 15. payment_date_first string comment '首次支付时间',
  16. 16. payment_date_last string comment '末次支付时间',
  17. 17. payment_count decimal(16,2) comment '累积支付次数',
  18. 18. payment_amount decimal(16,2) comment '累积支付金额',
  19. 19. payment_last_30d_count decimal(16,2) comment '最近 30 日支付次数',
  20. 20. payment_last_30d_amount decimal(16,2) comment '最近 30 日支付金额'
  21. 21. )COMMENT '用户主题宽表'
  22. 22. stored as parquet
  23. 23. location '/warehouse/gmall/dwt/dwt_user_topic/'
  24. 24. tblproperties ("parquet.compression"="lzo");

2)数据装载

  1. 1. insert overwrite table dwt_user_topic
  2. 2. select
  3. 3. nvl(new.user_id,old.user_id),
  4. 4. if(old.login_date_first is null and
  5. 5. new.login_count>0,'2020-03-10',old.login_date_first),
  6. 6. if(new.login_count>0,'2020-03-10',old.login_date_last),
  7. 7. nvl(old.login_count,0)+if(new.login_count>0,1,0),
  8. 8. nvl(new.login_last_30d_count,0),
  9. 9. if(old.order_date_first is null and
  10. 10. new.order_count>0,'2020-03-10',old.order_date_first),
  11. 11. if(new.order_count>0,'2020-03-10',old.order_date_last),
  12. 12. nvl(old.order_count,0)+nvl(new.order_count,0),
  13. 13. nvl(old.order_amount,0)+nvl(new.order_amount,0),
  14. 14. nvl(new.order_last_30d_count,0),
  15. 15. nvl(new.order_last_30d_amount,0),
  16. 16. if(old.payment_date_first is null and
  17. 17. new.payment_count>0,'2020-03-10',old.payment_date_first),
  18. 18. if(new.payment_count>0,'2020-03-10',old.payment_date_last),
  19. 19. nvl(old.payment_count,0)+nvl(new.payment_count,0),
  20. 20. nvl(old.payment_amount,0)+nvl(new.payment_amount,0),
  21. 21. nvl(new.payment_last_30d_count,0),
  22. 22. nvl(new.payment_last_30d_amount,0)
  23. 23. from
  24. 24. dwt_user_topic old
  25. 25. full outer join
  26. 26. (
  27. 27. select
  28. 28. user_id,
  29. 29. sum(if(dt='2020-03-10',login_count,0)) login_count,
  30. 30. sum(if(dt='2020-03-10',order_count,0)) order_count,
  31. 31. sum(if(dt='2020-03-10',order_amount,0)) order_amount,
  32. 32. sum(if(dt='2020-03-10',payment_count,0)) payment_count,
  33. 33. sum(if(dt='2020-03-10',payment_amount,0)) payment_amount,
  34. 34. sum(if(login_count>0,1,0)) login_last_30d_count,
  35. 35. sum(order_count) order_last_30d_count,
  36. 36. sum(order_amount) order_last_30d_amount,
  37. 37. sum(payment_count) payment_last_30d_count,
  38. 38. sum(payment_amount) payment_last_30d_amount
  39. 39. from dws_user_action_daycount
  40. 40. where dt>=date_add( '2020-03-10',-30)
  41. 41. group by user_id
  42. 42. )new
  43. 43. on old.user_id=new.user_id;

3)查询加载结果

  1. 1. select * from dwt_user_topic limit 5;
  2. 2.
  3. 复制代码

2.3 商品主题宽表

1)建表语句

  1. 1. create external table dwt_sku_topic
  2. 2. (
  3. 3. sku_id string comment 'sku_id',
  4. 4. spu_id string comment 'spu_id',
  5. 5. order_last_30d_count bigint comment '最近 30 日被下单次数',
  6. 6. order_last_30d_num bigint comment '最近 30 日被下单件数',
  7. 7. order_last_30d_amount decimal(16,2) comment '最近 30 日被下单金额',
  8. 8. order_count bigint comment '累积被下单次数',
  9. 9. order_num bigint comment '累积被下单件数',
  10. 10. order_amount decimal(16,2) comment '累积被下单金额',
  11. 11. payment_last_30d_count bigint comment '最近 30 日被支付次数',
  12. 12. payment_last_30d_num bigint comment '最近 30 日被支付件数',
  13. 13. payment_last_30d_amount decimal(16,2) comment '最近 30 日被支付金额',
  14. 14. payment_count bigint comment '累积被支付次数',
  15. 15. payment_num bigint comment '累积被支付件数',
  16. 16. payment_amount decimal(16,2) comment '累积被支付金额',
  17. 17. refund_last_30d_count bigint comment '最近三十日退款次数',
  18. 18. refund_last_30d_num bigint comment '最近三十日退款件数',
  19. 19. refund_last_30d_amount decimal(10,2) comment '最近三十日退款金额',
  20. 20. refund_count bigint comment '累积退款次数',
  21. 21. refund_num bigint comment '累积退款件数',
  22. 22. refund_amount decimal(10,2) comment '累积退款金额',
  23. 23. cart_last_30d_count bigint comment '最近 30 日被加入购物车次数',
  24. 24. cart_last_30d_num bigint comment '最近 30 日被加入购物车件数',
  25. 25. cart_count bigint comment '累积被加入购物车次数',
  26. 26. cart_num bigint comment '累积被加入购物车件数',
  27. 27. favor_last_30d_count bigint comment '最近 30 日被收藏次数',
  28. 28. favor_count bigint comment '累积被收藏次数',
  29. 29. appraise_last_30d_good_count bigint comment '最近 30 日好评数',
  30. 30. appraise_last_30d_mid_count bigint comment '最近 30 日中评数',
  31. 31. appraise_last_30d_bad_count bigint comment '最近 30 日差评数',
  32. 32. appraise_last_30d_default_count bigint comment '最近 30 日默认评价数',
  33. 33. appraise_good_count bigint comment '累积好评数',
  34. 34. appraise_mid_count bigint comment '累积中评数',
  35. 35. appraise_bad_count bigint comment '累积差评数',
  36. 36. appraise_default_count bigint comment '累积默认评价数'
  37. 37. )COMMENT '商品主题宽表'
  38. 38. stored as parquet
  39. 39. location '/warehouse/gmall/dwt/dwt_sku_topic/'
  40. 40. tblproperties ("parquet.compression"="lzo");

2)数据装载

  1. 1. insert overwrite table dwt_sku_topic
  2. 2. select
  3. 3. nvl(new.sku_id,old.sku_id), sku_info.spu_id,
  4. 4. nvl(new.order_count30,0),
  5. 5. nvl(new.order_num30,0),
  6. 6. nvl(new.order_amount30,0),
  7. 7. nvl(old.order_count,0) + nvl(new.order_count,0),
  8. 8. nvl(old.order_num,0) + nvl(new.order_num,0),
  9. 9. nvl(old.order_amount,0) + nvl(new.order_amount,0),
  10. 10. nvl(new.payment_count30,0),
  11. 11. nvl(new.payment_num30,0),
  12. 12. nvl(new.payment_amount30,0),
  13. 13. nvl(old.payment_count,0) + nvl(new.payment_count,0),
  14. 14. nvl(old.payment_num,0) + nvl(new.payment_count,0),
  15. 15. nvl(old.payment_amount,0) + nvl(new.payment_count,0),
  16. 16. nvl(new.refund_count30,0),
  17. 17. nvl(new.refund_num30,0),
  18. 18. nvl(new.refund_amount30,0),
  19. 19. nvl(old.refund_count,0) + nvl(new.refund_count,0),
  20. 20. nvl(old.refund_num,0) + nvl(new.refund_num,0),
  21. 21. nvl(old.refund_amount,0) + nvl(new.refund_amount,0),
  22. 22. nvl(new.cart_count30,0),
  23. 23. nvl(new.cart_num30,0),
  24. 24. nvl(old.cart_count,0) + nvl(new.cart_count,0),
  25. 25. nvl(old.cart_num,0) + nvl(new.cart_num,0),
  26. 26. nvl(new.favor_count30,0),
  27. 27. nvl(old.favor_count,0) + nvl(new.favor_count,0),
  28. 28. nvl(new.appraise_good_count30,0),
  29. 29. nvl(new.appraise_mid_count30,0),
  30. 30. nvl(new.appraise_bad_count30,0),
  31. 31. nvl(new.appraise_default_count30,0) ,
  32. 32. nvl(old.appraise_good_count,0) + nvl(new.appraise_good_count,0),
  33. 33. nvl(old.appraise_mid_count,0) + nvl(new.appraise_mid_count,0),
  34. 34. nvl(old.appraise_bad_count,0) + nvl(new.appraise_bad_count,0),
  35. 35. nvl(old.appraise_default_count,0) + nvl(new.appraise_default_count,0)
  36. 36. from
  37. 37. (
  38. 38. select
  39. 39. sku_id,
  40. 40. spu_id,
  41. 41. order_last_30d_count,
  42. 42. order_last_30d_num,
  43. 43. order_last_30d_amount,
  44. 44. order_count,
  45. 45. order_num,
  46. 46. order_amount ,
  47. 47. payment_last_30d_count,
  48. 48. payment_last_30d_num,
  49. 49. payment_last_30d_amount,
  50. 50. payment_count,
  51. 51. payment_num,
  52. 52. payment_amount,
  53. 53. refund_last_30d_count,
  54. 54. refund_last_30d_num,
  55. 55. refund_last_30d_amount,
  56. 56. refund_count,
  57. 57. refund_num,
  58. 58. refund_amount,
  59. 59. cart_last_30d_count,
  60. 60. cart_last_30d_num,
  61. 61. cart_count,
  62. 62. cart_num,
  63. 63. favor_last_30d_count,
  64. 64. favor_count,
  65. 65. appraise_last_30d_good_count,
  66. 66. appraise_last_30d_mid_count,
  67. 67. appraise_last_30d_bad_count,
  68. 68. appraise_last_30d_default_count,
  69. 69. appraise_good_count,
  70. 70. appraise_mid_count,
  71. 71. appraise_bad_count,
  72. 72. appraise_default_count
  73. 73. from dwt_sku_topic
  74. 74. )old
  75. 75. full outer join
  76. 76. (
  77. 77. select
  78. 78. sku_id,
  79. 79. sum(if(dt='2020-03-10', order_count,0 )) order_count,
  80. 80. sum(if(dt='2020-03-10',order_num ,0 )) order_num,
  81. 81. sum(if(dt='2020-03-10',order_amount,0 )) order_amount ,
  82. 82. sum(if(dt='2020-03-10',payment_count,0 )) payment_count,
  83. 83. sum(if(dt='2020-03-10',payment_num,0 )) payment_num,
  84. 84. sum(if(dt='2020-03-10',payment_amount,0 )) payment_amount,
  85. 85. sum(if(dt='2020-03-10',refund_count,0 )) refund_count,
  86. 86. sum(if(dt='2020-03-10',refund_num,0 )) refund_num,
  87. 87. sum(if(dt='2020-03-10',refund_amount,0 )) refund_amount,
  88. 88. sum(if(dt='2020-03-10',cart_count,0 )) cart_count,
  89. 89. sum(if(dt='2020-03-10',cart_num,0 )) cart_num,
  90. 90. sum(if(dt='2020-03-10',favor_count,0 )) favor_count,
  91. 91. sum(if(dt='2020-03-10',appraise_good_count,0 )) appraise_good_count,
  92. 92. sum(if(dt='2020-03-10',appraise_mid_count,0 ) ) appraise_mid_count ,
  93. 93. sum(if(dt='2020-03-10',appraise_bad_count,0 )) appraise_bad_count,
  94. 94. sum(if(dt='2020-03-10',appraise_default_count,0 )) appraise_default_count,
  95. 95. sum(order_count) order_count30 ,
  96. 96. sum(order_num) order_num30,
  97. 97. sum(order_amount) order_amount30,
  98. 98. sum(payment_count) payment_count30,
  99. 99. sum(payment_num) payment_num30,
  100. 100. sum(payment_amount) payment_amount30,
  101. 101. sum(refund_count) refund_count30,
  102. 102. sum(refund_num) refund_num30,
  103. 103. sum(refund_amount) refund_amount30,
  104. 104. sum(cart_count) cart_count30,
  105. 105. sum(cart_num) cart_num30,
  106. 106. sum(favor_count) favor_count30,
  107. 107. sum(appraise_good_count) appraise_good_count30,
  108. 108. sum(appraise_mid_count) appraise_mid_count30,
  109. 109. sum(appraise_bad_count) appraise_bad_count30,
  110. 110. sum(appraise_default_count) appraise_default_count30
  111. 111. from dws_sku_action_daycount
  112. 112. where dt >= date_add ('2020-03-10', -30)
  113. 113. group by sku_id
  114. 114. )new
  115. 115. on new.sku_id = old.sku_id
  116. 116. left join
  117. 117. (select * from dwd_dim_sku_info where dt='2020-03-10') sku_info
  118. 118. on nvl(new.sku_id,old.sku_id)= sku_info.id;
  119. 119.
  120. 复制代码

3)查询加载结果

  1. 1. select * from dwt_sku_topic limit 5;
  2. 2.
  3. 复制代码

2.4 DWT 层数据导入脚本

1)vim dws_to_dwt.sh
在脚本中填写如下内容

  1. 1. #!/bin/bash
  2. 2. APP=gmall
  3. 3. hive=/opt/modules/hive/bin/hive
  4. 4. # 如果是输入的日期按照取输入日期;如果没输入日期取当前时间的前一天
  5. 5. if [ -n "$1" ] ;then
  6. 6. do_date=$1
  7. 7. else
  8. 8. do_date=`date -d "-1 day" +%F`
  9. 9. fi
  10. 10. sql="
  11. 11. insert overwrite table ${APP}.dwt_uv_topic
  12. 12. select
  13. 13. nvl(new.mid_id,old.mid_id),
  14. 14. nvl(new.user_id,old.user_id),
  15. 15. nvl(new.version_code,old.version_code),
  16. 16. nvl(new.version_name,old.version_name),
  17. 17. nvl(new.lang,old.lang),
  18. 18. nvl(new.source,old.source),
  19. 19. nvl(new.os,old.os),
  20. 20. nvl(new.area,old.area),
  21. 21. nvl(new.model,old.model),
  22. 22. nvl(new.brand,old.brand),
  23. 23. nvl(new.sdk_version,old.sdk_version),
  24. 24. nvl(new.gmail,old.gmail),
  25. 25. nvl(new.height_width,old.height_width),
  26. 26. nvl(new.app_time,old.app_time),
  27. 27. nvl(new.network,old.network),
  28. 28. nvl(new.lng,old.lng),
  29. 29. nvl(new.lat,old.lat),
  30. 30. nvl(old.login_date_first,'$do_date'),
  31. 31. if(new.login_count>0,'$do_date',old.login_date_last),
  32. 32. nvl(new.login_count,0),
  33. 33. nvl(new.login_count,0)+nvl(old.login_count,0)
  34. 34. from
  35. 35. (
  36. 36. select
  37. 37. *
  38. 38. from ${APP}.dwt_uv_topic
  39. 39. )old
  40. 40. full outer join
  41. 41. (
  42. 42. select
  43. 43. *
  44. 44. from ${APP}.dws_uv_detail_daycount
  45. 45. where dt='$do_date'
  46. 46. )new
  47. 47. on old.mid_id=new.mid_id;
  48. 48.
  49. 49.
  50. 50. insert overwrite table ${APP}.dwt_user_topic
  51. 51. select
  52. 52. nvl(new.user_id,old.user_id),
  53. 53. if(old.login_date_first is null and
  54. 54. new.login_count>0,'$do_date',old.login_date_first),
  55. 55. if(new.login_count>0,'$do_date',old.login_date_last),
  56. 56. nvl(old.login_count,0)+if(new.login_count>0,1,0),
  57. 57. nvl(new.login_last_30d_count,0),
  58. 58. if(old.order_date_first is null and
  59. 59. new.order_count>0,'$do_date',old.order_date_first),
  60. 60. if(new.order_count>0,'$do_date',old.order_date_last),
  61. 61. nvl(old.order_count,0)+nvl(new.order_count,0),
  62. 62. nvl(old.order_amount,0)+nvl(new.order_amount,0),
  63. 63. nvl(new.order_last_30d_count,0),
  64. 64. nvl(new.order_last_30d_amount,0),
  65. 65. if(old.payment_date_first is null and
  66. 66. new.payment_count>0,'$do_date',old.payment_date_first),
  67. 67. if(new.payment_count>0,'$do_date',old.payment_date_last),
  68. 68. nvl(old.payment_count,0)+nvl(new.payment_count,0),
  69. 69. nvl(old.payment_amount,0)+nvl(new.payment_amount,0),
  70. 70. nvl(new.payment_last_30d_count,0),
  71. 71. nvl(new.payment_last_30d_amount,0)
  72. 72. from
  73. 73. (
  74. 74. select
  75. 75. *
  76. 76. from ${APP}.dwt_user_topic
  77. 77. )old
  78. 78. full outer join
  79. 79. (
  80. 80. select
  81. 81. user_id,
  82. 82. sum(if(dt='$do_date',login_count,0)) login_count,
  83. 83. sum(if(dt='$do_date',order_count,0)) order_count,
  84. 84. sum(if(dt='$do_date',order_amount,0)) order_amount,
  85. 85. sum(if(dt='$do_date',payment_count,0)) payment_count,
  86. 86. sum(if(dt='$do_date',payment_amount,0)) payment_amount,
  87. 87. sum(if(order_count>0,1,0)) login_last_30d_count,
  88. 88. sum(order_count) order_last_30d_count,
  89. 89. sum(order_amount) order_last_30d_amount,
  90. 90. sum(payment_count) payment_last_30d_count,
  91. 91. sum(payment_amount) payment_last_30d_amount
  92. 92. from ${APP}.dws_user_action_daycount
  93. 93. where dt>=date_add( '$do_date',-30)
  94. 94. group by user_id
  95. 95. )new
  96. 96. on old.user_id=new.user_id;
  97. 97. with
  98. 98. sku_act as
  99. 99. (
  100. 100. select
  101. 101. sku_id,
  102. 102. sum(if(dt='$do_date', order_count,0 )) order_count,
  103. 103. sum(if(dt='$do_date',order_num ,0 )) order_num,
  104. 104. sum(if(dt='$do_date',order_amount,0 )) order_amount ,
  105. 105. sum(if(dt='$do_date',payment_count,0 )) payment_count,
  106. 106. sum(if(dt='$do_date',payment_num,0 )) payment_num,
  107. 107. sum(if(dt='$do_date',payment_amount,0 )) payment_amount,
  108. 108. sum(if(dt='$do_date',refund_count,0 )) refund_count,
  109. 109. sum(if(dt='$do_date',refund_num,0 )) refund_num,
  110. 110. sum(if(dt='$do_date',refund_amount,0 )) refund_amount,
  111. 111. sum(if(dt='$do_date',cart_count,0 )) cart_count,
  112. 112. sum(if(dt='$do_date',cart_num,0 )) cart_num,
  113. 113. sum(if(dt='$do_date',favor_count,0 )) favor_count,
  114. 114. sum(if(dt='$do_date',appraise_good_count,0 )) appraise_good_count,
  115. 115. sum(if(dt='$do_date',appraise_mid_count,0 ) ) appraise_mid_count ,
  116. 116. sum(if(dt='$do_date',appraise_bad_count,0 )) appraise_bad_count,
  117. 117. sum(if(dt='$do_date',appraise_default_count,0 )) appraise_default_count,
  118. 118. sum( order_count ) order_count30 ,
  119. 119. sum( order_num ) order_num30,
  120. 120. sum(order_amount ) order_amount30,
  121. 121. sum(payment_count ) payment_count30,
  122. 122. sum(payment_num ) payment_num30,
  123. 123. sum(payment_amount ) payment_amount30,
  124. 124. sum(refund_count ) refund_count30,
  125. 125. sum(refund_num ) refund_num30,
  126. 126. sum(refund_amount ) refund_amount30,
  127. 127. sum(cart_count ) cart_count30,
  128. 128. sum(cart_num ) cart_num30,
  129. 129. sum(favor_count ) favor_count30,
  130. 130. sum(appraise_good_count ) appraise_good_count30,
  131. 131. sum(appraise_mid_count ) appraise_mid_count30,
  132. 132. sum(appraise_bad_count ) appraise_bad_count30,
  133. 133. sum(appraise_default_count ) appraise_default_count30
  134. 134. from ${APP}.dws_sku_action_daycount
  135. 135. where dt>=date_add ( '$do_date',-30)
  136. 136. group by sku_id
  137. 137. ),
  138. 138. sku_topic
  139. 139. as
  140. 140. (
  141. 141. select
  142. 142. sku_id,
  143. 143. spu_id,
  144. 144. order_last_30d_count,
  145. 145. order_last_30d_num,
  146. 146. order_last_30d_amount,
  147. 147. order_count,
  148. 148. order_num,
  149. 149. order_amount ,
  150. 150. payment_last_30d_count,
  151. 151. payment_last_30d_num,
  152. 152. payment_last_30d_amount,
  153. 153. payment_count,
  154. 154. payment_num,
  155. 155. payment_amount,
  156. 156. refund_last_30d_count,
  157. 157. refund_last_30d_num,
  158. 158. refund_last_30d_amount ,
  159. 159. refund_count ,
  160. 160. refund_num ,
  161. 161. refund_amount ,
  162. 162. cart_last_30d_count ,
  163. 163. cart_last_30d_num ,
  164. 164. cart_count ,
  165. 165. cart_num ,
  166. 166. favor_last_30d_count ,
  167. 167. favor_count ,
  168. 168. appraise_last_30d_good_count ,
  169. 169. appraise_last_30d_mid_count ,
  170. 170. appraise_last_30d_bad_count ,
  171. 171. appraise_last_30d_default_count ,
  172. 172. appraise_good_count ,
  173. 173. appraise_mid_count ,
  174. 174. appraise_bad_count ,
  175. 175. appraise_default_count
  176. 176. from ${APP}.dwt_sku_topic
  177. 177. )
  178. 178.
  179. 179.
  180. 180. insert overwrite table ${APP}.dwt_sku_topic
  181. 181. select
  182. 182. nvl(sku_act.sku_id,sku_topic.sku_id) ,
  183. 183. sku_info.spu_id,
  184. 184. nvl (sku_act.order_count30,0) ,
  185. 185. nvl (sku_act.order_num30,0) ,
  186. 186. nvl (sku_act.order_amount30,0) ,
  187. 187. nvl(sku_topic.order_count,0)+ nvl (sku_act.order_count,0) ,
  188. 188. nvl(sku_topic.order_num,0)+ nvl (sku_act.order_num,0) ,
  189. 189. nvl(sku_topic.order_amount,0)+ nvl (sku_act.order_amount,0),
  190. 190. nvl (sku_act.payment_count30,0),
  191. 191. nvl (sku_act.payment_num30,0),
  192. 192. nvl (sku_act.payment_amount30,0),
  193. 193. nvl(sku_topic.payment_count,0)+ nvl (sku_act.payment_count,0) ,
  194. 194. nvl(sku_topic.payment_num,0)+ nvl (sku_act.payment_count,0) ,
  195. 195. nvl(sku_topic.payment_amount,0)+ nvl (sku_act.payment_count,0) ,
  196. 196. nvl (refund_count30,0),
  197. 197. nvl (sku_act.refund_num30,0),
  198. 198. nvl (sku_act.refund_amount30,0),
  199. 199. nvl(sku_topic.refund_count,0)+ nvl (sku_act.refund_count,0),
  200. 200. nvl(sku_topic.refund_num,0)+ nvl (sku_act.refund_num,0),
  201. 201. nvl(sku_topic.refund_amount,0)+ nvl (sku_act.refund_amount,0),
  202. 202. nvl(sku_act.cart_count30,0) ,
  203. 203. nvl(sku_act.cart_num30,0) ,
  204. 204. nvl(sku_topic.cart_count ,0)+ nvl (sku_act.cart_count,0),
  205. 205. nvl( sku_topic.cart_num ,0)+ nvl (sku_act.cart_num,0),
  206. 206. nvl(sku_act.favor_count30 ,0) ,
  207. 207. nvl (sku_topic.favor_count ,0)+ nvl (sku_act.favor_count,0),
  208. 208. nvl (sku_act.appraise_good_count30 ,0) ,
  209. 209. nvl (sku_act.appraise_mid_count30 ,0) ,
  210. 210. nvl (sku_act.appraise_bad_count30 ,0) ,
  211. 211. nvl (sku_act.appraise_default_count30 ,0) ,
  212. 212. nvl (sku_topic.appraise_good_count ,0)+ nvl
  213. 213. (sku_act.appraise_good_count,0) ,
  214. 214. nvl (sku_topic.appraise_mid_count ,0)+ nvl
  215. 215. (sku_act.appraise_mid_count,0) ,
  216. 216. nvl (sku_topic.appraise_bad_count ,0)+ nvl
  217. 217. (sku_act.appraise_bad_count,0) ,
  218. 218. nvl (sku_topic.appraise_default_count ,0)+ nvl
  219. 219. (sku_act.appraise_default_count,0)
  220. 220. from sku_act
  221. 221. full outer join sku_topic
  222. 222. on sku_act.sku_id =sku_topic.sku_id
  223. 223. left join
  224. 224. (select * from ${APP}.dwd_dim_sku_info where dt='$do_date') sku_info
  225. 225. on nvl(sku_topic.sku_id,sku_act.sku_id)= sku_info.id;
  226. 226. "
  227. 227. $hive -e "$sql"
  228. 228.
  229. 复制代码

2)增加脚本执行权限

  1. 1. chmod 770 dws_to_dwt.sh

3)执行脚本导入数据

  1. 1. dws_to_dwt.sh 2020-03-11

4)查看导入数据

  1. 1. select * from dwt_uv_topic limit 5;
  2. 2. select * from dwt_user_topic limit 5;
  3. 3. select * from dwt_sku_topic limit 5;