一、补充:

1、hive的本地模式

可以在hive的conf下,创建一个.hiverc 在这个里面写配置文件。
复习:hive的配置文件的三种写法
image.png

  1. hive.exec.mode.local.auto.inputbytes.max
  2. 如果 hive.exec.mode.local.auto true,当输入文件大小小于此阈值时可以自动在本地模式运行,默认是 128兆。
  3. 134217728
  4. hive.exec.mode.local.auto.input.files.max map小于这个任务数时启动本地模式。
  5. 默认值:
  6. hive.exec.mode.local.auto=true (默认是false)
  7. hive.exec.mode.local.auto.inputbytes.max=134217728(默认是128M)
  8. hive.exec.mode.local.auto.input.files.max=4(默认是4)
  9. set hive.exec.mode.local.auto=true 是打开 hive ⾃动判断是否启动本地模式的开关,但是只是打开这个参数并不能保证启动本地模式,要当
  10. map 任务数不超过
  11. hive.exec.mode.local.auto.input.files.max 的个数并且 map 输⼊⽂件⼤⼩不超过
  12. hive.exec.mode.local.auto.inputbytes.max 所指定的⼤⼩时,才能启动本地模式。

2、case when 有两种用法,等值查询和非等值查询。

等值查询: select case sex when 1 then ‘male’ when 2 then ‘female’ else ‘mixed’ end as sexual,sex from tbl

select case
when price>100 then ‘expensive’ when price<=100 then ‘cheap’ else ‘异常值’ end as price,价格分类 from tbl

和sum一起使用: :::danger select
sum(case u.sex when 1 then 1 else 0 end)男性,
sum(case u.sex when 2 then 1 else 0 end)女性,
sum(case when u.sex <>1 and u.sex<>2 then 1 else 0 end)性别为空
from users u; :::

二、ads的指标

先创建一个数据库 ads_nshop;

  1. drop database ads_nshop cascade;
  2. create database ads_nshop;
  3. use ads_nshop;

1、流量类指标_平台统计表
按照性别,年龄,以及所在区域统计网站访问量以及人均访问数量

  1. create external table if not exists ads_nshop.ads_nshop_platform_flow_stat(
  2. customer_gender TINYINT COMMENT '性别:1男 0女',
  3. age_range string COMMENT '年龄段',
  4. customer_natives string COMMENT '所在地区',
  5. visit_count int comment '访问量',
  6. visit_avg_counts int comment '人均页面访问数'
  7. ) partitioned by (bdp_day string)
  8. row format delimited fields terminated by ','
  9. stored as textfile
  10. location '/data/nshop/ads/operation/ads_nshop_platform_flow_stat/'

分析:需要两个表:
dws_nshop.dws_nshop_ulog_view 和 customer 表进行关联

  1. insert overwrite table ads_nshop.ads_nshop_platform_flow_stat partition(bdp_day='20220509')
  2. select
  3. b.customer_gender,
  4. b.customer_age_range,
  5. b.customer_natives,
  6. sum(view_count),
  7. sum(view_count)/count(a.user_id)
  8. from
  9. dws_nshop.dws_nshop_ulog_view a
  10. join
  11. ods_nshop.ods_02_customer b
  12. on a.user_id = b.customer_id
  13. where a.bdp_day='20220509'
  14. group by
  15. b.customer_gender,
  16. b.customer_age_range,
  17. b.customer_natives

2、流量类指标_平台流量统计P/UV

  1. create external table if not exists ads_nshop.ads_nshop_flowpu_stat(
  2. uv bigint comment '独立访客数',
  3. pv bigint comment '页面访客数',
  4. pv_avg double comment '人均页面访问数'
  5. ) partitioned by (bdp_day string)
  6. row format delimited fields terminated by ','
  7. stored as textfile
  8. location '/data/nshop/ads/operation/ads_nshop_flow/'

一个表就可以完成:dws_nshop.dws_nshop_ulog_view

  1. insert overwrite table ads_nshop.ads_nshop_flowpu_stat partition(bdp_day='20220509')
  2. select
  3. count(distinct user_id),
  4. sum(view_count),
  5. sum(view_count)/count(distinct user_id)
  6. from
  7. dws_nshop.dws_nshop_ulog_view
  8. where bdp_day='20220509';

3、平台搜索热词统计

  1. create external table if not exists ads_nshop.ads_nshop_search_keys(
  2. search_keys string comment '搜索内容',
  3. gender string comment '性别',
  4. age_range string comment '年龄段',
  5. os string comment '手机系统',
  6. manufacturer string comment '手机制造商',
  7. area_code string comment '地区编码',
  8. search_users int comment '此搜索内容用户数',
  9. search_records int comment '此搜索内容查询次数',
  10. search_orders string comment '查询排序',
  11. search_targets int comment '查询目标数量'
  12. ) partitioned by (bdp_day string)
  13. row format delimited fields terminated by ','
  14. stored as textfile
  15. location '/data/nshop/ads/operation/ads_nshop_search_keys/';

分析: 需要customer 表,因为性别,年龄段。 需要dwd_nshop.dwd_nshop_actlog_pdtsearch 因为里面有搜索内容。

  1. with t1 as (
  2. select
  3. b.target_keys,
  4. a.customer_gender,
  5. a.customer_age_range,
  6. b.os ,
  7. b.manufacturer,
  8. b.carrier string,
  9. b.area_code,
  10. b.target_order,
  11. b.target_id,
  12. a.customer_id
  13. from
  14. ods_nshop.ods_02_customer a
  15. join
  16. dwd_nshop.dwd_nshop_actlog_pdtsearch b
  17. on a.customer_id = b.user_id
  18. where b.bdp_day='20220509'
  19. )
  20. insert overwrite table ads_nshop.ads_nshop_search_keys partition(bdp_day='20220509')
  21. select
  22. target_keys,
  23. customer_gender,
  24. customer_age_range,
  25. os,
  26. manufacturer,
  27. area_code,
  28. count(distinct customer_id),
  29. count(customer_id),
  30. target_order,
  31. count(target_id)
  32. from t1
  33. group by
  34. target_keys,
  35. customer_gender,
  36. customer_age_range,
  37. os,
  38. manufacturer,
  39. area_code,
  40. target_order;

4、用户类启动情况表

  1. create external table if not exists ads_nshop.ads_nshop_actlog_launch_gsets(
  2. os string comment '手机系统',
  3. manufacturer string comment '手机制造商',
  4. carrier string comment '电信运营商',
  5. network_type string comment '网络类型',
  6. area_code string comment '地区编码',
  7. user_count int comment '用户数',
  8. launch_count int comment '启动次数'
  9. ) partitioned by (bdp_day string)
  10. row format delimited fields terminated by ','
  11. stored as textfile
  12. location '/data/nshop/ads/actlog/ads_nshop_actlog_launch_gsets/'

分析:

  1. insert overwrite table ads_nshop.ads_nshop_actlog_launch_gsets partition(bdp_day='20220509')
  2. select
  3. os,
  4. manufacturer,
  5. carrier,
  6. network_type,
  7. area_code,
  8. count(distinct user_id),
  9. sum(launch_count)
  10. from
  11. dws_nshop.dws_nshop_ulog_launch
  12. where bdp_day='20220509'
  13. group by
  14. os,
  15. manufacturer,
  16. carrier,
  17. network_type,
  18. area_code;

多维分析的情况:

  1. create external table if not exists ads_nshop.ads_nshop_actlog_launch_rollup(
  2. os string comment '手机系统',
  3. manufacturer string comment '手机制造商',
  4. carrier string comment '电信运营商',
  5. network_type string comment '网络类型',
  6. area_code string comment '地区编码',
  7. user_count int comment '用户数',
  8. launch_count int comment '启动次数'
  9. ) partitioned by (bdp_day string)
  10. row format delimited fields terminated by ','
  11. stored as textfile
  12. location '/data/nshop/ads/actlog/ads_nshop_actlog_launch_rollup/'
  1. insert overwrite table ads_nshop.ads_nshop_actlog_launch_rollup partition(bdp_day='20220509')
  2. select
  3. os,
  4. manufacturer,
  5. carrier,
  6. network_type,
  7. area_code,
  8. count(distinct user_id),
  9. sum(launch_count)
  10. from
  11. dws_nshop.dws_nshop_ulog_launch
  12. where bdp_day='20220509'
  13. group by
  14. os,
  15. manufacturer,
  16. carrier,
  17. network_type,
  18. area_code
  19. with rollup;

5、用户类指标统计表

  1. create external table if not exists ads_nshop.ads_nshop_customer_stat(
  2. customer_gender TINYINT COMMENT '性别:1男 0女',
  3. age_range string COMMENT '年龄段',
  4. customer_natives string COMMENT '所在地区',
  5. total_counts int comment '总用户数',
  6. add_counts int comment '新增用户数',
  7. active_counts int comment '活跃用户数'
  8. ) partitioned by (bdp_day string)
  9. row format delimited fields terminated by ','
  10. stored as textfile
  11. location '/data/nshop/ads/operation/ads_nshop_customer_stat/'

可以使用用户表 ods_nshop.ods_02_customer 以及启动表dws_nshop.dws_nshop_ulog_launch。
用到了一个函数:select from_unixtime(cast(1572608342000/1000 as int),’yyyyMMdd’) ;
什么是新增用户?就是注册时间是某一天,对于某一天来讲,就是新增。

  1. insert overwrite table ads_nshop.ads_nshop_customer_stat partition(bdp_day='20220509')
  2. select
  3. a.customer_gender,
  4. a.customer_age_range,
  5. a.customer_natives,
  6. count(distinct a.customer_id) as total_counts,
  7. count(case when from_unixtime(cast(a.customer_ctime/1000 as int),'yyyyMMdd')='20191102' then 1 else 0 end) as add_counts,
  8. count(b.user_id) as active_counts
  9. from ods_nshop.ods_02_customer a
  10. join dws_nshop.dws_nshop_ulog_launch b
  11. on a.customer_id = b.user_id
  12. where b.bdp_day='20220509'
  13. group by
  14. a.customer_gender,
  15. a.customer_age_range,
  16. a.customer_natives;

6、总体运营指标统计表

  1. create external table if not exists ads_nshop.ads_nshop_oper_stat(
  2. customer_gender TINYINT COMMENT '性别:1男 0女',
  3. age_range string COMMENT '年龄段',
  4. customer_natives string COMMENT '所在地区',
  5. product_type int comment '商品类别',
  6. order_counts int comment '订单数',
  7. order_rate double comment '下单率',
  8. order_amounts int comment '销售总金额',
  9. order_discounts int comment '优惠总金额',
  10. shipping_amounts int comment '运费总金额',
  11. per_customer_transaction int comment '客单价'
  12. ) partitioned by (bdp_day string)
  13. row format delimited fields terminated by ','
  14. stored as textfile
  15. location '/data/nshop/ads/operation/ads_nshop_oper_stat/'

分析:前面三个字段是用户表里的字段(customer),商品表(dim_pub_product),
订单数量等信息表示肯定跟订单有关(order_details),根据下单率应该推算出来有商品的浏览表(ulog_view)。
用户表:ods_nshop.ods_02_customer
商品表: ods_nshop.dim_pub_product
订单明细表:dwd_nshop.dwd_nshop_orders_details
浏览情况表:dws_nshop.dws_nshop_ulog_view

  1. insert overwrite table ads_nshop.ads_nshop_oper_stat partition(bdp_day='20220509')
  2. select
  3. a.customer_gender,
  4. a.customer_age_range,
  5. a.customer_natives,
  6. c.category_code,
  7. count(distinct b.order_id),
  8. count(distinct b.order_id)/sum(d.view_count),
  9. sum(b.payment_money),
  10. sum(b.district_money),
  11. sum(b.shipping_money),
  12. sum(b.payment_money)/count(distinct b.customer_id)
  13. from
  14. ods_nshop.ods_02_customer a
  15. join
  16. dwd_nshop.dwd_nshop_orders_details b
  17. on a.customer_id =b.customer_id
  18. join ods_nshop.dim_pub_product c
  19. on b.product_code = c.product_code
  20. join dws_nshop.dws_nshop_ulog_view d
  21. on d.user_id = a.customer_id
  22. group by
  23. a.customer_gender,
  24. a.customer_age_range,
  25. a.customer_natives,
  26. c.category_code

7、风控类指标统计表

  1. create external table if not exists ads_nshop.ads_nshop_risk_mgt(
  2. customer_gender TINYINT COMMENT '性别:1男 0女',
  3. age_range string COMMENT '年龄段',
  4. customer_natives string COMMENT '所在地区',
  5. product_type int comment '商品类别',
  6. start_complaint_counts int comment '发起投诉数',
  7. complaint_rate float comment '投诉率'
  8. ) partitioned by (bdp_day string)
  9. row format delimited fields terminated by ','
  10. stored as textfile
  11. location '/data/nshop/ads/operation/ads_nshop_risk_mgt/';

分析:
用户表:ods_nshop.ods_02_customer
商品表: ods_nshop.dim_pub_product
/投诉表:dws_nshop.dws_nshop_user_complainant
投诉率= 投诉数量/订单总数
/
关联订单明细表:dwd_nshop.dwd_nshop_orders_details

  1. insert overwrite table ads_nshop.ads_nshop_risk_mgt partition(bdp_day='20220509')
  2. select
  3. a.customer_gender,
  4. a.customer_age_range,
  5. a.customer_natives,
  6. c.category_code,
  7. count(distinct case when b.order_status=6 then b.order_id end),
  8. count(distinct case when b.order_status=6 then b.order_id end)/count(distinct b.order_id)
  9. from
  10. ods_nshop.ods_02_customer a
  11. join
  12. dwd_nshop.dwd_nshop_orders_details b
  13. on a.customer_id =b.customer_id
  14. join ods_nshop.dim_pub_product c
  15. on b.product_code = c.product_code
  16. where b.bdp_day='20220509'
  17. group by
  18. a.customer_gender,
  19. a.customer_age_range,
  20. a.customer_natives,
  21. c.category_code;

8、支付类指标统计

  1. create external table if not exists ads_nshop.ads_nshop_pay_stat_gsets(
  2. customer_gender TINYINT COMMENT '性别:1男 0女',
  3. age_range string COMMENT '年龄段',
  4. customer_natives string COMMENT '所在地区',
  5. pay_type string COMMENT '支付类型',
  6. pay_nettype string COMMENT '支付网络方式',
  7. pay_amounts int comment '支付金额',
  8. pay_success int comment '支付成功数',
  9. pay_fail int comment '支付失败数',
  10. pay_order_counts int comment '支付订单数',
  11. pay_user_counts int comment '支付用户数',
  12. pay_product_counts int comment '支付商品数',
  13. order_pay_amount_rate float comment '下单-支付金额转换率',
  14. order_pay_user_rate float comment '下单-支付买家数转换率'
  15. ) partitioned by (bdp_day string)
  16. row format delimited fields terminated by ','
  17. stored as textfile
  18. location '/data/nshop/ads/operation/ads_nshop_pay_stat_gsets/';

分析:ods_nshop.ods_02_customer 表,dwd_nshop.dwd_nshop_orders_details 订单明细表,
ods_nshop.ods_02_orders_pay_records 支付记录明细表
count(某个字段),如果该字段有null值,那么就跳过。

  1. insert overwrite table ads_nshop.ads_nshop_pay_stat_gsets partition(bdp_day='20220509')
  2. select
  3. a.customer_gender,
  4. a.customer_age_range,
  5. a.customer_natives,
  6. b.pay_type,
  7. b.pay_nettype,
  8. sum(c.pay_amount),
  9. count(case when c.pay_status='1' then 1 end),
  10. count(case when c.pay_status='0' then 1 end),
  11. count(distinct b.order_id),
  12. count(distinct a.customer_id),
  13. sum(b.product_cnt),
  14. sum(case when c.pay_status='1' then c.pay_amount end)/sum(b.payment_money),
  15. count(distinct case when c.pay_status='1' then c.customer_id end)/count(distinct b.customer_id)
  16. from
  17. ods_nshop.ods_02_customer a
  18. join
  19. dwd_nshop.dwd_nshop_orders_details b
  20. on a.customer_id = b.customer_id
  21. join
  22. ods_nshop.ods_02_orders_pay_records c
  23. on b.order_id = c.order_id
  24. where b.bdp_day="20220509"
  25. group by
  26. a.customer_gender,
  27. a.customer_age_range,
  28. a.customer_natives,
  29. b.pay_type,
  30. b.pay_nettype

执行报错:
image.png

image.png
关闭本地模式:

  1. set hive.exec.mode.local.auto=false;

9、支付统计TopN表

  1. create external table if not exists ads_nshop.ads_nshop_pay_stat_topn(
  2. pay_type string COMMENT '支付类型',
  3. customer_area_code string COMMENT '所在地区',
  4. pay_count int COMMENT '支付数量',
  5. pay_amounts int comment '支付金额'
  6. ) partitioned by (bdp_day string)
  7. row format delimited fields terminated by ','
  8. stored as textfile
  9. location '/data/nshop/ads/operation/ads_nshop_pay_stat_topn/';

地区字段只有customer表有,所以需要customer表,剩余的都是支付的数据,使用支付明细表ods_nshop.ods_02_orders_pay_records

  1. insert overwrite table ads_nshop.ads_nshop_pay_stat_topn partition(bdp_day='20220509')
  2. select
  3. b.pay_type,
  4. a.customer_natives,
  5. count(distinct b.pay_id) as pay_count,
  6. sum(b.pay_amount) as pay_amount
  7. from
  8. ods_nshop.ods_02_customer a
  9. join
  10. ods_nshop.ods_02_orders_pay_records b
  11. on a.customer_id = b.customer_id
  12. group by
  13. a.customer_natives,
  14. b.pay_type
  15. order by
  16. pay_amount desc
  17. limit 50;

10、交易类指标表

  1. create external table if not exists ads_nshop.ads_nshop_busi_stat(
  2. customer_gender TINYINT COMMENT '性别:1男 0女',
  3. age_range string COMMENT '年龄段',
  4. customer_area_code string COMMENT '所在地区',
  5. category_type string COMMENT '商品类别',
  6. supplier_type string COMMENT '店铺类别',
  7. busi_succ_orders int comment '交易成功订单数',
  8. busi_succ_amounts int comment '交易成功金额',
  9. busi_succ_users int comment '交易成功买家数',
  10. busi_succ_products int comment '交易成功商品数',
  11. busi_fail_orders int comment '交易失败订单数',
  12. busi_fail_amounts int comment '交易失败金额'
  13. ) partitioned by (bdp_day string)
  14. row format delimited fields terminated by ','
  15. stored as textfile
  16. location '/data/nshop/ads/operation/ads_nshop_busi_stat/';

有一个customer, 一个 订单详情表,商家表(供货商), 商品表

  1. insert overwrite table ads_nshop.ads_nshop_busi_stat partition(bdp_day='20220509')
  2. select
  3. a.customer_gender,
  4. a.customer_age_range,
  5. a.customer_natives,
  6. d.category_code,
  7. c.supplier_type,
  8. count(distinct case when b.order_status!=7 then b.order_id end),
  9. sum(distinct case when b.order_status!=7 then b.payment_money end),
  10. count(distinct case when b.order_status!=7 then b.customer_id end),
  11. count(case when b.order_status!=7 then b.product_code end),
  12. count(distinct case when b.order_status=7 then b.order_id end),
  13. sum(distinct case when b.order_status=7 then b.payment_money end),
  14. from
  15. ods_nshop.ods_02_customer a
  16. join
  17. dwd_nshop.dwd_nshop_orders_details b
  18. on a.customer_id = b.customer_id
  19. join ods_nshop.dim_pub_supplier c
  20. on b.supplier_code = c.supplier_code
  21. join ods_nshop.dim_pub_product d
  22. on b.product_code = d.product_code
  23. where b.bdp_day='20220509'
  24. group by
  25. a.customer_gender,
  26. a.customer_age_range,
  27. a.customer_natives,
  28. d.category_code,
  29. c.supplier_type

11、交易TopN统计表

  1. create external table if not exists ads_nshop.ads_nshop_busi_stat_topn(
  2. customer_natives string COMMENT '所在地区',
  3. category_type string COMMENT '商品类别',
  4. supplier_type string COMMENT '店铺类别',
  5. busi_succ_orders int comment '交易成功订单数',
  6. busi_succ_amounts int comment '交易成功金额',
  7. busi_succ_users int comment '交易成功买家数',
  8. busi_succ_products int comment '交易成功商品数',
  9. busi_fail_orders int comment '交易失败订单数',
  10. busi_fail_amounts int comment '交易失败金额'
  11. ) partitioned by (bdp_day string)
  12. row format delimited fields terminated by ','
  13. stored as textfile
  14. location '/data/nshop/ads/operation/ads_nshop_busi_stat_topn/';
  1. insert overwrite table ads_nshop.ads_nshop_busi_stat_topn partition(bdp_day='20220509')
  2. select
  3. a.customer_natives,
  4. d.category_code,
  5. c.supplier_type,
  6. count(distinct case when b.order_status!=7 then b.order_id end),
  7. sum(distinct case when b.order_status!=7 then b.payment_money end) as pay_money,
  8. count(distinct case when b.order_status!=7 then b.customer_id end),
  9. count(case when b.order_status!=7 then b.product_code end),
  10. count(distinct case when b.order_status=7 then b.order_id end),
  11. sum(distinct case when b.order_status=7 then b.payment_money end)
  12. from
  13. ods_nshop.ods_02_customer a
  14. join
  15. dwd_nshop.dwd_nshop_orders_details b
  16. on a.customer_id = b.customer_id
  17. join ods_nshop.dim_pub_supplier c
  18. on b.supplier_code = c.supplier_code
  19. join ods_nshop.dim_pub_product d
  20. on b.product_code = d.product_code
  21. where b.bdp_day='20220509'
  22. group by
  23. a.customer_natives,
  24. d.category_code,
  25. c.supplier_type
  26. order by pay_money desc
  27. limit 100;

12、广告投放类指标统计

  1. create external table if not exists ads_nshop.ads_nshop_release_stat(
  2. device_type string comment '设备类型',
  3. os string comment '手机系统',
  4. customer_gender TINYINT comment '性别:1男 0女',
  5. age_range string comment '年龄段',
  6. customer_natives string comment '所在地区',
  7. release_sources string comment '投放渠道',
  8. release_category string comment '投放浏览产品分类',
  9. visit_total_customers int comment '总访客数',
  10. visit_total_counts int comment '总访问次数'
  11. ) partitioned by (bdp_day string)
  12. row format delimited fields terminated by ','
  13. stored as textfile
  14. location '/data/nshop/ads/operation/ads_nshop_release_stat/';

分析:
需要一个客户表以及dwd_nshop.dwd_nshop_releasedatas

  1. insert overwrite table ads_nshop.ads_nshop_release_stat partition(bdp_day='20220509')
  2. select
  3. a.device_type,
  4. a.os,
  5. b.customer_gender,
  6. b.customer_age_range,
  7. b.customer_natives,
  8. a.release_sources,
  9. a.release_category,
  10. count(distinct a.customer_id),
  11. count(1)
  12. from
  13. dwd_nshop.dwd_nshop_releasedatas a
  14. join
  15. ods_nshop.ods_02_customer b
  16. on a.customer_id = b.customer_id
  17. where a.bdp_day='20220509'
  18. group by
  19. a.device_type,
  20. a.os,
  21. b.customer_gender,
  22. b.customer_age_range,
  23. b.customer_natives,
  24. a.release_sources,
  25. a.release_category