一、Questions about the data

Why is it always 2022-05-09?
Because in a real project, new data would be pulled every day: from MySQL, from the tracking (event) logs, and from third-party logs.
The ods layer should therefore gain a new partition of data every day. We only collected once, which is why we keep working with that single day's data.
Otherwise we would have to repeat the first few days of project work every day, so in practice you develop the steps once and turn them into scripts that can simply be run.
Azkaban comes to mind: it can run jobs on a schedule, at a set time each day. How many tasks do we have?
Task 1: collect data from the three sources into the ods layer. Task 2: process data from ods into dwd. Task 3: dwd -> dws.
Task 4: dws -> ads.
Writing the ods-layer script:
It uses sqoop commands to incrementally import each table's data (see sqoopJob.sh from day one); the result is our ods.sh.
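A minimal sketch of what ods.sh might look like, assuming an append-style incremental import keyed on an auto-increment id column (the sqoop path, the hard-coded --last-value, and the single table shown are illustrative assumptions; the real script imports every source table):

    #!/bin/bash
    # ods.sh -- incremental import from MySQL into the ods layer (sketch)
    # In practice --last-value comes from the previous run (e.g. a saved
    # sqoop job), not a hard-coded literal.
    /usr/local/sqoop/bin/sqoop import \
      --connect jdbc:mysql://192.168.32.100:3306/nshop \
      --username root --password 123456 \
      --table base_area \
      --target-dir /data/nshop/ods/ods_01_base_area/ \
      --fields-terminated-by ',' \
      --incremental append \
      --check-column id \
      --last-value 0 \
      -m 1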
Writing dwd.sh:

    #!/bin/bash
    /usr/local/hive/bin/hive \
    -f dwd.hql

Paste all of the dwd SQL we wrote over the past few days into dwd.hql; in other words, day two's SQL statements all go into dwd.hql. The script can also take parameters, as sketched below.
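For example, the run date can be passed in with --hivevar instead of being hard-coded (a sketch; the variable name dt, the "default to yesterday" behavior, and the use of ${hivevar:dt} inside dwd.hql are assumptions):

    #!/bin/bash
    # dwd.sh -- run the dwd-layer HQL for a given day (sketch)
    # Usage: sh dwd.sh 20220509 ; inside dwd.hql reference ${hivevar:dt}
    dt=${1:-$(date -d '1 day ago' +%Y%m%d)}   # default to yesterday
    /usr/local/hive/bin/hive --hivevar dt="${dt}" -f dwd.hql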
Writing dws.sh:

    #!/bin/bash
    /usr/local/hive/bin/hive \
    -f dws.hql

Note: when a shell script calls a command, always use the command's full path; don't assume that having environment variables configured makes everything fine (a scheduler such as Azkaban may launch the script in a shell that never sourced your profile).
dws.hql contains the SQL statements we wrote on day three, pasted in wholesale; again, the date can be passed as a parameter, using the same --hivevar pattern as above.
Writing ads.sh:

    #!/bin/bash
    /usr/local/hive/bin/hive \
    -f ads.hql

ads.hql contains the statements written in yesterday's class, gathered in one place.
We now have four shell scripts: ods.sh, dwd.sh, dws.sh, and ads.sh.
Next, write the Azkaban flow definition:

    nodes:
      - name: ads
        type: command
        config:
          command: sh ads.sh
        dependsOn:
          - dws
      - name: dws
        type: command
        config:
          command: sh dws.sh
        dependsOn:
          - dwd
      - name: dwd
        type: command
        config:
          command: sh dwd.sh
        dependsOn:
          - ods
      - name: ods
        type: command
        config:
          command: sh ods.sh

Then write the project file, zip everything up, and upload it to Azkaban; set the daily run time (the schedule) and configure email alerts.
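A sketch of the packaging step, assuming Azkaban Flow 2.0 and that the flow above is saved as nshop.flow (the file names here are placeholders):

    # the .project file marks this as a Flow 2.0 project
    echo "azkaban-flow-version: 2.0" > nshop.project
    # bundle the flow definition together with the four scripts
    zip nshop.zip nshop.project nshop.flow ods.sh dwd.sh dws.sh ads.sh
    # upload nshop.zip in the Azkaban web UI, then use Schedule to pick
    # the daily run time and add an address for email notifications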

二、Reviewing Superset

How to start/stop Superset:

First enter the superset environment, then start the server inside it:

    conda activate superset
    gunicorn -w 1 -t 120 -b bigdata01:8787 "superset.app:create_app()"

The gunicorn options: -w (workers) sets the number of worker processes; -t (timeout) is the worker timeout in seconds, after which a hung worker is restarted automatically; -b (bind) is the bind address, i.e. the address Superset is reached at; -D (daemon) would run it in the background (not used in the command above).

To stop Superset, kill the gunicorn processes and then leave the environment:

    ps -ef | awk '/gunicorn/ && !/awk/{print $2}' | xargs kill -9
    conda deactivate
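These steps are easy to wrap in a helper script (a sketch; the script name and the conda.sh path are assumptions, adjust them to your install):

    #!/bin/bash
    # superset.sh -- start/stop Superset (sketch)
    source ~/miniconda3/etc/profile.d/conda.sh   # adjust to your conda path
    case "$1" in
      start)
        conda activate superset
        gunicorn -w 1 -t 120 -b bigdata01:8787 -D "superset.app:create_app()"
        ;;
      stop)
        ps -ef | awk '/gunicorn/ && !/awk/{print $2}' | xargs kill -9
        ;;
      *)
        echo "Usage: sh superset.sh {start|stop}"
        ;;
    esac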

三、DataX (similar to Sqoop)

1、Installation
1) Upload DataX to /home/soft and extract it to /usr/local/:

    tar -xvf datax.tar.gz -C /usr/local/

2) Copy the MySQL driver from Hive's lib directory into /usr/local/datax/lib:

    cp /usr/local/hive/lib/mysql-connector-java-8.0.26.jar /usr/local/datax/lib/

3) Create a Hive table and extract the MySQL table data into it, which requires writing a JSON job file.
Running DataX at this point can fail with:
配置信息错误,您提供的配置文件[/usr/local/datax/plugin/reader/._drdsreader/plugin.json]不存在. 请检查您的配置文件
(configuration error: the plugin configuration file /usr/local/datax/plugin/reader/._drdsreader/plugin.json does not exist; please check your configuration)
The fix is to delete the hidden ._* files (macOS metadata files that shipped inside the tarball); this step is important:

    rm -rf /usr/local/datax/plugin/*/._*

2、DataX concepts

    https://github.com/alibaba/DataX

In China, GitHub has a counterpart called Gitee (码云):

    https://gitee.com/


    DataX is the open-source version of Alibaba Cloud DataWorks Data Integration, an offline data synchronization tool/platform widely used inside Alibaba Group. DataX implements efficient data synchronization between all kinds of heterogeneous data sources, including MySQL, Oracle, OceanBase, SQL Server, PostgreSQL, HDFS, Hive, ADS, HBase, TableStore (OTS), MaxCompute (ODPS), Hologres, and DRDS.

3、Usage
Importing MySQL data into Hive:
First load the source data into MySQL. Then move the area data from MySQL into ods_nshop (MySQL -> Hive).
To start, create the table in Hive:

    create table if not exists ods_nshop.ods_01_base_area (
      id int COMMENT 'id',
      area_code string COMMENT 'province code',
      province_name string COMMENT 'province name',
      iso string COMMENT 'ISO code'
    )
    row format delimited fields terminated by ','
    stored as TextFile
    location '/data/nshop/ods/ods_01_base_area/';

Now use DataX to import the MySQL data into Hive. This takes a DataX job script in JSON format; under DataX's job folder, create 01.json:

    {
      "job": {
        "setting": {
          "speed": {
            "channel": 3
          },
          "errorLimit": {
            "record": 0,
            "percentage": 0.02
          }
        },
        "content": [
          {
            "reader": {
              "name": "mysqlreader",
              "parameter": {
                "username": "root",
                "password": "123456",
                "column": [
                  "id",
                  "area_code",
                  "province_name",
                  "iso"
                ],
                "splitPk": "id",
                "connection": [
                  {
                    "table": [
                      "base_area"
                    ],
                    "jdbcUrl": [
                      "jdbc:mysql://192.168.32.100:3306/nshop"
                    ]
                  }
                ]
              }
            },
            "writer": {
              "name": "hdfswriter",
              "parameter": {
                "defaultFS": "hdfs://bigdata01:9820",
                "fileType": "text",
                "path": "/data/nshop/ods/ods_01_base_area/",
                "fileName": "base_area_txt",
                "column": [
                  { "name": "id",            "type": "int" },
                  { "name": "area_code",     "type": "string" },
                  { "name": "province_name", "type": "string" },
                  { "name": "iso",           "type": "string" }
                ],
                "writeMode": "append",
                "fieldDelimiter": ","
              }
            }
          }
        ]
      }
    }

Because you have to type the path to datax.py on every run, which is tedious, it is worth configuring DataX environment variables:

    export DATAX_HOME=/usr/local/datax
    export PATH=$PATH:$DATAX_HOME/bin

Remember to reload the environment:

    source /etc/profile

Now run our job:

    datax.py job/01.json

Make absolutely sure the database connection, user name, and password are correct!
When the import finishes, check that the data is right.
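For example (a sketch), list the files DataX wrote and sample a few rows through Hive:

    # the table's location should now contain DataX output files
    hdfs dfs -ls /data/nshop/ods/ods_01_base_area/
    # sample a few rows
    /usr/local/hive/bin/hive -e "select * from ods_nshop.ods_01_base_area limit 5;"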

四、DataX in detail

1、Try the bundled sample, job.json, which simply copies from console to console (stream -> stream).
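The sample lives under DataX's job directory; run it like any other job (path as installed above):

    datax.py /usr/local/datax/job/job.json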
Each TaskGroup runs 5 tasks in parallel, so a DataX job executes with many threads at once and finishes quickly.
DataX transforms data in memory inside a single process, while Sqoop stages data through disk via MapReduce (MapTask), so DataX generally executes much faster than Sqoop.

Which kinds of transfers can DataX handle for us?

    Stream (console) -> Stream
    MySQL -> HDFS
    HDFS  -> MySQL
    Hive  -> MySQL
    MySQL -> Hive

Tuning comes down to two knobs:
1. Increase the degree of parallelism (the channel count).
2. Adjust the JVM heap size.
There are two places to configure the channel count. One is the global core configuration, core.json under DataX's conf directory; a change there applies to every job, and we normally leave it alone. Instead, we usually set the parallelism at the top of our own job JSON (the setting.speed.channel field, as in 01.json above).
The JVM heap size can be specified when running a job:

    datax.py --jvm="-Xms3G -Xmx3G" ../job/test.json

五、Presenting charts with Superset

1、Nationwide visit counts by the user's home region:

    create external table if not exists ads_nshop.ads_nshop_customer(
      customer_gender string COMMENT 'gender: 1 = male, 0 = female',
      os string COMMENT 'phone OS',
      customer_natives string COMMENT 'ISO code of the customer region',
      user_view_count int COMMENT 'view count per user'
    ) partitioned by (bdp_day string)
    row format delimited fields terminated by ','
    stored as TextFile
    location '/data/nshop/ads/operation/ads_nshop_customer/';

Join the two earlier tables and insert the query results into the new table:

    insert overwrite table ads_nshop.ads_nshop_customer partition(bdp_day='20220509')
    select
      b.customer_gender,
      a.os,
      c.iso,
      a.view_count
    from dws_nshop.dws_nshop_ulog_view a
    join ods_nshop.ods_02_customer b on a.user_id = b.customer_id
    join ods_nshop.ods_01_base_area c on b.customer_natives = c.area_code
    where a.bdp_day = '20220509';

Then export the statistics to MySQL (with DataX). First create a table in MySQL:

    CREATE TABLE `ads_nshop_customer` (
      `customer_gender` tinyint(4) DEFAULT NULL,
      `os` varchar(255) DEFAULT NULL,
      `iso` varchar(255) DEFAULT NULL,
      `user_view_count` int(11) DEFAULT NULL
    )

Next, write 02.json:

    {
      "job": {
        "setting": {
          "speed": {
            "channel": 3
          }
        },
        "content": [
          {
            "reader": {
              "name": "hdfsreader",
              "parameter": {
                "path": "/data/nshop/ads/operation/ads_nshop_customer/bdp_day=${dt}/*",
                "defaultFS": "hdfs://bigdata01:9820",
                "column": [
                  { "index": 0, "type": "string" },
                  { "index": 1, "type": "string" },
                  { "index": 2, "type": "string" },
                  { "index": 3, "type": "long" }
                ],
                "fileType": "text",
                "encoding": "UTF-8",
                "fieldDelimiter": ","
              }
            },
            "writer": {
              "name": "mysqlwriter",
              "parameter": {
                "writeMode": "insert",
                "username": "root",
                "password": "123456",
                "column": [
                  "customer_gender",
                  "os",
                  "iso",
                  "user_view_count"
                ],
                "connection": [
                  {
                    "jdbcUrl": "jdbc:mysql://bigdata01:3306/nshop",
                    "table": [
                      "ads_nshop_customer"
                    ]
                  }
                ]
              }
            }
          }
        ]
      }
    }

Run the job to export the data; the -p "-Ddt=..." option fills in the ${dt} placeholder used in the JSON path:

    datax.py job/02.json -p "-Ddt=20220509"
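Every export below follows this same pattern, so a small wrapper saves typing (a sketch; the script name export.sh and the "default to yesterday" behavior are assumptions):

    #!/bin/bash
    # export.sh -- run a DataX export for a given day (sketch)
    # Usage: sh export.sh 02.json [20220509]
    dt=${2:-$(date -d '1 day ago' +%Y%m%d)}
    datax.py "$1" -p "-Ddt=${dt}"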

Back in Superset, add the database connection using this SQLAlchemy URI:

    mysql://root:123456@bigdata01/nshop?charset=utf8

Double-click the table to open the chart editing page.

2、Platform view statistics

First create the table in MySQL:

    CREATE TABLE `ads_nshop_flowpu_stat` (
      `uv` int DEFAULT NULL,
      `pv` int DEFAULT NULL,
      `pv_avg` double DEFAULT NULL
    )

Then export the ads-layer statistics to MySQL:

    {
      "job": {
        "setting": {
          "speed": {
            "channel": 3
          }
        },
        "content": [
          {
            "reader": {
              "name": "hdfsreader",
              "parameter": {
                "path": "/data/nshop/ads/operation/ads_nshop_flow/bdp_day=${dt}/*",
                "defaultFS": "hdfs://bigdata01:9820",
                "column": [
                  { "index": 0, "type": "long" },
                  { "index": 1, "type": "long" },
                  { "index": 2, "type": "double" }
                ],
                "fileType": "text",
                "encoding": "UTF-8",
                "fieldDelimiter": ","
              }
            },
            "writer": {
              "name": "mysqlwriter",
              "parameter": {
                "writeMode": "insert",
                "username": "root",
                "password": "123456",
                "column": [
                  "uv",
                  "pv",
                  "pv_avg"
                ],
                "connection": [
                  {
                    "jdbcUrl": "jdbc:mysql://bigdata01:3306/nshop",
                    "table": [
                      "ads_nshop_flowpu_stat"
                    ]
                  }
                ]
              }
            }
          }
        ]
      }
    }

Run the job:

    datax.py 03.json -p "-Ddt=20220509"

Use the Big Number chart type. There are three values in total, so create three charts.

3、Platform search hot-word statistics

Create the MySQL table:

    CREATE TABLE `ads_nshop_search_keys` (
      `search_keys` varchar(255) DEFAULT NULL,
      `gender` varchar(255) DEFAULT NULL,
      `age_range` varchar(255) DEFAULT NULL,
      `os` varchar(255) DEFAULT NULL,
      `manufacturer` varchar(255) DEFAULT NULL,
      `area_code` varchar(255) DEFAULT NULL,
      `search_users` int DEFAULT NULL,
      `search_records` int DEFAULT NULL,
      `search_orders` varchar(255) DEFAULT NULL,
      `search_targets` int DEFAULT NULL
    )

Export the data from the ads_nshop database to MySQL:

    {
      "job": {
        "setting": {
          "speed": {
            "channel": 3
          }
        },
        "content": [
          {
            "reader": {
              "name": "hdfsreader",
              "parameter": {
                "path": "/data/nshop/ads/operation/ads_nshop_search_keys/bdp_day=${dt}/*",
                "defaultFS": "hdfs://bigdata01:9820",
                "column": [
                  { "index": 0, "type": "string" },
                  { "index": 1, "type": "string" },
                  { "index": 2, "type": "string" },
                  { "index": 3, "type": "string" },
                  { "index": 4, "type": "string" },
                  { "index": 5, "type": "string" },
                  { "index": 6, "type": "long" },
                  { "index": 7, "type": "long" },
                  { "index": 8, "type": "string" },
                  { "index": 9, "type": "long" }
                ],
                "fileType": "text",
                "encoding": "UTF-8",
                "fieldDelimiter": ","
              }
            },
            "writer": {
              "name": "mysqlwriter",
              "parameter": {
                "writeMode": "insert",
                "username": "root",
                "password": "123456",
                "column": [
                  "search_keys",
                  "gender",
                  "age_range",
                  "os",
                  "manufacturer",
                  "area_code",
                  "search_users",
                  "search_records",
                  "search_orders",
                  "search_targets"
                ],
                "connection": [
                  {
                    "jdbcUrl": "jdbc:mysql://bigdata01:3306/nshop",
                    "table": [
                      "ads_nshop_search_keys"
                    ]
                  }
                ]
              }
            }
          }
        ]
      }
    }

Run the job:

    datax.py 04.json -p "-Ddt=20220509"

The hot words come out as numeric codes, which don't display well, so they need some extra processing.
Import a lookup table (shop_code) into MySQL, and once it is in, create a new table:

    CREATE TABLE `ads_nshop_search_keys_2` (
      `search_keys` varchar(255) DEFAULT NULL,
      `gender` varchar(255) DEFAULT NULL,
      `age_range` varchar(255) DEFAULT NULL,
      `os` varchar(255) DEFAULT NULL,
      `manufacturer` varchar(255) DEFAULT NULL,
      `area_code` varchar(255) DEFAULT NULL,
      `search_users` int DEFAULT NULL,
      `search_records` int DEFAULT NULL,
      `search_orders` varchar(255) DEFAULT NULL,
      `search_targets` int DEFAULT NULL
    )

Join shop_code with ads_nshop_search_keys and insert the joined rows into the new table; from then on, point Superset at the new table:

    insert into ads_nshop_search_keys_2
      (search_keys, gender, age_range, os, manufacturer, area_code,
       search_users, search_records, search_orders, search_targets)
    select
      name as search_keys,
      gender, age_range, os, manufacturer, area_code,
      search_users, search_records, search_orders, search_targets
    from ads_nshop_search_keys
    join shop_code on search_keys = id;

Display the charts in Superset; for example, a pie chart of the hot words.

5、Overall operations metrics

    insert overwrite table ads_nshop.ads_nshop_oper_stat partition(bdp_day='20220509')
    select
      a.customer_gender,
      a.customer_age_range,
      e.province_name,
      c.category_code,
      count(distinct b.order_id),
      count(distinct b.order_id) / sum(d.view_count),
      sum(b.payment_money),
      sum(b.district_money),
      sum(b.shipping_money),
      sum(b.payment_money) / count(distinct b.customer_id)
    from ods_nshop.ods_02_customer a
    join dwd_nshop.dwd_nshop_orders_details b on a.customer_id = b.customer_id
    join ods_nshop.dim_pub_product c on b.supplier_code = c.supplier_code
    join dws_nshop.dws_nshop_ulog_view d on b.customer_id = d.user_id
    join ods_nshop.ods_01_base_area e on a.customer_natives = e.area_code
    where d.bdp_day = '20220509'
    group by
      a.customer_gender,
      a.customer_age_range,
      e.province_name,
      c.category_code;

Why recompute yesterday's metrics? Because today we added ods_01_base_area, and joining it in yields results (province names instead of area codes) that are easier to present.
When the job finishes, export the ads-layer operations metrics to MySQL. Create the target table:

    CREATE TABLE `ads_nshop_oper_stat` (
      `customer_gender` int DEFAULT NULL,
      `age_range` varchar(255) DEFAULT NULL,
      `customer_natives` varchar(255) DEFAULT NULL,
      `product_type` int DEFAULT NULL,
      `order_counts` int DEFAULT NULL,
      `order_rate` double DEFAULT NULL,
      `order_amounts` int DEFAULT NULL,
      `order_discounts` int DEFAULT NULL,
      `shipping_amounts` int DEFAULT NULL,
      `per_customer_transaction` int DEFAULT NULL
    )

Write the DataX job:

    {
      "job": {
        "setting": {
          "speed": {
            "channel": 3
          }
        },
        "content": [
          {
            "reader": {
              "name": "hdfsreader",
              "parameter": {
                "path": "/data/nshop/ads/operation/ads_nshop_oper_stat/bdp_day=${dt}/*",
                "defaultFS": "hdfs://bigdata01:9820",
                "column": [
                  { "index": 0, "type": "long" },
                  { "index": 1, "type": "string" },
                  { "index": 2, "type": "string" },
                  { "index": 3, "type": "long" },
                  { "index": 4, "type": "long" },
                  { "index": 5, "type": "double" },
                  { "index": 6, "type": "long" },
                  { "index": 7, "type": "long" },
                  { "index": 8, "type": "long" },
                  { "index": 9, "type": "long" }
                ],
                "fileType": "text",
                "encoding": "UTF-8",
                "fieldDelimiter": ","
              }
            },
            "writer": {
              "name": "mysqlwriter",
              "parameter": {
                "writeMode": "insert",
                "username": "root",
                "password": "123456",
                "column": [
                  "customer_gender",
                  "age_range",
                  "customer_natives",
                  "product_type",
                  "order_counts",
                  "order_rate",
                  "order_amounts",
                  "order_discounts",
                  "shipping_amounts",
                  "per_customer_transaction"
                ],
                "connection": [
                  {
                    "jdbcUrl": "jdbc:mysql://bigdata01:3306/nshop",
                    "table": [
                      "ads_nshop_oper_stat"
                    ]
                  }
                ]
              }
            }
          }
        ]
      }
    }

Run it:

    datax.py 05.json -p "-Ddt=20220509"

Display the charts, customizing the settings as needed; use a bar chart to show order counts and order rate.

6、Risk-control metrics

    insert overwrite table ads_nshop.ads_nshop_risk_mgt partition(bdp_day='20220509')
    select
      a.customer_gender,
      a.customer_age_range,
      e.province_name,
      c.category_code,
      count(distinct case when b.order_status = 6 then b.order_id end),
      count(distinct case when b.order_status = 6 then b.order_id end)
        / count(distinct b.order_id) * 100
    from ods_nshop.ods_02_customer a
    join dwd_nshop.dwd_nshop_orders_details b on a.customer_id = b.customer_id
    join ods_nshop.dim_pub_product c on b.supplier_code = c.supplier_code
    join ods_nshop.ods_01_base_area e on a.customer_natives = e.area_code
    where b.bdp_day = '20220509'
    group by
      a.customer_gender,
      a.customer_age_range,
      e.province_name,
      c.category_code;

Create the MySQL table:

    CREATE TABLE `ads_nshop_risk_mgt` (
      `customer_gender` int DEFAULT NULL,
      `age_range` varchar(255) DEFAULT NULL,
      `customer_natives` varchar(255) DEFAULT NULL,
      `product_type` varchar(255) DEFAULT NULL,
      `start_complaint_counts` int DEFAULT NULL,
      `complaint_rate` double DEFAULT NULL
    )

Export the freshly computed Hive data to MySQL:

    {
      "job": {
        "setting": {
          "speed": {
            "channel": 3
          }
        },
        "content": [
          {
            "reader": {
              "name": "hdfsreader",
              "parameter": {
                "path": "/data/nshop/ads/operation/ads_nshop_risk_mgt/bdp_day=${dt}/*",
                "defaultFS": "hdfs://bigdata01:9820",
                "column": [
                  { "index": 0, "type": "long" },
                  { "index": 1, "type": "string" },
                  { "index": 2, "type": "string" },
                  { "index": 3, "type": "string" },
                  { "index": 4, "type": "long" },
                  { "index": 5, "type": "double" }
                ],
                "fileType": "text",
                "encoding": "UTF-8",
                "fieldDelimiter": ","
              }
            },
            "writer": {
              "name": "mysqlwriter",
              "parameter": {
                "writeMode": "insert",
                "username": "root",
                "password": "123456",
                "column": [
                  "customer_gender",
                  "age_range",
                  "customer_natives",
                  "product_type",
                  "start_complaint_counts",
                  "complaint_rate"
                ],
                "connection": [
                  {
                    "jdbcUrl": "jdbc:mysql://bigdata01:3306/nshop",
                    "table": [
                      "ads_nshop_risk_mgt"
                    ]
                  }
                ]
              }
            }
          }
        ]
      }
    }

Run the job:

    datax.py 06.json -p "-Ddt=20220509"


7、TopN statistics

Re-run the metric job; the reason is that the region has been added:

    insert overwrite table ads_nshop.ads_nshop_pay_stat_topn partition(bdp_day='20220509')
    select
      case when b.pay_type = '10' then '网上银行'
           when b.pay_type = '11' then '微信'
           when b.pay_type = '12' then '支付宝'
           else '线下支付' end,
      e.province_name,
      count(distinct b.pay_id),
      sum(b.pay_amount) as pay_sum
    from ods_nshop.ods_02_customer a
    join ods_nshop.ods_02_orders_pay_records b on a.customer_id = b.customer_id
    join ods_nshop.ods_01_base_area e on a.customer_natives = e.area_code
    group by
      e.province_name,
      b.pay_type
    order by pay_sum desc
    limit 500;

Create the table in MySQL:

    CREATE TABLE `ads_nshop_pay_stat_topn` (
      `pay_type` varchar(255) DEFAULT NULL,
      `customer_area_code` varchar(255) DEFAULT NULL,
      `pay_count` int DEFAULT NULL,
      `pay_amounts` int DEFAULT NULL
    )

Write the DataX job:

    {
      "job": {
        "setting": {
          "speed": {
            "channel": 3
          }
        },
        "content": [
          {
            "reader": {
              "name": "hdfsreader",
              "parameter": {
                "path": "/data/nshop/ads/operation/ads_nshop_pay_stat_topn/bdp_day=${dt}/*",
                "defaultFS": "hdfs://bigdata01:9820",
                "column": [
                  { "index": 0, "type": "string" },
                  { "index": 1, "type": "string" },
                  { "index": 2, "type": "long" },
                  { "index": 3, "type": "long" }
                ],
                "fileType": "text",
                "encoding": "UTF-8",
                "fieldDelimiter": ","
              }
            },
            "writer": {
              "name": "mysqlwriter",
              "parameter": {
                "writeMode": "insert",
                "username": "root",
                "password": "123456",
                "column": [
                  "pay_type",
                  "customer_area_code",
                  "pay_count",
                  "pay_amounts"
                ],
                "connection": [
                  {
                    "jdbcUrl": "jdbc:mysql://bigdata01:3306/nshop",
                    "table": [
                      "ads_nshop_pay_stat_topn"
                    ]
                  }
                ]
              }
            }
          }
        ]
      }
    }

Run the job:

    datax.py 07.json -p "-Ddt=20220509"

Display the results; a treemap works well for these statistics.

8、Ad-placement metrics

Re-run the metric:

    insert overwrite table ads_nshop.ads_nshop_release_stat partition(bdp_day='20220509')
    select
      a.device_type,
      a.os,
      b.customer_gender,
      b.customer_age_range,
      e.province_name,
      a.release_sources,
      a.release_category,
      count(distinct a.customer_id),
      count(*)
    from dwd_nshop.dwd_nshop_releasedatas a
    join ods_nshop.ods_02_customer b on a.customer_id = b.customer_id
    join ods_nshop.ods_01_base_area e on b.customer_natives = e.area_code
    where a.bdp_day = '20220509'
    group by
      a.device_type,
      a.os,
      b.customer_gender,
      b.customer_age_range,
      e.province_name,
      a.release_sources,
      a.release_category;

Create the MySQL table:

    CREATE TABLE `ads_nshop_release_stat` (
      `device_type` varchar(255) DEFAULT NULL,
      `os` varchar(255) DEFAULT NULL,
      `customer_gender` int DEFAULT NULL,
      `age_range` varchar(255) DEFAULT NULL,
      `customer_natives` varchar(255) DEFAULT NULL,
      `release_sources` varchar(255) DEFAULT NULL,
      `release_category` varchar(255) DEFAULT NULL,
      `visit_total_customers` int DEFAULT NULL,
      `visit_total_counts` int DEFAULT NULL
    )

Write the job that exports the data:

    {
      "job": {
        "setting": {
          "speed": {
            "channel": 3
          }
        },
        "content": [
          {
            "reader": {
              "name": "hdfsreader",
              "parameter": {
                "path": "/data/nshop/ads/operation/ads_nshop_release_stat/bdp_day=${dt}/*",
                "defaultFS": "hdfs://bigdata01:9820",
                "column": [
                  { "index": 0, "type": "string" },
                  { "index": 1, "type": "string" },
                  { "index": 2, "type": "long" },
                  { "index": 3, "type": "string" },
                  { "index": 4, "type": "string" },
                  { "index": 5, "type": "string" },
                  { "index": 6, "type": "string" },
                  { "index": 7, "type": "long" },
                  { "index": 8, "type": "long" }
                ],
                "fileType": "text",
                "encoding": "UTF-8",
                "fieldDelimiter": ","
              }
            },
            "writer": {
              "name": "mysqlwriter",
              "parameter": {
                "writeMode": "insert",
                "username": "root",
                "password": "123456",
                "column": [
                  "device_type",
                  "os",
                  "customer_gender",
                  "age_range",
                  "customer_natives",
                  "release_sources",
                  "release_category",
                  "visit_total_customers",
                  "visit_total_counts"
                ],
                "connection": [
                  {
                    "jdbcUrl": "jdbc:mysql://bigdata01:3306/nshop",
                    "table": [
                      "ads_nshop_release_stat"
                    ]
                  }
                ]
              }
            }
          }
        ]
      }
    }

Run the job to export the data:

    datax.py 08.json -p "-Ddt=20220509"

Use a Sankey diagram for this one.

9、Combining the charts on a dashboard

(Screenshots only: assembling the saved charts into a Superset dashboard.)