一、前言

1、TPC-DS vs TPC-H

TPC-DS测试基准是TPC组织推出的用于替代TPC-H的下一代决策支持系统测试基准。
TPC-DS在健壮行方面要好很多,因为它的SQL本身比较复杂,也比较多,Hack起来相对困难,并且只hack几个SQL对整体性能提高有限。
1.1 TPC-H:
TPC-H是一款面向商品零售业的决策支持系统测试基准,它定义了8张表,22个查询,遵循SQL92。
1.2 TPC-DS:
TPC-DS采用星型、雪花型等多维数据模式。它包含7张事实表,17张纬度表平均每张表含有18列。其工作负载包含99个SQL查询,覆盖SQL99和2003的核心部分以及OLAP。这个测试集包含对大数据集的统计、报表生成、联机查询、数据挖掘等复杂应用,测试用的数据和值是有倾斜的,与真实数据一致。可以说TPC-DS是与真实场景非常接近的一个测试集,也是难度较大的一个测试集。

二、安装部署

1、安装依赖

yum -y install gcc gcc-c++ libstdc++-devel bison byacc flex
image.png

2、下载测试工具

http://www.tpc.org/tpc_documents_current_versions/current_specifications.asp
image.png

3、安装测试工具

  1. unzip tpc-ds-tool.zip
  2. cd v2.3.0/tools
  3. make

image.png
image.png

4、生成数据

4.1 生成1G数据

  1. ./dsdgen -scale 1 -dir /data

image.png

5.15 ./dsdgen —hlep

  1. -DIR 生成目录
  2. -SCALE 数据大小(单位GB
  3. -DELIMITER 字段分隔符,默认|
  4. -TERMINATE 末尾是否有分隔符,参数 Y或者N
  5. ...

4.2 并行生产数据

  1. dbgen2scale 100 dir /tmp parallel 4 child 1 &
  2. dbgen2scale 100 dir /tmp parallel 4 child 2 &
  3. dbgen2scale 100 dir /tmp parallel 4 child 3 &
  4. dbgen2scale 100 dir /tmp parallel 4 child 4 &

4.3 更新

  1. dbgen2 scale 100 dir /tmp update 3

4.5 等待数据生成完毕

5、将本地数据上传到 HDFS

  1. sudo -u hdfs hadoop fs -mkdir /tpc_ds
  2. sudo -u hdfs hadoop fs -put /data/* /tpc_ds
  3. sudo -u hdfs hdfs dfs -chown -R hive:hive /tpc_ds

image.png

6、在 SQL 引擎中创建外表和执行表,导入数据

6.1 创建hive外部表

  1. create database if not exists test_tpc_ds;
  2. use test_tpc_ds;
  3. drop table if exists call_center;
  4. create external table call_center(
  5. cc_call_center_sk bigint
  6. , cc_call_center_id string
  7. , cc_rec_start_date string
  8. , cc_rec_end_date string
  9. , cc_closed_date_sk bigint
  10. , cc_open_date_sk bigint
  11. , cc_name string
  12. , cc_class string
  13. , cc_employees int
  14. , cc_sq_ft int
  15. , cc_hours string
  16. , cc_manager string
  17. , cc_mkt_id int
  18. , cc_mkt_class string
  19. , cc_mkt_desc string
  20. , cc_market_manager string
  21. , cc_division int
  22. , cc_division_name string
  23. , cc_company int
  24. , cc_company_name string
  25. , cc_street_number string
  26. , cc_street_name string
  27. , cc_street_type string
  28. , cc_suite_number string
  29. , cc_city string
  30. , cc_county string
  31. , cc_state string
  32. , cc_zip string
  33. , cc_country string
  34. , cc_gmt_offset double
  35. , cc_tax_percentage double
  36. )
  37. row format delimited fields terminated by '|'
  38. STORED AS textfile;
  39. drop table if exists catalog_page;
  40. create external table catalog_page(
  41. cp_catalog_page_sk bigint
  42. , cp_catalog_page_id string
  43. , cp_start_date_sk bigint
  44. , cp_end_date_sk bigint
  45. , cp_department string
  46. , cp_catalog_number int
  47. , cp_catalog_page_number int
  48. , cp_description string
  49. , cp_type string
  50. )
  51. row format delimited fields terminated by '|'
  52. STORED AS textfile;
  53. drop table if exists catalog_returns;
  54. create external table catalog_returns
  55. (
  56. cr_returned_date_sk bigint,
  57. cr_returned_time_sk bigint,
  58. cr_item_sk bigint,
  59. cr_refunded_customer_sk bigint,
  60. cr_refunded_cdemo_sk bigint,
  61. cr_refunded_hdemo_sk bigint,
  62. cr_refunded_addr_sk bigint,
  63. cr_returning_customer_sk bigint,
  64. cr_returning_cdemo_sk bigint,
  65. cr_returning_hdemo_sk bigint,
  66. cr_returning_addr_sk bigint,
  67. cr_call_center_sk bigint,
  68. cr_catalog_page_sk bigint,
  69. cr_ship_mode_sk bigint,
  70. cr_warehouse_sk bigint,
  71. cr_reason_sk bigint,
  72. cr_order_number bigint,
  73. cr_return_quantity int,
  74. cr_return_amount double,
  75. cr_return_tax double,
  76. cr_return_amt_inc_tax double,
  77. cr_fee double,
  78. cr_return_ship_cost double,
  79. cr_refunded_cash double,
  80. cr_reversed_charge double,
  81. cr_store_credit double,
  82. cr_net_loss double
  83. )
  84. row format delimited fields terminated by '|'
  85. STORED AS textfile;
  86. drop table if exists catalog_sales;
  87. create external table catalog_sales
  88. (
  89. cs_sold_date_sk bigint,
  90. cs_sold_time_sk bigint,
  91. cs_ship_date_sk bigint,
  92. cs_bill_customer_sk bigint,
  93. cs_bill_cdemo_sk bigint,
  94. cs_bill_hdemo_sk bigint,
  95. cs_bill_addr_sk bigint,
  96. cs_ship_customer_sk bigint,
  97. cs_ship_cdemo_sk bigint,
  98. cs_ship_hdemo_sk bigint,
  99. cs_ship_addr_sk bigint,
  100. cs_call_center_sk bigint,
  101. cs_catalog_page_sk bigint,
  102. cs_ship_mode_sk bigint,
  103. cs_warehouse_sk bigint,
  104. cs_item_sk bigint,
  105. cs_promo_sk bigint,
  106. cs_order_number bigint,
  107. cs_quantity int,
  108. cs_wholesale_cost double,
  109. cs_list_price double,
  110. cs_sales_price double,
  111. cs_ext_discount_amt double,
  112. cs_ext_sales_price double,
  113. cs_ext_wholesale_cost double,
  114. cs_ext_list_price double,
  115. cs_ext_tax double,
  116. cs_coupon_amt double,
  117. cs_ext_ship_cost double,
  118. cs_net_paid double,
  119. cs_net_paid_inc_tax double,
  120. cs_net_paid_inc_ship double,
  121. cs_net_paid_inc_ship_tax double,
  122. cs_net_profit double
  123. )
  124. row format delimited fields terminated by '|'
  125. STORED AS textfile;
  126. drop table if exists customer_address;
  127. create external table customer_address
  128. (
  129. ca_address_sk bigint,
  130. ca_address_id string,
  131. ca_street_number string,
  132. ca_street_name string,
  133. ca_street_type string,
  134. ca_suite_number string,
  135. ca_city string,
  136. ca_county string,
  137. ca_state string,
  138. ca_zip string,
  139. ca_country string,
  140. ca_gmt_offset double,
  141. ca_location_type string
  142. )
  143. row format delimited fields terminated by '|'
  144. STORED AS textfile;
  145. drop table if exists customer_demographics;
  146. create external table customer_demographics
  147. (
  148. cd_demo_sk bigint,
  149. cd_gender string,
  150. cd_marital_status string,
  151. cd_education_status string,
  152. cd_purchase_estimate int,
  153. cd_credit_rating string,
  154. cd_dep_count int,
  155. cd_dep_employed_count int,
  156. cd_dep_college_count int
  157. )
  158. row format delimited fields terminated by '|'
  159. STORED AS textfile;
  160. drop table if exists customer;
  161. create external table customer
  162. (
  163. c_customer_sk bigint,
  164. c_customer_id string,
  165. c_current_cdemo_sk bigint,
  166. c_current_hdemo_sk bigint,
  167. c_current_addr_sk bigint,
  168. c_first_shipto_date_sk bigint,
  169. c_first_sales_date_sk bigint,
  170. c_salutation string,
  171. c_first_name string,
  172. c_last_name string,
  173. c_preferred_cust_flag string,
  174. c_birth_day int,
  175. c_birth_month int,
  176. c_birth_year int,
  177. c_birth_country string,
  178. c_login string,
  179. c_email_address string,
  180. c_last_review_date string
  181. )
  182. row format delimited fields terminated by '|'
  183. STORED AS textfile;
  184. drop table if exists date_dim;
  185. create external table date_dim
  186. (
  187. d_date_sk bigint,
  188. d_date_id string,
  189. d_date string,
  190. d_month_seq int,
  191. d_week_seq int,
  192. d_quarter_seq int,
  193. d_year int,
  194. d_dow int,
  195. d_moy int,
  196. d_dom int,
  197. d_qoy int,
  198. d_fy_year int,
  199. d_fy_quarter_seq int,
  200. d_fy_week_seq int,
  201. d_day_name string,
  202. d_quarter_name string,
  203. d_holiday string,
  204. d_weekend string,
  205. d_following_holiday string,
  206. d_first_dom int,
  207. d_last_dom int,
  208. d_same_day_ly int,
  209. d_same_day_lq int,
  210. d_current_day string,
  211. d_current_week string,
  212. d_current_month string,
  213. d_current_quarter string,
  214. d_current_year string
  215. )
  216. row format delimited fields terminated by '|'
  217. STORED AS textfile;
  218. drop table if exists dbgen_version;
  219. create external table dbgen_version(dv_version string,dv_create_date string, dv_create_time string, dv_cmdline_args string)row format delimited fields terminated by '|' STORED AS textfile;
  220. drop table if exists household_demographics;
  221. create external table household_demographics
  222. (
  223. hd_demo_sk bigint,
  224. hd_income_band_sk bigint,
  225. hd_buy_potential string,
  226. hd_dep_count int,
  227. hd_vehicle_count int
  228. )
  229. row format delimited fields terminated by '|'
  230. STORED AS textfile;
  231. drop table if exists income_band;
  232. create external table income_band(
  233. ib_income_band_sk bigint
  234. , ib_lower_bound int
  235. , ib_upper_bound int
  236. )
  237. row format delimited fields terminated by '|'
  238. STORED AS textfile;
  239. drop table if exists inventory;
  240. create external table inventory
  241. (
  242. inv_date_sk bigint,
  243. inv_item_sk bigint,
  244. inv_warehouse_sk bigint,
  245. inv_quantity_on_hand int
  246. )
  247. row format delimited fields terminated by '|'
  248. STORED AS textfile;
  249. drop table if exists item;
  250. create external table item
  251. (
  252. i_item_sk bigint,
  253. i_item_id string,
  254. i_rec_start_date string,
  255. i_rec_end_date string,
  256. i_item_desc string,
  257. i_current_price double,
  258. i_wholesale_cost double,
  259. i_brand_id int,
  260. i_brand string,
  261. i_class_id int,
  262. i_class string,
  263. i_category_id int,
  264. i_category string,
  265. i_manufact_id int,
  266. i_manufact string,
  267. i_size string,
  268. i_formulation string,
  269. i_color string,
  270. i_units string,
  271. i_container string,
  272. i_manager_id int,
  273. i_product_name string
  274. )
  275. row format delimited fields terminated by '|'
  276. STORED AS textfile;
  277. drop table if exists promotion;
  278. create external table promotion
  279. (
  280. p_promo_sk bigint,
  281. p_promo_id string,
  282. p_start_date_sk bigint,
  283. p_end_date_sk bigint,
  284. p_item_sk bigint,
  285. p_cost double,
  286. p_response_target int,
  287. p_promo_name string,
  288. p_channel_dmail string,
  289. p_channel_email string,
  290. p_channel_catalog string,
  291. p_channel_tv string,
  292. p_channel_radio string,
  293. p_channel_press string,
  294. p_channel_event string,
  295. p_channel_demo string,
  296. p_channel_details string,
  297. p_purpose string,
  298. p_discount_active string
  299. )
  300. row format delimited fields terminated by '|'
  301. STORED AS textfile;
  302. drop table if exists reason;
  303. create external table reason(
  304. r_reason_sk bigint
  305. , r_reason_id string
  306. , r_reason_desc string
  307. )
  308. row format delimited fields terminated by '|'
  309. STORED AS textfile;
  310. drop table if exists ship_mode;
  311. create external table ship_mode(
  312. sm_ship_mode_sk bigint
  313. , sm_ship_mode_id string
  314. , sm_type string
  315. , sm_code string
  316. , sm_carrier string
  317. , sm_contract string
  318. )
  319. row format delimited fields terminated by '|'
  320. STORED AS textfile;
  321. drop table if exists store_returns;
  322. create external table store_returns
  323. (
  324. sr_returned_date_sk bigint,
  325. sr_return_time_sk bigint,
  326. sr_item_sk bigint,
  327. sr_customer_sk bigint,
  328. sr_cdemo_sk bigint,
  329. sr_hdemo_sk bigint,
  330. sr_addr_sk bigint,
  331. sr_store_sk bigint,
  332. sr_reason_sk bigint,
  333. sr_ticket_number bigint,
  334. sr_return_quantity int,
  335. sr_return_amt double,
  336. sr_return_tax double,
  337. sr_return_amt_inc_tax double,
  338. sr_fee double,
  339. sr_return_ship_cost double,
  340. sr_refunded_cash double,
  341. sr_reversed_charge double,
  342. sr_store_credit double,
  343. sr_net_loss double
  344. )
  345. row format delimited fields terminated by '|'
  346. STORED AS textfile;
  347. drop table if exists store_sales;
  348. create external table store_sales
  349. (
  350. ss_sold_date_sk bigint,
  351. ss_sold_time_sk bigint,
  352. ss_item_sk bigint,
  353. ss_customer_sk bigint,
  354. ss_cdemo_sk bigint,
  355. ss_hdemo_sk bigint,
  356. ss_addr_sk bigint,
  357. ss_store_sk bigint,
  358. ss_promo_sk bigint,
  359. ss_ticket_number bigint,
  360. ss_quantity int,
  361. ss_wholesale_cost double,
  362. ss_list_price double,
  363. ss_sales_price double,
  364. ss_ext_discount_amt double,
  365. ss_ext_sales_price double,
  366. ss_ext_wholesale_cost double,
  367. ss_ext_list_price double,
  368. ss_ext_tax double,
  369. ss_coupon_amt double,
  370. ss_net_paid double,
  371. ss_net_paid_inc_tax double,
  372. ss_net_profit double
  373. )
  374. row format delimited fields terminated by '|'
  375. STORED AS textfile;
  376. drop table if exists store;
  377. create external table store
  378. (
  379. s_store_sk bigint,
  380. s_store_id string,
  381. s_rec_start_date string,
  382. s_rec_end_date string,
  383. s_closed_date_sk bigint,
  384. s_store_name string,
  385. s_number_employees int,
  386. s_floor_space int,
  387. s_hours string,
  388. s_manager string,
  389. s_market_id int,
  390. s_geography_class string,
  391. s_market_desc string,
  392. s_market_manager string,
  393. s_division_id int,
  394. s_division_name string,
  395. s_company_id int,
  396. s_company_name string,
  397. s_street_number string,
  398. s_street_name string,
  399. s_street_type string,
  400. s_suite_number string,
  401. s_city string,
  402. s_county string,
  403. s_state string,
  404. s_zip string,
  405. s_country string,
  406. s_gmt_offset double,
  407. s_tax_precentage double
  408. )
  409. row format delimited fields terminated by '|'
  410. STORED AS textfile;
  411. drop table if exists time_dim;
  412. create external table time_dim
  413. (
  414. t_time_sk bigint,
  415. t_time_id string,
  416. t_time int,
  417. t_hour int,
  418. t_minute int,
  419. t_second int,
  420. t_am_pm string,
  421. t_shift string,
  422. t_sub_shift string,
  423. t_meal_time string
  424. )
  425. row format delimited fields terminated by '|'
  426. STORED AS textfile;
  427. drop table if exists warehouse;
  428. create external table warehouse(
  429. w_warehouse_sk bigint
  430. , w_warehouse_id string
  431. , w_warehouse_name string
  432. , w_warehouse_sq_ft int
  433. , w_street_number string
  434. , w_street_name string
  435. , w_street_type string
  436. , w_suite_number string
  437. , w_city string
  438. , w_county string
  439. , w_state string
  440. , w_zip string
  441. , w_country string
  442. , w_gmt_offset double
  443. )
  444. row format delimited fields terminated by '|'
  445. STORED AS textfile;
  446. drop table if exists web_page;
  447. create external table web_page(
  448. wp_web_page_sk bigint
  449. , wp_web_page_id string
  450. , wp_rec_start_date string
  451. , wp_rec_end_date string
  452. , wp_creation_date_sk bigint
  453. , wp_access_date_sk bigint
  454. , wp_autogen_flag string
  455. , wp_customer_sk bigint
  456. , wp_url string
  457. , wp_type string
  458. , wp_char_count int
  459. , wp_link_count int
  460. , wp_image_count int
  461. , wp_max_ad_count int
  462. )
  463. row format delimited fields terminated by '|'
  464. STORED AS textfile;
  465. drop table if exists web_returns;
  466. create external table web_returns
  467. (
  468. wr_returned_date_sk bigint,
  469. wr_returned_time_sk bigint,
  470. wr_item_sk bigint,
  471. wr_refunded_customer_sk bigint,
  472. wr_refunded_cdemo_sk bigint,
  473. wr_refunded_hdemo_sk bigint,
  474. wr_refunded_addr_sk bigint,
  475. wr_returning_customer_sk bigint,
  476. wr_returning_cdemo_sk bigint,
  477. wr_returning_hdemo_sk bigint,
  478. wr_returning_addr_sk bigint,
  479. wr_web_page_sk bigint,
  480. wr_reason_sk bigint,
  481. wr_order_number bigint,
  482. wr_return_quantity int,
  483. wr_return_amt double,
  484. wr_return_tax double,
  485. wr_return_amt_inc_tax double,
  486. wr_fee double,
  487. wr_return_ship_cost double,
  488. wr_refunded_cash double,
  489. wr_reversed_charge double,
  490. wr_account_credit double,
  491. wr_net_loss double
  492. )
  493. row format delimited fields terminated by '|'
  494. STORED AS textfile;
  495. drop table if exists web_sales;
  496. create external table web_sales
  497. (
  498. ws_sold_date_sk bigint,
  499. ws_sold_time_sk bigint,
  500. ws_ship_date_sk bigint,
  501. ws_item_sk bigint,
  502. ws_bill_customer_sk bigint,
  503. ws_bill_cdemo_sk bigint,
  504. ws_bill_hdemo_sk bigint,
  505. ws_bill_addr_sk bigint,
  506. ws_ship_customer_sk bigint,
  507. ws_ship_cdemo_sk bigint,
  508. ws_ship_hdemo_sk bigint,
  509. ws_ship_addr_sk bigint,
  510. ws_web_page_sk bigint,
  511. ws_web_site_sk bigint,
  512. ws_ship_mode_sk bigint,
  513. ws_warehouse_sk bigint,
  514. ws_promo_sk bigint,
  515. ws_order_number bigint,
  516. ws_quantity int,
  517. ws_wholesale_cost double,
  518. ws_list_price double,
  519. ws_sales_price double,
  520. ws_ext_discount_amt double,
  521. ws_ext_sales_price double,
  522. ws_ext_wholesale_cost double,
  523. ws_ext_list_price double,
  524. ws_ext_tax double,
  525. ws_coupon_amt double,
  526. ws_ext_ship_cost double,
  527. ws_net_paid double,
  528. ws_net_paid_inc_tax double,
  529. ws_net_paid_inc_ship double,
  530. ws_net_paid_inc_ship_tax double,
  531. ws_net_profit double
  532. )
  533. row format delimited fields terminated by '|'
  534. STORED AS textfile;
  535. drop table if exists web_site;
  536. create external table web_site
  537. (
  538. web_site_sk bigint,
  539. web_site_id string,
  540. web_rec_start_date string,
  541. web_rec_end_date string,
  542. web_name string,
  543. web_open_date_sk bigint,
  544. web_close_date_sk bigint,
  545. web_class string,
  546. web_manager string,
  547. web_mkt_id int,
  548. web_mkt_class string,
  549. web_mkt_desc string,
  550. web_market_manager string,
  551. web_company_id int,
  552. web_company_name string,
  553. web_street_number string,
  554. web_street_name string,
  555. web_street_type string,
  556. web_suite_number string,
  557. web_city string,
  558. web_county string,
  559. web_state string,
  560. web_zip string,
  561. web_country string,
  562. web_gmt_offset double,
  563. web_tax_percentage double
  564. )
  565. row format delimited fields terminated by '|'
  566. STORED AS textfile;

image.png
6.2 加载数据到hive里

  1. LOAD DATA inpath '/tpc_ds/call_center.dat' INTO TABLE call_center;
  2. LOAD DATA inpath '/tpc_ds/catalog_page.dat' INTO TABLE catalog_page;
  3. LOAD DATA inpath '/tpc_ds/catalog_returns.dat' INTO TABLE catalog_returns;
  4. LOAD DATA inpath '/tpc_ds/catalog_sales.dat' INTO TABLE catalog_sales;
  5. LOAD DATA inpath '/tpc_ds/customer.dat' INTO TABLE customer;
  6. LOAD DATA inpath '/tpc_ds/customer_address.dat' INTO TABLE customer_address;
  7. LOAD DATA inpath '/tpc_ds/customer_demographics.dat' INTO TABLE customer_demographics;
  8. LOAD DATA inpath '/tpc_ds/date_dim.dat' INTO TABLE date_dim;
  9. LOAD DATA inpath '/tpc_ds/dbgen_version.dat' INTO TABLE dbgen_version;
  10. LOAD DATA inpath '/tpc_ds/household_demographics.dat' INTO TABLE household_demographics;
  11. LOAD DATA inpath '/tpc_ds/income_band.dat' INTO TABLE income_band;
  12. LOAD DATA inpath '/tpc_ds/inventory.dat' INTO TABLE inventory;
  13. LOAD DATA inpath '/tpc_ds/item.dat' INTO TABLE item;
  14. LOAD DATA inpath '/tpc_ds/promotion.dat' INTO TABLE promotion;
  15. LOAD DATA inpath '/tpc_ds/reason.dat' INTO TABLE reason;
  16. LOAD DATA inpath '/tpc_ds/ship_mode.dat' INTO TABLE ship_mode;
  17. LOAD DATA inpath '/tpc_ds/store.dat' INTO TABLE store;
  18. LOAD DATA inpath '/tpc_ds/store_returns.dat' INTO TABLE store_returns;
  19. LOAD DATA inpath '/tpc_ds/store_sales.dat' INTO TABLE store_sales;
  20. LOAD DATA inpath '/tpc_ds/time_dim.dat' INTO TABLE time_dim;
  21. LOAD DATA inpath '/tpc_ds/warehouse.dat' INTO TABLE warehouse;
  22. LOAD DATA inpath '/tpc_ds/web_sales.dat' INTO TABLE web_sales;
  23. LOAD DATA inpath '/tpc_ds/web_page.dat' INTO TABLE web_page;
  24. LOAD DATA inpath '/tpc_ds/web_returns.dat' INTO TABLE web_returns;
  25. LOAD DATA inpath '/tpc_ds/web_site.dat' INTO TABLE web_site;

image.png

6.3 创建hive内部表

  1. create database if not exists tpc_ds;
  2. use tpc_ds;
  3. drop table if exists call_center;
  4. create table call_center(
  5. cc_call_center_sk bigint
  6. , cc_call_center_id string
  7. , cc_rec_start_date string
  8. , cc_rec_end_date string
  9. , cc_closed_date_sk bigint
  10. , cc_open_date_sk bigint
  11. , cc_name string
  12. , cc_class string
  13. , cc_employees int
  14. , cc_sq_ft int
  15. , cc_hours string
  16. , cc_manager string
  17. , cc_mkt_id int
  18. , cc_mkt_class string
  19. , cc_mkt_desc string
  20. , cc_market_manager string
  21. , cc_division int
  22. , cc_division_name string
  23. , cc_company int
  24. , cc_company_name string
  25. , cc_street_number string
  26. , cc_street_name string
  27. , cc_street_type string
  28. , cc_suite_number string
  29. , cc_city string
  30. , cc_county string
  31. , cc_state string
  32. , cc_zip string
  33. , cc_country string
  34. , cc_gmt_offset double
  35. , cc_tax_percentage double
  36. )
  37. row format delimited fields terminated by '|'
  38. STORED AS textfile;
  39. drop table if exists catalog_page;
  40. create table catalog_page(
  41. cp_catalog_page_sk bigint
  42. , cp_catalog_page_id string
  43. , cp_start_date_sk bigint
  44. , cp_end_date_sk bigint
  45. , cp_department string
  46. , cp_catalog_number int
  47. , cp_catalog_page_number int
  48. , cp_description string
  49. , cp_type string
  50. )
  51. row format delimited fields terminated by '|'
  52. STORED AS textfile;
  53. drop table if exists catalog_returns;
  54. create table catalog_returns
  55. (
  56. cr_returned_date_sk bigint,
  57. cr_returned_time_sk bigint,
  58. cr_item_sk bigint,
  59. cr_refunded_customer_sk bigint,
  60. cr_refunded_cdemo_sk bigint,
  61. cr_refunded_hdemo_sk bigint,
  62. cr_refunded_addr_sk bigint,
  63. cr_returning_customer_sk bigint,
  64. cr_returning_cdemo_sk bigint,
  65. cr_returning_hdemo_sk bigint,
  66. cr_returning_addr_sk bigint,
  67. cr_call_center_sk bigint,
  68. cr_catalog_page_sk bigint,
  69. cr_ship_mode_sk bigint,
  70. cr_warehouse_sk bigint,
  71. cr_reason_sk bigint,
  72. cr_order_number bigint,
  73. cr_return_quantity int,
  74. cr_return_amount double,
  75. cr_return_tax double,
  76. cr_return_amt_inc_tax double,
  77. cr_fee double,
  78. cr_return_ship_cost double,
  79. cr_refunded_cash double,
  80. cr_reversed_charge double,
  81. cr_store_credit double,
  82. cr_net_loss double
  83. )
  84. row format delimited fields terminated by '|'
  85. STORED AS textfile;
  86. drop table if exists catalog_sales;
  87. create table catalog_sales
  88. (
  89. cs_sold_date_sk bigint,
  90. cs_sold_time_sk bigint,
  91. cs_ship_date_sk bigint,
  92. cs_bill_customer_sk bigint,
  93. cs_bill_cdemo_sk bigint,
  94. cs_bill_hdemo_sk bigint,
  95. cs_bill_addr_sk bigint,
  96. cs_ship_customer_sk bigint,
  97. cs_ship_cdemo_sk bigint,
  98. cs_ship_hdemo_sk bigint,
  99. cs_ship_addr_sk bigint,
  100. cs_call_center_sk bigint,
  101. cs_catalog_page_sk bigint,
  102. cs_ship_mode_sk bigint,
  103. cs_warehouse_sk bigint,
  104. cs_item_sk bigint,
  105. cs_promo_sk bigint,
  106. cs_order_number bigint,
  107. cs_quantity int,
  108. cs_wholesale_cost double,
  109. cs_list_price double,
  110. cs_sales_price double,
  111. cs_ext_discount_amt double,
  112. cs_ext_sales_price double,
  113. cs_ext_wholesale_cost double,
  114. cs_ext_list_price double,
  115. cs_ext_tax double,
  116. cs_coupon_amt double,
  117. cs_ext_ship_cost double,
  118. cs_net_paid double,
  119. cs_net_paid_inc_tax double,
  120. cs_net_paid_inc_ship double,
  121. cs_net_paid_inc_ship_tax double,
  122. cs_net_profit double
  123. )
  124. row format delimited fields terminated by '|'
  125. STORED AS textfile;
  126. drop table if exists customer_address;
  127. create table customer_address
  128. (
  129. ca_address_sk bigint,
  130. ca_address_id string,
  131. ca_street_number string,
  132. ca_street_name string,
  133. ca_street_type string,
  134. ca_suite_number string,
  135. ca_city string,
  136. ca_county string,
  137. ca_state string,
  138. ca_zip string,
  139. ca_country string,
  140. ca_gmt_offset double,
  141. ca_location_type string
  142. )
  143. row format delimited fields terminated by '|'
  144. STORED AS textfile;
  145. drop table if exists customer_demographics;
  146. create table customer_demographics
  147. (
  148. cd_demo_sk bigint,
  149. cd_gender string,
  150. cd_marital_status string,
  151. cd_education_status string,
  152. cd_purchase_estimate int,
  153. cd_credit_rating string,
  154. cd_dep_count int,
  155. cd_dep_employed_count int,
  156. cd_dep_college_count int
  157. )
  158. row format delimited fields terminated by '|'
  159. STORED AS textfile;
  160. drop table if exists customer;
  161. create table customer
  162. (
  163. c_customer_sk bigint,
  164. c_customer_id string,
  165. c_current_cdemo_sk bigint,
  166. c_current_hdemo_sk bigint,
  167. c_current_addr_sk bigint,
  168. c_first_shipto_date_sk bigint,
  169. c_first_sales_date_sk bigint,
  170. c_salutation string,
  171. c_first_name string,
  172. c_last_name string,
  173. c_preferred_cust_flag string,
  174. c_birth_day int,
  175. c_birth_month int,
  176. c_birth_year int,
  177. c_birth_country string,
  178. c_login string,
  179. c_email_address string,
  180. c_last_review_date string
  181. )
  182. row format delimited fields terminated by '|'
  183. STORED AS textfile;
  184. drop table if exists date_dim;
  185. create table date_dim
  186. (
  187. d_date_sk bigint,
  188. d_date_id string,
  189. d_date string,
  190. d_month_seq int,
  191. d_week_seq int,
  192. d_quarter_seq int,
  193. d_year int,
  194. d_dow int,
  195. d_moy int,
  196. d_dom int,
  197. d_qoy int,
  198. d_fy_year int,
  199. d_fy_quarter_seq int,
  200. d_fy_week_seq int,
  201. d_day_name string,
  202. d_quarter_name string,
  203. d_holiday string,
  204. d_weekend string,
  205. d_following_holiday string,
  206. d_first_dom int,
  207. d_last_dom int,
  208. d_same_day_ly int,
  209. d_same_day_lq int,
  210. d_current_day string,
  211. d_current_week string,
  212. d_current_month string,
  213. d_current_quarter string,
  214. d_current_year string
  215. )
  216. row format delimited fields terminated by '|'
  217. STORED AS textfile;
  218. drop table if exists dbgen_version;
  219. create table dbgen_version(dv_version string,dv_create_date string, dv_create_time string, dv_cmdline_args string)row format delimited fields terminated by '|' STORED AS textfile;
  220. drop table if exists household_demographics;
  221. create table household_demographics
  222. (
  223. hd_demo_sk bigint,
  224. hd_income_band_sk bigint,
  225. hd_buy_potential string,
  226. hd_dep_count int,
  227. hd_vehicle_count int
  228. )
  229. row format delimited fields terminated by '|'
  230. STORED AS textfile;
  231. drop table if exists income_band;
  232. create table income_band(
  233. ib_income_band_sk bigint
  234. , ib_lower_bound int
  235. , ib_upper_bound int
  236. )
  237. row format delimited fields terminated by '|'
  238. STORED AS textfile;
  239. drop table if exists inventory;
  240. create table inventory
  241. (
  242. inv_date_sk bigint,
  243. inv_item_sk bigint,
  244. inv_warehouse_sk bigint,
  245. inv_quantity_on_hand int
  246. )
  247. row format delimited fields terminated by '|'
  248. STORED AS textfile;
  249. drop table if exists item;
  250. create table item
  251. (
  252. i_item_sk bigint,
  253. i_item_id string,
  254. i_rec_start_date string,
  255. i_rec_end_date string,
  256. i_item_desc string,
  257. i_current_price double,
  258. i_wholesale_cost double,
  259. i_brand_id int,
  260. i_brand string,
  261. i_class_id int,
  262. i_class string,
  263. i_category_id int,
  264. i_category string,
  265. i_manufact_id int,
  266. i_manufact string,
  267. i_size string,
  268. i_formulation string,
  269. i_color string,
  270. i_units string,
  271. i_container string,
  272. i_manager_id int,
  273. i_product_name string
  274. )
  275. row format delimited fields terminated by '|'
  276. STORED AS textfile;
  277. drop table if exists promotion;
  278. create table promotion
  279. (
  280. p_promo_sk bigint,
  281. p_promo_id string,
  282. p_start_date_sk bigint,
  283. p_end_date_sk bigint,
  284. p_item_sk bigint,
  285. p_cost double,
  286. p_response_target int,
  287. p_promo_name string,
  288. p_channel_dmail string,
  289. p_channel_email string,
  290. p_channel_catalog string,
  291. p_channel_tv string,
  292. p_channel_radio string,
  293. p_channel_press string,
  294. p_channel_event string,
  295. p_channel_demo string,
  296. p_channel_details string,
  297. p_purpose string,
  298. p_discount_active string
  299. )
  300. row format delimited fields terminated by '|'
  301. STORED AS textfile;
  302. drop table if exists reason;
  303. create table reason(
  304. r_reason_sk bigint
  305. , r_reason_id string
  306. , r_reason_desc string
  307. )
  308. row format delimited fields terminated by '|'
  309. STORED AS textfile;
  310. drop table if exists ship_mode;
  311. create table ship_mode(
  312. sm_ship_mode_sk bigint
  313. , sm_ship_mode_id string
  314. , sm_type string
  315. , sm_code string
  316. , sm_carrier string
  317. , sm_contract string
  318. )
  319. row format delimited fields terminated by '|'
  320. STORED AS textfile;
  321. drop table if exists store_returns;
  322. create table store_returns
  323. (
  324. sr_returned_date_sk bigint,
  325. sr_return_time_sk bigint,
  326. sr_item_sk bigint,
  327. sr_customer_sk bigint,
  328. sr_cdemo_sk bigint,
  329. sr_hdemo_sk bigint,
  330. sr_addr_sk bigint,
  331. sr_store_sk bigint,
  332. sr_reason_sk bigint,
  333. sr_ticket_number bigint,
  334. sr_return_quantity int,
  335. sr_return_amt double,
  336. sr_return_tax double,
  337. sr_return_amt_inc_tax double,
  338. sr_fee double,
  339. sr_return_ship_cost double,
  340. sr_refunded_cash double,
  341. sr_reversed_charge double,
  342. sr_store_credit double,
  343. sr_net_loss double
  344. )
  345. row format delimited fields terminated by '|'
  346. STORED AS textfile;
  347. drop table if exists store_sales;
  348. create table store_sales
  349. (
  350. ss_sold_date_sk bigint,
  351. ss_sold_time_sk bigint,
  352. ss_item_sk bigint,
  353. ss_customer_sk bigint,
  354. ss_cdemo_sk bigint,
  355. ss_hdemo_sk bigint,
  356. ss_addr_sk bigint,
  357. ss_store_sk bigint,
  358. ss_promo_sk bigint,
  359. ss_ticket_number bigint,
  360. ss_quantity int,
  361. ss_wholesale_cost double,
  362. ss_list_price double,
  363. ss_sales_price double,
  364. ss_ext_discount_amt double,
  365. ss_ext_sales_price double,
  366. ss_ext_wholesale_cost double,
  367. ss_ext_list_price double,
  368. ss_ext_tax double,
  369. ss_coupon_amt double,
  370. ss_net_paid double,
  371. ss_net_paid_inc_tax double,
  372. ss_net_profit double
  373. )
  374. row format delimited fields terminated by '|'
  375. STORED AS textfile;
  376. drop table if exists store;
  377. create table store
  378. (
  379. s_store_sk bigint,
  380. s_store_id string,
  381. s_rec_start_date string,
  382. s_rec_end_date string,
  383. s_closed_date_sk bigint,
  384. s_store_name string,
  385. s_number_employees int,
  386. s_floor_space int,
  387. s_hours string,
  388. s_manager string,
  389. s_market_id int,
  390. s_geography_class string,
  391. s_market_desc string,
  392. s_market_manager string,
  393. s_division_id int,
  394. s_division_name string,
  395. s_company_id int,
  396. s_company_name string,
  397. s_street_number string,
  398. s_street_name string,
  399. s_street_type string,
  400. s_suite_number string,
  401. s_city string,
  402. s_county string,
  403. s_state string,
  404. s_zip string,
  405. s_country string,
  406. s_gmt_offset double,
  407. s_tax_precentage double
  408. )
  409. row format delimited fields terminated by '|'
  410. STORED AS textfile;
  411. drop table if exists time_dim;
  412. create table time_dim
  413. (
  414. t_time_sk bigint,
  415. t_time_id string,
  416. t_time int,
  417. t_hour int,
  418. t_minute int,
  419. t_second int,
  420. t_am_pm string,
  421. t_shift string,
  422. t_sub_shift string,
  423. t_meal_time string
  424. )
  425. row format delimited fields terminated by '|'
  426. STORED AS textfile;
  427. drop table if exists warehouse;
  428. create table warehouse(
  429. w_warehouse_sk bigint
  430. , w_warehouse_id string
  431. , w_warehouse_name string
  432. , w_warehouse_sq_ft int
  433. , w_street_number string
  434. , w_street_name string
  435. , w_street_type string
  436. , w_suite_number string
  437. , w_city string
  438. , w_county string
  439. , w_state string
  440. , w_zip string
  441. , w_country string
  442. , w_gmt_offset double
  443. )
  444. row format delimited fields terminated by '|'
  445. STORED AS textfile;
  446. drop table if exists web_page;
  447. create table web_page(
  448. wp_web_page_sk bigint
  449. , wp_web_page_id string
  450. , wp_rec_start_date string
  451. , wp_rec_end_date string
  452. , wp_creation_date_sk bigint
  453. , wp_access_date_sk bigint
  454. , wp_autogen_flag string
  455. , wp_customer_sk bigint
  456. , wp_url string
  457. , wp_type string
  458. , wp_char_count int
  459. , wp_link_count int
  460. , wp_image_count int
  461. , wp_max_ad_count int
  462. )
  463. row format delimited fields terminated by '|'
  464. STORED AS textfile;
  465. drop table if exists web_returns;
  466. create table web_returns
  467. (
  468. wr_returned_date_sk bigint,
  469. wr_returned_time_sk bigint,
  470. wr_item_sk bigint,
  471. wr_refunded_customer_sk bigint,
  472. wr_refunded_cdemo_sk bigint,
  473. wr_refunded_hdemo_sk bigint,
  474. wr_refunded_addr_sk bigint,
  475. wr_returning_customer_sk bigint,
  476. wr_returning_cdemo_sk bigint,
  477. wr_returning_hdemo_sk bigint,
  478. wr_returning_addr_sk bigint,
  479. wr_web_page_sk bigint,
  480. wr_reason_sk bigint,
  481. wr_order_number bigint,
  482. wr_return_quantity int,
  483. wr_return_amt double,
  484. wr_return_tax double,
  485. wr_return_amt_inc_tax double,
  486. wr_fee double,
  487. wr_return_ship_cost double,
  488. wr_refunded_cash double,
  489. wr_reversed_charge double,
  490. wr_account_credit double,
  491. wr_net_loss double
  492. )
  493. row format delimited fields terminated by '|'
  494. STORED AS textfile;
  495. drop table if exists web_sales;
  496. create table web_sales
  497. (
  498. ws_sold_date_sk bigint,
  499. ws_sold_time_sk bigint,
  500. ws_ship_date_sk bigint,
  501. ws_item_sk bigint,
  502. ws_bill_customer_sk bigint,
  503. ws_bill_cdemo_sk bigint,
  504. ws_bill_hdemo_sk bigint,
  505. ws_bill_addr_sk bigint,
  506. ws_ship_customer_sk bigint,
  507. ws_ship_cdemo_sk bigint,
  508. ws_ship_hdemo_sk bigint,
  509. ws_ship_addr_sk bigint,
  510. ws_web_page_sk bigint,
  511. ws_web_site_sk bigint,
  512. ws_ship_mode_sk bigint,
  513. ws_warehouse_sk bigint,
  514. ws_promo_sk bigint,
  515. ws_order_number bigint,
  516. ws_quantity int,
  517. ws_wholesale_cost double,
  518. ws_list_price double,
  519. ws_sales_price double,
  520. ws_ext_discount_amt double,
  521. ws_ext_sales_price double,
  522. ws_ext_wholesale_cost double,
  523. ws_ext_list_price double,
  524. ws_ext_tax double,
  525. ws_coupon_amt double,
  526. ws_ext_ship_cost double,
  527. ws_net_paid double,
  528. ws_net_paid_inc_tax double,
  529. ws_net_paid_inc_ship double,
  530. ws_net_paid_inc_ship_tax double,
  531. ws_net_profit double
  532. )
  533. row format delimited fields terminated by '|'
  534. STORED AS textfile;
  535. drop table if exists web_site;
  536. create table web_site
  537. (
  538. web_site_sk bigint,
  539. web_site_id string,
  540. web_rec_start_date string,
  541. web_rec_end_date string,
  542. web_name string,
  543. web_open_date_sk bigint,
  544. web_close_date_sk bigint,
  545. web_class string,
  546. web_manager string,
  547. web_mkt_id int,
  548. web_mkt_class string,
  549. web_mkt_desc string,
  550. web_market_manager string,
  551. web_company_id int,
  552. web_company_name string,
  553. web_street_number string,
  554. web_street_name string,
  555. web_street_type string,
  556. web_suite_number string,
  557. web_city string,
  558. web_county string,
  559. web_state string,
  560. web_zip string,
  561. web_country string,
  562. web_gmt_offset double,
  563. web_tax_percentage double
  564. )
  565. row format delimited fields terminated by '|'
  566. STORED AS textfile;

6.4 加载外部表数据到内部表

  1. use tpc_ds;
  2. INSERT OVERWRITE TABLE call_center SELECT * FROM test_tpc_ds.call_center;
  3. INSERT OVERWRITE TABLE catalog_page SELECT * FROM test_tpc_ds.catalog_page;
  4. INSERT OVERWRITE TABLE catalog_returns SELECT * FROM test_tpc_ds.catalog_returns;
  5. INSERT OVERWRITE TABLE catalog_sales SELECT * FROM test_tpc_ds.catalog_sales;
  6. INSERT OVERWRITE TABLE customer SELECT * FROM test_tpc_ds.customer;
  7. INSERT OVERWRITE TABLE customer_address SELECT * FROM test_tpc_ds.customer_address;
  8. INSERT OVERWRITE TABLE customer_demographics SELECT * FROM test_tpc_ds.customer_demographics;
  9. INSERT OVERWRITE TABLE date_dim SELECT * FROM test_tpc_ds.date_dim;
  10. INSERT OVERWRITE TABLE dbgen_version SELECT * FROM test_tpc_ds.dbgen_version;
  11. INSERT OVERWRITE TABLE household_demographics SELECT * FROM test_tpc_ds.household_demographics;
  12. INSERT OVERWRITE TABLE income_band SELECT * FROM test_tpc_ds.income_band;
  13. INSERT OVERWRITE TABLE inventory SELECT * FROM test_tpc_ds.inventory;
  14. INSERT OVERWRITE TABLE item SELECT * FROM test_tpc_ds.item;
  15. INSERT OVERWRITE TABLE promotion SELECT * FROM test_tpc_ds.promotion;
  16. INSERT OVERWRITE TABLE reason SELECT * FROM test_tpc_ds.reason;
  17. INSERT OVERWRITE TABLE ship_mode SELECT * FROM test_tpc_ds.ship_mode;
  18. INSERT OVERWRITE TABLE store SELECT * FROM test_tpc_ds.store;
  19. INSERT OVERWRITE TABLE store_returns SELECT * FROM test_tpc_ds.store_returns;
  20. INSERT OVERWRITE TABLE store_sales SELECT * FROM test_tpc_ds.store_sales;
  21. INSERT OVERWRITE TABLE time_dim SELECT * FROM test_tpc_ds.time_dim;
  22. INSERT OVERWRITE TABLE warehouse SELECT * FROM test_tpc_ds.warehouse;
  23. INSERT OVERWRITE TABLE web_page SELECT * FROM test_tpc_ds.web_page;
  24. INSERT OVERWRITE TABLE web_returns SELECT * FROM test_tpc_ds.web_returns;
  25. INSERT OVERWRITE TABLE web_sales SELECT * FROM test_tpc_ds.web_sales;
  26. INSERT OVERWRITE TABLE web_site SELECT * FROM test_tpc_ds.web_site;

image.png

7、执行 Linux 清理缓存操作

  1. sync
  2. echo 3 > /proc/sys/vm/drop_caches

8、执行测试sql

三、生成99个查询语句
1、修改query_template下query1-99模板,在行尾加define _END = “”,否则执行生成命令会出错;
   1)、编写shell脚本update_query.sh,并添加如下内容:

  1. #!/bin/bash
  2. COUNTER=1
  3. while [ $COUNTER -lt 100 ]
  4. do
  5. echo $COUNTER
  6. echo "define _END = \"\";">>query$COUNTER.tpl
  7. COUNTER=`expr $COUNTER + 1`
  8. done

   2)、将update_query.sh复制到query_template目录下,并执行脚本

  1. sh  ./update_query.sh

2、到/tools目录下执行如下命令:

  1. ./dsqgen -output_dir /data/ -input ../query_templates/templates.lst -scale 1 -dialect oracle -directory ../query_templates/