- Analyze data with Hive, then export the results to a database with Sqoop
https://docs.microsoft.com/en-us/azure/hdinsight/hdinsight-use-oozie
## Runtime environment

- Hive metastore running
- Oozie server bootstrapped
## The Hive example bundled with Oozie
```bash
$ cd oozie/
$ cp -r examples/apps/hive/ oozie-apps/
$ ls oozie-apps/hive/
job.properties  script.q  workflow.xml
```
## job.properties

```properties
nameNode=hdfs://192.168.32.130:8020
jobTracker=192.168.32.130:8032
queueName=default
examplesRoot=oozie-apps
oozie.use.system.libpath=true
oozie.wf.application.path=${nameNode}/user/${user.name}/${examplesRoot}/hive
```
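As a side note, the `${...}` references in `job.properties` are resolved by the Oozie server, with `${user.name}` taken from the submitting user. A minimal local sketch of that substitution, simulated here with `sed` (the variable values are this article's own, not anything Oozie produces):

```bash
# Simulated locally; the real expansion happens inside the Oozie server.
nameNode='hdfs://192.168.32.130:8020'
examplesRoot='oozie-apps'
userName='jack'   # stands in for ${user.name}, which Oozie takes from the submitter

path='${nameNode}/user/${user.name}/${examplesRoot}/hive'
resolved=$(echo "$path" | sed -e "s|\${nameNode}|$nameNode|" \
                              -e "s|\${user.name}|$userName|" \
                              -e "s|\${examplesRoot}|$examplesRoot|")
echo "$resolved"
# hdfs://192.168.32.130:8020/user/jack/oozie-apps/hive
```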
## workflow.xml

Note:

- the input and output directories

```xml
<workflow-app xmlns="uri:oozie:workflow:0.2" name="hive-wf">
    <start to="hive-node"/>
    <action name="hive-node">
        <hive xmlns="uri:oozie:hive-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <prepare>
                <delete path="${nameNode}/user/${wf:user()}/${examplesRoot}/output-data/hive"/>
                <mkdir path="${nameNode}/user/${wf:user()}/${examplesRoot}/output-data"/>
            </prepare>
            <configuration>
                <property>
                    <name>mapred.job.queue.name</name>
                    <value>${queueName}</value>
                </property>
            </configuration>
            <script>script.q</script>
            <param>INPUT=/user/${wf:user()}/${examplesRoot}/input-data/table</param>
            <param>OUTPUT=/user/${wf:user()}/${examplesRoot}/output-data/hive</param>
        </hive>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>Hive failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>
```
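One common submit-time failure is a transition (`to="…"`) that names a node that does not exist in the workflow. A rough sanity check, sketched here with grep against a minimal stripped-down copy of the workflow (the `/tmp` path is an assumption for illustration, not part of the example app):

```bash
# Write a minimal workflow skeleton to a scratch file.
cat > /tmp/workflow-check.xml <<'EOF'
<workflow-app xmlns="uri:oozie:workflow:0.2" name="hive-wf">
  <start to="hive-node"/>
  <action name="hive-node"><ok to="end"/><error to="fail"/></action>
  <kill name="fail"><message>failed</message></kill>
  <end name="end"/>
</workflow-app>
EOF
# Collect every transition target and every declared node name.
targets=$(grep -o 'to="[^"]*"' /tmp/workflow-check.xml | sed 's/to="//;s/"//' | sort -u)
names=$(grep -o 'name="[^"]*"' /tmp/workflow-check.xml | sed 's/name="//;s/"//' | sort -u)
# Report any target with no matching node; prints nothing when all targets exist.
for t in $targets; do
  echo "$names" | grep -qx "$t" || echo "missing node: $t"
done
```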
## script.q

- The suffix can be ".q", ".hql", etc.
- The script creates an external table `test`, stored as textfile, with its data located at `${INPUT}`.
- The query result is written (with overwrite) to `${OUTPUT}`.
- To make the output differ visibly from the bundled example, the query is changed to `SELECT * FROM test WHERE a > 2`.

```sql
CREATE EXTERNAL TABLE test (a INT) STORED AS TEXTFILE LOCATION '${INPUT}';
INSERT OVERWRITE DIRECTORY '${OUTPUT}' SELECT * FROM test WHERE a > 2;
-- original
-- INSERT OVERWRITE DIRECTORY '${OUTPUT}' SELECT * FROM test;
```
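The effect of the modified query can be previewed locally before submitting: each line of the textfile table is one value of column `a`, so `WHERE a > 2` is just a numeric filter. A sketch with awk over scratch files in `/tmp` (not part of the real workflow):

```bash
# Recreate the test table's single textfile locally.
mkdir -p /tmp/oozie-dryrun/table
printf '1\n2\n3\n4\n' > /tmp/oozie-dryrun/table/part-m-00000
# Equivalent of: SELECT * FROM test WHERE a > 2
awk '$1 > 2' /tmp/oozie-dryrun/table/part-m-00000
# 3
# 4
```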
## Run the test

Prepare and upload the test data:

```bash
$ mkdir -p oozie-apps/input-data/table
$ echo 1 > oozie-apps/input-data/table/part-m-00000
$ echo 2 >> oozie-apps/input-data/table/part-m-00000
$ echo 3 >> oozie-apps/input-data/table/part-m-00000
$ echo 4 >> oozie-apps/input-data/table/part-m-00000
$ ~/Documents/hadoop/bin/hadoop fs -put -f oozie-apps/input-data/table oozie-apps/input-data/
```

Upload the Oozie app and run the test:

```bash
$ ~/Documents/hadoop/bin/hadoop fs -put -f oozie-apps/hive oozie-apps
$ export OOZIE_URL="http://192.168.32.130:11000/oozie"
$ bin/oozie job -config oozie-apps/hive/job.properties -run -verbose -debug
$ bin/oozie job -info 0000068-200513181559469-oozie-jack-C
# check the result
$ ~/Documents/hadoop/bin/hadoop fs -cat /user/jack/oozie-apps/output-data/hive/*
3
4
```
## Analyzing a custom Hive table `user`
```bash
$ cd oozie/
$ cp -r oozie-apps/hive oozie-apps/hive-user
$ ls oozie-apps/hive-user
job.properties  script.q  workflow.xml
```
## job.properties

```properties
nameNode=hdfs://192.168.32.130:8020
jobTracker=192.168.32.130:8032
queueName=default
examplesRoot=oozie-apps
oozie.use.system.libpath=true
oozie.wf.application.path=${nameNode}/user/${user.name}/${examplesRoot}/hive-user
```
## script.q

```sql
-- hive> desc user;
-- OK
-- id    int
-- name  string

-- select the id of every record whose name is longer than 4 characters
INSERT OVERWRITE DIRECTORY '${OUTPUT}' SELECT id FROM default.user WHERE length(name) > 4;
```
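This filter can likewise be previewed locally. Assuming a tab-separated dump of a few sample rows (a hypothetical scratch file, not the real warehouse table), `length(name) > 4` translates to an awk one-liner:

```bash
# Hypothetical sample of the user table as tab-separated id/name pairs.
printf '1\tjack\n2\ttom\n3\twhite\n4\tblack\n' > /tmp/user-sample.tsv
# Equivalent of: SELECT id FROM user WHERE length(name) > 4
awk -F'\t' 'length($2) > 4 { print $1 }' /tmp/user-sample.tsv
# 3
# 4
```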
## Test (substitute `${OUTPUT}` by hand to test the script)

```bash
$ cp ~/Documents/oozie/oozie-apps/hive-user/script.q ~/Documents/hive/hql
# ~/Documents/hive/bin/hive -e \
# " \
# INSERT OVERWRITE DIRECTORY '/user/jack/oozie-apps/output-data/hive-user' SELECT id FROM default.user WHERE length(name) > 4; \
# "
# ${OUTPUT} -> /user/jack/oozie-apps/output-data/hive-user
$ ~/Documents/hive/bin/hive -f ~/Documents/hive/hql/script.q
$ ~/Documents/hive/bin/hive
hive> select * from user;
OK
1	jack
2	tom
3	white
4	black
5	ming
6	ning
7	zhangsan
8	lisi
9	wangwu
10	zhaoliu
11	aaaaa
12	bbbbb
```
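The manual `${OUTPUT}` substitution above can also be scripted: the Hive action performs essentially this replacement with its `<param>` values. A sketch with sed over a scratch copy of the script (the `/tmp` path is a local assumption, not the real HDFS output directory):

```bash
# Scratch copy of the parameterized script.
cat > /tmp/script.q <<'EOF'
INSERT OVERWRITE DIRECTORY '${OUTPUT}' SELECT id FROM default.user WHERE length(name) > 4;
EOF
# Substitute ${OUTPUT} the way the action's <param> would.
sed "s|\${OUTPUT}|/user/jack/oozie-apps/output-data/hive-user|" /tmp/script.q
```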
## workflow.xml

Compared with the bundled example, this workflow adds the `hive.metastore.uris` property and drops the `INPUT` parameter:

```xml
<workflow-app xmlns="uri:oozie:workflow:0.2" name="hive-wf">
    <start to="hive-node"/>
    <action name="hive-node">
        <hive xmlns="uri:oozie:hive-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <prepare>
                <delete path="${nameNode}/user/${wf:user()}/${examplesRoot}/output-data/hive-user"/>
                <mkdir path="${nameNode}/user/${wf:user()}/${examplesRoot}/output-data"/>
            </prepare>
            <configuration>
                <property>
                    <name>mapred.job.queue.name</name>
                    <value>${queueName}</value>
                </property>
                <property>
                    <name>hive.metastore.uris</name>
                    <value>thrift://192.168.32.130:9083</value>
                </property>
            </configuration>
            <script>script.q</script>
            <param>OUTPUT=/user/${wf:user()}/${examplesRoot}/output-data/hive-user</param>
        </hive>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>Hive failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>
```
## Run the test

Upload the Oozie app and run the test:

```bash
$ ~/Documents/hadoop/bin/hadoop fs -put -f oozie-apps/hive-user oozie-apps
$ export OOZIE_URL="http://192.168.32.130:11000/oozie"
$ bin/oozie job -config oozie-apps/hive-user/job.properties -run -verbose -debug
$ bin/oozie job -info 0000101-200513181559469-oozie-jack-W
Job ID : 0000101-200513181559469-oozie-jack-W
------------------------------------------------------------------------
Workflow Name : hive-wf
App Path      : hdfs://192.168.32.130:8020/user/jack/oozie-apps/hive-user
Status        : SUCCEEDED
Run           : 0
User          : jack
Group         : -
Created       : 2020-05-18 06:16 GMT
Started       : 2020-05-18 06:16 GMT
Last Modified : 2020-05-18 06:17 GMT
Ended         : 2020-05-18 06:17 GMT
CoordAction ID: -

Actions
------------------------------------------------------------------------
ID                                             Status  Ext ID                  Ext Status  Err Code
------------------------------------------------------------------------
0000101-200513181559469-oozie-jack-W@:start:   OK      -                       OK          -
0000101-200513181559469-oozie-jack-W@hive-node OK      job_1588552715711_0206  SUCCEEDED   -
0000101-200513181559469-oozie-jack-W@end       OK      -                       OK          -
------------------------------------------------------------------------

# check the result
$ ~/Documents/hadoop/bin/hadoop fs -cat /user/jack/oozie-apps/output-data/hive-user/*
3
4
7
9
10
11
12
13
14
```
