References

Oozie Shell Action Extension - [http://oozie.apache.org/docs/4.0.0/DG_ShellActionExtension.html](http://oozie.apache.org/docs/4.0.0/DG_ShellActionExtension.html)

  • The exec element must contain the path of the shell command to execute.
  • One or more argument elements may follow; each one, if present, carries a single argument passed to the shell command.
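
To make the exec/argument relationship concrete, here is a small sketch that prints the corresponding part of a shell action for a given command line. The helper name shell_action_args is hypothetical (it is not part of Oozie), and it does no XML escaping, so it only suits simple values.

```shell
#!/bin/sh
# Print the <exec>/<argument> part of a shell action.
# shell_action_args is a hypothetical helper for illustration, not an Oozie tool.
shell_action_args() {
    # First parameter: the command to execute.
    printf '<exec>%s</exec>\n' "$1"
    shift
    # Every remaining parameter becomes its own <argument> element.
    for arg in "$@"; do
        printf '<argument>%s</argument>\n' "$arg"
    done
}

fragment=$(shell_action_args echo "my_output=Hello Oozie")
printf '%s\n' "$fragment"
```

Note that a value containing spaces stays inside one argument element, so the command receives it as a single argument.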

The Oozie installation directory ships oozie-examples.tar.gz, which contains a set of sample applications.
Extract it into the current directory and look through it, then create an oozie-apps directory to hold the test examples.

    $ tar -zxf oozie-examples.tar.gz
    $ cd examples/
    $ ls
    apps input-data src
    $ ls apps
    aggregator cron custom-main demo hadoop-el hive map-reduce pig sla sqoop-freeform streaming
    bundle cron-schedule datelist-java-main distcp hcatalog java-main no-op shell sqoop ssh subwf
    $ cd ..
    $ mkdir oozie-apps

Within the installation directory, the example for scheduling shell scripts is located at examples/apps/shell.

Oozie commands

Run a workflow job

    $ oozie job -oozie http://192.168.32.130:11000/oozie -config oozie-apps/shell/job.properties -run

Check the status of a workflow job

    $ oozie job -oozie http://192.168.32.130:11000/oozie -info 0000000-200420185350972-oozie-jack-W

To check the job status through the Oozie web console, open "http://localhost:11000/oozie" in a browser.
To avoid passing the -oozie option with every Oozie command, set the OOZIE_URL environment variable to the Oozie URL in your shell. For example:

    $ export OOZIE_URL="http://192.168.32.130:11000/oozie"
    $ oozie job -info 0000000-200420185350972-oozie-jack-W
    ··· ···
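
If the variable should only be set when the caller has not already chosen a server, a default can be applied with standard parameter expansion. This is a minimal sketch; DEFAULT_OOZIE_URL is an assumed name, and its value is simply the address used throughout this article.

```shell
#!/bin/sh
# Keep the caller's OOZIE_URL if it is set, otherwise fall back to a default.
# DEFAULT_OOZIE_URL is an assumed value taken from the examples in this article.
DEFAULT_OOZIE_URL="http://192.168.32.130:11000/oozie"

# ":=" assigns the default only when OOZIE_URL is unset or empty.
: "${OOZIE_URL:=$DEFAULT_OOZIE_URL}"
export OOZIE_URL

echo "Using Oozie server: $OOZIE_URL"
```

Placed in a shell profile or sourced before running oozie commands, this leaves an explicitly exported OOZIE_URL untouched.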

Kill a workflow job

    $ oozie job -oozie http://192.168.32.130:11000/oozie -kill 0000000-200420185350972-oozie-jack-W

Scheduling a shell command

Using the shell workflow example under examples as a template, edit job.properties and workflow.xml, then run a workflow job that executes a shell command.

    $ cp -r examples/apps/shell/ oozie-apps/
    $ ls oozie-apps/shell
    job.properties workflow.xml

job.properties

    # job.properties
    #
    # Licensed to the Apache Software Foundation (ASF) under one
    # or more contributor license agreements. See the NOTICE file
    # distributed with this work for additional information
    # regarding copyright ownership. The ASF licenses this file
    # to you under the Apache License, Version 2.0 (the
    # "License"); you may not use this file except in compliance
    # with the License. You may obtain a copy of the License at
    #
    # http://www.apache.org/licenses/LICENSE-2.0
    #
    # Unless required by applicable law or agreed to in writing, software
    # distributed under the License is distributed on an "AS IS" BASIS,
    # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    # See the License for the specific language governing permissions and
    # limitations under the License.
    #
    nameNode=hdfs://192.168.32.130:8020
    # yarn.resourcemanager.address ${yarn.resourcemanager.hostname}:8032
    jobTracker=192.168.32.130:8032
    queueName=default
    examplesRoot=oozie-apps
    # Application path resolves to hdfs://192.168.32.130:8020/user/jack/oozie-apps/shell
    oozie.wf.application.path=${nameNode}/user/${user.name}/${examplesRoot}/shell
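
To see how oozie.wf.application.path ends up as the HDFS path mentioned in the comment, the substitution can be reproduced by hand. The sketch below is only an illustration of the expansion, not how Oozie itself resolves properties; the variable names are my own, and user_name stands in for ${user.name}, assumed to be the submitting user "jack".

```shell
#!/bin/sh
# Expand the ${...} references in oozie.wf.application.path by hand.
# user_name stands in for ${user.name}; "jack" is the user seen in this article.
name_node="hdfs://192.168.32.130:8020"
examples_root="oozie-apps"
user_name="jack"

template='${nameNode}/user/${user.name}/${examplesRoot}/shell'

# Each sed expression replaces one literal ${key} with its value.
app_path=$(printf '%s\n' "$template" | sed \
    -e 's|\${nameNode}|'"$name_node"'|' \
    -e 's|\${user.name}|'"$user_name"'|' \
    -e 's|\${examplesRoot}|'"$examples_root"'|')

echo "$app_path"
```

The printed path is the HDFS directory that must contain workflow.xml before the job is submitted.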

workflow.xml

    <!--
      Licensed to the Apache Software Foundation (ASF) under one
      or more contributor license agreements. See the NOTICE file
      distributed with this work for additional information
      regarding copyright ownership. The ASF licenses this file
      to you under the Apache License, Version 2.0 (the
      "License"); you may not use this file except in compliance
      with the License. You may obtain a copy of the License at
      http://www.apache.org/licenses/LICENSE-2.0
      Unless required by applicable law or agreed to in writing, software
      distributed under the License is distributed on an "AS IS" BASIS,
      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
      See the License for the specific language governing permissions and
      limitations under the License.
    -->
    <workflow-app xmlns="uri:oozie:workflow:0.4" name="shell-wf">
        <start to="shell-node"/>
        <action name="shell-node">
            <shell xmlns="uri:oozie:shell-action:0.2">
                <job-tracker>${jobTracker}</job-tracker>
                <name-node>${nameNode}</name-node>
                <configuration>
                    <property>
                        <name>mapred.job.queue.name</name>
                        <value>${queueName}</value>
                    </property>
                </configuration>
                <exec>echo</exec>
                <argument>my_output=Hello Oozie</argument>
                <capture-output/>
            </shell>
            <ok to="check-output"/>
            <error to="fail"/>
        </action>
        <decision name="check-output">
            <switch>
                <case to="end">
                    ${wf:actionData('shell-node')['my_output'] eq 'Hello Oozie'}
                </case>
                <default to="fail-output"/>
            </switch>
        </decision>
        <kill name="fail">
            <message>Shell action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
        </kill>
        <kill name="fail-output">
            <message>Incorrect output, expected [Hello Oozie] but was [${wf:actionData('shell-node')['my_output']}]</message>
        </kill>
        <end name="end"/>
    </workflow-app>
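
The capture-output element makes the command's standard output available to later nodes, and the Shell action documentation requires that output to be in key=value (Java properties) form. The following sketch mimics, in plain shell, what the workflow does with the echo output: split it into key and value, then pick the transition the decision node would take. It is an analogy for the control flow, not Oozie's actual implementation.

```shell
#!/bin/sh
# What the shell action prints to stdout (same command and argument as in workflow.xml).
line=$(echo my_output=Hello Oozie)

# Split at the first '=' the way a key=value property line is read.
key=${line%%=*}
value=${line#*=}
echo "key=[$key] value=[$value]"

# Shell analogue of the check-output decision node.
if [ "$value" = "Hello Oozie" ]; then
    next="end"
else
    next="fail-output"
fi
echo "transition to: $next"
```

If the command printed anything that does not match the expected value, the analogue of the default branch would be taken and the workflow would end in the fail-output kill node.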

Run

    $ # Upload the job to HDFS
    $ ~/Documents/hadoop/bin/hadoop fs -put oozie-apps oozie-apps
    $ # List the uploaded directory
    $ ~/Documents/hadoop/bin/hadoop fs -ls /user/jack/oozie-apps/shell/
    Found 2 items
    -rw-r--r-- 1 jack supergroup 1047 2020-04-21 00:48 /user/jack/oozie-apps/shell/job.properties
    -rw-r--r-- 1 jack supergroup 2075 2020-04-21 00:48 /user/jack/oozie-apps/shell/workflow.xml
    $ # Run the workflow job
    $ bin/oozie job -oozie http://192.168.32.130:11000/oozie -config oozie-apps/shell/job.properties -run
    job: 0000000-200420185350972-oozie-jack-W
    $ # Check the job status (also visible in the Oozie web console at http://localhost:11000/oozie)
    $ bin/oozie job -oozie http://192.168.32.130:11000/oozie -info 0000000-200420185350972-oozie-jack-W
    Job ID : 0000000-200420185350972-oozie-jack-W
    ------------------------------------------------------------------------------------------------------------------------------------
    Workflow Name : shell-wf
    App Path      : hdfs://192.168.32.130:8020/user/jack/oozie-apps/shell
    Status        : SUCCEEDED
    Run           : 0
    User          : jack
    Group         : -
    Created       : 2020-04-21 07:58 GMT
    Started       : 2020-04-21 07:58 GMT
    Last Modified : 2020-04-21 07:58 GMT
    Ended         : 2020-04-21 07:58 GMT
    CoordAction ID: -
    Actions
    ------------------------------------------------------------------------------------------------------------------------------------
    ID                                                  Status    Ext ID                  Ext Status  Err Code
    ------------------------------------------------------------------------------------------------------------------------------------
    0000000-200420185350972-oozie-jack-W@:start:        OK        -                       OK          -
    ------------------------------------------------------------------------------------------------------------------------------------
    0000000-200420185350972-oozie-jack-W@shell-node     OK        job_1586921478592_0021  SUCCEEDED   -
    ------------------------------------------------------------------------------------------------------------------------------------
    0000000-200420185350972-oozie-jack-W@check-output   OK        -                       end         -
    ------------------------------------------------------------------------------------------------------------------------------------
    0000000-200420185350972-oozie-jack-W@end            OK        -                       OK          -
    ------------------------------------------------------------------------------------------------------------------------------------
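
When the status is needed in a script rather than read by eye, the Status line can be pulled out of the -info output with awk. The sketch below runs against an abridged copy of the output shown above instead of calling oozie, so it can be tried without a server; in real use the text would come from the oozie job -info command.

```shell
#!/bin/sh
# Abridged sample of the `oozie job -info` output shown in this article.
info_output='Job ID : 0000000-200420185350972-oozie-jack-W
Workflow Name : shell-wf
Status : SUCCEEDED
Run : 0'

# Split each line on the colon (with optional surrounding spaces)
# and print the value of the first line whose key is exactly "Status".
job_status=$(printf '%s\n' "$info_output" |
    awk -F ' *: *' '$1 == "Status" { print $2; exit }')

echo "workflow status: $job_status"
```

Matching the key exactly keeps the header row of the actions table, which also contains the word Status, from being picked up.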

Scheduling a shell script

Starting from the shell command workflow above, edit job.properties and workflow.xml and create a shell script, batch.sh, so that the workflow job runs the script. The two lines after the vi command below are the contents of batch.sh.

    $ cp -r examples/apps/shell oozie-apps/shell-script
    $ ls oozie-apps/shell-script
    job.properties workflow.xml
    $ vi oozie-apps/shell-script/batch.sh
    #!/bin/bash
    echo "Hello! It's time to run. [`date`]" > `p=/tmp/oozie;[[ ! -d "${p}" ]] && mkdir -p ${p};echo ${p}/workflow.log`
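
The one-liner in batch.sh packs directory creation and the redirect target into a single backtick expression. As a readability aid, here is a step-by-step sketch that behaves the same way; the function name write_greeting and the scratch directory are my own additions so the example can be run without touching /tmp/oozie.

```shell
#!/bin/sh
# Step-by-step equivalent of batch.sh.
# write_greeting is a hypothetical name; the article's script hard-codes /tmp/oozie.
write_greeting() {
    log_dir=$1
    # mkdir -p is a no-op when the directory already exists.
    mkdir -p "$log_dir"
    # ">" truncates the log, so each run starts a fresh file, as in batch.sh.
    echo "Hello! It's time to run. [$(date)]" > "$log_dir/workflow.log"
}

# Try it against a scratch directory instead of /tmp/oozie.
scratch=$(mktemp -d)
write_greeting "$scratch/oozie"
cat "$scratch/oozie/workflow.log"
```

Calling write_greeting /tmp/oozie would write the same file as the original script.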

job.properties

    nameNode=hdfs://192.168.32.130:8020
    # yarn.resourcemanager.address ${yarn.resourcemanager.hostname}:8032
    jobTracker=192.168.32.130:8032
    queueName=default
    examplesRoot=oozie-apps
    # Application path resolves to hdfs://192.168.32.130:8020/user/jack/oozie-apps/shell-script
    oozie.wf.application.path=${nameNode}/user/${user.name}/${examplesRoot}/shell-script
    EXEC=batch.sh

workflow.xml

    <workflow-app xmlns="uri:oozie:workflow:0.4" name="shell-wf">
        <start to="shell-node"/>
        <action name="shell-node">
            <shell xmlns="uri:oozie:shell-action:0.2">
                <job-tracker>${jobTracker}</job-tracker>
                <name-node>${nameNode}</name-node>
                <configuration>
                    <property>
                        <name>mapred.job.queue.name</name>
                        <value>${queueName}</value>
                    </property>
                </configuration>
                <exec>${EXEC}</exec>
                <file>${EXEC}#${EXEC}</file>
                <capture-output/>
            </shell>
            <ok to="end"/>
            <error to="fail"/>
        </action>
        <kill name="fail">
            <message>Shell action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
        </kill>
        <end name="end"/>
    </workflow-app>
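
The file element uses a path#name form: as far as I understand it, the part before the # is the file to ship with the action and the part after it is the name the file gets in the action's working directory, which is why exec can refer to it by that name. The sketch below only shows how such a value splits into its two halves; the second spec, with different names on each side, is a made-up example.

```shell
#!/bin/sh
# Split a "path#name" file spec into its two halves.
split_file_spec() {
    spec=$1
    source_path=${spec%%#*}   # everything before the first '#'
    link_name=${spec#*#}      # everything after the first '#'
    echo "ship [$source_path] as [$link_name]"
}

# What ${EXEC}#${EXEC} expands to when EXEC=batch.sh.
first=$(split_file_spec "batch.sh#batch.sh")
# Hypothetical spec where the two names differ.
second=$(split_file_spec "scripts/batch.sh#run.sh")

echo "$first"
echo "$second"
```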

Run

    $ # Upload the job (add -f to overwrite an existing copy)
    $ ~/Documents/hadoop/bin/hadoop fs -put oozie-apps/shell-script oozie-apps/
    $ # List the uploaded directory
    $ ~/Documents/hadoop/bin/hadoop fs -ls /user/jack/oozie-apps/shell-script/
    Found 3 items
    -rw-r--r-- 1 jack supergroup 131 2020-04-21 02:30 /user/jack/oozie-apps/shell-script/batch.sh
    -rw-r--r-- 1 jack supergroup 1164 2020-04-21 02:30 /user/jack/oozie-apps/shell-script/job.properties
    -rw-r--r-- 1 jack supergroup 1640 2020-04-21 02:30 /user/jack/oozie-apps/shell-script/workflow.xml
    $ export OOZIE_URL="http://192.168.32.130:11000/oozie"
    $ # Run the workflow job
    $ bin/oozie job -config oozie-apps/shell-script/job.properties -run
    job: 0000001-200420185350972-oozie-jack-W
    $ # Check the job status (also visible in the Oozie web console at http://localhost:11000/oozie)
    $ bin/oozie job -info 0000001-200420185350972-oozie-jack-W
    Job ID : 0000001-200420185350972-oozie-jack-W
    ------------------------------------------------------------------------------------------------------------------------------------
    Workflow Name : shell-wf
    App Path      : hdfs://192.168.32.130:8020/user/jack/oozie-apps/shell-script
    Status        : SUCCEEDED
    Run           : 0
    User          : jack
    Group         : -
    Created       : 2020-04-21 09:32 GMT
    Started       : 2020-04-21 09:32 GMT
    Last Modified : 2020-04-21 09:32 GMT
    Ended         : 2020-04-21 09:32 GMT
    CoordAction ID: -
    Actions
    ------------------------------------------------------------------------------------------------------------------------------------
    ID                                                  Status    Ext ID                  Ext Status  Err Code
    ------------------------------------------------------------------------------------------------------------------------------------
    0000001-200420185350972-oozie-jack-W@:start:        OK        -                       OK          -
    ------------------------------------------------------------------------------------------------------------------------------------
    0000001-200420185350972-oozie-jack-W@shell-node     OK        job_1586921478592_0022  SUCCEEDED   -
    ------------------------------------------------------------------------------------------------------------------------------------
    0000001-200420185350972-oozie-jack-W@end            OK        -                       OK          -
    ------------------------------------------------------------------------------------------------------------------------------------
    $ # Inspect the output
    $ cat /tmp/oozie/workflow.log
    Hello! It's time to run. [Tue Apr 21 02:33:44 PDT 2020]

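Note: on a fully distributed cluster, the job runs in a container allocated by the YARN ResourceManager, so the script may execute on any node; check the JobHistory to find out where it actually ran.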
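
One way to make the run location easier to spot, offered here as a suggestion rather than something the article's script does, is to have the script record the host name in the line it writes. A minimal sketch using uname -n:

```shell
#!/bin/sh
# Build a log line that records which host the script ran on.
# Adding the host name is a suggestion; batch.sh in this article does not do it.
host=$(uname -n)
log_line="Hello! It's time to run on $host. [$(date)]"
echo "$log_line"
```

With a line like this in the log, the file found on a given node also confirms that the action ran there.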

Scheduling multiple shell jobs in sequence

Copy the oozie-apps/shell-script folder and add a log.sh script.
Then use multiple actions to build a workflow job that runs several shell scripts one after another. The lines after the vi command below are the contents of log.sh.

    $ cp -r oozie-apps/shell-script oozie-apps/shell-scripts
    $ ls oozie-apps/shell-scripts
    batch.sh job.properties workflow.xml
    $ vi oozie-apps/shell-scripts/log.sh
    #!/bin/bash
    # echo -n suppresses the trailing newline
    echo "bytes length of current file: `cat /tmp/oozie/workflow.log | wc -c`" >> /tmp/oozie/workflow.log
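
To check what log.sh appends, the byte count can be reproduced on a scratch copy of the log. This sketch writes the first log line exactly as it appears in the sample run later in this article, then appends the count the same way log.sh does; the scratch directory and the tr call (which strips the padding some wc implementations print) are my additions.

```shell
#!/bin/sh
scratch=$(mktemp -d)
log_file="$scratch/workflow.log"

# First line of the log, copied from the sample run in this article.
echo "Hello! It's time to run. [Tue Apr 21 04:52:10 PDT 2020]" > "$log_file"

# wc -c counts bytes, including the trailing newline written by echo.
byte_count=$(wc -c < "$log_file" | tr -d ' ')
echo "bytes length of current file: $byte_count" >> "$log_file"

cat "$log_file"
```

The line is 55 characters long and echo adds one newline, which gives the count of 56 seen in the article's output.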

job.properties

    nameNode=hdfs://192.168.32.130:8020
    # yarn.resourcemanager.address ${yarn.resourcemanager.hostname}:8032
    jobTracker=192.168.32.130:8032
    queueName=default
    examplesRoot=oozie-apps
    # Application path resolves to hdfs://192.168.32.130:8020/user/jack/oozie-apps/shell-scripts
    oozie.wf.application.path=${nameNode}/user/${user.name}/${examplesRoot}/shell-scripts
    EXEC1=batch.sh
    EXEC2=log.sh

workflow.xml

    <workflow-app xmlns="uri:oozie:workflow:0.4" name="shell-wf">
        <start to="shell-node1"/>
        <action name="shell-node1">
            <shell xmlns="uri:oozie:shell-action:0.2">
                <job-tracker>${jobTracker}</job-tracker>
                <name-node>${nameNode}</name-node>
                <configuration>
                    <property>
                        <name>mapred.job.queue.name</name>
                        <value>${queueName}</value>
                    </property>
                </configuration>
                <exec>${EXEC1}</exec>
                <file>${EXEC1}#${EXEC1}</file>
                <capture-output/>
            </shell>
            <ok to="shell-node2"/>
            <error to="fail"/>
        </action>
        <action name="shell-node2">
            <shell xmlns="uri:oozie:shell-action:0.2">
                <job-tracker>${jobTracker}</job-tracker>
                <name-node>${nameNode}</name-node>
                <configuration>
                    <property>
                        <name>mapred.job.queue.name</name>
                        <value>${queueName}</value>
                    </property>
                </configuration>
                <exec>${EXEC2}</exec>
                <file>${EXEC2}#${EXEC2}</file>
                <capture-output/>
            </shell>
            <ok to="end"/>
            <error to="fail"/>
        </action>
        <kill name="fail">
            <message>Shell action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
        </kill>
        <end name="end"/>
    </workflow-app>
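
The ok and error transitions in this workflow amount to "run the next step on success, jump to fail on the first error". As a rough shell analogy (not how Oozie executes anything), the sketch below runs a list of steps in order and stops at the first failure; run_chain, step_ok and step_bad are hypothetical names standing in for the workflow nodes.

```shell
#!/bin/sh
# Run steps in order; the first failing step routes to "fail",
# mirroring <error to="fail"/>. Reaching the end mirrors <ok to="end"/>.
run_chain() {
    for step in "$@"; do
        if ! "$step"; then
            echo "fail (at $step)"
            return 1
        fi
    done
    echo "end"
}

# Hypothetical stand-ins for shell-node1 and shell-node2.
step_ok()  { return 0; }
step_bad() { return 1; }

result_ok=$(run_chain step_ok step_ok)
result_bad=$(run_chain step_ok step_bad step_ok) || true

echo "$result_ok"
echo "$result_bad"
```

In the second chain the third step never runs, just as shell-node2 would not run if shell-node1 failed.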

Run

    $ ~/Documents/hadoop/bin/hadoop fs -put oozie-apps/shell-scripts oozie-apps/shell-scripts
    $ ~/Documents/hadoop/bin/hadoop fs -ls /user/jack/oozie-apps/shell-scripts
    Found 4 items
    -rw-r--r-- 1 jack supergroup 131 2020-04-21 04:49 /user/jack/oozie-apps/shell-scripts/batch.sh
    -rw-r--r-- 1 jack supergroup 1179 2020-04-21 04:49 /user/jack/oozie-apps/shell-scripts/job.properties
    -rw-r--r-- 1 jack supergroup 117 2020-04-21 04:49 /user/jack/oozie-apps/shell-scripts/log.sh
    -rw-r--r-- 1 jack supergroup 2241 2020-04-21 04:49 /user/jack/oozie-apps/shell-scripts/workflow.xml
    $ export OOZIE_URL="http://192.168.32.130:11000/oozie"
    $ bin/oozie job -config oozie-apps/shell-scripts/job.properties -run
    job: 0000004-200420185350972-oozie-jack-W
    $ bin/oozie job -info 0000004-200420185350972-oozie-jack-W
    Job ID : 0000004-200420185350972-oozie-jack-W
    ------------------------------------------------------------------------------------------------------------------------------------
    Workflow Name : shell-wf
    App Path      : hdfs://192.168.32.130:8020/user/jack/oozie-apps/shell-scripts
    Status        : RUNNING
    Run           : 0
    User          : jack
    Group         : -
    Created       : 2020-04-21 11:51 GMT
    Started       : 2020-04-21 11:51 GMT
    Last Modified : 2020-04-21 11:52 GMT
    Ended         : -
    CoordAction ID: -
    Actions
    ------------------------------------------------------------------------------------------------------------------------------------
    ID                                                  Status    Ext ID                  Ext Status  Err Code
    ------------------------------------------------------------------------------------------------------------------------------------
    0000004-200420185350972-oozie-jack-W@:start:        OK        -                       OK          -
    ------------------------------------------------------------------------------------------------------------------------------------
    0000004-200420185350972-oozie-jack-W@shell-node1    OK        job_1586921478592_0025  SUCCEEDED   -
    ------------------------------------------------------------------------------------------------------------------------------------
    0000004-200420185350972-oozie-jack-W@shell-node2    RUNNING   job_1586921478592_0026  RUNNING     -
    ------------------------------------------------------------------------------------------------------------------------------------
    $ bin/oozie job -info 0000004-200420185350972-oozie-jack-W
    Job ID : 0000004-200420185350972-oozie-jack-W
    ------------------------------------------------------------------------------------------------------------------------------------
    Workflow Name : shell-wf
    App Path      : hdfs://192.168.32.130:8020/user/jack/oozie-apps/shell-scripts
    Status        : SUCCEEDED
    Run           : 0
    User          : jack
    Group         : -
    Created       : 2020-04-21 11:51 GMT
    Started       : 2020-04-21 11:51 GMT
    Last Modified : 2020-04-21 11:52 GMT
    Ended         : 2020-04-21 11:52 GMT
    CoordAction ID: -
    Actions
    ------------------------------------------------------------------------------------------------------------------------------------
    ID                                                  Status    Ext ID                  Ext Status  Err Code
    ------------------------------------------------------------------------------------------------------------------------------------
    0000004-200420185350972-oozie-jack-W@:start:        OK        -                       OK          -
    ------------------------------------------------------------------------------------------------------------------------------------
    0000004-200420185350972-oozie-jack-W@shell-node1    OK        job_1586921478592_0025  SUCCEEDED   -
    ------------------------------------------------------------------------------------------------------------------------------------
    0000004-200420185350972-oozie-jack-W@shell-node2    OK        job_1586921478592_0026  SUCCEEDED   -
    ------------------------------------------------------------------------------------------------------------------------------------
    0000004-200420185350972-oozie-jack-W@end            OK        -                       OK          -
    ------------------------------------------------------------------------------------------------------------------------------------
    $ cat /tmp/oozie/workflow.log
    Hello! It's time to run. [Tue Apr 21 04:52:10 PDT 2020]
    bytes length of current file: 56