加载文件

下一步就是让 ecflow_server 知道你的 suite,或者称为加载 suite definition 文件。 这一步将检查 test.def 文件,并向 ecflow_server 描述 suite。可以通过多种方法实现,取决于 suite 的创建方式。

注意:从下面的两种方法中选择一种方法,避免两次加载 suite definition 时会出现的错误。

文本

ECF_HOME 目录执行下面的命令

  1. $ecflow_client --load=test.def --port=33083

这将检查并加载 suite definition 到 ecflow_server。如果检查失败,suite 不会被加载。 之前的章节中提到过 ecflow_client,该命令被用在 head.htail.h 头文件中。

注:请确保已经设置 ECF_HOSTECF_PORT 环境变量,否则需要在命令行中使用 --host-–port 参数。

加载 suite 后的 ecflowview

加载文件 - 图1

译者注:

为了测试另一种方式,需要删除 ecflow 服务中的 suite,使用 ecflow_client --delete 命令:

  1. $ecflow_client --host=login05 --port=33083 --delete=/test
  2. Are you sure want to delete nodes at paths:
  3. /test ? y

Python

使用 Python 脚本将 defs 写为 .def 定义文件,在 suite 定义比较复杂的时候对调试很有帮助。

  1. import os
  2. from pathlib import Path
  3. from ecflow import Defs, Suite, Task, Edit
  4. print("Creating suite definition")
  5. home = os.path.abspath(Path(Path(__file__).parent, "../../../build/course"))
  6. defs = Defs(
  7. Suite('test',
  8. Edit(ECF_HOME=home),
  9. Task('t1')))
  10. print(defs)
  11. print("Checking job creation: .ecf -> .job0")
  12. print(defs.check_job_creation())
  13. print("Saving definition to file 'test.def'")
  14. defs.save_as_defs(str(Path(home, "test.def")))
  15. # To restore the definition from file 'test.def' we can use:
  16. # restored_defs = ecflow.Defs("test.def")

输出结果:

  1. $python test.py
  2. Creating suite definition
  3. # 4.8.0
  4. suite test
  5. edit ECF_HOME 'PROJECT_ROOT/build/course'
  6. task t1
  7. endsuite
  8. Checking job creation: .ecf -> .job0
  9. Saving definition to file 'test.def'

运行上述脚本后,生成的 def 文件内容如下:

  1. # 4.8.0
  2. suite test
  3. edit ECF_HOME 'PROJECT_ROOT/build/course'
  4. task t1
  5. endsuite

可以使用上面的方法通过命令行接口加载 def 文件。

因为 Suite Definition API 在内存中建立 definition,所以可以直接加载到 ecflow_server 中,用 ecflow.Client 类实现。

在之前的 test.py 文件结尾添加如下代码:

  1. try:
  2. print("Load the in memory definition(defs) into the server")
  3. ci = Client('login05', '33083')
  4. ci.load(defs)
  5. except RuntimeError as e:
  6. print("Failed:", e)

运行修改后的代码:

  1. $ python test.py
  2. Creating suite definition
  3. # 4.0.9
  4. suite test
  5. edit ECF_HOME '/home/windroc/course'
  6. task t1
  7. endsuite
  8. Checking job creation: .ecf -> .job0
  9. Saving definition to file 'test.def'
  10. Load the in memory definition(defs) into the server

ecflow_ui 中可以看到 test 已加载到 ecflow 服务中。

加载文件 - 图2

此时服务处于 halted 状态,需要手动开启服务。

  1. $ecflow_client --begin --host=login05 --port=33083

因为没有添加任何触发机制,在服务开始后,t1 就会立即执行,执行结束后的状态如下:

加载文件 - 图3

强烈建议创建 suite definition 和将它加载到服务器的过程分开。加载过程应该放到一个文件中,例如下面的 client.py 文件。

  1. import os
  2. from pathlib import Path
  3. import ecflow
  4. home = os.path.abspath(Path(Path(__file__).parent, "../../../build/course"))
  5. try:
  6. print("Loading definition in 'test.def' into the server")
  7. ci = ecflow.Client('login05', '33083')
  8. ci.load(str(Path(home, "test.def"))) # read definition from disk and load into the server
  9. except RuntimeError as e:
  10. print("Failed:", e)

测试结果:

  1. $python test_client.py
  2. Loading definition in 'test.def' into the server

如果一切顺利,server 中应该有 suite 的定义。ecflow_server 的日志文件中会记录 ecflow_client 的调用情况:

  1. MSG:[14:07:06 30.1.2018] --delete yes /test :wangdp
  2. ...
  3. MSG:[14:08:05 30.1.2018] --load=/g3/wangdp/project/study/ecflow/ecflow-tutorial-code/build/course/test.def :wangdp
  4. ...
  5. MSG:[14:09:14 30.1.2018] --begin :wangdp
  6. LOG:[14:09:14 30.1.2018] queued: /test
  7. LOG:[14:09:14 30.1.2018] queued: /test/t1
  8. LOG:[14:09:14 30.1.2018] queued: /
  9. LOG:[14:09:14 30.1.2018] submitted: /test/t1 job_size:1843
  10. LOG:[14:09:14 30.1.2018] submitted: /test
  11. LOG:[14:09:14 30.1.2018] submitted: /
  12. MSG:[14:09:14 30.1.2018] chd:init /test/t1
  13. LOG:[14:09:14 30.1.2018] active: /test/t1
  14. LOG:[14:09:14 30.1.2018] active: /test
  15. LOG:[14:09:14 30.1.2018] active: /
  16. MSG:[14:09:14 30.1.2018] chd:complete /test/t1
  17. LOG:[14:09:14 30.1.2018] complete: /test/t1
  18. LOG:[14:09:14 30.1.2018] complete: /test
  19. LOG:[14:09:14 30.1.2018] complete: /
  20. ...

任务

  1. 加载定义文件。从上述两种方法中选择一种。
  2. 如果使用python方式,检查 test.def 并创建 client.py
  3. 查看日志文件

备注

如果遇到重复加载两次 suite 的错误,可以删掉服务器中的 suite

  1. ecflow_client --delete /test

然后 definition 可以被重新加载。

或者使用 replace 替换 suite,请查看相关帮助

  1. $ecflow_client --help replace
  2. replace
  3. -------
  4. Replaces a node in the server, with the given path
  5. Can also be used to add nodes in the server
  6. arg1 = path to node
  7. must exist in the client defs(arg2). This is also the node we want to
  8. replace in the server
  9. arg2 = path to client definition file
  10. provides the definition of the new node
  11. arg3 = (optional) [ parent | false ] (default = parent)
  12. create parent families or suite as needed, when arg1 does not
  13. exist in the server
  14. arg4 = (optional) force (default = false)
  15. Force the replacement even if it causes zombies to be created
  16. Replace can fail if:
  17. - The node path(arg1) does not exist in the provided client definition(arg2)
  18. - The client definition(arg2) must be free of errors
  19. - If the third argument is not provided, then node path(arg1) must exist in the server
  20. - Nodes to be replaced are in active/submitted state, in which case arg4(force) can be used
  21. After replace is done, we check trigger expressions. These are reported to standard output.
  22. It is up to the user to correct invalid trigger expressions, otherwise the tasks will *not* run.
  23. Please note, you can use --check to check trigger expression and limits in the server.
  24. For more information use --help check.
  25. Usage:
  26. --replace=/suite/f1/t1 /tmp/client.def parent # Add/replace node tree /suite/f1/t1
  27. --replace=/suite/f1/t1 /tmp/client.def false force # replace t1 even if its active or submitted
  28. The client reads in the following environment variables. These are read by user and child command
  29. |----------|----------|------------|-------------------------------------------------------------------|
  30. | Name | Type | Required | Description |
  31. |----------|----------|------------|-------------------------------------------------------------------|
  32. | ECF_NODE | <string> | Mandatory* | The host name of the main server. defaults to 'localhost' |
  33. | ECF_PORT | <int> | Mandatory* | The TCP/IP port to call on the server. Must be unique to a server |
  34. |----------|----------|------------|-------------------------------------------------------------------|
  35. * The host and port must be specified in order for the client to communicate with the server, this can
  36. be done by setting ECF_NODE, ECF_PORT or by specifying --host <host> --port <int> on the command line

简要说明

  1. # 替换整个 suite
  2. ecflow_client --replace=/test test.def
  3. # 或者替换单个节点
  4. ecflow_client --replace=/test/f1/t1 test.def