13.3 Cross-compiling a Windows binary with OpenMP parallelization

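NOTE: The code for this recipe is available at https://github.com/dev-cafe/cmake-cookbook/tree/v1.0/chapter-13/recipe-02 and includes a C++ example and a Fortran example. The recipe is valid with CMake version 3.9 (and higher) and has been tested on GNU/Linux, macOS, and Windows.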

In this recipe, we will cross-compile an OpenMP-parallelized Windows binary.

Getting ready

We will use the unmodified source code from Chapter 3, Recipe 5, which sums all natural numbers from 1 to N (example.cpp):

  #include <iostream>
  #include <omp.h>
  #include <string>

  int main(int argc, char *argv[]) {
    std::cout << "number of available processors: " << omp_get_num_procs()
              << std::endl;
    std::cout << "number of threads: " << omp_get_max_threads() << std::endl;

    auto n = std::stol(argv[1]);
    std::cout << "we will form sum of numbers from 1 to " << n << std::endl;

    // start timer
    auto t0 = omp_get_wtime();

    auto s = 0LL;
  #pragma omp parallel for reduction(+ : s)
    for (auto i = 1; i <= n; i++) {
      s += i;
    }

    // stop timer
    auto t1 = omp_get_wtime();

    std::cout << "sum: " << s << std::endl;
    std::cout << "elapsed wall clock time: " << t1 - t0 << " seconds" << std::endl;

    return 0;
  }

The CMakeLists.txt is essentially unchanged with respect to detecting the OpenMP parallel environment, except for an additional install target:

  # set minimum cmake version
  cmake_minimum_required(VERSION 3.9 FATAL_ERROR)

  # project name and language
  project(recipe-02 LANGUAGES CXX)

  set(CMAKE_CXX_STANDARD 11)
  set(CMAKE_CXX_EXTENSIONS OFF)
  set(CMAKE_CXX_STANDARD_REQUIRED ON)

  include(GNUInstallDirs)
  set(CMAKE_ARCHIVE_OUTPUT_DIRECTORY
    ${CMAKE_BINARY_DIR}/${CMAKE_INSTALL_LIBDIR})
  set(CMAKE_LIBRARY_OUTPUT_DIRECTORY
    ${CMAKE_BINARY_DIR}/${CMAKE_INSTALL_LIBDIR})
  set(CMAKE_RUNTIME_OUTPUT_DIRECTORY
    ${CMAKE_BINARY_DIR}/${CMAKE_INSTALL_BINDIR})

  find_package(OpenMP REQUIRED)

  add_executable(example example.cpp)

  target_link_libraries(example
    PUBLIC
      OpenMP::OpenMP_CXX
    )

  install(
    TARGETS
      example
    DESTINATION
      ${CMAKE_INSTALL_BINDIR}
    )

How to do it

With the following steps, we will cross-compile an OpenMP-parallelized Windows executable:

  1. Create a directory containing example.cpp and CMakeLists.txt.

  2. We will use the same toolchain.cmake as in the previous recipe:

    # the name of the target operating system
    set(CMAKE_SYSTEM_NAME Windows)

    # which compilers to use
    set(CMAKE_CXX_COMPILER i686-w64-mingw32-g++)

    # adjust the default behaviour of the find commands:
    # search headers and libraries in the target environment
    set(CMAKE_FIND_ROOT_PATH_MODE_INCLUDE ONLY)
    set(CMAKE_FIND_ROOT_PATH_MODE_LIBRARY ONLY)

    # search programs in the host environment
    set(CMAKE_FIND_ROOT_PATH_MODE_PROGRAM NEVER)

  3. CMAKE_CXX_COMPILER设置为对应的编译器(路径)。

  4. Then configure the code, pointing CMAKE_TOOLCHAIN_FILE to the toolchain file (in this example, we used an MXE compiler built from source):

    $ mkdir -p build
    $ cd build
    $ cmake -D CMAKE_TOOLCHAIN_FILE=toolchain.cmake ..
    -- The CXX compiler identification is GNU 5.4.0
    -- Check for working CXX compiler: /home/user/mxe/usr/bin/i686-w64-mingw32.static-g++
    -- Check for working CXX compiler: /home/user/mxe/usr/bin/i686-w64-mingw32.static-g++ -- works
    -- Detecting CXX compiler ABI info
    -- Detecting CXX compiler ABI info - done
    -- Detecting CXX compile features
    -- Detecting CXX compile features - done
    -- Found OpenMP_CXX: -fopenmp (found version "4.0")
    -- Found OpenMP: TRUE (found version "4.0")
    -- Configuring done
    -- Generating done
    -- Build files have been written to: /home/user/cmake-recipes/chapter-13/recipe-02/cxx-example/build

  5. Build the executable:

    $ cmake --build .
    Scanning dependencies of target example
    [ 50%] Building CXX object CMakeFiles/example.dir/example.cpp.obj
    [100%] Linking CXX executable bin/example.exe
    [100%] Built target example

  6. example.exe拷贝到Windows环境下。

  7. On Windows, we should see the following output:

    $ set OMP_NUM_THREADS=1
    $ example.exe 1000000000
    number of available processors: 2
    number of threads: 1
    we will form sum of numbers from 1 to 1000000000
    sum: 500000000500000000
    elapsed wall clock time: 2.641 seconds
    $ set OMP_NUM_THREADS=2
    $ example.exe 1000000000
    number of available processors: 2
    number of threads: 2
    we will form sum of numbers from 1 to 1000000000
    sum: 500000000500000000
    elapsed wall clock time: 1.328 seconds

  8. As we can see, the binary works on Windows, and thanks to the OpenMP parallelization we observe a speedup!

How it works

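We have successfully used a simple toolchain to cross-compile an executable that runs in parallel on the Windows platform. We can specify the number of OpenMP threads by setting OMP_NUM_THREADS. Going from one thread to two threads, the run time dropped from 2.6 seconds to 1.3 seconds. For a discussion of the toolchain file, please refer to the previous recipe.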

There is more

It is possible to cross-compile for a range of other target platforms (for example, Android). For examples, see https://cmake.org/cmake/help/latest/manual/cmake-toolchains.7.html