8000 Enable multicore CUDA compilation by sisakat · Pull Request #24382 · opencv/opencv · GitHub
[go: up one dir, main page]

Skip to content

Enable multicore CUDA compilation #24382

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Oct 19, 2023
Merged

Conversation

sisakat
Copy link
Contributor
@sisakat sisakat commented Oct 10, 2023

CUDA source files are compiled single threaded. The option --threads was introduced in NVCC 11.2. The option specifies the number of threads to be used for compilation (see NVIDIA NVCC Documentation).

With CMake 3.12 the environment variable CMAKE_BUILD_PARALLEL_LEVEL was introduced (see CMake Documentation). This variable is used to set the NVCC --threads option.

Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-p 8000 ull-request

  • I agree to contribute to the project under Apache 2 License.
  • To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
  • The PR is proposed to the proper branch
  • There is a reference to the original bug report and related work
  • There is accuracy test, performance test and test data in opencv_extra repository, if applicable
    Patch to opencv_extra has the same branch name.
  • The feature is well documented and sample code can be built with the project CMake

Specify the number of threads to be used for compilation to the CUDA compiler NVCC. Option introduced in NVCC 11.2.
Parallel builds with GNU Make run multiple instances of NVCC, therefor there is no need to use the internal parallelism of NVCC.
msbuild compiles the CUDA files sequentially. Enable the internal parallelism of NVCC if msbuild is used.
Copy link
Contributor
@asmorkalov asmorkalov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@asmorkalov
Copy link
Contributor

@cudawarped Could you take a look again.

@cudawarped
Copy link
Contributor

@asmorkalov I've had a quick look and it seems that there is a significant improvement in build time when using this modification with Visual Studio.

Below is a table of the build time (mm:ss) for cudaimgproc and its dependancies on Windows and Ubuntu for two binary and one ptx output. The build time for VS when setting the CMAKE_BUILD_PARALLEL_LEVEL environmental variable to the number of "threads" available on my CPU is reduced from 22:50 to 14:36 which is significant. That said the build time is still painfully slow compared to Ninja (04:00) so although this PR is a valid addition I still would recommend Ninja over VS to anyone building OpenCV with CUDA on Windows.

Windows     Ubuntu    
VS VS (CMAKE_BUILD_PARALLEL_LEVEL=20) Ninja Make Make -j20 Ninja
22:50 14:36 04:00 21:26 03:46 03:44

@asmorkalov asmorkalov added this to the 4.9.0 milestone Oct 19, 2023
@asmorkalov asmorkalov merged commit 5bffcdf into opencv:4.x Oct 19, 2023
@asmorkalov asmorkalov mentioned this pull request Nov 3, 2023
IskXCr pushed a commit to Haosonn/opencv that referenced this pull request Dec 20, 2023
Enable multicore CUDA compilation opencv#24382

CUDA source files are compiled single threaded. The option `--threads` was introduced in NVCC 11.2. The option specifies the number of threads to be used for compilation (see [NVIDIA NVCC Documentation](https://docs.nvidia.com/cuda/cuda-compiler-driver-nvcc/index.html#threads-number-t)).

With CMake 3.12 the environment variable `CMAKE_BUILD_PARALLEL_LEVEL` was introduced (see [CMake Documentation](https://cmake.org/cmake/help/latest/envvar/CMAKE_BUILD_PARALLEL_LEVEL.html)). This variable is used to set the NVCC `--threads` option.

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
thewoz pushed a commit to thewoz/opencv that referenced this pull request Jan 4, 2024
Enable multicore CUDA compilation opencv#24382

CUDA source files are compiled single threaded. The option `--threads` was introduced in NVCC 11.2. The option specifies the number of threads to be used for compilation (see [NVIDIA NVCC Documentation](https://docs.nvidia.com/cuda/cuda-compiler-driver-nvcc/index.html#threads-number-t)).

With CMake 3.12 the environment variable `CMAKE_BUILD_PARALLEL_LEVEL` was introduced (see [CMake Documentation](https://cmake.org/cmake/help/latest/envvar/CMAKE_BUILD_PARALLEL_LEVEL.html)). This variable is used to set the NVCC `--threads` option.

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
thewoz pushed a commit to thewoz/opencv that referenced this pull request May 29, 2024
Enable multicore CUDA compilation opencv#24382

CUDA source files are compiled single threaded. The option `--threads` was introduced in NVCC 11.2. The option specifies the number of threads to be used for compilation (see [NVIDIA NVCC Documentation](https://docs.nvidia.com/cuda/cuda-compiler-driver-nvcc/index.html#threads-number-t)).

With CMake 3.12 the environment variable `CMAKE_BUILD_PARALLEL_LEVEL` was introduced (see [CMake Documentation](https://cmake.org/cmake/help/latest/envvar/CMAKE_BUILD_PARALLEL_LEVEL.html)). This variable is used to set the NVCC `--threads` option.

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants
0