Enable multicore CUDA compilation #24382

sisakat · 2023-10-10T12:10:13Z

CUDA source files are currently compiled single-threaded. NVCC 11.2 introduced the `--threads` option, which specifies the number of threads to use for compilation (see [NVIDIA NVCC Documentation](https://docs.nvidia.com/cuda/cuda-compiler-driver-nvcc/index.html#threads-number-t)).

CMake 3.12 introduced the environment variable `CMAKE_BUILD_PARALLEL_LEVEL` (see [CMake Documentation](https://cmake.org/cmake/help/latest/envvar/CMAKE_BUILD_PARALLEL_LEVEL.html)). This PR uses that variable to set the NVCC `--threads` option.
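As a rough illustration of the mapping described above, here is a minimal Python model of the decision. It is not the CMake code from this patch: the function name `nvcc_threads_flag` and its arguments are hypothetical, and the only facts taken from the description are the environment variable name, the `--threads` option, and the NVCC 11.2 minimum.

```python
import re

# --threads was introduced in NVCC 11.2 (per the PR description).
MIN_NVCC_FOR_THREADS = (11, 2)


def nvcc_threads_flag(environ, nvcc_version):
    """Return the extra NVCC arguments implied by CMAKE_BUILD_PARALLEL_LEVEL.

    `environ` is a mapping such as os.environ; `nvcc_version` is a dotted
    version string such as "11.8" or "12.3.52". Both names are hypothetical.
    """
    level = environ.get("CMAKE_BUILD_PARALLEL_LEVEL", "").strip()
    # An unset or non-numeric value leaves the NVCC command line unchanged.
    if not re.fullmatch(r"[0-9]+", level):
        return []
    # Compilers older than 11.2 do not accept the option, so skip it.
    major, minor = (int(part) for part in nvcc_version.split(".")[:2])
    if (major, minor) < MIN_NVCC_FOR_THREADS:
        return []
    return ["--threads", level]
```

For example, `nvcc_threads_flag({"CMAKE_BUILD_PARALLEL_LEVEL": "20"}, "11.2")` yields `["--threads", "20"]`, while the same call with version `"11.1"` yields an empty list.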

Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake

asmorkalov

👍

asmorkalov · 2023-10-17T13:03:34Z

@cudawarped Could you take another look?

cudawarped · 2023-10-19T08:04:21Z

@asmorkalov I've had a quick look and it seems that there is a significant improvement in build time when using this modification with Visual Studio.

Below is a table of build times (mm:ss) for cudaimgproc and its dependencies on Windows and Ubuntu, built for two binary and one PTX output. Setting the CMAKE_BUILD_PARALLEL_LEVEL environment variable to the number of "threads" available on my CPU reduces the VS build time from 22:50 to 14:36, which is significant. That said, the build is still painfully slow compared to Ninja (04:00), so although this PR is a valid addition, I would still recommend Ninja over VS to anyone building OpenCV with CUDA on Windows.

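| Platform | Build | Time (mm:ss) |
| --- | --- | --- |
| Windows | VS | 22:50 |
| Windows | VS (CMAKE_BUILD_PARALLEL_LEVEL=20) | 14:36 |
| Windows | Ninja | 04:00 |
| Ubuntu | Make | 21:26 |
| Ubuntu | Make -j20 | 03:46 |
| Ubuntu | Ninja | 03:44 |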
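To put the timings above in relative terms, the following short Python sketch converts the mm:ss values to seconds and compares them. The helper names are hypothetical; the input values are taken from the table. By this arithmetic, the VS build with CMAKE_BUILD_PARALLEL_LEVEL=20 takes roughly 36% less time than the plain VS build, but still about 3.65 times as long as Ninja on the same machine.

```python
def to_seconds(mmss):
    """Convert a "mm:ss" string to a number of seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


def reduction_percent(before, after):
    """Percentage of build time saved when going from `before` to `after`."""
    b, a = to_seconds(before), to_seconds(after)
    return 100.0 * (b - a) / b


def slowdown_factor(slow, fast):
    """How many times longer the `slow` build takes than the `fast` one."""
    return to_seconds(slow) / to_seconds(fast)


# Values from the table (Windows rows).
vs_plain, vs_threads, ninja = "22:50", "14:36", "04:00"

print(f"VS with --threads saves {reduction_percent(vs_plain, vs_threads):.0f}% of the build time")
print(f"VS with --threads is {slowdown_factor(vs_threads, ninja):.2f}x slower than Ninja")
```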

Enable multicore CUDA compilation

5d84f70

Specify the number of threads to be used for compilation to the CUDA compiler NVCC. Option introduced in NVCC 11.2.

opencv-alalek added feature category: build/install category: gpu/cuda (contrib) OpenCV 4.0+: moved to opencv_contrib labels Oct 10, 2023

asmorkalov reviewed Oct 12, 2023

cmake/OpenCVDetectCUDA.cmake

sisakat added 2 commits October 12, 2023 17:00

Enable NVCC parallelism only with MSVC

362aae3

Parallel builds with GNU Make run multiple instances of NVCC, so there is no need to use the internal parallelism of NVCC.

Enable NVCC parallelism only with msbuild

dd00c15

msbuild compiles the CUDA files sequentially. Enable the internal parallelism of NVCC if msbuild is used.
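The reasoning in the two commits above can be modelled with a small Python predicate. This is a sketch only, not the CMake condition used in the patch: the function name is hypothetical, and matching on the generator name prefix "Visual Studio" is an assumption about how an msbuild-based build might be recognised.

```python
def wants_nvcc_internal_threads(generator):
    """Return True when NVCC's own --threads parallelism is worth enabling.

    `generator` is a CMake generator name such as "Visual Studio 17 2022",
    "Ninja" or "Unix Makefiles".
    """
    # Assumption: Visual Studio generators build through msbuild, which
    # compiles the CUDA files one after another, so NVCC has to provide
    # the parallelism itself.
    # A parallel Make build already runs several NVCC instances at once;
    # this sketch treats every other generator the same way.
    return generator.startswith("Visual Studio")
```

With this model, `wants_nvcc_internal_threads("Visual Studio 17 2022")` is true, while `"Unix Makefiles"` and `"Ninja"` both return false.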

asmorkalov approved these changes Oct 17, 2023

cudawarped approved these changes Oct 19, 2023

asmorkalov assigned cudawarped Oct 19, 2023

asmorkalov added this to the 4.9.0 milestone Oct 19, 2023

asmorkalov merged commit 5bffcdf into opencv:4.x Oct 19, 2023

asmorkalov mentioned this pull request Nov 3, 2023

(5.x) Merge 4.x #24486

Merged
