Support Delay Loading of c10.dll in when using libtorch as a thirdparty library. · Issue #105058 · pytorch/pytorch · GitHub
Open
kohillyang opened this issue Jul 12, 2023 · 7 comments
Labels
module: abi libtorch C++ ABI related problems module: windows Windows support for PyTorch triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module

Comments

kohillyang commented Jul 12, 2023

🚀 The feature, motivation and pitch

Hi PyTorch team,

I'm currently working on a project where I have added libtorch as a third-party dependency. Due to the size of libtorch, I wanted to make it an optional dependency using delay load hooks. However, I've encountered an issue with c10.dll, which is a dependency of libtorch. It seems that c10.dll exports a global variable, causing it to be unable to be delay loaded.

Here's the error message I encountered:
[image]

I'm wondering if it's possible to add support for delay loading c10.dll in libtorch. Having this feature would be really helpful for my project, as it will allow me to better manage the dependency loading and reduce the initial load time.

Alternatives

No response

Additional context

No response

cc @peterjc123 @mszhanyi @skyline75489 @nbcsm @vladimir-aubrecht @iremyux @Blackhex @cristianPanaite

@soulitzer soulitzer added module: windows Windows support for PyTorch module: abi libtorch C++ ABI related problems triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module labels Jul 14, 2023
@kalaskarsanket kalaskarsanket moved this from Backlog to Groomed in PyTorch On Windows Nov 20, 2023
@mantaionut mantaionut self-assigned this Mar 12, 2024
@mantaionut mantaionut moved this from Groomed to In Progress in PyTorch On Windows Mar 12, 2024
obitodaitu commented Jan 17, 2025

In “include\c10\macros\Export.h” you can see the following:
#ifdef NO_EXPORT
#undef C10_EXPORT
#define C10_EXPORT
#endif

So you need to define the macro "NO_EXPORT". For example, in Unreal Engine 5, add the following to your build.cs file:
PublicDefinitions.AddRange(new[]
{
"NO_EXPORT"
});

@taras-janea taras-janea self-assigned this Apr 22, 2025
@taras-janea taras-janea moved this from Backlog to In Progress in PyTorch On Windows Apr 22, 2025
@taras-janea
Collaborator

The following steps enable delay loading of c10.dll when using LibTorch as a third-party library:

  1. Define the NO_EXPORT macro in your project’s source code, specifically in the scope where loading of c10.dll needs to be delayed.
  2. Enable delay loading in the project’s build configuration. For example, in a CMake configuration:
...

# Link delay loading support library
target_link_libraries(<ProjectName> PRIVATE delayimp)

# Enable delay-load for specific DLLs
set_target_properties(<ProjectName> PROPERTIES
    LINK_FLAGS "/DELAYLOAD:torch_cpu.dll /DELAYLOAD:torch.dll /DELAYLOAD:c10.dll"
)
...

@kohillyang, if the problem persists, please share:

  • The output of this script;
  • Project configuration that demonstrates the issue.

@kohillyang
Author

@taras-janea Thank you for the prompt response and suggestions. To clarify, the core issue arises because c10.dll exports a non-function symbol (specifically, a global variable). Since delay-loading only works for function symbols (as global variables require immediate address resolution at load time), this makes c10.dll fundamentally incompatible with delay-loading mechanisms, regardless of NO_EXPORT or linker flags.

In our current setup, we’ve successfully applied delay-loading to other larger LibTorch DLLs (e.g., torch_cpu.dll), but c10.dll remains a mandatory immediate dependency due to this limitation. While c10.dll itself is small, its forced inclusion defeats the flexibility of fully optional dependency management.

Thanks again for your support!

@taras-janea
Collaborator

@kohillyang, the NO_EXPORT macro is intended to disable the exporting of symbols, which you've described as the core issue.
I would suggest ensuring that it is properly defined.

If the problem persists, please provide a sample project that reproduces the issue using the latest PyTorch version, along with the steps outlined in the comment above.

@taras-janea taras-janea moved this from In Progress to Blocked in PyTorch On Windows Apr 29, 2025
@kohillyang
Author

30>LINK : fatal error LNK1194: cannot delay-load 'c10_cuda.dll' due to import of data symbol '"__declspec(dllimport) struct std::atomic<class c10::cuda::CUDACachingAllocator::CUDAAllocator *> c10::cuda::CUDACachingAllocator::allocator" (__imp_?allocator@CUDACachingAllocator@cuda@c10@@3U?$atomic@PEAVCUDAAllocator@CUDACachingAllocator@cuda@c10@@@std@@A)'; link without /DELAYLOAD:c10_cuda.dll

@taras-janea Thank you for the clarification. Let me verify our understanding:

We use pre-built binaries specifically to avoid maintenance overhead from custom builds. So I want to ask if the NO_EXPORT macro needs to be defined during libtorch's compilation to prevent exporting global variables in c10.dll. Simply defining it in our application project (using pre-built libtorch binaries) cannot retroactively modify the existing DLL's export table. Is this correct?

For users of official pre-built libtorch binaries, is there any way to achieve delay-loading of c10.dll without recompiling libtorch from source? Our tests show that even with NO_EXPORT defined in our project and delay-load flags set, we still get the same error during linking.

@taras-janea
Collaborator

@kohillyang thank you for the question.

> Simply defining it in our application project (using pre-built libtorch binaries) cannot retroactively modify the existing DLL's export table. Is this correct?

Yes, that's correct. The LNK1194 error occurs because the application imports a global data symbol (e.g., atomic<CUDAAllocator*>) from c10_cuda.dll that is marked with __declspec(dllimport), which prevents delay-loading.

Defining NO_EXPORT suppresses __declspec(dllimport), allowing delay-load to work. NO_EXPORT does not modify the pre-built DLL, but prevents the __declspec(dllimport) attribute from being applied in headers, so the linker doesn’t enforce immediate resolution of data symbols.

I suggest ensuring that the macro is correctly defined throughout your project.
If the issue persists, please provide a sample project that reproduces the problem using the latest PyTorch version, along with the steps outlined in this comment.

@kohillyang
Author

Thank you for the detailed explanation, @taras-janea. I've created a minimal reproducible example at https://github.com/kohillyang/libtorch-delay-load-example to demonstrate the issue.

After examining the linker errors, it appears that NO_EXPORT does not control all the relevant symbols in c10_cuda.dll. Even with NO_EXPORT defined globally in my project, I still get the LNK1194 error about data symbols being incompatible with delay loading.

@taras-janea taras-janea moved this from Blocked to In Progress in PyTorch On Windows May 12, 2025
Projects
Status: In Progress
Development

No branches or pull requests

5 participants