8000 RFC: Use AdvancedDLSupport instead of P/Invoke or GetFunctionPointerForDelegate · Issue #987 · pythonnet/pythonnet · GitHub
[go: up one dir, main page]

Skip to content

RFC: Use AdvancedDLSupport instead of P/Invoke or GetFunctionPointerForDelegate #987

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
filmor opened this issue Nov 13, 2019 · 23 comments
Closed

Comments

@filmor
Copy link
Member
filmor commented Nov 13, 2019

While investigating how to proceed on binding libpython I stumbled over this post: https://medium.com/@jarl.gullberg/an-introduction-to-adl-or-how-to-double-your-native-net-interop-performance-c008e4da54db

It showcases https://github.com/Firwood-Software/AdvancedDLSupport and also benchmarks the individual options, which look quite devastating for the combination GetFunctionPointerForDelegate+Mono and at least "concerning" for the other frameworks.

The library looks really slick and solves exactly our problem while (seemingly) delivering slightly better performance. It has however the catch of being LGPL. While this is perfectly fine for me, this means there might be a problem for applications trying to embed Python.Runtime ahead-of-time compiled or publishing to a closed app-store that prevents you from exchanging the LGPLed library locally.

Comments?

@lostmsu
Copy link
Member
lostmsu commented Nov 13, 2019

I am a bit concerned about

Additionally, .NET Core lacks a way to set the unmanaged calling convention at runtime, resulting in unreliable results when running as a 32-bit process on Windows. This is, fortunately, a very uncommon configuration.

I don't know how much is this applicable to our users. We should discuss at the next call.

Additionally, reading this I wonder if the calling convention is the reason for crash in #980 @Sngmng

@filmor
Copy link
Member Author
filmor commented Nov 14, 2019

This is a pure .NET Core problem AFAIK (and the article is from 2018, it could be that this is fixed on .NET Core 3). Also ADL allows you to choose the style and fall back to delegates for unsupported cases.

@thesn10
Copy link
Contributor
thesn10 commented Nov 14, 2019

It has however the catch of being LGPL

Well they mentioned if you dont want LGPL you can still ask them for a custom license:
"For companies, established open-source projects, or individuals wanting a custom license, please chuck us an email."

Additionally, reading this I wonder if the calling convention is the reason for crash in #980 @Sngmng

This could be the issue, maybe. Should i try to use LoadLibrary and GetProcAdress instead?

@lostmsu
Copy link
Member
lostmsu commented Nov 14, 2019

@Sngmng you can try fiddling with DllImportAttribute.CallingConvention

@lostmsu
Copy link
Member
lostmsu commented Nov 22, 2019

I started prototyping Python C API interop layer: https://github.com/losttech/Python.Runtime.Native/tree/InterfaceBasedApiSets

Run my own tests on Windows for DllImport vs GetDelegateForFunctionPointer and results came a bit surprising: GetDelegateForFunctionPointer is 2x faster than DllImport on CoreRT and 2x slower on regular .NET Framework (for a function with no return value and a pointer parameter).

Method Runtime Mean Error StdDev Median Ratio RatioSD
RefCounting_PInvoke .NET 4.8 2,473.1 us 48.61 us 71.26 us 2,486.5 us 1.00 0.00
RefCounting_Marshal .NET 4.8 4,791.3 us 95.56 us 137.05 us 4,790.6 us 1.94 0.07
RefCounting_PInvoke CoreRt 3.0 2,164.5 us 313.71 us 924.97 us 1,617.4 us 1.00 0.00
RefCounting_Marshal CoreRt 3.0 968.2 us 20.69 us 50.35 us 956.8 us 0.55 0.15

@lostmsu
Copy link
Member
lostmsu commented Nov 22, 2019

Hm, nevermind, longer reruns normalize it to 1.1-2.0x on all platforms.

Method Runtime Mean Error StdDev Median Ratio RatioSD
RefCounting_PInvoke .NET 4.8 9.213 ms 0.2156 ms 0.6256 ms 9.068 ms 1.00 0.00
RefCounting_Marshal .NET 4.8 19.699 ms 0.3926 ms 0.3480 ms 19.612 ms 2.16 0.09
RefCounting_PInvoke .NET Core 3.0 5.796 ms 0.1706 ms 0.5003 ms 5.595 ms 1.00 0.00
RefCounting_Marshal .NET Core 3.0 12.411 ms 0.2258 ms 0.1885 ms 12.367 ms 2.20 0.15
RefCounting_PInvoke CoreRt 3.0 6.837 ms 0.4437 ms 1.2803 ms 6.317 ms 1.00 0.00
RefCounting_Marshal CoreRt 3.0 7.315 ms 0.3562 ms 1.0106 ms 6.831 ms 1.10 0.23

For some reason BenchmarkDotNet did not pick up my local Mono installation.

@lostmsu
Copy link
Member
lostmsu commented Nov 22, 2019

After some tweaking this is what I got (Calli through AdvancedDLSupport):

Method Runtime Mean Error StdDev Median Ratio RatioSD
RefCounting_PInvoke .NET 4.8 9.622 ms 0.2937 ms 0.8475 ms 9.390 ms 1.00 0.00
RefCounting_Marshal .NET 4.8 9.990 ms 0.1981 ms 0.5218 ms 9.844 ms 1.04 0.10
RefCounting_Calli .NET 4.8 9.095 ms 0.2344 ms 0.6837 ms 9.008 ms 0.95 0.11
RefCounting_PInvoke .NET Core 3.0 5.792 ms 0.1279 ms 0.3648 ms 5.698 ms 1.00 0.00
RefCounting_Marshal .NET Core 3.0 11.578 ms 0.3002 ms 0.2948 ms 11.523 ms 2.00 0.13
RefCounting_Calli .NET Core 3.0 7.464 ms 0.1471 ms 0.2537 ms 7.458 ms 1.31 0.07
RefCounting_PInvoke CoreRt 3.0 6.348 ms 0.1827 ms 0.5359 ms 6.270 ms 1.00 0.00
RefCounting_Marshal CoreRt 3.0 7.036 ms 0.2673 ms 0.7363 ms 6.820 ms 1.12 0.14
RefCounting_Calli CoreRt 3.0 NA NA NA NA ? ?

@filmor
Copy link
Member Author
filmor commented Nov 22, 2019

Thank you very much for running these :)

What is the difference between CoreRt 3.0 and .NET Core 3.0 here?

/edit: nvm, it's clear now :)

@sekkit
Copy link
sekkit commented Nov 23, 2019

This might solve the problem of Unity AOT compatibility issue

@lostmsu
Copy link
Member
lostmsu commented Nov 24, 2019

@sekkit which issue are you referring to?

BTW, I prototyped and benchmarked a simple PInvoke generator called EmitBinder, that uses Reflection.Emit (for AOT, Marshal.GetDelegateForFunctionPointer seem to be a necessary fallback). Added it to the benchmark:

Method Runtime Mean Error StdDev Median Ratio RatioSD
RefCounting_PInvoke .NET 4.8 10.157 ms 0.3036 ms 0.8759 ms 9.977 ms 1.00 0.00
RefCounting_Marshal .NET 4.8 10.026 ms 0.2758 ms 0.8046 ms 9.906 ms 0.99 0.11
RefCounting_Calli .NET 4.8 9.562 ms 0.2809 ms 0.8013 ms 9.319 ms 0.95 0.11
RefCounting_EmitBinder .NET 4.8 11.460 ms 0.3938 ms 1.1612 ms 11.379 ms 1.14 0.13
RefCounting_PInvoke .NET Core 3.0 6.486 ms 0.1296 ms 0.3781 ms 6.455 ms 1.00 0.00
RefCounting_Marshal .NET Core 3.0 13.229 ms 0.2612 ms 0.6553 ms 13.154 ms 2.04 0.14
RefCounting_Calli .NET Core 3.0 8.777 ms 0.2472 ms 0.7288 ms 8.592 ms 1.36 0.14
RefCounting_EmitBinder .NET Core 3.0 7.168 ms 0.1483 ms 0.4374 ms 7.177 ms 1.11 0.10
RefCounting_PInvoke CoreRt 3.0 7.009 ms 0.2007 ms 0.5853 ms 6.853 ms 1.00 0.00
RefCounting_Marshal CoreRt 3.0 7.617 ms 0.3072 ms 0.9011 ms 7.422 ms 1.10 0.16
RefCounting_Calli CoreRt 3.0 NA NA NA NA ? ?
RefCounting_EmitBinder CoreRt 3.0 NA NA NA NA ? ?

@lostmsu
Copy link
Member
lostmsu commented Nov 24, 2019

Finally, tested on Linux x64 (this one is in VM, so results might be inaccurate):

Method Runtime Mean Error StdDev Ratio RatioSD
RefCounting_PInvoke .NET Core 3.0 13.396 ms 0.2672 ms 0.5274 ms 1.00 0.00
RefCounting_Marshal .NET Core 3.0 27.201 ms 0.5417 ms 1.3886 ms 2.04 0.14
RefCounting_Calli .NET Core 3.0 18.262 ms 0.3977 ms 1.1663 ms 1.38 0.10
RefCounting_EmitBinder .NET Core 3.0 14.398 ms 0.2820 ms 0.4391 ms 1.09 0.06
RefCounting_PInvoke Mono 4.776 ms 0.1375 ms 0.4010 ms 1.00 0.00
RefCounting_Marshal Mono 5.869 ms 0.1945 ms 0.5736 ms 1.24 0.18
RefCounting_Calli Mono 4.021 ms 0.1356 ms 0.3997 ms 0.85 0.12
RefCounting_EmitBinder Mono 5.851 ms 0.1605 ms 0.4732 ms 1.24 0.16

@lostmsu
Copy link
Member
lostmsu commented Nov 24, 2019

Actually, I think the way to go forward is pretty straightforward:

  1. Create a separate package like in the prototype, that exposes more or less the same interface as P/Invoke methods from the current runtime.cs
  2. Begin with Marshal.GetDelegateForFunctionPointer implementation, because it works on all platforms (the only one, that works with CoreRT), has no extra dependencies in .NET Standard 2.0, and is only really slow on .NET Core 3.0 (what a surprise!)
  3. If we have resources for better performance make special implementations with EmitBinder and/or Calli. The later does not have to use AdvancedDLSupport package, as implementation would be very similar to EmitBinder, so probably under 100 lines of code.

@filmor
Copy link
Member Author
filmor commented Nov 24, 2019

Sounds like a good plan, but I think if it's not too much work, we should have all of this directly in the main repository (at least after a bit of experimentation).

I suggest we start a modernisation branch that is supposed to be 2.5 in the end (or 3.0 if we break backwards-compatibility too much) where we do only the following:

  • Drop clrmodule and monoclr
  • Drop console
  • Drop non-.NET Standard build
  • Restructure the repository into a more standard C# style (projects Python.Runtime, Python.Runtime.Interop, embed and perf tests, helper library for python tests, all of these under a common Python.Test namespace)
  • Separate the (few) python only parts into a pythonnet directory

That way we can do major changes to the architecture without dropping all of the current PRs. As long as we don't change Python.Runtime itself significantly, most of them can probably be reapplied manually for 2.5.

@filmor
Copy link
Member Author
filmor commented Nov 27, 2019

I started this now, are you okay with me merging your work into the branch?

@sekkit
Copy link
sekkit commented Nov 28, 2019

@sekkit which issue are you referring to?

BTW, I prototyped and benchmarked a simple PInvoke generator called EmitBinder, that uses Reflection.Emit (for AOT, Marshal.GetDelegateForFunctionPointer seem to be a necessary fallback). Added it to the benchmark:

Method Runtime Mean Error StdDev Median Ratio RatioSD
RefCounting_PInvoke .NET 4.8 10.157 ms 0.3036 ms 0.8759 ms 9.977 ms 1.00 0.00
RefCounting_Marshal .NET 4.8 10.026 ms 0.2758 ms 0.8046 ms 9.906 ms 0.99 0.11
RefCounting_Calli .NET 4.8 9.562 ms 0.2809 ms 0.8013 ms 9.319 ms 0.95 0.11
RefCounting_EmitBinder .NET 4.8 11.460 ms 0.3938 ms 1.1612 ms 11.379 ms 1.14 0.13
RefCounting_PInvoke .NET Core 3.0 6.486 ms 0.1296 ms 0.3781 ms 6.455 ms 1.00 0.00
RefCounting_Marshal .NET Core 3.0 13.229 ms 0.2612 ms 0.6553 ms 13.154 ms 2.04 0.14
RefCounting_Calli .NET Core 3.0 8.777 ms 0.2472 ms 0.7288 ms 8.592 ms 1.36 0.14
RefCounting_EmitBinder .NET Core 3.0 7.168 ms 0.1483 ms 0.4374 ms 7.177 ms 1.11 0.10
RefCounting_PInvoke CoreRt 3.0 7.009 ms 0.2007 ms 0.5853 ms 6.853 ms 1.00 0.00
RefCounting_Marshal CoreRt 3.0 7.617 ms 0.3072 ms 0.9011 ms 7.422 ms 1.10 0.16
RefCounting_Calli CoreRt 3.0 NA NA NA NA ? ?
RefCounting_EmitBinder CoreRt 3.0 NA NA NA NA ? ?

#997
@filmor Would u consider removing Reflection.Emit in Modernisation branch?

@lostmsu
Copy link
Member
lostmsu commented Nov 28, 2019

@sekkit would Unity AOT allow choosing Emit vs non-Emit implementation at runtime? The former is required for better performance.

@sekkit
Copy link
sekkit commented Nov 28, 2019

@sekkit would Unity AOT allow choosing Emit vs non-Emit implementation at runtime? The former is required for better performance.

Reflectoin.Emit is not allowed. The above fork I posted: https://github.com/joonhwan/pythonnet is a AOT compatible implementation. No reflection overhead( fast-call branch)

@lostmsu
Copy link
Member
lostmsu commented Nov 28, 2019

@sekkit it still relies on DllImport, which we want to stop depending on.

@filmor
Copy link
Member Author
filmor commented Nov 28, 2019

Well, we can still depend on it for Mono and .NET Core as both implement the __Internal DllImport target. It's just .NET Framework that requires a different approach.

@lostmsu
Copy link
Member
lostmsu commented Nov 28, 2019

@filmor I started this now, are you okay with me merging your work into the branch?

Sure, go ahead.
But which work are you referring to?

@lostmsu
Copy link
Member
lostmsu commented Nov 28, 2019
8000

.NET Core as both implement the __Internal DllImport

Wait, on Windows too?

@filmor
Copy link
Member Author
filmor commented Nov 28, 2019

Ah, sorry, I again fell for CoreRt vs CoreClr :/

Nevertheless, Mono and CoreRt support __Internal and CoreClr has at least an open issue on this (https://github.com/dotnet/coreclr/issues/9038) and https://github.com/dotnet/corefx/issues/32015 which (as far as I understand) allows us to just hook into the library loading mechanism to make __Internal point at the Python DLL of our liking.

@filmor filmor mentioned this issue Dec 13, 2020
4 tasks
@filmor
Copy link
Member Author
filmor commented Feb 4, 2021

Since the unmanaged delegate changes are now merged, this one is obsolete.

@filmor filmor closed this as completed Feb 4, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants
0