-
-
Notifications
You must be signed in to change notification settings - Fork 7.9k
Optimize extensions with LTO and hidden visibility #17003
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
18fcbe7
to
0fc338f
Compare
Interesting. If we have to bump to newer versions of manylinux that is probably OK to do. Now that we have switched to depending on pillow, I suspect that the manylinux docker images will "just work" for us. |
I checked in a |
0fc338f
to
60ea731
Compare
This is the first time I have heard of "LTO", so I googled it. Apparently it doubles compilation time, correct? And makes huge .o files? But in the end, the .so files are smaller? I'm not opposed, just curious. What is the "macOS thing" that needs to be fixed? |
It definitely increases compile time, though I never timed it to see how much. Object files seem to end up smaller, actually. The macOS thing is fixed; the clang wrapper just outputs versions in a different case, so I added a |
We should document that |
Do we expect this to make things faster or just smaller (both are good)? It might also be interesting to try using PGO (based on running the test suite and building the docs?) to see if we can get any speed ups (but that is clearly out of scope here). |
Theoretically, it could be faster, as it does optimization at the end as a group and can decide to inline things or throw away other indirections. However, I don't see any real change in the benchmarks we have right now. Possibly because most of our extensions are nearly-single files anyway. |
cppflags.append(os.environ['CPPFLAGS']) | ||
cxxflags = [] | ||
if 'CXXFLAGS' in os.environ: | ||
cxxflags.append(os.environ['CXXFLAGS']) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AFAICT distutils does not read this; is it passed to anyone?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's returned in the new environment, so it can be applied to external things, like FreeType, or qhull.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess I was confused, but perhaps you can document this a bit better? even just naming the thing e.g. get_env_for_custom_build()
would be enough, I guess
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is all in BuildExtraLibraries
, so that's a little bit redundant.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a docstring, but also note that extra_compile_args
is used for internal extensions. The environment variables are for the vendored code.
When compiling with -fvisibility=hidden, a non-explicit initialization seems to produce an implicit constructor, or at least one that is not available in older libstdc++. When mixing a newer native compiler with an older libstdc++ (such as in a conda environment), this causes the library to fail to load from missing symbols.
Note that using `__attribute__((visibility("default")))` on individual functions is the recommended way to expose things (e.g., for module _init), but the way `PyMODINIT_FUNC` is defined does not allow that to be used. Instead, use `#pragma`s to toggle the visibility setting.
This saves a little bit more in _contour.
We don't need shared FreeType, so save a bit of compile/link time by not creating it.
And rename to `add_optimization_flags`.
c70ea82
to
fa5059d
Compare
This PR makes a few assumptions that are not true.
|
No, the assumption is that if Now possibly, if we set |
It was enough to fix the conda-forge build to: export AR = $GCC_AR
export RANLIB = $GCC_RANLIB |
PR Summary
I've been trying this for a while, but only just figured out the right incantation to get both settings working together. The main reason for hidden visibility is to fix #7537 (I hope; I don't have a way to reproduce), but I threw LTO in to make some nice space savings. That being said, LTO took some time to get into a working state, and it's possible that the
manylinux1
images won't be able to do it. So this will need some extra testing to ensure it's actually usable.This table shows the effect on stripped binary size. The first row is the original size on master, and the subsequent rows show the size change for each commit. However, note that some of the visibility changes only really had an effect on the LTO build, so they appear as 0. Overall, this would save 17% on shared libraries size, maybe more with newer gcc.
PR Checklist