8000 Optimize extensions with LTO and hidden visibility by QuLogic · Pull Request #17003 · matplotlib/matplotlib · GitHub
[go: up one dir, main page]

Skip to content

Optimize extensions with LTO and hidden visibility #17003

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
May 23, 2020

Conversation

QuLogic
Copy link
Member
@QuLogic QuLogic commented Apr 2, 2020

PR Summary

I've been trying this for a while, but only just figured out the right incantation to get both settings working together. The main reason for hidden visibility is to fix #7537 (I hope; I don't have a way to reproduce), but I threw LTO in to make some nice space savings. That being said, LTO took some time to get into a working state, and it's possible that the manylinux1 images won't be able to do it. So this will need some extra testing to ensure it's actually usable.

This table shows the effect on stripped binary size. The first row is the original size on master, and the subsequent rows show the size change for each commit. However, note that some of the visibility changes only really had an effect on the LTO build, so they appear as 0. Overall, this would save 17% on shared libraries size, maybe more with newer gcc.

backend_agg contour image path qhull tkagg tri ft2font ttconv total
0 373480.0 85760.0 278440.0 205464.0 395096.0 31696.0 107120.0 913072.0 71360.0 2461488.0
1 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
2 -25632.0 -4608.0 -25312.0 -17152.0 9376.0 -32.0 -17408.0 -38432.0 -8512.0 -127712.0
3 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
4 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
5 -69712.0 -28928.0 -69808.0 -33024.0 -28768.0 -16664.0 -20688.0 -24944.0 1840.0 -290696.0
The savings for _unstripped_ binaries are even bigger at ~54%, but unless we start shipping debug nightly builds or something, it's probably not as impactful. It might be nice for binary distros that ship debug symbols though.

PR Checklist

  • Has Pytest style unit tests
  • Code is Flake 8 compliant
  • [N/A] New features are documented, with examples if plot related
  • [N/A] Documentation is sphinx and numpydoc compliant
  • [N/A] Added an entry to doc/users/next_whats_new/ if major new feature (follow instructions in README.rst there)
  • [N/A] Documented in doc/api/api_changes.rst if API changed in a backward-incompatible way

@QuLogic QuLogic changed the title Optmize extensions with LTO and hidden visibility Optimize extensions with LTO and hidden visibility Apr 2, 2020
@QuLogic QuLogic force-pushed the hidden-visibility branch 6 times, most recently from 18fcbe7 to 0fc338f Compare April 8, 2020 03:24
@tacaswell
Copy link
Member

Interesting. If we have to bump to newer versions of manylinux that is probably OK to do.

Now that we have switched to depending on pillow, I suspect that the manylinux docker images will "just work" for us.

@QuLogic
Copy link
Member Author
QuLogic commented Apr 8, 2020

I checked in a manylinux1 container and it was fine. I just managed to get it to work on CI, so now we just need to fix the macOS thing.

@QuLogic QuLogic force-pushed the hidden-visibility branch from 0fc338f to 60ea731 Compare April 8, 2020 21:48
@QuLogic QuLogic marked this pull request as ready for review April 9, 2020 02:09
@QuLogic QuLogic added this to the v3.3.0 milestone Apr 13, 2020
@efiring
Copy link
Member
efiring commented May 8, 2020

This is the first time I have heard of "LTO", so I googled it. Apparently it doubles compilation time, correct? And makes huge .o files? But in the end, the .so files are smaller? I'm not opposed, just curious.

What is the "macOS thing" that needs to be fixed?

@QuLogic
Copy link
Member Author
QuLogic commented May 8, 2020

It definitely increases compile time, though I never timed it to see how much. Object files seem to end up smaller, actually.

The macOS thing is fixed; the clang wrapper just outputs versions in a different case, so I added a .lower() to fix it.

@tacaswell
Copy link
Member

It definitely increases compile time

We should document that ccache exists someplace...

@tacaswell
Copy link
Member

Do we expect this to make things faster or just smaller (both are good)?

It might also be interesting to try using PGO (based on running the test suite and building the docs?) to see if we can get any speed ups (but that is clearly out of scope here).

@QuLogic
Copy link
Member Author
QuLogic commented May 9, 2020

Theoretically, it could be faster, as it does optimization at the end as a group and can decide to inline things or throw away other indirections. However, I don't see any real change in the benchmarks we have right now. Possibly because most of our extensions are nearly-single files anyway.

cppflags.append(os.environ['CPPFLAGS'])
cxxflags = []
if 'CXXFLAGS' in os.environ:
cxxflags.append(os.environ['CXXFLAGS'])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAICT distutils does not read this; is it passed to anyone?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's returned in the new environment, so it can be applied to external things, like FreeType, or qhull.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess I was confused, but perhaps you can document this a bit better? even just naming the thing e.g. get_env_for_custom_build() would be enough, I guess

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is all in BuildExtraLibraries, so that's a little bit redundant.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a docstring, but also note that extra_compile_args is used for internal extensions. The environment variables are for the vendored code.

QuLogic added 6 commits May 23, 2020 01:50
When compiling with -fvisibility=hidden, a non-explicit initialization
seems to produce an implicit constructor, or at least one that is not
available in older libstdc++. When mixing a newer native compiler with
an older libstdc++ (such as in a conda environment), this causes the
library to fail to load from missing symbols.
Note that using `__attribute__((visibility("default")))` on individual
functions is the recommended way to expose things (e.g., for module
_init), but the way `PyMODINIT_FUNC` is defined does not allow that to
be used. Instead, use `#pragma`s to toggle the visibility setting.
This saves a little bit more in _contour.
We don't need shared FreeType, so save a bit of compile/link time by not
creating it.
And rename to `add_optimization_flags`.
@QuLogic QuLogic force-pushed the hidden-visibility branch from c70ea82 to fa5059d Compare May 23, 2020 05:51
@anntzer anntzer merged commit 786d18a into matplotlib:master May 23, 2020
@anntzer anntzer added the Build label May 23, 2020
@QuLogic QuLogic deleted the hidden-visibility branch May 23, 2020 08:34
@isuruf
Copy link
Contributor
isuruf commented Jul 22, 2020

This PR makes a few assumptions that are not true.

  1. If RANLIB env variable is set it is gcc-ranlib
  2. If RANLIB env variable is set, AR env variable is set
  3. If AR env variable is set, it is gcc-ar.

@QuLogic
Copy link
Member Author
QuLogic commented Jul 22, 2020
  • If RANLIB env variable is set it is gcc-ranlib
  • If RANLIB env variable is set, AR env variable is set
  • If AR env variable is set, it is gcc-ar.

No, the assumption is that if RANLIB env var is set, then it's set by someone who know better, and shouldn't be overridden. There's no way for us to figure out what really should be used instead if it's in some custom spot/name.

Now possibly, if we set RANLIB, we should also set AR, but again, not if they're already set.

@dopplershift
Copy link
Contributor

It was enough to fix the conda-forge build to:

export AR = $GCC_AR
export RANLIB = $GCC_RANLIB

6DAF

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Conflict between different AGG static libraries in a same binary
6 participants
0