8000 Simplify and unify character tracking in pdf and ps backends (with linked fonts) by sauerburger · Pull Request #15686 · matplotlib/matplotlib · GitHub
[go: up one dir, main page]

Skip to content

Simplify and unify character tracking in pdf and ps backends (with linked fonts) #15686

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Nov 24, 2019

Conversation

sauerburger
Copy link
Member
@sauerburger sauerburger commented Nov 12, 2019

PR Summary

This PR builds on the implementation of #15320 but also considers the issues from #15629.

This PR resolves #15628.

The code produces a valid PDF with unicode and regular characters if linked fonts are used:
example.pdf CI run

PR Checklist

  • Has Pytest style unit tests
  • Code is Flake 8 compliant
  • New features are documented, with examples if plot related
  • Documentation is sphinx and numpydoc compliant
  • Added an entry to doc/users/next_whats_new/ if major new feature (follow instructions in README.rst there)
  • Documented in doc/api/api_changes.rst if API changed in a backward-incompatible way

anntzer and others added 2 commits November 9, 2019 17:09
Instead of trying to resolve font paths to absolute files and key off by
inode(!), just track fonts using whatever names they use, and simplify
used_characters to be a straight mapping of filenames to character ids
(making the attribute private -- with a backcompat shim) at the same
time).

The previous approach would avoid embedding the same file twice if it is
given under two different filenames (hardlinks to the same file...), but
it would fail if the user passes a relative path, chdir()s to another
directory, and passes another different font with the same filename,
because of the lru_cache().  None of these seem likely to happen in
practice, and in any case we can cover most of it by making the font
paths absolute before passing them to FreeType (which is going to open
the file anyways, so the cost of making them absolute doesn't matter).
@anntzer
Copy link
Contributor
anntzer commented Nov 12, 2019

Why remove the resolve() in font_manager? If two fonts are resolving to the same path, it's still good to be able to collapse them, no?

@sauerburger
Copy link
Member Author
sauerburger commented Nov 12, 2019

In the fontNames dict, the fonts are stored as they are returned by findfont(). This is the unresolved path. These names are used when the font is embedded.

In draw_text(), first _get_font_ttf() is called and thenfont_manager.get_font(), which resolves links. So, when text is drawn, it uses the resolved names.

I thought the idea of #15320 was to ignore that fact that links might resolve to the same file, and embed the font twice in this case.

If we want to collapse these fonts, we need to do this consistently when tracking the used characters and fonts.

@anntzer
Copy link
Contributor
anntzer commented Nov 12, 2019

Good point, but the idea was also to still get some of the dedupe by resolving early. I guess you can put a resolve() call (or even do a full dedupe based on st_inode (and st_dev if you want to take care of the multiple-mounted-filesystems case...)) on the output of findSystemFonts to get most of this benefits?

@sauerburger sauerburger force-pushed the unrealpath branch 3 times, most recently from b8e93dc to 5e030c8 Compare November 16, 2019 13:32
@sauerburger
Copy link
Member Author
sauerburger commented Nov 16, 2019

I've move the os.path.realpath() call to fontManager.findfont(). I cannot move it to an earlier stage, e.g. addfont() or findSystemFonts(), since this would break the directory checks done in findfont().

With this solution we get the following benefits:

  • Consistent usage of realpaths to make sure linked fonts are embedded and referenced properly.
  • Resolving symlinks to prevent duplicated embedded fonts.

The optimization is not sensitive to hard links, but I think this was already sacrificed in #15320.

My test setup with linked fonts produces valid PDFs when using this implementation:

example.pdf, CI run.
example (PDF converted to PNG)

@sauerburger sauerburger force-pushed the unrealpath branch 2 times, most recently from fe2b338 to adb8412 Compare November 16, 2019 14:23
8000
The commit ensures that the realpath of linked fonts is used across the library.
This prevents embedding linked fonts twice and makes sure that linked fonts are
referenced consistently in PDF files. The link resolution cannot happend in
addfont() since this would break the *directory* argument of findfont().
Copy link
Contributor
@anntzer anntzer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

approving, but I have some commits in there as well.

@anntzer
Copy link
Contributor
anntzer commented Nov 20, 2019

@timhoffm You already accepted #15320, any chance you can have a look at this as well?

@timhoffm timhoffm merged commit 6f4f19e into matplotlib:master Nov 24, 2019
@timhoffm timhoffm added this to the v3.3.0 milestone Nov 24, 2019
@timhoffm
Copy link
Member

Thanks and congratulations on your first contribution to Matplotlib! Hope to see you back some time.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Invalid unicode characters in PDF when font is a symlink
3 participants
0