-
-
Notifications
You must be signed in to change notification settings - Fork 7.9k
Consistently use realpaths to build XObject names #15629
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Before this comment, the PDF backend used two inconsistent ways to build the name for XObjects required for Unicode glyphs above code point 255. The first method is based on the file name of the font returned by `font_manager.findfont()`. This method is used to refer to the XObject glyphs when drawing text. When the XObjects are created inside `writeFonts()` the realpath of the font returned by `cbook.get_realpath_and_stat()` is used instead. The realpath differs from the file name of the font if the font is a symbolic link. In this case, the produced PDF contains invalid references to XObjects. This commit adds a safe-guard in `PdfFile._get_xobject_symbol_name()`. The returned XObject names are always based on the realpath, even if the supplied path is a link. This achieves consistent XObject names and prevents invalid PDF files.
This does fix it, but do you know if all callers of |
The method
Since the |
While the fix seems reasonable, do we really need to resolve font realpaths? Can't we just use fonts under whatever path they are found by findfont(), without ever resolving them? Note that for a given request, findfont() should not ever return the same font under two different (symlinked) paths, so the only case we can end up with the same font embedded twice is if the user specifies two symlinks to the same font by explicit paths -- which should be rare enough, and just results in a file with the same font embedded under two different names, i.e. just a missed optimization on the file size. |
Looking at the history, two symlinks pointing to the same file was exactly why this link-resolving code was added in the first place: c8ae4e2. Has |
|
Actually I just realized that I already implemented my suggestion as #15320; @sauerburger can you check whether that PR fixes your issue? |
Well, it came in around the same time as subsetting itself, so I assume the point is to make the subset as minimal as possible |
I simply don't believe the same font file being accessed from two different paths occurs frequently enough (or at all...) to make the path canonicalization worthwhile. One can for example check that luatex/xetex (for which an important selling point is indeed better font support than pdftex...) don't bother with this optimization: compiling \documentclass{article}
\usepackage{fontspec}
\begin{document}
\setmainfont{DejaVuSans.ttf} Hello
\setmainfont{times.ttf} world!
\setmainfont{altdejavu.ttf} And hello
\setmainfont{times.ttf} again!
\end{document} (where DejaVuSans.ttf and times.ttf are in the current directory, and altdejavu.ttf is either a hardlink or a symlink to DejaVuSans.ttf) with lualatex/xelatex and inspecting the resulting pdf using |
I've tested the implementation of #15320. There seems to be another issue. The PDF contains errors, and now, no character is shown correctly. example.pdf, CI run. Correct me if I'm wrong, #15320 tracks fonts by their full path (without resolving links) and uses the filename (basename) of the font to build XObject names. Doesn't this lead to problems if two fonts have the same filename, such as I can only think of two truely unique identifiers for the fonts:
What do you think about using stub font names for XObjects such as |
Fair point re duplicate basenames; I guess using e.g.
8000
|
Closing this PR in favor of #15686. |
PR Summary
This PR addresses an issue in the PDF backend which lead to invalid PDFs. The PDF backend used two inconsistent ways to build the name for XObjects required for Unicode glyphs above code point 255.
writeFonts()
the realpath of the font returned bycbook.get_realpath_and_stat()
is used instead.The realpath differs from the file name of the font if the font is a symbolic link. In this case, the produced PDF contains invalid references to XObjects.
This commit adds a safe-guard in
PdfFile._get_xobject_symbol_name()
. The returned XObject names are always based on the realpath, even if the supplied path is a link. This achieves consistent XObject names and prevents invalid PDF files.This PR resolves #15628.
PR Checklist