Characters doesn't display correctly when figure saved as pdf with a custom font · Issue #12636 · matplotlib/matplotlib · GitHub

Characters doesn't display correctly when figure saved as pdf with a custom font #12636

Closed
gepcel opened this issue Oct 26, 2018 · 13 comments · Fixed by #18651

Comments

@gepcel
Contributor
gepcel commented Oct 26, 2018

Problem description:

When I use a custom font, save the figure as pdf, and then open the pdf with Adobe Reader (or many other pdf readers; SumatraPDF is an exception), some characters don't display correctly.

I use the Windows 10 operating system. The font is for the Chinese language and is already installed in C:\windows\fonts.

Check the font

from matplotlib.font_manager import fontManager
for f in fontManager.ttflist:
    if 'Source Han Serif' in f.name:
        print(f.name, ' : ', f.fname)

Output:

Source Han Serif CN  :  C:\Windows\Fonts\SourceHanSerifCN-Regular.otf
Source Han Serif CN  :  C:\Windows\Fonts\SourceHanSerifCN-Bold.otf

Generate the figure

Code (a minimal, complete example)

import matplotlib.pyplot as plt

# Draw two Chinese characters with the custom OTF font.
plt.text(0.5, 0.5, '图像', ha='center', va='center',
         fontdict={'family': 'Source Han Serif CN', 'size': 18})
# Save the same figure in both formats to compare the output.
plt.savefig('figure_with_font.pdf')
plt.savefig('figure_with_font.png')

There will be a warning message:

'SourceHanSerifCN-Regular.otf' cannot be subsetted into a Type 3 font. The entire font will be embedded in the output.

The output in the notebook and the png file both display correctly:
[image: figure_with_font]

But when I open figure_with_font.pdf with Adobe Reader, it displays like this:
[image]

When I check the document properties, it seems the font is already embedded in the pdf:
[image]

I've tried some other readers; only Sumatra PDF works.

I'm using:

  1. Windows 10
  2. matplotlib: 3.0.0

And the output pdf file:

figure_with_font.pdf

Questions:

  • Can matplotlib save a figure to pdf without embedding the font? When saving thousands of figures, this would help reduce the file size.
  • Why doesn't the pdf display correctly?
@anntzer
Contributor
anntzer commented Oct 26, 2018

Indeed, there is currently no support for subsetting otf fonts, i.e. including only the glyphs that are needed (we do this for ttf; for large fonts such as cjk this can mean megabytes of data).
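The otf/ttf distinction above can be checked from the font file itself rather than from its extension. The following is a minimal sketch, assuming that the relevant difference is between CFF-outline OpenType files (which start with the tag `OTTO`) and TrueType-outline files (which start with `0x00010000` or `true`). The helper name `sfnt_flavor` is hypothetical and is not part of matplotlib.

```python
def sfnt_flavor(path):
    """Return a coarse label for a font file based on its first four bytes.

    Hypothetical helper for illustration only; not a matplotlib API.
    """
    with open(path, "rb") as fh:
        tag = fh.read(4)
    if tag == b"OTTO":
        # OpenType font with CFF (PostScript-style) outlines.
        return "opentype-cff"
    if tag in (b"\x00\x01\x00\x00", b"true"):
        # TrueType outlines (the second tag is the legacy Apple variant).
        return "truetype"
    if tag == b"ttcf":
        # Font collection (.ttc / .otc) holding several faces.
        return "collection"
    return "unknown"
```

For example, `sfnt_flavor(r"C:\Windows\Fonts\SourceHanSerifCN-Regular.otf")` could be used to confirm which kind of outlines the font from the report carries.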

More curious is the bad rendering by certain pdf readers: I get correct rendering with chromium and zathura, and incorrect rendering (with different glyphs) with acroread and okular. I think this specific part qualifies as a bug (subsetting is more of a feature request, desirable but likely to require a bit of work...).

As a workaround, you can use the cairo-based backends (either the builtin one -- which suffers from some other issues, e.g. you don't actually have much control over the font used --; or https://github.com/anntzer/mplcairo, which works better) which will properly subset the font (cairo has code for that) and generate pdfs that all readers I've tested render correctly.

@gepcel
Contributor Author
gepcel commented Oct 26, 2018

I've tried mplcairo and the builtin cairo backend, and got 4 kinds of output. I restarted the Jupyter notebook kernel before each example.

1st example (cairo.pdf)

With the builtin cairo backend:

import matplotlib as mpl
mpl.use("Cairo")

import matplotlib.pyplot as plt
plt.text(0.5, 0.5, "图像")
plt.savefig('d:/cairo.pdf')

2nd (default.pdf)

With the default backend:

import matplotlib as mpl
import matplotlib.pyplot as plt
plt.text(0.5, 0.5, "图像")
plt.savefig('d:/default.pdf')
print(mpl.get_backend())

# Output: 'module://ipykernel.pylab.backend_inline'

3rd (default_inline.pdf)

With the default backend and inline mode:

%matplotlib inline
import matplotlib as mpl
import matplotlib.pyplot as plt
plt.text(0.5, 0.5, "图像")
plt.savefig('d:/default_inline.pdf')
print(mpl.get_backend())

# Output: module://ipykernel.pylab.backend_inline

4th (mplcairo.pdf)

import matplotlib as mpl
mpl.use("module://mplcairo.qt")

import matplotlib.pyplot as plt
plt.text(0.5, 0.5, "图像")
plt.savefig('d:/mplcairo.pdf')

This raised the following error:

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-1-63ae148ad48a> in <module>()
      4 import matplotlib.pyplot as plt
      5 plt.text(0.5, 0.5, "图像")
----> 6 plt.savefig('d:/mplcairo.pdf')

C:\anaconda3\lib\site-packages\matplotlib\pyplot.py in savefig(*args, **kwargs)
    686 def savefig(*args, **kwargs):
    687     fig = gcf()
--> 688     res = fig.savefig(*args, **kwargs)
    689     fig.canvas.draw_idle()   # need this if 'transparent=True' to reset colors
    690     return res

C:\anaconda3\lib\site-packages\matplotlib\figure.py in savefig(self, fname, frameon, transparent, **kwargs)
   2095             self.set_frameon(frameon)
   2096 
-> 2097         self.canvas.print_figure(fname, **kwargs)
   2098 
   2099         if frameon:

C:\anaconda3\lib\site-packages\matplotlib\backend_bases.py in print_figure(self, filename, dpi, facecolor, edgecolor, orientation, format, bbox_inches, **kwargs)
   2073                     orientation=orientation,
   2074                     bbox_inches_restore=_bbox_inches_restore,
-> 2075                     **kwargs)
   2076             finally:
   2077                 if bbox_inches and restore_bbox:

C:\anaconda3\lib\site-packages\mplcairo\base.py in _print_method(self, renderer_factory, path_or_stream, metadata, dpi, facecolor, edgecolor, orientation, dryrun, bbox_inches_restore)
    251         with cbook.open_file_cm(path_or_stream, "wb") as stream:
    252             renderer = renderer_factory(
--> 253                 stream, self.figure.bbox.width, self.figure.bbox.height, dpi)
    254             renderer._set_metadata(metadata)
    255             with _LOCK:

C:\anaconda3\lib\site-packages\mplcairo\base.py in _for_fmt_output(cls, fmt, stream, width, height, dpi)
     85         args = fmt, stream, width, height, dpi
     86         obj = _mplcairo.GraphicsContextRendererCairo.__new__(cls, *args)
---> 87         _mplcairo.GraphicsContextRendererCairo.__init__(obj, *args)
     88         return obj
     89 

RuntimeError: cairo was built without support for the requested file format

Comparing the 3 successful outputs:

  • cairo.pdf: characters display correctly; file size 48.6KB (font subsetted correctly, I assume). 25.4×20.32 cm.
  • default.pdf: characters display incorrectly; file size 9.1MB. 25.4×20.32 cm.
  • default_inline.pdf: characters display incorrectly; file size 9.1MB. 15.238×10.16 cm. The figsize and font size differ from the previous 2 files.
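The page sizes in this list are easier to compare with a figsize once converted to inches. A small sketch of that arithmetic follows; the helper name `cm_to_inches` is only for illustration, and the numbers are the ones reported above.

```python
CM_PER_INCH = 2.54

def cm_to_inches(width_cm, height_cm):
    """Convert a page size in centimetres to inches, rounded to 2 decimals."""
    return (round(width_cm / CM_PER_INCH, 2), round(height_cm / CM_PER_INCH, 2))

# Page sizes as reported for the three pdf files.
reported_cm = {
    "cairo.pdf": (25.4, 20.32),
    "default.pdf": (25.4, 20.32),
    "default_inline.pdf": (15.238, 10.16),
}

for name, size_cm in reported_cm.items():
    print(name, cm_to_inches(*size_cm))
```

So the first two files are 10×8 inch pages and the inline one is about 6×4 inches.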

So:

  1. Since I almost always use inline mode, I assumed its figsize and font size were the correct ones. But why do the figsize and font size differ?
  2. Am I missing anything with the mplcairo backend? I can live with the builtin cairo backend as long as I can sort out the figsize and font size problem.

@anntzer
Contributor
anntzer commented Oct 26, 2018

I realized there's a bug on mplcairo+windows, I'll look into fixing it (some symbol loading not yet implemented...); I only had it working on Linux/OSX so far.
For the builtin cairo backend, fixing the figsize may be possible (it's likely "just" a bug, few people are using it -- adding a backend/cairo label to the issue), fixing the font selection is going to be very very hard (because pycairo only provides access to cairo's "toy" font API).

@anntzer
Contributor
anntzer commented Oct 29, 2018

I uploaded a new wheel for mplcairo (v0.1.post28) at https://github.com/anntzer/mplcairo/releases/tag/nightly which you can download and install with pip.
For reasons I don't understand, pdf output is still not working on Windows (you get an empty pdf); however, you should be able to generate postscript (ps) output with Chinese characters and then use a ps-to-pdf converter (e.g. https://www.ghostscript.com/doc/current/Ps2pdf.htm) to get a pdf.
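The ps-to-pdf step could be scripted along these lines. This is only a sketch: it assumes Ghostscript's `ps2pdf` wrapper is installed and on `PATH` (on Windows it may be a `.bat` file located elsewhere), and the function names are hypothetical.

```python
import shutil
import subprocess

def ps2pdf_command(ps_path, pdf_path, executable="ps2pdf"):
    """Build the argument list for 'ps2pdf input.ps output.pdf'."""
    return [executable, ps_path, pdf_path]

def convert_ps_to_pdf(ps_path, pdf_path, executable="ps2pdf"):
    """Run ps2pdf on ps_path, writing pdf_path.

    Assumes Ghostscript's ps2pdf wrapper can be found on PATH.
    """
    resolved = shutil.which(executable)
    if resolved is None:
        raise FileNotFoundError(f"{executable!r} was not found on PATH")
    # Use the resolved path so wrappers such as ps2pdf.bat are located too.
    subprocess.run(ps2pdf_command(ps_path, pdf_path, resolved), check=True)
```

A call such as `convert_ps_to_pdf("figure.ps", "figure.pdf")` would then follow `plt.savefig("figure.ps")`; the file names here are placeholders.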
Let me know if this works in your hands (I know it's not optimal, but hopefully still better than nothing...).

@gepcel
Contributor Author
gepcel commented Oct 29, 2018

@anntzer Thanks for the nightly wheel. Besides the cp36-linux and cp37-win, any chance to upload a cp36-win-64? Sorry for the inconvenience.

One more thing: regarding the difference in figsize I posted earlier, it seems that giving figsize and dpi explicitly, like fig = plt.figure(figsize=(6,4)); plt.savefig('..', dpi=100), generates pdfs with the same size.

@anntzer
Contributor
anntzer commented Oct 30, 2018

> any chance to upload a cp36-win-64? Sorry for the inconvenience.

Done.

@gepcel
Contributor Author
gepcel commented Oct 30, 2018

It works. I can save it to .ps and then convert it to .pdf. Thank you.

@anntzer
Contributor
anntzer commented Oct 31, 2018

By the way, can you confirm that direct pdf output also fails for you?

@gepcel
Contributor Author
gepcel commented Oct 31, 2018

I tried the same code as the 4th example I posted earlier. No error message was given, the Qt window showed the correct plot, and an mplcairo.pdf was generated. But mplcairo.pdf is only 509 bytes and cannot be opened as a pdf.

I opened mplcairo.pdf with a text editor; the content was:

%PDF-1.5
%µí®û
1 0 0 -1 0 345.6 cm
1 0 obj
<< /Type /Pages
   /Kids [ 2 0 R ]
   /Count 1
>>
endobj
6 0 obj
<< /Producer (cairo 1.15.12 (http://cairographics.org))
   /CreationDate (D:20181031204628+08'00)
>>
endobj
7 0 obj
<< /Type /Catalog
   /Pages 1 0 R
>>
endobj
xref
0 8
0000000000 65535 f 
0000000035 00000 n 
0000000015 00000 n 
0000000015 00000 n 
0000000015 00000 n 
0000000015 00000 n 
0000000100 00000 n 
0000000216 00000 n 
trailer
<< /Size 8
   /Root 7 0 R
   /Info 6 0 R
>>
startxref
268
%%EOF
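The dump above declares a /Pages node whose /Kids entry points at object 2, but no page object is present. A rough check for that symptom might look like the sketch below. It is a heuristic with a hypothetical name (`looks_like_pdf_with_pages`), not a real PDF parser, and it can miss page objects that a writer stores inside compressed object streams.

```python
import re

# Matches "/Type /Page" but not "/Type /Pages".
_PAGE_OBJECT = re.compile(rb"/Type\s*/Page(?![A-Za-z])")

def looks_like_pdf_with_pages(data: bytes) -> bool:
    """Heuristically report whether raw pdf bytes contain a page object."""
    if not data.startswith(b"%PDF-"):
        return False
    return _PAGE_OBJECT.search(data) is not None
```

Reading the file with `open("mplcairo.pdf", "rb").read()` and passing the bytes to this function would flag an output like the one shown here.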

@anntzer
Contributor
anntzer commented Oct 31, 2018

Indeed, the issue seems to come from the specific cairo build I'm using (https://preshing.com/20170529/heres-a-standalone-cairo-dll-for-windows/#IDComment1047546463). Happy to get help on that if anyone can look into it...

@anntzer
Contributor
anntzer commented Nov 2, 2018

Actually I found a (hackish) way to get a better cairo.dll that should render pdf correctly, can you try with the new wheels at https://github.com/anntzer/mplcairo/releases/tag/nightly?

@gepcel
Contributor Author
gepcel commented Nov 3, 2018

It works. Thanks very much.

Can I use both mplcairo and %matplotlib inline mode in the notebook at the same time?

@anntzer
Contributor
anntzer commented Nov 3, 2018

You can set the environment variable MPLCAIRO_PATCH_AGG=1 before starting the notebook, as mentioned in https://github.com/anntzer/mplcairo#use. Again, not a really great solution but better than nothing...
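One way to set the variable before the notebook starts is to launch the notebook server from a small script. The sketch below assumes the server is started with the `jupyter notebook` command, which may differ in a given setup; the function names are hypothetical.

```python
import os
import subprocess

def notebook_env(base_env=None):
    """Return a copy of the environment with MPLCAIRO_PATCH_AGG set to 1."""
    env = dict(os.environ if base_env is None else base_env)
    env["MPLCAIRO_PATCH_AGG"] = "1"
    return env

def launch_notebook():
    """Start the notebook server with the patched environment.

    Assumes a 'jupyter' executable is available on PATH.
    """
    subprocess.run(["jupyter", "notebook"], env=notebook_env(), check=True)
```

Calling `launch_notebook()` from a plain Python session starts the server with the variable already set, so kernels started from it inherit it.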
