ENH add fontname argument in export_graphviz for non-English characters #18959
ENH add fontname argument in export_graphviz for non-English characters #18959ogrisel merged 17 commits intoscikit-learn:masterfrom Zeroto521:graphviz-font
Conversation
There was a problem hiding this comment.
Before reviewing, could you revert all cosmetic changes linked to style issues? We intend to not add any of these changes and only focus on the code that need real changes because it makes it difficult to review otherwise.
.gitignore
Outdated
| sklearn/linear_model/_sag_fast.pyx | ||
|
|
||
| # history codes | ||
| .history |
|
Please add an entry to the change log at |
fine, I will do that later. But when I open the |
Yes this is possible. But we only check on the diff so you can ignore those. |
|
I reset the branch and reopen pr this again |
There was a problem hiding this comment.
Since, fontname is created in _BaseTreeExporter, it could be shared with plot_tree (using matplotlib) and export_graphviz (using graphviz).
Could you update the plot_tree signature as well and pass the parameter to the underlying tree exported. It would also required to update the docstring of plot_tree and to change the tests.
doc/whats_new/v0.24.rst
Outdated
| - |Enhancement| :func:`tree.plot_tree` now uses colors from the matplotlib | ||
| configuration settings. :pr:`17187` by `Andreas Müller`_. | ||
|
|
||
| - |Fix| :func:`tree.export_graphviz` |
There was a problem hiding this comment.
Could you add the entry in v1.0.rst (you might need to merge master into your branch first). We already on our way to release 0.24 and we will probably be a bit too late to get this PR in.
There was a problem hiding this comment.
The v1.0.rst file is not in my branch. The v1.0.rst was committed to master two days ago.
There was a problem hiding this comment.
Yes you need to merge master into your branch to synchronize with master
doc/whats_new/v0.24.rst
Outdated
| configuration settings. :pr:`17187` by `Andreas Müller`_. | ||
|
|
||
| - |Fix| :func:`tree.export_graphviz` | ||
| Add fontname argument in plot_tree for non-English characters. |
There was a problem hiding this comment.
| Add fontname argument in plot_tree for non-English characters. | |
| Add `fontname` argument in :func:`tree.export_graphviz` for non-English characters. |
My test script as follows: import graphviz
from sklearn import tree
from sklearn.datasets import load_iris
clf = tree.DecisionTreeClassifier()
iris = load_iris()
names = ['中文', 'にほんご', '조선말', 'ру́сский язы́к']
clf = clf.fit(iris.data, iris.target)
dot_data = tree.export_graphviz(
clf, feature_names=names,
filled=True, leaves_parallel=True, node_ids=True,
rounded=True, special_characters=True, fontname='sans')
graph = graphviz.Source(dot_data)
graph.render("tree")As for |
|
Using A related issue, matplotlib/matplotlib#15062 plt.rcParams['font.family'] = [fontname]
plt.rcParams['font.sans-serif'] = [fontname] |
doc/whats_new/v0.24.rst
Outdated
| - |Enhancement| :func:`tree.plot_tree` now uses colors from the matplotlib | ||
| configuration settings. :pr:`17187` by `Andreas Müller`_. | 4D9A||
|
|
||
| - |Fix| Add `fontname` argument in :func:`tree.export_graphviz` and |
There was a problem hiding this comment.
You can remove this entry now.
|
Regarding your test scrip in #18959 (comment), I think that we should include it and check the content as for the other test. It will allow us to make sure that we overwrite the fontname. |
|
I think that for from sklearn.datasets import load_iris
from sklearn import tree
clf = tree.DecisionTreeClassifier(random_state=0)
iris = load_iris()
clf = clf.fit(iris.data, iris.target)
tree_mpl = tree.plot_tree(clf)Then, we can check for annotation in tree_mpl:
assert x.get_family() == ["fontname"]I did not tested it but it should something around that line. |
For this, there are four different kinds of languages. I couldn't find a font to render them all. |
What I meant is to check the |
When I tested it, I found So the And my testing codes as follow: import matplotlib.pyplot as plt
from sklearn import tree
from sklearn.datasets import load_iris
fontname = 'Microsoft YaHei'
plt.rcParams['font.family'] = [fontname]
plt.rcParams['font.sans-serif'] = [fontname]
iris = load_iris()
clf = tree.DecisionTreeClassifier(random_state=0)
clf = clf.fit(iris.data, iris.target)
names = ['中文', 'にほんご', '조선말', 'ру́сский язы́к']
plt.figure()
tree.plot_tree(
clf, feature_names=names,
filled=True, node_ids=True,
rounded=True)
plt.show() |
doc/whats_new/v1.0.rst
Outdated
| ................... | ||
|
|
||
| - |Fix| Add `fontname` argument in :func:`tree.export_graphviz` and | ||
| :func:`tree.plot_tree` for non-English characters. |
There was a problem hiding this comment.
So we should remove plot_tree then?
sklearn/tree/_export.py
Outdated
| precision=3, ax=None, fontsize=None): | ||
| impurity=True, node_ids=False, proportion=False, | ||
| rotate='deprecated', rounded=False, precision=3, | ||
| ax=None, fontsize=None, fontname='helvetica'): |
There was a problem hiding this comment.
And we should revert the changes here as well for plot_tree
This seems to need to update keywords to fix failing checks. |
There was a problem hiding this comment.
LGTM, thanks. We will need a second review.
|
@thomasjpfan @rth @jnothman Do you want to have a look since you looked at the former PR. |
There was a problem hiding this comment.
Thank you for the PR @Zeroto521 !
There was a problem hiding this comment.
This LGTM
@glemaitre The most recent commits makes it so that fontname changes the font regardness of rounded.
Reference Issues/PRs
See also #14290, #14275
What does this implement/fix? Explain your changes.
Add an arg called
fontnamereplace using the freezing Helvetica font.Any other comments?
Well, this bug is very mad at me.
I have to handle it by changing the
dotfile font name and then using cmddot -Tpdf tree -o dot_data.pdfagain and again.So just do it.