10000 ENH add fontname argument in export_graphviz for non-English characters by Zeroto521 · Pull Request #18959 · scikit-learn/scikit-learn · GitHub
[go: up one dir, main page]

Skip to content

ENH add fontname argument in export_graphviz for non-English characters #18959

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 17 commits into from
Jan 10, 2021
Merged

ENH add fontname argument in export_graphviz for non-English characters #18959

merged 17 commits into from
Jan 10, 2021

Conversation

Zeroto521
Copy link
Contributor
@Zeroto521 Zeroto521 commented Dec 3, 2020

Reference Issues/PRs

See also #14290, #14275

What does this implement/fix? Explain your changes.

Add an arg called fontname replace using the freezing Helvetica font.

Any other comments?

Well, this bug is very mad at me.
I have to handle it by changing the dot file font name and then using cmd dot -Tpdf tree -o dot_data.pdf again and again.

So just do it.

Copy link
Member
@glemaitre glemaitre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before reviewing, could you revert all cosmetic changes linked to style issues? We intend to not add any of these changes and only focus on the code that need real changes because it makes it difficult to review otherwise.

.gitignore Outdated
@@ -79,3 +79,6 @@ _configtest.o.d
sklearn/utils/_seq_dataset.pyx
sklearn/utils/_seq_dataset.pxd
sklearn/linear_model/_sag_fast.pyx

# history codes
.history
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you revert this part

@glemaitre
Copy link
Member

Please add an entry to the change log at doc/whats_new/v*.rst. Like the other entries there, please reference this pull request with :pr: and credit yourself (and other contributors if applicable) with :user:.

@Zeroto521
Copy link
Contributor Author

Before reviewing, could you revert all cosmetic changes linked to style issues? We intend to not add any of these changes and only focus on the code that need real changes because it makes it difficult to review otherwise.

fine, I will do that later.

But when I open the _export.py, all lint errors come up to the editor.
And the screen is full of red or yellow lines.

@glemaitre glemaitre changed the title Fix graphviz font rendering problem FIX add fontname argument in plot_tree for non-English characters Dec 16, 2020
@glemaitre
Copy link
Member

But when I open the _export.py, all lint errors come up to the editor.
And the screen is full of red or yellow lines.

Yes this is possible. But we only check on the diff so you can ignore those.
This will get better when #18948 will be merged and that we all agree to use black.

@Zeroto521 Zeroto521 closed this Dec 17, 2020
@Zeroto521
Copy link
Contributor Author

I reset the branch and reopen pr this again

@Zeroto521 Zeroto521 reopened this Dec 17, 2020
@Zeroto521 Zeroto521 requested a review from glemaitre December 17, 2020 03:23
Copy link
Member
@glemaitre glemaitre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since, fontname is created in _BaseTreeExporter, it could be shared with plot_tree (using matplotlib) and export_graphviz (using graphviz).

Could you update the plot_tree signature as well and pass the parameter to the underlying tree exported. It would also required to update the docstring of plot_tree and to change the tests.

@@ -785,6 +785,10 @@ Changelog
- |Enhancement| :func:`tree.plot_tree` now uses colors from the matplotlib
configuration settings. :pr:`17187` by `Andreas Müller`_.

- |Fix| :func:`tree.export_graphviz`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add the entry in v1.0.rst (you might need to merge master into your branch first). We already on our way to release 0.24 and we will probably be a bit too late to get this PR in.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The v1.0.rst file is not in my branch. The v1.0.rst was committed to master two days ago.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes you need to merge master into your branch to synchronize with master

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fine

@@ -785,6 +785,10 @@ Changelog
- |Enhancement| :func:`tree.plot_tree` now uses colors from the matplotlib
configuration settings. :pr:`17187` by `Andreas Müller`_.

- |Fix| :func:`tree.export_graphviz`
Add fontname argument in plot_tree for non-English characters.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Add fontname argument in plot_tree for non-English characters.
Add `fontname` argument in :func:`tree.export_graphviz` for non-English characters.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@Zeroto521 Zeroto521 requested a review from glemaitre December 17, 2020 09:45
@Zeroto521
Copy link
Contributor Author

Since, fontname is created in _BaseTreeExporter, it could be shared with plot_tree (using matplotlib) and export_graphviz (using graphviz).

Could you update the plot_tree signature as well and pass the parameter to the underlying tree exported. It would also required to update the docstring of plot_tree and to change the tests.

My test script as follows:

import graphviz

from sklearn import tree
from sklearn.datasets import load_iris

clf = tree.DecisionTreeClassifier()
iris = load_iris()

names = ['中文', 'にほんご', '조선말', 'ру́сский язы́к']

clf = clf.fit(iris.data, iris.target)
dot_data = tree.export_graphviz(
    clf, feature_names=names,
    filled=True, leaves_parallel=True, node_ids=True,
    rounded=True, special_characters=True, fontname='sans')
graph = graphviz.Source(dot_data)
graph.render("tree")

As for plot_tree function, I may need to find another example to test it.
By the way, I almost never use plot_tree.

@Zeroto521
Copy link
Contributor Author
Zeroto521 commented Dec 18, 2020

Using matplotlib to plot the tree.
There also needs to add the following codes to make matplotlib know use what font to render.

A related issue, matplotlib/matplotlib#15062

plt.rcParams['font.family'] = [fontname]
plt.rcParams['font.sans-serif'] = [fontname]

@@ -792,6 +792,10 @@ Changelog
- |Enhancement| :func:`tree.plot_tree` now uses colors from the matplotlib
configuration settings. :pr:`17187` by `Andreas Müller`_.

- |Fix| Add `fontname` argument in :func:`tree.export_graphviz` and
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can remove this entry now.

@glemaitre
Copy link
Member

Regarding your test scrip in #18959 (comment), I think that we should include it and check the content as for the other test. It will allow us to make sure that we overwrite the fontname.

@glemaitre
Copy link
Member

I think that for plot_tree, we can come with a minimal example such

from sklearn.datasets import load_iris
from sklearn import tree
clf = tree.DecisionTreeClassifier(random_state=0)
iris = load_iris()
clf = clf.fit(iris.data, iris.target)
tree_mpl = tree.plot_tree(clf)

Then, we can check

for annotation in tree_mpl:
    assert x.get_family() == ["fontname"]

I did not tested it but it should something around that line.

@Zeroto521
Copy link
Contributor Author
Zeroto521 commented Dec 18, 2020

Regarding your test scrip in #18959 (comment), I think that we should include it and check the content as for the other test. It will allow us to make sure that we overwrite the fontname.

For this, there are four different kinds of languages. I couldn't find a font to render them all.
And the result is a picture or pdf. To check the result maybe not well.

@glemaitre
Copy link
Member

For this, there are four different kinds of languages. I couldn't find a font to render them all.
And the result is a picture or pdf. To check the result maybe not well.

What I meant is to check the fontname=xxx in the content of the dot file to check that we overwrite the default helvetica.

@Zeroto521
Copy link
Contributor Author
Zeroto521 commented Dec 23, 2020

I think that for plot_tree, we can come with a minimal example such

from sklearn.datasets import load_iris
from sklearn import tree
clf = tree.DecisionTreeClassifier(random_state=0)
iris = load_iris()
clf = clf.fit(iris.data, iris.target)
tree_mpl = tree.plot_tree(clf)

Then, we can check

for annotation in tree_mpl:
    assert x.get_family() == ["fontname"]

I did not tested it but it should something around that line.

When I tested it, I found matplotlib render font via plt.rcParams['font.family'] and plt.rcParams['font.sans-serif'].

So the fontname in tree.plot_tree no use. We should delete fontname for tree.plot_tree.

And my testing codes as follow:

import matplotlib.pyplot as plt
from sklearn import tree
from sklearn.datasets import load_iris

fontname = 'Microsoft YaHei'
plt.rcParams['font.family'] = [fontname]
plt.rcParams['font.sans-serif'] = [fontname]

iris = load_iris()
clf = tree.DecisionTreeClassifier(random_state=0)
clf = clf.fit(iris.data, iris.target)

names = ['中文', 'にほんご', '조선말', 'ру́сский язы́к']

plt.figure()
tree.plot_tree(
    clf, feature_names=names,
    filled=True, node_ids=True,
    rounded=True)
plt.show()

@Zeroto521 Zeroto521 requested a review from glemaitre December 23, 2020 02:18
...................

- |Fix| Add `fontname` argument in :func:`tree.export_graphviz` and
:func:`tree.plot_tree` for non-English characters.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So we should remove plot_tree then?

precision=3, ax=None, fontsize=None):
impurity=True, node_ids=False, proportion=False,
rotate='deprecated', rounded=False, precision=3,
ax=None, fontsize=None, fontname='helvetica'):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And we should revert the changes here as well for plot_tree

@Zeroto521 Zeroto521 changed the title FIX add fontname argument in plot_tree for non-English characters FIX add fontname argument in export_graphviz for non-English characters Jan 5, 2021
@Zeroto521
Copy link
Contributor Author

DeprecationWarning: 'wminkowski' metric is deprecated and will be removed in SciPy 1.8.0, use 'minkowski' instead.

This seems to need to update keywords to fix failing checks.

@Zeroto521 Zeroto521 requested a review from glemaitre January 5, 2021 02:21
Copy link
Member
@glemaitre glemaitre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks. We will need a second review.

@glemaitre
Copy link
Member

@thomasjpfan @rth @jnothman Do you want to have a look since you looked at the former PR.

Copy link
Member
@thomasjpfan thomasjpfan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the PR @Zeroto521 !

@Zeroto521 Zeroto521 requested a review from thomasjpfan January 6, 2021 02:10
Copy link
Member
@thomasjpfan thomasjpfan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This LGTM

@glemaitre The most recent commits makes it so that fontname changes the font regardness of rounded.

@thomasjpfan thomasjpfan changed the title FIX add fontname argument in export_graphviz for non-English characters ENH add fontname argument in export_graphviz for non-English characters Jan 6, 2021
Copy link
Member
@ogrisel ogrisel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM as well. Thanks!

@ogrisel ogrisel merged commit 34de1b9 into scikit-learn:master Jan 10, 2021
@glemaitre glemaitre mentioned this pull request Apr 22, 2021
12 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants
0