8000 [Doc] alphabetize mathtext symbols by unicode by story645 · Pull Request #26142 · matplotlib/matplotlib · GitHub
[go: up one dir, main page]

Skip to content

[Doc] alphabetize mathtext symbols by unicode #26142

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jun 23, 2023

Conversation

story645
Copy link
Member

Using render_symbol as the sorting key works well enough for Hebrew, I just flipped it 'cause Hebrew is right to left. It also seems to work pretty OK for the Greek?

8000
@story645 story645 linked an issue Jun 18, 2023 that may be closed by this pull request
@story645 story645 changed the title alphabetize greek and hebrew, closes #26140 alphabetize greek and hebrew in math text docs Jun 18, 2023
@story645 story645 changed the title alphabetize greek and hebrew in math text docs [Doc] alphabetize greek and hebrew in mathtext docs Jun 18, 2023
@story645 story645 force-pushed the mathtext_hebrew branch 2 times, most recently from 86df882 to d83a448 Compare June 18, 2023 16:10
@oscargus
Copy link
Member
oscargus commented Jun 18, 2023

I'm wondering if it may be better to not sort a list, but only sort a set? In that way one can use lists if we want to determine the order and sets when we want to sort. (The info in _mathtext.py is stored as sets.)

(Another reason is that I think it may make sense to actually render using mathtext rather than just inserting the Unicode symbol as text. Right now some arrows are quite ugly looking as Unicode and the font is a bit inconsistent between the examples. This will also be a way of "testing" the rendering so that it looks OK.)

@story645
Copy link
Member Author
story645 commented Jun 18, 2023

I'm wondering if it may be better to not sort a list, but only sort a set? In that way one can use lists if we want to determine the order and sets when we want to sort. (The info in _mathtext.py is stored as sets.)

Not sure what you mean, can you give an example or push to the PR?

Eta: What I mean is not sure where you want me to make this change: in the loop or in the listing

@tacaswell tacaswell added this to the v3.8.0 milestone Jun 18, 2023
@tacaswell
Copy link
Member

I am confused by @oscargus 's comment...

@oscargus
Copy link
Member

Yes, I can see that it is pretty confusing.

  1. Why do we sort the lists when we can just provide them correctly sorted?

  2. Thinking a bit further, the data in _mathtext.py is stored as sets.

Hence: do not sort lists, but sort sets. Convert the current lists that should be sorted to sets.

@oscargus
Copy link
Member

What I mean is not sure where you want me to make this change: in the loop or in the listing

Make the change that the list is correctly sorted and that the loop does not sort lists, only sets.

if isinstance(syms, set):
    syms = sorted(list(syms))

should do the trick. Possibly combined with turning the current lists that should be sorted to sets.

r"""\Delta \Gamma \Lambda \Omega \Phi \Pi \Psi \Sigma \Theta
\Upsilon \Xi \mho \nabla""".split()],
["Hebrew",
6,
r"""\aleph \beth \daleth \gimel""".split()],
r"""\aleph \beth \gimel \daleth""".split()],
["Delimiters",
6,
_mathtext.Parser._delims],
Copy link
Member
@oscargus oscargus Jun 18, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is the context (in case you've not seen it). We're trying to get rid of the explicit lists here that are already defined in _mathtext.py. In that way the documentation will always be "correct".

_mathtext.Parser._delims is a set, so should be sorted.

Copy link
Member Author
@story645 story645 Jun 18, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can change it back since the actual sort happens in the loop, but here I'm sorting it this way because in Hebrew it's \aleph \beth \gimel \daleth

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The sorting is OK. I was pointing out line 22 which gets the delimiters from _mathtext rather than explicitly specifying them.

My suggestion is to have the correct order and NOT sort in the loop. (And I have no idea about the correct order anyway, so I fully trust you there.)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I made changes here to move the stuff that wasn't greek letters out into misc, taken from #26145, but your point about not knowing the correct order is exactly why I'd rather defer to the Unicode standard for sorting when possible.

@story645
Copy link
Member Author

I'm not keen on manually sorting Greek or any of the longer lists if we don't have to cause the list is long enough that it's too easy to make a mistake when editing. I'm cool with turning stuff into sets and reorganizing where and how the table values are stored, but I think it's out of scope for this PR.

@oscargus
Copy link
Member

So the current idea is to change the sorting of Greek to something that is not correct?

I will not block, but not really clear why special casing that will be removed in the next commit is introduced?

@oscargus oscargus mentioned this pull request Jun 18, 2023
5 tasks
@story645
Copy link
Member Author
story645 commented Jun 18, 2023

So the current idea is to change the sorting of Greek to something that is not correct?

It's to change the sorting of the Greek and and Hebrew to be based on the Greek and Hebrew alphabets rather than on the latex. Which part of the Greek is incorrect?

If \mho and \nabla are in the wrong place, I can move them.

@story645
Copy link
Member Author

So since I may have poorly communicated what I'm doing and why:

The point of this PR is to special case Hebrew and Greek since those are alphabets with their own alphabetization. I'm using render_symbol as the key because these alphabet characters are sorted in Unicode. If I've accidentally left letters that don't belong, please let me know.

@story645
Copy link
Member Author
story645 commented Jun 19, 2023

I will not block, but not really clear why special casing that will be removed in the next commit is introduced?

I'm very confused what you mean by this.

ETA: if this is about #26145, I'm gonna insist there that the special casing be left in b/c I think it's way more robust to have the sorting be done automatically than to have to always remember that the alphabets get sorted by alphabet and not by the latex.

@oscargus
Copy link
Member

My point is that can we be sure that render_symbol gives a correct ordering? I was under the impression that

It also seems to work pretty OK for the Greek?

which I interpreted as "not really correct, but decent" (but didn't check the outcome of it).

Hence, my suggestion to do manual ordering and not sort on anything for those cases (and then a solution based on using sets vs lists to distinguish if something should be sorted or already was sorted). I thought it was easier to just do it in #26145 for reference since I had problems conveying my thoughts.

I guess the main question is if we should rely on manual or automatic sorting when we want sorting other than alphabetical on LaTeX-keyword.

@story645
Copy link
Member Author
story645 commented Jun 19, 2023

My point is that can we be sure that render_symbol gives a correct ordering

I think so, it's essentially using the unicode values for sorting and those are in alphabetical order. What I meant by "OK" was that the sorting was correct up to the not quite Greek characters Anthony mentioned in the issue - it mostly put the variants next to the originals. But honestly, I think I'm just gonna write a little Greek_sorter function to put them in the right place.

Eta: I think something like lambda s: render_symbol(s) if s.startswith(r"\var") else render_symbol(s.replace(r"var", r"") works but I'll check/push when I have working internet again.

I guess the main question is if we should rely on manual or automatic sorting when we want sorting other than alphabetical on LaTeX-keyword.

The reason I'd push for automatic is that I think it'd be safer to maybe change where and how the data is stored if folks don't also have to keep track of how it's sorted. For example, I changed the language string into a set of strings 'cause honestly the giant string makes me tetchy and I'm curious why we don't have a table of the mathtext values. I think if it's something we think is important enough to be sorted than we should code that up.

@story645
Copy link
Member Author
story645 commented Jun 20, 2023

Got lowercase sorting clean. Also pulled render_symbol out into it's own function 'cause that made it easier to mess around with when I was figuring out the sort key -> does it have to be defined inside run? put it back

ETA: this has changed enough that I figure the original approval doesn't count anymore?

@story645 story645 requested a review from tacaswell June 20, 2023 08:13
@story645 story645 force-pushed the mathtext_hebrew branch 3 times, most recently from eadf019 to 9b53070 Compare June 20, 2023 08:49
@tacaswell
Copy link
Member

I think that #26145 is a much simpler approach. We control the data input so we can just fix it rather than trying to sort things after the fact.

@story645
Copy link
Member Author
story645 commented Jun 20, 2023

I think that #26145 is a much simpler approach.

I don't think it's a simpler approach just cause it's being done manually. I think it's just the difference between using the manual sorting table of checking against a rendered latex table vs. an automatic sort using a key that does the same thing. We already sort the entries to the table, this is just sorting some of those entries in Greek or Hebrew. The ord stuff is just a trick to always put the \var character after it's neighbor. But either way, having the sorting means we don't have to worry about making sure enteries are sorted if that code is touched or moved around-for example if the data tables get moved to a different file.

Eta: I guess where I'm coming from is that this is only run on doc builds and it's not really an expensive operation, but if I was messing around with the symbol tables I'd have to remember to manually make sure I hadn't broken the sorting.

@story645
Copy link
Member Author

I'm gonna restate my comment from #26145 (comment) as to the advantage of this approach: printing the table is the only place where it's required that these values be sorted, so putting the sort in the print leaves the value ordering unconstrained elsewhere. Keeping the constraint in the place where it's needed long term is easier to maintain cause you don't really have to think about it - run sorts it's lists how it needs rather then the lists needing to be a sorted a specific way for run.

@oscargus
Copy link
Member

leaves the value ordering unconstrained elsewhere

The thing is that the Greek and Hebrew characters are only listed for printing. If they were used elsewhere, then, sure!

I've checked the outcome of this now and I think the sorting here is more than "pretty OK", so I'd be just as happy to go with this. I was primarily worried about the sorting not being as correct as it could. (Maybe the var-change did that, maybe not. That could need some code comment though.)

Also #26145 (comment) indicates that manual sorting is tricky... (But also that it is not a major effort to correct it.)

r"""\Delta \Gamma \Lambda \Omega \Phi \Pi \Psi \Sigma \Theta
\Upsilon \Xi \mho \nabla""".split()],
4,
{r"\Delta", r"\Gamma", r"\Lambda", r"\Omega", r"\Phi", r"\Pi", r"\Psi",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure that I would have encouraged this type of change for consistency, but when it is here...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Also, this require automatic sorting, which is maybe not a step in the right direction, despite improving it substantially here.)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can change it back, it was mostly that keeping it in a string confused me.

Copy link
Member Author
@story645 story645 Jun 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also is how I parsed your comment about strings/sets

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, anything I have said regarding that should ideally have been "do not turn it into a set so that it can be manually sorted", but anyway.

I think that set is probably not the best thing here. It would admittedly remove duplicates, but that is about it. All other benefits (like fast member checking) should not really be a factor. Strictly, I guess a Tuple may be the best for these "static" ones. Not that it matters, but it is sometimes a bit sensitive to sub-optimize these things, so may just as well change to the best if is it to be changed. I'd be happy with the old form as well though.

@story645 story645 force-pushed the mathtext_hebrew branch 2 times, most recently from c0bfbe7 to b45e907 Compare June 21, 2023 16:05
@story645
Copy link
Member Author

The thing is that the Greek and Hebrew characters are only listed for printing. If they were used elsewhere, then, sure!

I was thinking of how lot of the other values are being pulled from mathtext.parser and wasn't sure if down the line more.consolidation was gonna happen.

Copy link
Member
@QuLogic QuLogic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest simplifying to remove the special casing and sorting all symbols, because those orders are somewhat better.

For example, in 'Big symbols', we have: coprod, iiint -> int, oiiint -> oint, prod, and sum, i.e., the products are split by integrals and the integrals are in reverse order. But with sorting by rendered glyph, we get: prod, coprod, sum, int -> iiint, oint -> oiiint, i.e. two products together, a discrete sum, an integration (the limit of a sum) up to multidimensional contour integrals

@QuLogic
Copy link
Member
QuLogic commented Jun 22, 2023

I suggest simplifying to remove the special casing and sorting all symbols, because those orders are somewhat better.

By this, I mean something like:

--- doc/sphinxext/math_symbol_table.py  2023-06-21 19:09:38.542507168 -0400
+++ math_symbol_table.py        2023-06-22 04:14:35.748802184 -0400
@@ -94,8 +94,10 @@
 ]


 def run(state_machine):
-    def render_symbol(sym):
+    def render_symbol(sym, ignore_variant=False):
+        if ignore_variant and sym != r"\varnothing":
+            sym = sym.replace(r"\var", "\\")
         if sym.startswith("\\"):
             sym = sym[1:]
             if sym not in (_mathtext.Parser._overunder_functions |
@@ -105,7 +107,11 @@

     lines = []
     for category, columns, syms in symbols:
-        syms = sorted(list(syms))
+        syms = sorted(syms,
+                      # Sort by Unicode and place variants immediately after standard versions.
+                      key=lambda sym: (render_symbol(sym, ignore_variant=True),
+                                       sym.startswith(r"\var")),
+                      reverse=(category == "Hebrew"))  # Hebrew is rtl
         columns = min(columns, len(syms))
         lines.append("**%s**" % category)
         lines.append('')

@oscargus
Copy link
Member

I was thinking of how lot of the other values are being pulled from mathtext.parser and wasn't sure if down the line more.consolidation was gonna happen.

I cannot really see it happen for those (as they are not "special" from a parsing perspective), but indeed there will be consolidations for all the other ones.

(The fact that they wouldn't be consolidated was part of my reasoning with lists, only here, vs sets, from the parser.)

Co-authored-by: Oscar Gustafsson <oscar.gustafsson@gmail.com>
Co-authored-by: Elliott Sales de Andrade <quantum.analyst@gmail.com>
@story645 story645 changed the title [Doc] alphabetize greek and hebrew in mathtext docs [Doc] alphabetize mathtext symbols by unicode Jun 23, 2023
@oscargus
Copy link
Member

I suggest simplifying to remove the special casing and sorting all symbols, because those orders are somewhat better.

For example, in 'Big symbols', we have: coprod, iiint -> int, oiiint -> oint, prod, and sum, i.e., the products are split by integrals and the integrals are in reverse order. But with sorting by rendered glyph, we get: prod, coprod, sum, int -> iiint, oint -> oiiint, i.e. two products together, a discrete sum, an integration (the limit of a sum) up to multidimensional contour integrals

While it make sense for e.g. Big, it will make it very unintuitive for e.g. operators, where you will get a huge list of seemingly unsorted symbols where there is no simple way to quickly see if we support a given one.

@QuLogic
Copy link
Member
QuLogic commented Jun 23, 2023

I'm not sure I agree. If you know the name, then a Ctrl-F search would be easiest. If you just know what the symbol looks like, I'm not sure there is any useful order for searching those. If you only have a vague idea of the name, then this sorting puts relevant pairs like ≺ \prec and ≻ \succ or ⊢ \vdash, ⊣ \dashv, ⊤ \top, and ⊥ \bot together, when they wouldn't for simple sorting. Admittedly, all of the lt/gt variants are not strictly together though.

@oscargus
Copy link
Member

Fair point. I still find it a bit "random", although I clearly see the grouping as well. Let's try it.

@QuLogic QuLogic merged commit b858e37 into matplotlib:main Jun 23, 2023
@story645 story645 deleted the mathtext_hebrew branch June 25, 2023 02:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Doc]: Sort greek/hebrew letters in math docs alphabetically
4 participants
0