[Doc] alphabetize mathtext symbols by unicode #26142

story645 · 2023-06-18T07:07:18Z

Using render_symbol as the sorting key works well enough for Hebrew, I just flipped it 'cause Hebrew is right to left. It also seems to work pretty OK for the Greek?

oscargus · 2023-06-18T16:16:07Z

I'm wondering if it may be better to not sort a list, but only sort a set? In that way one can use lists if we want to determine the order and sets when we want to sort. (The info in _mathtext.py is stored as sets.)

(Another reason is that I think it may make sense to actually render using mathtext rather than just inserting the Unicode symbol as text. Right now some arrows are quite ugly looking as Unicode and the font is a bit inconsistent between the examples. This will also be a way of "testing" the rendering so that it looks OK.)

story645 · 2023-06-18T18:04:05Z

I'm wondering if it may be better to not sort a list, but only sort a set? In that way one can use lists if we want to determine the order and sets when we want to sort. (The info in _mathtext.py is stored as sets.)

Not sure what you mean, can you give an example or push to the PR?

Eta: What I mean is not sure where you want me to make this change: in the loop or in the listing

tacaswell · 2023-06-18T18:43:13Z

I am confused by @oscargus 's comment...

oscargus · 2023-06-18T18:58:21Z

Yes, I can see that it is pretty confusing.

Why do we sort the lists when we can just provide them correctly sorted?
Thinking a bit further, the data in _mathtext.py is stored as sets.

Hence: do not sort lists, but sort sets. Convert the current lists that should be sorted to sets.

oscargus · 2023-06-18T19:01:05Z

What I mean is not sure where you want me to make this change: in the loop or in the listing

Make the change that the list is correctly sorted and that the loop does not sort lists, only sets.

if isinstance(syms, set):
    syms = sorted(list(syms))

should do the trick. Possibly combined with turning the current lists that should be sorted to sets.

Here is the context (in case you've not seen it). We're trying to get rid of the explicit lists here that are already defined in _mathtext.py. In that way the documentation will always be "correct".

_mathtext.Parser._delims is a set, so should be sorted.

I can change it back since the actual sort happens in the loop, but here I'm sorting it this way because in Hebrew it's \aleph \beth \gimel \daleth

The sorting is OK. I was pointing out line 22 which gets the delimiters from _mathtext rather than explicitly specifying them.

My suggestion is to have the correct order and NOT sort in the loop. (And I have no idea about the correct order anyway, so I fully trust you there.)

So I made changes here to move the stuff that wasn't greek letters out into misc, taken from #26145, but your point about not knowing the correct order is exactly why I'd rather defer to the Unicode standard for sorting when possible.

story645 · 2023-06-18T19:27:17Z

I'm not keen on manually sorting Greek or any of the longer lists if we don't have to cause the list is long enough that it's too easy to make a mistake when editing. I'm cool with turning stuff into sets and reorganizing where and how the table values are stored, but I think it's out of scope for this PR.

oscargus · 2023-06-18T19:31:04Z

So the current idea is to change the sorting of Greek to something that is not correct?

I will not block, but not really clear why special casing that will be removed in the next commit is introduced?

story645 · 2023-06-18T19:55:55Z

So the current idea is to change the sorting of Greek to something that is not correct?

It's to change the sorting of the Greek and and Hebrew to be based on the Greek and Hebrew alphabets rather than on the latex. Which part of the Greek is incorrect?

If \mho and \nabla are in the wrong place, I can move them.

story645 · 2023-06-18T20:10:14Z

So since I may have poorly communicated what I'm doing and why:

The point of this PR is to special case Hebrew and Greek since those are alphabets with their own alphabetization. I'm using render_symbol as the key because these alphabet characters are sorted in Unicode. If I've accidentally left letters that don't belong, please let me know.

story645 · 2023-06-19T02:09:02Z

I will not block, but not really clear why special casing that will be removed in the next commit is introduced?

I'm very confused what you mean by this.

ETA: if this is about #26145, I'm gonna insist there that the special casing be left in b/c I think it's way more robust to have the sorting be done automatically than to have to always remember that the alphabets get sorted by alphabet and not by the latex.

oscargus · 2023-06-19T09:00:07Z

My point is that can we be sure that render_symbol gives a correct ordering? I was under the impression that

It also seems to work pretty OK for the Greek?

which I interpreted as "not really correct, but decent" (but didn't check the outcome of it).

Hence, my suggestion to do manual ordering and not sort on anything for those cases (and then a solution based on using sets vs lists to distinguish if something should be sorted or already was sorted). I thought it was easier to just do it in #26145 for reference since I had problems conveying my thoughts.

I guess the main question is if we should rely on manual or automatic sorting when we want sorting other than alphabetical on LaTeX-keyword.

story645 · 2023-06-19T14:53:13Z

My point is that can we be sure that render_symbol gives a correct ordering

I think so, it's essentially using the unicode values for sorting and those are in alphabetical order. What I meant by "OK" was that the sorting was correct up to the not quite Greek characters Anthony mentioned in the issue - it mostly put the variants next to the originals. But honestly, I think I'm just gonna write a little Greek_sorter function to put them in the right place.

Eta: I think something like lambda s: render_symbol(s) if s.startswith(r"\var") else render_symbol(s.replace(r"var", r"") works but I'll check/push when I have working internet again.

I guess the main question is if we should rely on manual or automatic sorting when we want sorting other than alphabetical on LaTeX-keyword.

The reason I'd push for automatic is that I think it'd be safer to maybe change where and how the data is stored if folks don't also have to keep track of how it's sorted. For example, I changed the language string into a set of strings 'cause honestly the giant string makes me tetchy and I'm curious why we don't have a table of the mathtext values. I think if it's something we think is important enough to be sorted than we should code that up.

story645 · 2023-06-20T08:13:27Z

Got lowercase sorting clean. ~~Also pulled render_symbol out into it's own function 'cause that made it easier to mess around with when I was figuring out the sort key -> does it have to be defined inside run?~~ put it back

ETA: this has changed enough that I figure the original approval doesn't count anymore?

tacaswell · 2023-06-20T20:20:46Z

I think that #26145 is a much simpler approach. We control the data input so we can just fix it rather than trying to sort things after the fact.

story645 · 2023-06-20T22:00:07Z

I think that #26145 is a much simpler approach.

I don't think it's a simpler approach just cause it's being done manually. I think it's just the difference between using the manual sorting table of checking against a rendered latex table vs. an automatic sort using a key that does the same thing. We already sort the entries to the table, this is just sorting some of those entries in Greek or Hebrew. The ord stuff is just a trick to always put the \var character after it's neighbor. But either way, having the sorting means we don't have to worry about making sure enteries are sorted if that code is touched or moved around-for example if the data tables get moved to a different file.

Eta: I guess where I'm coming from is that this is only run on doc builds and it's not really an expensive operation, but if I was messing around with the symbol tables I'd have to remember to manually make sure I hadn't broken the sorting.

story645 · 2023-06-21T02:31:32Z

I'm gonna restate my comment from #26145 (comment) as to the advantage of this approach: printing the table is the only place where it's required that these values be sorted, so putting the sort in the print leaves the value ordering unconstrained elsewhere. Keeping the constraint in the place where it's needed long term is easier to maintain cause you don't really have to think about it - run sorts it's lists how it needs rather then the lists needing to be a sorted a specific way for run.

oscargus · 2023-06-21T08:33:55Z

leaves the value ordering unconstrained elsewhere

The thing is that the Greek and Hebrew characters are only listed for printing. If they were used elsewhere, then, sure!

I've checked the outcome of this now and I think the sorting here is more than "pretty OK", so I'd be just as happy to go with this. I was primarily worried about the sorting not being as correct as it could. (Maybe the var-change did that, maybe not. That could need some code comment though.)

Also #26145 (comment) indicates that manual sorting is tricky... (But also that it is not a major effort to correct it.)

B3F0

oscargus · 2023-06-21T08:35:15Z

doc/sphinxext/math_symbol_table.py

-     r"""\Delta \Gamma \Lambda \Omega \Phi \Pi \Psi \Sigma \Theta
-     \Upsilon \Xi \mho \nabla""".split()],
+     4,
+     {r"\Delta", r"\Gamma", r"\Lambda", r"\Omega", r"\Phi", r"\Pi", r"\Psi",


Not sure that I would have encouraged this type of change for consistency, but when it is here...

(Also, this require automatic sorting, which is maybe not a step in the right direction, despite improving it substantially here.)

I can change it back, it was mostly that keeping it in a string confused me.

also is how I parsed your comment about strings/sets

Well, anything I have said regarding that should ideally have been "do not turn it into a set so that it can be manually sorted", but anyway.

I think that set is probably not the best thing here. It would admittedly remove duplicates, but that is about it. All other benefits (like fast member checking) should not really be a factor. Strictly, I guess a Tuple may be the best for these "static" ones. Not that it matters, but it is sometimes a bit sensitive to sub-optimize these things, so may just as well change to the best if is it to be changed. I'd be happy with the old form as well though.

story645 · 2023-06-21T22:08:42Z

The thing is that the Greek and Hebrew characters are only listed for printing. If they were used elsewhere, then, sure!

I was thinking of how lot of the other values are being pulled from mathtext.parser and wasn't sure if down the line more.consolidation was gonna happen.

QuLogic

I suggest simplifying to remove the special casing and sorting all symbols, because those orders are somewhat better.

For example, in 'Big symbols', we have: coprod, iiint -> int, oiiint -> oint, prod, and sum, i.e., the products are split by integrals and the integrals are in reverse order. But with sorting by rendered glyph, we get: prod, coprod, sum, int -> iiint, oint -> oiiint, i.e. two products together, a discrete sum, an integration (the limit of a sum) up to multidimensional contour integrals

doc/sphinxext/math_symbol_table.py

QuLogic · 2023-06-22T08:21:27Z

I suggest simplifying to remove the special casing and sorting all symbols, because those orders are somewhat better.

By this, I mean something like:

--- doc/sphinxext/math_symbol_table.py  2023-06-21 19:09:38.542507168 -0400
+++ math_symbol_table.py        2023-06-22 04:14:35.748802184 -0400
@@ -94,8 +94,10 @@
 ]


 def run(state_machine):
-    def render_symbol(sym):
+    def render_symbol(sym, ignore_variant=False):
+        if ignore_variant and sym != r"\varnothing":
+            sym = sym.replace(r"\var", "\\")
         if sym.startswith("\\"):
             sym = sym[1:]
             if sym not in (_mathtext.Parser._overunder_functions |
@@ -105,7 +107,11 @@

     lines = []
     for category, columns, syms in symbols:
-        syms = sorted(list(syms))
+        syms = sorted(syms,
+                      # Sort by Unicode and place variants immediately after standard versions.
+                      key=lambda sym: (render_symbol(sym, ignore_variant=True),
+                                       sym.startswith(r"\var")),
+                      reverse=(category == "Hebrew"))  # Hebrew is rtl
         columns = min(columns, len(syms))
         lines.append("**%s**" % category)
         lines.append('')

oscargus · 2023-06-22T12:22:38Z

I was thinking of how lot of the other values are being pulled from mathtext.parser and wasn't sure if down the line more.consolidation was gonna happen.

I cannot really see it happen for those (as they are not "special" from a parsing perspective), but indeed there will be consolidations for all the other ones.

(The fact that they wouldn't be consolidated was part of my reasoning with lists, only here, vs sets, from the parser.)

Co-authored-by: Oscar Gustafsson <oscar.gustafsson@gmail.com> Co-authored-by: Elliott Sales de Andrade <quantum.analyst@gmail.com>

oscargus · 2023-06-23T07:24:39Z

I suggest simplifying to remove the special casing and sorting all symbols, because those orders are somewhat better.

For example, in 'Big symbols', we have: coprod, iiint -> int, oiiint -> oint, prod, and sum, i.e., the products are split by integrals and the integrals are in reverse order. But with sorting by rendered glyph, we get: prod, coprod, sum, int -> iiint, oint -> oiiint, i.e. two products together, a discrete sum, an integration (the limit of a sum) up to multidimensional contour integrals

While it make sense for e.g. Big, it will make it very unintuitive for e.g. operators, where you will get a huge list of seemingly unsorted symbols where there is no simple way to quickly see if we support a given one.

QuLogic · 2023-06-23T07:57:15Z

I'm not sure I agree. If you know the name, then a Ctrl-F search would be easiest. If you just know what the symbol looks like, I'm not sure there is any useful order for searching those. If you only have a vague idea of the name, then this sorting puts relevant pairs like ≺ \prec and ≻ \succ or ⊢ \vdash, ⊣ \dashv, ⊤ \top, and ⊥ \bot together, when they wouldn't for simple sorting. Admittedly, all of the lt/gt variants are not strictly together though.

oscargus · 2023-06-23T09:12:28Z

Fair point. I still find it a bit "random", although I clearly see the grouping as well. Let's try it.

story645 added Documentation topic: text/mathtext labels Jun 18, 2023

story645 linked an issue Jun 18, 2023 that may be closed by this pull request

[Doc]: Sort greek/hebrew letters in math docs alphabetically #26140

Closed

story645 changed the title ~~alphabetize greek and hebrew, closes #26140~~ alphabetize greek and hebrew in math text docs Jun 18, 2023

story645 changed the title ~~alphabetize greek and hebrew in math text docs~~ [Doc] alphabetize greek and hebrew in mathtext docs Jun 18, 2023

story645 force-pushed the mathtext_hebrew branch 2 times, most recently from 86df882 to d83a448 Compare June 18, 2023 16:10

tacaswell added this to the v3.8.0 milestone Jun 18, 2023

tacaswell approved these changes Jun 18, 2023

View reviewed changes

oscargus reviewed Jun 18, 2023

View reviewed changes

oscargus mentioned this pull request Jun 18, 2023

Sort Greek and Hebrew characters #26145

Closed

5 tasks

story645 force-pushed the mathtext_hebrew branch from d83a448 to 39f4237 Compare June 19, 2023 03:28

story645 requested a review from tacaswell June 20, 2023 08:13

story645 force-pushed the mathtext_hebrew branch 3 times, most recently from eadf019 to 9b53070 Compare June 20, 2023 08:49

oscargus reviewed Jun 21, 2023

View reviewed changes

story645 force-pushed the mathtext_hebrew branch 2 times, most recently from c0bfbe7 to b45e907 Compare June 21, 2023 16:05

QuLogic reviewed Jun 22, 2023

View reviewed changes

doc/sphinxext/math_symbol_table.py Outdated Show resolved Hide resolved

alphabetize greek and hebrew, closes matplotlib#26140

7536890

Co-authored-by: Oscar Gustafsson <oscar.gustafsson@gmail.com> Co-authored-by: Elliott Sales de Andrade <quantum.analyst@gmail.com>

story645 force-pushed the mathtext_hebrew branch from b45e907 to 7536890 Compare June 23, 2023 02:02

story645 changed the title ~~[Doc] alphabetize greek and hebrew in mathtext docs~~ [Doc] alphabetize mathtext symbols by unicode Jun 23, 2023

oscargus approved these changes Jun 23, 2023

View reviewed changes

QuLogic merged commit b858e37 into matplotlib:main Jun 23, 2023

story645 deleted the mathtext_hebrew branch June 25, 2023 02:02

                    r"""\Delta \Gamma \Lambda \Omega \Phi \Pi \Psi \Sigma \Theta
                    \Upsilon \Xi \mho \nabla""".split()],
                   ["Hebrew",
 ,
-                   r"""\aleph \beth \daleth \gimel""".split()],
+                   r"""\aleph \beth \gimel \daleth""".split()],
                   ["Delimiters",
 ,
                    _mathtext.Parser._delims],

Uh oh!

[Doc] alphabetize mathtext symbols by unicode #26142

[Doc] alphabetize mathtext symbols by unicode #26142

Uh oh!

Conversation

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!