-
-
Notifications
You must be signed in to change notification settings - Fork 7.9k
[Doc] alphabetize mathtext symbols by unicode #26142
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
86df882
to
d83a448
Compare
I'm wondering if it may be better to not sort a list, but only sort a set? In that way one can use lists if we want to determine the order and sets when we want to sort. (The info in (Another reason is that I think it may make sense to actually render using mathtext rather than just inserting the Unicode symbol as text. Right now some arrows are quite ugly looking as Unicode and the font is a bit inconsistent between the examples. This will also be a way of "testing" the rendering so that it looks OK.) |
Not sure what you mean, can you give an example or push to the PR? Eta: What I mean is not sure where you want me to make this change: in the loop or in the listing |
I am confused by @oscargus 's comment... |
Yes, I can see that it is pretty confusing.
Hence: do not sort lists, but sort sets. Convert the current lists that should be sorted to sets. |
Make the change that the list is correctly sorted and that the loop does not sort lists, only sets. if isinstance(syms, set):
syms = sorted(list(syms)) should do the trick. Possibly combined with turning the current lists that should be sorted to sets. |
r"""\Delta \Gamma \Lambda \Omega \Phi \Pi \Psi \Sigma \Theta | ||
\Upsilon \Xi \mho \nabla""".split()], | ||
["Hebrew", | ||
6, | ||
r"""\aleph \beth \daleth \gimel""".split()], | ||
r"""\aleph \beth \gimel \daleth""".split()], | ||
["Delimiters", | ||
6, | ||
_mathtext.Parser._delims], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here is the context (in case you've not seen it). We're trying to get rid of the explicit lists here that are already defined in _mathtext.py
. In that way the documentation will always be "correct".
_mathtext.Parser._delims
is a set, so should be sorted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can change it back since the actual sort happens in the loop, but here I'm sorting it this way because in Hebrew it's \aleph \beth \gimel \daleth
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The sorting is OK. I was pointing out line 22 which gets the delimiters from _mathtext
rather than explicitly specifying them.
My suggestion is to have the correct order and NOT sort in the loop. (And I have no idea about the correct order anyway, so I fully trust you there.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I made changes here to move the stuff that wasn't greek letters out into misc, taken from #26145, but your point about not knowing the correct order is exactly why I'd rather defer to the Unicode standard for sorting when possible.
I'm not keen on manually sorting Greek or any of the longer lists if we don't have to cause the list is long enough that it's too easy to make a mistake when editing. I'm cool with turning stuff into sets and reorganizing where and how the table values are stored, but I think it's out of scope for this PR. |
So the current idea is to change the sorting of Greek to something that is not correct? I will not block, but not really clear why special casing that will be removed in the next commit is introduced? |
It's to change the sorting of the Greek and and Hebrew to be based on the Greek and Hebrew alphabets rather than on the latex. Which part of the Greek is incorrect? If \mho and \nabla are in the wrong place, I can move them. |
So since I may have poorly communicated what I'm doing and why: The point of this PR is to special case Hebrew and Greek since those are alphabets with their own alphabetization. I'm using |
I'm very confused what you mean by this. ETA: if this is about #26145, I'm gonna insist there that the special casing be left in b/c I think it's way more robust to have the sorting be done automatically than to have to always remember that the alphabets get sorted by alphabet and not by the latex. |
d83a448
to
39f4237
Compare
My point is that can we be sure that
which I interpreted as "not really correct, but decent" (but didn't check the outcome of it). Hence, my suggestion to do manual ordering and not sort on anything for those cases (and then a solution based on using sets vs lists to distinguish if something should be sorted or already was sorted). I thought it was easier to just do it in #26145 for reference since I had problems conveying my thoughts. I guess the main question is if we should rely on manual or automatic sorting when we want sorting other than alphabetical on LaTeX-keyword. |
I think so, it's essentially using the unicode values for sorting and those are in alphabetical order. What I meant by "OK" was that the sorting was correct up to the not quite Greek characters Anthony mentioned in the issue - it mostly put the variants next to the originals. But honestly, I think I'm just gonna write a little Greek_sorter function to put them in the right place. Eta: I think something like
The reason I'd push for automatic is that I think it'd be safer to maybe change where and how the data is stored if folks don't also have to keep track of how it's sorted. For example, I changed the language string into a set of strings 'cause honestly the giant string makes me tetchy and I'm curious why we don't have a table of the mathtext values. I think if it's something we think is important enough to be sorted than we should code that up. |
39f4237
to
3830616
Compare
Got lowercase sorting clean. ETA: this has changed enough that I figure the original approval doesn't count anymore? |
eadf019
to
9b53070
Compare
I think that #26145 is a much simpler approach. We control the data input so we can just fix it rather than trying to sort things after the fact. |
I don't think it's a simpler approach just cause it's being done manually. I think it's just the difference between using the manual sorting table of checking against a rendered latex table vs. an automatic sort using a key that does the same thing. We already sort the entries to the table, this is just sorting some of those entries in Greek or Hebrew. The ord stuff is just a trick to always put the \var character after it's neighbor. But either way, having the sorting means we don't have to worry about making sure enteries are sorted if that code is touched or moved around-for example if the data tables get moved to a different file. Eta: I guess where I'm coming from is that this is only run on doc builds and it's not really an expensive operation, but if I was messing around with the symbol tables I'd have to remember to manually make sure I hadn't broken the sorting. |
I'm gonna restate my comment from #26145 (comment) as to the advantage of this approach: printing the table is the only place where it's required that these values be sorted, so putting the sort in the print leaves the value ordering unconstrained elsewhere. Keeping the constraint in the place where it's needed long term is easier to maintain cause you don't really have to think about it - |
The thing is that the Greek and Hebrew characters are only listed for printing. If they were used elsewhere, then, sure! I've checked the outcome of this now and I think the sorting here is more than "pretty OK", so I'd be just as happy to go with this. I was primarily worried about the sorting not being as correct as it could. (Maybe the var-change did that, maybe not. That could need some code comment though.) Also #26145 (comment) indicates that manual sorting is tricky... (But also that it is not a major effort to correct it.) |
doc/sphinxext/math_symbol_table.py
Outdated
r"""\Delta \Gamma \Lambda \Omega \Phi \Pi \Psi \Sigma \Theta | ||
\Upsilon \Xi \mho \nabla""".split()], | ||
4, | ||
{r"\Delta", r"\Gamma", r"\Lambda", r"\Omega", r"\Phi", r"\Pi", r"\Psi", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure that I would have encouraged this type of change for consistency, but when it is here...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Also, this require automatic sorting, which is maybe not a step in the right direction, despite improving it substantially here.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can change it back, it was mostly that keeping it in a string confused me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also is how I parsed your comment about strings/sets
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, anything I have said regarding that should ideally have been "do not turn it into a set so that it can be manually sorted", but anyway.
I think that set is probably not the best thing here. It would admittedly remove duplicates, but that is about it. All other benefits (like fast member checking) should not really be a factor. Strictly, I guess a Tuple may be the best for these "static" ones. Not that it matters, but it is sometimes a bit sensitive to sub-optimize these things, so may just as well change to the best if is it to be changed. I'd be happy with the old form as well though.
c0bfbe7
to
b45e907
Compare
I was thinking of how lot of the other values are being pulled from mathtext.parser and wasn't sure if down the line more.consolidation was gonna happen. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest simplifying to remove the special casing and sorting all symbols, because those orders are somewhat better.
For example, in 'Big symbols', we have: coprod, iiint -> int, oiiint -> oint, prod, and sum, i.e., the products are split by integrals and the integrals are in reverse order. But with sorting by rendered glyph, we get: prod, coprod, sum, int -> iiint, oint -> oiiint, i.e. two products together, a discrete sum, an integration (the limit of a sum) up to multidimensional contour integrals
By this, I mean something like: --- doc/sphinxext/math_symbol_table.py 2023-06-21 19:09:38.542507168 -0400
+++ math_symbol_table.py 2023-06-22 04:14:35.748802184 -0400
@@ -94,8 +94,10 @@
]
def run(state_machine):
- def render_symbol(sym):
+ def render_symbol(sym, ignore_variant=False):
+ if ignore_variant and sym != r"\varnothing":
+ sym = sym.replace(r"\var", "\\")
if sym.startswith("\\"):
sym = sym[1:]
if sym not in (_mathtext.Parser._overunder_functions |
@@ -105,7 +107,11 @@
lines = []
for category, columns, syms in symbols:
- syms = sorted(list(syms))
+ syms = sorted(syms,
+ # Sort by Unicode and place variants immediately after standard versions.
+ key=lambda sym: (render_symbol(sym, ignore_variant=True),
+ sym.startswith(r"\var")),
+ reverse=(category == "Hebrew")) # Hebrew is rtl
columns = min(columns, len(syms))
lines.append("**%s**" % category)
lines.append('') |
I cannot really see it happen for those (as they are not "special" from a parsing perspective), but indeed there will be consolidations for all the other ones. (The fact that they wouldn't be consolidated was part of my reasoning with lists, only here, vs sets, from the parser.) |
Co-authored-by: Oscar Gustafsson <oscar.gustafsson@gmail.com> Co-authored-by: Elliott Sales de Andrade <quantum.analyst@gmail.com>
b45e907
to
7536890
Compare
While it make sense for e.g. Big, it will make it very unintuitive for e.g. operators, where you will get a huge list of seemingly unsorted symbols where there is no simple way to quickly see if we support a given one. |
I'm not sure I agree. If you know the name, then a Ctrl-F search would be easiest. If you just know what the symbol looks like, I'm not sure there is any useful order for searching those. If you only have a vague idea of the name, then this sorting puts relevant pairs like |
Fair point. I still find it a bit "random", although I clearly see the grouping as well. Let's try it. |
Using render_symbol as the sorting key works well enough for Hebrew, I just flipped it 'cause Hebrew is right to left. It also seems to work pretty OK for the Greek?