-
-
Notifications
You must be signed in to change notification settings - Fork 32.3k
gh-115077: Argument Clinic: generate better error messages when parsing function declaration #115555
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
With this experiment, we can in the future make use of shlex's character position, and thus easily provide the position in the line where parsing failed. For example, by providing error messages that look more similar to the familiar Python tracebacks. |
Some examples of improved cases:
UPDATE: after 9b93771, the latter two cases are no longer improved. |
Another positive side effect: previously, the parsing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why use the shell tokenizer to parse Argument Clinic syntax? Isn't it closer to Python syntax?
Because it was the short route to a proof-of-concept PR. We can of course rewrite it to use the Python tokeniser instead. |
Possible gotcha: the Python tokeniser will probably split up the full name (e.g. |
The shell tokenizer has much more gotchas. |
We already use the shell tokenizer for parsing the checksum line. Should we also stop using it there? Let's rewrite it using the Python tokenizer then. If it introduces too much complexity, let's just forget about this experiment and leave the error messages like they are today. |
Could you point to some, so I can add tests for those? |
I expect some surprises in handling quotes and escapes. But for such simple case both look overkill to me. It can be done with regexpes or string methods. What are the problems in the current code? |
For example: m = re.match(r'\s*([\w.]+)\s*', line)
assert m
full_name = m[1]
if not libclinic.is_legal_py_identifier(full_name):
fail(f"Illegal function name: {full_name!r}")
pos = m.end()
m = re.compile(r'\bas\b\s*(?:([^-=\s]+)\s*)?').match(line, pos)
if m:
if not m[1]:
fail(f"No C basename provided for {full_name!r} after 'as' keyword")
c_basename = m[1]
if not libclinic.is_legal_c_identifier(c_basename):
fail(f"Illegal C basename: {c_basename!r}")
pos = m.end()
else:
c_basename = self.generate_c_basename(full_name)
m = re.compile(r'=\s*(?:([^-=\s]+)\s*)?').match(line, pos)
if m:
if not m[1]:
fail(f"No source function provided for {full_name!r} after '=' keyword")
cloned = m[1]
if not libclinic.is_legal_py_identifier(cloned):
fail(f"Illegal source function name: {cloned!r}")
pos = m.end()
m = re.compile(r'->\s*(.*)').match(line, pos)
if m:
if not m[1]:
fail(f"No return annotation provided for {full_name!r} after '->' keyword")
returns = m[1].strip() |
That should be easy to check; I don't expect it to be a problem with our simple syntax; as you can see, the test suite completes without error, and all clinic code in our repo is parsed without problems. No surprises (yet).
It generates very bad error messages in many cases (reflected in the PR title). Also, the parsing failures are scattered around the code, instead of collected in one place as in this PR. See my earlier comments:
IMO, it is worth it to generate better error messages. |
It misses some corner cases, but it is a good alternative; thanks. |
@serhiy-storchaka, I adapted it to fit in commits 9b93771 and 1cc7248. I removed some edge cases1; perhaps it is extreme to check for such cases of invalid syntax anyway 🤷 It is a handful of lines shorter, which is nice. IMO, the shlex approach is more readable, but we don't have to weight that too heavy. What do you think? Footnotes |
RE_C_BASENAME = re.compile(r"\bas\b\s*(?:([^-=\s]+)\s*)?") | ||
RE_CLONE = re.compile(r"=\s*(?:([^-=\s]+)\s*)?") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wrote it pass most of your tests, but perhaps \w+
or [\w.]+
is better than [^-=\s]+
. It will produce different error message for foo.bar as ''
, but it may be for good.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, my test case might also be too contrived.
I don't have the bandwidth to follow this up now; closing the PR but keeping the local branch. Feel free to pick it up. |
Uh oh!
There was an error while loading. Please reload this page.