SyntaxWarning: invalid decimal literal · Issue #114524 · python/cpython · GitHub


SyntaxWarning: invalid decimal literal #114524


Closed
exander77 opened this issue Jan 24, 2024 · 15 comments
Labels
topic-parser type-bug An unexpected behavior, bug, or error

Comments

@exander77
exander77 commented Jan 24, 2024

Bug description:

In the last two years, Python has started issuing superfluous, misleading warnings for perfectly valid code.

Examples:

[ord(x)>>5for x in h]
b'\x03'if P.y%2else b'\x02'

In the first example, 5for is not an invalid decimal literal: it is the decimal literal 5 followed by the keyword for, which is a perfectly valid sequence of lexical elements. The same goes for 2else in the second example.
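As an illustration, here is a minimal sketch that uses the standard-library tokenize module to show where the token boundaries fall in the first example. The helper name significant_tokens is hypothetical. The sketch assumes a reasonably recent CPython; since 3.12 tokenize is backed by the C tokenizer, but the split should be the same either way.

```python
import io
import token
import tokenize
import warnings

def significant_tokens(src):
    """Return (type name, text) pairs for the NAME, NUMBER and OP tokens in src."""
    keep = {token.NAME, token.NUMBER, token.OP}
    with warnings.catch_warnings():
        # Some versions may warn while tokenizing; silence that for this demo.
        warnings.simplefilter("ignore")
        toks = list(tokenize.generate_tokens(io.StringIO(src).readline))
    return [(token.tok_name[t.type], t.string) for t in toks if t.type in keep]

for kind, text in significant_tokens("[ord(x)>>5for x in h]"):
    print(kind, text)
```

In the output, NUMBER 5 is immediately followed by NAME for, which is exactly the split described above.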

An invalid decimal literal would be something like a variable name that starts with a digit:

2for = 15

Here 2for is used as a variable name. I would personally call that an invalid identifier (so even the official terminology is confusing).
But the cases above are not cases of an invalid literal; they are false positives. This worked because of how lexical analyzers are constructed, as way back as C or even longer. Why did Python start issuing this misleading warning out of the blue?
Also note that b'\x03'if doesn't produce any warning.
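A minimal sketch for comparing the two cases programmatically. It compiles each snippet inside warnings.catch_warnings and collects whatever warnings the compiler raises. The helper name compile_warnings is hypothetical. Which warning, if any, appears for the numeric case depends on the CPython version.

```python
import warnings

def compile_warnings(src):
    """Compile src as an expression and return the messages of any warnings raised.

    If compilation fails outright, the SyntaxError message is returned instead.
    """
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # record every warning, even repeats
        try:
            compile(src, "<example>", "eval")
        except SyntaxError as exc:
            return [f"SyntaxError: {exc.msg}"]
    return [str(w.message) for w in caught]

# A keyword glued to a bytes literal vs. a keyword glued to a number.
print(compile_warnings("b'\\x03'if True else b'\\x02'"))
print(compile_warnings("b'\\x03'if P.y%2else b'\\x02'"))
```

On versions that emit the warning, the first call should print an empty list and the second a list containing "invalid decimal literal".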

Can this nonsense be suppressed?
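The warning is emitted at compile time, not at run time. The standard warnings machinery can silence it only if the filter is already in place when the code is compiled. Below is a minimal sketch, assuming you control the compile() call. The helper name compile_quietly is hypothetical.

For a whole script, the rough equivalent is running `python -W ignore::SyntaxWarning` or setting PYTHONWARNINGS. The sketch silences DeprecationWarning too, because earlier releases reported this case under that category.

```python
import warnings

def compile_quietly(src, filename="<quiet>", mode="exec"):
    """Compile src with the numeric-literal warnings silenced.

    The filters only apply inside this block, so other code still sees them.
    """
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", SyntaxWarning)
        warnings.simplefilter("ignore", DeprecationWarning)
        return compile(src, filename, mode)

code = compile_quietly("h = 'abc'\nresult = [ord(x)>>5for x in h]\n")
namespace = {}
exec(code, namespace)
print(namespace["result"])  # [3, 3, 3]
```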

I would note that 10 years ago I wrote a code obfuscator/minimizer and deobfuscator/deminimizer that decided when whitespace needs to be inserted between two lexical elements, based on the fact that none is needed if joining the two adjacent lexical elements doesn't produce a new lexical element.
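The whitespace rule described here can be sketched with a toy maximal-munch lexer: at each position, take the longest token any pattern can match. The Python Language Reference states the same rule in its lexical analysis chapter: "Whitespace is needed between two tokens only if their concatenation could otherwise be interpreted as a different token."

This is an illustration under stated assumptions. The token classes are a hypothetical, heavily simplified subset of Python's lexical grammar, and lex and needs_space are illustrative names, not anything from CPython.

```python
import re

# Hypothetical toy token classes: a small subset of Python's lexical grammar.
PATTERNS = [
    re.compile(r"0[xX][0-9a-fA-F]+"),        # hexadecimal literal
    re.compile(r"0[oO][0-7]+"),              # octal literal
    re.compile(r"[0-9]+"),                   # decimal literal (digits only here)
    re.compile(r"[A-Za-z_][A-Za-z0-9_]*"),   # name or keyword
    re.compile(r">>|<<|[-+*/%<>=()\[\],.]"), # a few operators and delimiters
]

def lex(src):
    """Maximal munch: at each position take the longest match of any pattern."""
    tokens, pos = [], 0
    while pos < len(src):
        if src[pos].isspace():
            pos += 1
            continue
        best = None
        for pattern in PATTERNS:
            m = pattern.match(src, pos)
            if m and (best is None or m.end() > best.end()):
                best = m
        if best is None:
            raise ValueError(f"unexpected character {src[pos]!r} at {pos}")
        tokens.append(best.group())
        pos = best.end()
    return tokens

def needs_space(left, right):
    """Two adjacent tokens need whitespace only if gluing them lexes differently."""
    return lex(left + right) != [left, right]

print(lex("[ord(x)>>5for x in h]"))
print(needs_space("5", "for"), needs_space("x", "for"), needs_space("0x1", "for"))
```

The toy grammar is more permissive than CPython's. For example, it splits 0or into 0 and or, whereas CPython treats 0o as an octal prefix and rejects the input.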

CPython versions tested on:

3.11

Operating systems tested on:

Linux

exander77 added the type-bug (An unexpected behavior, bug, or error) label Jan 24, 2024
@ericvsmith
Member

I don't think calling this "nonsense" is helping your argument. There are valid reasons with the new parser on why this is done, I'm not sure how difficult it would be to change.

cc @pablogsal

Author
exander77 commented Jan 24, 2024

> I don't think calling this "nonsense" is helping your argument. There are valid reasons with the new parser on why this is done, I'm not sure how difficult it would be to change.
>
> cc @pablogsal

If there is no invalid literal in the code, then reporting an invalid literal is nonsense:

[ord(x)>>5for x in h]

This is perfectly valid, and there are no invalid literals.

And as I mentioned even for the cases I assume it was meant:

2for = 15

It is still nonsense, as it is not an invalid decimal literal but an invalid identifier.

Another case I can imagine is:

a = 2else

But that would not be an invalid decimal literal either; it would be an unexpected lexical element.

But both of these cases already raise a SyntaxError, so I also don't understand why there is a warning in the first place:

>>> 2for = 15
<stdin>:1: SyntaxWarning: invalid decimal literal
  File "<stdin>", line 1
    2for = 15
     ^^^
SyntaxError: invalid syntax
>>> a = 2else
<stdin>:1: SyntaxWarning: invalid decimal literal
  File "<stdin>", line 1
    a = 2else
         ^^^^
SyntaxError: invalid syntax

I assume this is a badly written check that decimal literals contain only the digits 0..9.

@exander77
Author
exander77 commented Jan 24, 2024

The issue is also present for other bases:

>>> 0x0if True else 1
<stdin>:1: SyntaxWarning: invalid hexadecimal literal
0
>>> 0if True else 1
<stdin>:1: SyntaxWarning: invalid decimal literal
0
>>> [ord("a")>>0o5for x in [1,2,3]]
<stdin>:1: SyntaxWarning: invalid octal literal
[3, 3, 3]
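For the three inputs above, a minimal sketch with the standard-library tokenize module shows how each literal is cut off at the first character that cannot extend it. The helper name token_texts is hypothetical, and tokenizer warnings are silenced because they vary by version.

```python
import io
import token
import tokenize
import warnings

def token_texts(src):
    """Return the text of each NAME, NUMBER and OP token in src, in order."""
    keep = {token.NAME, token.NUMBER, token.OP}
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")  # some versions may warn while tokenizing
        toks = list(tokenize.generate_tokens(io.StringIO(src).readline))
    return [t.string for t in toks if t.type in keep]

for src in ["0x0if True else 1", "0if True else 1",
            '[ord("a")>>0o5for x in [1,2,3]]']:
    print(token_texts(src))
```

In each case the literal (0x0, 0, 0o5) ends where the keyword (if, for) begins.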

@terryjreedy
Member

There are people who think it a bug that CPython allows such nonsense as return"a" without a space and have requested a change to require it. They think what you want to be a monstrosity. In any case, Python is not C.

@exander77
Author

> There are people who think it a bug that CPython allows such nonsense as return"a" without a space and have requested a change to require it. They think what you want to be a monstrosity. In any case, Python is not C.

I don't want to come across as offensive, but this makes no sense. I wrote "as way back as C or even longer". It has no relation to C at all; it is just a point of reference in time.

return"a" works in Python, JavaScript, Java, C#... I can't even think of a language where it doesn't work.

What I am describing is industry standard and predictable behavior.
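A quick check, as a sketch, that current Python accepts return"a" silently. Unlike a number glued to a keyword, a keyword glued to a string literal compiles even with every warning promoted to an error. The snippet and its filename are illustrative only.

```python
import warnings

# Promote every warning to an error, so a clean compile shows none was issued.
src = 'def f():\n    return"a"\n'
with warnings.catch_warnings():
    warnings.simplefilter("error")
    code = compile(src, "<return-demo>", "exec")

namespace = {}
exec(code, namespace)
print(namespace["f"]())  # a
```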

@mdickinson
Member

For some history, see #87999, along with this thread on the python-dev mailing list.

@sunmy2019
Member

What's done is done. I am against reverting it.

> 5for is not an invalid decimal literal, it is a decimal literal 5 followed by keyword for

Lexing comes first. According to the current lexing rules, 5for can be both 5f or or 5 for. You only know it's valid after you apply the grammar rules.

Keeping track of it is a burden on the tokenizer and parser, as well as on their implementors.

> Can this nonsense be suppressed?
>
> I would note that 10 years ago I wrote code obfuscation/minimizer and deobfuscator/deminizer that decided when whitespace needs to be introduced between two lexical elements based on the fact that you don't need to introduce if joining of two adjacent lexical elements doesn't produce a new lexical element.

However, I encourage you to fork Python and implement your idea. If that proves to be elegant and works well, people may reconsider your idea.

@exander77
Author
exander77 commented Jan 26, 2024

> What's done is done. I am against reverting it.
>
> > 5for is not an invalid decimal literal, it is a decimal literal 5 followed by keyword for
>
> Lexing comes first. According to the current lexing rules, 5for can be both 5f or or 5 for. You only know it's valid after you apply the grammar rules.

No, lexical analysis is greedy, from left to right. It is definitely 5f and or. You don't need any grammar to decide.

I am not sure there is anything to implement here; I would just drop the totally misleading warning that serves no purpose.

Even this:

[0x1for x in (1,2)]

is perfectly valid and unambiguous: 0x1f and or.
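As a sketch of what greedy lexing yields here, evaluating this example shows that the result is a one-element list display containing an or expression, not a comprehension. Warnings are silenced because newer versions report "invalid hexadecimal literal" for this input.

```python
import warnings

src = "[0x1for x in (1,2)]"
with warnings.catch_warnings():
    warnings.simplefilter("ignore", SyntaxWarning)
    warnings.simplefilter("ignore", DeprecationWarning)
    # Empty globals: the name x is deliberately left undefined.
    value = eval(src, {})

# Lexed as 0x1f, or, x, in, (1,2). Since 0x1f (31) is truthy, `or`
# short-circuits and the undefined name x is never evaluated.
print(value)  # [31]
```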

@exander77
Author

> For some history, see #87999, along with this thread on the python-dev mailing list.

What confuses me there is: what is the discussion even about? This is well-understood behavior across basically all programming languages. I mentioned these things regularly in compiler construction courses.

@aroberge

> I mentioned these things regularly in compiler construction courses.

I believe that the warnings are primarily intended for, and very helpful to, beginners.

@exander77
Author
exander77 commented Jan 26, 2024

> > I mentioned these things regularly in compiler construction courses.
>
> I believe that the warnings are primarily intended for, and very helpful to beginners.

I would be really interested to know what data you are basing that on. Does it even affect any beginners? A good IDE will show the correct separation, which is easy to see; a bad IDE will show this as a syntax error (even though it is not); and most beginners don't write such dense code anyway. This really sounds like fixing something that is not broken.

In my anecdotal experience, I have never seen anybody hit this issue in around 10 years of teaching various courses, from basic programming in C to compiler construction.

@mdickinson
Member

@exander77

> What kind of confusing me there is, what is the discussion about?

I'm not quite sure what you're asking here. My point was that this is not a bug or an artifact of the parser implementation - it's a deliberate design decision aimed at human readers rather than parsers.

> warning that serves no purpose

The purpose of the SyntaxWarning (which was promoted by @serhiy-storchaka in #91980 from an earlier DeprecationWarning) is to warn developers that this syntax may become an error in a future version of Python. In fact, I believe the original plan was that this should be an error in Python 3.12 and later: see #87999 (comment). Ironically, while the SyntaxWarning is more visible than the original DeprecationWarning, I think it's less clear here that it's intended to indicate a deprecated piece of syntax.
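Since the warning is meant to flag syntax that may become an error, one way to check code ahead of time is to compile it with these warnings promoted to errors. This is a minimal sketch, and the helper name compiles_strictly is hypothetical.

When a tokenizer warning is promoted this way, CPython typically reports it as a SyntaxError. The warning classes are caught too, to be safe. From the command line, `python -W error::SyntaxWarning` should have a similar effect for a single script.

```python
import sys
import warnings

def compiles_strictly(src):
    """Return True if src compiles with the literal warnings treated as errors."""
    with warnings.catch_warnings():
        warnings.simplefilter("error", SyntaxWarning)
        warnings.simplefilter("error", DeprecationWarning)
        try:
            compile(src, "<strict>", "exec")
        except (SyntaxError, SyntaxWarning, DeprecationWarning):
            return False
    return True

print(compiles_strictly("y = [x >> 5 for x in 'abc']"))  # True
# The result for the glued form depends on whether this version warns.
print(sys.version_info[:2], compiles_strictly("y = [x>>5for x in 'abc']"))
```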

> is perfectly valid and unambiguous: 0x1f and or [...]

To a parser, sure, but I'd argue that readability for human parsers should be the primary driver.

For this issue, I think we should close (since it was reported as a bug, but there's no bug here). @exander77 If you want to argue for a change in direction, that's probably a discussion that would be more fruitful on https://discuss.python.org. That could then be turned into a feature request here if there's consensus that a change is needed.

@serhiy-storchaka
Member

It is only valid to a parser because it was made to recognize particular ambiguous syntax. For example, it has special code to accept x if y>1else z, but it fails to parse x>0or y because it has no special code for that case. If new keywords or numeric literal syntax are added in the future, they can produce more cases that are ambiguous for humans or for the parser.
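The asymmetry described here can be reproduced with compile(). This is a minimal sketch with a hypothetical helper name, try_compile. In the second expression the tokenizer reads 0o as the start of an octal literal, so the o of or is consumed and the input is rejected.

```python
import warnings

def try_compile(src):
    """Compile src as an expression; return None on success, else the error message."""
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")  # hide any "invalid ... literal" warnings
        try:
            compile(src, "<example>", "eval")
        except SyntaxError as exc:
            return exc.msg
    return None

print(try_compile("x if y>1else z"))  # None: accepted
print(try_compile("x>0or y"))         # an error message: 0o starts an octal literal
```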

@exander77
Author
exander77 commented Jan 28, 2024

I am not sure that the change being deliberate means it is not a bug, as the invalid literal warning is issued for cases where no invalid literal is present, as I have already described.

The provided sources actually make a ton of arguments against it (and not really any for it):

> We tried changing this IIRC and it broke code in the stdlib (now reformatted) so it will break code in the wild. I am not sure the gains are worth it.

> This is known behaviour unfortunately and cannot be changed because of backwards compatibility.

> This would make Python 3.8 reject code due to stylistic preference. Code that it actually can unambiguously parse today.

> I recommend just letting this be. Aside from it allowing for a cute riddle, in the real world seems to be harmless and not worth breaking code.

> It will be disappointing if it becomes an error and break many past programs (you can search for phrases like 1and, 0for on https://codegolf.stackexchange.com/search?q=0for for examples).

I am not sure how Python development is done, but this looks to me like serhiy-storchaka decided to just do it.

Even the stdlib broke after the change.

It would be useful to first estimate how many projects would be broken by such an incompatible change (stricter syntax).

Was any analysis even done?

I assume that this will be changed from a warning to a syntax error, so we are basically breaking a ton of legacy code for no reason?

To quote already made point:
> This would make Python 3.8 reject code due to stylistic preference.

I am really interested in how this was approved and merged when the prevailing responses were negative.

@serhiy-storchaka
Member

Closing as suggested by @mdickinson.

erlend-aasland closed this as not planned May 6, 2024
8 participants