gh-102555 Increase HTML standard compliance for closing comment tags by Privat33r-dev · Pull Request #117406 · python/cpython · GitHub

gh-102555 Increase HTML standard compliance for closing comment tags #117406

Closed
wants to merge 7 commits into from

Privat33r-dev
Contributor
@Privat33r-dev Privat33r-dev commented Mar 31, 2024

@Privat33r-dev
Contributor Author
Privat33r-dev commented Mar 31, 2024

We might as well handle the <!--> case (test case: <!--><script>alert(document.domain)</script>) and add some test cases, but that could also be the subject of a separate PR. Let me know which you would prefer.

Member
@ezio-melotti ezio-melotti left a comment

Thanks for the PR. Before reviewing and merging, tests should be added.

@@ -9,7 +9,7 @@

 _declname_match = re.compile(r'[a-zA-Z][-_.a-zA-Z0-9]*\s*').match
 _declstringlit_match = re.compile(r'(\'[^\']*\'|"[^"]*")\s*').match
-_commentclose = re.compile(r'--\s*>')
+_commentclose = re.compile(r'--!?>')
Member

I would leave the \s*, even though I should double check what the HTML5 specs say exactly.

Contributor Author
@Privat33r-dev Privat33r-dev Apr 1, 2024

> I would leave the \s*, even though I should double check what the HTML5 specs say exactly.

I linked the HTML5 specification earlier, and "\s*" is mentioned nowhere in it. Moreover, my tests with the latest versions of Firefox and Chrome show that it is in fact incorrect behaviour: modern browsers do not treat it as a closing tag. So I see no reason to keep it (neither the spec nor common practice supports it).

Member

https://html.spec.whatwg.org/#comment-end-state is the section of the specs I was looking for. It does indeed mention the ! but not the spaces, so updating the code accordingly sounds good to me.

Do you want to add tests to check these (-->, --!>, -- >, --x>, --->, etc.) cases?
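
As a rough illustration of the cases listed above, the following sketch compares the two patterns from the diff on each sample string. The names old_close, new_close, samples and results are made up for this example and are not part of html.parser; the snippet only shows what each regex matches, not how the parser as a whole behaves.

```python
import re

old_close = re.compile(r'--\s*>')  # pattern before this PR
new_close = re.compile(r'--!?>')   # pattern proposed in this PR

samples = ['-->', '--!>', '-- >', '--x>', '--->']

# Map each sample to (old pattern matches?, new pattern matches?).
results = {s: (bool(old_close.search(s)), bool(new_close.search(s)))
           for s in samples}

for s, (old, new) in results.items():
    print(f'{s!r:8} old={old!s:5} new={new}')
```

The two patterns disagree on '--!>' (only the new one matches) and on '-- >' (only the old one matches). Both match '--->' through its trailing '-->', and neither matches '--x>'.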

Contributor Author

> https://html.spec.whatwg.org/#comment-end-state is the section of the specs I was looking for. It does indeed mention the ! but not the spaces, so updating the code accordingly sounds good to me.
>
> Do you want to add tests to check these (-->, --!>, -- >, --x>, --->, etc.) cases?

I am thinking about extending the solution to cover <!-->, unexpected EOF, and similar cases (which were mentioned in a similar PR), but unfortunately I don't have time to work on this PR at the moment. Hopefully during the week (or at the weekend at worst) I can add the test cases and change a few other parts of the code to handle a wider variety of edge cases.

@Privat33r-dev
Contributor Author

@ezio-melotti The EOF edge case (described here: https://html.spec.whatwg.org/multipage/parsing.html#parse-error-eof-in-comment ) appears to be a bit more complicated, but I will try to resolve it today as well.

For now, I have added handling for the short comment (<!-->) edge case, with tests.
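
For readers following along, here is a minimal sketch of the short-comment handling being discussed. It assumes the WHATWG rule that ">" or "->" directly after "<!--" closes an empty comment (the abrupt-closing-of-empty-comment parse error). The helper split_comment and both regex names are hypothetical and are not the code from this PR.

```python
import re

# Hypothetical patterns for illustration only.
_abrupt_close = re.compile(r'-?>')      # '>' or '->' right after '<!--'
_comment_close = re.compile(r'--!?>')   # regular '-->' or '--!>' terminator


def split_comment(html):
    """Split input starting with '<!--' into (comment_data, rest).

    Returns None when no terminator is found (unclosed comment).
    """
    if not html.startswith('<!--'):
        raise ValueError('input must start with "<!--"')
    body = html[4:]
    # An empty comment can be closed abruptly by '>' or '->'.
    m = _abrupt_close.match(body)
    if m:
        return '', body[m.end():]
    # Otherwise look for the first regular terminator.
    m = _comment_close.search(body)
    if m is None:
        return None
    return body[:m.start()], body[m.end():]


print(split_comment('<!--><script>alert(document.domain)</script>'))
```

With this rule, the test case from the first comment yields an empty comment followed by the script element as ordinary markup, instead of swallowing the script into the comment.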

@Privat33r-dev
Contributor Author
Privat33r-dev commented Apr 6, 2024

The new EOF behaviour seems to be consistent with Chromium-based browsers.

There is still an edge case with an abrupt comment that ends at EOF, but it is relatively hard to handle and quite rare (the HTML after a tag that starts with <! must not contain any closing tags), so I decided to skip it, at least for this PR (since the PR already goes above and beyond the initial issue).

http://www.w3.org/TR/html5/tokenization.html#bogus-comment-state
EOF: <! some comment -> ('comment', ' some comment')


Ready for review.

@Privat33r-dev
Contributor Author

@ezio-melotti it would be nice if you can review the change soon :)

@ezio-melotti
Member

I would like to give this a proper review before merging, but unfortunately I probably won't have time to look at this until the end of May. If I haven't replied by then, feel free to ping me again, and thanks for working on this!

@Privat33r-dev
Contributor Author

> I would like to give this a proper review before merging, but unfortunately I probably won't have time to look at this until the end of May. If I haven't replied by then, feel free to ping me again, and thanks for working on this!

Thanks for updating me on the review status, glad to know that it's planned 👍🏻

@Privat33r-dev
Contributor Author

> I would like to give this a proper review before merging, but unfortunately I probably won't have time to look at this until the end of May. If I haven't replied by then, feel free to ping me again, and thanks for working on this!

Hi. It's the last day of May, so I decided to ping you :)

@Privat33r-dev
Contributor Author

@ezio-melotti I wonder if you might have some time this time? :)

@Privat33r-dev
Contributor Author
Privat33r-dev commented Feb 13, 2025

@ezio-melotti I appreciate your earlier willingness to review the PR, and I understand that sometimes time is tight. Would you like to review it now? :)

@serhiy-storchaka serhiy-storchaka self-requested a review May 7, 2025 11:38
Member
@serhiy-storchaka serhiy-storchaka left a comment

I completely forgot about this issue and this PR and created #135664.

The part related to unclosed comments was already included in #135464.

There are a few flaws in this PR.

  • <!--!> and <!---!> should not be parsed as abruptly ended empty comments. --!> does not work here.
  • When unclosed comment ends with -, -- or --! just before EOF, they should not be appended to the comment token's data.

You can try to fix these issues in this PR, but I'll just add you as co-author of #135664 -- this will be simpler.
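
To make the two points above concrete, here is a small sketch of a scanner that follows my reading of the comment states in the WHATWG tokenizer. It is not the code from this PR or from #135664; scan_comment and its return shape are invented for illustration, and it ignores parse-error reporting entirely.

```python
def scan_comment(text):
    """Scan the input that follows '<!--'.

    Returns (data, closed): the comment data and whether a terminator
    was found before the end of input.
    """
    data = []
    pending = ''     # '-', '--' or '--!' seen but not yet committed to data
    at_start = True  # still directly after '<!--' (optionally plus one '-')
    for ch in text:
        if ch == '>' and (pending in ('--', '--!')
                          or (at_start and pending in ('', '-'))):
            # '-->', '--!>', or the abrupt '<!-->' / '<!--->' forms.
            return ''.join(data), True
        if ch == '-':
            if pending == '--':
                data.append('-')       # '---': one dash becomes data
                at_start = False
            elif pending == '--!':
                data.append('--!')     # '--!-': commit '--!', restart with '-'
                pending = '-'
                at_start = False
            else:
                pending += '-'
        elif ch == '!' and pending == '--':
            pending = '--!'
        else:
            data.append(pending + ch)  # pending text was ordinary data after all
            pending = ''
            at_start = False
    # End of input: a trailing '-', '--' or '--!' is dropped, not appended.
    return ''.join(data), False


print(scan_comment('!>'))      # body of '<!--!>'
print(scan_comment('abc--!'))  # unclosed comment ending in '--!'
```

Under this reading, '<!--!>' and '<!---!>' stay open because '!' only participates in a terminator after two dashes, and any trailing '-', '--' or '--!' at end of input is left out of the data.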

@Privat33r-dev
Contributor Author

> I completely forgot about this issue and this PR and created #135664.
>
> The part related to unclosed comments was already included in #135464.
>
> There are a few flaws in this PR.
>
>   • <!--!> and <!---!> should not be parsed as abruptly ended empty comments. --!> does not work here.
>   • When unclosed comment ends with -, -- or --! just before EOF, they should not be appended to the comment token's data.
>
> You can try to fix these issues in this PR, but I'll just add you as co-author of #135664 -- this will be simpler.

Thanks for the deep dive. I double-checked the specs and can confirm that you are right. I think we can just go with your PR, as that would be much simpler. Thanks for the honor of co-authorship :)
