HTMLParser stops parsing upon encountering `<style>` tag · Issue #118350 · python/cpython · GitHub
Open
@savchenko

Bug report

Bug description:

An example where parsing stops after the <style color="red"> tag:

from html.parser import HTMLParser
from io import StringIO

class HTML2text(HTMLParser):
    def __init__(self):
        super().__init__()
        self.data = StringIO()
    def handle_data(self, html):
        self.data.write(html)
    def get_data(self):
        return self.data.getvalue().strip()

html_test = '''
<!DOCTYPE html>
<head><title>Glued</title></head><body><some><style color="red">title</bar>
<h1>Spacious             </h1><a href="https://heading.net">heading.net</a>
<span>not<a href="https://www.arpa.home">my.home.arpa</a><p>        URL</p>
</body></html>
'''

parser = HTML2text()
parser.feed(html_test)
print(parser.get_data())

Changing a single character in the word "style" restores normal behavior.
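
A possible explanation, not confirmed in this report: `html.parser` lists `style` (together with `script`) in `HTMLParser.CDATA_CONTENT_ELEMENTS` and treats everything up to a matching `</style>` as raw text rather than markup. Since the example never closes the element, later tags would not be reported as tags. The sketch below compares the start tags seen for the two spellings; `TagRecorder`, `start_tags`, and the shortened markup are hypothetical names and inputs introduced only for illustration.

```python
from html.parser import HTMLParser


class TagRecorder(HTMLParser):
    """Hypothetical helper: records the name of every start tag the parser reports."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def start_tags(markup):
    """Feed markup to a fresh TagRecorder and return the start tags it saw."""
    recorder = TagRecorder()
    recorder.feed(markup)
    return recorder.tags


# Same shape as the report: a <style> element that is never closed.
with_style = '<body><some><style color="red">title</bar>\n<h1>Spacious</h1><p>URL</p></body>'
# One character changed ("stile" is an arbitrary stand-in, not a real element).
with_stile = with_style.replace("<style", "<stile")

print(start_tags(with_style))  # tags reported when <style> is present
print(start_tags(with_stile))  # tags reported after the one-character change
```

If the assumption holds, `h1` and `p` appear only in the second list, which would match the observation that renaming the tag restores normal behavior.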

CPython versions tested on:

3.11

Operating systems tested on:

Linux

Labels:

type-bug
