Merge branch 'master' into kuvandjiev/master · html5lib/html5lib-python@9a5b127
Commit 9a5b127

Merge branch 'master' into kuvandjiev/master
2 parents dae6201 + f0bb2a6 commit 9a5b127

File tree

94 files changed

+9670
-1197
lines changed


.appveyor.yml

Lines changed: 2 additions & 8 deletions
@@ -1,21 +1,18 @@
-# To activate, change the Appveyor settings to use `.appveyor.yml`.
+image: Visual Studio 2019
 environment:
   global:
     PATH: "C:\\Python27\\Scripts\\;%PATH%"
-    PYTEST_COMMAND: "coverage run -m pytest"
   matrix:
     - TOXENV: py27-base
     - TOXENV: py27-optional
-    - TOXENV: py34-base
-    - TOXENV: py34-optional
     - TOXENV: py35-base
     - TOXENV: py35-optional
     - TOXENV: py36-base
     - TOXENV: py36-optional

 install:
   - git submodule update --init --recursive
-  - python -m pip install tox codecov
+  - python -m pip install tox

 build: off

@@ -24,6 +21,3 @@ test_script:

 after_test:
   - python debug-info.py
-
-on_success:
-  - codecov

.github/workflows/python-tox.yml

Lines changed: 65 additions & 0 deletions
@@ -0,0 +1,65 @@
+on: [pull_request, push]
+jobs:
+  build:
+    # Prevent duplicate builds for 'internal' pull requests on existing commits
+    # Credit: https://github.community/t/duplicate-checks-on-push-and-pull-request-simultaneous-event/18012
+    if: github.event.push || github.event.pull_request.head.repo.full_name != github.repository
+    strategy:
+      fail-fast: false
+      matrix:
+        # 2.7, 3.5, and 3.6 run on Windows via AppVeyor
+        python: ["3.7", "3.8", "3.9", "3.10", "3.11"]
+        os: [ubuntu-latest, windows-latest]
+        deps: [base, optional]
+        include:
+          - python: "pypy-2.7"
+            os: ubuntu-latest
+            deps: base
+          - python: "pypy-3.8"
+            os: ubuntu-latest
+            deps: base
+          - python: "2.7"
+            os: ubuntu-latest
+            deps: oldest
+          - python: "3.7"
+            os: ubuntu-latest
+            deps: oldest
+    runs-on: ${{ matrix.os }}
+    steps:
+      - uses: actions/checkout@v3
+        with:
+          submodules: true
+      - if: ${{ matrix.deps == 'base' }}
+        uses: actions/setup-python@v4
+        with:
+          python-version: ${{ matrix.python }}
+          cache: pip
+          cache-dependency-path: |
+            requirements.txt
+            requirements-test.txt
+      - if: ${{ matrix.deps == 'optional' }}
+        uses: actions/setup-python@v4
+        with:
+          python-version: ${{ matrix.python }}
+          cache: pip
+          cache-dependency-path: |
+            requirements.txt
+            requirements-optional.txt
+            requirements-test.txt
+      - if: ${{ matrix.deps == 'oldest' }}
+        uses: actions/setup-python@v4
+        with:
+          python-version: ${{ matrix.python }}
+          cache: pip
+          cache-dependency-path: |
+            requirements-oldest.txt
+      - if: ${{ matrix.os == 'windows-latest' }}
+        name: Determine environment name for Tox (PowerShell)
+        run: python toxver.py ${{ matrix.python }} ${{ matrix.deps }} >> $env:GITHUB_ENV
+      - if: ${{ matrix.os == 'ubuntu-latest' }}
+        name: Determine environment name for Tox (Bash)
+        run: python toxver.py ${{ matrix.python }} ${{ matrix.deps }} >> $GITHUB_ENV
+      - run: pip install tox
+      - run: tox
+      - if: ${{ always() }}
+        run: python debug-info.py
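
The two "Determine environment name for Tox" steps append the output of `toxver.py` to the `GITHUB_ENV` file, so the script has to print a `NAME=value` line for the later `tox` step to pick up. `toxver.py` itself is not part of this excerpt. The following is only a minimal sketch of what such a helper could look like, assuming it emits `TOXENV` names in the same `py27-base` style that `.appveyor.yml` uses; the function name `tox_env_line` and the PyPy naming are hypothetical.

```python
def tox_env_line(python, deps):
    """Return a ``TOXENV=...`` line for one matrix entry (hypothetical helper)."""
    if python.startswith("pypy-"):
        # Assumption: "pypy-2.7" maps to "pypy" and "pypy-3.8" maps to "pypy3".
        major = python[len("pypy-"):].split(".")[0]
        prefix = "pypy" if major == "2" else "pypy" + major
    else:
        # "3.10" becomes "py310", "2.7" becomes "py27".
        prefix = "py" + python.replace(".", "")
    return "TOXENV=%s-%s" % (prefix, deps)


# Print one line per sample matrix entry, as the workflow would append to GITHUB_ENV.
for python, deps in [("3.7", "base"), ("3.10", "optional"), ("pypy-3.8", "base")]:
    print(tox_env_line(python, deps))
```

Because the value is written to `GITHUB_ENV` rather than exported in the same step, it only becomes visible to the steps that follow, which is why the workflow computes it in a step of its own.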

.pytest.expect

Lines changed: 149 additions & 116 deletions
Large diffs are not rendered by default.

.travis.yml

Lines changed: 0 additions & 31 deletions
This file was deleted.

AUTHORS.rst

Lines changed: 1 addition & 1 deletion
@@ -4,7 +4,7 @@ Credits
 ``html5lib`` is written and maintained by:

 - James Graham
-- Geoffrey Sneddon
+- Sam Sneddon
 - Łukasz Langa
 - Will Kahn-Greene

CHANGES.rst

Lines changed: 45 additions & 2 deletions
@@ -1,6 +1,49 @@
 Change Log
 ----------

+1.2
+~~~
+
+Unreleased yet
+
+Features:
+
+* Add support for the ``<wbr>`` element in the sanitizer, `which indicates
+  a line break opportunity <https://html.spec.whatwg.org/#the-wbr-element>`_.
+  This element is allowed by default. (#395) (Thank you, Tom Most!)
+* Add support for serializing the ``<ol reversed>`` boolean attribute. (Thank
+  you, Tom Most!) (#396)
+* The ``<ol reversed>`` and ``<ol start>`` attributes are now permitted by the
+  sanitizer. (#321) (Thank you, Tom Most!)
+
+Bug fixes:
+
+* The sanitizer now permits ``<summary>`` tags. It used to allow ``<details>``
+  already. (#423)
+
+1.1
+~~~
+
+Released on June 23, 2020
+
+Breaking changes:
+
+* Drop support for Python 3.3. (#358)
+* Drop support for Python 3.4. (#421)
+
+Deprecations:
+
+* Deprecate the ``html5lib`` sanitizer (``html5lib.serialize(sanitize=True)`` and
+  ``html5lib.filters.sanitizer``). We recommend users migrate to `Bleach
+  <https://github.com/mozilla/bleach>`. Please let us know if Bleach doesn't suffice for your
+  use. (#443)
+
+Other changes:
+
+* Try to import from ``collections.abc`` to remove DeprecationWarning and ensure
+  ``html5lib`` keeps working in future Python versions. (#403)
+* Drop optional ``datrie`` dependency. (#442)
+
 1.0.1
 ~~~~~

@@ -20,7 +63,7 @@ Features:
 * Support Python 3.6. (#333) (Thank you, Jon Dufresne!)
 * Add CI support for Windows using AppVeyor. (Thank you, John Vandenberg!)
 * Improve testing and CI and add code coverage (#323, #334), (Thank you, Jon
-  Dufresne, John Vandenberg, Geoffrey Sneddon, Will Kahn-Greene!)
+  Dufresne, John Vandenberg, Sam Sneddon, Will Kahn-Greene!)
 * Semver-compliant version number.

 Bug fixes:
@@ -73,7 +116,7 @@ Released on July 14, 2016
   tested, doesn't entirely work, and as far as I can tell is
   completely unused by anyone.**

-* Move testsuite to ``py.test``.
+* Move testsuite to ``pytest``.

 * **Fix #124: move to webencodings for decoding the input byte stream;
   this makes html5lib compliant with the Encoding Standard, and
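
The 1.1 entry above for #403 describes importing from ``collections.abc`` first so that the ``DeprecationWarning`` goes away while older interpreters keep working. The snippet below is a sketch of the usual shape of that pattern, not necessarily the exact code used in html5lib; the helper name `is_mapping` is hypothetical and exists only to exercise the import.

```python
try:
    # Python 3.3+: the abstract base classes live in collections.abc.
    from collections.abc import Mapping
except ImportError:
    # Python 2.7 fallback: the same classes were exposed from collections.
    from collections import Mapping


def is_mapping(obj):
    """Return True if obj implements the Mapping interface (hypothetical helper)."""
    return isinstance(obj, Mapping)


print(is_mapping({"a": 1}))
print(is_mapping([1, 2]))
```

Trying the new location first means the deprecated alias is never touched on interpreters that provide `collections.abc`, so no warning is emitted there.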

CONTRIBUTING.rst

Lines changed: 1 addition & 1 deletion
@@ -16,7 +16,7 @@ documentation. Some useful information:
 - We keep the master branch passing all tests at all times on all
   supported versions.

-`Travis CI <https://travis-ci.org/html5lib/html5lib-python/>`_ is run
+`GitHub Actions <https://github.com/html5lib/html5lib-python/actions>`_ is run
 against all pull requests and should enforce all of the above.

 We use `Opera Critic <https://critic.hoppipolla.co.uk/>`_ as an external

README.rst

Lines changed: 13 additions & 13 deletions
@@ -1,9 +1,8 @@
 html5lib
 ========

-.. image:: https://travis-ci.org/html5lib/html5lib-python.svg?branch=master
-    :target: https://travis-ci.org/html5lib/html5lib-python
-
+.. image:: https://github.com/html5lib/html5lib-python/actions/workflows/python-tox.yml/badge.svg
+    :target: https://github.com/html5lib/html5lib-python/actions/workflows/python-tox.yml

 html5lib is a pure-python library for parsing HTML. It is designed to
 conform to the WHATWG HTML specification, as is implemented by all major
@@ -91,23 +90,22 @@ More documentation is available at https://html5lib.readthedocs.io/.
 Installation
 ------------

-html5lib works on CPython 2.7+, CPython 3.4+ and PyPy. To install it,
-use:
+html5lib works on CPython 2.7+, CPython 3.5+ and PyPy. To install:

 .. code-block:: bash

     $ pip install html5lib

+The goal is to support a (non-strict) superset of the versions that `pip
+supports
+<https://pip.pypa.io/en/stable/installing/#python-and-os-compatibility>`_.

 Optional Dependencies
 ---------------------

 The following third-party libraries may be used for additional
 functionality:

-- ``datrie`` can be used under CPython to improve parsing performance
-  (though in almost all cases the improvement is marginal);
-
 - ``lxml`` is supported as a tree format (for both building and
   walking) under CPython (but *not* PyPy where it is known to cause
   segfaults);
@@ -129,7 +127,7 @@ Tests
 -----

 Unit tests require the ``pytest`` and ``mock`` libraries and can be
-run using the ``py.test`` command in the root directory.
+run using the ``pytest`` command in the root directory.

 Test data are contained in a separate `html5lib-tests
 <https://github.com/html5lib/html5lib-tests>`_ repository and included
@@ -146,7 +144,9 @@ which can be found on PyPI.
 Questions?
 ----------

-There's a mailing list available for support on Google Groups,
-`html5lib-discuss <http://groups.google.com/group/html5lib-discuss>`_,
-though you may get a quicker response asking on IRC in `#whatwg on
-irc.freenode.net <http://wiki.whatwg.org/wiki/IRC>`_.
+Check out `the docs <https://html5lib.readthedocs.io/en/latest/>`_. Still
+need help? Go to our `GitHub Discussions
+<https://github.com/html5lib/html5lib-python/discussions>`_.
+
+You can also browse the archives of the `html5lib-discuss mailing list
+<https://www.mail-archive.com/html5lib-discuss@googlegroups.com/>`_.

benchmarks/bench_html.py

Lines changed: 57 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,57 @@
1+
import io
2+
import os
3+
import sys
4+
5+
import pyperf
6+
7+
sys.path[0:0] = [os.path.join(os.path.dirname(__file__), "..")]
8+
import html5lib # noqa: E402
9+
10+
11+
def bench_parse(fh, treebuilder):
12+
fh.seek(0)
13+
html5lib.parse(fh, treebuilder=treebuilder, useChardet=False)
14+
15+
16+
def bench_serialize(loops, fh, treebuilder):
17+
fh.seek(0)
18+
doc = html5lib.parse(fh, treebuilder=treebuilder, useChardet=False)
19+
20+
range_it = range(loops)
21+
t0 = pyperf.perf_counter()
22+
23+
for loops in range_it:
24+
html5lib.serialize(doc, tree=treebuilder, encoding="ascii", inject_meta_charset=False)
25+
26+
return pyperf.perf_counter() - t0
27+
28+
29+
BENCHMARKS = ["parse", "serialize"]
30+
31+
32+
def add_cmdline_args(cmd, args):
33+
if args.benchmark:
34+
cmd.append(args.benchmark)
35+
36+
37+
if __name__ == "__main__":
38+
runner = pyperf.Runner(add_cmdline_args=add_cmdline_args)
39+
runner.metadata["description"] = "Run benchmarks based on Anolis"
40+
runner.argparser.add_argument("benchmark", nargs="?", choices=BENCHMARKS)
41+
42+
args = runner.parse_args()
43+
if args.benchmark:
44+
benchmarks = (args.benchmark,)
45+
else:
46+
benchmarks = BENCHMARKS
47+
48+
with open(os.path.join(os.path.dirname(__file__), "data", "html.html"), "rb") as fh:
49+
source = io.BytesIO(fh.read())
50+
51+
if "parse" in benchmarks:
52+
for tb in ("etree", "dom", "lxml"):
53+
runner.bench_func("html_parse_%s" % tb, bench_parse, source, tb)
54+
55+
if "serialize" in benchmarks:
56+
for tb in ("etree", "dom", "lxml"):
57+
runner.bench_time_func("html_serialize_%s" % tb, bench_serialize, source, tb)
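
`bench_serialize` is registered with `bench_time_func` rather than `bench_func` because it does its own timing: the document is parsed once as setup, and only the serialize loop falls between the two clock readings. The sketch below shows the same shape using only the standard library, with `time.perf_counter` in place of `pyperf.perf_counter` and `sorted` as a stand-in workload; the name `time_sorted` is hypothetical.

```python
import time


def time_sorted(loops, data):
    """Time `loops` runs of sorted(data) and return the total elapsed seconds."""
    # Setup happens here, outside the timed region.
    range_it = range(loops)
    t0 = time.perf_counter()

    # Only this loop is measured.
    for _ in range_it:
        sorted(data)

    return time.perf_counter() - t0


loops = 1000
elapsed = time_sorted(loops, list(range(100, 0, -1)))
print("%.6f s total, %.9f s per loop" % (elapsed, elapsed / loops))
```

A function of this kind takes the loop count as its first argument and returns the elapsed time, which is the contract `bench_time_func` expects of its callback.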

benchmarks/bench_wpt.py

Lines changed: 45 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,45 @@
1+
import io
2+
import os
3+
import sys
4+
5+
import pyperf
6+
7+
sys.path[0:0] = [os.path.join(os.path.dirname(__file__), "..")]
8+
import html5lib # noqa: E402
9+
10+
11+
def bench_html5lib(fh):
12+
fh.seek(0)
13+
html5lib.parse(fh, treebuilder="etree", useChardet=False)
14+
15+
16+
def add_cmdline_args(cmd, args):
17+
if args.benchmark:
18+
cmd.append(args.benchmark)
19+
20+
21+
BENCHMARKS = {}
22+
for root, dirs, files in os.walk(os.path.join(os.path.dirname(os.path.abspath(__file__)), "data", "wpt")):
23+
for f in files:
24+
if f.endswith(".html"):
25+
BENCHMARKS[f[: -len(".html")]] = os.path.join(root, f)
26+
27+
28+
if __name__ == "__main__":
29+
runner = pyperf.Runner(add_cmdline_args=add_cmdline_args)
30+
runner.metadata["description"] = "Run parser benchmarks from WPT"
31+
runner.argparser.add_argument("benchmark", nargs="?", choices=sorted(BENCHMARKS))
32+
33+
args = runner.parse_args()
34+
if args.benchmark:
35+
benchmarks = (args.benchmark,)
36+
else:
37+
benchmarks = sorted(BENCHMARKS)
38+
39+
for bench in benchmarks:
40+
name = "wpt_%s" % bench
41+
path = BENCHMARKS[bench]
42+
with open(path, "rb") as fh:
43+
fh2 = io.BytesIO(fh.read())
44+
45+
runner.bench_func(name, bench_html5lib, < 3322 span class=pl-s1>fh2)
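
The module-level loop in `bench_wpt.py` builds `BENCHMARKS` by walking `data/wpt` and keying each `.html` file by its name without the extension. The following sketch reproduces that discovery step against a temporary directory so it can be run anywhere; the function name `discover` and the sample file names are hypothetical.

```python
import os
import tempfile


def discover(root_dir, suffix=".html"):
    """Map each file stem under root_dir to its full path (hypothetical helper)."""
    found = {}
    for root, dirs, files in os.walk(root_dir):
        for f in files:
            if f.endswith(suffix):
                # Strip the suffix to get the benchmark name.
                found[f[: -len(suffix)]] = os.path.join(root, f)
    return found


with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "sub"))
    for rel in ("a.html", os.path.join("sub", "b.html"), "notes.txt"):
        with open(os.path.join(tmp, rel), "w") as fh:
            fh.write("<!doctype html>")
    benchmarks = discover(tmp)
    print(sorted(benchmarks))
```

Since only the file stem is used as the key, two files with the same name in different subdirectories would collide and the later one would win. Whether that can happen depends on the contents of `data/wpt`, which this excerpt does not show.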
