From de70ae4ca71b15bc8c77f54c4e60ed345214c4df Mon Sep 17 00:00:00 2001
From: inikulin <ifaaan@gmail.com>
Date: Fri, 23 Jun 2017 21:50:04 +0300
Subject: [PATCH 01/68] Fix malformed JSON from previous commit

---
 tokenizer/entities.test | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tokenizer/entities.test b/tokenizer/entities.test
index 1daff254..7c514563 100644
--- a/tokenizer/entities.test
+++ b/tokenizer/entities.test
@@ -17,14 +17,14 @@
 
 {"description": "Semicolonless named entity 'not' followed by 'i;' in body",
 "input":"&noti;",
-"output": [["Character", "\u00ACi;"]]},
+"output": [["Character", "\u00ACi;"]],
 "errors":[
     { "code": "missing-semicolon-after-character-reference", "line": 1, "col": 5 }
 ]},
 
 {"description": "Very long undefined named entity in body",
 "input":"&ammmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmp;",
-"output": [["Character", "&ammmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmp;"]]},
+"output": [["Character", "&ammmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmp;"]],
 "errors":[
     { "code": "unknown-named-character-reference", "line": 1, "col": 950 }
 ]},

From a5c88a483e4f643a5446ecca579ce344e6bd6d8a Mon Sep 17 00:00:00 2001
From: Ingvar Stepanyan <me@rreverser.com>
Date: Wed, 12 Jul 2017 16:53:16 +0100
Subject: [PATCH 02/68] Remove `ignoreErrorOrder` option from docs

It is no longer used after the changes in #92.
---
 tokenizer/README.md | 8 --------
 1 file changed, 8 deletions(-)

diff --git a/tokenizer/README.md b/tokenizer/README.md
index 56956369..50ba680f 100644
--- a/tokenizer/README.md
+++ b/tokenizer/README.md
@@ -84,14 +84,6 @@ If `test.doubleEscaped` is present and `true`, then every string within
 `test.output` must be further unescaped (as described above) before
 comparing with the tokenizer's output.
 
-`test.ignoreErrorOrder` is a boolean value indicating that the order of
-`ParseError` tokens relative to other tokens in the output stream is
-unimportant, and implementations should ignore such differences between
-their output and `expected_output_tokens`. (This is used for errors
-emitted by the input stream preprocessing stage, since it is useful to
-test that code but it is undefined when the errors occur). If it is
-omitted, it defaults to `false`.
-
 xmlViolation tests
 ------------------
 

From 8e19e7ad29473842154977d7624aee0097a6def2 Mon Sep 17 00:00:00 2001
From: Ingvar Stepanyan <me@rreverser.com>
Date: Mon, 17 Jul 2017 15:56:04 +0100
Subject: [PATCH 03/68] Concatenate character tokens

It looks like a few places were missed when the ParseError token type was removed.

This PR fixes them to restore the state promised in the README:

> All adjacent character tokens are coalesced into a single ["Character", data] token.
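
As an illustration of the coalescing rule quoted above, here is a minimal
sketch of how a test harness might normalize its own tokenizer output before
comparing it with `test.output`. The function name `coalesce_characters` is
hypothetical and not part of this repository; it only assumes tokens shaped
like the JSON arrays used in the `.test` files.

```python
def coalesce_characters(tokens):
    """Merge runs of adjacent ["Character", data] tokens into one token.

    Hypothetical helper: `tokens` is a list of JSON-style token arrays,
    e.g. [["Character", "a"], ["Comment", "x"]].
    """
    merged = []
    for token in tokens:
        previous_is_character = bool(merged) and merged[-1][0] == "Character"
        if token[0] == "Character" and previous_is_character:
            # Append this token's data to the preceding Character token.
            merged[-1] = ["Character", merged[-1][1] + token[1]]
        else:
            # Copy so the caller's token lists are never mutated.
            merged.append(list(token))
    return merged
```

Applied to the old expectation for "Entity without trailing semicolon (1)",
this turns the two Character tokens into the single token that this patch
now expects.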
---
 tokenizer/test1.test                   | 4 ++--
 tokenizer/test2.test                   | 6 +++---
 tokenizer/test3.test                   | 4 ++--
 tokenizer/test4.test                   | 6 +++---
 tokenizer/unicodeCharsProblematic.test | 4 ++--
 5 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/tokenizer/test1.test b/tokenizer/test1.test
index 09d15024..8b85050f 100644
--- a/tokenizer/test1.test
+++ b/tokenizer/test1.test
@@ -182,14 +182,14 @@
 
 {"description":"Entity without trailing semicolon (1)",
 "input":"I'm &notit",
-"output":[["Character","I'm "], ["Character", "\u00ACit"]],
+"output":[["Character","I'm \u00ACit"]],
 "errors": [
     {"code" : "missing-semicolon-after-character-reference", "line": 1, "col": 9 }
 ]},
 
 {"description":"Entity without trailing semicolon (2)",
 "input":"I'm &notin",
-"output":[["Character","I'm "], ["Character", "\u00ACin"]],
+"output":[["Character","I'm \u00ACin"]],
 "errors": [
     {"code" : "missing-semicolon-after-character-reference", "line": 1, "col": 9 }
 ]},
diff --git a/tokenizer/test2.test b/tokenizer/test2.test
index 73f0421d..521694ca 100644
--- a/tokenizer/test2.test
+++ b/tokenizer/test2.test
@@ -119,7 +119,7 @@
 
 {"description":"Hexadecimal entity pair representing a surrogate pair",
 "input":"&#xD869;&#xDED6;",
-"output":[["Character", "\uFFFD"], ["Character", "\uFFFD"]],
+"output":[["Character", "\uFFFD\uFFFD"]],
 "errors":[
     { "code": "surrogate-character-reference", "line": 1, "col": 9 },
     { "code": "surrogate-character-reference", "line": 1, "col": 17 }
@@ -195,7 +195,7 @@
 
 {"description":"Unescaped <",
 "input":"foo < bar",
-"output":[["Character", "foo "], ["Character", "< bar"]],
+"output":[["Character", "foo < bar"]],
 "errors":[
     { "code": "invalid-first-character-of-tag-name", "line": 1, "col": 6 }
 ]},
@@ -242,7 +242,7 @@
 
 {"description":"Empty end tag with following characters",
 "input":"a</>bc",
-"output":[["Character", "a"], ["Character", "bc"]],
+"output":[["Character", "abc"]],
 "errors":[
     { "code": "missing-end-tag-name", "line": 1, "col": 4 }
 ]},
diff --git a/tokenizer/test3.test b/tokenizer/test3.test
index ba3c15b3..85139d4d 100644
--- a/tokenizer/test3.test
+++ b/tokenizer/test3.test
@@ -88,7 +88,7 @@
 
 {"description":"<\\u0000",
 "input":"<\u0000",
-"output":[["Character", "<"], ["Character", "\u0000"]],
+"output":[["Character", "<\u0000"]],
 "errors":[
     { "code": "invalid-first-character-of-tag-name", "line": 1, "col": 2 },
     { "code": "unexpected-null-character", "line": 1, "col": 2 }
@@ -8415,7 +8415,7 @@
 
 {"description":"<<",
 "input":"<<",
-"output":[["Character", "<"], ["Character", "<"]],
+"output":[["Character", "<<"]],
 "errors":[
     { "code": "invalid-first-character-of-tag-name", "line": 1, "col": 2 },
     { "code": "eof-before-tag-name", "line": 1, "col": 3 }
diff --git a/tokenizer/test4.test b/tokenizer/test4.test
index 8e55e767..dd247d54 100644
--- a/tokenizer/test4.test
+++ b/tokenizer/test4.test
@@ -190,7 +190,7 @@
 
 {"description":"Empty hex numeric entities",
 "input":"&#x &#X ",
-"output":[["Character", "&#x "], ["Character", "&#X "]],
+"output":[["Character", "&#x &#X "]],
 "errors":[
     { "code": "absence-of-digits-in-numeric-character-reference", "line": 1, "col": 4 },
     { "code": "absence-of-digits-in-numeric-character-reference", "line": 1, "col": 8 }
@@ -205,7 +205,7 @@
 
 {"description":"Empty decimal numeric entities",
 "input":"&# &#; ",
-"output":[["Character", "&# "], ["Character", "&#; "]],
+"output":[["Character", "&# &#; "]],
 "errors":[
     { "code": "absence-of-digits-in-numeric-character-reference", "line": 1, "col": 3 },
     { "code": "absence-of-digits-in-numeric-character-reference", "line": 1, "col": 6 }
@@ -274,7 +274,7 @@
 
 {"description":"Surrogate code point edge cases",
 "input":"&#xD7FF;&#xD800;&#xD801;&#xDFFE;&#xDFFF;&#xE000;",
-"output":[["Character", "\uD7FF"], ["Character", "\uFFFD"], ["Character", "\uFFFD"], ["Character", "\uFFFD"], ["Character", "\uFFFD\uE000"]],
+"output":[["Character", "\uD7FF\uFFFD\uFFFD\uFFFD\uFFFD\uE000"]],
 "errors":[
     { "code": "surrogate-character-reference", "line": 1, "col": 17 },
     { "code": "surrogate-character-reference", "line": 1, "col": 25 },
diff --git a/tokenizer/unicodeCharsProblematic.test b/tokenizer/unicodeCharsProblematic.test
index 346cad17..3ddb96c0 100644
--- a/tokenizer/unicodeCharsProblematic.test
+++ b/tokenizer/unicodeCharsProblematic.test
@@ -18,7 +18,7 @@
 {"description": "Invalid Unicode character U+DFFF with valid preceding character",
 "doubleEscaped":true,
 "input": "a\\uDFFF",
-"output":[["Character", "a"], ["Character", "\\uDFFF"]],
+"output":[["Character", "a\\uDFFF"]],
 "errors":[
     { "code": "surrogate-in-input-stream", "line": 1, "col": 2 }
 ]},
@@ -33,7 +33,7 @@
 
 {"description":"CR followed by U+0000",
 "input":"\r\u0000",
-"output":[["Character", "\n"], ["Character", "\u0000"]],
+"output":[["Character", "\n\u0000"]],
 "errors":[
     { "code": "unexpected-null-character", "line": 2, "col": 1 }
 ]}

From 9314ef76ec48af7fe89aba23e754d47df6bb8a4b Mon Sep 17 00:00:00 2001
From: Ingvar Stepanyan <me@rreverser.com>
Date: Tue, 25 Jul 2017 22:36:23 +0100
Subject: [PATCH 04/68] Add a list of currently allowed initial states (#101)

Fixes #99
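
The README text changed by this patch says a test is run once per entry in
`test.initialStates`, defaulting to `["Data state"]`. A minimal sketch of
that loop follows, assuming a hypothetical `tokenize(input, initial_state,
last_start_tag)` callable supplied by the implementation under test; the
names `initial_states_for` and `run_test` are illustrative only.

```python
# State names as listed in the README by this patch.
ALLOWED_INITIAL_STATES = (
    "Data state",
    "PLAINTEXT state",
    "RCDATA state",
    "RAWTEXT state",
    "Script data state",
    "CDATA section state",
)


def initial_states_for(test):
    """Return the initial states for a test, applying the documented default."""
    states = test.get("initialStates", ["Data state"])
    unknown = [state for state in states if state not in ALLOWED_INITIAL_STATES]
    if unknown:
        raise ValueError("unknown initial state(s): %r" % (unknown,))
    return states


def run_test(test, tokenize):
    """Run `test` once per initial state.

    `tokenize` is a hypothetical callable taking (input, initial_state,
    last_start_tag) and returning a list of JSON-style tokens.
    Returns a list of (state, passed) pairs.
    """
    results = []
    for state in initial_states_for(test):
        tokens = tokenize(test["input"], state, test.get("lastStartTag"))
        results.append((state, tokens == test["output"]))
    return results
```

The sketch compares output tokens only; checking `test.errors` is left out
to keep it short.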
---
 tokenizer/README.md | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/tokenizer/README.md b/tokenizer/README.md
index 50ba680f..66b81e8f 100644
--- a/tokenizer/README.md
+++ b/tokenizer/README.md
@@ -45,9 +45,18 @@ into the corresponding Unicode code point. (Note that this option also
 affects the interpretation of `test.output`.)
 
 `test.initialStates` is a list of strings, each being the name of a
-tokenizer state. The test should be run once for each string, using it
+tokenizer state, which can be one of the following:
+
+-   `Data state`
+-   `PLAINTEXT state`
+-   `RCDATA state`
+-   `RAWTEXT state`
+-   `Script data state`
+-   `CDATA section state`
+
+The test should be run once for each string, using it
 to set the tokenizer's initial state for that run. If
-`test.initialStates` is omitted, it defaults to `["data state"]`.
+`test.initialStates` is omitted, it defaults to `["Data state"]`.
 
 `test.lastStartTag` is a lowercase string that should be used as "the
 tag name of the last start tag to have been emitted from this

From cbafeba94586a1ade00d55e600fc52da8f849986 Mon Sep 17 00:00:00 2001
From: Simon Pieters <zcorpan@gmail.com>
Date: Tue, 22 Aug 2017 11:34:03 +0200
Subject: [PATCH 05/68] Test U+0000 in bogus comment and bogus doctype states

Follows https://github.com/whatwg/html/pull/2939
---
 tokenizer/test3.test                    | 161 +++++++++++++++++++++---
 tokenizer/test4.test                    |   3 +-
 tree-construction/plain-text-unsafe.dat | Bin 9291 -> 9388 bytes
 3 files changed, 148 insertions(+), 16 deletions(-)

diff --git a/tokenizer/test3.test b/tokenizer/test3.test
index 85139d4d..cb04d037 100644
--- a/tokenizer/test3.test
+++ b/tokenizer/test3.test
@@ -141,7 +141,8 @@
 "input":"<!\u0000",
 "output":[["Comment", "\uFFFD"]],
 "errors":[
-    { "code": "incorrectly-opened-comment", "line": 1, "col": 3 }
+    { "code": "incorrectly-opened-comment", "line": 1, "col": 3 },
+    { "code": "unexpected-null-character", "line": 1, "col": 3 }
 ]},
 
 {"description":"<!\\u0009",
@@ -180,6 +181,14 @@
     { "code": "incorrectly-opened-comment", "line": 1, "col": 3 }
 ]},
 
+{"description":"<! \\u0000",
+"input":"<! \u0000",
+"output":[["Comment", " \uFFFD"]],
+"errors":[
+    { "code": "incorrectly-opened-comment", "line": 1, "col": 3 },
+    { "code": "unexpected-null-character", "line": 1, "col": 4 }
+]},
+
 {"description":"<!!",
 "input":"<!!",
 "output":[["Comment", "!"]],
@@ -1887,7 +1896,8 @@
 "input":"<!DOCTYPE a \u0000",
 "output":[["DOCTYPE", "a", null, null, false]],
 "errors":[
-    { "code": "invalid-character-sequence-after-doctype-name", "line": 1, "col": 13 }
+    { "code": "invalid-character-sequence-after-doctype-name", "line": 1, "col": 13 },
+    { "code": "unexpected-null-character", "line": 1, "col": 13 }
 ]},
 
 {"description":"<!DOCTYPE a \\u0008",
@@ -2069,7 +2079,8 @@
 "input":"<!DOCTYPE a PUBLIC\u0000",
 "output":[["DOCTYPE", "a", null, null, false]],
 "errors": [
-    { "code": "missing-quote-before-doctype-public-identifier", "col": 19, "line": 1 }
+    { "code": "missing-quote-before-doctype-public-identifier", "col": 19, "line": 1 },
+    { "code": "unexpected-null-character", "line": 1, "col": 19 }
 ]},
 
 {"description":"<!DOCTYPE a PUBLIC\\u0008",
@@ -2213,6 +2224,24 @@
     { "code": "eof-in-doctype", "col": 21, "line": 1 }
 ]},
 
+{"description":"<!DOCTYPE a PUBLIC\"\"\\u0000",
+"input":"<!DOCTYPE a PUBLIC\"\"\u0000",
+"output":[["DOCTYPE", "a", "", null, false]],
+"errors": [
+    { "code": "missing-whitespace-after-doctype-public-keyword", "col": 19, "line": 1 },
+    { "code": "missing-quote-before-doctype-system-identifier", "col": 21, "line": 1 },
+    { "code": "unexpected-null-character", "col": 21, "line": 1 }
+]},
+
+{"description":"<!DOCTYPE a PUBLIC\"\" \\u0000",
+"input":"<!DOCTYPE a PUBLIC\"\" \u0000",
+"output":[["DOCTYPE", "a", "", null, false]],
+"errors": [
+    { "code": "missing-whitespace-after-doctype-public-keyword", "col": 19, "line": 1 },
+    { "code": "missing-quote-before-doctype-system-identifier", "col": 22, "line": 1 },
+    { "code": "unexpected-null-character", "col": 22, "line": 1 }
+]},
+
 {"description":"<!DOCTYPE a PUBLIC\"#",
 "input":"<!DOCTYPE a PUBLIC\"#",
 "output":[["DOCTYPE", "a", "#", null, false]],
@@ -2514,7 +2543,8 @@
 "output":[["DOCTYPE", "a", "", null, false]],
 "errors": [
     { "code": "missing-whitespace-after-doctype-public-keyword", "col": 19, "line": 1 },
-    { "code": "missing-quote-before-doctype-system-identifier", "col": 21, "line": 1 }
+    { "code": "missing-quote-before-doctype-system-identifier", "col": 21, "line": 1 },
+    { "code": "unexpected-null-character", "line": 1, "col": 21 }
 ]},
 
 {"description":"<!DOCTYPE a PUBLIC''\\u0008",
@@ -2626,6 +2656,46 @@
     { "code": "eof-in-doctype", "col": 22, "line": 1 }
 ]},
 
+{"description":"<!DOCTYPE a PUBLIC''''\\u0000",
+"input":"<!DOCTYPE a PUBLIC''''\u0000",
+"output":[["DOCTYPE", "a", "", "", true]],
+"errors": [
+    { "code": "missing-whitespace-after-doctype-public-keyword", "col": 19, "line": 1 },
+    { "code": "missing-whitespace-between-doctype-public-and-system-identifiers", "col": 21, "line": 1 },
+    { "code": "unexpected-character-after-doctype-system-identifier", "line": 1, "col": 23 },
+    { "code": "unexpected-null-character", "line": 1, "col": 23 }
+]},
+
+{"description":"<!DOCTYPE a PUBLIC''''x\\u0000",
+"input":"<!DOCTYPE a PUBLIC''''x\u0000",
+"output":[["DOCTYPE", "a", "", "", true]],
+"errors": [
+    { "code": "missing-whitespace-after-doctype-public-keyword", "col": 19, "line": 1 },
+    { "code": "missing-whitespace-between-doctype-public-and-system-identifiers", "col": 21, "line": 1 },
+    { "code": "unexpected-character-after-doctype-system-identifier", "line": 1, "col": 23 },
+    { "code": "unexpected-null-character", "line": 1, "col": 24 }
+]},
+
+{"description":"<!DOCTYPE a PUBLIC'''' \\u0000",
+"input":"<!DOCTYPE a PUBLIC'''' \u0000",
+"output":[["DOCTYPE", "a", "", "", true]],
+"errors": [
+    { "code": "missing-whitespace-after-doctype-public-keyword", "col": 19, "line": 1 },
+    { "code": "missing-whitespace-between-doctype-public-and-system-identifiers", "col": 21, "line": 1 },
+    { "code": "unexpected-character-after-doctype-system-identifier", "line": 1, "col": 24 },
+    { "code": "unexpected-null-character", "line": 1, "col": 24 }
+]},
+
+{"description":"<!DOCTYPE a PUBLIC'''' x\\u0000",
+"input":"<!DOCTYPE a PUBLIC'''' x\u0000",
+"output":[["DOCTYPE", "a", "", "", true]],
+"errors": [
+    { "code": "missing-whitespace-after-doctype-public-keyword", "col": 19, "line": 1 },
+    { "code": "missing-whitespace-between-doctype-public-and-system-identifiers", "col": 21, "line": 1 },
+    { "code": "unexpected-character-after-doctype-system-identifier", "line": 1, "col": 24 },
+    { "code": "unexpected-null-character", "line": 1, "col": 25 }
+]},
+
 {"description":"<!DOCTYPE a PUBLIC''(",
 "input":"<!DOCTYPE a PUBLIC''(",
 "output":[["DOCTYPE", "a", "", null, false]],
@@ -3142,7 +3212,24 @@
 "input":"<!DOCTYPE a SYSTEM\u0000",
 "output":[["DOCTYPE", "a", null, null, false]],
 "errors": [
-    { "code": "missing-quote-before-doctype-system-identifier", "col": 19, "line": 1 }
+    { "code": "missing-quote-before-doctype-system-identifier", "col": 19, "line": 1 },
+    { "code": "unexpected-null-character", "line": 1, "col": 19 }
+]},
+
+{"description":"<!DOCTYPE a SYSTEM \\u0000",
+"input":"<!DOCTYPE a SYSTEM \u0000",
+"output":[["DOCTYPE", "a", null, null, false]],
+"errors": [
+    { "code": "missing-quote-before-doctype-system-identifier", "col": 20, "line": 1 },
+    { "code": "unexpected-null-character", "line": 1, "col": 20 }
+]},
+
+{"description":"<!DOCTYPE a SYSTEM x\\u0000",
+"input":"<!DOCTYPE a SYSTEM x\u0000",
+"output":[["DOCTYPE", "a", null, null, false]],
+"errors": [
+    { "code": "missing-quote-before-doctype-system-identifier", "col": 20, "line": 1 },
+    { "code": "unexpected-null-character", "line": 1, "col": 21 }
 ]},
 
 {"description":"<!DOCTYPE a SYSTEM\\u0008",
@@ -3586,7 +3673,8 @@
 "output":[["DOCTYPE", "a", null, "", true]],
 "errors":[
     { "code": "missing-whitespace-after-doctype-system-keyword", "line": 1, "col": 19 },
-    { "code": "unexpected-character-after-doctype-system-identifier", "col": 21, "line": 1 }
+    { "code": "unexpected-character-after-doctype-system-identifier", "col": 21, "line": 1 },
+    { "code": "unexpected-null-character", "line": 1, "col": 21 }
 ]},
 
 {"description":"<!DOCTYPE a SYSTEM''\\u0008",
@@ -3656,6 +3744,24 @@
     { "code": "eof-in-doctype", "col": 22, "line": 1 }
 ]},
 
+{"description":"<!DOCTYPE a SYSTEM'' \\u0000",
+"input":"<!DOCTYPE a SYSTEM'' \u0000",
+"output":[["DOCTYPE", "a", null, "", true]],
+"errors":[
+    { "code": "missing-whitespace-after-doctype-system-keyword", "line": 1, "col": 19 },
+    { "code": "unexpected-character-after-doctype-system-identifier", "col": 22, "line": 1 },
+    { "code": "unexpected-null-character", "line": 1, "col": 22 }
+]},
+
+{"description":"<!DOCTYPE a SYSTEM'' x\\u0000",
+"input":"<!DOCTYPE a SYSTEM'' x\u0000",
+"output":[["DOCTYPE", "a", null, "", true]],
+"errors":[
+    { "code": "missing-whitespace-after-doctype-system-keyword", "line": 1, "col": 19 },
+    { "code": "unexpected-character-after-doctype-system-identifier", "col": 22, "line": 1 },
+    { "code": "unexpected-null-character", "line": 1, "col": 23 }
+]},
+
 {"description":"<!DOCTYPE a SYSTEM''!",
 "input":"<!DOCTYPE a SYSTEM''!",
 "output":[["DOCTYPE", "a", null, "", true]],
@@ -4217,7 +4323,8 @@
 "input":"<!DOCTYPE a a\u0000",
 "output":[["DOCTYPE", "a", null, null, false]],
 "errors":[
-    { "code": "invalid-character-sequence-after-doctype-name", "line": 1, "col": 13 }
+    { "code": "invalid-character-sequence-after-doctype-name", "line": 1, "col": 13 },
+    { "code": "unexpected-null-character", "line": 1, "col": 14 }
 ]},
 
 {"description":"<!DOCTYPE a a\\u0009",
@@ -4920,7 +5027,8 @@
 "output":[["DOCTYPE", "a", null, null, false]],
 "errors":[
     { "code": "missing-whitespace-before-doctype-name", "line": 1, "col": 10 },
-    { "code": "invalid-character-sequence-after-doctype-name", "line": 1, "col": 12 }
+    { "code": "invalid-character-sequence-after-doctype-name", "line": 1, "col": 12 },
+    { "code": "unexpected-null-character", "line": 1, "col": 12 }
 ]},
 
 {"description":"<!DOCTYPEa \\u0008",
@@ -5130,7 +5238,8 @@
 "output":[["DOCTYPE", "a", null, null, false]],
 "errors":[
     { "code": "missing-whitespace-before-doctype-name", "line": 1, "col": 10 },
-    { "code": "missing-quote-before-doctype-public-identifier", "line": 1, "col": 18 }
+    { "code": "missing-quote-before-doctype-public-identifier", "line": 1, "col": 18 },
+    { "code": "unexpected-null-character", "line": 1, "col": 18 }
 ]},
 
 {"description":"<!DOCTYPEa PUBLIC\\u0008",
@@ -5632,7 +5741,8 @@
 "errors":[
     { "code": "missing-whitespace-before-doctype-name", "line": 1, "col": 10 },
     { "code": "missing-whitespace-after-doctype-public-keyword", "line": 1, "col": 18 },
-    { "code": "missing-quote-before-doctype-system-identifier", "line": 1, "col": 20 }
+    { "code": "missing-quote-before-doctype-system-identifier", "line": 1, "col": 20 },
+    { "code": "unexpected-null-character", "line": 1, "col": 20 }
 ]},
 
 {"description":"<!DOCTYPEa PUBLIC''\\u0008",
@@ -6341,7 +6451,8 @@
 "output":[["DOCTYPE", "a", null, null, false]],
 "errors":[
     { "code": "missing-whitespace-before-doctype-name", "line": 1, "col": 10 },
-    { "code": "missing-quote-before-doctype-system-identifier", "line": 1, "col": 18 }
+    { "code": "missing-quote-before-doctype-system-identifier", "line": 1, "col": 18 },
+    { "code": "unexpected-null-character", "line": 1, "col": 18 }
 ]},
 
 {"description":"<!DOCTYPEa SYSTEM\\u0008",
@@ -6842,7 +6953,8 @@
 "errors":[
     { "code": "missing-whitespace-before-doctype-name", "line": 1, "col": 10 },
     { "code": "missing-whitespace-after-doctype-system-keyword", "line": 1, "col": 18 },
-    { "code": "unexpected-character-after-doctype-system-identifier", "line": 1, "col": 20 }
+    { "code": "unexpected-character-after-doctype-system-identifier", "line": 1, "col": 20 },
+    { "code": "unexpected-null-character", "line": 1, "col": 20 }
 ]},
 
 {"description":"<!DOCTYPEa SYSTEM''\\u0008",
@@ -7555,7 +7667,8 @@
 "output":[["DOCTYPE", "a", null, null, false]],
 "errors":[
     { "code": "missing-whitespace-before-doctype-name", "line": 1, "col": 10 },
-    { "code": "invalid-character-sequence-after-doctype-name", "line": 1, "col": 12 }
+    { "code": "invalid-character-sequence-after-doctype-name", "line": 1, "col": 12 },
+    { "code": "unexpected-null-character", "line": 1, "col": 13 }
 ]},
 
 {"description":"<!DOCTYPEa a\\u0009",
@@ -8195,7 +8308,8 @@
 "input":"</\u0000",
 "output":[["Comment", "\uFFFD"]],
 "errors":[
-    { "code": "invalid-first-character-of-tag-name", "line": 1, "col": 3 }
+    { "code": "invalid-first-character-of-tag-name", "line": 1, "col": 3 },
+    { "code": "unexpected-null-character", "line": 1, "col": 3 }
 ]},
 
 {"description":"</\\u0009",
@@ -8234,6 +8348,14 @@
     { "code": "invalid-first-character-of-tag-name", "line": 1, "col": 3 }
 ]},
 
+{"description":"</ \\u0000",
+"input":"</ \u0000",
+"output":[["Comment", " \uFFFD"]],
+"errors":[
+    { "code": "invalid-first-character-of-tag-name", "line": 1, "col": 3 },
+    { "code": "unexpected-null-character", "line": 1, "col": 4 }
+]},
+
 {"description":"</!",
 "input":"</!",
 "output":[["Comment", "!"]],
@@ -8446,7 +8568,8 @@
 "input":"<?\u0000",
 "output":[["Comment", "?\uFFFD"]],
 "errors":[
-    { "code": "unexpected-question-mark-instead-of-tag-name", "line": 1, "col": 2 }
+    { "code": "unexpected-question-mark-instead-of-tag-name", "line": 1, "col": 2 },
+    { "code": "unexpected-null-character", "line": 1, "col": 3 }
 ]},
 
 {"description":"<?\\u0009",
@@ -8485,6 +8608,14 @@
     { "code": "unexpected-question-mark-instead-of-tag-name", "line": 1, "col": 2 }
 ]},
 
+{"description":"<? \\u0000",
+"input":"<? \u0000",
+"output":[["Comment", "? \uFFFD"]],
+"errors":[
+    { "code": "unexpected-question-mark-instead-of-tag-name", "line": 1, "col": 2 },
+    { "code": "unexpected-null-character", "line": 1, "col": 4 }
+]},
+
 {"description":"<?!",
 "input":"<?!",
 "output":[["Comment", "?!"]],
diff --git a/tokenizer/test4.test b/tokenizer/test4.test
index dd247d54..77706b72 100644
--- a/tokenizer/test4.test
+++ b/tokenizer/test4.test
@@ -364,7 +364,8 @@
 "input":"<!doc\u0000",
 "output":[["Comment", "doc\uFFFD"]],
 "errors":[
-    { "code": "incorrectly-opened-comment", "line": 1, "col": 3 }
+    { "code": "incorrectly-opened-comment", "line": 1, "col": 3 },
+    { "code": "unexpected-null-character", "line": 1, "col": 6 }
 ]},
 
 {"description":"U+0080 in lookahead region",
diff --git a/tree-construction/plain-text-unsafe.dat b/tree-construction/plain-text-unsafe.dat
index d2050e36c49a407b0db4f82e66945d5720556def..dfb5cb6329222da4ca6f465bf761f324943ae4c6 100644
GIT binary patch
delta 43
rcmX@@vBq;lG!Ki0p_S$2jeI_nyLnc@I5HrH;pByazME5dpNRngKP(O#

delta 24
ecmZ4EdD>$`G|yxwc7w^Qc~*hw&Bu73i2(q7H3_Z&


From be9fb2431d679e4e0c4a9db5f350cf0686a729b1 Mon Sep 17 00:00:00 2001
From: Henri Sivonen <hsivonen@hsivonen.fi>
Date: Tue, 23 Jan 2018 17:33:37 +0200
Subject: [PATCH 06/68] Move `#script-off` to the usual place relative to the
 other sections of a test

---
 tree-construction/tests18.dat | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tree-construction/tests18.dat b/tree-construction/tests18.dat
index 3ce39fc6..05363b39 100644
--- a/tree-construction/tests18.dat
+++ b/tree-construction/tests18.dat
@@ -51,11 +51,11 @@
 
 #data
 <!doctype html><html><noscript><plaintext></plaintext>
-#script-off
 #errors
 42: Bad start tag in “plaintext” in “head”.
 54: End of file seen and there were open elements.
 42: Unclosed element “plaintext”.
+#script-off
 #document
 | <!DOCTYPE html>
 | <html>

From a1dcff0af57a0842d8ddc8c12d81181388a203f6 Mon Sep 17 00:00:00 2001
From: Stephen Checkoway <s@pahtak.org>
Date: Tue, 21 Aug 2018 14:15:02 -0400
Subject: [PATCH 07/68] Move #script-off to the right location.

---
 tree-construction/noscript01.dat | 36 ++++++++++++++++----------------
 1 file changed, 18 insertions(+), 18 deletions(-)

diff --git a/tree-construction/noscript01.dat b/tree-construction/noscript01.dat
index f11eca54..ec3496ce 100644
--- a/tree-construction/noscript01.dat
+++ b/tree-construction/noscript01.dat
@@ -1,9 +1,9 @@
 #data
 <head><noscript><!doctype html><!--foo--></noscript>
-#script-off
 #errors
 Line: 1 Col: 6 Unexpected start tag (head). Expected DOCTYPE.
 Line: 1 Col: 31 Unexpected DOCTYPE. Ignored.
+#script-off
 #document
 | <html>
 |   <head>
@@ -13,10 +13,10 @@ Line: 1 Col: 31 Unexpected DOCTYPE. Ignored.
 
 #data
 <head><noscript><html class="foo"><!--foo--></noscript>
-#script-off
 #errors
 Line: 1 Col: 6 Unexpected start tag (head). Expected DOCTYPE.
 Line: 1 Col: 34 html needs to be the first start tag.
+#script-off
 #document
 | <html>
 |   class="foo"
@@ -27,9 +27,9 @@ Line: 1 Col: 34 html needs to be the first start tag.
 
 #data
 <head><noscript></noscript>
-#script-off
 #errors
 (1,6): expected-doctype-but-got-tag
+#script-off
 #document
 | <html>
 |   <head>
@@ -38,9 +38,9 @@ Line: 1 Col: 34 html needs to be the first start tag.
 
 #data
 <head><noscript>   </noscript>
-#script-off
 #errors
 Line: 1 Col: 6 Unexpected start tag (head). Expected DOCTYPE.
+#script-off
 #document
 | <html>
 |   <head>
@@ -50,9 +50,9 @@ Line: 1 Col: 6 Unexpected start tag (head). Expected DOCTYPE.
 
 #data
 <head><noscript><!--foo--></noscript>
-#script-off
 #errors
 (1,6): expected-doctype-but-got-tag
+#script-off
 #document
 | <html>
 |   <head>
@@ -62,9 +62,9 @@ Line: 1 Col: 6 Unexpected start tag (head). Expected DOCTYPE.
 
 #data
 <head><noscript><basefont><!--foo--></noscript>
-#script-off
 #errors
 Line: 1 Col: 6 Unexpected start tag (head). Expected DOCTYPE.
+#script-off
 #document
 | <html>
 |   <head>
@@ -75,9 +75,9 @@ Line: 1 Col: 6 Unexpected start tag (head). Expected DOCTYPE.
 
 #data
 <head><noscript><bgsound><!--foo--></noscript>
-#script-off
 #errors
 Line: 1 Col: 6 Unexpected start tag (head). Expected DOCTYPE.
+#script-off
 #document
 | <html>
 |   <head>
@@ -88,9 +88,9 @@ Line: 1 Col: 6 Unexpected start tag (head). Expected DOCTYPE.
 
 #data
 <head><noscript><link><!--foo--></noscript>
-#script-off
 #errors
 Line: 1 Col: 6 Unexpected start tag (head). Expected DOCTYPE.
+#script-off
 #document
 | <html>
 |   <head>
@@ -101,9 +101,9 @@ Line: 1 Col: 6 Unexpected start tag (head). Expected DOCTYPE.
 
 #data
 <head><noscript><meta><!--foo--></noscript>
-#script-off
 #errors
 Line: 1 Col: 6 Unexpected start tag (head). Expected DOCTYPE.
+#script-off
 #document
 | <html>
 |   <head>
@@ -114,9 +114,9 @@ Line: 1 Col: 6 Unexpected start tag (head). Expected DOCTYPE.
 
 #data
 <head><noscript><noframes>XXX</noscript></noframes></noscript>
-#script-off
 #errors
 Line: 1 Col: 6 Unexpected start tag (head). Expected DOCTYPE.
+#script-off
 #document
 | <html>
 |   <head>
@@ -127,9 +127,9 @@ Line: 1 Col: 6 Unexpected start tag (head). Expected DOCTYPE.
 
 #data
 <head><noscript><style>XXX</style></noscript>
-#script-off
 #errors
 Line: 1 Col: 6 Unexpected start tag (head). Expected DOCTYPE.
+#script-off
 #document
 | <html>
 |   <head>
@@ -140,12 +140,12 @@ Line: 1 Col: 6 Unexpected start tag (head). Expected DOCTYPE.
 
 #data
 <head><noscript></br><!--foo--></noscript>
-#script-off
 #errors
 Line: 1 Col: 6 Unexpected start tag (head). Expected DOCTYPE.
 Line: 1 Col: 21 Element br not allowed in a inhead-noscript context
 Line: 1 Col: 21 Unexpected end tag (br). Treated as br element.
 Line: 1 Col: 42 Unexpected end tag (noscript). Ignored.
+#script-off
 #document
 | <html>
 |   <head>
@@ -156,10 +156,10 @@ Line: 1 Col: 42 Unexpected end tag (noscript). Ignored.
 
 #data
 <head><noscript><head class="foo"><!--foo--></noscript>
-#script-off
 #errors
 Line: 1 Col: 6 Unexpected start tag (head). Expected DOCTYPE.
 Line: 1 Col: 34 Unexpected start tag (head).
+#script-off
 #document
 | <html>
 |   <head>
@@ -169,10 +169,10 @@ Line: 1 Col: 34 Unexpected start tag (head).
 
 #data
 <head><noscript><noscript class="foo"><!--foo--></noscript>
-#script-off
 #errors
 Line: 1 Col: 6 Unexpected start tag (head). Expected DOCTYPE.
 Line: 1 Col: 34 Unexpected start tag (noscript).
+#script-off
 #document
 | <html>
 |   <head>
@@ -182,10 +182,10 @@ Line: 1 Col: 34 Unexpected start tag (noscript).
 
 #data
 <head><noscript></p><!--foo--></noscript>
-#script-off
 #errors
 Line: 1 Col: 6 Unexpected start tag (head). Expected DOCTYPE.
 Line: 1 Col: 20 Unexpected end tag (p). Ignored.
+#script-off
 #document
 | <html>
 |   <head>
@@ -195,11 +195,11 @@ Line: 1 Col: 20 Unexpected end tag (p). Ignored.
 
 #data
 <head><noscript><p><!--foo--></noscript>
-#script-off
 #errors
 Line: 1 Col: 6 Unexpected start tag (head). Expected DOCTYPE.
 Line: 1 Col: 19 Element p not allowed in a inhead-noscript context
 Line: 1 Col: 40 Unexpected end tag (noscript). Ignored.
+#script-off
 #document
 | <html>
 |   <head>
@@ -210,12 +210,12 @@ Line: 1 Col: 40 Unexpected end tag (noscript). Ignored.
 
 #data
 <head><noscript>XXX<!--foo--></noscript></head>
-#script-off
 #errors
 Line: 1 Col: 6 Unexpected start tag (head). Expected DOCTYPE.
 Line: 1 Col: 19 Unexpected non-space character. Expected inhead-noscript content
 Line: 1 Col: 30 Unexpected end tag (noscript). Ignored.
 Line: 1 Col: 37 Unexpected end tag (head). Ignored.
+#script-off
 #document
 | <html>
 |   <head>
@@ -226,10 +226,10 @@ Line: 1 Col: 37 Unexpected end tag (head). Ignored.
 
 #data
 <head><noscript>
-#script-off
 #errors
 (1,6): expected-doctype-but-got-tag
 (1,6): eof-in-head-noscript
+#script-off
 #document
 | <html>
 |   <head>

From 4fffa16ca4c5643cfd438729b3e8c13714721819 Mon Sep 17 00:00:00 2001
From: Henri Sivonen <hsivonen@hsivonen.fi>
Date: Thu, 4 Apr 2019 15:05:20 +0300
Subject: [PATCH 08/68] Add tests for line breaks in the comment end bang state

---
 tokenizer/test3.test             | 28 ++++++++++++++++++++++++++++
 tree-construction/README.md      | 10 +++++++---
 tree-construction/comments01.dat | 28 ++++++++++++++++++++++++++++
 3 files changed, 63 insertions(+), 3 deletions(-)
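
Reviewer note: the README hunk in this patch says that lines under "#new-errors" are added to the expected number of errors from "#errors". The Python sketch below shows one way a harness might count them. The function name and the marker set are illustrative assumptions, not part of this repository, and the sketch deliberately ignores the corner case where the "#data" payload itself contains a line equal to a section marker.

```python
# Hypothetical helper: count expected parse errors in one tree-construction test.
SECTION_MARKERS = {
    "#data", "#errors", "#new-errors", "#document-fragment",
    "#document", "#script-off", "#script-on",
}


def expected_error_count(test_text: str) -> int:
    """Count non-empty lines listed under "#errors" and "#new-errors"."""
    count = 0
    counting = False
    for line in test_text.split("\n"):
        if line in SECTION_MARKERS:
            # Only these two sections contribute to the expected error count.
            counting = line in ("#errors", "#new-errors")
        elif counting and line:
            count += 1
    return count


# Sample taken from the comments01.dat hunk in this series.
SAMPLE = "\n".join([
    "#data",
    "FOO<!-- BAR --! >BAZ",
    "#errors",
    "(1,3): expected-doctype-but-got-chars",
    "#new-errors",
    "(1:20) eof-in-comment",
    "#document",
    "| <html>",
    "|   <head>",
    "|   <body>",
    '|     "FOO"',
    "|     <!--  BAR --! >BAZ -->",
])

print(expected_error_count(SAMPLE))
```

Only the number of lines matters here, which matches the README's statement that the wording of each error line is not significant.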

diff --git a/tokenizer/test3.test b/tokenizer/test3.test
index cb04d037..2fd93049 100644
--- a/tokenizer/test3.test
+++ b/tokenizer/test3.test
@@ -954,6 +954,34 @@
     { "code": "incorrectly-closed-comment", "line": 1, "col": 8 }
 ]},
 
+{"description":"<!----! >",
+"input":"<!----! >",
+"output":[["Comment", "--! >"]],
+"errors":[
+    { "code": "incorrectly-closed-comment", "line": 1, "col": 8 }
+]},
+
+{"description":"<!----!LF>",
+"input":"<!----!\n>",
+"output":[["Comment", "--!\n>"]],
+"errors":[
+    { "code": "incorrectly-closed-comment", "line": 1, "col": 8 }
+]},
+
+{"description":"<!----!CR>",
+"input":"<!----!\r>",
+"output":[["Comment", "--!\n>"]],
+"errors":[
+    { "code": "incorrectly-closed-comment", "line": 1, "col": 8 }
+]},
+
+{"description":"<!----!CRLF>",
+"input":"<!----!\r\n>",
+"output":[["Comment", "--!\n>"]],
+"errors":[
+    { "code": "incorrectly-closed-comment", "line": 1, "col": 8 }
+]},
+
 {"description":"<!----!a",
 "input":"<!----!a",
 "output":[["Comment", "--!a"]],
diff --git a/tree-construction/README.md b/tree-construction/README.md
index 18a85ecf..4737a3a8 100644
--- a/tree-construction/README.md
+++ b/tree-construction/README.md
@@ -21,9 +21,13 @@ final newline (on the last line) removed.
 Then there must be a line that says "\#errors". It must be followed by
 one line per parse error that a conformant checker would return. It
 doesn't matter what those lines are, although they can't be
-"\#document-fragment", "\#document", "\#script-off", "\#script-on", or
-empty, the only thing that matters is that there be the right number
-of parse errors.
+"\#new-errors", "\#document-fragment", "\#document", "\#script-off",
+"\#script-on", or empty; the only thing that matters is that there be
+the right number of parse errors.
+
+Then there \*may\* be a line that says "\#new-errors", which works like
+the "\#errors" section: each line under it adds one more error to the
+expected number of errors.
 
 Then there \*may\* be a line that says "\#document-fragment", which must
 be followed by a newline (LF), followed by a string of characters that
diff --git a/tree-construction/comments01.dat b/tree-construction/comments01.dat
index 15d52e6b..f632de03 100644
--- a/tree-construction/comments01.dat
+++ b/tree-construction/comments01.dat
@@ -25,6 +25,34 @@ FOO<!-- BAR --!>BAZ
 |     <!--  BAR  -->
 |     "BAZ"
 
+#data
+FOO<!-- BAR --! >BAZ
+#errors
+(1,3): expected-doctype-but-got-chars
+#new-errors
+(1:20) incorrectly-closed-comment
+#document
+| <html>
+|   <head>
+|   <body>
+|     "FOO"
+|     <!--  BAR --! >BAZ -->
+
+#data
+FOO<!-- BAR --!
+>BAZ
+#errors
+(1,3): expected-doctype-but-got-chars
+#new-errors
+(1:20) incorrectly-closed-comment
+#document
+| <html>
+|   <head>
+|   <body>
+|     "FOO"
+|     <!--  BAR --!
+>BAZ -->
+
 #data
 FOO<!-- BAR --   >BAZ
 #errors

From 0f4dee51f8b1ac9e62d31d1fbdae95de5489c7af Mon Sep 17 00:00:00 2001
From: Henri Sivonen <hsivonen@hsivonen.fi>
Date: Thu, 4 Apr 2019 15:11:26 +0300
Subject: [PATCH 09/68] Adjust the error ids for space or line break in comment
 end bang state

---
 tokenizer/test3.test             | 8 ++++----
 tree-construction/comments01.dat | 4 ++--
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/tokenizer/test3.test b/tokenizer/test3.test
index 2fd93049..721f21de 100644
--- a/tokenizer/test3.test
+++ b/tokenizer/test3.test
@@ -958,28 +958,28 @@
 "input":"<!----! >",
 "output":[["Comment", "--! >"]],
 "errors":[
-    { "code": "incorrectly-closed-comment", "line": 1, "col": 8 }
+    { "code": "eof-in-comment", "line": 1, "col": 9 }
 ]},
 
 {"description":"<!----!LF>",
 "input":"<!----!\n>",
 "output":[["Comment", "--!\n>"]],
 "errors":[
-    { "code": "incorrectly-closed-comment", "line": 1, "col": 8 }
+    { "code": "eof-in-comment", "line": 1, "col": 9 }
 ]},
 
 {"description":"<!----!CR>",
 "input":"<!----!\r>",
 "output":[["Comment", "--!\n>"]],
 "errors":[
-    { "code": "incorrectly-closed-comment", "line": 1, "col": 8 }
+    { "code": "eof-in-comment", "line": 1, "col": 9 }
 ]},
 
 {"description":"<!----!CRLF>",
 "input":"<!----!\r\n>",
 "output":[["Comment", "--!\n>"]],
 "errors":[
-    { "code": "incorrectly-closed-comment", "line": 1, "col": 8 }
+    { "code": "eof-in-comment", "line": 1, "col": 9 }
 ]},
 
 {"description":"<!----!a",
diff --git a/tree-construction/comments01.dat b/tree-construction/comments01.dat
index f632de03..fa79c2b1 100644
--- a/tree-construction/comments01.dat
+++ b/tree-construction/comments01.dat
@@ -30,7 +30,7 @@ FOO<!-- BAR --! >BAZ
 #errors
 (1,3): expected-doctype-but-got-chars
 #new-errors
-(1:20) incorrectly-closed-comment
+(1:20) eof-in-comment
 #document
 | <html>
 |   <head>
@@ -44,7 +44,7 @@ FOO<!-- BAR --!
 #errors
 (1,3): expected-doctype-but-got-chars
 #new-errors
-(1:20) incorrectly-closed-comment
+(1:20) eof-in-comment
 #document
 | <html>
 |   <head>

From a439a5b65e8213154e71644017122f435d815cce Mon Sep 17 00:00:00 2001
From: Samuel May <ag.eitilt@gmail.com>
Date: Thu, 16 May 2019 01:13:46 -0700
Subject: [PATCH 10/68] Name unnamed tests for ease of discovery in test
 harnesses

---
 tokenizer/test3.test | 2 +-
 tokenizer/test4.test | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/tokenizer/test3.test b/tokenizer/test3.test
index 721f21de..d1b323a5 100644
--- a/tokenizer/test3.test
+++ b/tokenizer/test3.test
@@ -1,6 +1,6 @@
 {"tests": [
 
-{"description":"",
+{"description":"[empty]",
 "input":"",
 "output":[]},
 
diff --git a/tokenizer/test4.test b/tokenizer/test4.test
index 77706b72..8963c747 100644
--- a/tokenizer/test4.test
+++ b/tokenizer/test4.test
@@ -8,7 +8,7 @@
     { "code": "unexpected-character-in-attribute-name", "line": 1, "col": 7 }
 ]},
 
-{"description":"",
+{"description":"< in unquoted attribute value",
 "input":"<z x=<>",
 "output":[["StartTag", "z", {"x": "<"}]],
 "errors":[

From 1ddd636281ec97705faf74a021071f483ce3f941 Mon Sep 17 00:00:00 2001
From: Samuel May <ag.eitilt@gmail.com>
Date: Thu, 16 May 2019 01:15:31 -0700
Subject: [PATCH 11/68] Improve coverage of the Tokenizer tests

Slight overkill in places, but I figured it's better to err on the side
of too many tests than too few.
---
 tokenizer/contentModelFlags.test |   6 +
 tokenizer/domjs.test             | 150 ++++++++-
 tokenizer/entities.test          |  38 ++-
 tokenizer/test1.test             |  96 +++++-
 tokenizer/test2.test             |   4 +
 tokenizer/test3.test             | 504 ++++++++++++++++++++++++++++++-
 6 files changed, 778 insertions(+), 20 deletions(-)
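
Reviewer note: much of the added coverage comes from listing more entries in "initialStates", so a single JSON entry stands for one tokenizer run per listed state. The Python sketch below shows how a harness might expand the entries. The function name is hypothetical, and the fallback to "Data state" when "initialStates" is absent is an assumption about the harness rather than something this patch states.

```python
import json


def expand_by_initial_state(test_file_text):
    """Return one (description, state, input) tuple per initial state."""
    tests = json.loads(test_file_text)["tests"]
    cases = []
    for test in tests:
        # Assumption: entries without "initialStates" start in the Data state.
        for state in test.get("initialStates", ["Data state"]):
            cases.append((test["description"], state, test["input"]))
    return cases


# Two entries modelled on tokenizer/test3.test and tokenizer/test4.test.
SAMPLE = '''{"tests": [
{"description":"[empty]",
"initialStates":["Data state", "PLAINTEXT state"],
"input":"",
"output":[]},
{"description":"< in unquoted attribute value",
"input":"<z x=<>",
"output":[["StartTag", "z", {"x": "<"}]]}
]}'''

for case in expand_by_initial_state(SAMPLE):
    print(case)
```

Keeping the state in each tuple lets a harness report which initial state failed, which matters now that several entries share one description across states.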

diff --git a/tokenizer/contentModelFlags.test b/tokenizer/contentModelFlags.test
index 5197b68e..9cf7c8bd 100644
--- a/tokenizer/contentModelFlags.test
+++ b/tokenizer/contentModelFlags.test
@@ -6,6 +6,12 @@
 "input":"<head>&body;",
 "output":[["Character", "<head>&body;"]]},
 
+{"description":"PLAINTEXT with seeming close tag",
+"initialStates":["PLAINTEXT state"],
+"lastStartTag":"plaintext",
+"input":"</plaintext>&body;",
+"output":[["Character", "</plaintext>&body;"]]},
+
 {"description":"End tag closing RCDATA or RAWTEXT",
 "initialStates":["RCDATA state", "RAWTEXT state"],
 "lastStartTag":"xmp",
diff --git a/tokenizer/domjs.test b/tokenizer/domjs.test
index b17a5df5..1373b27f 100644
--- a/tokenizer/domjs.test
+++ b/tokenizer/domjs.test
@@ -25,7 +25,7 @@
             ]
         },
         {
-            "description":"NUL in RCDATA, RAWTEXT, PLAINTEXT and Script data",
+            "description":"Raw NUL replacement",
             "doubleEscaped":true,
             "initialStates":["RCDATA state", "RAWTEXT state", "PLAINTEXT state", "Script data state"],
             "input":"\\u0000",
@@ -34,6 +34,13 @@
                 { "code": "unexpected-null-character", "line": 1, "col": 1 }
             ]
         },
+        {
+            "description":"NUL in CDATA section",
+            "doubleEscaped":true,
+            "initialStates":["CDATA section state"],
+            "input":"\\u0000]]>",
+            "output":[["Character", "\\u0000"]]
+        },
         {
            "description":"NUL in script HTML comment",
            "doubleEscaped":true,
@@ -112,20 +119,95 @@
                { "code": "eof-in-script-html-comment-like-text", "line": 1, "col": 13 }
            ]
         },
+        {
+            "description":"Dash in script HTML comment",
+            "initialStates":["Script data state"],
+            "input":"<!-- - -->",
+            "output":[["Character", "<!-- - -->"]]
+        },
+        {
+            "description":"Dash less-than in script HTML comment",
+            "initialStates":["Script data state"],
+            "input":"<!-- -< -->",
+            "output":[["Character", "<!-- -< -->"]]
+        },
+        {
+            "description":"Dash at end of script HTML comment",
+            "initialStates":["Script data state"],
+            "input":"<!--test--->",
+            "output":[["Character", "<!--test--->"]]
+        },
+        {
+            "description":"</script> in script HTML comment",
+            "initialStates":["Script data state"],
+            "lastStartTag":"script",
+            "input":"<!-- </script> --></script>",
+            "output":[["Character", "<!-- "], ["EndTag", "script"], ["Character", " -->"], ["EndTag", "script"]]
+        },
+        {
+            "description":"</script> in script HTML comment - double escaped",
+            "initialStates":["Script data state"],
+            "lastStartTag":"script",
+            "input":"<!-- <script></script> --></script>",
+            "output":[["Character", "<!-- <script></script> -->"], ["EndTag", "script"]]
+        },
+        {
+            "description":"</script> in script HTML comment - double escaped with nested <script>",
+            "initialStates":["Script data state"],
+            "lastStartTag":"script",
+            "input":"<!-- <script><script></script></script> --></script>",
+            "output":[["Character", "<!-- <script><script></script>"], ["EndTag", "script"], ["Character", " -->"], ["EndTag", "script"]]
+        },
+        {
+            "description":"</script> in script HTML comment - double escaped with abrupt end",
+            "initialStates":["Script data state"],
+            "lastStartTag":"script",
+            "input":"<!-- <script>--></script> --></script>",
+            "output":[["Character", "<!-- <script>-->"], ["EndTag", "script"], ["Character", " -->"], ["EndTag", "script"]]
+        },
+        {
+            "description":"Incomplete start tag in script HTML comment double escaped",
+            "initialStates":["Script data state"],
+            "lastStartTag":"script",
+            "input":"<!--<scrip></script>-->",
+            "output":[["Character", "<!--<scrip>"], ["EndTag", "script"], ["Character", "-->"]]
+        },
+        {
+            "description":"Unclosed start tag in script HTML comment double escaped",
+            "initialStates":["Script data state"],
+            "lastStartTag":"script",
+            "input":"<!--<script</script>-->",
+            "output":[["Character", "<!--<script"], ["EndTag", "script"], ["Character", "-->"]]
+        },
+        {
+            "description":"Incomplete end tag in script HTML comment double escaped",
+            "initialStates":["Script data state"],
+            "lastStartTag":"script",
+            "input":"<!--<script></scrip>-->",
+            "output":[["Character", "<!--<script></scrip>-->"]]
+        },
+        {
+            "description":"Unclosed end tag in script HTML comment double escaped",
+            "initialStates":["Script data state"],
+            "lastStartTag":"script",
+            "input":"<!--<script></script-->",
+            "output":[["Character", "<!--<script></script-->"]]
+        },
         {
             "description":"leading U+FEFF must pass through",
+            "initialStates":["Data state", "RCDATA state", "RAWTEXT state", "Script data state"],
             "doubleEscaped":true,
             "input":"\\uFEFFfoo\\uFEFFbar",
             "output":[["Character", "\\uFEFFfoo\\uFEFFbar"]]
         },
         {
-            "description":"Non BMP-charref in in RCDATA",
+            "description":"Non BMP-charref in RCDATA",
             "initialStates":["RCDATA state"],
             "input":"&NotEqualTilde;",
             "output":[["Character", "\u2242\u0338"]]
         },
         {
-            "description":"Bad charref in in RCDATA",
+            "description":"Bad charref in RCDATA",
             "initialStates":["RCDATA state"],
             "input":"&NotEqualTild;",
             "output":[["Character", "&NotEqualTild;"]],
@@ -134,36 +216,36 @@
             ]
         },
         {
-            "description":"lowercase endtags in RCDATA and RAWTEXT",
-            "initialStates":["RCDATA state", "RAWTEXT state"],
+            "description":"lowercase endtags",
+            "initialStates":["RCDATA state", "RAWTEXT state", "Script data state"],
             "lastStartTag":"xmp",
             "input":"</XMP>",
             "output":[["EndTag","xmp"]]
         },
         {
-            "description":"bad endtag in RCDATA and RAWTEXT",
-            "initialStates":["RCDATA state", "RAWTEXT state"],
+            "description":"bad endtag (space before name)",
+            "initialStates":["RCDATA state", "RAWTEXT state", "Script data state"],
             "lastStartTag":"xmp",
             "input":"</ XMP>",
             "output":[["Character","</ XMP>"]]
         },
         {
-            "description":"bad endtag in RCDATA and RAWTEXT",
-            "initialStates":["RCDATA state", "RAWTEXT state"],
+            "description":"bad endtag (not matching last start tag)",
+            "initialStates":["RCDATA state", "RAWTEXT state", "Script data state"],
             "lastStartTag":"xmp",
             "input":"</xm>",
             "output":[["Character","</xm>"]]
         },
         {
-            "description":"bad endtag in RCDATA and RAWTEXT",
-            "initialStates":["RCDATA state", "RAWTEXT state"],
+            "description":"bad endtag (without close bracket)",
+            "initialStates":["RCDATA state", "RAWTEXT state", "Script data state"],
             "lastStartTag":"xmp",
             "input":"</xm ",
             "output":[["Character","</xm "]]
         },
         {
-            "description":"bad endtag in RCDATA and RAWTEXT",
-            "initialStates":["RCDATA state", "RAWTEXT state"],
+            "description":"bad endtag (trailing solidus)",
+            "initialStates":["RCDATA state", "RAWTEXT state", "Script data state"],
             "lastStartTag":"xmp",
             "input":"</xm/",
             "output":[["Character","</xm/"]]
@@ -200,11 +282,47 @@
         },
         {
             "description":"CDATA content",
-            "input":"foo&bar",
+            "input":"foo&#32;]]>",
+            "initialStates":["CDATA section state"],
+            "output":[["Character", "foo&#32;"]]
+        },
+        {
+            "description":"CDATA followed by HTML content",
+            "input":"foo&#32;]]>&#32;",
+            "initialStates":["CDATA section state"],
+            "output":[["Character", "foo&#32; "]]
+        },
+        {
+            "description":"CDATA with extra bracket",
+            "input":"foo]]]>",
+            "initialStates":["CDATA section state"],
+            "output":[["Character", "foo]"]]
+        },
+        {
+            "description":"CDATA without end marker",
+            "input":"foo",
+            "initialStates":["CDATA section state"],
+            "output":[["Character", "foo"]],
+            "errors":[
+                { "code": "eof-in-cdata", "line": 1, "col": 4 }
+            ]
+        },
+        {
+            "description":"CDATA with single bracket ending",
+            "input":"foo]",
+            "initialStates":["CDATA section state"],
+            "output":[["Character", "foo]"]],
+            "errors":[
+                { "code": "eof-in-cdata", "line": 1, "col": 5 }
+            ]
+        },
+        {
+            "description":"CDATA with two brackets ending",
+            "input":"foo]]",
             "initialStates":["CDATA section state"],
-            "output":[["Character", "foo&bar"]],
+            "output":[["Character", "foo]]"]],
             "errors":[
-                { "code": "eof-in-cdata", "line": 1, "col": 8 }
+                { "code": "eof-in-cdata", "line": 1, "col": 6 }
             ]
         }
 
diff --git a/tokenizer/entities.test b/tokenizer/entities.test
index 7c514563..a6469cd0 100644
--- a/tokenizer/entities.test
+++ b/tokenizer/entities.test
@@ -1,13 +1,47 @@
 {"tests": [
 
-{"description": "Undefined named entity in attribute value ending in semicolon and whose name starts with a known entity name.",
+{"description": "Undefined named entity in a double-quoted attribute value ending in semicolon and whose name starts with a known entity name.",
+"input":"<h a=\"&noti;\">",
+"output": [["StartTag", "h", {"a": "&noti;"}]]},
+
+{"description": "Entity name requiring semicolon instead followed by the equals sign in a double-quoted attribute value.",
+"input":"<h a=\"&lang=\">",
+"output": [["StartTag", "h", {"a": "&lang="}]]},
+
+{"description": "Valid entity name followed by the equals sign in a double-quoted attribute value.",
+"input":"<h a=\"&not=\">",
+"output": [["StartTag", "h", {"a": "&not="}]]},
+
+{"description": "Undefined named entity in a single-quoted attribute value ending in semicolon and whose name starts with a known entity name.",
 "input":"<h a='&noti;'>",
 "output": [["StartTag", "h", {"a": "&noti;"}]]},
 
-{"description": "Entity name followed by the equals sign in an attribute value.",
+{"description": "Entity name requiring semicolon instead followed by the equals sign in a single-quoted attribute value.",
 "input":"<h a='&lang='>",
 "output": [["StartTag", "h", {"a": "&lang="}]]},
 
+{"description": "Valid entity name followed by the equals sign in a single-quoted attribute value.",
+"input":"<h a='&not='>",
+"output": [["StartTag", "h", {"a": "&not="}]]},
+
+{"description": "Undefined named entity in an unquoted attribute value ending in semicolon and whose name starts with a known entity name.",
+"input":"<h a=&noti;>",
+"output": [["StartTag", "h", {"a": "&noti;"}]]},
+
+{"description": "Entity name requiring semicolon instead followed by the equals sign in an unquoted attribute value.",
+"input":"<h a=&lang=>",
+"output": [["StartTag", "h", {"a": "&lang="}]],
+"errors":[
+    { "code": "unexpected-character-in-unquoted-attribute-value", "line": 1, "col": 11 }
+]},
+
+{"description": "Valid entity name followed by the equals sign in an unquoted attribute value.",
+"input":"<h a=&not=>",
+"output": [["StartTag", "h", {"a": "&not="}]],
+"errors":[
+    { "code": "unexpected-character-in-unquoted-attribute-value", "line": 1, "col": 10 }
+]},
+
 {"description": "Ambiguous ampersand.",
 "input":"&rrrraannddom;",
 "output": [["Character", "&rrrraannddom;"]],
diff --git a/tokenizer/test1.test b/tokenizer/test1.test
index 8b85050f..cb0eb48a 100644
--- a/tokenizer/test1.test
+++ b/tokenizer/test1.test
@@ -102,6 +102,10 @@
 "input":"<!-- --comment -->",
 "output":[["Comment", " --comment "]]},
 
+{"description":"Comment, central less-than bang",
+"input":"<!--<!-->",
+"output":[["Comment", "<!"]]},
+
 {"description":"Unfinished comment",
 "input":"<!--comment",
 "output":[["Comment", "comment"]],
@@ -109,6 +113,13 @@
     { "code": "eof-in-comment", "line": 1, "col": 12 }
 ]},
 
+{"description":"Unfinished comment after start of nested comment",
+"input":"<!-- <!--",
+"output":[["Comment", " <!"]],
+"errors":[
+    { "code": "eof-in-comment", "line": 1, "col": 10 }
+]},
+
 {"description":"Start of a comment",
 "input":"<!-",
 "output":[["Comment", "-"]],
@@ -123,7 +134,6 @@
     { "code": "abrupt-closing-of-empty-comment", "line": 1, "col": 5 }
 ]},
 
-
 {"description":"Short comment two",
 "input":"<!--->",
 "output":[["Comment", ""]],
@@ -135,6 +145,18 @@
  "input":"<!---->",
  "output":[["Comment", ""]]},
 
+{"description":"< in comment",
+"input":"<!-- <test-->",
+"output":[["Comment", " <test"]]},
+
+{"description":"<! in comment",
+"input":"<!-- <!test-->",
+"output":[["Comment", " <!test"]]},
+
+{"description":"<!- in comment",
+"input":"<!-- <!-test-->",
+"output":[["Comment", " <!-test"]]},
+
 {"description":"Nested comment",
 "input":"<!-- <!--test-->",
 "output":[["Comment", " <!--test"]],
@@ -142,6 +164,78 @@
     { "code": "nested-comment", "line": 1, "col": 10 }
 ]},
 
+{"description":"Nested comment with extra <",
+"input":"<!-- <<!--test-->",
+"output":[["Comment", " <<!--test"]],
+"errors":[
+    { "code": "nested-comment", "line": 1, "col": 11 }
+]},
+
+{"description":"< in script data",
+"initialStates":["Script data state"],
+"input":"<test-->",
+"output":[["Character", "<test-->"]]},
+
+{"description":"<! in script data",
+"initialStates":["Script data state"],
+"input":"<!test-->",
+"output":[["Character", "<!test-->"]]},
+
+{"description":"<!- in script data",
+"initialStates":["Script data state"],
+"input":"<!-test-->",
+"output":[["Character", "<!-test-->"]]},
+
+{"description":"Escaped script data",
+"initialStates":["Script data state"],
+"input":"<!--test-->",
+"output":[["Character", "<!--test-->"]]},
+
+{"description":"< in script HTML comment",
+"initialStates":["Script data state"],
+"input":"<!-- < test -->",
+"output":[["Character", "<!-- < test -->"]]},
+
+{"description":"</ in script HTML comment",
+"initialStates":["Script data state"],
+"input":"<!-- </ test -->",
+"output":[["Character", "<!-- </ test -->"]]},
+
+{"description":"Start tag in script HTML comment",
+"initialStates":["Script data state"],
+"input":"<!-- <test> -->",
+"output":[["Character", "<!-- <test> -->"]]},
+
+{"description":"End tag in script HTML comment",
+"initialStates":["Script data state"],
+"input":"<!-- </test> -->",
+"output":[["Character", "<!-- </test> -->"]]},
+
+{"description":"- in script HTML comment double escaped",
+"initialStates":["Script data state"],
+"input":"<!--<script>-</script>-->",
+"output":[["Character", "<!--<script>-</script>-->"]]},
+
+{"description":"-- in script HTML comment double escaped",
+"initialStates":["Script data state"],
+"input":"<!--<script>--</script>-->",
+"output":[["Character", "<!--<script>--</script>-->"]]},
+
+{"description":"--- in script HTML comment double escaped",
+"initialStates":["Script data state"],
+"input":"<!--<script>---</script>-->",
+"output":[["Character", "<!--<script>---</script>-->"]]},
+
+{"description":"- spaced in script HTML comment double escaped",
+"initialStates":["Script data state"],
+"input":"<!--<script> - </script>-->",
+"output":[["Character", "<!--<script> - </script>-->"]]},
+
+{"description":"-- spaced in script HTML comment double escaped",
+"initialStates":["Script data state"],
+"input":"<!--<script> -- </script>-->",
+"output":[["Character", "<!--<script> -- </script>-->"]]},
+
 {"description":"Ampersand EOF",
 "input":"&",
 "output":[["Character", "&"]]},
diff --git a/tokenizer/test2.test b/tokenizer/test2.test
index 521694ca..f80f27d1 100644
--- a/tokenizer/test2.test
+++ b/tokenizer/test2.test
@@ -50,6 +50,10 @@
 "input":"<!DOCTYPE html SYSTEM \"-//W3C//DTD HTML Transitional 4.01//EN\">",
 "output":[["DOCTYPE", "html", null, "-//W3C//DTD HTML Transitional 4.01//EN", true]]},
 
+{"description":"DOCTYPE with single-quoted systemId",
+"input":"<!DOCTYPE html SYSTEM '-//W3C//DTD HTML Transitional 4.01//EN'>",
+"output":[["DOCTYPE", "html", null, "-//W3C//DTD HTML Transitional 4.01//EN", true]]},
+
 {"description":"DOCTYPE with publicId and systemId",
 "input":"<!DOCTYPE html PUBLIC \"-//W3C//DTD HTML Transitional 4.01//EN\" \"-//W3C//DTD HTML Transitional 4.01//EN\">",
 "output":[["DOCTYPE", "html", "-//W3C//DTD HTML Transitional 4.01//EN", "-//W3C//DTD HTML Transitional 4.01//EN", true]]},
diff --git a/tokenizer/test3.test b/tokenizer/test3.test
index d1b323a5..814482c4 100644
--- a/tokenizer/test3.test
+++ b/tokenizer/test3.test
@@ -1,84 +1,451 @@
 {"tests": [
 
 {"description":"[empty]",
+"initialStates":["Data state", "PLAINTEXT state", "RCDATA state", "RAWTEXT state", "Script data state"],
 "input":"",
 "output":[]},
 
+{"description":"[empty]",
+"initialStates":["CDATA section state"],
+"input":"",
+"output":[],
+"errors":[
+    { "code": "eof-in-cdata", "line": 1, "col": 2 }
+]},
+
 {"description":"\\u0009",
+"initialStates":["Data state", "PLAINTEXT state", "RCDATA state", "RAWTEXT state", "Script data state"],
 "input":"\u0009",
 "output":[["Character", "\u0009"]]},
 
+{"description":"\\u0009",
+"initialStates":["CDATA section state"],
+"input":"\u0009",
+"output":[["Character", "\u0009"]],
+"errors":[
+    { "code": "eof-in-cdata", "line": 1, "col": 2 }
+]},
+
 {"description":"\\u000A",
+"initialStates":["Data state", "PLAINTEXT state", "RCDATA state", "RAWTEXT state", "Script data state"],
 "input":"\u000A",
 "output":[["Character", "\u000A"]]},
 
+{"description":"\\u000A",
+"initialStates":["CDATA section state"],
+"input":"\u000A",
+"output":[["Character", "\u000A"]],
+"errors":[
+    { "code": "eof-in-cdata", "line": 1, "col": 2 }
+]},
+
 {"description":"\\u000B",
+"initialStates":["Data state", "PLAINTEXT state", "RCDATA state", "RAWTEXT state", "Script data state"],
 "input":"\u000B",
 "output":[["Character", "\u000B"]],
 "errors":[
     { "code": "control-character-in-input-stream", "line": 1, "col": 1 }
 ]},
 
+{"description":"\\u000B",
+"initialStates":["CDATA section state"],
+"input":"\u000B",
+"output":[["Character", "\u000B"]],
+"errors":[
+    { "code": "control-character-in-input-stream", "line": 1, "col": 1 },
+    { "code": "eof-in-cdata", "line": 1, "col": 2 }
+]},
+
 {"description":"\\u000C",
+"initialStates":["Data state", "PLAINTEXT state", "RCDATA state", "RAWTEXT state", "Script data state"],
 "input":"\u000C",
 "output":[["Character", "\u000C"]]},
 
+{"description":"\\u000C",
+"initialStates":["CDATA section state"],
+"input":"\u000C",
+"output":[["Character", "\u000C"]],
+"errors":[
+    { "code": "eof-in-cdata", "line": 1, "col": 2 }
+]},
+
 {"description":" ",
+"initialStates":["Data state", "PLAINTEXT state", "RCDATA state", "RAWTEXT state", "Script data state"],
 "input":" ",
 "output":[["Character", " "]]},
 
+{"description":" ",
+"initialStates":["CDATA section state"],
+"input":" ",
+"output":[["Character", " "]],
+"errors":[
+    { "code": "eof-in-cdata", "line": 1, "col": 2 }
+]},
+
 {"description":"!",
+"initialStates":["Data state", "PLAINTEXT state", "RCDATA state", "RAWTEXT state", "Script data state"],
 "input":"!",
 "output":[["Character", "!"]]},
 
+{"description":"!",
+"initialStates":["CDATA section state"],
+"input":"!",
+"output":[["Character", "!"]],
+"errors":[
+    { "code": "eof-in-cdata", "line": 1, "col": 2 }
+]},
+
 {"description":"\"",
+"initialStates":["Data state", "PLAINTEXT state", "RCDATA state", "RAWTEXT state", "Script data state"],
 "input":"\"",
 "output":[["Character", "\""]]},
 
+{"description":"\"",
+"initialStates":["CDATA section state"],
+"input":"\"",
+"output":[["Character", "\""]],
+"errors":[
+    { "code": "eof-in-cdata", "line": 1, "col": 2 }
+]},
+
 {"description":"%",
+"initialStates":["Data state", "PLAINTEXT state", "RCDATA state", "RAWTEXT state", "Script data state"],
 "input":"%",
 "output":[["Character", "%"]]},
 
+{"description":"%",
+"initialStates":["CDATA section state"],
+"input":"%",
+"output":[["Character", "%"]],
+"errors":[
+    { "code": "eof-in-cdata", "line": 1, "col": 2 }
+]},
+
 {"description":"&",
+"initialStates":["Data state", "PLAINTEXT state", "RCDATA state", "RAWTEXT state", "Script data state"],
 "input":"&",
 "output":[["Character", "&"]]},
 
+{"description":"&",
+"initialStates":["CDATA section state"],
+"input":"&",
+"output":[["Character", "&"]],
+"errors":[
+    { "code": "eof-in-cdata", "line": 1, "col": 2 }
+]},
+
 {"description":"'",
+"initialStates":["Data state", "PLAINTEXT state", "RCDATA state", "RAWTEXT state", "Script data state"],
 "input":"'",
 "output":[["Character", "'"]]},
 
+{"description":"'",
+"initialStates":["CDATA section state"],
+"input":"'",
+"output":[["Character", "'"]],
+"errors":[
+    { "code": "eof-in-cdata", "line": 1, "col": 2 }
+]},
+
 {"description":",",
+"initialStates":["Data state", "PLAINTEXT state", "RCDATA state", "RAWTEXT state", "Script data state"],
 "input":",",
 "output":[["Character", ","]]},
 
+{"description":",",
+"initialStates":["CDATA section state"],
+"input":",",
+"output":[["Character", ","]],
+"errors":[
+    { "code": "eof-in-cdata", "line": 1, "col": 2 }
+]},
+
 {"description":"-",
+"initialStates":["Data state", "PLAINTEXT state", "RCDATA state", "RAWTEXT state", "Script data state"],
 "input":"-",
 "output":[["Character", "-"]]},
 
+{"description":"-",
+"initialStates":["CDATA section state"],
+"input":"-",
+"output":[["Character", "-"]],
+"errors":[
+    { "code": "eof-in-cdata", "line": 1, "col": 2 }
+]},
+
 {"description":".",
+"initialStates":["Data state", "PLAINTEXT state", "RCDATA state", "RAWTEXT state", "Script data state"],
 "input":".",
 "output":[["Character", "."]]},
 
+{"description":".",
+"initialStates":["CDATA section state"],
+"input":".",
+"output":[["Character", "."]],
+"errors":[
+    { "code": "eof-in-cdata", "line": 1, "col": 2 }
+]},
+
 {"description":"/",
+"initialStates":["Data state", "PLAINTEXT state", "RCDATA state", "RAWTEXT state", "Script data state"],
 "input":"/",
 "output":[["Character", "/"]]},
 
+{"description":"/",
+"initialStates":["CDATA section state"],
+"input":"/",
+"output":[["Character", "/"]],
+"errors":[
+    { "code": "eof-in-cdata", "line": 1, "col": 2 }
+]},
+
 {"description":"0",
+"initialStates":["Data state", "PLAINTEXT state", "RCDATA state", "RAWTEXT state", "Script data state"],
 "input":"0",
 "output":[["Character", "0"]]},
 
+{"description":"0",
+"initialStates":["CDATA section state"],
+"input":"0",
+"output":[["Character", "0"]],
+"errors":[
+    { "code": "eof-in-cdata", "line": 1, "col": 2 }
+]},
+
 {"description":"1",
+"initialStates":["Data state", "PLAINTEXT state", "RCDATA state", "RAWTEXT state", "Script data state"],
 "input":"1",
 "output":[["Character", "1"]]},
 
+{"description":"1",
+"initialStates":["CDATA section state"],
+"input":"1",
+"output":[["Character", "1"]],
+"errors":[
+    { "code": "eof-in-cdata", "line": 1, "col": 2 }
+]},
+
 {"description":"9",
+"initialStates":["Data state", "PLAINTEXT state", "RCDATA state", "RAWTEXT state", "Script data state"],
 "input":"9",
 "output":[["Character", "9"]]},
 
+{"description":"9",
+"initialStates":["CDATA section state"],
+"input":"9",
+"output":[["Character", "9"]],
+"errors":[
+    { "code": "eof-in-cdata", "line": 1, "col": 2 }
+]},
+
 {"description":";",
+"initialStates":["Data state", "PLAINTEXT state", "RCDATA state", "RAWTEXT state", "Script data state"],
 "input":";",
 "output":[["Character", ";"]]},
 
+{"description":";",
+"initialStates":["CDATA section state"],
+"input":";",
+"output":[["Character", ";"]],
+"errors":[
+    { "code": "eof-in-cdata", "line": 1, "col": 2 }
+]},
+
+{"description":";=",
+"initialStates":["Data state", "PLAINTEXT state", "RCDATA state", "RAWTEXT state", "Script data state"],
+"input":";=",
+"output":[["Character", ";="]]},
+
+{"description":";=",
+"initialStates":["CDATA section state"],
+"input":";=",
+"output":[["Character", ";="]],
+"errors":[
+    { "code": "eof-in-cdata", "line": 1, "col": 3 }
+]},
+
+{"description":";>",
+"initialStates":["Data state", "PLAINTEXT state", "RCDATA state", "RAWTEXT state", "Script data state"],
+"input":";>",
+"output":[["Character", ";>"]]},
+
+{"description":";>",
+"initialStates":["CDATA section state"],
+"input":";>",
+"output":[["Character", ";>"]],
+"errors":[
+    { "code": "eof-in-cdata", "line": 1, "col": 3 }
+]},
+
+{"description":";?",
+"initialStates":["Data state", "PLAINTEXT state", "RCDATA state", "RAWTEXT state", "Script data state"],
+"input":";?",
+"output":[["Character", ";?"]]},
+
+{"description":";?",
+"initialStates":["CDATA section state"],
+"input":";?",
+"output":[["Character", ";?"]],
+"errors":[
+    { "code": "eof-in-cdata", "line": 1, "col": 3 }
+]},
+
+{"description":";@",
+"initialStates":["Data state", "PLAINTEXT state", "RCDATA state", "RAWTEXT state", "Script data state"],
+"input":";@",
+"output":[["Character", ";@"]]},
+
+{"description":";@",
+"initialStates":["CDATA section state"],
+"input":";@",
+"output":[["Character", ";@"]],
+"errors":[
+    { "code": "eof-in-cdata", "line": 1, "col": 3 }
+]},
+
+{"description":";A",
+"initialStates":["Data state", "PLAINTEXT state", "RCDATA state", "RAWTEXT state", "Script data state"],
+"input":";A",
+"output":[["Character", ";A"]]},
+
+{"description":";A",
+"initialStates":["CDATA section state"],
+"input":";A",
+"output":[["Character", ";A"]],
+"errors":[
+    { "code": "eof-in-cdata", "line": 1, "col": 3 }
+]},
+
+{"description":";B",
+"initialStates":["Data state", "PLAINTEXT state", "RCDATA state", "RAWTEXT state", "Script data state"],
+"input":";B",
+"output":[["Character", ";B"]]},
+
+{"description":";B",
+"initialStates":["CDATA section state"],
+"input":";B",
+"output":[["Character", ";B"]],
+"errors":[
+    { "code": "eof-in-cdata", "line": 1, "col": 3 }
+]},
+
+{"description":";Y",
+"initialStates":["Data state", "PLAINTEXT state", "RCDATA state", "RAWTEXT state", "Script data state"],
+"input":";Y",
+"output":[["Character", ";Y"]]},
+
+{"description":";Y",
+"initialStates":["CDATA section state"],
+"input":";Y",
+"output":[["Character", ";Y"]],
+"errors":[
+    { "code": "eof-in-cdata", "line": 1, "col": 3 }
+]},
+
+{"description":";Z",
+"initialStates":["Data state", "PLAINTEXT state", "RCDATA state", "RAWTEXT state", "Script data state"],
+"input":";Z",
+"output":[["Character", ";Z"]]},
+
+{"description":";Z",
+"initialStates":["CDATA section state"],
+"input":";Z",
+"output":[["Character", ";Z"]],
+"errors":[
+    { "code": "eof-in-cdata", "line": 1, "col": 3 }
+]},
+
+{"description":";`",
+"initialStates":["Data state", "PLAINTEXT state", "RCDATA state", "RAWTEXT state", "Script data state"],
+"input":";`",
+"output":[["Character", ";`"]]},
+
+{"description":";`",
+"initialStates":["CDATA section state"],
+"input":";`",
+"output":[["Character", ";`"]],
+"errors":[
+    { "code": "eof-in-cdata", "line": 1, "col": 3 }
+]},
+
+{"description":";a",
+"initialStates":["Data state", "PLAINTEXT state", "RCDATA state", "RAWTEXT state", "Script data state"],
+"input":";a",
+"output":[["Character", ";a"]]},
+
+{"description":";a",
+"initialStates":["CDATA section state"],
+"input":";a",
+"output":[["Character", ";a"]],
+"errors":[
+    { "code": "eof-in-cdata", "line": 1, "col": 3 }
+]},
+
+{"description":";b",
+"initialStates":["Data state", "PLAINTEXT state", "RCDATA state", "RAWTEXT state", "Script data state"],
+"input":";b",
+"output":[["Character", ";b"]]},
+
+{"description":";b",
+"initialStates":["CDATA section state"],
+"input":";b",
+"output":[["Character", ";b"]],
+"errors":[
+    { "code": "eof-in-cdata", "line": 1, "col": 3 }
+]},
+
+{"description":";y",
+"initialStates":["Data state", "PLAINTEXT state", "RCDATA state", "RAWTEXT state", "Script data state"],
+"input":";y",
+"output":[["Character", ";y"]]},
+
+{"description":";y",
+"initialStates":["CDATA section state"],
+"input":";y",
+"output":[["Character", ";y"]],
+"errors":[
+    { "code": "eof-in-cdata", "line": 1, "col": 3 }
+]},
+
+{"description":";z",
+"initialStates":["Data state", "PLAINTEXT state", "RCDATA state", "RAWTEXT state", "Script data state"],
+"input":";z",
+"output":[["Character", ";z"]]},
+
+{"description":";z",
+"initialStates":["CDATA section state"],
+"input":";z",
+"output":[["Character", ";z"]],
+"errors":[
+    { "code": "eof-in-cdata", "line": 1, "col": 3 }
+]},
+
+{"description":";{",
+"initialStates":["Data state", "PLAINTEXT state", "RCDATA state", "RAWTEXT state", "Script data state"],
+"input":";{",
+"output":[["Character", ";{"]]},
+
+{"description":";{",
+"initialStates":["CDATA section state"],
+"input":";{",
+"output":[["Character", ";{"]],
+"errors":[
+    { "code": "eof-in-cdata", "line": 1, "col": 3 }
+]},
+
+{"description":";\\uDBC0\\uDC00",
+"initialStates":["Data state", "PLAINTEXT state", "RCDATA state", "RAWTEXT state", "Script data state"],
+"input":";\uDBC0\uDC00",
+"output":[["Character", ";\uDBC0\uDC00"]]},
+
+{"description":";\\uDBC0\\uDC00",
+"initialStates":["CDATA section state"],
+"input":";\uDBC0\uDC00",
+"output":[["Character", ";\uDBC0\uDC00"]],
+"errors":[
+    { "code": "eof-in-cdata", "line": 1, "col": 3 }
+]},
+
 {"description":"<",
 "input":"<",
 "output":[["Character", "<"]],
@@ -10669,63 +11036,198 @@
 ]},
 
 {"description":"=",
+"initialStates":["Data state", "PLAINTEXT state", "RCDATA state", "RAWTEXT state", "Script data state"],
 "input":"=",
 "output":[["Character", "="]]},
 
+{"description":"=",
+"initialStates":["CDATA section state"],
+"input":"=",
+"output":[["Character", "="]],
+"errors":[
+    { "code": "eof-in-cdata", "line": 1, "col": 2 }
+]},
+
 {"description":">",
+"initialStates":["Data state", "PLAINTEXT state", "RCDATA state", "RAWTEXT state", "Script data state"],
 "input":">",
 "output":[["Character", ">"]]},
 
+{"description":">",
+"initialStates":["CDATA section state"],
+"input":">",
+"output":[["Character", ">"]],
+"errors":[
+    { "code": "eof-in-cdata", "line": 1, "col": 2 }
+]},
+
 {"description":"?",
+"initialStates":["Data state", "PLAINTEXT state", "RCDATA state", "RAWTEXT state", "Script data state"],
 "input":"?",
 "output":[["Character", "?"]]},
 
+{"description":"?",
+"initialStates":["CDATA section state"],
+"input":"?",
+"output":[["Character", "?"]],
+"errors":[
+    { "code": "eof-in-cdata", "line": 1, "col": 2 }
+]},
+
 {"description":"@",
+"initialStates":["Data state", "PLAINTEXT state", "RCDATA state", "RAWTEXT state", "Script data state"],
 "input":"@",
 "output":[["Character", "@"]]},
 
+{"description":"@",
+"initialStates":["CDATA section state"],
+"input":"@",
+"output":[["Character", "@"]],
+"errors":[
+    { "code": "eof-in-cdata", "line": 1, "col": 2 }
+]},
+
 {"description":"A",
+"initialStates":["Data state", "PLAINTEXT state", "RCDATA state", "RAWTEXT state", "Script data state"],
 "input":"A",
 "output":[["Character", "A"]]},
 
+{"description":"A",
+"initialStates":["CDATA section state"],
+"input":"A",
+"output":[["Character", "A"]],
+"errors":[
+    { "code": "eof-in-cdata", "line": 1, "col": 2 }
+]},
+
 {"description":"B",
+"initialStates":["Data state", "PLAINTEXT state", "RCDATA state", "RAWTEXT state", "Script data state"],
 "input":"B",
 "output":[["Character", "B"]]},
 
+{"description":"B",
+"initialStates":["CDATA section state"],
+"input":"B",
+"output":[["Character", "B"]],
+"errors":[
+    { "code": "eof-in-cdata", "line": 1, "col": 2 }
+]},
+
 {"description":"Y",
+"initialStates":["Data state", "PLAINTEXT state", "RCDATA state", "RAWTEXT state", "Script data state"],
 "input":"Y",
 "output":[["Character", "Y"]]},
 
+{"description":"Y",
+"initialStates":["CDATA section state"],
+"input":"Y",
+"output":[["Character", "Y"]],
+"errors":[
+    { "code": "eof-in-cdata", "line": 1, "col": 2 }
+]},
+
 {"description":"Z",
+"initialStates":["Data state", "PLAINTEXT state", "RCDATA state", "RAWTEXT state", "Script data state"],
 "input":"Z",
 "output":[["Character", "Z"]]},
 
+{"description":"Z",
+"initialStates":["CDATA section state"],
+"input":"Z",
+"output":[["Character", "Z"]],
+"errors":[
+    { "code": "eof-in-cdata", "line": 1, "col": 2 }
+]},
+
 {"description":"`",
+"initialStates":["Data state", "PLAINTEXT state", "RCDATA state", "RAWTEXT state", "Script data state"],
 "input":"`",
 "output":[["Character", "`"]]},
 
+{"description":"`",
+"initialStates":["CDATA section state"],
+"input":"`",
+"output":[["Character", "`"]],
+"errors":[
+    { "code": "eof-in-cdata", "line": 1, "col": 2 }
+]},
+
 {"description":"a",
+"initialStates":["Data state", "PLAINTEXT state", "RCDATA state", "RAWTEXT state", "Script data state"],
 "input":"a",
 "output":[["Character", "a"]]},
 
+{"description":"a",
+"initialStates":["CDATA section state"],
+"input":"a",
+"output":[["Character", "a"]],
+"errors":[
+    { "code": "eof-in-cdata", "line": 1, "col": 2 }
+]},
+
 {"description":"b",
+"initialStates":["Data state", "PLAINTEXT state", "RCDATA state", "RAWTEXT state", "Script data state"],
 "input":"b",
 "output":[["Character", "b"]]},
 
+{"description":"b",
+"initialStates":["CDATA section state"],
+"input":"b",
+"output":[["Character", "b"]],
+"errors":[
+    { "code": "eof-in-cdata", "line": 1, "col": 2 }
+]},
+
 {"description":"y",
+"initialStates":["Data state", "PLAINTEXT state", "RCDATA state", "RAWTEXT state", "Script data state"],
 "input":"y",
 "output":[["Character", "y"]]},
 
+{"description":"y",
+"initialStates":["CDATA section state"],
+"input":"y",
+"output":[["Character", "y"]],
+"errors":[
+    { "code": "eof-in-cdata", "line": 1, "col": 2 }
+]},
+
 {"description":"z",
+"initialStates":["Data state", "PLAINTEXT state", "RCDATA state", "RAWTEXT state", "Script data state"],
 "input":"z",
 "output":[["Character", "z"]]},
 
+{"description":"z",
+"initialStates":["CDATA section state"],
+"input":"z",
+"output":[["Character", "z"]],
+"errors":[
+    { "code": "eof-in-cdata", "line": 1, "col": 2 }
+]},
+
 {"description":"{",
+"initialStates":["Data state", "PLAINTEXT state", "RCDATA state", "RAWTEXT state", "Script data state"],
 "input":"{",
 "output":[["Character", "{"]]},
 
+{"description":"{",
+"initialStates":["CDATA section state"],
+"input":"{",
+"output":[["Character", "{"]],
+"errors":[
+    { "code": "eof-in-cdata", "line": 1, "col": 2 }
+]},
+
 {"description":"\\uDBC0\\uDC00",
+"initialStates":["Data state", "PLAINTEXT state", "RCDATA state", "RAWTEXT state", "Script data state"],
 "input":"\uDBC0\uDC00",
-"output":[["Character", "\uDBC0\uDC00"]]}
+"output":[["Character", "\uDBC0\uDC00"]]},
+
+{"description":"\\uDBC0\\uDC00",
+"initialStates":["CDATA section state"],
+"input":"\uDBC0\uDC00",
+"output":[["Character", "\uDBC0\uDC00"]],
+"errors":[
+    { "code": "eof-in-cdata", "line": 1, "col": 2 }
+]}
 
 ]}

From 71eebd59772d1d39aced0c0582ae9c09acf3ce6e Mon Sep 17 00:00:00 2001
From: Sam Sneddon <me@gsnedders.com>
Date: Tue, 26 May 2020 23:28:15 +0100
Subject: [PATCH 12/68] Add a test for order of comments after </html>

Notably, html5lib-python's lxml treebuilder gets this wrong:
https://github.com/html5lib/html5lib-python/issues/488
---
 tree-construction/webkit01.dat | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/tree-construction/webkit01.dat b/tree-construction/webkit01.dat
index b5fafdc7..2127cfe1 100644
--- a/tree-construction/webkit01.dat
+++ b/tree-construction/webkit01.dat
@@ -307,6 +307,20 @@ console.log("FOO<span>BAR</span>BAZ");
 |   <body>
 | <!--  Hi there  -->
 
+#data
+<html><body></body></html><!-- Comment A --><!-- Comment B --><!-- Comment C --><!-- Comment D --><!-- Comment E -->
+#errors
+(1,6): expected-doctype-but-got-start-tag
+#document
+| <html>
+|   <head>
+|   <body>
+| <!--  Comment A  -->
+| <!--  Comment B  -->
+| <!--  Comment C  -->
+| <!--  Comment D  -->
+| <!--  Comment E  -->
+
 #data
 <html><body></body></html>x<!-- Hi there -->
 #errors

From bef9ad1e6ffe8ed6a084be7fbc3ba521eab70844 Mon Sep 17 00:00:00 2001
From: "Michael[tm] Smith" <mike@w3.org>
Date: Thu, 13 Aug 2020 11:23:40 +0900
Subject: [PATCH 13/68] Test SVG fragment parsing w/ td/tr/tbody context
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

This change adds a new tree-construction/svg.dat file that’s essentially an
analogue of the existing tree-construction/math.dat file. It contains tests
for fragment parsing of SVG content with td, tr, and tbody/thead/tfoot
context elements.
---
 tree-construction/svg.dat | 81 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 81 insertions(+)
 create mode 100644 tree-construction/svg.dat

diff --git a/tree-construction/svg.dat b/tree-construction/svg.dat
new file mode 100644
index 00000000..8e9a2bbb
--- /dev/null
+++ b/tree-construction/svg.dat
@@ -0,0 +1,81 @@
+#data
+<svg><tr><td><title><tr>
+#errors
+#document-fragment
+td
+#document
+| <svg svg>
+|   <svg tr>
+|     <svg td>
+|       <svg title>
+
+#data
+<svg><tr><td><title><tr>
+#errors
+#document-fragment
+tr
+#document
+| <svg svg>
+|   <svg tr>
+|     <svg td>
+|       <svg title>
+
+#data
+<svg><thead><title><tbody>
+#errors
+#document-fragment
+thead
+#document
+| <svg svg>
+|   <svg thead>
+|     <svg title>
+
+#data
+<svg><tfoot><title><tbody>
+#errors
+#document-fragment
+tfoot
+#document
+| <svg svg>
+|   <svg tfoot>
+|     <svg title>
+
+#data
+<svg><tbody><title><tfoot>
+#errors
+#document-fragment
+tbody
+#document
+| <svg svg>
+|   <svg tbody>
+|     <svg title>
+
+#data
+<svg><tbody><title></table>
+#errors
+#document-fragment
+tbody
+#document
+| <svg svg>
+|   <svg tbody>
+|     <svg title>
+
+#data
+<svg><thead><title></table>
+#errors
+#document-fragment
+tbody
+#document
+| <svg svg>
+|   <svg thead>
+|     <svg title>
+
+#data
+<svg><tfoot><title></table>
+#errors
+#document-fragment
+tbody
+#document
+| <svg svg>
+|   <svg tfoot>
+|     <svg title>

From 0e9ed8efea41217469502098aa448327868c38a9 Mon Sep 17 00:00:00 2001
From: "Michael[tm] Smith" <mike@w3.org>
Date: Fri, 21 Aug 2020 13:05:17 +0900
Subject: [PATCH 14/68] Test scripted encoding support

This change adds a `scripted` subdirectory in the `encoding` directory,
with tests for which the expected results require a system with
scripting support.

The change removes an existing test from the `encoding/tests1.dat` file,
and moves it to the `encoding/scripted/tests1.dat` file.
---
 encoding/scripted/tests1.dat | 5 +++++
 encoding/tests1.dat          | 6 ------
 2 files changed, 5 insertions(+), 6 deletions(-)
 create mode 100644 encoding/scripted/tests1.dat

diff --git a/encoding/scripted/tests1.dat b/encoding/scripted/tests1.dat
new file mode 100644
index 00000000..04d18bb9
--- /dev/null
+++ b/encoding/scripted/tests1.dat
@@ -0,0 +1,5 @@
+#data
+<!DOCTYPE HTML>
+<script>document.write('<meta charset="ISO-8859-' + '2">')</script>
+#encoding
+iso-8859-2
diff --git a/encoding/tests1.dat b/encoding/tests1.dat
index 77b0e41d..7aa9586d 100644
--- a/encoding/tests1.dat
+++ b/encoding/tests1.dat
@@ -356,12 +356,6 @@ iso-8859-2
 #encoding
 iso-8859-2
 
-#data
-<!DOCTYPE HTML>
-<script>document.write('<meta charset="ISO-8859-' + '2">')</script>
-#encoding
-iso-8859-2
-
 #data
 <!DOCTYPE HTML>
 <script>document.write('<meta charset="iso8859-2">')</script>

From accc80388699156bed78de98d1e885068efd6b1b Mon Sep 17 00:00:00 2001
From: Simon Pieters <zcorpan@gmail.com>
Date: Fri, 26 Feb 2021 13:46:34 +0100
Subject: [PATCH 15/68] Update existing tests per spec change

See https://github.com/whatwg/html/pull/6399
---
 tree-construction/foreign-fragment.dat | 15 +++++++--------
 1 file changed, 7 insertions(+), 8 deletions(-)

diff --git a/tree-construction/foreign-fragment.dat b/tree-construction/foreign-fragment.dat
index c81ae817..3f7b2063 100644
--- a/tree-construction/foreign-fragment.dat
+++ b/tree-construction/foreign-fragment.dat
@@ -7,7 +7,7 @@
 #document-fragment
 svg path
 #document
-| <svg nobr>
+| <nobr>
 |   "X"
 
 #data
@@ -17,7 +17,7 @@ svg path
 #document-fragment
 svg path
 #document
-| <svg font>
+| <font>
 |   color=""
 | "X"
 
@@ -390,7 +390,7 @@ math mtext
 #document-fragment
 math annotation-xml
 #document
-| <math div>
+| <div>
 
 #data
 <figure></figure>
@@ -407,7 +407,7 @@ math annotation-xml
 #document-fragment
 math math
 #document
-| <math div>
+| <div>
 
 #data
 <figure></figure>
@@ -461,12 +461,11 @@ svg desc
 <div><h1>X</h1></div>
 #errors
 5: HTML start tag “div” in a foreign namespace context.
-9: HTML start tag “h1” in a foreign namespace context.
 #document-fragment
 svg svg
 #document
-| <svg div>
-|   <svg h1>
+| <div>
+|   <h1>
 |     "X"
 
 #data
@@ -476,7 +475,7 @@ svg svg
 #document-fragment
 svg svg
 #document
-| <svg div>
+| <div>
 
 #data
 <div></div>

From 1a26b47a4cafc918a4d85428e6d0c3f5cfdb04cf Mon Sep 17 00:00:00 2001
From: Simon Pieters <zcorpan@gmail.com>
Date: Fri, 26 Feb 2021 14:04:13 +0100
Subject: [PATCH 16/68] Add new tests

These currently pass in Chromium and WebKit, and fail in Gecko.
---
 tree-construction/foreign-fragment.dat | 48 ++++++++++++++++++++++++++
 1 file changed, 48 insertions(+)

diff --git a/tree-construction/foreign-fragment.dat b/tree-construction/foreign-fragment.dat
index 3f7b2063..7aff409c 100644
--- a/tree-construction/foreign-fragment.dat
+++ b/tree-construction/foreign-fragment.dat
@@ -556,3 +556,51 @@ svg desc
 svg desc
 #document
 | "X"
+
+#data
+<svg><p>
+#errors
+8: HTML start tag “p” in a foreign namespace context.
+#document-fragment
+div
+#document
+| <svg svg>
+| <p>
+
+#data
+<p>
+#errors
+3: HTML start tag “p” in a foreign namespace context.
+#document-fragment
+svg svg
+#document
+| <p>
+
+#data
+<body><foo>
+#errors
+6: HTML start tag “body” in a foreign namespace context.
+#document-fragment
+svg svg
+#document
+| <svg foo>
+
+#data
+<p><foo>
+#errors
+3: HTML start tag “p” in a foreign namespace context.
+#document-fragment
+svg svg
+#document
+| <p>
+|   <foo>
+
+#data
+<p></p><foo>
+#errors
+3: HTML start tag “p” in a foreign namespace context.
+#document-fragment
+svg svg
+#document
+| <p>
+| <svg foo>

From 9b4a29c943b3c905e46b26569bae16de8b373516 Mon Sep 17 00:00:00 2001
From: Simon Pieters <zcorpan@gmail.com>
Date: Fri, 11 Jun 2021 13:23:50 +0200
Subject: [PATCH 17/68] Test </p> and </br> in SVG (#135)

See https://github.com/whatwg/html/pull/6736
---
 tree-construction/foreign-fragment.dat | 42 ++++++++++++++++++++++
 tree-construction/tests26.dat          | 48 ++++++++++++++++++++++++++
 2 files changed, 90 insertions(+)

diff --git a/tree-construction/foreign-fragment.dat b/tree-construction/foreign-fragment.dat
index 7aff409c..d5bc22e5 100644
--- a/tree-construction/foreign-fragment.dat
+++ b/tree-construction/foreign-fragment.dat
@@ -576,6 +576,48 @@ svg svg
 #document
 | <p>
 
+#data
+<svg></p><foo>
+#errors
+9: HTML end tag “p” in a foreign namespace context.
+#document-fragment
+div
+#document
+| <svg svg>
+| <p>
+| <foo>
+
+#data
+<svg></br><foo>
+#errors
+10: HTML end tag “br” in a foreign namespace context.
+#document-fragment
+div
+#document
+| <svg svg>
+| <br>
+| <foo>
+
+#data
+</p><foo>
+#errors
+4: HTML end tag “p” in a foreign namespace context.
+#document-fragment
+svg svg
+#document
+| <p>
+| <svg foo>
+
+#data
+</br><foo>
+#errors
+5: HTML end tag “br” in a foreign namespace context.
+#document-fragment
+svg svg
+#document
+| <br>
+| <svg foo>
+
 #data
 <body><foo>
 #errors
diff --git a/tree-construction/tests26.dat b/tree-construction/tests26.dat
index de453b9c..e6f71f6a 100644
--- a/tree-construction/tests26.dat
+++ b/tree-construction/tests26.dat
@@ -391,3 +391,51 @@ Line 1 Col 19 Expected closing tag. Unexpected end of file.
 |     <button>
 |       <p>
 |     <button>
+
+#data
+<svg></p><foo>
+#errors
+9: HTML end tag “p” in a foreign namespace context.
+#document
+| <html>
+|   <head>
+|   <body>
+|     <svg svg>
+|     <p>
+|     <foo>
+
+#data
+<svg></br><foo>
+#errors
+10: HTML end tag “br” in a foreign namespace context.
+#document
+| <html>
+|   <head>
+|   <body>
+|     <svg svg>
+|     <br>
+|     <foo>
+
+#data
+<math></p><foo>
+#errors
+10: HTML end tag “p” in a foreign namespace context.
+#document
+| <html>
+|   <head>
+|   <body>
+|     <math math>
+|     <p>
+|     <foo>
+
+#data
+<math></br><foo>
+#errors
+11: HTML end tag “br” in a foreign namespace context.
+#document
+| <html>
+|   <head>
+|   <body>
+|     <math math>
+|     <br>
+|     <foo>

From b5c31b78eb3532f9563917f50041dca774ad4387 Mon Sep 17 00:00:00 2001
From: Stephen Checkoway <s@pahtak.org>
Date: Mon, 1 Oct 2018 14:20:28 -0400
Subject: [PATCH 18/68] -- in a comment isn't an error

`<!-- -- -->` is not an error. The middle `--` causes the tokenizer to
enter the comment end state. The "anything else" clause then appends two
`-` to the comment token's data, and the current input character is
reconsumed in the comment state.
---
 tree-construction/comments01.dat    | 8 --------
 tree-construction/html5test-com.dat | 1 -
 2 files changed, 9 deletions(-)

diff --git a/tree-construction/comments01.dat b/tree-construction/comments01.dat
index fa79c2b1..e0619028 100644
--- a/tree-construction/comments01.dat
+++ b/tree-construction/comments01.dat
@@ -57,7 +57,6 @@ FOO<!-- BAR --!
 FOO<!-- BAR --   >BAZ
 #errors
 (1,3): expected-doctype-but-got-chars
-(1,15): unexpected-char-in-comment
 (1,21): eof-in-comment
 #new-errors
 (1:22) eof-in-comment
@@ -72,8 +71,6 @@ FOO<!-- BAR --   >BAZ
 FOO<!-- BAR -- <QUX> -- MUX -->BAZ
 #errors
 (1,3): expected-doctype-but-got-chars
-(1,15): unexpected-char-in-comment
-(1,24): unexpected-char-in-comment
 #document
 | <html>
 |   <head>
@@ -86,8 +83,6 @@ FOO<!-- BAR -- <QUX> -- MUX -->BAZ
 FOO<!-- BAR -- <QUX> -- MUX --!>BAZ
 #errors
 (1,3): expected-doctype-but-got-chars
-(1,15): unexpected-char-in-comment
-(1,24): unexpected-char-in-comment
 (1,31): unexpected-bang-after-double-dash-in-comment
 #new-errors
 (1:32) incorrectly-closed-comment
@@ -103,9 +98,6 @@ FOO<!-- BAR -- <QUX> -- MUX --!>BAZ
 FOO<!-- BAR -- <QUX> -- MUX -- >BAZ
 #errors
 (1,3): expected-doctype-but-got-chars
-(1,15): unexpected-char-in-comment
-(1,24): unexpected-char-in-comment
-(1,31): unexpected-char-in-comment
 (1,35): eof-in-comment
 #new-errors
 (1:36) eof-in-comment
diff --git a/tree-construction/html5test-com.dat b/tree-construction/html5test-com.dat
index f7380101..48d0bf95 100644
--- a/tree-construction/html5test-com.dat
+++ b/tree-construction/html5test-com.dat
@@ -142,7 +142,6 @@
 #data
 <!--foo--bar-->
 #errors
-(1,10): unexpected-char-in-comment
 (1,15): expected-doctype-but-got-eof
 #document
 | <!-- foo--bar -->

From a94f95eba91123d30d068c3e83373b63377adf2b Mon Sep 17 00:00:00 2001
From: Stephen Checkoway <s@pahtak.org>
Date: Mon, 1 Oct 2018 14:09:43 -0400
Subject: [PATCH 19/68] <!-----> is not an error

Starting from the data state, we have
1. `<` switches to the tag open state
2. `!` switches to the markup declaration open state
3. `--` switches to the comment start state
4. `-` (the third one) switches to the comment start dash state
5. `-` (the fourth one) switches to the comment end state
6. `-` (the fifth one) appends `-` to the comment and does not change
   state
7. `>` emits the comment token and switches to the data state.
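
The steps above can be traced with a minimal sketch of the comment
states. It is an illustration only, not any parser's real API:
`tokenize_comment` is a hypothetical helper that starts in the comment
start state (that is, after `<!--` has been consumed), and it
deliberately leaves out NULL handling and the comment less-than sign
states.

```python
def tokenize_comment(rest):
    """Tokenize the input that follows `<!--`.

    Returns (comment_data, parse_errors). Hypothetical helper; NULL
    handling and the comment less-than sign states are omitted.
    """
    state = "comment start"
    data = []
    errors = []
    i = 0
    while True:
        if i >= len(rest):
            # EOF in any of these states is an eof-in-comment parse error.
            errors.append("eof-in-comment")
            break
        ch = rest[i]
        if state == "comment start":
            if ch == "-":
                state = "comment start dash"
                i += 1
            elif ch == ">":
                errors.append("abrupt-closing-of-empty-comment")
                break
            else:
                state = "comment"  # reconsume the character
        elif state == "comment start dash":
            if ch == "-":
                state = "comment end"
                i += 1
            elif ch == ">":
                errors.append("abrupt-closing-of-empty-comment")
                break
            else:
                data.append("-")
                state = "comment"  # reconsume the character
        elif state == "comment":
            if ch == "-":
                state = "comment end dash"
            else:
                data.append(ch)
            i += 1
        elif state == "comment end dash":
            if ch == "-":
                state = "comment end"
                i += 1
            else:
                data.append("-")
                state = "comment"  # reconsume the character
        elif state == "comment end":
            if ch == ">":
                break  # emit the comment token
            elif ch == "!":
                state = "comment end bang"
                i += 1
            elif ch == "-":
                data.append("-")  # extra dash: no error, no state change
                i += 1
            else:
                data.append("--")  # "anything else": no error
                state = "comment"  # reconsume the character
        else:  # comment end bang
            if ch == "-":
                data.append("--!")
                state = "comment end dash"
                i += 1
            elif ch == ">":
                errors.append("incorrectly-closed-comment")
                break
            else:
                data.append("--!")
                state = "comment"  # reconsume the character
    return "".join(data), errors


# `<!----->`: the fifth dash is appended and no error is reported.
print(tokenize_comment("--->"))
```

In this sketch, `tokenize_comment(" -- -->")` also reports no error,
which is the `<!-- -- -->` case.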
---
 tree-construction/comments01.dat | 1 -
 tree-construction/tests1.dat     | 1 -
 2 files changed, 2 deletions(-)

diff --git a/tree-construction/comments01.dat b/tree-construction/comments01.dat
index e0619028..cb508b01 100644
--- a/tree-construction/comments01.dat
+++ b/tree-construction/comments01.dat
@@ -194,7 +194,6 @@ FOO<!-->BAZ
 FOO<!----->BAZ
 #errors
 (1,3): expected-doctype-but-got-chars
-(1,10): unexpected-dash-after-double-dash-in-comment
 #document
 | <html>
 |   <head>
diff --git a/tree-construction/tests1.dat b/tree-construction/tests1.dat
index 1c36c1b8..86632deb 100644
--- a/tree-construction/tests1.dat
+++ b/tree-construction/tests1.dat
@@ -425,7 +425,6 @@ Line1<br>Line2<br>Line3<br>Line4
 #data
 <!-----><font><div>hello<table>excite!<b>me!<th><i>please!</tr><!--X-->
 #errors
-(1,7): unexpected-dash-after-double-dash-in-comment
 (1,14): expected-doctype-but-got-start-tag
 (1,41): unexpected-start-tag-implies-table-voodoo
 (1,48): foster-parenting-character-in-table

From b64b4fa89a0adf86ccbed0fc54032d776117844a Mon Sep 17 00:00:00 2001
From: Stephen Checkoway <s@pahtak.org>
Date: Wed, 3 Oct 2018 02:18:50 -0400
Subject: [PATCH 20/68] Add DOCTYPE errors (and remove one)

A DOCTYPE token is an error in one of three cases:
1. The token's name is not `html`;
2. The token's public identifier is not missing;
3. The token's system identifier is not missing and the token's system
   identifier isn't `about:legacy-compat`.

This appears to have changed at some point from a much more complex set
of conditions.
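
As a rough sketch, the three conditions can be written as a single
predicate. The function name and the use of `None` for a missing
identifier are assumptions made for illustration, and the name is
assumed to have been lowercased by the tokenizer already.

```python
def doctype_is_parse_error(name, public_id, system_id):
    """Return True if a DOCTYPE token is a parse error.

    Hypothetical helper: `None` stands for a missing name or identifier.
    """
    if name != "html":
        return True
    if public_id is not None:
        return True
    if system_id is not None and system_id != "about:legacy-compat":
        return True
    return False


# An HTML 4.01 strict doctype has a public identifier, so it is an error.
print(doctype_is_parse_error(
    "html",
    "-//W3C//DTD HTML 4.01//EN",
    "http://www.w3.org/TR/html4/strict.dtd",
))
```

Under these assumptions `<!DOCTYPE html>` and the
`about:legacy-compat` form are the only doctypes that do not add an
error.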
---
 tree-construction/doctype01.dat | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/tree-construction/doctype01.dat b/tree-construction/doctype01.dat
index c845becf..9efdaf70 100644
--- a/tree-construction/doctype01.dat
+++ b/tree-construction/doctype01.dat
@@ -34,7 +34,6 @@
 #data
 <!DOCTYPE>Hello
 #errors
-(1,9): need-space-after-doctype
 (1,10): expected-doctype-name-but-got-right-bracket
 (1,10): unknown-doctype
 #new-errors
@@ -337,6 +336,7 @@
 <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
    "http://www.w3.org/TR/html4/strict.dtd">Hello
 #errors
+(2,43): unknown-doctype
 #document
 | <!DOCTYPE html "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
 | <html>
@@ -421,6 +421,7 @@
 <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN""http://www.w3.org/TR/html4/strict.dtd">
 #errors
 (1,50): unexpected-char-in-doctype
+(1,89): unknown-doctype
 #new-errors
 (1:50) missing-whitespace-between-doctype-public-and-system-identifiers
 #document
@@ -433,6 +434,7 @@
 <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"'http://www.w3.org/TR/html4/strict.dtd'>
 #errors
 (1,50): unexpected-char-in-doctype
+(1,89): unknown-doctype
 #new-errors
 (1:50) missing-whitespace-between-doctype-public-and-system-identifiers
 #document
@@ -446,6 +448,7 @@
 #errors
 (1,21): unexpected-char-in-doctype
 (1,49): unexpected-char-in-doctype
+(1,88): unknown-doctype
 #new-errors
 (1:22) missing-whitespace-after-doctype-public-keyword
 (1:49) missing-whitespace-between-doctype-public-and-system-identifiers
@@ -460,6 +463,7 @@
 #errors
 (1,21): unexpected-char-in-doctype
 (1,49): unexpected-char-in-doctype
+(1,88): unknown-doctype
 #new-errors
 (1:22) missing-whitespace-after-doctype-public-keyword
 (1:49) missing-whitespace-between-doctype-public-and-system-identifiers

From 1421b7fb448225deb23752cdc4a2d262ad2a1bdf Mon Sep 17 00:00:00 2001
From: Stephen Checkoway <s@pahtak.org>
Date: Tue, 2 Oct 2018 15:49:49 -0400
Subject: [PATCH 21/68] Fix errors in doctypes

A `>` after `DOCTYPE` is a missing-doctype-name parse error but it is
not also a missing-whitespace-before-doctype-name parse error.

A doctype with a public identifier is a (currently unnamed) parse error.
---
 tree-construction/tests6.dat | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tree-construction/tests6.dat b/tree-construction/tests6.dat
index f3991232..8c36dd3d 100644
--- a/tree-construction/tests6.dat
+++ b/tree-construction/tests6.dat
@@ -48,7 +48,6 @@
 #data
 <!doctype>
 #errors
-(1,9): need-space-after-doctype
 (1,10): expected-doctype-name-but-got-right-bracket
 (1,10): unknown-doctype
 #new-errors
@@ -604,6 +603,7 @@ html
 #data
 <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN"><html></html>
 #errors
+(1,50): doctype-has-public-identifier
 #document
 | <!DOCTYPE html "-//W3C//DTD HTML 4.01//EN" "">
 | <html>

From ec7d0433260da13f1c539d15f6b1faa7a62a12b6 Mon Sep 17 00:00:00 2001
From: Stephen Checkoway <s@pahtak.org>
Date: Mon, 1 Oct 2018 22:02:07 -0400
Subject: [PATCH 22/68] Fix entity errors

A named character reference in an attribute whose last character is not
`;` and whose next input character is `=` (or an ASCII alphanumeric, but
this isn't tested here) has its consumed code points flushed as a
character reference _without_ adding a parse error.

Named character references outside attributes whose last character is
not `;` are errors regardless of the following character, as noted in
the `#new-errors` section. Without a matching entry in `#errors`,
however, the number of errors is wrong. (See
https://github.com/html5lib/html5lib-tests/issues/107).

Separately, this adds the missing expected-doctype-but-got-start-tag
error.
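
The two rules for semicolonless named character references can be
sketched as one decision function. This is a hedged illustration only:
the function name, its arguments, and the `(decode, errors)` return
shape are hypothetical, and `matched` is assumed to be the longest
entity name the tokenizer managed to match.

```python
def resolve_named_reference(matched, next_char, in_attribute):
    """Decide how a matched named character reference is handled.

    Returns (decode, errors): `decode` is False when the consumed code
    points are flushed unchanged. Hypothetical helper; `next_char` is
    None at EOF.
    """
    if matched.endswith(";"):
        return True, []
    if in_attribute and next_char is not None and (
        next_char == "=" or (next_char.isascii() and next_char.isalnum())
    ):
        # Historical exception: leave the text as-is, with no parse error.
        return False, []
    return True, ["missing-semicolon-after-character-reference"]


# `<div bar="ZZ&gt=YY">`: flushed as text, no error.
print(resolve_named_reference("&gt", "=", True))
# `<div>ZZ&AElig=</div>`: decoded, with a parse error.
print(resolve_named_reference("&AElig", "=", False))
```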
---
 tree-construction/entities02.dat | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tree-construction/entities02.dat b/tree-construction/entities02.dat
index 0c6e898c..74965a35 100644
--- a/tree-construction/entities02.dat
+++ b/tree-construction/entities02.dat
@@ -45,7 +45,6 @@
 #data
 <div bar="ZZ&gt=YY"></div>
 #errors
-(1,15): named-entity-without-semicolon
 (1,20): expected-doctype-but-got-start-tag
 #document
 | <html>
@@ -204,7 +203,6 @@
 #data
 <div bar="ZZ&pound=23"></div>
 #errors
-(1,18): named-entity-without-semicolon
 (1,23): expected-doctype-but-got-start-tag
 #document
 | <html>
@@ -299,6 +297,8 @@
 #data
 <div>ZZ&AElig=</div>
 #errors
+(1,5): expected-doctype-but-got-start-tag
+(1:14) missing-semicolon-after-character-reference
 #new-errors
 (1:14) missing-semicolon-after-character-reference
 #document

From 3facb049d8c6604ccd6f6ab061c18a95b96eb92d Mon Sep 17 00:00:00 2001
From: Stephen Checkoway <s@pahtak.org>
Date: Mon, 1 Oct 2018 14:52:27 -0400
Subject: [PATCH 23/68] `<!doctype html><script><!` has only one error

Here are the (abbreviated) steps (intermingling the tree-construction
and tokenizer).

1. `<script>` token causes `html` and `head` elements to be inserted and
   is processed in the "in head" insertion mode
2. `<script>` token switches the tokenizer to the script data state and
   switches to the "text" insertion mode
3. `<` switches to the script data less-than sign state
4. `!` switches to the script data escape start state and emits `<!`
5. EOF is reconsumed in the script data state
6. EOF emits an EOF token
7. EOF token (in the "text" insertion mode) is a parse error,
   `<script>` is popped off the stack of open elements, switches back to
   the "in head" insertion mode and reprocesses the token
8. EOF token (in "in head") triggers the "anything else" clause, which
   pops the `head` element, switches to "after head", inserts a `body`
   element, switches to "in body", and reprocesses
9. EOF token (in "in body") stops parsing with no error because the
   stack of open elements contains `html` and `body`

Only step 7 adds a parse error.
---
 tree-construction/tests16.dat | 2 --
 1 file changed, 2 deletions(-)

diff --git a/tree-construction/tests16.dat b/tree-construction/tests16.dat
index cea7340a..05f34c13 100644
--- a/tree-construction/tests16.dat
+++ b/tree-construction/tests16.dat
@@ -221,7 +221,6 @@
 <!doctype html><script><!
 #errors
 (1,25): expected-script-data-but-got-eof
-(1,25): expected-named-closing-tag-but-got-eof
 #document
 | <!DOCTYPE html>
 | <html>
@@ -1525,7 +1524,6 @@
 #errors
 (1,8): expected-doctype-but-got-start-tag
 (1,10): expected-script-data-but-got-eof
-(1,10): expected-named-closing-tag-but-got-eof
 #document
 | <html>
 |   <head>

From da0a52caa016caf657c4d0cba7a7060057b09e1c Mon Sep 17 00:00:00 2001
From: Stephen Checkoway <s@pahtak.org>
Date: Wed, 3 Oct 2018 00:54:08 -0400
Subject: [PATCH 24/68] Include the new error in `#errors`

Without it, the number of errors is incorrect.
---
 tree-construction/tests21.dat | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/tree-construction/tests21.dat b/tree-construction/tests21.dat
index 1e2af7c1..d52ab8cc 100644
--- a/tree-construction/tests21.dat
+++ b/tree-construction/tests21.dat
@@ -41,6 +41,7 @@
 <svg><![CDATA[foo
 #errors
 (1,5): expected-doctype-but-got-start-tag
+(1:18) eof-in-cdata
 (1,17): expected-closing-tag-but-got-eof
 #new-errors
 (1:18) eof-in-cdata
@@ -55,6 +56,7 @@
 <svg><![CDATA[foo
 #errors
 (1,5): expected-doctype-but-got-start-tag
+(1:18) eof-in-cdata
 (1,17): expected-closing-tag-but-got-eof
 #new-errors
 (1:18) eof-in-cdata
@@ -69,6 +71,7 @@
 <svg><![CDATA[
 #errors
 (1,5): expected-doctype-but-got-start-tag
+(1:15) eof-in-cdata
 (1,14): expected-closing-tag-but-got-eof
 #new-errors
 (1:15) eof-in-cdata
@@ -117,6 +120,7 @@
 <svg><![CDATA[]]
 #errors
 (1,5): expected-doctype-but-got-start-tag
+(1:17) eof-in-cdata
 (1,16): expected-closing-tag-but-got-eof
 #new-errors
 (1:17) eof-in-cdata
@@ -131,6 +135,7 @@
 <svg><![CDATA[]
 #errors
 (1,5): expected-doctype-but-got-start-tag
+(1:16) eof-in-cdata
 (1,15): expected-closing-tag-but-got-eof
 #new-errors
 (1:16) eof-in-cdata
@@ -145,6 +150,7 @@
 <svg><![CDATA[]>a
 #errors
 (1,5): expected-doctype-but-got-start-tag
+(1:16) eof-in-cdata
 (1,17): expected-closing-tag-but-got-eof
 #new-errors
 (1:18) eof-in-cdata
@@ -236,6 +242,7 @@
 <svg><![CDATA[<svg>a
 #errors
 (1,5): expected-doctype-but-got-start-tag
+(1:21) eof-in-cdata
 (1,20): expected-closing-tag-but-got-eof
 #new-errors
 (1:21) eof-in-cdata
@@ -250,6 +257,7 @@
 <svg><![CDATA[</svg>a
 #errors
 (1,5): expected-doctype-but-got-start-tag
+(1:22) eof-in-cdata
 (1,21): expected-closing-tag-but-got-eof
 #new-errors
 (1:22) eof-in-cdata

From 8cf56abe9bdf221be5155f4e13cc2c4d466a8836 Mon Sep 17 00:00:00 2001
From: Stephen Checkoway <s@pahtak.org>
Date: Tue, 2 Oct 2018 17:39:24 -0400
Subject: [PATCH 25/68] Add missing foster-parenting errors; remove duplicate
 errors

---
 tree-construction/tests18.dat | 80 +++++++++++++++++++++++------------
 1 file changed, 52 insertions(+), 28 deletions(-)

diff --git a/tree-construction/tests18.dat b/tree-construction/tests18.dat
index 05363b39..0b6d5dc4 100644
--- a/tree-construction/tests18.dat
+++ b/tree-construction/tests18.dat
@@ -3,7 +3,6 @@
 #errors
 11: Start tag seen without seeing a doctype first. Expected “<!DOCTYPE html>”.
 23: End of file seen and there were open elements.
-11: Unclosed element “plaintext”.
 #document
 | <html>
 |   <head>
@@ -27,7 +26,6 @@
 <!doctype html><html><plaintext></plaintext>
 #errors
 44: End of file seen and there were open elements.
-32: Unclosed element “plaintext”.
 #document
 | <!DOCTYPE html>
 | <html>
@@ -40,7 +38,6 @@
 <!doctype html><head><plaintext></plaintext>
 #errors
 44: End of file seen and there were open elements.
-32: Unclosed element “plaintext”.
 #document
 | <!DOCTYPE html>
 | <html>
@@ -54,7 +51,6 @@
 #errors
 42: Bad start tag in “plaintext” in “head”.
 54: End of file seen and there were open elements.
-42: Unclosed element “plaintext”.
 #script-off
 #document
 | <!DOCTYPE html>
@@ -69,7 +65,6 @@
 <!doctype html></head><plaintext></plaintext>
 #errors
 45: End of file seen and there were open elements.
-33: Unclosed element “plaintext”.
 #document
 | <!DOCTYPE html>
 | <html>
@@ -82,7 +77,6 @@
 <!doctype html><body><plaintext></plaintext>
 #errors
 44: End of file seen and there were open elements.
-32: Unclosed element “plaintext”.
 #document
 | <!DOCTYPE html>
 | <html>
@@ -95,8 +89,19 @@
 <!doctype html><table><plaintext></plaintext>
 #errors
 (1,33): foster-parenting-start-tag
-(1,45): foster-parenting-character
-(1,45): eof-in-table
+(1,46): foster-parenting-character
+(1,46): foster-parenting-character
+(1,46): foster-parenting-character
+(1,46): foster-parenting-character
+(1,46): foster-parenting-character
+(1,46): foster-parenting-character
+(1,46): foster-parenting-character
+(1,46): foster-parenting-character
+(1,46): foster-parenting-character
+(1,46): foster-parenting-character
+(1,46): foster-parenting-character
+(1,46): foster-parenting-character
+(1,46): eof-in-table
 #document
 | <!DOCTYPE html>
 | <html>
@@ -110,8 +115,19 @@
 <!doctype html><table><tbody><plaintext></plaintext>
 #errors
 (1,40): foster-parenting-start-tag
-(1,41): foster-parenting-character
-(1,52): eof-in-table
+(1,53): foster-parenting-character
+(1,53): foster-parenting-character
+(1,53): foster-parenting-character
+(1,53): foster-parenting-character
+(1,53): foster-parenting-character
+(1,53): foster-parenting-character
+(1,53): foster-parenting-character
+(1,53): foster-parenting-character
+(1,53): foster-parenting-character
+(1,53): foster-parenting-character
+(1,53): foster-parenting-character
+(1,53): foster-parenting-character
+(1,53): eof-in-table
 #document
 | <!DOCTYPE html>
 | <html>
@@ -126,8 +142,19 @@
 <!doctype html><table><tbody><tr><plaintext></plaintext>
 #errors
 (1,44): foster-parenting-start-tag
-(1,56): foster-parenting-character
-(1,56): eof-in-table
+(1,57): foster-parenting-character
+(1,57): foster-parenting-character
+(1,57): foster-parenting-character
+(1,57): foster-parenting-character
+(1,57): foster-parenting-character
+(1,57): foster-parenting-character
+(1,57): foster-parenting-character
+(1,57): foster-parenting-character
+(1,57): foster-parenting-character
+(1,57): foster-parenting-character
+(1,57): foster-parenting-character
+(1,57): foster-parenting-character
+(1,57): eof-in-table
 #document
 | <!DOCTYPE html>
 | <html>
@@ -173,11 +200,20 @@
 #data
 <!doctype html><table><colgroup><plaintext></plaintext>
 #errors
-43: Start tag “plaintext” seen in “table”.
-55: Misplaced non-space characters inside a table.
+(1,43): foster-parenting-start-tag
+(1,56): foster-parenting-character
+(1,56): foster-parenting-character
+(1,56): foster-parenting-character
+(1,56): foster-parenting-character
+(1,56): foster-parenting-character
+(1,56): foster-parenting-character
+(1,56): foster-parenting-character
+(1,56): foster-parenting-character
+(1,56): foster-parenting-character
+(1,56): foster-parenting-character
+(1,56): foster-parenting-character
+(1,56): foster-parenting-character
 55: End of file seen and there were open elements.
-43: Unclosed element “plaintext”.
-22: Unclosed element “table”.
 #document
 | <!DOCTYPE html>
 | <html>
@@ -194,7 +230,6 @@
 34: Stray start tag “plaintext”.
 46: Stray end tag “plaintext”.
 47: End of file seen and there were open elements.
-23: Unclosed element “select”.
 #document
 | <!DOCTYPE html>
 | <html>
@@ -210,8 +245,6 @@
 41: Stray start tag “plaintext”.
 51: “caption” start tag with “select” open.
 52: End of file seen and there were open elements.
-51: Unclosed element “caption”.
-22: Unclosed element “table”.
 #document
 | <!DOCTYPE html>
 | <html>
@@ -227,8 +260,6 @@
 <!doctype html><template><plaintext>a</template>b
 #errors
 49: End of file seen and there were open elements.
-36: Unclosed element “plaintext”.
-25: Unclosed element “template”.
 #document
 | <!DOCTYPE html>
 | <html>
@@ -244,7 +275,6 @@
 #errors
 39: Stray start tag “plaintext”.
 51: End of file seen and there were open elements.
-39: Unclosed element “plaintext”.
 #document
 | <!DOCTYPE html>
 | <html>
@@ -259,7 +289,6 @@
 36: Stray start tag “plaintext”.
 48: Stray end tag “plaintext”.
 48: End of file seen and there were open elements.
-25: Unclosed element “frameset”.
 #document
 | <!DOCTYPE html>
 | <html>
@@ -282,7 +311,6 @@
 #errors
 46: Stray start tag “plaintext”.
 58: End of file seen and there were open elements.
-46: Unclosed element “plaintext”.
 #document
 | <!DOCTYPE html>
 | <html>
@@ -306,7 +334,6 @@
 <!doctype html><svg><plaintext>a</plaintext>b
 #errors
 45: End of file seen and there were open elements.
-20: Unclosed element “svg”.
 #document
 | <!DOCTYPE html>
 | <html>
@@ -321,9 +348,6 @@
 <!doctype html><svg><title><plaintext>a</plaintext>b
 #errors
 52: End of file seen and there were open elements.
-38: Unclosed element “plaintext”.
-27: Unclosed element “title”.
-20: Unclosed element “svg”.
 #document
 | <!DOCTYPE html>
 | <html>

From 63a678891013bff91826d242dc2c8e450a6a2f25 Mon Sep 17 00:00:00 2001
From: Stephen Checkoway <s@pahtak.org>
Date: Mon, 1 Oct 2018 14:37:03 -0400
Subject: [PATCH 26/68] The space is not foster parented

```
A<table><tr> B</tr> </em>C</table>
```

The second space isn't foster parented. The `</tr>` triggers the
"anything else" clause of the in table text insertion mode which causes
` B` to be foster parented. The following space clears the pending table
character tokens list and thus the `</em>` inserts the space without
foster parenting.

This is also clear from the text node `A BC` in the result. A foster
parented second space would result in `A B C`.
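
A minimal sketch of the check that the "in table text" insertion mode
makes when it flushes the pending table character tokens, assuming the
one-error-per-character convention used in these tests. The function
name is hypothetical.

```python
ASCII_WHITESPACE = set("\t\n\f\r ")


def flush_pending_table_text(pending):
    """Return (foster_parented, error_count) for one run of characters.

    A run with any non-whitespace character is foster parented as a
    whole; a whitespace-only run is inserted normally with no error.
    """
    if any(ch not in ASCII_WHITESPACE for ch in pending):
        # Assumption: the tests list one error per character.
        return True, len(pending)
    return False, 0
```

For the input above, the runs are ` B`, a single space, and `C`. Only
the first and last are foster parented, giving the three remaining
`foster-parenting-character` lines.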
---
 tree-construction/tests7.dat | 1 -
 1 file changed, 1 deletion(-)

diff --git a/tree-construction/tests7.dat b/tree-construction/tests7.dat
index 395dc72b..8c5596b0 100644
--- a/tree-construction/tests7.dat
+++ b/tree-construction/tests7.dat
@@ -391,7 +391,6 @@ A<table><tr> B</tr> </em>C</table>
 (1,1): expected-doctype-but-got-chars
 (1,13): foster-parenting-character
 (1,14): foster-parenting-character
-(1,20): foster-parenting-character
 (1,25): unexpected-end-tag
 (1,25): unexpected-end-tag-in-special-element
 (1,26): foster-parenting-character

From 6da558398883d3418b0938d4c38de74e3d27c183 Mon Sep 17 00:00:00 2001
From: Stephen Checkoway <s@pahtak.org>
Date: Tue, 2 Oct 2018 16:43:33 -0400
Subject: [PATCH 27/68] End tag causes two parse errors

```
<math><annotation-xml></svg>x
```

The `</svg>` is parsed in foreign context because
1. The stack of open elements is not empty;
2. The adjusted current node (`annotation-xml` element) is not in the
   HTML namespace;
3. The adjusted current node is not a MathML text integration point;
4. The adjusted current node is not a MathML text integration point
   (separate condition from 3);
5. The adjusted current node is a MathML `annotation-xml` element but
   the token (`</svg>`) is not a start tag;
6. The token is not a start tag (also the `annotation-xml` isn't an HTML
   integration point because it lacks a particular attribute);
7. The token isn't a character token (also the `annotation-xml` isn't an
   HTML integration point).

Thus the "any other end tag" clause of 12.2.6.5 "The rules for parsing
tokens in foreign context" applies.

The current node's tag name (`annotation-xml`) is not the same as the
tag name of the token (`svg`) so this is a parse error.

Since there's no `svg` element in the stack of open elements, the loop
in steps 3 through 6 of the "any other end tag" clause exits when the
`body` element is reached, and the token is processed according to the
current insertion mode, "in body."

The `</svg>` matches the "any other end tag" clause of "in body." The
MathML `annotation-xml` element is not an HTML element with a matching
tag name but is special, which is a second parse error.
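
The seven conditions above can be sketched as a predicate. This is a
simplified reading of the tree construction dispatcher, not code from
any parser: the `Element` and `Token` classes and the function names
are hypothetical, and the adjusted current node is taken to be the top
of the stack (the fragment case is ignored).

```python
from dataclasses import dataclass, field

HTML, MATHML, SVG = "html", "mathml", "svg"
MATHML_TEXT_INTEGRATION = {"mi", "mo", "mn", "ms", "mtext"}


@dataclass
class Element:
    name: str
    namespace: str = HTML
    attrs: dict = field(default_factory=dict)


@dataclass
class Token:
    kind: str  # "start", "end", "char", or "eof"
    name: str = ""


def is_html_integration_point(el):
    if el.namespace == MATHML and el.name == "annotation-xml":
        encoding = el.attrs.get("encoding", "").lower()
        return encoding in ("text/html", "application/xhtml+xml")
    return el.namespace == SVG and el.name in ("foreignObject", "desc", "title")


def uses_html_rules(stack, token):
    """True if the token is handled by the current insertion mode,
    False if it is handled by the rules for foreign content."""
    if not stack:
        return True
    node = stack[-1]  # adjusted current node, outside the fragment case
    if node.namespace == HTML:
        return True
    text_point = node.namespace == MATHML and node.name in MATHML_TEXT_INTEGRATION
    if text_point and token.kind == "start" and token.name not in ("mglyph", "malignmark"):
        return True
    if text_point and token.kind == "char":
        return True
    if (node.namespace == MATHML and node.name == "annotation-xml"
            and token.kind == "start" and token.name == "svg"):
        return True
    if is_html_integration_point(node) and token.kind in ("start", "char"):
        return True
    return token.kind == "eof"
```

With `html`, `body`, `math`, and `annotation-xml` on the stack, an end
tag named `svg` makes the predicate false, so the token goes through
the foreign content rules first.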
---
 tree-construction/tests20.dat | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tree-construction/tests20.dat b/tree-construction/tests20.dat
index afdae743..1bfd130e 100644
--- a/tree-construction/tests20.dat
+++ b/tree-construction/tests20.dat
@@ -557,6 +557,7 @@
 <math><annotation-xml></svg>x
 #errors
 (1,6): expected-doctype-but-got-start-tag
+(1,28): unexpected-end-tag-in-math
 (1,28): unexpected-end-tag
 (1,29): expected-closing-tag-but-got-eof
 #document

From 68ee04021f187564f885d3a9bb9836da02df1122 Mon Sep 17 00:00:00 2001
From: Stephen Checkoway <s@pahtak.org>
Date: Mon, 1 Oct 2018 18:03:21 -0400
Subject: [PATCH 28/68] Add missing errors

These are tricky. A `<math>` tag as the first token in a document
fragment whose context node is `td` (or `th`, but this isn't tested
here) is fine. One in a document fragment whose context node is `tr`,
`thead`, `tbody`, or `tfoot` is a parse error and the elements are
foster parented.

The table element start tags after the `<mo>` tag are parsed as HTML,
and there is no `tr` element (for the `<tr>` tags) or `thead`, `tbody`,
or `tfoot` element (for the others) in table scope, which is a parse
error. I just invented some names for those.

The `</table>` tag after the `<mo>` is parsed as foreign but it doesn't
match `<mo>` so it's a parse error.

Finally, the EOF occurs while a bunch of elements are open which is a
parse error.
---
 tree-construction/math.dat | 23 +++++++++++++++++++++++
 1 file changed, 23 insertions(+)

diff --git a/tree-construction/math.dat b/tree-construction/math.dat
index ae9cd7c6..d6a8ae56 100644
--- a/tree-construction/math.dat
+++ b/tree-construction/math.dat
@@ -1,6 +1,8 @@
 #data
 <math><tr><td><mo><tr>
 #errors
+(1,22): unexpected-start-tag
+(1,23): expected-closing-tag-but-got-eof
 #document-fragment
 td
 #document
@@ -12,6 +14,9 @@ td
 #data
 <math><tr><td><mo><tr>
 #errors
+(1,6): foster-parenting-start-tag
+(1,22): expected-tr-in-table-scope
+(1,23): expected-closing-tag-but-got-eof
 #document-fragment
 tr
 #document
@@ -23,6 +28,9 @@ tr
 #data
 <math><thead><mo><tbody>
 #errors
+(1,6): foster-parenting-start-tag
+(1,24): expected-table-part-in-table-scope
+(1,25): expected-closing-tag-but-got-eof
 #document-fragment
 thead
 #document
@@ -33,6 +41,9 @@ thead
 #data
 <math><tfoot><mo><tbody>
 #errors
+(1,6): foster-parenting-start-tag
+(1,24): expected-table-part-in-table-scope
+(1,25): expected-closing-tag-but-got-eof
 #document-fragment
 tfoot
 #document
@@ -43,6 +54,9 @@ tfoot
 #data
 <math><tbody><mo><tfoot>
 #errors
+(1,6): foster-parenting-start-tag
+(1,24): expected-table-part-in-table-scope
+(1,25): expected-closing-tag-but-got-eof
 #document-fragment
 tbody
 #document
@@ -53,6 +67,9 @@ tbody
 #data
 <math><tbody><mo></table>
 #errors
+(1,6): foster-parenting-start-tag
+(1,25): unexpected-end-tag-in-math
+(1,26): expected-closing-tag-but-got-eof
 #document-fragment
 tbody
 #document
@@ -63,6 +80,9 @@ tbody
 #data
 <math><thead><mo></table>
 #errors
+(1,6): foster-parenting-start-tag
+(1,25): unexpected-end-tag-in-math
+(1,26): expected-closing-tag-but-got-eof
 #document-fragment
 tbody
 #document
@@ -73,6 +93,9 @@ tbody
 #data
 <math><tfoot><mo></table>
 #errors
+(1,6): foster-parenting-start-tag
+(1,25): unexpected-end-tag-in-math
+(1,26): expected-closing-tag-but-got-eof
 #document-fragment
 tbody
 #document

From fe972fa6dd3f06e34523fb10ebd403fc9d7fd107 Mon Sep 17 00:00:00 2001
From: Stephen Checkoway <s@pahtak.org>
Date: Tue, 2 Oct 2018 17:24:47 -0400
Subject: [PATCH 29/68] Add missing errors

The `</td>` is parsed in the "in cell" insertion mode and is an error
because it doesn't match the open `span` element. Each of the three
characters in `Foo` is reparented and is an error.
---
 tree-construction/namespace-sensitivity.dat | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/tree-construction/namespace-sensitivity.dat b/tree-construction/namespace-sensitivity.dat
index ca35c0e7..050dca75 100644
--- a/tree-construction/namespace-sensitivity.dat
+++ b/tree-construction/namespace-sensitivity.dat
@@ -1,6 +1,12 @@
 #data
 <body><table><tr><td><svg><td><foreignObject><span></td>Foo
 #errors
+(1,6): expected-doctype-but-got-start-tag
+(1,56): unexpected-end-tag
+(1,60): foster-parenting-character
+(1,60): foster-parenting-character
+(1,60): foster-parenting-character
+(1,60): expected-closing-tag-but-got-eof
 #document
 | <html>
 |   <head>

From 0989b6b7f2813509651c28076129f9a242694d24 Mon Sep 17 00:00:00 2001
From: Stephen Checkoway <s@pahtak.org>
Date: Tue, 2 Oct 2018 00:11:19 -0400
Subject: [PATCH 30/68] More missing errors

---
 tree-construction/webkit01.dat |  4 ++++
 tree-construction/webkit02.dat | 33 +++++++++++++++++++++++++++++++++
 2 files changed, 37 insertions(+)

diff --git a/tree-construction/webkit01.dat b/tree-construction/webkit01.dat
index 2127cfe1..3bb3bb90 100644
--- a/tree-construction/webkit01.dat
+++ b/tree-construction/webkit01.dat
@@ -687,6 +687,10 @@ console.log("FOO<span>BAR</span>BAZ");
 #data
 <table><tr><td><svg><desc><td></desc><circle>
 #errors
+(1,7): expected-doctype-but-got-start-tag
+(1,30): unexpected-start-tag
+(1,37): unexpected-end-tag
+(1,22): expected-closing-tag-but-got-eof
 #document
 | <html>
 |   <head>
diff --git a/tree-construction/webkit02.dat b/tree-construction/webkit02.dat
index 791991d2..dddfe2a7 100644
--- a/tree-construction/webkit02.dat
+++ b/tree-construction/webkit02.dat
@@ -138,6 +138,7 @@
 #data
 <legend>test</legend>
 #errors
+(1,7): expected-doctype-but-got-start-tag
 #document
 | <html>
 |   <head>
@@ -148,6 +149,9 @@
 #data
 <table><input>
 #errors
+(1,7): expected-doctype-but-got-start-tag
+(1,14): foster-parenting-start-tag
+(1,15): expected-closing-tag-but-got-eof
 #document
 | <html>
 |   <head>
@@ -158,6 +162,9 @@
 #data
 <b><em><foo><foo><aside></b>
 #errors
+(1,3): expected-doctype-but-got-start-tag
+(1,28): adoption-agency-9
+(1,29): expected-closing-tag-but-got-eof
 #document
 | <html>
 |   <head>
@@ -173,6 +180,10 @@
 #data
 <b><em><foo><foo><aside></b></em>
 #errors
+(1,3): expected-doctype-but-got-start-tag
+(1,28): adoption-agency-9
+(1,33): adoption-agency-9
+(1,34): expected-closing-tag-but-got-eof
 #document
 | <html>
 |   <head>
@@ -189,6 +200,9 @@
 #data
 <b><em><foo><foo><foo><aside></b>
 #errors
+(1,3): expected-doctype-but-got-start-tag
+(1,33): adoption-agency-9
+(1,34): expected-closing-tag-but-got-eof
 #document
 | <html>
 |   <head>
@@ -204,6 +218,10 @@
 #data
 <b><em><foo><foo><foo><aside></b></em>
 #errors
+(1,3): expected-doctype-but-got-start-tag
+(1,33): adoption-agency-9
+(1,38): adoption-agency-9
+(1,39): expected-closing-tag-but-got-eof
 #document
 | <html>
 |   <head>
@@ -219,6 +237,9 @@
 #data
 <b><em><foo><foo><foo><foo><foo><foo><foo><foo><foo><foo><aside></b></em>
 #errors
+(1,68): adoption-agency-9
+(1,73): adoption-agency-9
+(1,74): expected-closing-tag-but-got-eof
 #document-fragment
 div
 #document
@@ -240,6 +261,9 @@ div
 #data
 <b><em><foo><foob><foob><foob><foob><fooc><fooc><fooc><fooc><food><aside></b></em>
 #errors
+(1,77): adoption-agency-9
+(1,82): adoption-agency-9
+(1,83): expected-closing-tag-but-got-eof
 #document-fragment
 div
 #document
@@ -261,6 +285,8 @@ div
 #data
 <option><XH<optgroup></optgroup>
 #errors
+(1,21): unexpected-start-tag-in-select
+(1,32): unexpected-end-tag-in-select
 #document-fragment
 select
 #document
@@ -269,6 +295,8 @@ select
 #data
 <svg><foreignObject><div>foo</div><plaintext></foreignObject></svg><div>bar</div>
 #errors
+(1,5): expected-doctype-but-got-start-tag
+(1,82): expected-closing-tag-but-got-eof
 #document
 | <html>
 |   <head>
@@ -283,6 +311,8 @@ select
 #data
 <svg><foreignObject></foreignObject><title></svg>foo
 #errors
+(1,5): expected-doctype-but-got-start-tag
+(1,49): expected-one-end-tag-but-got-another
 #document
 | <html>
 |   <head>
@@ -295,6 +325,9 @@ select
 #data
 </foreignObject><plaintext><div>foo</div>
 #errors
+(1,16): expected-doctype-but-got-end-tag
+(1,16): unexpected-end-tag-before-html
+(1,42): expected-closing-tag-but-got-eof
 #document
 | <html>
 |   <head>

From 7707e38234aabbbb0eed2a91c1f48ac3c255c9b2 Mon Sep 17 00:00:00 2001
From: Stephen Checkoway <s@pahtak.org>
Date: Mon, 1 Oct 2018 12:50:44 -0400
Subject: [PATCH 31/68] missing DOCTYPE and two foster parented elements

---
 tree-construction/tests8.dat | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/tree-construction/tests8.dat b/tree-construction/tests8.dat
index ba2e63dd..d532801e 100644
--- a/tree-construction/tests8.dat
+++ b/tree-construction/tests8.dat
@@ -90,6 +90,9 @@ x"
 #data
 <table><li><li></table>
 #errors
+(1,7): expected-doctype-but-got-start-tag
+(1,11): foster-parenting-start-tag
+(1,15): foster-parenting-start-tag
 #document
 | <html>
 |   <head>

From 8d18a37ac88b3bda433f3ce0897b3ab96c6bdaf2 Mon Sep 17 00:00:00 2001
From: Stephen Checkoway <s@pahtak.org>
Date: Mon, 1 Oct 2018 12:44:04 -0400
Subject: [PATCH 32/68] ruby closed with span still open

---
 tree-construction/ruby.dat | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tree-construction/ruby.dat b/tree-construction/ruby.dat
index 696782f0..f4e5e4e4 100644
--- a/tree-construction/ruby.dat
+++ b/tree-construction/ruby.dat
@@ -203,6 +203,7 @@
 <html><ruby>a<rtc>b<span></ruby></html>
 #errors
 (1,6): expected-doctype-but-got-start-tag
+(1,32): unexpected-end-tag
 #document
 | <html>
 |   <head>

From 8d9655a071bbb9f3a67ef3e1f02c20204a9de122 Mon Sep 17 00:00:00 2001
From: Stephen Checkoway <s@pahtak.org>
Date: Tue, 2 Oct 2018 17:08:07 -0400
Subject: [PATCH 33/68] Add new errors to errors

If the `#errors` section should have the same number of lines as errors
(see https://github.com/html5lib/html5lib-tests/issues/107), then the
NULL-character errors need to be accounted for.
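
For reference, a small sketch of how the number of lines in an `#errors`
section could be counted. It assumes the simplified `.dat` layout seen
in these tests (section headers start with `#`, one error per line); the
function name is hypothetical and it is not part of any test harness.

```python
def count_errors(test_text):
    """Count non-empty lines between `#errors` and the next `#` header."""
    count = 0
    in_errors = False
    for line in test_text.splitlines():
        if line.startswith("#"):
            # Any header ends the section; only `#errors` starts it.
            in_errors = line == "#errors"
        elif in_errors and line:
            count += 1
    return count


sample = """#data
<table><li><li></table>
#errors
(1,7): expected-doctype-but-got-start-tag
(1,11): foster-parenting-start-tag
(1,15): foster-parenting-start-tag
#document
| <html>
"""
```

A harness could compare `count_errors(sample)` with the number of parse
errors its parser reports for the same input.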
---
 tree-construction/plain-text-unsafe.dat | Bin 9388 -> 9486 bytes
 1 file changed, 0 insertions(+), 0 deletions(-)

diff --git a/tree-construction/plain-text-unsafe.dat b/tree-construction/plain-text-unsafe.dat
index dfb5cb6329222da4ca6f465bf761f324943ae4c6..e904eff0b73aab1f2611501a76bb865e4b086d60 100644
GIT binary patch
delta 44
zcmZ4E+2^%EkB3deP{&f!YVtv5naTP*KUp*ktt=;h<ddIV!7C3G@Y!6-Yc37|HXaRb

delta 32
lcmeD4TI0Dvk7u$!&o3bMot(+5G&x@x$Y)pHtikt83;@O03n%~p


From 6a0611e4618ee6de5b917e0ab2552d49d1afceb5 Mon Sep 17 00:00:00 2001
From: Stephen Checkoway <s@pahtak.org>
Date: Tue, 2 Oct 2018 17:14:17 -0400
Subject: [PATCH 34/68] Remove duplicated error messages

---
 tree-construction/menuitem-element.dat | 17 -----------------
 1 file changed, 17 deletions(-)

diff --git a/tree-construction/menuitem-element.dat b/tree-construction/menuitem-element.dat
index 43aa0c67..fb13c3c3 100644
--- a/tree-construction/menuitem-element.dat
+++ b/tree-construction/menuitem-element.dat
@@ -3,7 +3,6 @@
 #errors
 10: Start tag seen without seeing a doctype first. Expected “<!DOCTYPE html>”.
 10: End of file seen and there were open elements.
-10: Unclosed element “menuitem”.
 #document
 | <html>
 |   <head>
@@ -24,7 +23,6 @@
 <!DOCTYPE html><body><menuitem>A
 #errors
 32: End of file seen and there were open elements.
-31: Unclosed element “menuitem”.
 #document
 | <!DOCTYPE html>
 | <html>
@@ -37,8 +35,6 @@
 <!DOCTYPE html><body><menuitem>A<menuitem>B
 #errors
 43: End of file seen and there were open elements.
-42: Unclosed element “menuitem”.
-31: Unclosed element “menuitem”.
 #document
 | <!DOCTYPE html>
 | <html>
@@ -53,7 +49,6 @@
 <!DOCTYPE html><body><menuitem>A<menu>B</menu>
 #errors
 46: End of file seen and there were open elements.
-31: Unclosed element “menuitem”.
 #document
 | <!DOCTYPE html>
 | <html>
@@ -68,7 +63,6 @@
 <!DOCTYPE html><body><menuitem>A<hr>B
 #errors
 37: End of file seen and there were open elements.
-31: Unclosed element “menuitem”.
 #document
 | <!DOCTYPE html>
 | <html>
@@ -83,7 +77,6 @@
 <!DOCTYPE html><li><menuitem><li>
 #errors
 33: End tag “li” implied, but there were open elements.
-29: Unclosed element “menuitem”.
 #document
 | <!DOCTYPE html>
 | <html>
@@ -98,7 +91,6 @@
 #errors
 39: Stray end tag “menuitem”.
 40: End of file seen and there were open elements.
-25: Unclosed element “menuitem”.
 #document
 | <!DOCTYPE html>
 | <html>
@@ -112,9 +104,7 @@
 <!DOCTYPE html><p><b></p><menuitem>
 #errors
 25: End tag “p” seen, but there were open elements.
-21: Unclosed element “b”.
 35: End of file seen and there were open elements.
-35: Unclosed element “menuitem”.
 #document
 | <!DOCTYPE html>
 | <html>
@@ -129,7 +119,6 @@
 <!DOCTYPE html><menuitem><asdf></menuitem>x
 #errors
 42: End tag “menuitem” seen, but there were open elements.
-31: Unclosed element “asdf”.
 #document
 | <!DOCTYPE html>
 | <html>
@@ -184,7 +173,6 @@
 <!DOCTYPE html><option><menuitem>
 #errors
 33: End of file seen and there were open elements.
-33: Unclosed element “menuitem”.
 #document
 | <!DOCTYPE html>
 | <html>
@@ -197,7 +185,6 @@
 <!DOCTYPE html><menuitem><option>
 #errors
 33: End of file seen and there were open elements.
-25: Unclosed element “menuitem”.
 #document
 | <!DOCTYPE html>
 | <html>
@@ -210,7 +197,6 @@
 <!DOCTYPE html><menuitem></body>
 #errors
 32: End tag for  “body” seen, but there were unclosed elements.
-25: Unclosed element “menuitem”.
 #document
 | <!DOCTYPE html>
 | <html>
@@ -222,7 +208,6 @@
 <!DOCTYPE html><menuitem></html>
 #errors
 32: End tag for  “html” seen, but there were unclosed elements.
-25: Unclosed element “menuitem”.
 #document
 | <!DOCTYPE html>
 | <html>
@@ -234,7 +219,6 @@
 <!DOCTYPE html><menuitem><p>
 #errors
 28: End of file seen and there were open elements.
-25: Unclosed element “menuitem”.
 #document
 | <!DOCTYPE html>
 | <html>
@@ -247,7 +231,6 @@
 <!DOCTYPE html><menuitem><li>
 #errors
 29: End of file seen and there were open elements.
-25: Unclosed element “menuitem”.
 #document
 | <!DOCTYPE html>
 | <html>

From e0007e645c09c4e971950a6d073b7652b81ed2b7 Mon Sep 17 00:00:00 2001
From: Stephen Checkoway <s@pahtak.org>
Date: Tue, 2 Oct 2018 15:17:07 -0400
Subject: [PATCH 35/68] Remove duplicate errors

Also fix up the `#new-errors` section to make the line and column
numbers match the others.
---
 tree-construction/foreign-fragment.dat | 7 -------
 1 file changed, 7 deletions(-)

diff --git a/tree-construction/foreign-fragment.dat b/tree-construction/foreign-fragment.dat
index d5bc22e5..cc1e9121 100644
--- a/tree-construction/foreign-fragment.dat
+++ b/tree-construction/foreign-fragment.dat
@@ -3,7 +3,6 @@
 #errors
 6: HTML start tag “nobr” in a foreign namespace context.
 7: End of file seen and there were open elements.
-6: Unclosed element “nobr”.
 #document-fragment
 svg path
 #document
@@ -35,7 +34,6 @@ svg path
 #errors
 10: End tag “path” did not match the name of the current open element (“g”).
 11: End of file seen and there were open elements.
-3: Unclosed element “g”.
 #document-fragment
 svg path
 #document
@@ -173,7 +171,6 @@ math ms
 #errors
 51: Self-closing syntax (“/>”) used on a non-void HTML element. Ignoring the slash and treating as a start tag.
 52: End of file seen and there were open elements.
-51: Unclosed element “ms”.
 #new-errors
 (1:44-1:49) non-void-html-element-start-tag-with-trailing-solidus
 #document-fragment
@@ -216,7 +213,6 @@ math ms
 #errors
 51: Self-closing syntax (“/>”) used on a non-void HTML element. Ignoring the slash and treating as a start tag.
 52: End of file seen and there were open elements.
-51: Unclosed element “mn”.
 #new-errors
 (1:44-1:49) non-void-html-element-start-tag-with-trailing-solidus
 #document-fragment
@@ -259,7 +255,6 @@ math mn
 #errors
 51: Self-closing syntax (“/>”) used on a non-void HTML element. Ignoring the slash and treating as a start tag.
 52: End of file seen and there were open elements.
-51: Unclosed element “mo”.
 #new-errors
 (1:44-1:49) non-void-html-element-start-tag-with-trailing-solidus
 #document-fragment
@@ -302,7 +297,6 @@ math mo
 #errors
 51: Self-closing syntax (“/>”) used on a non-void HTML element. Ignoring the slash and treating as a start tag.
 52: End of file seen and there were open elements.
-51: Unclosed element “mi”.
 #new-errors
 (1:44-1:49) non-void-html-element-start-tag-with-trailing-solidus
 #document-fragment
@@ -345,7 +339,6 @@ math mi
 #errors
 51: Self-closing syntax (“/>”) used on a non-void HTML element. Ignoring the slash and treating as a start tag.
 52: End of file seen and there were open elements.
-51: Unclosed element “mtext”.
 #new-errors
 (1:44-1:52) non-void-html-element-start-tag-with-trailing-solidus
 #document-fragment

From 79154f9b86cfa1ac8d045d700bf81ad86d883f7b Mon Sep 17 00:00:00 2001
From: Stephen Checkoway <s@pahtak.org>
Date: Mon, 1 Oct 2018 23:22:08 -0400
Subject: [PATCH 36/68] Remove duplicate errors

These all appear to be duplicated errors (or more like an explanation of
the error).
---
 tree-construction/blocks.dat | 24 ------------------------
 1 file changed, 24 deletions(-)

diff --git a/tree-construction/blocks.dat b/tree-construction/blocks.dat
index 5d3871ea..a1a9c752 100644
--- a/tree-construction/blocks.dat
+++ b/tree-construction/blocks.dat
@@ -2,7 +2,6 @@
 <!doctype html><p>foo<address>bar<p>baz
 #errors
 (1,39): expected-closing-tag-but-got-eof
-30: Unclosed element “address”.
 #document
 | <!DOCTYPE html>
 | <html>
@@ -32,7 +31,6 @@
 <!doctype html><p>foo<article>bar<p>baz
 #errors
 (1,39): expected-closing-tag-but-got-eof
-30: Unclosed element “article”.
 #document
 | <!DOCTYPE html>
 | <html>
@@ -62,7 +60,6 @@
 <!doctype html><p>foo<aside>bar<p>baz
 #errors
 (1,37): expected-closing-tag-but-got-eof
-28: Unclosed element “aside”.
 #document
 | <!DOCTYPE html>
 | <html>
@@ -92,7 +89,6 @@
 <!doctype html><p>foo<blockquote>bar<p>baz
 #errors
 (1,42): expected-closing-tag-but-got-eof
-33: Unclosed element “blockquote”.
 #document
 | <!DOCTYPE html>
 | <html>
@@ -122,7 +118,6 @@
 <!doctype html><p>foo<center>bar<p>baz
 #errors
 (1,38): expected-closing-tag-but-got-eof
-29: Unclosed element “center”.
 #document
 | <!DOCTYPE html>
 | <html>
@@ -152,7 +147,6 @@
 <!doctype html><p>foo<details>bar<p>baz
 #errors
 (1,39): expected-closing-tag-but-got-eof
-30: Unclosed element “details”.
 #document
 | <!DOCTYPE html>
 | <html>
@@ -182,7 +176,6 @@
 <!doctype html><p>foo<dialog>bar<p>baz
 #errors
 (1,38): expected-closing-tag-but-got-eof
-29: Unclosed element “dialog”.
 #document
 | <!DOCTYPE html>
 | <html>
@@ -212,7 +205,6 @@
 <!doctype html><p>foo<dir>bar<p>baz
 #errors
 (1,35): expected-closing-tag-but-got-eof
-26: Unclosed element “dir”.
 #document
 | <!DOCTYPE html>
 | <html>
@@ -242,7 +234,6 @@
 <!doctype html><p>foo<div>bar<p>baz
 #errors
 (1,35): expected-closing-tag-but-got-eof
-26: Unclosed element “div”.
 #document
 | <!DOCTYPE html>
 | <html>
@@ -272,7 +263,6 @@
 <!doctype html><p>foo<dl>bar<p>baz
 #errors
 (1,34): expected-closing-tag-but-got-eof
-25: Unclosed element “dl”.
 #document
 | <!DOCTYPE html>
 | <html>
@@ -302,7 +292,6 @@
 <!doctype html><p>foo<fieldset>bar<p>baz
 #errors
 (1,40): expected-closing-tag-but-got-eof
-31: Unclosed element “fieldset”.
 #document
 | <!DOCTYPE html>
 | <html>
@@ -332,7 +321,6 @@
 <!doctype html><p>foo<figcaption>bar<p>baz
 #errors
 (1,42): expected-closing-tag-but-got-eof
-33: Unclosed element “figcaption”.
 #document
 | <!DOCTYPE html>
 | <html>
@@ -362,7 +350,6 @@
 <!doctype html><p>foo<figure>bar<p>baz
 #errors
 (1,38): expected-closing-tag-but-got-eof
-29: Unclosed element “figure”.
 #document
 | <!DOCTYPE html>
 | <html>
@@ -392,7 +379,6 @@
 <!doctype html><p>foo<footer>bar<p>baz
 #errors
 (1,38): expected-closing-tag-but-got-eof
-29: Unclosed element “footer”.
 #document
 | <!DOCTYPE html>
 | <html>
@@ -422,7 +408,6 @@
 <!doctype html><p>foo<header>bar<p>baz
 #errors
 (1,38): expected-closing-tag-but-got-eof
-29: Unclosed element “header”.
 #document
 | <!DOCTYPE html>
 | <html>
@@ -452,7 +437,6 @@
 <!doctype html><p>foo<hgroup>bar<p>baz
 #errors
 (1,38): expected-closing-tag-but-got-eof
-29: Unclosed element “hgroup”.
 #document
 | <!DOCTYPE html>
 | <html>
@@ -482,7 +466,6 @@
 <!doctype html><p>foo<listing>bar<p>baz
 #errors
 (1,39): expected-closing-tag-but-got-eof
-30: Unclosed element “listing”.
 #document
 | <!DOCTYPE html>
 | <html>
@@ -512,7 +495,6 @@
 <!doctype html><p>foo<menu>bar<p>baz
 #errors
 (1,36): expected-closing-tag-but-got-eof
-27: Unclosed element “menu”.
 #document
 | <!DOCTYPE html>
 | <html>
@@ -542,7 +524,6 @@
 <!doctype html><p>foo<nav>bar<p>baz
 #errors
 (1,35): expected-closing-tag-but-got-eof
-26: Unclosed element “nav”.
 #document
 | <!DOCTYPE html>
 | <html>
@@ -572,7 +553,6 @@
 <!doctype html><p>foo<ol>bar<p>baz
 #errors
 (1,34): expected-closing-tag-but-got-eof
-25: Unclosed element “ol”.
 #document
 | <!DOCTYPE html>
 | <html>
@@ -602,7 +582,6 @@
 <!doctype html><p>foo<pre>bar<p>baz
 #errors
 (1,35): expected-closing-tag-but-got-eof
-26: Unclosed element “pre”.
 #document
 | <!DOCTYPE html>
 | <html>
@@ -632,7 +611,6 @@
 <!doctype html><p>foo<section>bar<p>baz
 #errors
 (1,39): expected-closing-tag-but-got-eof
-30: Unclosed element “section”.
 #document
 | <!DOCTYPE html>
 | <html>
@@ -662,7 +640,6 @@
 <!doctype html><p>foo<summary>bar<p>baz
 #errors
 (1,39): expected-closing-tag-but-got-eof
-30: Unclosed element “summary”.
 #document
 | <!DOCTYPE html>
 | <html>
@@ -692,7 +669,6 @@
 <!doctype html><p>foo<ul>bar<p>baz
 #errors
 (1,34): expected-closing-tag-but-got-eof
-25: Unclosed element “ul”.
 #document
 | <!DOCTYPE html>
 | <html>

From 88655184c68700cc298239f5daccb14ccfa56981 Mon Sep 17 00:00:00 2001
From: Stephen Checkoway <s@pahtak.org>
Date: Mon, 1 Oct 2018 23:09:31 -0400
Subject: [PATCH 37/68] Fix template errors

Each character in the table (caused by the `<col>`) gets reparented and
causes an error.

The second `<a>` gets reparented, which is one error. There is already an
`<a>` open, which is a second error, and that open `<a>` is not in scope,
which is a third error.
---
 tree-construction/template.dat | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/tree-construction/template.dat b/tree-construction/template.dat
index b38d4f58..2d97183e 100644
--- a/tree-construction/template.dat
+++ b/tree-construction/template.dat
@@ -1089,7 +1089,11 @@ eof in template
 <body><template><col>Hello
 #errors
 no doctype
-unexpected text
+(1,27): foster-parenting-character
+(1,27): foster-parenting-character
+(1,27): foster-parenting-character
+(1,27): foster-parenting-character
+(1,27): foster-parenting-character
 eof in template
 #document
 | <html>
@@ -1593,6 +1597,11 @@ eof table
 #data
 <template><a><table><a>
 #errors
+(1,10): expected-doctype-but-got-start-tag
+(1,23): foster-parenting-start-tag
+(1,23): unexpected-start-tag
+(1,23): formatting-element-not-in-scope
+(1,24): eof-in-template
 #document
 | <html>
 |   <head>

From 3438ae3074452d53d9062e9b67cf3d98f68d69a5 Mon Sep 17 00:00:00 2001
From: Stephen Checkoway <s@pahtak.org>
Date: Sat, 26 Jun 2021 17:37:24 -0400
Subject: [PATCH 38/68] Fix EOF error line/columns, add to #errors

Fix the line and column numbers.

Currently, the number of errors in the `#errors` section of the test is
the correct number of errors. Also counting the ones in `#new-errors`
breaks hundreds of tests, because those tests already contain an
equivalent error in `#errors`. So this adds the `eof-in-comment` error to
`#errors` as well.
---
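A minimal sketch of the counting rule described above, for illustration
only. The function name and the set of section headers are assumptions
made for this note (the headers are just the ones that appear in this
series); it is not code from any particular test harness.

```python
# Section headers seen in this patch series (an assumption, not a full list).
SECTION_HEADERS = {"#data", "#errors", "#new-errors", "#document-fragment", "#document"}


def count_expected_errors(test_block):
    """Count the entries under `#errors` only; `#new-errors` is skipped."""
    count = 0
    in_errors = False
    for line in test_block.splitlines():
        if line in SECTION_HEADERS:
            # Only the `#errors` header switches counting on.
            in_errors = line == "#errors"
        elif in_errors and line:
            count += 1
    return count


block = (
    "#data\n"
    "FOO<!-- BAR --! >BAZ\n"
    "#errors\n"
    "(1,3): expected-doctype-but-got-chars\n"
    "(1:21) eof-in-comment\n"
    "#new-errors\n"
    "(1:21) eof-in-comment\n"
    "#document\n"
    "| <html>\n"
)
print(count_expected_errors(block))
```

With the first test as amended by this patch, the EOF error is counted
once, from `#errors`, and its twin in `#new-errors` is ignored.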
 tree-construction/comments01.dat | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/tree-construction/comments01.dat b/tree-construction/comments01.dat
index cb508b01..4b9ff957 100644
--- a/tree-construction/comments01.dat
+++ b/tree-construction/comments01.dat
@@ -29,8 +29,9 @@ FOO<!-- BAR --!>BAZ
 FOO<!-- BAR --! >BAZ
 #errors
 (1,3): expected-doctype-but-got-chars
+(1:21) eof-in-comment
 #new-errors
-(1:20) eof-in-comment
+(1:21) eof-in-comment
 #document
 | <html>
 |   <head>
@@ -43,8 +44,9 @@ FOO<!-- BAR --!
 >BAZ
 #errors
 (1,3): expected-doctype-but-got-chars
+(2:5) eof-in-comment
 #new-errors
-(1:20) eof-in-comment
+(2:5) eof-in-comment
 #document
 | <html>
 |   <head>

From 7bb533c171eb6e953264f243e728806569ae00fd Mon Sep 17 00:00:00 2001
From: Stephen Checkoway <s@pahtak.org>
Date: Sat, 26 Jun 2021 17:40:34 -0400
Subject: [PATCH 39/68] Add missing errors

Adds the other spec-mandated errors. When an error is caused by a tag,
the reported line and column are those at which the tag starts.
---
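A small sketch of how the positions in this patch can be derived, under
stated assumptions: lines are separated by LF, lines and columns are
1-based, columns count Python code points, and an EOF error is reported
one column past the last character. The helper name `line_col` is
hypothetical.

```python
from typing import Tuple


def line_col(text: str, offset: int) -> Tuple[int, int]:
    """Return the 1-based (line, column) of `offset`; len(text) means EOF."""
    before = text[:offset]
    line = before.count("\n") + 1
    # rfind returns -1 when there is no newline, so line_start becomes 0.
    line_start = before.rfind("\n") + 1
    return line, offset - line_start + 1


data = "<svg></p><foo>"
print(line_col(data, data.index("</p>")))  # position where the tag starts
print(line_col(data, len(data)))           # position of EOF
```

For the first foreign-fragment.dat case this gives (1, 6) for `</p>` and
(1, 15) for EOF, matching the two lines added there.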
 tree-construction/foreign-fragment.dat | 12 ++++++++++++
 tree-construction/svg.dat              | 23 +++++++++++++++++++++++
 tree-construction/tests26.dat          | 12 ++++++++++++
 3 files changed, 47 insertions(+)

diff --git a/tree-construction/foreign-fragment.dat b/tree-construction/foreign-fragment.dat
index cc1e9121..448d9c8e 100644
--- a/tree-construction/foreign-fragment.dat
+++ b/tree-construction/foreign-fragment.dat
@@ -573,6 +573,8 @@ svg svg
 <svg></p><foo>
 #errors
 9: HTML end tag “p” in a foreign namespace context.
+(1:6) Unexpected </p> from in body insertion mode
+(1:15) Unexpected EOF
 #document-fragment
 div
 #document
@@ -584,6 +586,8 @@ div
 <svg></br><foo>
 #errors
 10: HTML end tag “br” in a foreign namespace context.
+(1:6) Unexpected </br> from in body insertion mode
+(1:16) Unexpected EOF
 #document-fragment
 div
 #document
@@ -595,6 +599,8 @@ div
 </p><foo>
 #errors
 4: HTML end tag “p” in a foreign namespace context.
+(1:1) Unexpected </p> from in body insertion mode
+(1:10) Unexpected EOF
 #document-fragment
 svg svg
 #document
@@ -605,6 +611,8 @@ svg svg
 </br><foo>
 #errors
 5: HTML end tag “br” in a foreign namespace context.
+(1:1) Unexpected </br> from in body insertion mode
+(1:11) Unexpected EOF
 #document-fragment
 svg svg
 #document
@@ -615,6 +623,8 @@ svg svg
 <body><foo>
 #errors
 6: HTML start tag “body” in a foreign namespace context.
+(1:1) Unexpected <body> from in body insertion mode
+(1:12) Unexpected EOF
 #document-fragment
 svg svg
 #document
@@ -624,6 +634,7 @@ svg svg
 <p><foo>
 #errors
 3: HTML start tag “p” in a foreign namespace context.
+(1:9) Unexpected EOF
 #document-fragment
 svg svg
 #document
@@ -634,6 +645,7 @@ svg svg
 <p></p><foo>
 #errors
 3: HTML start tag “p” in a foreign namespace context.
+(1:13) Unexpected EOF
 #document-fragment
 svg svg
 #document
diff --git a/tree-construction/svg.dat b/tree-construction/svg.dat
index 8e9a2bbb..a452e7af 100644
--- a/tree-construction/svg.dat
+++ b/tree-construction/svg.dat
@@ -1,6 +1,8 @@
 #data
 <svg><tr><td><title><tr>
 #errors
+(1:21) Unexpected <tr> tag
+(1:25) Unexpected EOF
 #document-fragment
 td
 #document
@@ -12,6 +14,9 @@ td
 #data
 <svg><tr><td><title><tr>
 #errors
+(1:1) Unexpected <svg> tag
+(1:21) Unexpected <tr> tag
+(1:25) Unexpected EOF
 #document-fragment
 tr
 #document
@@ -23,6 +28,9 @@ tr
 #data
 <svg><thead><title><tbody>
 #errors
+(1:1) Unexpected <svg> tag
+(1:20) Unexpected <tbody> tag
+(1:27) Unexpected EOF
 #document-fragment
 thead
 #document
@@ -33,6 +41,9 @@ thead
 #data
 <svg><tfoot><title><tbody>
 #errors
+(1:1) Unexpected <svg> tag
+(1:20) Unexpected <tbody> tag
+(1:27) Unexpected EOF
 #document-fragment
 tfoot
 #document
@@ -43,6 +54,9 @@ tfoot
 #data
 <svg><tbody><title><tfoot>
 #errors
+(1:1) Unexpected <svg> tag
+(1:20) Unexpected <tfoot> tag
+(1:27) Unexpected EOF
 #document-fragment
 tbody
 #document
@@ -53,6 +67,9 @@ tbody
 #data
 <svg><tbody><title></table>
 #errors
+(1:1) Unexpected <svg> tag
+(1:20) Unexpected </table> tag
+(1:28) Unexpected EOF
 #document-fragment
 tbody
 #document
@@ -63,6 +80,9 @@ tbody
 #data
 <svg><thead><title></table>
 #errors
+(1:1) Unexpected <svg> tag
+(1:20) Unexpected </table> tag
+(1:28) Unexpected EOF
 #document-fragment
 tbody
 #document
@@ -73,6 +93,9 @@ tbody
 #data
 <svg><tfoot><title></table>
 #errors
+(1:1) Unexpected <svg> tag
+(1:20) Unexpected </table> tag
+(1:28) Unexpected EOF
 #document-fragment
 tbody
 #document
diff --git a/tree-construction/tests26.dat b/tree-construction/tests26.dat
index e6f71f6a..1ba2be2d 100644
--- a/tree-construction/tests26.dat
+++ b/tree-construction/tests26.dat
@@ -395,7 +395,10 @@ Line 1 Col 19 Expected closing tag. Unexpected end of file.
 #data
 <svg></p><foo>
 #errors
+(1:1) Missing doctype
 9: HTML end tag “p” in a foreign namespace context.
+(1:6) Unexpected </p> from in body insertion mode
+(1:16) Unexpected EOF
 #document
 | <html>
 |   <head>
@@ -407,7 +410,10 @@ Line 1 Col 19 Expected closing tag. Unexpected end of file.
 #data
 <svg></br><foo>
 #errors
+(1:1) Missing doctype
 10: HTML end tag “br” in a foreign namespace context.
+(1:6) Unexpected </br> from in body insertion mode
+(1:16) Unexpected EOF
 #document
 | <html>
 |   <head>
@@ -419,7 +425,10 @@ Line 1 Col 19 Expected closing tag. Unexpected end of file.
 #data
 <math></p><foo>
 #errors
+(1:1) Missing doctype
 10: HTML end tag “p” in a foreign namespace context.
+(1:7) Unexpected </p> from in body insertion mode
+(1:16) Unexpected EOF
 #document
 | <html>
 |   <head>
@@ -431,7 +440,10 @@ Line 1 Col 19 Expected closing tag. Unexpected end of file.
 #data
 <math></br><foo>
 #errors
+(1:1) Missing doctype
 11: HTML end tag “br” in a foreign namespace context.
+(1:7) Unexpected </br> from in body insertion mode
+(1:17) Unexpected EOF
 #document
 | <html>
 |   <head>

From 6030cb6e40a0cf68ae38bf0001bb85b727b80a26 Mon Sep 17 00:00:00 2001
From: Stephen Checkoway <s@pahtak.org>
Date: Sat, 26 Jun 2021 17:44:30 -0400
Subject: [PATCH 40/68] Remove duplicated error

This error line appears to have been duplicated by mistake.
---
 tree-construction/tests19.dat | 1 -
 1 file changed, 1 deletion(-)

diff --git a/tree-construction/tests19.dat b/tree-construction/tests19.dat
index a1897774..2e06fb36 100644
--- a/tree-construction/tests19.dat
+++ b/tree-construction/tests19.dat
@@ -1015,7 +1015,6 @@
 <!doctype html><p><math></p>a
 #errors
 (1,28): unexpected-end-tag
-(1,28): unexpected-end-tag
 #document
 | <!DOCTYPE html>
 | <html>

From fab9829dbd9e0968a753a4840e95085a470a267e Mon Sep 17 00:00:00 2001
From: Markus Unterwaditzer <markus-honeypot@unterwaditzer.net>
Date: Sun, 5 Dec 2021 03:06:11 +0100
Subject: [PATCH 41/68] Add new test for script data

See https://github.com/untitaker/html5gum/pull/15
---
 tokenizer/domjs.test | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/tokenizer/domjs.test b/tokenizer/domjs.test
index 1373b27f..1a0824d7 100644
--- a/tokenizer/domjs.test
+++ b/tokenizer/domjs.test
@@ -324,7 +324,12 @@
             "errors":[
                 { "code": "eof-in-cdata", "line": 1, "col": 6 }
             ]
+        },
+        {
+            "description": "HTML tag in script data",
+            "input": "<b>hello world</b>",
+            "initialStates": ["Script data state"],
+            "output": [["Character", "<b>hello world</b>"]]
         }
-
     ]
 }

From ba61e5c12795102c4f462d76b5d2de6f6c038a0f Mon Sep 17 00:00:00 2001
From: Simon Pieters <zcorpan@gmail.com>
Date: Thu, 10 Feb 2022 15:30:43 +0100
Subject: [PATCH 42/68] Test parsing of the <search> element

See https://github.com/whatwg/html/pull/7320
---
 tree-construction/search-element.dat | 46 ++++++++++++++++++++++++++++
 1 file changed, 46 insertions(+)
 create mode 100644 tree-construction/search-element.dat

diff --git a/tree-construction/search-element.dat b/tree-construction/search-element.dat
new file mode 100644
index 00000000..2866d7ec
--- /dev/null
+++ b/tree-construction/search-element.dat
@@ -0,0 +1,46 @@
+#data
+<!doctype html><p>foo<search>bar<p>baz
+#errors
+(1,38): expected-closing-tag-but-got-eof
+#document
+| <!DOCTYPE html>
+| <html>
+|   <head>
+|   <body>
+|     <p>
+|       "foo"
+|     <search>
+|       "bar"
+|       <p>
+|         "baz"
+
+#data
+<!doctype html><search><p>foo</search>bar
+#errors
+#document
+| <!DOCTYPE html>
+| <html>
+|   <head>
+|   <body>
+|     <search>
+|       <p>
+|         "foo"
+|     "bar"
+
+#data
+<!DOCTYPE html>xxx<svg><x><g><a><search><b>
+#errors
+ * (1,44) unexpected HTML-like start tag token in foreign content
+ * (1,44) unexpected end of file
+#document
+| <!DOCTYPE html>
+| <html>
+|   <head>
+|   <body>
+|     "xxx"
+|     <svg svg>
+|       <svg x>
+|         <svg g>
+|           <svg a>
+|             <svg search>
+|     <b>

From 8fbcde0b9fd5973543710fbc810fc232c7aa913f Mon Sep 17 00:00:00 2001
From: Simon Pieters <zcorpan@gmail.com>
Date: Fri, 11 Feb 2022 02:20:22 +0100
Subject: [PATCH 43/68] Test 'has a p element in button scope' for all relevant
 start tags

---
 tree-construction/tests20.dat | 260 ++++++++++++++++++++++++++++++++++
 1 file changed, 260 insertions(+)

diff --git a/tree-construction/tests20.dat b/tree-construction/tests20.dat
index 1bfd130e..30bc4084 100644
--- a/tree-construction/tests20.dat
+++ b/tree-construction/tests20.dat
@@ -25,6 +25,32 @@
 |       <button>
 |         <address>
 
+#data
+<!doctype html><p><button><article>
+#errors
+(1,36): expected-closing-tag-but-got-eof
+#document
+| <!DOCTYPE html>
+| <html>
+|   <head>
+|   <body>
+|     <p>
+|       <button>
+|         <article>
+
+#data
+<!doctype html><p><button><aside>
+#errors
+(1,34): expected-closing-tag-but-got-eof
+#document
+| <!DOCTYPE html>
+| <html>
+|   <head>
+|   <body>
+|     <p>
+|       <button>
+|         <aside>
+
 #data
 <!doctype html><p><button><blockquote>
 #errors
@@ -38,6 +64,175 @@
 |       <button>
 |         <blockquote>
 
+#data
+<!doctype html><p><button><center>
+#errors
+(1,35): expected-closing-tag-but-got-eof
+#document
+| <!DOCTYPE html>
+| <html>
+|   <head>
+|   <body>
+|     <p>
+|       <button>
+|         <center>
+
+#data
+<!doctype html><p><button><details>
+#errors
+(1,36): expected-closing-tag-but-got-eof
+#document
+| <!DOCTYPE html>
+| <html>
+|   <head>
+|   <body>
+|     <p>
+|       <button>
+|         <details>
+
+#data
+<!doctype html><p><button><dialog>
+#errors
+(1,35): expected-closing-tag-but-got-eof
+#document
+| <!DOCTYPE html>
+| <html>
+|   <head>
+|   <body>
+|     <p>
+|       <button>
+|         <dialog>
+
+#data
+<!doctype html><p><button><dir>
+#errors
+(1,32): expected-closing-tag-but-got-eof
+#document
+| <!DOCTYPE html>
+| <html>
+|   <head>
+|   <body>
+|     <p>
+|       <button>
+|         <dir>
+
+#data
+<!doctype html><p><button><div>
+#errors
+(1,32): expected-closing-tag-but-got-eof
+#document
+| <!DOCTYPE html>
+| <html>
+|   <head>
+|   <body>
+|     <p>
+|       <button>
+|         <div>
+
+#data
+<!doctype html><p><button><dl>
+#errors
+(1,31): expected-closing-tag-but-got-eof
+#document
+| <!DOCTYPE html>
+| <html>
+|   <head>
+|   <body>
+|     <p>
+|       <button>
+|         <dl>
+
+#data
+<!doctype html><p><button><fieldset>
+#errors
+(1,37): expected-closing-tag-but-got-eof
+#document
+| <!DOCTYPE html>
+| <html>
+|   <head>
+|   <body>
+|     <p>
+|       <button>
+|         <fieldset>
+
+#data
+<!doctype html><p><button><figcaption>
+#errors
+(1,39): expected-closing-tag-but-got-eof
+#document
+| <!DOCTYPE html>
+| <html>
+|   <head>
+|   <body>
+|     <p>
+|       <button>
+|         <figcaption>
+
+#data
+<!doctype html><p><button><figure>
+#errors
+(1,35): expected-closing-tag-but-got-eof
+#document
+| <!DOCTYPE html>
+| <html>
+|   <head>
+|   <body>
+|     <p>
+|       <button>
+|         <figure>
+
+#data
+<!doctype html><p><button><footer>
+#errors
+(1,35): expected-closing-tag-but-got-eof
+#document
+| <!DOCTYPE html>
+| <html>
+|   <head>
+|   <body>
+|     <p>
+|       <button>
+|         <footer>
+
+#data
+<!doctype html><p><button><header>
+#errors
+(1,35): expected-closing-tag-but-got-eof
+#document
+| <!DOCTYPE html>
+| <html>
+|   <head>
+|   <body>
+|     <p>
+|       <button>
+|         <header>
+
+#data
+<!doctype html><p><button><hgroup>
+#errors
+(1,35): expected-closing-tag-but-got-eof
+#document
+| <!DOCTYPE html>
+| <html>
+|   <head>
+|   <body>
+|     <p>
+|       <button>
+|         <hgroup>
+
+#data
+<!doctype html><p><button><main>
+#errors
+(1,33): expected-closing-tag-but-got-eof
+#document
+| <!DOCTYPE html>
+| <html>
+|   <head>
+|   <body>
+|     <p>
+|       <button>
+|         <main>
+
 #data
 <!doctype html><p><button><menu>
 #errors
@@ -51,6 +246,32 @@
 |       <button>
 |         <menu>
 
+#data
+<!doctype html><p><button><nav>
+#errors
+(1,32): expected-closing-tag-but-got-eof
+#document
+| <!DOCTYPE html>
+| <html>
+|   <head>
+|   <body>
+|     <p>
+|       <button>
+|         <nav>
+
+#data
+<!doctype html><p><button><ol>
+#errors
+(1,31): expected-closing-tag-but-got-eof
+#document
+| <!DOCTYPE html>
+| <html>
+|   <head>
+|   <body>
+|     <p>
+|       <button>
+|         <ol>
+
 #data
 <!doctype html><p><button><p>
 #errors
@@ -64,6 +285,45 @@
 |       <button>
 |         <p>
 
+#data
+<!doctype html><p><button><search>
+#errors
+(1,35): expected-closing-tag-but-got-eof
+#document
+| <!DOCTYPE html>
+| <html>
+|   <head>
+|   <body>
+|     <p>
+|       <button>
+|         <search>
+
+#data
+<!doctype html><p><button><section>
+#errors
+(1,36): expected-closing-tag-but-got-eof
+#document
+| <!DOCTYPE html>
+| <html>
+|   <head>
+|   <body>
+|     <p>
+|       <button>
+|         <section>
+
+#data
+<!doctype html><p><button><summary>
+#errors
+(1,36): expected-closing-tag-but-got-eof
+#document
+| <!DOCTYPE html>
+| <html>
+|   <head>
+|   <body>
+|     <p>
+|       <button>
+|         <summary>
+
 #data
 <!doctype html><p><button><ul>
 #errors

From 457a78ac7bc2e911763881cb5862185fa7a8ac85 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Felix=20B=C3=B6hm?= <188768+fb55@users.noreply.github.com>
Date: Fri, 11 Mar 2022 08:47:40 +0000
Subject: [PATCH 44/68] Fix tokenizer EOF error positions (#144)

---
 tokenizer/test3.test | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/tokenizer/test3.test b/tokenizer/test3.test
index 814482c4..901a581e 100644
--- a/tokenizer/test3.test
+++ b/tokenizer/test3.test
@@ -10,7 +10,7 @@
 "input":"",
 "output":[],
 "errors":[
-    { "code": "eof-in-cdata", "line": 1, "col": 2 }
+    { "code": "eof-in-cdata", "line": 1, "col": 1 }
 ]},
 
 {"description":"\\u0009",
@@ -36,7 +36,7 @@
 "input":"\u000A",
 "output":[["Character", "\u000A"]],
 "errors":[
-    { "code": "eof-in-cdata", "line": 1, "col": 2 }
+    { "code": "eof-in-cdata", "line": 2, "col": 1 }
 ]},
 
 {"description":"\\u000B",
@@ -443,7 +443,7 @@
 "input":";\uDBC0\uDC00",
 "output":[["Character", ";\uDBC0\uDC00"]],
 "errors":[
-    { "code": "eof-in-cdata", "line": 1, "col": 3 }
+    { "code": "eof-in-cdata", "line": 1, "col": 4 }
 ]},
 
 {"description":"<",
@@ -1325,28 +1325,28 @@
 "input":"<!----! >",
 "output":[["Comment", "--! >"]],
 "errors":[
-    { "code": "eof-in-comment", "line": 1, "col": 9 }
+    { "code": "eof-in-comment", "line": 1, "col": 10 }
 ]},
 
 {"description":"<!----!LF>",
 "input":"<!----!\n>",
 "output":[["Comment", "--!\n>"]],
 "errors":[
-    { "code": "eof-in-comment", "line": 1, "col": 9 }
+    { "code": "eof-in-comment", "line": 2, "col": 2 }
 ]},
 
 {"description":"<!----!CR>",
 "input":"<!----!\r>",
 "output":[["Comment", "--!\n>"]],
 "errors":[
-    { "code": "eof-in-comment", "line": 1, "col": 9 }
+    { "code": "eof-in-comment", "line": 2, "col": 2 }
 ]},
 
 {"description":"<!----!CRLF>",
 "input":"<!----!\r\n>",
 "output":[["Comment", "--!\n>"]],
 "errors":[
-    { "code": "eof-in-comment", "line": 1, "col": 9 }
+    { "code": "eof-in-comment", "line": 2, "col": 2 }
 ]},
 
 {"description":"<!----!a",
@@ -11227,7 +11227,7 @@
 "input":"\uDBC0\uDC00",
 "output":[["Character", "\uDBC0\uDC00"]],
 "errors":[
-    { "code": "eof-in-cdata", "line": 1, "col": 2 }
+    { "code": "eof-in-cdata", "line": 1, "col": 3 }
 ]}
 
 ]}

From 56d589a8b4b44c0f2349866151466142e09f3e3c Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Felix=20B=C3=B6hm?= <188768+fb55@users.noreply.github.com>
Date: Mon, 28 Feb 2022 11:39:33 +0000
Subject: [PATCH 45/68] Add test for form in template

Upstreamed from https://github.com/inikulin/parse5/issues/40
---
 tree-construction/template.dat | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/tree-construction/template.dat b/tree-construction/template.dat
index 2d97183e..e0bec773 100644
--- a/tree-construction/template.dat
+++ b/tree-construction/template.dat
@@ -1611,3 +1611,17 @@ eof table
 |           <a>
 |           <table>
 |   <body>
+
+#data
+<template><form><input name="q"></form><div>second</div></template>
+#document-fragment
+template
+#errors
+#document
+| <template>
+|   content
+|     <form>
+|       <input>
+|         name="q"
+|     <div>
+|       "second"

From 538a6cd2a014eff08a35964a1995643b63fc02b9 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Felix=20B=C3=B6hm?= <188768+fb55@users.noreply.github.com>
Date: Mon, 28 Feb 2022 11:51:20 +0000
Subject: [PATCH 46/68] Add test for << in comment

See https://github.com/inikulin/parse5/issues/325
---
 tokenizer/test1.test | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/tokenizer/test1.test b/tokenizer/test1.test
index cb0eb48a..5323fbbe 100644
--- a/tokenizer/test1.test
+++ b/tokenizer/test1.test
@@ -149,6 +149,10 @@
 "input":"<!-- <test-->",
 "output":[["Comment", " <test"]]},
 
+{"description":"<< in comment",
+"input":"<!--<<-->",
+"output":[["Comment", "<<"]]},
+
 {"description":"<! in comment",
 "input":"<!-- <!test-->",
 "output":[["Comment", " <!test"]]},

From 9f14a9a1e9bde3109e41732e53f1306b4e448d91 Mon Sep 17 00:00:00 2001
From: Felix <188768+fb55@users.noreply.github.com>
Date: Mon, 9 May 2022 08:48:22 +0100
Subject: [PATCH 47/68] Add test for `</button>` closing `<p>` (#146)

Fixes #143
---
 tree-construction/tests20.dat | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/tree-construction/tests20.dat b/tree-construction/tests20.dat
index 30bc4084..79ce702c 100644
--- a/tree-construction/tests20.dat
+++ b/tree-construction/tests20.dat
@@ -508,6 +508,21 @@
 |       <button>
 |         <p>
 
+#data
+<!doctype html><button><p></button>x
+#errors
+(1,35): end-tag-too-early
+(1,36): expected-named-closing-tag-but-got-eof
+(1,36): expected-closing-tag-but-got-eof
+#document
+| <!DOCTYPE html>
+| <html>
+|   <head>
+|   <body>
+|     <button>
+|       <p>
+|     "x"
+
 #data
 <!doctype html><address><button></address>a
 #errors

From e3e6e150d4e1ade63d9d951381921a1fa31c25a2 Mon Sep 17 00:00:00 2001
From: Stephen Checkoway <s@pahtak.org>
Date: Thu, 9 Jun 2022 17:22:26 -0400
Subject: [PATCH 48/68] Fix test for `</button>` closing `<p>`

The document `<!doctype html><button><p></button>x` has no errors.
---
 tree-construction/tests20.dat | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/tree-construction/tests20.dat b/tree-construction/tests20.dat
index 79ce702c..32bf740c 100644
--- a/tree-construction/tests20.dat
+++ b/tree-construction/tests20.dat
@@ -511,9 +511,6 @@
 #data
 <!doctype html><button><p></button>x
 #errors
-(1,35): end-tag-too-early
-(1,36): expected-named-closing-tag-but-got-eof
-(1,36): expected-closing-tag-but-got-eof
 #document
 | <!DOCTYPE html>
 | <html>

From 038c06635ae54f700fee3154acb4d45fb3dcae8d Mon Sep 17 00:00:00 2001
From: Alexander Akait <4567934+alexander-akait@users.noreply.github.com>
Date: Thu, 15 Sep 2022 22:12:50 +0300
Subject: [PATCH 49/68] Test spec change: Remove parse error for
 <template><tr></tr> </template>

See https://github.com/whatwg/html/pull/8271

Co-authored-by: Simon Pieters <zcorpan@gmail.com>
---
 tree-construction/template.dat | 54 ++++++++++++++++++++++++++++++++++
 1 file changed, 54 insertions(+)

diff --git a/tree-construction/template.dat b/tree-construction/template.dat
index e0bec773..a154f613 100644
--- a/tree-construction/template.dat
+++ b/tree-construction/template.dat
@@ -1625,3 +1625,57 @@ template
 |         name="q"
 |     <div>
 |       "second"
+
+#data
+<!DOCTYPE HTML><template><tr><td>cell</td></tr></template>
+#document-fragment
+template
+#errors
+#document
+| <!DOCTYPE html>
+| <html>
+|   <head>
+|     <template>
+|       content
+|         <tr>
+|           <td>
+|             "cell"
+|   <body>
+
+#data
+<!DOCTYPE HTML><template> <tr> <td>cell</td> </tr> </template>
+#document-fragment
+template
+#errors
+#document
+| <!DOCTYPE html>
+| <html>
+|   <head>
+|     <template>
+|       content
+|         " "
+|         <tr>
+|           " "
+|           <td>
+|             "cell"
+|           " "
+|         " "
+|   <body>
+
+#data
+<!DOCTYPE HTML><template><tr><td>cell</td></tr>a</template>
+#document-fragment
+template
+#errors
+(1,59): foster-parenting-character
+#document
+| <!DOCTYPE html>
+| <html>
+|   <head>
+|     <template>
+|       content
+|         <tr>
+|           <td>
+|             "cell"
+|         "a"
+|   <body>

From dd0d8157f15ebf35655cc0c8df2d476cda3ceba2 Mon Sep 17 00:00:00 2001
From: Alexander Akait <4567934+alexander-akait@users.noreply.github.com>
Date: Thu, 20 Oct 2022 15:38:36 +0300
Subject: [PATCH 50/68] test: fix <template> tests to not use document-fragment

---
 tree-construction/template.dat | 6 ------
 1 file changed, 6 deletions(-)

diff --git a/tree-construction/template.dat b/tree-construction/template.dat
index a154f613..794a88ef 100644
--- a/tree-construction/template.dat
+++ b/tree-construction/template.dat
@@ -1628,8 +1628,6 @@ template
 
 #data
 <!DOCTYPE HTML><template><tr><td>cell</td></tr></template>
-#document-fragment
-template
 #errors
 #document
 | <!DOCTYPE html>
@@ -1644,8 +1642,6 @@ template
 
 #data
 <!DOCTYPE HTML><template> <tr> <td>cell</td> </tr> </template>
-#document-fragment
-template
 #errors
 #document
 | <!DOCTYPE html>
@@ -1664,8 +1660,6 @@ template
 
 #data
 <!DOCTYPE HTML><template><tr><td>cell</td></tr>a</template>
-#document-fragment
-template
 #errors
 (1,59): foster-parenting-character
 #document

From 03e6c3250a569af63c48d2d09b3f70270626c3b6 Mon Sep 17 00:00:00 2001
From: Mike Dalessio <mike.dalessio@gmail.com>
Date: Wed, 25 May 2022 14:42:51 -0400
Subject: [PATCH 51/68] Correct the test case from 56d589a to match format
 description

The relevant formatting rule from the tree-construction README:

> Each test must begin with a string "\#data" followed by a newline (LF).
> All subsequent lines until a line that says "\#errors" are the test data
> and must be passed to the system being tested unchanged, except with the
> final newline (on the last line) removed.
>
> Then there must be a line that says "\#errors".

Commit 56d589a placed `#errors` after `#document-fragment`, which does
not follow this rule.
---
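To illustrate why the ordering matters, here is a minimal sketch of a
reader that follows the quoted README rule literally. The function name
`extract_data` and the shortened sample inputs are made up for this
note.

```python
def extract_data(test_block: str) -> str:
    """Return the lines between `#data` and `#errors`, final newline removed."""
    lines = test_block.split("\n")
    if lines[0] != "#data":
        raise ValueError("test must begin with '#data'")
    try:
        end = lines.index("#errors", 1)
    except ValueError:
        raise ValueError("missing '#errors' line") from None
    # Joining without a trailing "\n" drops the final newline of the data.
    return "\n".join(lines[1:end])


# Ordering required by the README: `#errors` directly follows the data.
good = "#data\n<p>x\n#errors\n#document-fragment\ntemplate\n#document\n| <p>\n"
# Ordering used in 56d589a: `#document-fragment` precedes `#errors`.
bad = "#data\n<p>x\n#document-fragment\ntemplate\n#errors\n#document\n| <p>\n"

print(repr(extract_data(good)))
print(repr(extract_data(bad)))
```

With the misplaced header, a literal reader of the format would pass
`#document-fragment` and `template` to the parser as part of the input.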
 tree-construction/template.dat | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tree-construction/template.dat b/tree-construction/template.dat
index 794a88ef..5a904d4d 100644
--- a/tree-construction/template.dat
+++ b/tree-construction/template.dat
@@ -1614,9 +1614,9 @@ eof table
 
 #data
 <template><form><input name="q"></form><div>second</div></template>
+#errors
 #document-fragment
 template
-#errors
 #document
 | <template>
 |   content

From 95417e63a22e6624013558fd5a7d44d0265491b9 Mon Sep 17 00:00:00 2001
From: Mike Dalessio <mike.dalessio@gmail.com>
Date: Mon, 13 Mar 2023 21:38:40 -0400
Subject: [PATCH 52/68] ci: add a skeleton github actions workflow for
 downstream projects

See #141
---
 .github/workflows/downstream.yml | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)
 create mode 100644 .github/workflows/downstream.yml

diff --git a/.github/workflows/downstream.yml b/.github/workflows/downstream.yml
new file mode 100644
index 00000000..4489d2fe
--- /dev/null
+++ b/.github/workflows/downstream.yml
@@ -0,0 +1,21 @@
+name: downstream
+
+concurrency:
+  group: "${{github.workflow}}-${{github.ref}}"
+  cancel-in-progress: true
+
+on:
+  workflow_dispatch:
+  push:
+    branches:
+      - master
+  pull_request:
+    types: [opened, synchronize]
+    branches:
+      - '*'
+
+jobs:
+  skeleton:
+    runs-on: ubuntu-latest
+    steps:
+      - run: echo hello world

From 4f45c0211cf1d1f1af319470f77851f60f29914c Mon Sep 17 00:00:00 2001
From: Anne van Kesteren <annevk@annevk.nl>
Date: Fri, 31 Mar 2023 10:57:07 +0200
Subject: [PATCH 53/68] Upstream WebKit tests

I diffed https://github.com/WebKit/WebKit/tree/main/LayoutTests/html5lib/resources against https://github.com/html5lib/html5lib-tests/tree/master/tree-construction and these were the missing tests.
---
 tree-construction/template.dat | 20 ++++++++++++++++++++
 tree-construction/webkit01.dat | 24 ++++++++++++++++++++++++
 tree-construction/webkit02.dat | 21 +++++++++++++++++++++
 3 files changed, 65 insertions(+)

diff --git a/tree-construction/template.dat b/tree-construction/template.dat
index 5a904d4d..e869378f 100644
--- a/tree-construction/template.dat
+++ b/tree-construction/template.dat
@@ -1572,6 +1572,26 @@ no doctype
 |         "Foo"
 |   <body>
 
+#data
+<html><head></head><template></template><head>
+#errors
+#document
+| <html>
+|   <head>
+|     <template>
+|       content
+|   <body>
+
+#data
+<body></body><template>
+#errors
+#document
+| <html>
+|   <head>
+|   <body>
+|     <template>
+|       content
+
 #data
 <!DOCTYPE HTML><dummy><table><template><table><template><table><script>
 #errors
diff --git a/tree-construction/webkit01.dat b/tree-construction/webkit01.dat
index 3bb3bb90..44150ce1 100644
--- a/tree-construction/webkit01.dat
+++ b/tree-construction/webkit01.dat
@@ -359,6 +359,30 @@ console.log("FOO<span>BAR</span>BAZ");
 |     <!--  Hi there  -->
 | <!--  Again  -->
 
+#data
+<html><body></body>
+   <!-- Hi there --></html>
+#errors
+#document
+| <html>
+|   <head>
+|   <body>
+|     "
+   "
+|   <!--  Hi there  -->
+
+#data
+<html><body></body></html>
+   <!-- Hi there -->
+#errors
+#document
+| <html>
+|   <head>
+|   <body>
+|     "
+   "
+| <!--  Hi there  -->
+
 #data
 <html><body><ruby><div><rp>xx</rp></div></ruby></body></html>
 #errors
diff --git a/tree-construction/webkit02.dat b/tree-construction/webkit02.dat
index dddfe2a7..e5eb00ac 100644
--- a/tree-construction/webkit02.dat
+++ b/tree-construction/webkit02.dat
@@ -159,6 +159,27 @@
 |     <input>
 |     <table>
 
+#data
+<b><em><dcell><postfield><postfield><postfield><postfield><missing_glyph><missing_glyph><missing_glyph><missing_glyph><hkern><aside></b></em>
+#errors
+#document-fragment
+div
+#document
+| <b>
+|   <em>
+|     <dcell>
+|       <postfield>
+|         <postfield>
+|           <postfield>
+|             <postfield>
+|               <missing_glyph>
+|                 <missing_glyph>
+|                   <missing_glyph>
+|                     <missing_glyph>
+|                       <hkern>
+| <aside>
+|   <b>
+
 #data
 <b><em><foo><foo><aside></b>
 #errors

From 1314b094ed2c4e418404576c24b88aae2ea4e0c1 Mon Sep 17 00:00:00 2001
From: Anne van Kesteren <annevk@annevk.nl>
Date: Fri, 31 Mar 2023 12:37:57 +0200
Subject: [PATCH 54/68] Correct another <template> test

This test also contains an incorrect `#document-fragment` section, same
as #151.
---
 tree-construction/template.dat | 2 --
 1 file changed, 2 deletions(-)

diff --git a/tree-construction/template.dat b/tree-construction/template.dat
index e869378f..858b063a 100644
--- a/tree-construction/template.dat
+++ b/tree-construction/template.dat
@@ -1635,8 +1635,6 @@ eof table
 #data
 <template><form><input name="q"></form><div>second</div></template>
 #errors
-#document-fragment
-template
 #document
 | <template>
 |   content

From b36d6bd53336f1fa9e443b4b62aad38b35e87e1c Mon Sep 17 00:00:00 2001
From: Anne van Kesteren <annevk@annevk.nl>
Date: Wed, 5 Apr 2023 09:23:26 +0200
Subject: [PATCH 55/68] Upstream new WebKit Math and namespaced attribute tests

From https://github.com/WebKit/WebKit/commit/83e92a3d7ab3fcddacde92972ea1aade639459d3.
---
 tree-construction/webkit02.dat | 36 ++++++++++++++++++++++++++++++++++
 1 file changed, 36 insertions(+)

diff --git a/tree-construction/webkit02.dat b/tree-construction/webkit02.dat
index e5eb00ac..864b4b39 100644
--- a/tree-construction/webkit02.dat
+++ b/tree-construction/webkit02.dat
@@ -355,3 +355,39 @@ select
 |   <body>
 |     <plaintext>
 |       "<div>foo</div>"
+
+#data
+ <svg xml:base xml:lang xml:space xml:baaah definitionurl>
+ #errors
+ #document
+ | <html>
+ |   <head>
+ |   <body>
+ |     <svg svg>
+ |       definitionurl=""
+ |       xml lang=""
+ |       xml space=""
+ |       xml:baaah=""
+ |       xml:base=""
+
+ #data
+ <math definitionurl xlink:title xlink:show>
+ #errors
+ #document
+ | <html>
+ |   <head>
+ |   <body>
+ |     <math math>
+ |       definitionURL=""
+ |       xlink show=""
+ |       xlink title=""
+
+ #data
+ <math DEFINITIONURL>
+ #errors
+ #document
+ | <html>
+ |   <head>
+ |   <body>
+ |     <math math>
+ |       definitionURL=""

From 4e82e3d6e6f26b3fad9fc472cd8eca44800e547b Mon Sep 17 00:00:00 2001
From: Anne van Kesteren <annevk@annevk.nl>
Date: Fri, 31 Mar 2023 15:36:26 +0200
Subject: [PATCH 56/68] Remove duplicate tests

As identified in https://github.com/web-platform-tests/wpt/pull/39305.
---
 tree-construction/foreign-fragment.dat  |  8 ----
 tree-construction/scriptdata01.dat      | 13 ------
 tree-construction/template.dat          | 25 -----------
 tree-construction/tests1.dat            | 31 --------------
 tree-construction/tests19.dat           | 55 -------------------------
 tree-construction/tests20.dat           | 13 ------
 tree-construction/tests21.dat           | 27 ------------
 tree-construction/tests_innerHTML_1.dat | 44 --------------------
 8 files changed, 216 deletions(-)

diff --git a/tree-construction/foreign-fragment.dat b/tree-construction/foreign-fragment.dat
index 448d9c8e..e562c6b8 100644
--- a/tree-construction/foreign-fragment.dat
+++ b/tree-construction/foreign-fragment.dat
@@ -478,14 +478,6 @@ svg desc
 #document
 | <div>
 
-#data
-<figure></figure>
-#errors
-#document-fragment
-svg desc
-#document
-| <figure>
-
 #data
 <plaintext><foo>
 #errors
diff --git a/tree-construction/scriptdata01.dat b/tree-construction/scriptdata01.dat
index e5708589..6abcb657 100644
--- a/tree-construction/scriptdata01.dat
+++ b/tree-construction/scriptdata01.dat
@@ -172,19 +172,6 @@ FOO<script>'<!-->'</script>BAR
 |       "'<!-->'"
 |     "BAR"
 
-#data
-FOO<script>'<!-->'</script>BAR
-#errors
-(1,3): expected-doctype-but-got-chars
-#document
-| <html>
-|   <head>
-|   <body>
-|     "FOO"
-|     <script>
-|       "'<!-->'"
-|     "BAR"
-
 #data
 FOO<script>'<!-- potato'</script>BAR
 #errors
diff --git a/tree-construction/template.dat b/tree-construction/template.dat
index 858b063a..69c9b021 100644
--- a/tree-construction/template.dat
+++ b/tree-construction/template.dat
@@ -867,21 +867,6 @@ no doctype
 |         <link>
 |         <td>
 
-#data
-<body><template><template><tr></tr></template><td></td></template>
-#errors
-no doctype
-#document
-| <html>
-|   <head>
-|   <body>
-|     <template>
-|       content
-|         <template>
-|           content
-|             <tr>
-|         <td>
-
 #data
 <body><table><colgroup><template><col></col></template></colgroup></table></body>
 #errors
@@ -1582,16 +1567,6 @@ no doctype
 |       content
 |   <body>
 
-#data
-<body></body><template>
-#errors
-#document
-| <html>
-|   <head>
-|   <body>
-|     <template>
-|       content
-
 #data
 <!DOCTYPE HTML><dummy><table><template><table><template><table><script>
 #errors
diff --git a/tree-construction/tests1.dat b/tree-construction/tests1.dat
index 86632deb..e80e6401 100644
--- a/tree-construction/tests1.dat
+++ b/tree-construction/tests1.dat
@@ -1433,24 +1433,6 @@ Line1<br>Line2<br>Line3<br>Line4
 |     <meta>
 |     <p>
 
-#data
-<b><table><td><i></table>
-#errors
-(1,3): expected-doctype-but-got-start-tag
-(1,14): unexpected-cell-in-table-body
-(1,25): unexpected-cell-end-tag
-(1,25): expected-closing-tag-but-got-eof
-#document
-| <html>
-|   <head>
-|   <body>
-|     <b>
-|       <table>
-|         <tbody>
-|           <tr>
-|             <td>
-|               <i>
-
 #data
 <b><table><td></b><i></table>
 #errors
@@ -1547,19 +1529,6 @@ Line1<br>Line2<br>Line3<br>Line4
 |     <p>
 |     <p>
 
-#data
-<p><hr></p>
-#errors
-(1,3): expected-doctype-but-got-start-tag
-(1,11): unexpected-end-tag
-#document
-| <html>
-|   <head>
-|   <body>
-|     <p>
-|     <hr>
-|     <p>
-
 #data
 <select><b><option><select><option></b></select>
 #errors
diff --git a/tree-construction/tests19.dat b/tree-construction/tests19.dat
index 2e06fb36..20cdeabc 100644
--- a/tree-construction/tests19.dat
+++ b/tree-construction/tests19.dat
@@ -387,19 +387,6 @@
 |     <select>
 |       <option>
 
-#data
-<!doctype html><select><option></optgroup>
-#errors
-(1,42): unexpected-end-tag-in-select
-(1,42): eof-in-select
-#document
-| <!DOCTYPE html>
-| <html>
-|   <head>
-|   <body>
-|     <select>
-|       <option>
-
 #data
 <!doctype html><dd><optgroup><dd>
 #errors
@@ -1235,48 +1222,6 @@
 |           "c"
 |     <table>
 
-#data
-<!doctype html><table><i>a<b>b<div>c<a>d</i>e</b>f
-#errors
-(1,25): foster-parenting-start-tag
-(1,26): foster-parenting-character
-(1,29): foster-parenting-start-tag
-(1,30): foster-parenting-character
-(1,35): foster-parenting-start-tag
-(1,36): foster-parenting-character
-(1,39): foster-parenting-start-tag
-(1,40): foster-parenting-character
-(1,44): foster-parenting-end-tag
-(1,44): adoption-agency-1.3
-(1,44): adoption-agency-1.3
-(1,45): foster-parenting-character
-(1,49): foster-parenting-end-tag
-(1,44): adoption-agency-1.3
-(1,44): adoption-agency-1.3
-(1,50): foster-parenting-character
-(1,50): eof-in-table
-#document
-| <!DOCTYPE html>
-| <html>
-|   <head>
-|   <body>
-|     <i>
-|       "a"
-|       <b>
-|         "b"
-|     <b>
-|     <div>
-|       <b>
-|         <i>
-|           "c"
-|           <a>
-|             "d"
-|         <a>
-|           "e"
-|       <a>
-|         "f"
-|     <table>
-
 #data
 <!doctype html><table><i>a<div>b<tr>c<b>d</i>e
 #errors
diff --git a/tree-construction/tests20.dat b/tree-construction/tests20.dat
index 32bf740c..80c57d1a 100644
--- a/tree-construction/tests20.dat
+++ b/tree-construction/tests20.dat
@@ -533,19 +533,6 @@
 |       <button>
 |     "a"
 
-#data
-<!doctype html><address><button></address>a
-#errors
-(1,42): end-tag-too-early
-#document
-| <!DOCTYPE html>
-| <html>
-|   <head>
-|   <body>
-|     <address>
-|       <button>
-|     "a"
-
 #data
 <p><table></p>
 #errors
diff --git a/tree-construction/tests21.dat b/tree-construction/tests21.dat
index d52ab8cc..a926b138 100644
--- a/tree-construction/tests21.dat
+++ b/tree-construction/tests21.dat
@@ -52,21 +52,6 @@
 |     <svg svg>
 |       "foo"
 
-#data
-<svg><![CDATA[foo
-#errors
-(1,5): expected-doctype-but-got-start-tag
-(1:18) eof-in-cdata
-(1,17): expected-closing-tag-but-got-eof
-#new-errors
-(1:18) eof-in-cdata
-#document
-| <html>
-|   <head>
-|   <body>
-|     <svg svg>
-|       "foo"
-
 #data
 <svg><![CDATA[
 #errors
@@ -104,18 +89,6 @@
 |     <svg svg>
 |       "]] >"
 
-#data
-<svg><![CDATA[]] >]]>
-#errors
-(1,5): expected-doctype-but-got-start-tag
-(1,21): expected-closing-tag-but-got-eof
-#document
-| <html>
-|   <head>
-|   <body>
-|     <svg svg>
-|       "]] >"
-
 #data
 <svg><![CDATA[]]
 #errors
diff --git a/tree-construction/tests_innerHTML_1.dat b/tree-construction/tests_innerHTML_1.dat
index 54f43684..1a37ee52 100644
--- a/tree-construction/tests_innerHTML_1.dat
+++ b/tree-construction/tests_innerHTML_1.dat
@@ -110,16 +110,6 @@ table
 #document
 | <a>
 
-#data
-<a>
-#errors
-(1,3): unexpected-start-tag-implies-table-voodoo
-(1,3): eof-in-table
-#document-fragment
-table
-#document
-| <a>
-
 #data
 <a><caption>a
 #errors
@@ -502,30 +492,6 @@ tbody
 | <tr>
 |   <td>
 
-#data
-<a><td>
-#errors
-(1,3): unexpected-start-tag-implies-table-voodoo
-(1,7): unexpected-cell-in-table-body
-#document-fragment
-tbody
-#document
-| <a>
-| <tr>
-|   <td>
-
-#data
-<a><td>
-#errors
-(1,3): unexpected-start-tag-implies-table-voodoo
-(1,7): unexpected-cell-in-table-body
-#document-fragment
-tbody
-#document
-| <a>
-| <tr>
-|   <td>
-
 #data
 <td><table><tbody><a><tr>
 #errors
@@ -648,16 +614,6 @@ tr
 |   <table>
 | <td>
 
-#data
-<td><table></table><td>
-#errors
-#document-fragment
-tr
-#document
-| <td>
-|   <table>
-| <td>
-
 #data
 <caption><a>
 #errors

From ad8f5f68fac31efb20f67b19aee7819164ffd1ae Mon Sep 17 00:00:00 2001
From: Sam Sneddon <gsnedders@apple.com>
Date: Thu, 6 Apr 2023 13:24:06 +0100
Subject: [PATCH 57/68] Remove leading whitespace before test lines

---
 tree-construction/webkit02.dat | 64 +++++++++++++++++-----------------
 1 file changed, 32 insertions(+), 32 deletions(-)

diff --git a/tree-construction/webkit02.dat b/tree-construction/webkit02.dat
index 864b4b39..a05f8040 100644
--- a/tree-construction/webkit02.dat
+++ b/tree-construction/webkit02.dat
@@ -357,37 +357,37 @@ select
 |       "<div>foo</div>"
 
 #data
- <svg xml:base xml:lang xml:space xml:baaah definitionurl>
- #errors
- #document
- | <html>
- |   <head>
- |   <body>
- |     <svg svg>
- |       definitionurl=""
- |       xml lang=""
- |       xml space=""
- |       xml:baaah=""
- |       xml:base=""
+<svg xml:base xml:lang xml:space xml:baaah definitionurl>
+#errors
+#document
+| <html>
+|   <head>
+|   <body>
+|     <svg svg>
+|       definitionurl=""
+|       xml lang=""
+|       xml space=""
+|       xml:baaah=""
+|       xml:base=""
 
- #data
- <math definitionurl xlink:title xlink:show>
- #errors
- #document
- | <html>
- |   <head>
- |   <body>
- |     <math math>
- |       definitionURL=""
- |       xlink show=""
- |       xlink title=""
+#data
+<math definitionurl xlink:title xlink:show>
+#errors
+#document
+| <html>
+|   <head>
+|   <body>
+|     <math math>
+|       definitionURL=""
+|       xlink show=""
+|       xlink title=""
 
- #data
- <math DEFINITIONURL>
- #errors
- #document
- | <html>
- |   <head>
- |   <body>
- |     <math math>
- |       definitionURL=""
+#data
+<math DEFINITIONURL>
+#errors
+#document
+| <html>
+|   <head>
+|   <body>
+|     <math math>
+|       definitionURL=""

From 251b2efff71c914dd184c868fdd70b9d9e54aed4 Mon Sep 17 00:00:00 2001
From: Mike Dalessio <mike.dalessio@gmail.com>
Date: Tue, 11 Apr 2023 13:32:11 -0400
Subject: [PATCH 58/68] Correct errors in tests recently introduced

And revert a template test change from 1314b094.
---
 tree-construction/template.dat | 5 +++++
 tree-construction/webkit01.dat | 2 ++
 tree-construction/webkit02.dat | 9 +++++++++
 3 files changed, 16 insertions(+)

diff --git a/tree-construction/template.dat b/tree-construction/template.dat
index 69c9b021..396362b8 100644
--- a/tree-construction/template.dat
+++ b/tree-construction/template.dat
@@ -1560,6 +1560,9 @@ no doctype
 #data
 <html><head></head><template></template><head>
 #errors
+no doctype
+template-after-head
+head-after-head
 #document
 | <html>
 |   <head>
@@ -1610,6 +1613,8 @@ eof table
 #data
 <template><form><input name="q"></form><div>second</div></template>
 #errors
+#document-fragment
+template
 #document
 | <template>
 |   content
diff --git a/tree-construction/webkit01.dat b/tree-construction/webkit01.dat
index 44150ce1..d30e12e5 100644
--- a/tree-construction/webkit01.dat
+++ b/tree-construction/webkit01.dat
@@ -363,6 +363,7 @@ console.log("FOO<span>BAR</span>BAZ");
 <html><body></body>
    <!-- Hi there --></html>
 #errors
+no-doctype
 #document
 | <html>
 |   <head>
@@ -375,6 +376,7 @@ console.log("FOO<span>BAR</span>BAZ");
 <html><body></body></html>
    <!-- Hi there -->
 #errors
+no-doctype
 #document
 | <html>
 |   <head>
diff --git a/tree-construction/webkit02.dat b/tree-construction/webkit02.dat
index a05f8040..325568e2 100644
--- a/tree-construction/webkit02.dat
+++ b/tree-construction/webkit02.dat
@@ -162,6 +162,9 @@
 #data
 <b><em><dcell><postfield><postfield><postfield><postfield><missing_glyph><missing_glyph><missing_glyph><missing_glyph><hkern><aside></b></em>
 #errors
+unexpected-b-end-tag
+unexpected-em-end-tag
+eof-in-aside
 #document-fragment
 div
 #document
@@ -359,6 +362,8 @@ select
 #data
 <svg xml:base xml:lang xml:space xml:baaah definitionurl>
 #errors
+no-doctype
+eof-in-svg
 #document
 | <html>
 |   <head>
@@ -373,6 +378,8 @@ select
 #data
 <math definitionurl xlink:title xlink:show>
 #errors
+no-doctype
+eof-in-math
 #document
 | <html>
 |   <head>
@@ -385,6 +392,8 @@ select
 #data
 <math DEFINITIONURL>
 #errors
+no-doctype
+eof-in-math
 #document
 | <html>
 |   <head>

From be416cd43a0773ab4687a9c16d1d1ec04f1f6088 Mon Sep 17 00:00:00 2001
From: Sam Sneddon <gsnedders@apple.com>
Date: Mon, 17 Apr 2023 16:57:25 +0100
Subject: [PATCH 59/68] Implement linting for html5lib-tests

This checks that we have the right headers, in the right order, and
checks for both duplicate headers and duplicate tests.
---
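Note: the checks described in the message above can be illustrated with a
minimal sketch. This is not the code in lint_lib/lint.py. The header order
below is an assumption inferred from the tests touched in this series, the
function names are hypothetical, and duplicates are assumed to be
byte-identical tests, which may be stricter than the real linter's rule.

```python
from collections import Counter

# Assumed header order, inferred from tests in this series (illustrative only).
HEADER_ORDER = ["#data", "#errors", "#new-errors", "#document-fragment", "#document"]


def split_tests(text):
    """Split .dat text into tests; each test starts at a '#data' line."""
    tests, current = [], None
    for line in text.split("\n"):
        if line == "#data":
            current = []
            tests.append(current)
        if current is not None:
            current.append(line)
    return tests


def check_headers(test_lines):
    """Return a list of header problems found in one test."""
    # Simplification: any line equal to a header name counts as a header,
    # even if it was meant as test input.
    headers = [line for line in test_lines if line in HEADER_ORDER]
    problems = []
    for name, count in Counter(headers).items():
        if count > 1:
            problems.append("duplicate header " + name)
    ranks = [HEADER_ORDER.index(h) for h in headers]
    if ranks != sorted(ranks):
        problems.append("headers out of order")
    for required in ("#data", "#errors", "#document"):
        if required not in headers:
            problems.append("missing header " + required)
    return problems


def find_duplicates(tests):
    """Return the text of every test that repeats an earlier one verbatim."""
    seen, dupes = set(), []
    for test_lines in tests:
        key = "\n".join(test_lines).rstrip("\n")
        if key in seen:
            dupes.append(key)
        seen.add(key)
    return dupes
```

Because split_tests() starts a new test at every '#data' line, a repeated
'#data' header shows up as a separate test with missing headers, not as a
duplicate header.
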
 .github/workflows/lint.yml                   |  25 +
 .gitignore                                   |  79 ++
 lint                                         |   6 +
 lint_lib/__init__.py                         |   0
 lint_lib/_vendor-patches/funcparserlib.patch |  24 +
 lint_lib/_vendor/__init__.py                 |   0
 lint_lib/_vendor/funcparserlib/LICENSE       |  18 +
 lint_lib/_vendor/funcparserlib/__init__.py   |   0
 lint_lib/_vendor/funcparserlib/lexer.py      | 211 +++++
 lint_lib/_vendor/funcparserlib/lexer.pyi     |  34 +
 lint_lib/_vendor/funcparserlib/parser.py     | 872 +++++++++++++++++++
 lint_lib/_vendor/funcparserlib/parser.pyi    |  83 ++
 lint_lib/_vendor/funcparserlib/py.typed      |   0
 lint_lib/_vendor/funcparserlib/util.py       |  72 ++
 lint_lib/_vendor/funcparserlib/util.pyi      |   7 +
 lint_lib/_vendor/vendor.txt                  |   1 +
 lint_lib/lint.py                             | 280 ++++++
 lint_lib/parser.py                           | 177 ++++
 pyproject.toml                               |   7 +
 19 files changed, 1896 insertions(+)
 create mode 100644 .github/workflows/lint.yml
 create mode 100644 .gitignore
 create mode 100755 lint
 create mode 100644 lint_lib/__init__.py
 create mode 100644 lint_lib/_vendor-patches/funcparserlib.patch
 create mode 100644 lint_lib/_vendor/__init__.py
 create mode 100644 lint_lib/_vendor/funcparserlib/LICENSE
 create mode 100644 lint_lib/_vendor/funcparserlib/__init__.py
 create mode 100644 lint_lib/_vendor/funcparserlib/lexer.py
 create mode 100644 lint_lib/_vendor/funcparserlib/lexer.pyi
 create mode 100644 lint_lib/_vendor/funcparserlib/parser.py
 create mode 100644 lint_lib/_vendor/funcparserlib/parser.pyi
 create mode 100644 lint_lib/_vendor/funcparserlib/py.typed
 create mode 100644 lint_lib/_vendor/funcparserlib/util.py
 create mode 100644 lint_lib/_vendor/funcparserlib/util.pyi
 create mode 100644 lint_lib/_vendor/vendor.txt
 create mode 100644 lint_lib/lint.py
 create mode 100644 lint_lib/parser.py
 create mode 100644 pyproject.toml

diff --git a/.github/workflows/lint.yml b/.github/workflows/lint.yml
new file mode 100644
index 00000000..99f67c50
--- /dev/null
+++ b/.github/workflows/lint.yml
@@ -0,0 +1,25 @@
+name: lint
+
+concurrency:
+  group: "${{github.workflow}}-${{github.ref}}"
+  cancel-in-progress: true
+
+on:
+  workflow_dispatch:
+  push:
+    branches:
+      - master
+  pull_request:
+    types: [opened, synchronize]
+    branches:
+      - '*'
+
+jobs:
+  lint:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v3
+      - uses: actions/setup-python@v4
+        with:
+          python-version: '3.11'
+      - run: ./lint
diff --git a/.gitignore b/.gitignore
new file mode 100644
index 00000000..f8b56708
--- /dev/null
+++ b/.gitignore
@@ -0,0 +1,79 @@
+# Copyright (c) 2014 GitHub, Inc.
+#
+# Permission is hereby granted,  free of charge,  to any person obtaining a
+# copy of this software and associated documentation files (the "Software"),
+# to deal in the Software without restriction, including without limitation
+# the rights to  use, copy, modify, merge, publish, distribute, sublicense,
+# and/or sell copies of the Software, and to permit persons to whom the
+# Software is furnished to do so, subject to the following conditions:
+#
+# The above copyright notice and this permission notice shall be included in
+# all copies or substantial portions of the Software.
+#
+# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+# FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+# DEALINGS IN THE SOFTWARE.
+
+# Byte-compiled / optimized / DLL files
+__pycache__/
+*.py[cod]
+*$py.class
+
+# C extensions
+*.so
+
+# Distribution / packaging
+.Python
+env/
+build/
+develop-eggs/
+dist/
+downloads/
+eggs/
+.eggs/
+lib/
+lib64/
+parts/
+sdist/
+var/
+*.egg-info/
+.installed.cfg
+*.egg
+MANIFEST
+
+# PyInstaller
+#  Usually these files are written by a python script from a template
+#  before PyInstaller builds the exe, so as to inject date/other infos into it.
+*.manifest
+*.spec
+
+# Installer logs
+pip-log.txt
+pip-delete-this-directory.txt
+
+# Unit test / coverage reports
+htmlcov/
+.tox/
+.coverage
+.coverage.*
+.cache
+nosetests.xml
+coverage.xml
+*,cover
+
+# Translations
+*.mo
+*.pot
+
+# Django stuff:
+*.log
+
+# Sphinx documentation
+doc/_build/
+
+# PyBuilder
+target/
diff --git a/lint b/lint
new file mode 100755
index 00000000..19b7f50c
--- /dev/null
+++ b/lint
@@ -0,0 +1,6 @@
+#!/usr/bin/env python3
+import sys
+
+import lint_lib.lint as lint
+
+sys.exit(lint.main())
diff --git a/lint_lib/__init__.py b/lint_lib/__init__.py
new file mode 100644
index 00000000..e69de29b
diff --git a/lint_lib/_vendor-patches/funcparserlib.patch b/lint_lib/_vendor-patches/funcparserlib.patch
new file mode 100644
index 00000000..fc294880
--- /dev/null
+++ b/lint_lib/_vendor-patches/funcparserlib.patch
@@ -0,0 +1,24 @@
+diff --git a/lint_lib/_vendor/funcparserlib/parser.py b/lint_lib/_vendor/funcparserlib/parser.py
+index eb2f53f..0f86e6c 100644
+--- a/lint_lib/_vendor/funcparserlib/parser.py
++++ b/lint_lib/_vendor/funcparserlib/parser.py
+@@ -137,19 +137,6 @@ class Parser(object):
+         "('x', 'y')"
+ 
+         ```
+-
+-        !!! Note
+-
+-            You can enable the parsing log this way:
+-
+-            ```python
+-            import logging
+-            logging.basicConfig(level=logging.DEBUG)
+-            import funcparserlib.parser
+-            funcparserlib.parser.debug = True
+-            ```
+-
+-            The way to enable the parsing log may be changed in future versions.
+         """
+         self.name = name
+         return self
diff --git a/lint_lib/_vendor/__init__.py b/lint_lib/_vendor/__init__.py
new file mode 100644
index 00000000..e69de29b
diff --git a/lint_lib/_vendor/funcparserlib/LICENSE b/lint_lib/_vendor/funcparserlib/LICENSE
new file mode 100644
index 00000000..31d3a95b
--- /dev/null
+++ b/lint_lib/_vendor/funcparserlib/LICENSE
@@ -0,0 +1,18 @@
+Copyright © 2009/2021 Andrey Vlasovskikh
+
+Permission is hereby granted, free of charge, to any person obtaining a copy of this
+software and associated documentation files (the "Software"), to deal in the Software
+without restriction, including without limitation the rights to use, copy, modify,
+merge, publish, distribute, sublicense, and/or sell copies of the Software, and to
+permit persons to whom the Software is furnished to do so, subject to the following
+conditions:
+
+The above copyright notice and this permission notice shall be included in all copies or
+substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED,
+INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR
+PURPOSE AND NON-INFRINGEMENT.  IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
+LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT
+OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+OTHER DEALINGS IN THE SOFTWARE.
diff --git a/lint_lib/_vendor/funcparserlib/__init__.py b/lint_lib/_vendor/funcparserlib/__init__.py
new file mode 100644
index 00000000..e69de29b
diff --git a/lint_lib/_vendor/funcparserlib/lexer.py b/lint_lib/_vendor/funcparserlib/lexer.py
new file mode 100644
index 00000000..0a5b5e9e
--- /dev/null
+++ b/lint_lib/_vendor/funcparserlib/lexer.py
@@ -0,0 +1,211 @@
+# -*- coding: utf-8 -*-
+
+# Copyright © 2009/2021 Andrey Vlasovskikh
+#
+# Permission is hereby granted, free of charge, to any person obtaining a copy of this
+# software and associated documentation files (the "Software"), to deal in the Software
+# without restriction, including without limitation the rights to use, copy, modify,
+# merge, publish, distribute, sublicense, and/or sell copies of the Software, and to
+# permit persons to whom the Software is furnished to do so, subject to the following
+# conditions:
+#
+# The above copyright notice and this permission notice shall be included in all copies
+# or substantial portions of the Software.
+#
+# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED,
+# INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A
+# PARTICULAR PURPOSE AND NON-INFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
+# HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF
+# CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE
+# OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
+
+from __future__ import unicode_literals
+
+__all__ = ["make_tokenizer", "TokenSpec", "Token", "LexerError"]
+
+import re
+
+
+class LexerError(Exception):
+    def __init__(self, place, msg):
+        self.place = place
+        self.msg = msg
+
+    def __str__(self):
+        s = "cannot tokenize data"
+        line, pos = self.place
+        return '%s: %d,%d: "%s"' % (s, line, pos, self.msg)
+
+
+class TokenSpec(object):
+    """A token specification for generating a lexer via `make_tokenizer()`."""
+
+    def __init__(self, type, pattern, flags=0):
+        """Initialize a `TokenSpec` object.
+
+        Parameters:
+            type (str): User-defined type of the token (e.g. `"name"`, `"number"`,
+                `"operator"`)
+            pattern (str): Regexp for matching this token type
+            flags (int, optional): Regexp flags, the second argument of `re.compile()`
+        """
+        self.type = type
+        self.pattern = pattern
+        self.flags = flags
+
+    def __repr__(self):
+        return "TokenSpec(%r, %r, %r)" % (self.type, self.pattern, self.flags)
+
+
+class Token(object):
+    """A token object that represents a substring of certain type in your text.
+
+    You can compare tokens for equality using the `==` operator. Tokens also define
+    custom `repr()` and `str()`.
+
+    Attributes:
+        type (str): User-defined type of the token (e.g. `"name"`, `"number"`,
+            `"operator"`)
+        value (str): Text value of the token
+        start (Optional[Tuple[int, int]]): Start position (_line_, _column_)
+        end (Optional[Tuple[int, int]]): End position (_line_, _column_)
+    """
+
+    def __init__(self, type, value, start=None, end=None):
+        """Initialize a `Token` object."""
+        self.type = type
+        self.value = value
+        self.start = start
+        self.end = end
+
+    def __repr__(self):
+        return "Token(%r, %r)" % (self.type, self.value)
+
+    def __eq__(self, other):
+        # FIXME: Case sensitivity is assumed here
+        if other is None:
+            return False
+        else:
+            return self.type == other.type and self.value == other.value
+
+    def _pos_str(self):
+        if self.start is None or self.end is None:
+            return ""
+        else:
+            sl, sp = self.start
+            el, ep = self.end
+            return "%d,%d-%d,%d:" % (sl, sp, el, ep)
+
+    def __str__(self):
+        s = "%s %s '%s'" % (self._pos_str(), self.type, self.value)
+        return s.strip()
+
+    @property
+    def name(self):
+        return self.value
+
+    def pformat(self):
+        return "%s %s '%s'" % (
+            self._pos_str().ljust(20),  # noqa
+            self.type.ljust(14),
+            self.value,
+        )
+
+
+def make_tokenizer(specs):
+    # noinspection GrazieInspection
+    """Make a function that tokenizes text based on the regexp specs.
+
+    Type: `(Sequence[TokenSpec | Tuple]) -> Callable[[str], Iterable[Token]]`
+
+    A token spec is a `TokenSpec` instance.
+
+    !!! Note
+
+        For legacy reasons, a token spec may also be a tuple of (_type_, _args_), where
+        _type_ sets the value of `Token.type` for the token, and _args_ are the
+        positional arguments for `re.compile()`: either just (_pattern_,) or
+        (_pattern_, _flags_).
+
+    It returns a tokenizer function that takes a string and returns an iterable of
+    `Token` objects, or raises `LexerError` if it cannot tokenize the string according
+    to its token specs.
+
+    Examples:
+
+    ```pycon
+    >>> tokenize = make_tokenizer([
+    ...     TokenSpec("space", r"\\s+"),
+    ...     TokenSpec("id", r"\\w+"),
+    ...     TokenSpec("op", r"[,!]"),
+    ... ])
+    >>> text = "Hello, World!"
+    >>> [t for t in tokenize(text) if t.type != "space"]  # noqa
+    [Token('id', 'Hello'), Token('op', ','), Token('id', 'World'), Token('op', '!')]
+    >>> text = "Bye?"
+    >>> list(tokenize(text))
+    Traceback (most recent call last):
+        ...
+    lexer.LexerError: cannot tokenize data: 1,4: "Bye?"
+
+    ```
+    """
+    compiled = []
+    for spec in specs:
+        if isinstance(spec, TokenSpec):
+            c = spec.type, re.compile(spec.pattern, spec.flags)
+        else:
+            name, args = spec
+            c = name, re.compile(*args)
+        compiled.append(c)
+
+    def match_specs(s, i, position):
+        line, pos = position
+        for type, regexp in compiled:
+            m = regexp.match(s, i)
+            if m is not None:
+                value = m.group()
+                nls = value.count("\n")
+                n_line = line + nls
+                if nls == 0:
+                    n_pos = pos + len(value)
+                else:
+                    n_pos = len(value) - value.rfind("\n") - 1
+                return Token(type, value, (line, pos + 1), (n_line, n_pos))
+        else:
+            err_line = s.splitlines()[line - 1]
+            raise LexerError((line, pos + 1), err_line)
+
+    def f(s):
+        length = len(s)
+        line, pos = 1, 0
+        i = 0
+        while i < length:
+            t = match_specs(s, i, (line, pos))
+            yield t
+            line, pos = t.end
+            i += len(t.value)
+
+    return f
+
+
+# This is an example of token specs. See also [this article][1] for a
+# discussion of searching for multiline comments using regexps (including `*?`).
+#
+#   [1]: http://ostermiller.org/findcomment.html
+_example_token_specs = [
+    TokenSpec("COMMENT", r"\(\*(.|[\r\n])*?\*\)", re.MULTILINE),
+    TokenSpec("COMMENT", r"\{(.|[\r\n])*?\}", re.MULTILINE),
+    TokenSpec("COMMENT", r"//.*"),
+    TokenSpec("NL", r"[\r\n]+"),
+    TokenSpec("SPACE", r"[ \t\r\n]+"),
+    TokenSpec("NAME", r"[A-Za-z_][A-Za-z_0-9]*"),
+    TokenSpec("REAL", r"[0-9]+\.[0-9]*([Ee][+\-]?[0-9]+)*"),
+    TokenSpec("INT", r"[0-9]+"),
+    TokenSpec("INT", r"\$[0-9A-Fa-f]+"),
+    TokenSpec("OP", r"(\.\.)|(<>)|(<=)|(>=)|(:=)|[;,=\(\):\[\]\.+\-<>\*/@\^]"),
+    TokenSpec("STRING", r"'([^']|(''))*'"),
+    TokenSpec("CHAR", r"#[0-9]+"),
+    TokenSpec("CHAR", r"#\$[0-9A-Fa-f]+"),
+]
+# tokenize = make_tokenizer(_example_token_specs)
diff --git a/lint_lib/_vendor/funcparserlib/lexer.pyi b/lint_lib/_vendor/funcparserlib/lexer.pyi
new file mode 100644
index 00000000..b1e88fe7
--- /dev/null
+++ b/lint_lib/_vendor/funcparserlib/lexer.pyi
@@ -0,0 +1,34 @@
+from typing import Tuple, Optional, Callable, Iterable, Text, Sequence
+
+_Place = Tuple[int, int]
+_Spec = Tuple[Text, Tuple]
+
+class Token:
+    type: Text
+    value: Text
+    start: Optional[_Place]
+    end: Optional[_Place]
+    name: Text
+    def __init__(
+        self,
+        type: Text,
+        value: Text,
+        start: Optional[_Place] = ...,
+        end: Optional[_Place] = ...,
+    ) -> None: ...
+    def pformat(self) -> Text: ...
+
+class TokenSpec:
+    name: Text
+    pattern: Text
+    flags: int
+    def __init__(self, name: Text, pattern: Text, flags: int = ...) -> None: ...
+
+def make_tokenizer(
+    specs: Sequence[TokenSpec | _Spec],
+) -> Callable[[Text], Iterable[Token]]: ...
+
+class LexerError(Exception):
+    place: Tuple[int, int]
+    msg: Text
+    def __init__(self, place: _Place, msg: Text) -> None: ...
diff --git a/lint_lib/_vendor/funcparserlib/parser.py b/lint_lib/_vendor/funcparserlib/parser.py
new file mode 100644
index 00000000..0bbac7f5
--- /dev/null
+++ b/lint_lib/_vendor/funcparserlib/parser.py
@@ -0,0 +1,872 @@
+# -*- coding: utf-8 -*-
+
+# Copyright © 2009/2021 Andrey Vlasovskikh
+#
+# Permission is hereby granted, free of charge, to any person obtaining a copy of this
+# software and associated documentation files (the "Software"), to deal in the Software
+# without restriction, including without limitation the rights to use, copy, modify,
+# merge, publish, distribute, sublicense, and/or sell copies of the Software, and to
+# permit persons to whom the Software is furnished to do so, subject to the following
+# conditions:
+#
+# The above copyright notice and this permission notice shall be included in all copies
+# or substantial portions of the Software.
+#
+# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED,
+# INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A
+# PARTICULAR PURPOSE AND NON-INFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
+# HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF
+# CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE
+# OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
+
+"""Functional parsing combinators.
+
+Parsing combinators define an internal domain-specific language (DSL) for describing
+the parsing rules of a grammar. The DSL allows you to start with a few primitive
+parsers, then combine your parsers to get more complex ones, and finally cover
+the whole grammar you want to parse.
+
+The structure of the language:
+
+* Class `Parser`
+    * All the primitives and combinators of the language return `Parser` objects
+    * It defines the main `Parser.parse(tokens)` method
+* Primitive parsers
+    * `tok(type, value)`, `a(value)`, `some(pred)`, `forward_decl()`, `finished`
+* Parser combinators
+    * `p1 + p2`, `p1 | p2`, `p >> f`, `-p`, `maybe(p)`, `many(p)`, `oneplus(p)`,
+      `skip(p)`
+* Abstraction
+    * Use regular Python variables `p = ...  # Expression of type Parser` to define new
+      rules (non-terminals) of your grammar
+
+Every time you apply one of the combinators, you get a new `Parser` object. In other
+words, the set of `Parser` objects is closed under the means of combination.
+
+!!! Note
+
+    We took the parsing combinators language from the book [Introduction to Functional
+    Programming][1] and translated it from ML into Python.
+
+  [1]: https://www.cl.cam.ac.uk/teaching/Lectures/funprog-jrh-1996/
+"""
+
+from __future__ import unicode_literals
+
+__all__ = [
+    "some",
+    "a",
+    "tok",
+    "many",
+    "pure",
+    "finished",
+    "maybe",
+    "skip",
+    "oneplus",
+    "forward_decl",
+    "NoParseError",
+    "Parser",
+]
+
+import sys
+import logging
+import warnings
+
+from lint_lib._vendor.funcparserlib.lexer import Token
+
+log = logging.getLogger("funcparserlib")
+
+debug = False
+if sys.version_info < (3,):
+    string_types = (str, unicode)  # noqa
+else:
+    string_types = str
+
+
+class Parser(object):
+    """A parser object that can parse a sequence of tokens or can be combined with
+    other parsers using `+`, `|`, `>>`, `many()`, and other parsing combinators.
+
+    Type: `Parser[A, B]`
+
+    The generic variables in the type are: `A` — the type of the tokens in the
+    sequence to parse, `B` — the type of the parsed value.
+
+    In order to define a parser for your grammar:
+
+    1. You start with primitive parsers by calling `a(value)`, `some(pred)`,
+       `forward_decl()`, `finished`
+    2. You use parsing combinators `p1 + p2`, `p1 | p2`, `p >> f`, `many(p)`, and
+       others to combine parsers into a more complex parser
+    3. You can assign complex parsers to variables to define names that correspond to
+       the rules of your grammar
+
+    !!! Note
+
+        The constructor `Parser.__init__()` is considered **internal** and may be
+        changed in future versions. Use primitive parsers and parsing combinators to
+        construct new parsers.
+    """
+
+    def __init__(self, p):
+        """Wrap the parser function `p` into a `Parser` object."""
+        self.name = ""
+        self.define(p)
+
+    def named(self, name):
+        # noinspection GrazieInspection
+        """Specify the name of the parser for easier debugging.
+
+        Type: `(str) -> Parser[A, B]`
+
+        This name is used in the debug-level parsing log. You can also get it via the
+        `Parser.name` attribute.
+
+        Examples:
+
+        ```pycon
+        >>> expr = (a("x") + a("y")).named("expr")
+        >>> expr.name
+        'expr'
+
+        ```
+
+        ```pycon
+        >>> expr = a("x") + a("y")
+        >>> expr.name
+        "('x', 'y')"
+
+        ```
+        """
+        self.name = name
+        return self
+
+    def define(self, p):
+        """Define the parser created earlier as a forward declaration.
+
+        Type: `(Parser[A, B]) -> None`
+
+        Use `p = forward_decl()` in combination with `p.define(...)` to define
+        recursive parsers.
+
+        See the examples in the docs for `forward_decl()`.
+        """
+        f = getattr(p, "run", p)
+        if debug:
+            setattr(self, "_run", f)
+        else:
+            setattr(self, "run", f)
+        self.named(getattr(p, "name", p.__doc__))
+
+    def run(self, tokens, s):
+        """Run the parser against the tokens with the specified parsing state.
+
+        Type: `(Sequence[A], State) -> Tuple[B, State]`
+
+        The parsing state includes the current position in the sequence being parsed
+        and the position of the rightmost token consumed so far. The latter is kept
+        to produce better error messages.
+
+        If the parser fails to parse the tokens, it raises `NoParseError`.
+
+        !!! Warning
+
+            This method is **internal** and may be changed in future versions. Use
+            `Parser.parse(tokens)` instead and let the parser object take care of
+            updating the parsing state.
+        """
+        if debug:
+            log.debug("trying %s" % self.name)
+        return self._run(tokens, s)  # noqa
+
+    def _run(self, tokens, s):
+        raise NotImplementedError("you must define() a parser")
+
+    def parse(self, tokens):
+        """Parse the sequence of tokens and return the parsed value.
+
+        Type: `(Sequence[A]) -> B`
+
+        It takes a sequence of tokens of arbitrary type `A` and returns the parsed value
+        of arbitrary type `B`.
+
+        If the parser fails to parse the tokens, it raises `NoParseError`.
+
+        !!! Note
+
+            Although `Parser.parse()` can parse sequences of any objects (including
+            `str` which is a sequence of `str` chars), **the recommended way** is
+            parsing sequences of `Token` objects.
+
+            You **should** use a regexp-based tokenizer `make_tokenizer()` defined in
+            `funcparserlib.lexer` to convert your text into a sequence of `Token`
+            objects before parsing it. You will get more readable parsing error messages
+            (as `Token` objects contain their position in the source file) and good
+            separation of the lexical and syntactic levels of the grammar.
+        """
+        try:
+            (tree, _) = self.run(tokens, State(0, 0, None))
+            return tree
+        except NoParseError as e:
+            max = e.state.max
+            if len(tokens) > max:
+                t = tokens[max]
+                if isinstance(t, Token):
+                    if t.start is None or t.end is None:
+                        loc = ""
+                    else:
+                        s_line, s_pos = t.start
+                        e_line, e_pos = t.end
+                        loc = "%d,%d-%d,%d: " % (s_line, s_pos, e_line, e_pos)
+                    msg = "%s%s: %r" % (loc, e.msg, t.value)
+                elif isinstance(t, string_types):
+                    msg = "%s: %r" % (e.msg, t)
+                else:
+                    msg = "%s: %s" % (e.msg, t)
+            else:
+                msg = "got unexpected end of input"
+            if e.state.parser is not None:
+                msg = "%s, expected: %s" % (msg, e.state.parser.name)
+            e.msg = msg
+            raise
+
+    def __add__(self, other):
+        """Sequential combination of parsers. It runs this parser, then the other
+        parser.
+
+        The return value of the resulting parser is a tuple of each parsed value in
+        the sum of parsers. We merge all parsing results of `p1 + p2 + ... + pN` into a
+        single tuple. It means that the parsing result may be a 2-tuple, a 3-tuple,
+        a 4-tuple, etc. of parsed values. You can avoid this by transforming the parsed
+        tuple into a new value using the `>>` combinator.
+
+        You can also skip some parsing results in the resulting parsers by using `-p`
+        or `skip(p)` for some parsers in your sum of parsers. It means that the parsing
+        result might be a single value, not a tuple of parsed values. See the docs
+        for `Parser.__neg__()` for more examples.
+
+        Overloaded types (lots of them to provide stricter checking for the quite
+        dynamic return type of this method):
+
+        * `(self: Parser[A, B], _IgnoredParser[A]) -> Parser[A, B]`
+        * `(self: Parser[A, B], Parser[A, C]) -> _TupleParser[A, Tuple[B, C]]`
+        * `(self: _TupleParser[A, B], _IgnoredParser[A]) -> _TupleParser[A, B]`
+        * `(self: _TupleParser[A, B], Parser[A, Any]) -> Parser[A, Any]`
+        * `(self: _IgnoredParser[A], _IgnoredParser[A]) -> _IgnoredParser[A]`
+        * `(self: _IgnoredParser[A], Parser[A, C]) -> Parser[A, C]`
+
+        Examples:
+
+        ```pycon
+        >>> expr = a("x") + a("y")
+        >>> expr.parse("xy")
+        ('x', 'y')
+
+        ```
+
+        ```pycon
+        >>> expr = a("x") + a("y") + a("z")
+        >>> expr.parse("xyz")
+        ('x', 'y', 'z')
+
+        ```
+
+        ```pycon
+        >>> expr = a("x") + a("y")
+        >>> expr.parse("xz")
+        Traceback (most recent call last):
+            ...
+        parser.NoParseError: got unexpected token: 'z', expected: 'y'
+
+        ```
+        """
+
+        def magic(v1, v2):
+            if isinstance(v1, _Tuple):
+                return _Tuple(v1 + (v2,))
+            else:
+                return _Tuple((v1, v2))
+
+        @_TupleParser
+        def _add(tokens, s):
+            (v1, s2) = self.run(tokens, s)
+            (v2, s3) = other.run(tokens, s2)
+            return magic(v1, v2), s3
+
+        @Parser
+        def ignored_right(tokens, s):
+            v, s2 = self.run(tokens, s)
+            _, s3 = other.run(tokens, s2)
+            return v, s3
+
+        name = "(%s, %s)" % (self.name, other.name)
+        if isinstance(other, _IgnoredParser):
+            return ignored_right.named(name)
+        else:
+            return _add.named(name)
+
+    def __or__(self, other):
+        """Choice combination of parsers.
+
+        It runs this parser and returns its result. If the parser fails, it runs the
+        other parser.
+
+        Examples:
+
+        ```pycon
+        >>> expr = a("x") | a("y")
+        >>> expr.parse("x")
+        'x'
+        >>> expr.parse("y")
+        'y'
+        >>> expr.parse("z")
+        Traceback (most recent call last):
+            ...
+        parser.NoParseError: got unexpected token: 'z', expected: 'x' or 'y'
+
+        ```
+        """
+
+        @Parser
+        def _or(tokens, s):
+            try:
+                return self.run(tokens, s)
+            except NoParseError as e:
+                state = e.state
+            try:
+                return other.run(tokens, State(s.pos, state.max, state.parser))
+            except NoParseError as e:
+                if s.pos == e.state.max:
+                    e.state = State(e.state.pos, e.state.max, _or)
+                raise
+
+        _or.name = "%s or %s" % (self.name, other.name)
+        return _or
+
+    def __rshift__(self, f):
+        """Transform the parsing result by applying the specified function.
+
+        Type: `(Callable[[B], C]) -> Parser[A, C]`
+
+        You can use it for transforming the parsed value into another value before
+        including it into the parse tree (the AST).
+
+        Examples:
+
+        ```pycon
+        >>> def make_canonical_name(s):
+        ...     return s.lower()
+        >>> expr = (a("D") | a("d")) >> make_canonical_name
+        >>> expr.parse("D")
+        'd'
+        >>> expr.parse("d")
+        'd'
+
+        ```
+        """
+
+        @Parser
+        def _shift(tokens, s):
+            (v, s2) = self.run(tokens, s)
+            return f(v), s2
+
+        return _shift.named(self.name)
+
+    def bind(self, f):
+        """Bind the parser to a monadic function that returns a new parser.
+
+        Type: `(Callable[[B], Parser[A, C]]) -> Parser[A, C]`
+
+        Also known as `>>=` in Haskell.
+
+        !!! Note
+
+            You can parse any context-free grammar without resorting to `bind`. Due
+            to its poor performance please use it only when you really need it.
+        """
+
+        @Parser
+        def _bind(tokens, s):
+            (v, s2) = self.run(tokens, s)
+            return f(v).run(tokens, s2)
+
+        _bind.name = "(%s >>=)" % (self.name,)
+        return _bind
+
+    def __neg__(self):
+        """Return a parser that parses the same tokens, but its parsing result is
+        ignored by the sequential `+` combinator.
+
+        Type: `(Parser[A, B]) -> _IgnoredParser[A]`
+
+        You can use it for throwing away elements of concrete syntax (e.g. `","`,
+        `";"`).
+
+        Examples:
+
+        ```pycon
+        >>> expr = -a("x") + a("y")
+        >>> expr.parse("xy")
+        'y'
+
+        ```
+
+        ```pycon
+        >>> expr = a("x") + -a("y")
+        >>> expr.parse("xy")
+        'x'
+
+        ```
+
+        ```pycon
+        >>> expr = a("x") + -a("y") + a("z")
+        >>> expr.parse("xyz")
+        ('x', 'z')
+
+        ```
+
+        ```pycon
+        >>> expr = -a("x") + a("y") + -a("z")
+        >>> expr.parse("xyz")
+        'y'
+
+        ```
+
+        ```pycon
+        >>> expr = -a("x") + a("y")
+        >>> expr.parse("yz")
+        Traceback (most recent call last):
+            ...
+        parser.NoParseError: got unexpected token: 'y', expected: 'x'
+
+        ```
+
+        ```pycon
+        >>> expr = a("x") + -a("y")
+        >>> expr.parse("xz")
+        Traceback (most recent call last):
+            ...
+        parser.NoParseError: got unexpected token: 'z', expected: 'y'
+
+        ```
+
+        !!! Note
+
+            You **should not** pass the resulting parser to any combinators other than
+            `+`. You **should** have at least one non-skipped value in your
+            `p1 + p2 + ... + pN`. The parsed value of `-p` is an **internal** `_Ignored`
+            object, not intended for actual use.
+        """
+        return _IgnoredParser(self)
+
+    def __class_getitem__(cls, key):
+        return cls
+
+
+class State(object):
+    """Parsing state that is maintained basically for error reporting.
+
+    It consists of the current position `pos` in the sequence being parsed, and the
+    position `max` of the rightmost token that has been consumed while parsing.
+    """
+
+    def __init__(self, pos, max, parser=None):
+        self.pos = pos
+        self.max = max
+        self.parser = parser
+
+    def __str__(self):
+        return str((self.pos, self.max))
+
+    def __repr__(self):
+        return "State(%r, %r)" % (self.pos, self.max)
+
+
+class NoParseError(Exception):
+    def __init__(self, msg, state):
+        self.msg = msg
+        self.state = state
+
+    def __str__(self):
+        return self.msg
+
+
+class _Tuple(tuple):
+    pass
+
+
+class _TupleParser(Parser):
+    pass
+
+
+class _Ignored(object):
+    def __init__(self, value):
+        self.value = value
+
+    def __repr__(self):
+        return "_Ignored(%s)" % repr(self.value)
+
+    def __eq__(self, other):
+        return isinstance(other, _Ignored) and self.value == other.value
+
+
+@Parser
+def finished(tokens, s):
+    """A parser that throws an exception if there are any unparsed tokens left in the
+    sequence."""
+    if s.pos >= len(tokens):
+        return None, s
+    else:
+        s2 = State(s.pos, s.max, finished if s.pos == s.max else s.parser)
+        raise NoParseError("got unexpected token", s2)
+
+
+finished.name = "end of input"
+
+
+def many(p):
+    """Return a parser that applies the parser `p` as many times as it succeeds at
+    parsing the tokens.
+
+    The resulting parser applies `p` to the input sequence of tokens repeatedly,
+    for as long as `p` parses them successfully. The parsed value is a list of
+    the sequentially parsed values.
+
+    Examples:
+
+    ```pycon
+    >>> expr = many(a("x"))
+    >>> expr.parse("x")
+    ['x']
+    >>> expr.parse("xx")
+    ['x', 'x']
+    >>> expr.parse("xxxy")  # noqa
+    ['x', 'x', 'x']
+    >>> expr.parse("y")
+    []
+
+    ```
+    """
+
+    @Parser
+    def _many(tokens, s):
+        res = []
+        try:
+            while True:
+                (v, s) = p.run(tokens, s)
+                res.append(v)
+        except NoParseError as e:
+            s2 = State(s.pos, e.state.max, e.state.parser)
+            if debug:
+                log.debug(
+                    "*matched* %d instances of %s, new state = %s"
+                    % (len(res), _many.name, s2)
+                )
+            return res, s2
+
+    _many.name = "{ %s }" % p.name
+    return _many
+
+
+def some(pred):
+    """Return a parser that parses a token if it satisfies the predicate `pred`.
+
+    Type: `(Callable[[A], bool]) -> Parser[A, A]`
+
+    Examples:
+
+    ```pycon
+    >>> expr = some(lambda s: s.isalpha()).named('alpha')
+    >>> expr.parse("x")
+    'x'
+    >>> expr.parse("y")
+    'y'
+    >>> expr.parse("1")
+    Traceback (most recent call last):
+        ...
+    parser.NoParseError: got unexpected token: '1', expected: alpha
+
+    ```
+
+    !!! Warning
+
+        The `some()` combinator is quite slow and may be changed or removed in future
+        versions. If you need a parser for a token by its type (e.g. any identifier)
+        and maybe its value, use `tok(type[, value])` instead. You should use
+        `make_tokenizer()` from `funcparserlib.lexer` to tokenize your text first.
+    """
+
+    @Parser
+    def _some(tokens, s):
+        if s.pos >= len(tokens):
+            s2 = State(s.pos, s.max, _some if s.pos == s.max else s.parser)
+            raise NoParseError("got unexpected end of input", s2)
+        else:
+            t = tokens[s.pos]
+            if pred(t):
+                pos = s.pos + 1
+                s2 = State(pos, max(pos, s.max), s.parser)
+                if debug:
+                    log.debug("*matched* %r, new state = %s" % (t, s2))
+                return t, s2
+            else:
+                s2 = State(s.pos, s.max, _some if s.pos == s.max else s.parser)
+                if debug:
+                    log.debug(
+                        "failed %r, state = %s, expected = %s" % (t, s2, s2.parser.name)
+                    )
+                raise NoParseError("got unexpected token", s2)
+
+    _some.name = "some(...)"
+    return _some
+
+
+def a(value):
+    """Return a parser that parses a token if it's equal to `value`.
+
+    Type: `(A) -> Parser[A, A]`
+
+    Examples:
+
+    ```pycon
+    >>> expr = a("x")
+    >>> expr.parse("x")
+    'x'
+    >>> expr.parse("y")
+    Traceback (most recent call last):
+        ...
+    parser.NoParseError: got unexpected token: 'y', expected: 'x'
+
+    ```
+
+    !!! Note
+
+        Although `Parser.parse()` can parse sequences of any objects (including
+        `str` which is a sequence of `str` chars), **the recommended way** is
+        parsing sequences of `Token` objects.
+
+        You **should** use a regexp-based tokenizer `make_tokenizer()` defined in
+        `funcparserlib.lexer` to convert your text into a sequence of `Token` objects
+        before parsing it. You will get more readable parsing error messages (as `Token`
+        objects contain their position in the source file) and good separation of the
+        lexical and syntactic levels of the grammar.
+    """
+    name = getattr(value, "name", value)
+    return some(lambda t: t == value).named(repr(name))
+
+
+def tok(type, value=None):
+    """Return a parser that parses a `Token` and returns the string value of the token.
+
+    Type: `(str, Optional[str]) -> Parser[Token, str]`
+
+    You can match any token of the specified `type` or you can match a specific token by
+    its `type` and `value`.
+
+    Examples:
+
+    ```pycon
+    >>> expr = tok("expr")
+    >>> expr.parse([Token("expr", "foo")])
+    'foo'
+    >>> expr.parse([Token("expr", "bar")])
+    'bar'
+    >>> expr.parse([Token("op", "=")])
+    Traceback (most recent call last):
+        ...
+    parser.NoParseError: got unexpected token: '=', expected: expr
+
+    ```
+
+    ```pycon
+    >>> expr = tok("op", "=")
+    >>> expr.parse([Token("op", "=")])
+    '='
+    >>> expr.parse([Token("op", "+")])
+    Traceback (most recent call last):
+        ...
+    parser.NoParseError: got unexpected token: '+', expected: '='
+
+    ```
+
+    !!! Note
+
+        In order to convert your text to parse into a sequence of `Token` objects,
+        use a regexp-based tokenizer `make_tokenizer()` defined in
+        `funcparserlib.lexer`. You will get more readable parsing error messages (as
+        `Token` objects contain their position in the source file) and good separation
+        of the lexical and syntactic levels of the grammar.
+    """
+    if value is not None:
+        p = a(Token(type, value))
+    else:
+        p = some(lambda t: t.type == type).named(type)
+    return (p >> (lambda t: t.value)).named(p.name)
+
+
+def pure(x):
+    """Wrap any object into a parser.
+
+    Type: `(A) -> Parser[A, A]`
+
+    A pure parser doesn't touch the tokens sequence, it just returns its pure `x`
+    value.
+
+    Also known as `return` in Haskell.
+    """
+
+    @Parser
+    def _pure(_, s):
+        return x, s
+
+    _pure.name = "(pure %r)" % (x,)
+    return _pure
+
+
+def maybe(p):
+    """Return a parser that returns `None` if the parser `p` fails.
+
+    Examples:
+
+    ```pycon
+    >>> expr = maybe(a("x"))
+    >>> expr.parse("x")
+    'x'
+    >>> expr.parse("y") is None
+    True
+
+    ```
+    """
+    return (p | pure(None)).named("[ %s ]" % (p.name,))
+
+
+def skip(p):
+    """An alias for `-p`.
+
+    See also the docs for `Parser.__neg__()`.
+    """
+    return -p
+
+
+class _IgnoredParser(Parser):
+    def __init__(self, p):
+        super(_IgnoredParser, self).__init__(p)
+        run = self._run if debug else self.run
+
+        def ignored(tokens, s):
+            v, s2 = run(tokens, s)
+            return v if isinstance(v, _Ignored) else _Ignored(v), s2
+
+        self.define(ignored)
+        self.name = getattr(p, "name", p.__doc__)
+
+    def __add__(self, other):
+        def ignored_left(tokens, s):
+            _, s2 = self.run(tokens, s)
+            v, s3 = other.run(tokens, s2)
+            return v, s3
+
+        if isinstance(other, _IgnoredParser):
+            return _IgnoredParser(ignored_left).named(
+                "(%s, %s)" % (self.name, other.name)
+            )
+        else:
+            return Parser(ignored_left).named("(%s, %s)" % (self.name, other.name))
+
+
+def oneplus(p):
+    """Return a parser that applies the parser `p` one or more times.
+
+    A similar parser combinator `many(p)` means apply `p` zero or more times, whereas
+    `oneplus(p)` means apply `p` one or more times.
+
+    Examples:
+
+    ```pycon
+    >>> expr = oneplus(a("x"))
+    >>> expr.parse("x")
+    ['x']
+    >>> expr.parse("xx")
+    ['x', 'x']
+    >>> expr.parse("y")
+    Traceback (most recent call last):
+        ...
+    parser.NoParseError: got unexpected token: 'y', expected: 'x'
+
+    ```
+    """
+
+    @Parser
+    def _oneplus(tokens, s):
+        (v1, s2) = p.run(tokens, s)
+        (v2, s3) = many(p).run(tokens, s2)
+        return [v1] + v2, s3
+
+    _oneplus.name = "(%s, { %s })" % (p.name, p.name)
+    return _oneplus
+
+
+def with_forward_decls(suspension):
+    warnings.warn(
+        "Use forward_decl() instead:\n"
+        "\n"
+        "    p = forward_decl()\n"
+        "    ...\n"
+        "    p.define(parser_value)\n",
+        DeprecationWarning,
+    )
+
+    @Parser
+    def f(tokens, s):
+        return suspension().run(tokens, s)
+
+    return f
+
+
+def forward_decl():
+    """Return an undefined parser that can be used as a forward declaration.
+
+    Type: `Parser[Any, Any]`
+
+    Use `p = forward_decl()` in combination with `p.define(...)` to define recursive
+    parsers.
+
+
+    Examples:
+
+    ```pycon
+    >>> expr = forward_decl()
+    >>> expr.define(a("x") + maybe(expr) + a("y"))
+    >>> expr.parse("xxyy")  # noqa
+    ('x', ('x', None, 'y'), 'y')
+    >>> expr.parse("xxy")
+    Traceback (most recent call last):
+        ...
+    parser.NoParseError: got unexpected end of input, expected: 'y'
+
+    ```
+
+    !!! Note
+
+        If you care about static types, you should add a type hint for your forward
+        declaration, so that your type checker can check types in `p.define(...)` later:
+
+        ```python
+        p: Parser[str, int] = forward_decl()
+        p.define(a("x"))  # Type checker error
+        p.define(a("1") >> int)  # OK
+        ```
+    """
+
+    @Parser
+    def f(_tokens, _s):
+        raise NotImplementedError("you must define() a forward_decl somewhere")
+
+    f.name = "forward_decl()"
+    return f
+
+
+if __name__ == "__main__":
+    import doctest
+
+    doctest.testmod()
diff --git a/lint_lib/_vendor/funcparserlib/parser.pyi b/lint_lib/_vendor/funcparserlib/parser.pyi
new file mode 100644
index 00000000..e21ded5a
--- /dev/null
+++ b/lint_lib/_vendor/funcparserlib/parser.pyi
@@ -0,0 +1,83 @@
+from typing import (
+    Optional,
+    Generic,
+    TypeVar,
+    Union,
+    Callable,
+    Tuple,
+    Sequence,
+    Any,
+    List,
+    Text,
+    overload,
+)
+from funcparserlib.lexer import Token
+
+_A = TypeVar("_A")
+_B = TypeVar("_B")
+_C = TypeVar("_C")
+_D = TypeVar("_D")
+
+class State:
+    pos: int
+    max: int
+    parser: Union[Parser, _ParserCallable, None]
+    def __init__(
+        self,
+        pos: int,
+        max: int,
+        parser: Union[Parser, _ParserCallable, None] = ...,
+    ) -> None: ...
+
+_ParserCallable = Callable[[_A, State], Tuple[_B, State]]
+
+class Parser(Generic[_A, _B]):
+    name: Text
+    def __init__(self, p: Union[Parser[_A, _B], _ParserCallable]) -> None: ...
+    def named(self, name: Text) -> Parser[_A, _B]: ...
+    def define(self, p: Union[Parser[_A, _B], _ParserCallable]) -> None: ...
+    def run(self, tokens: Sequence[_A], s: State) -> Tuple[_B, State]: ...
+    def parse(self, tokens: Sequence[_A]) -> _B: ...
+    @overload
+    def __add__(  # type: ignore[misc]
+        self, other: _IgnoredParser[_A]
+    ) -> Parser[_A, _B]: ...
+    @overload
+    def __add__(self, other: Parser[_A, _C]) -> _TupleParser[_A, Tuple[_B, _C]]: ...
+    def __or__(self, other: Parser[_A, _C]) -> Parser[_A, Union[_B, _C]]: ...
+    def __rshift__(self, f: Callable[[_B], _C]) -> Parser[_A, _C]: ...
+    def bind(self, f: Callable[[_B], Parser[_A, _C]]) -> Parser[_A, _C]: ...
+    def __neg__(self) -> _IgnoredParser[_A]: ...
+
+class _Ignored:
+    value: Any
+    def __init__(self, value: Any) -> None: ...
+
+class _IgnoredParser(Parser[_A, _Ignored]):
+    @overload  # type: ignore[override]
+    def __add__(self, other: _IgnoredParser[_A]) -> _IgnoredParser[_A]: ...
+    @overload  # type: ignore[override]
+    def __add__(self, other: Parser[_A, _C]) -> Parser[_A, _C]: ...
+
+class _TupleParser(Parser[_A, _B]):
+    @overload  # type: ignore[override]
+    def __add__(self, other: _IgnoredParser[_A]) -> _TupleParser[_A, _B]: ...
+    @overload
+    def __add__(self, other: Parser[_A, Any]) -> Parser[_A, Any]: ...
+
+finished: Parser[Any, None]
+
+def many(p: Parser[_A, _B]) -> Parser[_A, List[_B]]: ...
+def some(pred: Callable[[_A], bool]) -> Parser[_A, _A]: ...
+def a(value: _A) -> Parser[_A, _A]: ...
+def tok(type: Text, value: Optional[Text] = ...) -> Parser[Token, Text]: ...
+def pure(x: _A) -> Parser[_A, _A]: ...
+def maybe(p: Parser[_A, _B]) -> Parser[_A, Optional[_B]]: ...
+def skip(p: Parser[_A, Any]) -> _IgnoredParser[_A]: ...
+def oneplus(p: Parser[_A, _B]) -> Parser[_A, List[_B]]: ...
+def forward_decl() -> Parser[Any, Any]: ...
+
+class NoParseError(Exception):
+    msg: Text
+    state: State
+    def __init__(self, msg: Text, state: State) -> None: ...
diff --git a/lint_lib/_vendor/funcparserlib/py.typed b/lint_lib/_vendor/funcparserlib/py.typed
new file mode 100644
index 00000000..e69de29b
diff --git a/lint_lib/_vendor/funcparserlib/util.py b/lint_lib/_vendor/funcparserlib/util.py
new file mode 100644
index 00000000..5c9ea51e
--- /dev/null
+++ b/lint_lib/_vendor/funcparserlib/util.py
@@ -0,0 +1,72 @@
+# -*- coding: utf-8 -*-
+
+# Copyright © 2009/2021 Andrey Vlasovskikh
+#
+# Permission is hereby granted, free of charge, to any person obtaining a copy of this
+# software and associated documentation files (the "Software"), to deal in the Software
+# without restriction, including without limitation the rights to use, copy, modify,
+# merge, publish, distribute, sublicense, and/or sell copies of the Software, and to
+# permit persons to whom the Software is furnished to do so, subject to the following
+# conditions:
+#
+# The above copyright notice and this permission notice shall be included in all copies
+# or substantial portions of the Software.
+#
+# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED,
+# INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A
+# PARTICULAR PURPOSE AND NON-INFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
+# HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF
+# CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE
+# OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
+
+from __future__ import unicode_literals
+
+
+def pretty_tree(x, kids, show):
+    """Return a pseudo-graphic tree representation of the object `x` similar to the
+    `tree` command in Unix.
+
+    Type: `(T, Callable[[T], List[T]], Callable[[T], str]) -> str`
+
+    It applies the parameter `show` (which is a function of type `(T) -> str`) to get a
+    textual representation of the objects to show.
+
+    It applies the parameter `kids` (which is a function of type `(T) -> List[T]`) to
+    list the children of the object to show.
+
+    Examples:
+
+    ```pycon
+    >>> print(pretty_tree(
+    ...     ["foo", ["bar", "baz"], "quux"],
+    ...     lambda obj: obj if isinstance(obj, list) else [],
+    ...     lambda obj: "[]" if isinstance(obj, list) else str(obj),
+    ... ))
+    []
+    |-- foo
+    |-- []
+    |   |-- bar
+    |   `-- baz
+    `-- quux
+
+    ```
+    """
+    (MID, END, CONT, LAST, ROOT) = ("|-- ", "`-- ", "|   ", "    ", "")
+
+    def rec(obj, indent, sym):
+        line = indent + sym + show(obj)
+        obj_kids = kids(obj)
+        if len(obj_kids) == 0:
+            return line
+        else:
+            if sym == MID:
+                next_indent = indent + CONT
+            elif sym == ROOT:
+                next_indent = indent + ROOT
+            else:
+                next_indent = indent + LAST
+            chars = [MID] * (len(obj_kids) - 1) + [END]
+            lines = [rec(kid, next_indent, sym) for kid, sym in zip(obj_kids, chars)]
+            return "\n".join([line] + lines)
+
+    return rec(x, "", ROOT)
diff --git a/lint_lib/_vendor/funcparserlib/util.pyi b/lint_lib/_vendor/funcparserlib/util.pyi
new file mode 100644
index 00000000..cf6a3d48
--- /dev/null
+++ b/lint_lib/_vendor/funcparserlib/util.pyi
@@ -0,0 +1,7 @@
+from typing import TypeVar, Callable, List, Text
+
+_A = TypeVar("_A")
+
+def pretty_tree(
+    x: _A, kids: Callable[[_A], List[_A]], show: Callable[[_A], Text]
+) -> Text: ...
diff --git a/lint_lib/_vendor/vendor.txt b/lint_lib/_vendor/vendor.txt
new file mode 100644
index 00000000..8af787f1
--- /dev/null
+++ b/lint_lib/_vendor/vendor.txt
@@ -0,0 +1 @@
+funcparserlib==1.0.1
diff --git a/lint_lib/lint.py b/lint_lib/lint.py
new file mode 100644
index 00000000..de4ccd09
--- /dev/null
+++ b/lint_lib/lint.py
@@ -0,0 +1,280 @@
+import codecs
+import contextlib
+import io
+import json
+import os
+import re
+import sys
+from collections import Counter
+from os.path import dirname, join, pardir, relpath
+from typing import Any, Dict, List, Optional, Set, TypeVar
+
+from . import parser
+from ._vendor.funcparserlib.parser import NoParseError
+
+text_type = str
+binary_type = bytes
+
+StringLike = TypeVar("StringLike", str, bytes)
+
+base = join(dirname(__file__), pardir)
+
+_surrogateRe = re.compile(r"\\u([0-9A-Fa-f]{4})(?:\\u([0-9A-Fa-f]{4}))?")
+
+
+def clean_path(path: str) -> str:
+    return relpath(path, base)
+
+
+def is_subsequence(l1: List[StringLike], l2: List[StringLike]) -> bool:
+    """Check whether l1 is a subsequence of l2 (order kept, gaps allowed)."""
+    i = 0
+    for x in l2:
+        if l1[i] == x:
+            i += 1
+            if i == len(l1):
+                return True
+    return False
+
+
+def unescape_json(obj: Any) -> Any:
+    def decode_str(inp):
+        """Decode \\uXXXX escapes
+
+        This decodes \\uXXXX escapes, possibly into non-BMP characters when
+        two surrogate character escapes are adjacent to each other.
+        """
+
+        # This cannot be implemented using the unicode_escape codec
+        # because that requires its input be ISO-8859-1, and we need
+        # arbitrary unicode as input.
+        def repl(m):
+            if m.group(2) is not None:
+                high = int(m.group(1), 16)
+                low = int(m.group(2), 16)
+                if (
+                    0xD800 <= high <= 0xDBFF
+                    and 0xDC00 <= low <= 0xDFFF
+                    and sys.maxunicode == 0x10FFFF
+                ):
+                    cp = ((high - 0xD800) << 10) + (low - 0xDC00) + 0x10000
+                    return chr(cp)
+                else:
+                    return chr(high) + chr(low)
+            else:
+                return chr(int(m.group(1), 16))
+
+        return _surrogateRe.sub(repl, inp)
+
+    if isinstance(obj, dict):
+        return {decode_str(k): unescape_json(v) for k, v in obj.items()}
+    elif isinstance(obj, list):
+        return [unescape_json(x) for x in obj]
+    elif isinstance(obj, text_type):
+        return decode_str(obj)
+    else:
+        return obj
+
+
+def lint_dat_format(
+    path: str,
+    encoding: Optional[str],
+    first_header: StringLike,
+    expected_headers: Optional[List[StringLike]] = None,
+    input_headers: Optional[Set[StringLike]] = None,
+) -> List[Dict[StringLike, StringLike]]:
+    if expected_headers is not None and first_header not in expected_headers:
+        raise ValueError("First header must be an expected header. (lint config error)")
+
+    if (
+        input_headers is not None
+        and expected_headers is not None
+        and not (set(input_headers) < set(expected_headers))
+    ):
+        raise ValueError(
+            "Input headers must be a proper subset of expected headers. (lint config error)"
+        )
+
+    if expected_headers is not None and len(set(expected_headers)) < len(
+        expected_headers
+    ):
+        raise ValueError(
+            "Can't expect a single header multiple times. (lint config error)"
+        )
+
+    if input_headers is None:
+        input_headers = set(expected_headers)
+
+    try:
+        if encoding is not None:
+            with codecs.open(path, "r", encoding=encoding) as fp:
+                dat = fp.read()
+                parsed = parser.parse(dat, first_header)
+        else:
+            with open(path, "rb") as fp:
+                dat = fp.read()
+                parsed = parser.parse(dat, first_header)
+    except NoParseError as e:
+        print("Parse error in {}, {}".format(path, e))
+        return []
+
+    seen_items = {}
+
+    for item in parsed:
+        # Check we don't have duplicate headers within one item.
+        headers = Counter(x[0] for x in item.data)
+        headers.subtract(set(headers.elements()))  # remove one instance of each
+        for header in set(headers.elements()):
+            c = headers[header]
+            print(
+                f"Duplicate header {header!r} occurs {c+1} times in one item in {path} at line {item.lineno}"
+            )
+
+        item_dict = dict(item.data)
+
+        # Check we only have expected headers.
+        if expected_headers is not None:
+            if not is_subsequence(
+                list(item_dict.keys()),
+                expected_headers,
+            ):
+                unexpected = item_dict.keys()
+                print(
+                    f"Unexpected or misordered item headings {list(unexpected)!r} in {path} at line {item.lineno}"
+                )
+
+        # Check for duplicated items.
+        if input_headers is not None:
+            found_input = set()
+            for input_header in input_headers:
+                found_input.add((input_header, item_dict.get(input_header)))
+        else:
+            found_input = set(item_dict.items())
+
+        first_line = seen_items.setdefault(frozenset(found_input), item.lineno)
+        if first_line is not None and first_line != item.lineno:
+            print(
+                f"Duplicate item in {path} at line {item.lineno} previously seen on line {first_line}"
+            )
+
+    return [dict(x.data) for x in parsed]
+
+
+def lint_encoding_test(path: str) -> None:
+    parsed = lint_dat_format(
+        path,
+        None,
+        b"data",
+        expected_headers=[b"data", b"encoding"],
+        input_headers={b"data"},
+    )
+    if not parsed:
+        # We'll already have output if there's a parse error.
+        return
+
+    # We'd put extra linting here, if we ever have anything specific to the
+    # encoding tests.
+
+
+def lint_encoding_tests(path: str) -> None:
+    for root, dirs, files in os.walk(path):
+        for file in sorted(files):
+            if not file.endswith(".dat"):
+                continue
+            lint_encoding_test(clean_path(join(root, file)))
+
+
+def lint_tokenizer_test(path: str) -> None:
+    all_keys = {
+        "description",
+        "input",
+        "output",
+        "initialStates",
+        "lastStartTag",
+        "ignoreErrorOrder",
+        "doubleEscaped",
+        "errors",
+    }
+    required = {"input", "output"}
+    with codecs.open(path, "r", "utf-8") as fp:
+        parsed = json.load(fp)
+    if not parsed:
+        return
+    if not isinstance(parsed, dict):
+        print("Top-level must be an object in %s" % path)
+        return
+    for test_group in parsed.values():
+        if not isinstance(test_group, list):
+            print("Test groups must be lists in %s" % path)
+            continue
+        for test in test_group:
+            if "doubleEscaped" in test and test["doubleEscaped"] is True:
+                test = unescape_json(test)
+            keys = set(test.keys())
+            if not (required <= keys):
+                print(
+                    "missing test properties {!r} in {}".format(required - keys, path)
+                )
+            if not (keys <= all_keys):
+                print(
+                    "unknown test properties {!r} in {}".format(keys - all_keys, path)
+                )
+
+
+def lint_tokenizer_tests(path: str) -> None:
+    for root, dirs, files in os.walk(path):
+        for file in sorted(files):
+            if not file.endswith(".test"):
+                continue
+            lint_tokenizer_test(clean_path(join(root, file)))
+
+
+def lint_tree_construction_test(path: str) -> None:
+    parsed = lint_dat_format(
+        path,
+        "utf-8",
+        "data",
+        expected_headers=[
+            "data",
+            "errors",
+            "new-errors",
+            "document-fragment",
+            "script-off",
+            "script-on",
+            "document",
+        ],
+        input_headers={
+            "data",
+            "document-fragment",
+            "script-on",
+            "script-off",
+        },
+    )
+    if not parsed:
+        # We'll already have output if there's a parse error.
+        return
+
+    # We'd put extra linting here, if we ever have anything specific to the
+    # tree construction tests.
+
+
+def lint_tree_construction_tests(path: str) -> None:
+    for root, dirs, files in os.walk(path):
+        for file in sorted(files):
+            if not file.endswith(".dat"):
+                continue
+            lint_tree_construction_test(clean_path(join(root, file)))
+
+
+def main() -> int:
+    with contextlib.redirect_stdout(io.StringIO()) as f:
+        lint_encoding_tests(join(base, "encoding"))
+        lint_tokenizer_tests(join(base, "tokenizer"))
+        lint_tree_construction_tests(join(base, "tree-construction"))
+
+    print(f.getvalue(), end="")
+    return 0 if f.getvalue() == "" else 1
+
+
+if __name__ == "__main__":
+    sys.exit(main())
diff --git a/lint_lib/parser.py b/lint_lib/parser.py
new file mode 100644
index 00000000..d18605a6
--- /dev/null
+++ b/lint_lib/parser.py
@@ -0,0 +1,177 @@
+import re
+from typing import Callable, List, Optional, Tuple, Type, TypeVar, Union
+
+from ._vendor.funcparserlib.lexer import LexerError, Token
+from ._vendor.funcparserlib.parser import (
+    NoParseError,
+    Parser,
+    _Tuple,
+    finished,
+    many,
+    pure,
+    skip,
+    some,
+    tok,
+)
+
+StringLike = TypeVar("StringLike", str, bytes)
+
+
+class Test:
+    def __init__(
+        self, data: List[Tuple[StringLike, StringLike]], lineno: Optional[int] = None
+    ) -> None:
+        self.data = data
+        self.lineno = lineno
+
+
+def _make_tokenizer(specs: List[Tuple[str, Tuple[StringLike]]]) -> Callable:
+    # Forked from upstream funcparserlib.lexer to fix #46
+    def compile_spec(spec):
+        name, args = spec
+        return name, re.compile(*args)
+
+    compiled = [compile_spec(s) for s in specs]
+
+    def match_specs(specs, s, i, position):
+        if isinstance(s, str):
+            lf = "\n"
+        else:
+            lf = b"\n"
+        line, pos = position
+        for type, regexp in specs:
+            m = regexp.match(s, i)
+            if m is not None:
+                value = m.group()
+                nls = value.count(lf)
+                n_line = line + nls
+                if nls == 0:
+                    n_pos = pos + len(value)
+                else:
+                    n_pos = len(value) - value.rfind(lf) - 1
+                return Token(type, value, (line, pos + 1), (n_line, n_pos))
+        else:
+            errline = s.splitlines()[line - 1]
+            raise LexerError((line, pos + 1), errline)
+
+    def f(s):
+        length = len(s)
+        line, pos = 1, 0
+        i = 0
+        while i < length:
+            t = match_specs(compiled, s, i, (line, pos))
+            yield t
+            line, pos = t.end
+            i += len(t.value)
+
+    return f
+
+
+_token_specs_u = [
+    ("HEADER", (r"[ \t]*#[^\n]*",)),
+    ("BODY", (r"[^#\n][^\n]*",)),
+    ("EOL", (r"\n",)),
+]
+
+_token_specs_b = [
+    (name, (regexp.encode("ascii"),)) for (name, (regexp,)) in _token_specs_u
+]
+
+_tokenizer_u = _make_tokenizer(_token_specs_u)
+_tokenizer_b = _make_tokenizer(_token_specs_b)
+
+
+def _many_merge(toks: _Tuple) -> List[Test]:
+    x, xs = toks
+    return [x] + xs
+
+
+def _notFollowedBy(p: Parser) -> Parser:
+    @Parser
+    def __notFollowedBy(tokens, s):
+        try:
+            p.run(tokens, s)
+        except NoParseError:
+            return skip(pure(None)).run(tokens, s)
+        else:
+            raise NoParseError("is followed by", s)
+
+    __notFollowedBy.name = "(notFollowedBy {})".format(p)
+    return __notFollowedBy
+
+
+def _trim_prefix(s: StringLike, prefix: StringLike) -> StringLike:
+    if s.startswith(prefix):
+        return s[len(prefix) :]
+    else:
+        return s
+
+
+def _make_test(result: _Tuple) -> Test:
+    first, rest = result
+    (first_header, first_lineno), first_body = first
+    return Test([(first_header, first_body)] + rest, lineno=first_lineno)
+
+
+def _parser(
+    tokens: List[Token],
+    new_test_header: StringLike,
+    tok_type: Union[Type[str], Type[bytes]],
+) -> List[Test]:
+    if tok_type is str:
+        header_prefix = "#"
+    elif tok_type is bytes:
+        header_prefix = b"#"
+    else:
+        assert False, "unreachable"
+
+    first_header = (
+        some(
+            lambda tok: tok.type == "HEADER"
+            and tok.value == header_prefix + new_test_header
+        )
+        >> (
+            lambda x: (
+                _trim_prefix(x.value, header_prefix),
+                x.start[0] if x.start is not None else None,
+            )
+        )
+    ) + skip(tok("EOL"))
+
+    header = (
+        some(
+            lambda tok: tok.type == "HEADER"
+            and tok.value != header_prefix + new_test_header
+        )
+        >> (lambda x: _trim_prefix(x.value, header_prefix))
+    ) + skip(tok("EOL"))
+
+    body = tok("BODY") + tok("EOL") >> (lambda x: x[0] + x[1])
+    empty = tok("EOL")
+
+    actual_body = many(body | (empty + skip(_notFollowedBy(first_header)))) >> (
+        lambda xs: tok_type().join(xs)[:-1]
+    )
+
+    first_segment = first_header + actual_body >> tuple
+    rest_segment = header + actual_body >> tuple
+
+    test = first_segment + many(rest_segment) >> _make_test
+
+    tests = (test + many(skip(empty) + test)) >> _many_merge
+
+    toplevel = tests + skip(finished)
+
+    return toplevel.parse(tokens)
+
+
+def parse(s: StringLike, new_test_header: StringLike) -> List[Test]:
+    if type(s) is not type(new_test_header):
+        raise TypeError("s and new_test_header must have the same type")
+
+    if isinstance(s, str):
+        return _parser(list(_tokenizer_u(s)), new_test_header, str)
+    elif isinstance(s, bytes):
+        return _parser(list(_tokenizer_b(s)), new_test_header, bytes)
+    else:
+        raise TypeError("s must be a str or bytes object")
diff --git a/pyproject.toml b/pyproject.toml
new file mode 100644
index 00000000..a68f7874
--- /dev/null
+++ b/pyproject.toml
@@ -0,0 +1,7 @@
+[tool.vendoring]
+destination = "lint_lib/_vendor/"
+requirements = "lint_lib/_vendor/vendor.txt"
+namespace = "lint_lib._vendor"
+
+protected-files = ["__init__.py", "vendor.txt"]
+patches-dir = "lint_lib/_vendor-patches"

From 0b8d24c160a811555fa16119903e14143b07687a Mon Sep 17 00:00:00 2001
From: luzpaz <luzpaz@users.noreply.github.com>
Date: Wed, 26 Apr 2023 12:03:26 +0000
Subject: [PATCH 60/68] Fix typos in descriptions and errors

---
 serializer/core.test           | 4 ++--
 tokenizer/test2.test           | 2 +-
 tree-construction/template.dat | 2 +-
 3 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/serializer/core.test b/serializer/core.test
index c0b4222d..a6fa0754 100644
--- a/serializer/core.test
+++ b/serializer/core.test
@@ -112,12 +112,12 @@
  "expected": ["<!DOCTYPE HTML PUBLIC \"-//W3C//DTD HTML 4.01//EN\" \"http://www.w3.org/TR/html4/strict.dtd\">"]
 },
 
-{"description": "HTML 4.01 DOCTYPE without system identifer",
+{"description": "HTML 4.01 DOCTYPE without system identifier",
  "input": [["Doctype", "HTML",  "-//W3C//DTD HTML 4.01//EN"]],
  "expected": ["<!DOCTYPE HTML PUBLIC \"-//W3C//DTD HTML 4.01//EN\">"]
 },
 
-{"description": "IBM DOCTYPE without public identifer",
+{"description": "IBM DOCTYPE without public identifier",
  "input": [["Doctype", "html",  "", "http://www.ibm.com/data/dtd/v11/ibmxhtml1-transitional.dtd"]],
  "expected": ["<!DOCTYPE html SYSTEM \"http://www.ibm.com/data/dtd/v11/ibmxhtml1-transitional.dtd\">"]
 }
diff --git a/tokenizer/test2.test b/tokenizer/test2.test
index f80f27d1..c29e4c31 100644
--- a/tokenizer/test2.test
+++ b/tokenizer/test2.test
@@ -190,7 +190,7 @@
     { "code": "unexpected-question-mark-instead-of-tag-name", "line": 1, "col": 2 }
 ]},
 
-{"description":"A bogus comment stops at >, even if preceeded by two dashes",
+{"description":"A bogus comment stops at >, even if preceded by two dashes",
 "input":"<?foo-->",
 "output":[["Comment", "?foo--"]],
 "errors":[
diff --git a/tree-construction/template.dat b/tree-construction/template.dat
index 396362b8..45fb507c 100644
--- a/tree-construction/template.dat
+++ b/tree-construction/template.dat
@@ -1092,7 +1092,7 @@ eof in template
 <body><template><i><menu>Foo</i>
 #errors
 no doctype
-mising /menu
+missing /menu
 eof in template
 #document
 | <html>

From 31086ec71479f6ee2e70fcc2377cd66bc97f7534 Mon Sep 17 00:00:00 2001
From: Anne van Kesteren <annevk@annevk.nl>
Date: Wed, 5 Apr 2023 18:36:22 +0200
Subject: [PATCH 61/68] Test <div><table><svg><foreignObject><select><table><s>

Closes #137.
---
 tree-construction/tables01.dat | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/tree-construction/tables01.dat b/tree-construction/tables01.dat
index f0caaa3c..311b4247 100644
--- a/tree-construction/tables01.dat
+++ b/tree-construction/tables01.dat
@@ -284,3 +284,18 @@
 |             <svg svg>
 |               <svg desc>
 |           <td>
+
+#data
+<div><table><svg><foreignObject><select><table><s>
+#errors
+#document
+| <html>
+|   <head>
+|   <body>
+|     <div>
+|       <svg svg>
+|         <svg foreignObject>
+|           <select>
+|       <table>
+|       <s>
+|       <table>

From 66f49b1de0487b8077f73813137149d15dc43af4 Mon Sep 17 00:00:00 2001
From: Mike Dalessio <mike.dalessio@gmail.com>
Date: Thu, 27 Apr 2023 12:56:40 -0400
Subject: [PATCH 62/68] Provide errors for test introduced in #163

---
 tree-construction/tables01.dat | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/tree-construction/tables01.dat b/tree-construction/tables01.dat
index 311b4247..decd68b5 100644
--- a/tree-construction/tables01.dat
+++ b/tree-construction/tables01.dat
@@ -288,6 +288,13 @@
 #data
 <div><table><svg><foreignObject><select><table><s>
 #errors
+1:1: Expected a doctype token
+1:13: 'svg' tag isn't allowed here. Currently open tags: html, body, div, table.
+1:33: 'select' tag isn't allowed here. Currently open tags: html, body, div, table, svg, foreignobject.
+1:41: 'table' tag isn't allowed here. Currently open tags: html, body, div, table, svg, foreignobject, select.
+1:41: 'table' tag isn't allowed here. Currently open tags: html, body, div, table, svg, foreignobject.
+1:48: 's' tag isn't allowed here. Currently open tags: html, body, div, table.
+1:51: Premature end of file. Currently open tags: html, body, div, table, s.
 #document
 | <html>
 |   <head>

From 55aa183097fa52bb1328cd93633be6f88159d4b8 Mon Sep 17 00:00:00 2001
From: Anne van Kesteren <annevk@annevk.nl>
Date: Tue, 2 May 2023 14:28:39 +0200
Subject: [PATCH 63/68] Support <hr>-in-<select>

As per https://github.com/whatwg/html/pull/9124. Tests from https://github.com/WebKit/WebKit/pull/12407.
---
 tree-construction/webkit02.dat | 132 +++++++++++++++++++++++++++++++++
 1 file changed, 132 insertions(+)

diff --git a/tree-construction/webkit02.dat b/tree-construction/webkit02.dat
index 325568e2..2987a70d 100644
--- a/tree-construction/webkit02.dat
+++ b/tree-construction/webkit02.dat
@@ -400,3 +400,135 @@ eof-in-math
 |   <body>
 |     <math math>
 |       definitionURL=""
+
+#data
+<select><hr>
+#errors
+#document
+| <html>
+|   <head>
+|   <body>
+|     <select>
+|       <hr>
+
+#data
+<select><option><hr>
+#errors
+#document
+| <html>
+|   <head>
+|   <body>
+|     <select>
+|       <option>
+|       <hr>
+
+#data
+<select><optgroup><option><hr>
+#errors
+#document
+| <html>
+|   <head>
+|   <body>
+|     <select>
+|       <optgroup>
+|         <option>
+|       <hr>
+
+#data
+<select><optgroup><hr>
+#errors
+#document
+| <html>
+|   <head>
+|   <body>
+|     <select>
+|       <optgroup>
+|       <hr>
+
+#data
+<select><option><optgroup><hr>
+#errors
+#document
+| <html>
+|   <head>
+|   <body>
+|     <select>
+|       <option>
+|       <optgroup>
+|       <hr>
+
+#data
+<table><tr><td><select><hr>
+#errors
+#document
+| <html>
+|   <head>
+|   <body>
+|     <table>
+|       <tbody>
+|         <tr>
+|           <td>
+|             <select>
+|               <hr>
+
+#data
+<table><tr><td><select><option><hr>
+#errors
+#document
+| <html>
+|   <head>
+|   <body>
+|     <table>
+|       <tbody>
+|         <tr>
+|           <td>
+|             <select>
+|               <option>
+|               <hr>
+
+#data
+<table><tr><td><select><optgroup><option><hr>
+#errors
+#document
+| <html>
+|   <head>
+|   <body>
+|     <table>
+|       <tbody>
+|         <tr>
+|           <td>
+|             <select>
+|               <optgroup>
+|                 <option>
+|               <hr>
+
+#data
+<table><tr><td><select><optgroup><hr>
+#errors
+#document
+| <html>
+|   <head>
+|   <body>
+|     <table>
+|       <tbody>
+|         <tr>
+|           <td>
+|             <select>
+|               <optgroup>
+|               <hr>
+
+#data
+<table><tr><td><select><option><optgroup><hr>
+#errors
+#document
+| <html>
+|   <head>
+|   <body>
+|     <table>
+|       <tbody>
+|         <tr>
+|           <td>
+|             <select>
+|               <option>
+|               <optgroup>
+|               <hr>

From c67f90eacac14e022b1f2c2e5ac559879581e9ff Mon Sep 17 00:00:00 2001
From: Mike Dalessio <mike.dalessio@gmail.com>
Date: Wed, 3 May 2023 09:00:17 -0400
Subject: [PATCH 64/68] update hr-in-select tests with parser errors

See #167
---
 tree-construction/webkit02.dat | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/tree-construction/webkit02.dat b/tree-construction/webkit02.dat
index 2987a70d..7d817ec6 100644
--- a/tree-construction/webkit02.dat
+++ b/tree-construction/webkit02.dat
@@ -404,6 +404,8 @@ eof-in-math
 #data
 <select><hr>
 #errors
+1:1: ERROR: Expected a doctype token
+1:13: ERROR: Premature end of file. Currently open tags: html, body, select.
 #document
 | <html>
 |   <head>
@@ -414,6 +416,8 @@ eof-in-math
 #data
 <select><option><hr>
 #errors
+1:1: ERROR: Expected a doctype token
+1:21: ERROR: Premature end of file. Currently open tags: html, body, select.
 #document
 | <html>
 |   <head>
@@ -425,6 +429,8 @@ eof-in-math
 #data
 <select><optgroup><option><hr>
 #errors
+1:1: ERROR: Expected a doctype token
+1:31: ERROR: Premature end of file. Currently open tags: html, body, select.
 #document
 | <html>
 |   <head>
@@ -437,6 +443,8 @@ eof-in-math
 #data
 <select><optgroup><hr>
 #errors
+1:1: ERROR: Expected a doctype token
+1:23: ERROR: Premature end of file. Currently open tags: html, body, select.
 #document
 | <html>
 |   <head>
@@ -448,6 +456,8 @@ eof-in-math
 #data
 <select><option><optgroup><hr>
 #errors
+1:1: ERROR: Expected a doctype token
+1:31: ERROR: Premature end of file. Currently open tags: html, body, select.
 #document
 | <html>
 |   <head>
@@ -460,6 +470,8 @@ eof-in-math
 #data
 <table><tr><td><select><hr>
 #errors
+1:1: ERROR: Expected a doctype token
+1:28: ERROR: Premature end of file. Currently open tags: html, body, table, tbody, tr, td, select.
 #document
 | <html>
 |   <head>
@@ -474,6 +486,8 @@ eof-in-math
 #data
 <table><tr><td><select><option><hr>
 #errors
+1:1: ERROR: Expected a doctype token
+1:36: ERROR: Premature end of file. Currently open tags: html, body, table, tbody, tr, td, select.
 #document
 | <html>
 |   <head>
@@ -489,6 +503,8 @@ eof-in-math
 #data
 <table><tr><td><select><optgroup><option><hr>
 #errors
+1:1: ERROR: Expected a doctype token
+1:46: ERROR: Premature end of file. Currently open tags: html, body, table, tbody, tr, td, select.
 #document
 | <html>
 |   <head>
@@ -505,6 +521,8 @@ eof-in-math
 #data
 <table><tr><td><select><optgroup><hr>
 #errors
+1:1: ERROR: Expected a doctype token
+1:38: ERROR: Premature end of file. Currently open tags: html, body, table, tbody, tr, td, select.
 #document
 | <html>
 |   <head>
@@ -520,6 +538,8 @@ eof-in-math
 #data
 <table><tr><td><select><option><optgroup><hr>
 #errors
+1:1: ERROR: Expected a doctype token
+1:46: ERROR: Premature end of file. Currently open tags: html, body, table, tbody, tr, td, select.
 #document
 | <html>
 |   <head>

From 921e6f286b04c77e847b51d810d97a24d1a660b1 Mon Sep 17 00:00:00 2001
From: Felix Boehm <188768+fb55@users.noreply.github.com>
Date: Wed, 9 Aug 2023 14:08:45 +0100
Subject: [PATCH 65/68] ci: Run downstream parse5 tests (#155)

---
 .github/workflows/downstream.yml | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/.github/workflows/downstream.yml b/.github/workflows/downstream.yml
index 4489d2fe..27a358c4 100644
--- a/.github/workflows/downstream.yml
+++ b/.github/workflows/downstream.yml
@@ -19,3 +19,22 @@ jobs:
     runs-on: ubuntu-latest
     steps:
       - run: echo hello world
+
+  parse5:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v2
+        with:
+          repository: inikulin/parse5
+          submodules: recursive
+      - run: rm -rf test/data/html5lib-tests/
+      - uses: actions/checkout@v2
+        with:
+          path: test/data/html5lib-tests/
+      - uses: actions/setup-node@v3
+        with:
+          node-version: lts/*
+          cache: npm
+      - run: npm ci
+      - run: npm run build --if-present
+      - run: npm run unit-tests

From 7e0d6e6799f58c5b7900c8e25c6b9da11bd9e445 Mon Sep 17 00:00:00 2001
From: Markus Unterwaditzer <markus-tarpit+git@unterwaditzer.net>
Date: Wed, 9 Aug 2023 15:41:05 +0200
Subject: [PATCH 66/68] ci: Run downstream html5gum tests (#152)

---
 .github/workflows/downstream.yml | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/.github/workflows/downstream.yml b/.github/workflows/downstream.yml
index 27a358c4..bc7a015c 100644
--- a/.github/workflows/downstream.yml
+++ b/.github/workflows/downstream.yml
@@ -38,3 +38,20 @@ jobs:
       - run: npm ci
       - run: npm run build --if-present
       - run: npm run unit-tests
+
+  html5gum:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v2
+        with:
+          repository: untitaker/html5gum
+      - run: rm -rf tests/html5lib-tests/
+      - uses: actions/checkout@v2
+        with:
+          path: tests/html5lib-tests/
+      - uses: actions-rs/toolchain@v1
+        with:
+          profile: minimal
+          toolchain: stable
+          override: true
+      - run: cargo test

From 8b45ec20d9c4daa065f68dd30f2a860e3681da62 Mon Sep 17 00:00:00 2001
From: Mike Dalessio <mike.dalessio@gmail.com>
Date: Wed, 9 Aug 2023 09:44:40 -0400
Subject: [PATCH 67/68] ci: run downstream nokogiri tests (#154)

---
 .github/workflows/downstream.yml | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/.github/workflows/downstream.yml b/.github/workflows/downstream.yml
index bc7a015c..59f121f0 100644
--- a/.github/workflows/downstream.yml
+++ b/.github/workflows/downstream.yml
@@ -55,3 +55,22 @@ jobs:
           toolchain: stable
           override: true
       - run: cargo test
+
+  nokogiri:
+    runs-on: ubuntu-latest
+    container:
+      image: ghcr.io/sparklemotion/nokogiri-test:mri-3.2
+    steps:
+      - uses: actions/checkout@v3
+        with:
+          repository: sparklemotion/nokogiri
+          path: nokogiri
+      - uses: actions/checkout@v3
+        with:
+          path: nokogiri/test/html5lib-tests
+      - working-directory: nokogiri
+        name: "Run the Nokogiri test suite"
+        run: |
+          bundle install
+          bundle exec rake compile -- --enable-system-libraries
+          bundle exec rake test

From a9f44960a9fedf265093d22b2aa3c7ca123727b9 Mon Sep 17 00:00:00 2001
From: Felix Boehm <188768+fb55@users.noreply.github.com>
Date: Thu, 17 Aug 2023 14:16:16 +0100
Subject: [PATCH 68/68] Add tests to increase spec coverage

parse5 is now tracking the test coverage of its codebase. The tests here all cover areas that previously didn't have tests.

Co-authored-by: Mike Dalessio <mike.dalessio@gmail.com>
---
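Note for reviewers (this text sits after the `---` marker, so it is not part of the commit message): the cases added below follow the usual tree-construction `.dat` layout, where `#data` opens a test and further `#` headers such as `#errors`, `#document-fragment` and `#document` introduce its sections. The following is a minimal sketch of how a consumer might split such a file. It is not the repository's lint parser (that one lives in lint_lib/parser.py and is built on funcparserlib). `split_dat_tests` is a hypothetical helper, and it assumes that every line starting with `#` is a header and that trailing newlines in a section can be dropped.

```python
from typing import Dict, List, Optional


def split_dat_tests(text: str, first_header: str = "data") -> List[Dict[str, str]]:
    """Split .dat text into tests; each test maps a header name to its body.

    Simplifying assumptions (not taken from the lint parser): any line that
    starts with '#' is treated as a header, and trailing newlines in a
    section body are stripped.
    """
    tests: List[Dict[str, str]] = []
    current: Dict[str, str] = {}
    header: Optional[str] = None
    for line in text.split("\n"):
        if line.startswith("#"):
            name = line[1:]
            # A repeated first header (e.g. "#data") starts a new test.
            if name == first_header and current:
                tests.append(current)
                current = {}
            header = name
            current[header] = ""
        elif header is not None:
            current[header] += line + "\n"
    if current:
        tests.append(current)
    # Strip trailing newlines, including the blank line that separates tests.
    return [{k: v.rstrip("\n") for k, v in t.items()} for t in tests]


sample = (
    "#data\n<select><hr>\n#errors\n#document\n| <html>\n|   <head>\n"
    "\n"
    "#data\n<!DOCTYPE html> <!DOCTYPE html>\n#errors\n"
    "Line: 1 Col: 31 Unexpected DOCTYPE. Ignored.\n#document\n| <!DOCTYPE html>\n"
)
parsed_tests = split_dat_tests(sample)
```

With the sample above, `parsed_tests` holds two dictionaries, each keyed by `data`, `errors` and `document`. An empty `#errors` section comes back as an empty string rather than being omitted, which keeps "no errors expected" distinguishable from "section missing".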
 tree-construction/quirks01.dat | 53 ++++++++++++++++++++++++++++++++++
 tree-construction/tables01.dat | 14 +++++++++
 tree-construction/tests2.dat   | 10 +++++++
 tree-construction/tests4.dat   | 16 ++++++++++
 tree-construction/tests7.dat   | 36 +++++++++++++++++++++++
 5 files changed, 129 insertions(+)
 create mode 100644 tree-construction/quirks01.dat

diff --git a/tree-construction/quirks01.dat b/tree-construction/quirks01.dat
new file mode 100644
index 00000000..bc58de5c
--- /dev/null
+++ b/tree-construction/quirks01.dat
@@ -0,0 +1,53 @@
+#data
+<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Frameset//EN"
+"http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd"><p><table>
+#errors
+(2,54): unknown-doctype
+(2,64): eof-in-table
+#document
+| <!DOCTYPE html "-//W3C//DTD XHTML 1.0 Frameset//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd">
+| <html>
+|   <head>
+|   <body>
+|     <p>
+|     <table>
+
+#data
+<!DOCTYPE html SYSTEM "http://www.ibm.com/data/dtd/v11/ibmxhtml1-transitional.dtd"><p><table>
+#errors
+(1,83): unknown-doctype
+(1,93): eof-in-table
+#document
+| <!DOCTYPE html "" "http://www.ibm.com/data/dtd/v11/ibmxhtml1-transitional.dtd">
+| <html>
+|   <head>
+|   <body>
+|     <p>
+|       <table>
+
+#data
+<!DOCTYPE html PUBLIC "html"><p><table>
+#errors
+(1,30): unknown-doctype
+(1,39): eof-in-table
+#document
+| <!DOCTYPE html "html" "">
+| <html>
+|   <head>
+|   <body>
+|     <p>
+|       <table>
+
+#data
+<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN"
+   "http://www.w3.org/TR/html4/strict.dtd"><p><table>
+#errors
+(2,43): unknown-doctype
+(2,53): eof-in-table
+#document
+| <!DOCTYPE html "-//W3C//DTD HTML 3.2//EN" "http://www.w3.org/TR/html4/strict.dtd">
+| <html>
+|   <head>
+|   <body>
+|     <p>
+|       <table>
diff --git a/tree-construction/tables01.dat b/tree-construction/tables01.dat
index decd68b5..aa7915eb 100644
--- a/tree-construction/tables01.dat
+++ b/tree-construction/tables01.dat
@@ -306,3 +306,17 @@
 |       <table>
 |       <s>
 |       <table>
+
+#data
+<table>a<!doctype html>
+#errors
+(1,1): expected-doctype-but-got-start-tag
+(1,8): illegal-character-token
+(1,9): illegal-doctype
+(1,24): expected-closing-tag-but-got-eof
+#document
+| <html>
+|   <head>
+|   <body>
+|     "a"
+|     <table>
diff --git a/tree-construction/tests2.dat b/tree-construction/tests2.dat
index b44fec4d..11ef9b16 100644
--- a/tree-construction/tests2.dat
+++ b/tree-construction/tests2.dat
@@ -584,6 +584,16 @@
 |   <head>
 |   <body>
 
+#data
+<!DOCTYPE html> <!DOCTYPE html>
+#errors
+Line: 1 Col: 31 Unexpected DOCTYPE. Ignored.
+#document
+| <!DOCTYPE html>
+| <html>
+|   <head>
+|   <body>
+
 #data
 test
 test
diff --git a/tree-construction/tests4.dat b/tree-construction/tests4.dat
index 0a6174c3..4f0cf70e 100644
--- a/tree-construction/tests4.dat
+++ b/tree-construction/tests4.dat
@@ -56,3 +56,19 @@ head
 #document
 | <title>
 |   "setting head's innerHTML"
+
+#data
+direct <title> content
+#errors
+#document-fragment
+title
+#document
+| "direct <title> content"
+
+#data
+<!-- inside </script> -->
+#errors
+#document-fragment
+script
+#document
+| "<!-- inside </script> -->"
diff --git a/tree-construction/tests7.dat b/tree-construction/tests7.dat
index 8c5596b0..b2db4de1 100644
--- a/tree-construction/tests7.dat
+++ b/tree-construction/tests7.dat
@@ -46,6 +46,42 @@
 |       "X"
 |   <body>
 
+#data
+<!doctype html></head><base>X
+#errors
+(1,28): unexpected-start-tag-out-of-my-head
+#document
+| <!DOCTYPE html>
+| <html>
+|   <head>
+|     <base>
+|   <body>
+|     "X"
+
+#data
+<!doctype html></head><basefont>X
+#errors
+(1,32): unexpected-start-tag-out-of-my-head
+#document
+| <!DOCTYPE html>
+| <html>
+|   <head>
+|     <basefont>
+|   <body>
+|     "X"
+
+#data
+<!doctype html></head><bgsound>X
+#errors
+(1,31): unexpected-start-tag-out-of-my-head
+#document
+| <!DOCTYPE html>
+| <html>
+|   <head>
+|     <bgsound>
+|   <body>
+|     "X"
+
 #data
 <!doctype html><table><meta></table>
 #errors