From de70ae4ca71b15bc8c77f54c4e60ed345214c4df Mon Sep 17 00:00:00 2001
From: inikulin
Date: Fri, 23 Jun 2017 21:50:04 +0300
Subject: [PATCH 01/68] Fix malformed JSON from previous commit
---
tokenizer/entities.test | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/tokenizer/entities.test b/tokenizer/entities.test
index 1daff254..7c514563 100644
--- a/tokenizer/entities.test
+++ b/tokenizer/entities.test
@@ -17,14 +17,14 @@
{"description": "Semicolonless named entity 'not' followed by 'i;' in body",
"input":"¬i;",
-"output": [["Character", "\u00ACi;"]]},
+"output": [["Character", "\u00ACi;"]],
"errors":[
{ "code": "missing-semicolon-after-character-reference", "line": 1, "col": 5 }
]},
{"description": "Very long undefined named entity in body",
"input":"&ammmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmp;",
-"output": [["Character", "&ammmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmp;"]]},
+"output": [["Character", "&ammmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmp;"]],
"errors":[
{ "code": "unknown-named-character-reference", "line": 1, "col": 950 }
]},
From a5c88a483e4f643a5446ecca579ce344e6bd6d8a Mon Sep 17 00:00:00 2001
From: Ingvar Stepanyan
Date: Wed, 12 Jul 2017 16:53:16 +0100
Subject: [PATCH 02/68] Remove `ignoreErrorOrder` option from docs
It is no longer used after the changes in #92.
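For context, with that flag gone a test runner can simply compare the expected `errors` array against its own collected errors positionally. A minimal sketch, assuming the harness records its errors as dicts with the same `code`/`line`/`col` keys used in these test files:

    def errors_match(expected, collected):
        # Strict, order-sensitive comparison of {"code", "line", "col"} records.
        if len(expected) != len(collected):
            return False
        return all(
            exp["code"] == got["code"]
            and exp["line"] == got["line"]
            and exp["col"] == got["col"]
            for exp, got in zip(expected, collected)
        )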
---
tokenizer/README.md | 8 --------
1 file changed, 8 deletions(-)
diff --git a/tokenizer/README.md b/tokenizer/README.md
index 56956369..50ba680f 100644
--- a/tokenizer/README.md
+++ b/tokenizer/README.md
@@ -84,14 +84,6 @@ If `test.doubleEscaped` is present and `true`, then every string within
`test.output` must be further unescaped (as described above) before
comparing with the tokenizer's output.
-`test.ignoreErrorOrder` is a boolean value indicating that the order of
-`ParseError` tokens relative to other tokens in the output stream is
-unimportant, and implementations should ignore such differences between
-their output and `expected_output_tokens`. (This is used for errors
-emitted by the input stream preprocessing stage, since it is useful to
-test that code but it is undefined when the errors occur). If it is
-omitted, it defaults to `false`.
-
xmlViolation tests
------------------
From 8e19e7ad29473842154977d7624aee0097a6def2 Mon Sep 17 00:00:00 2001
From: Ingvar Stepanyan
Date: Mon, 17 Jul 2017 15:56:04 +0100
Subject: [PATCH 03/68] Concatenate character tokens
It looks like these few places were missed when the ParseError token type was removed.
This PR fixes them to restore the invariant promised in the README:
> All adjacent character tokens are coalesced into a single ["Character", data] token.
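For reference, a harness-side normalisation that enforces this invariant might look like the sketch below; it is not part of this patch and only assumes the ["Character", data] token shape used throughout these files:

    def coalesce_characters(tokens):
        # Merge runs of adjacent ["Character", data] tokens into a single token,
        # matching the invariant promised in tokenizer/README.md.
        result = []
        for token in tokens:
            if token[0] == "Character" and result and result[-1][0] == "Character":
                result[-1] = ["Character", result[-1][1] + token[1]]
            else:
                result.append(list(token))
        return result

For example, `coalesce_characters([["Character", "foo "], ["Character", "< bar"]])` yields `[["Character", "foo < bar"]]`, which is exactly the change made to the "Unescaped <" test below.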
---
tokenizer/test1.test | 4 ++--
tokenizer/test2.test | 6 +++---
tokenizer/test3.test | 4 ++--
tokenizer/test4.test | 6 +++---
tokenizer/unicodeCharsProblematic.test | 4 ++--
5 files changed, 12 insertions(+), 12 deletions(-)
diff --git a/tokenizer/test1.test b/tokenizer/test1.test
index 09d15024..8b85050f 100644
--- a/tokenizer/test1.test
+++ b/tokenizer/test1.test
@@ -182,14 +182,14 @@
{"description":"Entity without trailing semicolon (1)",
"input":"I'm ¬it",
-"output":[["Character","I'm "], ["Character", "\u00ACit"]],
+"output":[["Character","I'm \u00ACit"]],
"errors": [
{"code" : "missing-semicolon-after-character-reference", "line": 1, "col": 9 }
]},
{"description":"Entity without trailing semicolon (2)",
"input":"I'm ¬in",
-"output":[["Character","I'm "], ["Character", "\u00ACin"]],
+"output":[["Character","I'm \u00ACin"]],
"errors": [
{"code" : "missing-semicolon-after-character-reference", "line": 1, "col": 9 }
]},
diff --git a/tokenizer/test2.test b/tokenizer/test2.test
index 73f0421d..521694ca 100644
--- a/tokenizer/test2.test
+++ b/tokenizer/test2.test
@@ -119,7 +119,7 @@
{"description":"Hexadecimal entity pair representing a surrogate pair",
"input":"",
-"output":[["Character", "\uFFFD"], ["Character", "\uFFFD"]],
+"output":[["Character", "\uFFFD\uFFFD"]],
"errors":[
{ "code": "surrogate-character-reference", "line": 1, "col": 9 },
{ "code": "surrogate-character-reference", "line": 1, "col": 17 }
@@ -195,7 +195,7 @@
{"description":"Unescaped <",
"input":"foo < bar",
-"output":[["Character", "foo "], ["Character", "< bar"]],
+"output":[["Character", "foo < bar"]],
"errors":[
{ "code": "invalid-first-character-of-tag-name", "line": 1, "col": 6 }
]},
@@ -242,7 +242,7 @@
{"description":"Empty end tag with following characters",
"input":"a>bc",
-"output":[["Character", "a"], ["Character", "bc"]],
+"output":[["Character", "abc"]],
"errors":[
{ "code": "missing-end-tag-name", "line": 1, "col": 4 }
]},
diff --git a/tokenizer/test3.test b/tokenizer/test3.test
index ba3c15b3..85139d4d 100644
--- a/tokenizer/test3.test
+++ b/tokenizer/test3.test
@@ -88,7 +88,7 @@
{"description":"<\\u0000",
"input":"<\u0000",
-"output":[["Character", "<"], ["Character", "\u0000"]],
+"output":[["Character", "<\u0000"]],
"errors":[
{ "code": "invalid-first-character-of-tag-name", "line": 1, "col": 2 },
{ "code": "unexpected-null-character", "line": 1, "col": 2 }
@@ -8415,7 +8415,7 @@
{"description":"<<",
"input":"<<",
-"output":[["Character", "<"], ["Character", "<"]],
+"output":[["Character", "<<"]],
"errors":[
{ "code": "invalid-first-character-of-tag-name", "line": 1, "col": 2 },
{ "code": "eof-before-tag-name", "line": 1, "col": 3 }
diff --git a/tokenizer/test4.test b/tokenizer/test4.test
index 8e55e767..dd247d54 100644
--- a/tokenizer/test4.test
+++ b/tokenizer/test4.test
@@ -190,7 +190,7 @@
{"description":"Empty hex numeric entities",
"input":" ",
-"output":[["Character", " "], ["Character", " "]],
+"output":[["Character", " "]],
"errors":[
{ "code": "absence-of-digits-in-numeric-character-reference", "line": 1, "col": 4 },
{ "code": "absence-of-digits-in-numeric-character-reference", "line": 1, "col": 8 }
@@ -205,7 +205,7 @@
{"description":"Empty decimal numeric entities",
"input":" ",
-"output":[["Character", " "], ["Character", " "]],
+"output":[["Character", " "]],
"errors":[
{ "code": "absence-of-digits-in-numeric-character-reference", "line": 1, "col": 3 },
{ "code": "absence-of-digits-in-numeric-character-reference", "line": 1, "col": 6 }
@@ -274,7 +274,7 @@
{"description":"Surrogate code point edge cases",
"input":"",
-"output":[["Character", "\uD7FF"], ["Character", "\uFFFD"], ["Character", "\uFFFD"], ["Character", "\uFFFD"], ["Character", "\uFFFD\uE000"]],
+"output":[["Character", "\uD7FF\uFFFD\uFFFD\uFFFD\uFFFD\uE000"]],
"errors":[
{ "code": "surrogate-character-reference", "line": 1, "col": 17 },
{ "code": "surrogate-character-reference", "line": 1, "col": 25 },
diff --git a/tokenizer/unicodeCharsProblematic.test b/tokenizer/unicodeCharsProblematic.test
index 346cad17..3ddb96c0 100644
--- a/tokenizer/unicodeCharsProblematic.test
+++ b/tokenizer/unicodeCharsProblematic.test
@@ -18,7 +18,7 @@
{"description": "Invalid Unicode character U+DFFF with valid preceding character",
"doubleEscaped":true,
"input": "a\\uDFFF",
-"output":[["Character", "a"], ["Character", "\\uDFFF"]],
+"output":[["Character", "a\\uDFFF"]],
"errors":[
{ "code": "surrogate-in-input-stream", "line": 1, "col": 2 }
]},
@@ -33,7 +33,7 @@
{"description":"CR followed by U+0000",
"input":"\r\u0000",
-"output":[["Character", "\n"], ["Character", "\u0000"]],
+"output":[["Character", "\n\u0000"]],
"errors":[
{ "code": "unexpected-null-character", "line": 2, "col": 1 }
]}
From 9314ef76ec48af7fe89aba23e754d47df6bb8a4b Mon Sep 17 00:00:00 2001
From: Ingvar Stepanyan
Date: Tue, 25 Jul 2017 22:36:23 +0100
Subject: [PATCH 04/68] Add a list of currently allowed initial states (#101)
Fixes #99
---
tokenizer/README.md | 13 +++++++++++--
1 file changed, 11 insertions(+), 2 deletions(-)
diff --git a/tokenizer/README.md b/tokenizer/README.md
index 50ba680f..66b81e8f 100644
--- a/tokenizer/README.md
+++ b/tokenizer/README.md
@@ -45,9 +45,18 @@ into the corresponding Unicode code point. (Note that this option also
affects the interpretation of `test.output`.)
`test.initialStates` is a list of strings, each being the name of a
-tokenizer state. The test should be run once for each string, using it
+tokenizer state which can be one of the following:
+
+- `Data state`
+- `PLAINTEXT state`
+- `RCDATA state`
+- `RAWTEXT state`
+- `Script data state`
+- `CDATA section state`
+
+ The test should be run once for each string, using it
to set the tokenizer's initial state for that run. If
-`test.initialStates` is omitted, it defaults to `["data state"]`.
+`test.initialStates` is omitted, it defaults to `["Data state"]`.
`test.lastStartTag` is a lowercase string that should be used as "the
tag name of the last start tag to have been emitted from this
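As an illustration of how `test.initialStates` is consumed, a runner loop might look roughly like the following sketch; the `tokenize` callable and its keyword arguments are placeholders, not part of this repository:

    def run_tokenizer_test(test, tokenize):
        # One run per requested initial state; "Data state" is the default.
        for state in test.get("initialStates", ["Data state"]):
            output = tokenize(
                test["input"],
                initial_state=state,
                last_start_tag=test.get("lastStartTag"),
            )
            assert output == test["output"], (state, output)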
From cbafeba94586a1ade00d55e600fc52da8f849986 Mon Sep 17 00:00:00 2001
From: Simon Pieters
Date: Tue, 22 Aug 2017 11:34:03 +0200
Subject: [PATCH 05/68] Test U+0000 in bogus comment and bogus doctype states
Follows https://github.com/whatwg/html/pull/2939
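Roughly, the behaviour the new bogus-comment cases exercise is that a U+0000 is reported as an unexpected-null-character error and appended to the comment data as U+FFFD. A sketch of that replacement, with `report_error` as a placeholder callback:

    def bogus_comment_data(raw, report_error):
        # U+0000 is flagged and replaced with U+FFFD; everything else passes through.
        out = []
        for ch in raw:
            if ch == "\u0000":
                report_error("unexpected-null-character")
                out.append("\uFFFD")
            else:
                out.append(ch)
        return "".join(out)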
---
tokenizer/test3.test | 161 +++++++++++++++++++++---
tokenizer/test4.test | 3 +-
tree-construction/plain-text-unsafe.dat | Bin 9291 -> 9388 bytes
3 files changed, 148 insertions(+), 16 deletions(-)
diff --git a/tokenizer/test3.test b/tokenizer/test3.test
index 85139d4d..cb04d037 100644
--- a/tokenizer/test3.test
+++ b/tokenizer/test3.test
@@ -141,7 +141,8 @@
"input":"$`G|yxwc7w^Qc~*hw&Bu73i2(q7H3_Z&
From be9fb2431d679e4e0c4a9db5f350cf0686a729b1 Mon Sep 17 00:00:00 2001
From: Henri Sivonen
Date: Tue, 23 Jan 2018 17:33:37 +0200
Subject: [PATCH 06/68] Move `#script-off` to the usual place relative to the
other sections of a test
---
tree-construction/tests18.dat | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tree-construction/tests18.dat b/tree-construction/tests18.dat
index 3ce39fc6..05363b39 100644
--- a/tree-construction/tests18.dat
+++ b/tree-construction/tests18.dat
@@ -51,11 +51,11 @@
#data
-#script-off
#errors
Line: 1 Col: 6 Unexpected start tag (head). Expected DOCTYPE.
+#script-off
#document
|
|
@@ -127,9 +127,9 @@ Line: 1 Col: 6 Unexpected start tag (head). Expected DOCTYPE.
#data
-#script-off
#errors
Line: 1 Col: 6 Unexpected start tag (head). Expected DOCTYPE.
+#script-off
#document
|
|
@@ -140,12 +140,12 @@ Line: 1 Col: 6 Unexpected start tag (head). Expected DOCTYPE.
#data
-#script-off
#errors
Line: 1 Col: 6 Unexpected start tag (head). Expected DOCTYPE.
Line: 1 Col: 21 Element br not allowed in a inhead-noscript context
Line: 1 Col: 21 Unexpected end tag (br). Treated as br element.
Line: 1 Col: 42 Unexpected end tag (noscript). Ignored.
+#script-off
#document
|
|
@@ -156,10 +156,10 @@ Line: 1 Col: 42 Unexpected end tag (noscript). Ignored.
#data
-#script-off
#errors
Line: 1 Col: 6 Unexpected start tag (head). Expected DOCTYPE.
Line: 1 Col: 34 Unexpected start tag (head).
+#script-off
#document
|
|
@@ -169,10 +169,10 @@ Line: 1 Col: 34 Unexpected start tag (head).
#data
-#script-off
#errors
Line: 1 Col: 6 Unexpected start tag (head). Expected DOCTYPE.
Line: 1 Col: 34 Unexpected start tag (noscript).
+#script-off
#document
|
|
@@ -182,10 +182,10 @@ Line: 1 Col: 34 Unexpected start tag (noscript).
#data
-#script-off
#errors
Line: 1 Col: 6 Unexpected start tag (head). Expected DOCTYPE.
Line: 1 Col: 20 Unexpected end tag (p). Ignored.
+#script-off
#document
|
|
@@ -195,11 +195,11 @@ Line: 1 Col: 20 Unexpected end tag (p). Ignored.
#data
-#script-off
#errors
Line: 1 Col: 6 Unexpected start tag (head). Expected DOCTYPE.
Line: 1 Col: 19 Element p not allowed in a inhead-noscript context
Line: 1 Col: 40 Unexpected end tag (noscript). Ignored.
+#script-off
#document
|
|
@@ -210,12 +210,12 @@ Line: 1 Col: 40 Unexpected end tag (noscript). Ignored.
#data
XXX
-#script-off
#errors
Line: 1 Col: 6 Unexpected start tag (head). Expected DOCTYPE.
Line: 1 Col: 19 Unexpected non-space character. Expected inhead-noscript content
Line: 1 Col: 30 Unexpected end tag (noscript). Ignored.
Line: 1 Col: 37 Unexpected end tag (head). Ignored.
+#script-off
#document
|
|
@@ -226,10 +226,10 @@ Line: 1 Col: 37 Unexpected end tag (head). Ignored.
#data
-#script-off
#errors
(1,6): expected-doctype-but-got-tag
(1,6): eof-in-head-noscript
+#script-off
#document
|
|
From 4fffa16ca4c5643cfd438729b3e8c13714721819 Mon Sep 17 00:00:00 2001
From: Henri Sivonen
Date: Thu, 4 Apr 2019 15:05:20 +0300
Subject: [PATCH 08/68] Add tests for line breaks in the comment end bang state
---
tokenizer/test3.test | 28 ++++++++++++++++++++++++++++
tree-construction/README.md | 10 +++++++---
tree-construction/comments01.dat | 28 ++++++++++++++++++++++++++++
3 files changed, 63 insertions(+), 3 deletions(-)
diff --git a/tokenizer/test3.test b/tokenizer/test3.test
index cb04d037..2fd93049 100644
--- a/tokenizer/test3.test
+++ b/tokenizer/test3.test
@@ -954,6 +954,34 @@
{ "code": "incorrectly-closed-comment", "line": 1, "col": 8 }
]},
+{"description":"BAZ
|
| "BAZ"
+#data
+FOO
+
+#data
+FOO
+
#data
FOO",
+ "output":[["Character", ""]]
+ },
+ {
+ "description":"Dash less-than in script HTML comment",
+ "initialStates":["Script data state"],
+ "input":"",
+ "output":[["Character", ""]]
+ },
+ {
+ "description":"Dash at end of script HTML comment",
+ "initialStates":["Script data state"],
+ "input":"",
+ "output":[["Character", ""]]
+ },
+ {
+ "description":" in script HTML comment",
+ "initialStates":["Script data state"],
+ "lastStartTag":"script",
+ "input":"",
+ "output":[["Character", ""], ["EndTag", "script"]]
+ },
+ {
+ "description":" in script HTML comment - double escaped",
+ "initialStates":["Script data state"],
+ "lastStartTag":"script",
+ "input":"",
+ "output":[["Character", ""], ["EndTag", "script"]]
+ },
+ {
+ "description":" in script HTML comment - double escaped with nested -->",
+ "output":[["Character", ""], ["EndTag", "script"]]
+ },
+ {
+ "description":" in script HTML comment - double escaped with abrupt end",
+ "initialStates":["Script data state"],
+ "lastStartTag":"script",
+ "input":" -->",
+ "output":[["Character", ""], ["EndTag", "script"], ["Character", " -->"], ["EndTag", "script"]]
+ },
+ {
+ "description":"Incomplete start tag in script HTML comment double escaped",
+ "initialStates":["Script data state"],
+ "lastStartTag":"script",
+ "input":"",
+ "output":[["Character", ""]]
+ },
+ {
+ "description":"Unclosed start tag in script HTML comment double escaped",
+ "initialStates":["Script data state"],
+ "lastStartTag":"script",
+ "input":"",
+ "output":[["Character", ""]]
+ },
+ {
+ "description":"Incomplete end tag in script HTML comment double escaped",
+ "initialStates":["Script data state"],
+ "lastStartTag":"script",
+ "input":"",
+ "output":[["Character", ""]]
+ },
+ {
+ "description":"Unclosed end tag in script HTML comment double escaped",
+ "initialStates":["Script data state"],
+ "lastStartTag":"script",
+ "input":"",
+ "output":[["Character", ""]]
+ },
{
"description":"leading U+FEFF must pass through",
+ "initialStates":["Data state", "RCDATA state", "RAWTEXT state", "Script data state"],
"doubleEscaped":true,
"input":"\\uFEFFfoo\\uFEFFbar",
"output":[["Character", "\\uFEFFfoo\\uFEFFbar"]]
},
{
- "description":"Non BMP-charref in in RCDATA",
+ "description":"Non BMP-charref in RCDATA",
"initialStates":["RCDATA state"],
"input":"≂̸",
"output":[["Character", "\u2242\u0338"]]
},
{
- "description":"Bad charref in in RCDATA",
+ "description":"Bad charref in RCDATA",
"initialStates":["RCDATA state"],
"input":"&NotEqualTild;",
"output":[["Character", "&NotEqualTild;"]],
@@ -134,36 +216,36 @@
]
},
{
- "description":"lowercase endtags in RCDATA and RAWTEXT",
- "initialStates":["RCDATA state", "RAWTEXT state"],
+ "description":"lowercase endtags",
+ "initialStates":["RCDATA state", "RAWTEXT state", "Script data state"],
"lastStartTag":"xmp",
"input":"",
"output":[["EndTag","xmp"]]
},
{
- "description":"bad endtag in RCDATA and RAWTEXT",
- "initialStates":["RCDATA state", "RAWTEXT state"],
+ "description":"bad endtag (space before name)",
+ "initialStates":["RCDATA state", "RAWTEXT state", "Script data state"],
"lastStartTag":"xmp",
"input":" XMP>",
"output":[["Character"," XMP>"]]
},
{
- "description":"bad endtag in RCDATA and RAWTEXT",
- "initialStates":["RCDATA state", "RAWTEXT state"],
+ "description":"bad endtag (not matching last start tag)",
+ "initialStates":["RCDATA state", "RAWTEXT state", "Script data state"],
"lastStartTag":"xmp",
"input":"",
"output":[["Character",""]]
},
{
- "description":"bad endtag in RCDATA and RAWTEXT",
- "initialStates":["RCDATA state", "RAWTEXT state"],
+ "description":"bad endtag (without close bracket)",
+ "initialStates":["RCDATA state", "RAWTEXT state", "Script data state"],
"lastStartTag":"xmp",
"input":"",
+ "initialStates":["CDATA section state"],
+ "output":[["Character", "foo "]]
+ },
+ {
+ "description":"CDATA followed by HTML content",
+ "input":"foo ]]> ",
+ "initialStates":["CDATA section state"],
+ "output":[["Character", "foo "]]
+ },
+ {
+ "description":"CDATA with extra bracket",
+ "input":"foo]]]>",
+ "initialStates":["CDATA section state"],
+ "output":[["Character", "foo]"]]
+ },
+ {
+ "description":"CDATA without end marker",
+ "input":"foo",
+ "initialStates":["CDATA section state"],
+ "output":[["Character", "foo"]],
+ "errors":[
+ { "code": "eof-in-cdata", "line": 1, "col": 4 }
+ ]
+ },
+ {
+ "description":"CDATA with single bracket ending",
+ "input":"foo]",
+ "initialStates":["CDATA section state"],
+ "output":[["Character", "foo]"]],
+ "errors":[
+ { "code": "eof-in-cdata", "line": 1, "col": 5 }
+ ]
+ },
+ {
+ "description":"CDATA with two brackets ending",
+ "input":"foo]]",
"initialStates":["CDATA section state"],
- "output":[["Character", "foo&bar"]],
+ "output":[["Character", "foo]]"]],
"errors":[
- { "code": "eof-in-cdata", "line": 1, "col": 8 }
+ { "code": "eof-in-cdata", "line": 1, "col": 6 }
]
}
diff --git a/tokenizer/entities.test b/tokenizer/entities.test
index 7c514563..a6469cd0 100644
--- a/tokenizer/entities.test
+++ b/tokenizer/entities.test
@@ -1,13 +1,47 @@
{"tests": [
-{"description": "Undefined named entity in attribute value ending in semicolon and whose name starts with a known entity name.",
+{"description": "Undefined named entity in a double-quoted attribute value ending in semicolon and whose name starts with a known entity name.",
+"input":"",
+"output": [["StartTag", "h", {"a": "¬i;"}]]},
+
+{"description": "Entity name requiring semicolon instead followed by the equals sign in a double-quoted attribute value.",
+"input":"",
+"output": [["StartTag", "h", {"a": "&lang="}]]},
+
+{"description": "Valid entity name followed by the equals sign in a double-quoted attribute value.",
+"input":"",
+"output": [["StartTag", "h", {"a": "¬="}]]},
+
+{"description": "Undefined named entity in a single-quoted attribute value ending in semicolon and whose name starts with a known entity name.",
"input":"",
"output": [["StartTag", "h", {"a": "¬i;"}]]},
-{"description": "Entity name followed by the equals sign in an attribute value.",
+{"description": "Entity name requiring semicolon instead followed by the equals sign in a single-quoted attribute value.",
"input":"",
"output": [["StartTag", "h", {"a": "&lang="}]]},
+{"description": "Valid entity name followed by the equals sign in a single-quoted attribute value.",
+"input":"",
+"output": [["StartTag", "h", {"a": "¬="}]]},
+
+{"description": "Undefined named entity in an unquoted attribute value ending in semicolon and whose name starts with a known entity name.",
+"input":"",
+"output": [["StartTag", "h", {"a": "¬i;"}]]},
+
+{"description": "Entity name requiring semicolon instead followed by the equals sign in an unquoted attribute value.",
+"input":"",
+"output": [["StartTag", "h", {"a": "&lang="}]],
+"errors":[
+ { "code": "unexpected-character-in-unquoted-attribute-value", "line": 1, "col": 11 }
+]},
+
+{"description": "Valid entity name followed by the equals sign in an unquoted attribute value.",
+"input":"",
+"output": [["StartTag", "h", {"a": "¬="}]],
+"errors":[
+ { "code": "unexpected-character-in-unquoted-attribute-value", "line": 1, "col": 10 }
+]},
+
{"description": "Ambiguous ampersand.",
"input":"&rrrraannddom;",
"output": [["Character", "&rrrraannddom;"]],
diff --git a/tokenizer/test1.test b/tokenizer/test1.test
index 8b85050f..cb0eb48a 100644
--- a/tokenizer/test1.test
+++ b/tokenizer/test1.test
@@ -102,6 +102,10 @@
"input":"",
"output":[["Comment", " --comment "]]},
+{"description":"Comment, central less-than bang",
+"input":"",
+"output":[["Comment", "",
"output":[["Comment", ""]],
@@ -135,6 +145,18 @@
"input":"",
"output":[["Comment", ""]]},
+{"description":"< in comment",
+"input":"",
+"output":[["Comment", " ",
+"output":[["Comment", " ",
+"output":[["Comment", " ",
"output":[["Comment", " ",
+"output":[["Comment", " <",
+"output":[["Character", ""]]},
+
+{"description":"",
+"output":[["Character", ""]]},
+
+{"description":"",
+"output":[["Character", ""]]},
+
+{"description":"Escaped script data",
+"initialStates":["Script data state"],
+"input":"",
+"output":[["Character", ""]]},
+
+{"description":"< in script HTML comment",
+"initialStates":["Script data state"],
+"input":"",
+"output":[["Character", ""]]},
+
+{"description":" in script HTML comment",
+"initialStates":["Script data state"],
+"input":"",
+"output":[["Character", ""]]},
+
+{"description":"Start tag in script HTML comment",
+"initialStates":["Script data state"],
+"input":"",
+"output":[["Character", ""]]},
+
+{"description":"End tag in script HTML comment",
+"initialStates":["Script data state"],
+"input":"",
+"output":[["Character", ""]]},
+
+{"description":"- in script HTML comment double escaped",
+"initialStates":["Script data state"],
+"input":"",
+"output":[["Character", ""]]},
+
+{"description":"-- in script HTML comment double escaped",
+"initialStates":["Script data state"],
+"input":"",
+"output":[["Character", ""]]},
+
+{"description":"--- in script HTML comment double escaped",
+"initialStates":["Script data state"],
+"input":"",
+"output":[["Character", ""]]},
+
+{"description":"- spaced in script HTML comment double escaped",
+"initialStates":["Script data state"],
+"input":"",
+"output":[["Character", ""]]},
+
+{"description":"-- spaced in script HTML comment double escaped",
+"initialStates":["Script data state"],
+"input":"",
+"output":[["Character", ""]]},
+
{"description":"Ampersand EOF",
"input":"&",
"output":[["Character", "&"]]},
diff --git a/tokenizer/test2.test b/tokenizer/test2.test
index 521694ca..f80f27d1 100644
--- a/tokenizer/test2.test
+++ b/tokenizer/test2.test
@@ -50,6 +50,10 @@
"input":"",
"output":[["DOCTYPE", "html", null, "-//W3C//DTD HTML Transitional 4.01//EN", true]]},
+{"description":"DOCTYPE with single-quoted systemId",
+"input":"",
+"output":[["DOCTYPE", "html", null, "-//W3C//DTD HTML Transitional 4.01//EN", true]]},
+
{"description":"DOCTYPE with publicId and systemId",
"input":"",
"output":[["DOCTYPE", "html", "-//W3C//DTD HTML Transitional 4.01//EN", "-//W3C//DTD HTML Transitional 4.01//EN", true]]},
diff --git a/tokenizer/test3.test b/tokenizer/test3.test
index d1b323a5..814482c4 100644
--- a/tokenizer/test3.test
+++ b/tokenizer/test3.test
@@ -1,84 +1,451 @@
{"tests": [
{"description":"[empty]",
+"initialStates":["Data state", "PLAINTEXT state", "RCDATA state", "RAWTEXT state", "Script data state"],
"input":"",
"output":[]},
+{"description":"[empty]",
+"initialStates":["CDATA section state"],
+"input":"",
+"output":[],
+"errors":[
+ { "code": "eof-in-cdata", "line": 1, "col": 2 }
+]},
+
{"description":"\\u0009",
+"initialStates":["Data state", "PLAINTEXT state", "RCDATA state", "RAWTEXT state", "Script data state"],
"input":"\u0009",
"output":[["Character", "\u0009"]]},
+{"description":"\\u0009",
+"initialStates":["CDATA section state"],
+"input":"\u0009",
+"output":[["Character", "\u0009"]],
+"errors":[
+ { "code": "eof-in-cdata", "line": 1, "col": 2 }
+]},
+
{"description":"\\u000A",
+"initialStates":["Data state", "PLAINTEXT state", "RCDATA state", "RAWTEXT state", "Script data state"],
"input":"\u000A",
"output":[["Character", "\u000A"]]},
+{"description":"\\u000A",
+"initialStates":["CDATA section state"],
+"input":"\u000A",
+"output":[["Character", "\u000A"]],
+"errors":[
+ { "code": "eof-in-cdata", "line": 1, "col": 2 }
+]},
+
{"description":"\\u000B",
+"initialStates":["Data state", "PLAINTEXT state", "RCDATA state", "RAWTEXT state", "Script data state"],
"input":"\u000B",
"output":[["Character", "\u000B"]],
"errors":[
{ "code": "control-character-in-input-stream", "line": 1, "col": 1 }
]},
+{"description":"\\u000B",
+"initialStates":["CDATA section state"],
+"input":"\u000B",
+"output":[["Character", "\u000B"]],
+"errors":[
+ { "code": "control-character-in-input-stream", "line": 1, "col": 1 },
+ { "code": "eof-in-cdata", "line": 1, "col": 2 }
+]},
+
{"description":"\\u000C",
+"initialStates":["Data state", "PLAINTEXT state", "RCDATA state", "RAWTEXT state", "Script data state"],
"input":"\u000C",
"output":[["Character", "\u000C"]]},
+{"description":"\\u000C",
+"initialStates":["CDATA section state"],
+"input":"\u000C",
+"output":[["Character", "\u000C"]],
+"errors":[
+ { "code": "eof-in-cdata", "line": 1, "col": 2 }
+]},
+
{"description":" ",
+"initialStates":["Data state", "PLAINTEXT state", "RCDATA state", "RAWTEXT state", "Script data state"],
"input":" ",
"output":[["Character", " "]]},
+{"description":" ",
+"initialStates":["CDATA section state"],
+"input":" ",
+"output":[["Character", " "]],
+"errors":[
+ { "code": "eof-in-cdata", "line": 1, "col": 2 }
+]},
+
{"description":"!",
+"initialStates":["Data state", "PLAINTEXT state", "RCDATA state", "RAWTEXT state", "Script data state"],
"input":"!",
"output":[["Character", "!"]]},
+{"description":"!",
+"initialStates":["CDATA section state"],
+"input":"!",
+"output":[["Character", "!"]],
+"errors":[
+ { "code": "eof-in-cdata", "line": 1, "col": 2 }
+]},
+
{"description":"\"",
+"initialStates":["Data state", "PLAINTEXT state", "RCDATA state", "RAWTEXT state", "Script data state"],
"input":"\"",
"output":[["Character", "\""]]},
+{"description":"\"",
+"initialStates":["CDATA section state"],
+"input":"\"",
+"output":[["Character", "\""]],
+"errors":[
+ { "code": "eof-in-cdata", "line": 1, "col": 2 }
+]},
+
{"description":"%",
+"initialStates":["Data state", "PLAINTEXT state", "RCDATA state", "RAWTEXT state", "Script data state"],
"input":"%",
"output":[["Character", "%"]]},
+{"description":"%",
+"initialStates":["CDATA section state"],
+"input":"%",
+"output":[["Character", "%"]],
+"errors":[
+ { "code": "eof-in-cdata", "line": 1, "col": 2 }
+]},
+
{"description":"&",
+"initialStates":["Data state", "PLAINTEXT state", "RCDATA state", "RAWTEXT state", "Script data state"],
"input":"&",
"output":[["Character", "&"]]},
+{"description":"&",
+"initialStates":["CDATA section state"],
+"input":"&",
+"output":[["Character", "&"]],
+"errors":[
+ { "code": "eof-in-cdata", "line": 1, "col": 2 }
+]},
+
{"description":"'",
+"initialStates":["Data state", "PLAINTEXT state", "RCDATA state", "RAWTEXT state", "Script data state"],
"input":"'",
"output":[["Character", "'"]]},
+{"description":"'",
+"initialStates":["CDATA section state"],
+"input":"'",
+"output":[["Character", "'"]],
+"errors":[
+ { "code": "eof-in-cdata", "line": 1, "col": 2 }
+]},
+
{"description":",",
+"initialStates":["Data state", "PLAINTEXT state", "RCDATA state", "RAWTEXT state", "Script data state"],
"input":",",
"output":[["Character", ","]]},
+{"description":",",
+"initialStates":["CDATA section state"],
+"input":",",
+"output":[["Character", ","]],
+"errors":[
+ { "code": "eof-in-cdata", "line": 1, "col": 2 }
+]},
+
{"description":"-",
+"initialStates":["Data state", "PLAINTEXT state", "RCDATA state", "RAWTEXT state", "Script data state"],
"input":"-",
"output":[["Character", "-"]]},
+{"description":"-",
+"initialStates":["CDATA section state"],
+"input":"-",
+"output":[["Character", "-"]],
+"errors":[
+ { "code": "eof-in-cdata", "line": 1, "col": 2 }
+]},
+
{"description":".",
+"initialStates":["Data state", "PLAINTEXT state", "RCDATA state", "RAWTEXT state", "Script data state"],
"input":".",
"output":[["Character", "."]]},
+{"description":".",
+"initialStates":["CDATA section state"],
+"input":".",
+"output":[["Character", "."]],
+"errors":[
+ { "code": "eof-in-cdata", "line": 1, "col": 2 }
+]},
+
{"description":"/",
+"initialStates":["Data state", "PLAINTEXT state", "RCDATA state", "RAWTEXT state", "Script data state"],
"input":"/",
"output":[["Character", "/"]]},
+{"description":"/",
+"initialStates":["CDATA section state"],
+"input":"/",
+"output":[["Character", "/"]],
+"errors":[
+ { "code": "eof-in-cdata", "line": 1, "col": 2 }
+]},
+
{"description":"0",
+"initialStates":["Data state", "PLAINTEXT state", "RCDATA state", "RAWTEXT state", "Script data state"],
"input":"0",
"output":[["Character", "0"]]},
+{"description":"0",
+"initialStates":["CDATA section state"],
+"input":"0",
+"output":[["Character", "0"]],
+"errors":[
+ { "code": "eof-in-cdata", "line": 1, "col": 2 }
+]},
+
{"description":"1",
+"initialStates":["Data state", "PLAINTEXT state", "RCDATA state", "RAWTEXT state", "Script data state"],
"input":"1",
"output":[["Character", "1"]]},
+{"description":"1",
+"initialStates":["CDATA section state"],
+"input":"1",
+"output":[["Character", "1"]],
+"errors":[
+ { "code": "eof-in-cdata", "line": 1, "col": 2 }
+]},
+
{"description":"9",
+"initialStates":["Data state", "PLAINTEXT state", "RCDATA state", "RAWTEXT state", "Script data state"],
"input":"9",
"output":[["Character", "9"]]},
+{"description":"9",
+"initialStates":["CDATA section state"],
+"input":"9",
+"output":[["Character", "9"]],
+"errors":[
+ { "code": "eof-in-cdata", "line": 1, "col": 2 }
+]},
+
{"description":";",
+"initialStates":["Data state", "PLAINTEXT state", "RCDATA state", "RAWTEXT state", "Script data state"],
"input":";",
"output":[["Character", ";"]]},
+{"description":";",
+"initialStates":["CDATA section state"],
+"input":";",
+"output":[["Character", ";"]],
+"errors":[
+ { "code": "eof-in-cdata", "line": 1, "col": 2 }
+]},
+
+{"description":";=",
+"initialStates":["Data state", "PLAINTEXT state", "RCDATA state", "RAWTEXT state", "Script data state"],
+"input":";=",
+"output":[["Character", ";="]]},
+
+{"description":";=",
+"initialStates":["CDATA section state"],
+"input":";=",
+"output":[["Character", ";="]],
+"errors":[
+ { "code": "eof-in-cdata", "line": 1, "col": 3 }
+]},
+
+{"description":";>",
+"initialStates":["Data state", "PLAINTEXT state", "RCDATA state", "RAWTEXT state", "Script data state"],
+"input":";>",
+"output":[["Character", ";>"]]},
+
+{"description":";>",
+"initialStates":["CDATA section state"],
+"input":";>",
+"output":[["Character", ";>"]],
+"errors":[
+ { "code": "eof-in-cdata", "line": 1, "col": 3 }
+]},
+
+{"description":";?",
+"initialStates":["Data state", "PLAINTEXT state", "RCDATA state", "RAWTEXT state", "Script data state"],
+"input":";?",
+"output":[["Character", ";?"]]},
+
+{"description":";?",
+"initialStates":["CDATA section state"],
+"input":";?",
+"output":[["Character", ";?"]],
+"errors":[
+ { "code": "eof-in-cdata", "line": 1, "col": 3 }
+]},
+
+{"description":";@",
+"initialStates":["Data state", "PLAINTEXT state", "RCDATA state", "RAWTEXT state", "Script data state"],
+"input":";@",
+"output":[["Character", ";@"]]},
+
+{"description":";@",
+"initialStates":["CDATA section state"],
+"input":";@",
+"output":[["Character", ";@"]],
+"errors":[
+ { "code": "eof-in-cdata", "line": 1, "col": 3 }
+]},
+
+{"description":";A",
+"initialStates":["Data state", "PLAINTEXT state", "RCDATA state", "RAWTEXT state", "Script data state"],
+"input":";A",
+"output":[["Character", ";A"]]},
+
+{"description":";A",
+"initialStates":["CDATA section state"],
+"input":";A",
+"output":[["Character", ";A"]],
+"errors":[
+ { "code": "eof-in-cdata", "line": 1, "col": 3 }
+]},
+
+{"description":";B",
+"initialStates":["Data state", "PLAINTEXT state", "RCDATA state", "RAWTEXT state", "Script data state"],
+"input":";B",
+"output":[["Character", ";B"]]},
+
+{"description":";B",
+"initialStates":["CDATA section state"],
+"input":";B",
+"output":[["Character", ";B"]],
+"errors":[
+ { "code": "eof-in-cdata", "line": 1, "col": 3 }
+]},
+
+{"description":";Y",
+"initialStates":["Data state", "PLAINTEXT state", "RCDATA state", "RAWTEXT state", "Script data state"],
+"input":";Y",
+"output":[["Character", ";Y"]]},
+
+{"description":";Y",
+"initialStates":["CDATA section state"],
+"input":";Y",
+"output":[["Character", ";Y"]],
+"errors":[
+ { "code": "eof-in-cdata", "line": 1, "col": 3 }
+]},
+
+{"description":";Z",
+"initialStates":["Data state", "PLAINTEXT state", "RCDATA state", "RAWTEXT state", "Script data state"],
+"input":";Z",
+"output":[["Character", ";Z"]]},
+
+{"description":";Z",
+"initialStates":["CDATA section state"],
+"input":";Z",
+"output":[["Character", ";Z"]],
+"errors":[
+ { "code": "eof-in-cdata", "line": 1, "col": 3 }
+]},
+
+{"description":";`",
+"initialStates":["Data state", "PLAINTEXT state", "RCDATA state", "RAWTEXT state", "Script data state"],
+"input":";`",
+"output":[["Character", ";`"]]},
+
+{"description":";`",
+"initialStates":["CDATA section state"],
+"input":";`",
+"output":[["Character", ";`"]],
+"errors":[
+ { "code": "eof-in-cdata", "line": 1, "col": 3 }
+]},
+
+{"description":";a",
+"initialStates":["Data state", "PLAINTEXT state", "RCDATA state", "RAWTEXT state", "Script data state"],
+"input":";a",
+"output":[["Character", ";a"]]},
+
+{"description":";a",
+"initialStates":["CDATA section state"],
+"input":";a",
+"output":[["Character", ";a"]],
+"errors":[
+ { "code": "eof-in-cdata", "line": 1, "col": 3 }
+]},
+
+{"description":";b",
+"initialStates":["Data state", "PLAINTEXT state", "RCDATA state", "RAWTEXT state", "Script data state"],
+"input":";b",
+"output":[["Character", ";b"]]},
+
+{"description":";b",
+"initialStates":["CDATA section state"],
+"input":";b",
+"output":[["Character", ";b"]],
+"errors":[
+ { "code": "eof-in-cdata", "line": 1, "col": 3 }
+]},
+
+{"description":";y",
+"initialStates":["Data state", "PLAINTEXT state", "RCDATA state", "RAWTEXT state", "Script data state"],
+"input":";y",
+"output":[["Character", ";y"]]},
+
+{"description":";y",
+"initialStates":["CDATA section state"],
+"input":";y",
+"output":[["Character", ";y"]],
+"errors":[
+ { "code": "eof-in-cdata", "line": 1, "col": 3 }
+]},
+
+{"description":";z",
+"initialStates":["Data state", "PLAINTEXT state", "RCDATA state", "RAWTEXT state", "Script data state"],
+"input":";z",
+"output":[["Character", ";z"]]},
+
+{"description":";z",
+"initialStates":["CDATA section state"],
+"input":";z",
+"output":[["Character", ";z"]],
+"errors":[
+ { "code": "eof-in-cdata", "line": 1, "col": 3 }
+]},
+
+{"description":";{",
+"initialStates":["Data state", "PLAINTEXT state", "RCDATA state", "RAWTEXT state", "Script data state"],
+"input":";{",
+"output":[["Character", ";{"]]},
+
+{"description":";{",
+"initialStates":["CDATA section state"],
+"input":";{",
+"output":[["Character", ";{"]],
+"errors":[
+ { "code": "eof-in-cdata", "line": 1, "col": 3 }
+]},
+
+{"description":";\\uDBC0\\uDC00",
+"initialStates":["Data state", "PLAINTEXT state", "RCDATA state", "RAWTEXT state", "Script data state"],
+"input":";\uDBC0\uDC00",
+"output":[["Character", ";\uDBC0\uDC00"]]},
+
+{"description":";\\uDBC0\\uDC00",
+"initialStates":["CDATA section state"],
+"input":";\uDBC0\uDC00",
+"output":[["Character", ";\uDBC0\uDC00"]],
+"errors":[
+ { "code": "eof-in-cdata", "line": 1, "col": 3 }
+]},
+
{"description":"<",
"input":"<",
"output":[["Character", "<"]],
@@ -10669,63 +11036,198 @@
]},
{"description":"=",
+"initialStates":["Data state", "PLAINTEXT state", "RCDATA state", "RAWTEXT state", "Script data state"],
"input":"=",
"output":[["Character", "="]]},
+{"description":"=",
+"initialStates":["CDATA section state"],
+"input":"=",
+"output":[["Character", "="]],
+"errors":[
+ { "code": "eof-in-cdata", "line": 1, "col": 2 }
+]},
+
{"description":">",
+"initialStates":["Data state", "PLAINTEXT state", "RCDATA state", "RAWTEXT state", "Script data state"],
"input":">",
"output":[["Character", ">"]]},
+{"description":">",
+"initialStates":["CDATA section state"],
+"input":">",
+"output":[["Character", ">"]],
+"errors":[
+ { "code": "eof-in-cdata", "line": 1, "col": 2 }
+]},
+
{"description":"?",
+"initialStates":["Data state", "PLAINTEXT state", "RCDATA state", "RAWTEXT state", "Script data state"],
"input":"?",
"output":[["Character", "?"]]},
+{"description":"?",
+"initialStates":["CDATA section state"],
+"input":"?",
+"output":[["Character", "?"]],
+"errors":[
+ { "code": "eof-in-cdata", "line": 1, "col": 2 }
+]},
+
{"description":"@",
+"initialStates":["Data state", "PLAINTEXT state", "RCDATA state", "RAWTEXT state", "Script data state"],
"input":"@",
"output":[["Character", "@"]]},
+{"description":"@",
+"initialStates":["CDATA section state"],
+"input":"@",
+"output":[["Character", "@"]],
+"errors":[
+ { "code": "eof-in-cdata", "line": 1, "col": 2 }
+]},
+
{"description":"A",
+"initialStates":["Data state", "PLAINTEXT state", "RCDATA state", "RAWTEXT state", "Script data state"],
"input":"A",
"output":[["Character", "A"]]},
+{"description":"A",
+"initialStates":["CDATA section state"],
+"input":"A",
+"output":[["Character", "A"]],
+"errors":[
+ { "code": "eof-in-cdata", "line": 1, "col": 2 }
+]},
+
{"description":"B",
+"initialStates":["Data state", "PLAINTEXT state", "RCDATA state", "RAWTEXT state", "Script data state"],
"input":"B",
"output":[["Character", "B"]]},
+{"description":"B",
+"initialStates":["CDATA section state"],
+"input":"B",
+"output":[["Character", "B"]],
+"errors":[
+ { "code": "eof-in-cdata", "line": 1, "col": 2 }
+]},
+
{"description":"Y",
+"initialStates":["Data state", "PLAINTEXT state", "RCDATA state", "RAWTEXT state", "Script data state"],
"input":"Y",
"output":[["Character", "Y"]]},
+{"description":"Y",
+"initialStates":["CDATA section state"],
+"input":"Y",
+"output":[["Character", "Y"]],
+"errors":[
+ { "code": "eof-in-cdata", "line": 1, "col": 2 }
+]},
+
{"description":"Z",
+"initialStates":["Data state", "PLAINTEXT state", "RCDATA state", "RAWTEXT state", "Script data state"],
"input":"Z",
"output":[["Character", "Z"]]},
+{"description":"Z",
+"initialStates":["CDATA section state"],
+"input":"Z",
+"output":[["Character", "Z"]],
+"errors":[
+ { "code": "eof-in-cdata", "line": 1, "col": 2 }
+]},
+
{"description":"`",
+"initialStates":["Data state", "PLAINTEXT state", "RCDATA state", "RAWTEXT state", "Script data state"],
"input":"`",
"output":[["Character", "`"]]},
+{"description":"`",
+"initialStates":["CDATA section state"],
+"input":"`",
+"output":[["Character", "`"]],
+"errors":[
+ { "code": "eof-in-cdata", "line": 1, "col": 2 }
+]},
+
{"description":"a",
+"initialStates":["Data state", "PLAINTEXT state", "RCDATA state", "RAWTEXT state", "Script data state"],
"input":"a",
"output":[["Character", "a"]]},
+{"description":"a",
+"initialStates":["CDATA section state"],
+"input":"a",
+"output":[["Character", "a"]],
+"errors":[
+ { "code": "eof-in-cdata", "line": 1, "col": 2 }
+]},
+
{"description":"b",
+"initialStates":["Data state", "PLAINTEXT state", "RCDATA state", "RAWTEXT state", "Script data state"],
"input":"b",
"output":[["Character", "b"]]},
+{"description":"b",
+"initialStates":["CDATA section state"],
+"input":"b",
+"output":[["Character", "b"]],
+"errors":[
+ { "code": "eof-in-cdata", "line": 1, "col": 2 }
+]},
+
{"description":"y",
+"initialStates":["Data state", "PLAINTEXT state", "RCDATA state", "RAWTEXT state", "Script data state"],
"input":"y",
"output":[["Character", "y"]]},
+{"description":"y",
+"initialStates":["CDATA section state"],
+"input":"y",
+"output":[["Character", "y"]],
+"errors":[
+ { "code": "eof-in-cdata", "line": 1, "col": 2 }
+]},
+
{"description":"z",
+"initialStates":["Data state", "PLAINTEXT state", "RCDATA state", "RAWTEXT state", "Script data state"],
"input":"z",
"output":[["Character", "z"]]},
+{"description":"z",
+"initialStates":["CDATA section state"],
+"input":"z",
+"output":[["Character", "z"]],
+"errors":[
+ { "code": "eof-in-cdata", "line": 1, "col": 2 }
+]},
+
{"description":"{",
+"initialStates":["Data state", "PLAINTEXT state", "RCDATA state", "RAWTEXT state", "Script data state"],
"input":"{",
"output":[["Character", "{"]]},
+{"description":"{",
+"initialStates":["CDATA section state"],
+"input":"{",
+"output":[["Character", "{"]],
+"errors":[
+ { "code": "eof-in-cdata", "line": 1, "col": 2 }
+]},
+
{"description":"\\uDBC0\\uDC00",
+"initialStates":["Data state", "PLAINTEXT state", "RCDATA state", "RAWTEXT state", "Script data state"],
"input":"\uDBC0\uDC00",
-"output":[["Character", "\uDBC0\uDC00"]]}
+"output":[["Character", "\uDBC0\uDC00"]]},
+
+{"description":"\\uDBC0\\uDC00",
+"initialStates":["CDATA section state"],
+"input":"\uDBC0\uDC00",
+"output":[["Character", "\uDBC0\uDC00"]],
+"errors":[
+ { "code": "eof-in-cdata", "line": 1, "col": 2 }
+]}
]}
From 71eebd59772d1d39aced0c0582ae9c09acf3ce6e Mon Sep 17 00:00:00 2001
From: Sam Sneddon
Date: Tue, 26 May 2020 23:28:15 +0100
Subject: [PATCH 12/68] Add a test for order of comments after
Notably, html5lib-python's lxml treebuilder gets this wrong:
https://github.com/html5lib/html5lib-python/issues/488
---
tree-construction/webkit01.dat | 14 ++++++++++++++
1 file changed, 14 insertions(+)
diff --git a/tree-construction/webkit01.dat b/tree-construction/webkit01.dat
index b5fafdc7..2127cfe1 100644
--- a/tree-construction/webkit01.dat
+++ b/tree-construction/webkit01.dat
@@ -307,6 +307,20 @@ console.log("FOOBARBAZ");
|
|
+#data
+
+#errors
+(1,6): expected-doctype-but-got-start-tag
+#document
+|
+|
+|
+|
+|
+|
+|
+|
+
#data
x
#errors
From bef9ad1e6ffe8ed6a084be7fbc3ba521eab70844 Mon Sep 17 00:00:00 2001
From: "Michael[tm] Smith"
Date: Thu, 13 Aug 2020 11:23:40 +0900
Subject: [PATCH 13/68] Test SVG fragment parsing w/ td/tr/tbody context
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
This change adds a new tree-construction/svg.dat file that’s essentially an
analogue of the existing tree-construction/math.dat file. It contains tests
for fragment parsing of SVG content with td, tr, and tbody/thead/tfoot
context elements.
---
tree-construction/svg.dat | 81 +++++++++++++++++++++++++++++++++++++++
1 file changed, 81 insertions(+)
create mode 100644 tree-construction/svg.dat
diff --git a/tree-construction/svg.dat b/tree-construction/svg.dat
new file mode 100644
index 00000000..8e9a2bbb
--- /dev/null
+++ b/tree-construction/svg.dat
@@ -0,0 +1,81 @@
+#data
+