Improve CDATA and comment parse performance (#246) · ruby/rexml@a5f31c4 · GitHub

Commit a5f31c4

naitoh and kou authored
Improve CDATA and comment parse performance (#246)
## Why?

Since `<a><!a` and `<a><!a>` are malformed nodes, they do not need to be checked before comments and CDATA.

## Benchmark : comment (after_doctype)

```
$ benchmark-driver benchmark/parse_comment.yaml
Calculating -------------------------------------
                         before       after  before(YJIT)  after(YJIT)
       after_doctype     1.306k      5.586k        1.152k       3.569k i/s -     100.000 times in 0.076563s 0.017903s 0.086822s 0.028020s

Comparison:
                    after_doctype
               after:      5585.7 i/s
         after(YJIT):      3568.9 i/s - 1.57x  slower
              before:      1306.1 i/s - 4.28x  slower
        before(YJIT):      1151.8 i/s - 4.85x  slower
```

- YJIT=ON : 3.09x faster
- YJIT=OFF : 4.28x faster

## Benchmark : CDATA

```
$ benchmark-driver benchmark/parse_cdata.yaml
Calculating -------------------------------------
                         before       after  before(YJIT)  after(YJIT)
                 dom     1.269k      5.548k        1.053k       3.072k i/s -     100.000 times in 0.078808s 0.018026s 0.094976s 0.032553s
                 sax     1.399k      8.244k        1.220k       4.460k i/s -     100.000 times in 0.071458s 0.012130s 0.081958s 0.022422s
                pull     1.411k      8.319k        1.260k       4.806k i/s -     100.000 times in 0.070883s 0.012021s 0.079335s 0.020809s
              stream     1.420k      8.320k        1.254k       4.728k i/s -     100.000 times in 0.070406s 0.012019s 0.079738s 0.021149s

Comparison:
                              dom
               after:      5547.5 i/s
         after(YJIT):      3071.9 i/s - 1.81x  slower
              before:      1268.9 i/s - 4.37x  slower
        before(YJIT):      1052.9 i/s - 5.27x  slower

                              sax
               after:      8244.0 i/s
         after(YJIT):      4459.9 i/s - 1.85x  slower
              before:      1399.4 i/s - 5.89x  slower
        before(YJIT):      1220.1 i/s - 6.76x  slower

                             pull
               after:      8318.8 i/s
         after(YJIT):      4805.6 i/s - 1.73x  slower
              before:      1410.8 i/s - 5.90x  slower
        before(YJIT):      1260.5 i/s - 6.60x  slower

                           stream
               after:      8320.2 i/s
         after(YJIT):      4728.4 i/s - 1.76x  slower
              before:      1420.3 i/s - 5.86x  slower
        before(YJIT):      1254.1 i/s - 6.63x  slower
```

- YJIT=ON : 2.91x - 3.80x faster
- YJIT=OFF : 4.37x - 5.90x faster

Co-authored-by: Sutou Kouhei <kou@clear-code.com>
1 parent 4349091 commit a5f31c4

2 files changed: +15 -4 lines changed

lib/rexml/parsers/baseparser.rb

Lines changed: 2 additions & 4 deletions
@@ -449,9 +449,7 @@ def pull_event
   end
   return [ :end_element, last_tag ]
 elsif @source.match?("!", true)
-  md = @source.match(/([^>]*>)/um)
   #STDERR.puts "SOURCE BUFFER = #{source.buffer}, #{source.buffer.size}"
-  raise REXML::ParseException.new("Malformed node", @source) unless md
   if @source.match?("--", true)
     return [ :comment, process_comment ]
   elsif @source.match?("[CDATA[", true)
@@ -461,9 +459,9 @@ def pull_event
     else
       raise REXML::ParseException.new("Malformed CDATA: Missing end ']]>'", @source)
     end
+  else
+    raise REXML::ParseException.new("Malformed node: Started with '<!' but not a comment nor CDATA", @source)
   end
-  raise REXML::ParseException.new( "Declarations can only occur "+
-    "in the doctype declaration.", @source)
 elsif @source.match?("?", true)
   return process_instruction
 else
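
The control flow of this hunk can be illustrated outside REXML. The following is a minimal sketch, not the library's code: `classify_bang_markup` is a hypothetical helper built on the standard `strscan` library, and it raises `ArgumentError` where REXML raises `REXML::ParseException`. It shows the idea behind the change: test the two legal prefixes (`--` and `[CDATA[`) directly and treat everything else as malformed, without first scanning ahead to the next `>`.

```ruby
require "strscan"

# Hypothetical helper (not part of REXML): classifies the markup that
# follows "<!" inside element content by checking the legal prefixes
# directly, with no preliminary scan for a closing ">".
def classify_bang_markup(input)
  scanner = StringScanner.new(input)
  raise ArgumentError, "expected input to start with '<!'" unless scanner.skip(/<!/)

  if scanner.skip(/--/)
    # Comment: read up to and including the terminator, then drop it.
    body = scanner.scan_until(/-->/)
    raise ArgumentError, "Unclosed comment" unless body
    [:comment, body.delete_suffix("-->")]
  elsif scanner.skip(/\[CDATA\[/)
    # CDATA section: same approach with the "]]>" terminator.
    body = scanner.scan_until(/\]\]>/)
    raise ArgumentError, "Malformed CDATA: Missing end ']]>'" unless body
    [:cdata, body.delete_suffix("]]>")]
  else
    # Neither prefix matched, so the node is malformed.
    raise ArgumentError, "Malformed node: Started with '<!' but not a comment nor CDATA"
  end
end

p classify_bang_markup("<!-- hi -->")
p classify_bang_markup("<![CDATA[x > y]]>")
```

Because the malformed case falls out of the final `else`, well-formed comments and CDATA sections never pay for the extra lookahead, which is presumably where the benchmark gains in the commit message come from.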

test/parse/test_comment.rb

Lines changed: 13 additions & 0 deletions
@@ -84,6 +84,19 @@ def test_doctype_malformed_comment_end
   DETAIL
 end
 
+def test_after_doctype_malformed_node
+  exception = assert_raise(REXML::ParseException) do
+    parse("<a><!a")
+  end
+  assert_equal(<<~DETAIL.chomp, exception.to_s)
+    Malformed node: Started with '<!' but not a comment nor CDATA
+    Line: 1
+    Position: 6
+    Last 80 unconsumed characters:
+    a
+  DETAIL
+end
+
 def test_after_doctype_unclosed_comment
   exception = assert_raise(REXML::ParseException) do
     parse("<a><!-->")
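
The behavior covered by the new test can also be checked by hand. The snippet below is a small sketch that assumes the `rexml` gem is installed; `parse_error_summary` is a hypothetical helper, not part of REXML. The exact wording of the message depends on the installed rexml version, so the helper only reports the first line of whatever `REXML::ParseException` is raised.

```ruby
require "rexml/document"

# Hypothetical helper: returns the first line of the ParseException
# message, or nil when the document parses without error.
def parse_error_summary(xml)
  REXML::Document.new(xml)
  nil
rescue REXML::ParseException => e
  e.to_s.lines.first&.chomp
end

# Malformed: "<!" followed by neither "--" nor "[CDATA[".
puts parse_error_summary("<a><!a").inspect

# Well-formed comment and CDATA section inside an element.
puts parse_error_summary("<a><!-- ok --></a>").inspect
puts parse_error_summary("<a><![CDATA[x < y]]></a>").inspect
```

On a rexml release that includes this commit, the first call should report the new "Malformed node: Started with '<!' but not a comment nor CDATA" message, while the other two inputs parse cleanly.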
