Implement differential fuzzer for pandoc by notriddle · Pull Request #673 · pulldown-cmark/pulldown-cmark · GitHub
Conversation

notriddle
Collaborator

No description provided.

@Martin1887
Collaborator

Thanks for your contribution.

The goal of the project is to support CommonMark and GitHub Flavored Markdown; targeting Pandoc is far outside that scope. May this fuzzer provide help to catch errors in CommonMark+GFM? The only case I can think of is when both pulldown-cmark and commonmark.js are wrong and Pandoc gets it right.

On the other hand, this code is independent of the final binary and only a dev tool. What do you think, @raphlinus?

@notriddle
Collaborator Author

> May this fuzzer provide help to catch errors in CommonMark+GFM?

That's what I'm thinking, yeah. Pandoc lets you select your extensions, such as commonmark+footnotes or commonmark+task_lists.
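As a rough illustration of selecting extensions this way, the sketch below builds a format string such as `commonmark+footnotes` and prepares a pandoc invocation with it. The helper names `pandoc_format` and `pandoc_command` are hypothetical, and the choice of JSON output is an assumption; how the fuzzer in this PR actually invokes pandoc is not shown in this conversation.

```rust
use std::process::Command;

/// Hypothetical helper: join a base format and extension names with `+`,
/// producing strings like `commonmark+footnotes+task_lists`.
fn pandoc_format(base: &str, extensions: &[&str]) -> String {
    let mut format = String::from(base);
    for ext in extensions {
        format.push('+');
        format.push_str(ext);
    }
    format
}

/// Hypothetical helper: prepare (but do not run) a pandoc command that would
/// read Markdown and emit pandoc's JSON AST. JSON output is an assumption here.
fn pandoc_command(extensions: &[&str]) -> Command {
    let mut cmd = Command::new("pandoc");
    cmd.arg("--from")
        .arg(pandoc_format("commonmark", extensions))
        .arg("--to")
        .arg("json");
    cmd
}

fn main() {
    // Only prints the prepared command; pandoc does not need to be installed.
    let cmd = pandoc_command(&["footnotes", "task_lists"]);
    println!("{:?}", cmd);
}
```

Keeping the extension list as data would let a fuzz target enable only the extensions that have a pulldown-cmark counterpart.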

fuzz/src/lib.rs Outdated
Ok(events)
}

pub fn normalize_pandoc(events: Vec<Event<'_>>) -> Vec<Event<'_>> {
Collaborator

I think this is cool! I would rename normalize below to normalize_commonmarkjs or similar.

@notriddle notriddle force-pushed the notriddle/fuzz-pandoc branch 2 times, most recently from 26d9ebb to c459690 Compare June 27, 2023 18:34
@notriddle notriddle force-pushed the notriddle/fuzz-pandoc branch 3 times, most recently from 64de992 to 567866b Compare October 13, 2023 01:11
@notriddle notriddle force-pushed the notriddle/fuzz-pandoc branch 4 times, most recently from 69b81e6 to d03e618 Compare October 26, 2023 01:29
@notriddle notriddle force-pushed the notriddle/fuzz-pandoc branch from 0e44799 to 56b45e0 Compare October 30, 2023 21:12
use pulldown_cmark::{Event, Tag, TagEnd};
match event {
Event::Start(Tag::FootnoteDefinition(id)) => {
if id.starts_with("\n") || id.ends_with("\n") || id.starts_with("\r") || id.ends_with("\r") || id.starts_with(" ") || id.starts_with("\t") || id.contains(" ") || id.contains("\t ") || id.contains(" \t") || id.contains("\t\t") || id.ends_with(" ") || id.ends_with("\t") { return };
Collaborator

Perhaps it would be simpler to use the slice variants of the starts_with and ends_with patterns:

Suggested change
- if id.starts_with("\n") || id.ends_with("\n") || id.starts_with("\r") || id.ends_with("\r") || id.starts_with(" ") || id.starts_with("\t") || id.contains(" ") || id.contains("\t ") || id.contains(" \t") || id.contains("\t\t") || id.ends_with(" ") || id.ends_with("\t") { return };
+ let whitespace = &['\n', '\r', ' ', '\t'];
+ if id.starts_with(whitespace)
+     || id.ends_with(whitespace)
+     || id.contains("\t ")
+     || id.contains(" \t")
+     || id.contains("\t\t")
+ {
+     return;
+ };
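
For reference, a minimal standalone sketch of the char-slice pattern idea: a slice of `char`s passed to `starts_with` or `ends_with` matches any one of the listed characters. The function name `has_awkward_whitespace` is hypothetical and only mirrors the checks in the suggestion above; it is not code from this PR.

```rust
/// Hypothetical helper mirroring the suggested check: true when the label
/// starts or ends with whitespace, or contains a whitespace pair with a tab.
fn has_awkward_whitespace(id: &str) -> bool {
    // A `&[char]` used as a pattern matches any single character in the slice.
    let whitespace: &[char] = &['\n', '\r', ' ', '\t'];
    id.starts_with(whitespace)
        || id.ends_with(whitespace)
        || id.contains("\t ")
        || id.contains(" \t")
        || id.contains("\t\t")
}

fn main() {
    for label in ["note", " note", "note\n", "a\t\tb", "a b"] {
        println!("{:?} -> {}", label, has_awkward_whitespace(label));
    }
}
```

The explicit `&[char]` annotation is a conservative choice; the unannotated array reference in the suggestion relies on the pattern implementation for char arrays.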

@notriddle notriddle force-pushed the notriddle/fuzz-pandoc branch from 8d3923e to 2098f86 Compare November 14, 2023 22:48
@notriddle notriddle force-pushed the notriddle/fuzz-pandoc branch 2 times, most recently from 7cc429e to e6caf75 Compare November 24, 2023 22:54
@notriddle notriddle force-pushed the notriddle/fuzz-pandoc branch 4 times, most recently from c8a7098 to 9b5cd31 Compare January 21, 2024 18:40
@notriddle notriddle force-pushed the notriddle/fuzz-pandoc branch from 9b5cd31 to 30bbeb4 Compare January 23, 2024 20:23
@notriddle notriddle force-pushed the notriddle/fuzz-pandoc branch 2 times, most recently from 99ca650 to 7c97616 Compare March 5, 2024 19:23
@notriddle notriddle force-pushed the notriddle/fuzz-pandoc branch from 7c97616 to a17b615 Compare March 25, 2024 17:46
ollpu and others added 2 commits April 17, 2024 18:37
Based on pulldown-cmark#622 and
copied from https://github.com/ollpu/pulldown-cmark/tree/alt-math.

Co-authored-by: rhysd <lin90162@yahoo.co.jp>
This feature is loosely based on what 63a29a1
described, but copies [commonmark-hs] more closely (the balanced braces
feature is added).

[commonmark-hs]: https://github.com/jgm/commonmark-hs

It largely ignores GitHub, because its math parsing [is very buggy].

[is very buggy]: https://github.com/nschloe/github-math-bugs
notriddle and others added 24 commits April 17, 2024 18:37
This approach, based on @ollpu's suggestion, tracks single `$`s
in the inline tree, and merges them later. It avoids having
to merge and unmerge them in some corner cases.
The essential problem is: every time you write `$$x$}`, you get another
entry added to a hash table. Even if it's not [theoretically] *quadratic*,
it's still slow. Hard limiting it to 255 entries makes this not a problem.
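
To illustrate the hard-limit idea in isolation, here is a minimal sketch of a position-keyed table that stops accepting new entries once it holds 255 of them. The names `CappedTable` and `MAX_ENTRIES`, the key and value types, and the choice to simply reject inserts when full are all assumptions made for this example; they are not the actual pulldown-cmark data structure.

```rust
use std::collections::HashMap;

/// Assumed cap, taken from the "255 entries" figure in the commit message.
const MAX_ENTRIES: usize = 255;

/// Illustrative table keyed by delimiter position (hypothetical type).
#[derive(Default)]
struct CappedTable {
    entries: HashMap<usize, bool>,
}

impl CappedTable {
    /// Record a position. Returns false, without storing anything, when the
    /// table is full and the position is not already present.
    fn insert(&mut self, position: usize, is_display: bool) -> bool {
        if self.entries.len() >= MAX_ENTRIES && !self.entries.contains_key(&position) {
            return false;
        }
        self.entries.insert(position, is_display);
        true
    }

    fn len(&self) -> usize {
        self.entries.len()
    }
}

fn main() {
    let mut table = CappedTable::default();
    // Simulate pathological input that would otherwise add 5000 entries.
    let accepted = (0..5000).filter(|&pos| table.insert(pos, false)).count();
    println!("accepted {} of 5000, table size {}", accepted, table.len());
}
```

With a cap like this, the bookkeeping cost per input stays bounded no matter how many delimiters the input contains.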

Interestingly enough, when I tried to write an analogous torture test
for code spans, I couldn't find a way to do it because code spans are
keyed by their *length* instead of their *position*. In order to get
N entries in the hash table, I basically had to write N `` ` `` in a
row, forcing me to write quadratic amounts of input text.

Comparison:

```
michaelhowell@Michael-Howells-Macbook-Pro pulldown-cmark % python3 -c 'print("$$x$}"*5000)' | time target/release/pulldown-cmark.old -M > /dev/null
target/release/pulldown-cmark.old -M > /dev/null  2.63s user 0.02s system 99% cpu 2.673 total
michaelhowell@Michael-Howells-Macbook-Pro pulldown-cmark % python3 -c 'print("$$x$}"*5000)' | time target/release/pulldown-cmark.new -M > /dev/null
target/release/pulldown-cmark.new -M > /dev/null  0.01s user 0.00s system 6% cpu 0.109 total
```

[theoretically]: http://www.ilikebigbits.com/2014_04_21_myth_of_ram_1.html
Co-authored-by: Linda_pp <rhysd@users.noreply.github.com>
- Disallow $$ matching a closing $ and then marching delimiters in
  `make_math_span`. Instead, retry scanning at the second position.

- Remove the `seen_first` optimization from `MathDelims`. It doesn't
  work with the retry strategy.
Co-authored-by: Michael Howell <michael@notriddle.com>
@notriddle notriddle force-pushed the notriddle/fuzz-pandoc branch from a17b615 to 4e02fac Compare April 18, 2024 16:16
4 participants