| 1 | +--- |
| 2 | +group: recipe |
| 3 | +index: 6 |
| 4 | +title: HTML and remark |
| 5 | +description: How to use remark to turn markdown into HTML, and to allow embedded HTML inside markdown |
| 6 | +tags: |
| 7 | + - remark |
| 8 | + - html |
| 9 | + - plugin |
| 10 | + - markdown |
| 11 | + - html |
| 12 | + - parse |
| 13 | +author: Titus Wormer |
| 14 | +authorTwitter: wooorm |
| 15 | +authorGithub: wooorm |
| 16 | +published: 2021-03-09 |
| 17 | +modified: 2021-03-09 |
| 18 | +--- |
| 19 | + |
| 20 | +## HTML and remark |
| 21 | + |
| 22 | +remark is a markdown compiler. |
| 23 | +It’s concerned with HTML in two ways: |
| 24 | + |
| 25 | +1. markdown is often turned into HTML |
| 26 | +2. markdown sometimes has embedded HTML |
| 27 | + |
| 28 | +When dealing with HTML and markdown, we will use both remark and rehype. |
| 29 | +This article shows some examples of how to do that. |
| 30 | + |
| 31 | +### Contents |
| 32 | + |
| 33 | +* [How to turn markdown into HTML](#how-to-turn-markdown-into-html) |
| 34 | +* [How to turn HTML into markdown](#how-to-turn-html-into-markdown) |
| 35 | +* [How to allow HTML embedded in markdown](#how-to-allow-html-embedded-in-markdown) |
| 36 | +* [How to properly support HTML inside markdown](#how-to-properly-support-html-inside-markdown) |
| 37 | + |
| 38 | +### How to turn markdown into HTML |
| 39 | + |
| 40 | +remark handles markdown: it can parse and serialize it. |
| 41 | +But it’s **not** for HTML. |
| 42 | +That’s the job of rehype, which parses and serializes HTML. |
| 43 | + |
| 44 | +To turn markdown into HTML, we need [`remark-parse`][remark-parse], |
| 45 | +[`remark-rehype`][remark-rehype], and [`rehype-stringify`][rehype-stringify]: |
| 46 | + |
| 47 | +```javascript |
| 48 | +var unified = require('unified') |
| 49 | +var remarkParse = require('remark-parse') |
| 50 | +var remarkRehype = require('remark-rehype') |
| 51 | +var rehypeStringify = require('rehype-stringify') |
| 52 | + |
| 53 | +unified() |
| 54 | + .use(remarkParse) // Parse markdown content to a syntax tree |
| 55 | + .use(remarkRehype) // Turn markdown syntax tree to HTML syntax tree, ignoring embedded HTML |
| 56 | + .use(rehypeStringify) // Serialize HTML syntax tree |
| 57 | + .process('*emphasis* and **strong**') |
| 58 | + .then((file) => console.log(String(file))) |
| 59 | + .catch((error) => { |
| 60 | + throw error |
| 61 | + }) |
| 62 | +``` |
| 63 | + |
| 64 | +This turns `*emphasis* and **strong**` into |
| 65 | +`<em>emphasis</em> and <strong>strong</strong>`, but it does not support HTML |
| 66 | +embedded inside markdown (such as `*emphasis* and <strong>strong</strong>`). |
| 67 | + |
| 68 | +This solution **is safe**: content you don’t trust cannot cause an XSS |
| 69 | +vulnerability. |
| 70 | + |
| 71 | +### How to turn HTML into markdown |
| 72 | + |
| 73 | +We can also do the inverse. |
| 74 | +To turn HTML into markdown, we need [`rehype-parse`][rehype-parse], |
| 75 | +[`rehype-remark`][rehype-remark], and [`remark-stringify`][remark-stringify]: |
| 76 | + |
| 77 | +```javascript |
| 78 | +var unified = require('unified') |
| 79 | +var rehypeParse = require('rehype-parse') |
| 80 | +var rehypeRemark = require('rehype-remark') |
| 81 | +var remarkStringify = require('remark-stringify') |
| 82 | + |
| 83 | +unified() |
| 84 | + .use(rehypeParse) // Parse HTML to a syntax tree |
| 85 | + .use(rehypeRemark) // Turn HTML syntax tree to markdown syntax tree |
| 86 | + .use(remarkStringify) // Serialize markdown syntax tree |
| 87 | + .process('<em>emphasis</em> and <strong>strong</strong>') |
| 88 | + .then((file) => console.log(String(file))) |
| 89 | + .catch((error) => { |
| 90 | + throw error |
| 91 | + }) |
| 92 | +``` |
| 93 | + |
| 94 | +This turns `<em>emphasis</em> and <strong>strong</strong>` into |
| 95 | +`*emphasis* and **strong**`. |
| 96 | + |
| 97 | +### How to allow HTML embedded in markdown |
| 98 | + |
| 99 | +Markdown is a content format that’s great for the more basic things: |
| 100 | +it’s nicer to write `*emphasis*` than `<em>emphasis</em>`. |
| 101 | +But it’s limited: its terse syntax supports only a handful of constructs. |
| 102 | +Luckily, for more complex things, markdown allows HTML inside it. |
| 103 | +A common example of this is to include a `<details>` element. |
| 104 | + |
| 105 | +HTML embedded in markdown can be allowed when going from markdown to HTML |
| 106 | +by configuring [`remark-rehype`][remark-rehype] and |
| 107 | +[`rehype-stringify`][rehype-stringify]: |
| 108 | + |
| 109 | +```javascript |
| 110 | +var unified = require('unified') |
| 111 | +var remarkParse = require('remark-parse') |
| 112 | +var remarkRehype = require('remark-rehype') |
| 113 | +var rehypeStringify = require('rehype-stringify') |
| 114 | + |
| 115 | +unified() |
| 116 | + .use(remarkParse) |
| 117 | + .use(remarkRehype, {allowDangerousHtml: true}) // Pass raw HTML strings through. |
| 118 | + .use(rehypeStringify, {allowDangerousHtml: true}) // Serialize the raw HTML strings |
| 119 | + .process('*emphasis* and <strong>strong</strong>') |
| 120 | + .then((file) => console.log(String(file))) |
| 121 | + .catch((error) => { |
| 122 | + throw error |
| 123 | + }) |
| 124 | +``` |
| 125 | + |
| 126 | +This solution **is not safe**: content you don’t trust can cause XSS |
| 127 | +vulnerabilities. |
| 128 | + |
| 129 | +### How to properly support HTML inside markdown |
| 130 | + |
| 131 | +To properly support HTML embedded inside markdown, we need another plugin: |
| 132 | +[`rehype-raw`][rehype-raw]. |
| 133 | +This plugin will take the strings of HTML embedded in markdown and parse them |
| 134 | +with an actual HTML parser. |
| 135 | + |
| 136 | +```javascript |
| 137 | +var unified = require('unified') |
| 138 | +var remarkParse = require('remark-parse') |
| 139 | +var remarkRehype = require('remark-rehype') |
| 140 | +var rehypeRaw = require('rehype-raw') |
| 141 | +var rehypeStringify = require('rehype-stringify') |
| 142 | + |
| 143 | +unified() |
| 144 | + .use(remarkParse) |
| 145 | + .use(remarkRehype, {allowDangerousHtml: true}) |
| 146 | + .use(rehypeRaw) // *Parse* the raw HTML strings embedded in the tree |
| 147 | + .use(rehypeStringify) |
| 148 | + .process('*emphasis* and <strong>strong</strong>') |
| 149 | + .then((file) => console.log(String(file))) |
| 150 | + .catch((error) => { |
| 151 | + throw error |
| 152 | + }) |
| 153 | +``` |
| 154 | + |
| 155 | +This solution **is not safe**: content you don’t trust can cause XSS |
| 156 | +vulnerabilities. |
| 157 | + |
| 158 | +But because we now have a complete HTML syntax tree, we can sanitize that tree. |
| 159 | +For a safe solution, add [`rehype-sanitize`][rehype-sanitize] right before |
| 160 | +`rehype-stringify`. |
| 161 | + |
| 162 | +[remark-parse]: https://github.com/remarkjs/remark/tree/main/packages/remark-parse |
| 163 | + |
| 164 | +[remark-stringify]: https://github.com/remarkjs/remark/tree/main/packages/remark-stringify |
| 165 | + |
| 166 | +[remark-rehype]: https://github.com/remarkjs/remark-rehype |
| 167 | + |
| 168 | +[rehype-parse]: https://github.com/rehypejs/rehype/tree/main/packages/rehype-parse |
| 169 | + |
| 170 | +[rehype-stringify]: https://github.com/rehypejs/rehype/tree/main/packages/rehype-stringify |
| 171 | + |
| 172 | +[rehype-remark]: https://github.com/rehypejs/rehype-remark |
| 173 | + |
| 174 | +[rehype-raw]: https://github.com/rehypejs/rehype-raw |
| 175 | + |
| 176 | +[rehype-sanitize]: https://github.com/rehypejs/rehype-sanitize |