fix: relaxed right-flanking check for CJK characters #1145

utact · 2026-01-04T10:06:12Z

Motivation

In CJK languages, grammatical particles (e.g., 는, 가) often attach directly to the preceding token without a space.
Currently, markdown-it fails to recognize a closing emphasis delimiter if it is preceded by punctuation and immediately followed by a CJK character (e.g., **JWT)**는).

This occurs because the current scanDelims logic strictly follows the CommonMark definition of a "right-flanking delimiter run": a run preceded by punctuation is right-flanking only if it is followed by whitespace or punctuation. In the sequence Punctuation + Delimiter + Letter (CJK), the run is therefore not right-flanking, so the emphasis never closes.

Changes

- Introduced an isCJK(code) helper in lib/rules_inline/state_inline.mjs to detect CJK Unified Ideographs and Hangul Syllables.
- Modified StateInline.prototype.scanDelims to treat a delimiter run as right-flanking (a valid closer) when the following character is a CJK character, even if the preceding character is punctuation. A run preceded by whitespace is still never right-flanking.

Verification

Correctness

Issue Case Fixed: **JWT(JSON Web Token)**는 now correctly renders as <strong>JWT(JSON Web Token)</strong>는.
Safety Check:
- Standard intra-word emphasis (e.g., in**tra**word) remains unaffected.
- Underscore strictness is preserved (e.g., __test)__는): the closing __ is also left-flanking and is not followed by punctuation, so it still cannot close. This prevents unintended formatting in code-like strings.
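
The three cases above can be checked with a simplified model of the flanking logic. This is a sketch, not markdown-it's actual source: the canClose function, its patched flag, and the regex-based whitespace and punctuation tests are assumptions made for illustration. Only the isCJK ranges and the shape of the left_flanking / right_flanking expressions come from this PR's diff.

```javascript
// Simplified model of the closing-delimiter check (hypothetical helper names).
// Assumption: whitespace and punctuation are approximated with Unicode regexes.
const isWhiteSpace = (ch) => ch === '' || /\s/u.test(ch)
const isPunct = (ch) => /[\p{P}\p{S}]/u.test(ch)

// Same ranges as the isCJK helper in this PR.
function isCJK (code) {
  return (code >= 0x4E00 && code <= 0x9FFF) ||
         (code >= 0xAC00 && code <= 0xD7A3)
}

// last / next: characters around the delimiter run ('' at a line edge).
// marker: '*' or '_'. patched: apply the relaxed right-flanking check.
function canClose (last, marker, next, patched) {
  const isLastWS = isWhiteSpace(last)
  const isNextWS = isWhiteSpace(next)
  const isLastPunct = isPunct(last)
  const isNextPunct = isPunct(next)
  const isNextCJK = patched && next !== '' && isCJK(next.codePointAt(0))

  const leftFlanking =
    !isNextWS && (!isNextPunct || isLastWS || isLastPunct)
  const rightFlanking =
    !isLastWS && (!isLastPunct || isNextWS || isNextPunct || isNextCJK)

  // '*' may close inside a word; '_' may not.
  const canSplitWord = marker === '*'
  return rightFlanking && (canSplitWord || !leftFlanking || isNextPunct)
}

console.log(canClose(')', '*', '는', false)) // false: unpatched closing run of **JWT)**는
console.log(canClose(')', '*', '는', true))  // true:  patched closing run of **JWT)**는
console.log(canClose('a', '*', 'w', true))   // true:  closing run of in**tra**word
console.log(canClose(')', '_', '는', true))  // false: closing run of __test)__는
```

In this model the underscore case stays closed off because the run is both left- and right-flanking and the next character is not punctuation, which matches the safety check described above.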

Performance (Benchmarks)

Ran ./benchmark/benchmark.mjs to ensure no performance degradation.

Baseline: ~24,400 ops/sec
Patched: ~24,400 ops/sec
Result: No measurable regression.

Copilot

Pull request overview

This PR fixes emphasis delimiter recognition for CJK languages by relaxing the right-flanking delimiter rules when followed by CJK characters. In CJK languages, grammatical particles often attach directly to preceding tokens without spaces, which previously prevented emphasis from closing correctly (e.g., **JWT)**는 would not render the emphasis).

Key Changes

- Added an isCJK() helper function to detect CJK Unified Ideographs and Hangul Syllables
- Modified the right_flanking logic in scanDelims to treat delimiters as valid closers when followed by CJK characters
- Kept the strict rules for underscores to avoid unintended formatting in code-like strings


Copilot · 2026-01-04T10:10:19Z

lib/rules_inline/state_inline.mjs

+// - 0x4E00 - 0x9FFF : CJK Unified Ideographs
+// - 0xAC00 - 0xD7A3 : Hangul Syllables
+//
+function isCJK (code) {
+  return (code >= 0x4E00 && code <= 0x9FFF) ||
+         (code >= 0xAC00 && code <= 0xD7A3)


The isCJK function currently covers CJK Unified Ideographs (0x4E00-0x9FFF) and Hangul Syllables (0xAC00-0xD7A3). Consider whether additional CJK-related ranges should be included for more comprehensive coverage:

- Hiragana (0x3040-0x309F): Japanese script, commonly used in similar contexts
- Katakana (0x30A0-0x30FF): Japanese script, commonly used in similar contexts
- CJK Unified Ideographs Extension A (0x3400-0x4DBF): Extended Chinese characters

These scripts may exhibit similar behavior to the reported issue with grammatical particles attaching directly to preceding tokens. If these are intentionally excluded to keep the scope minimal, consider documenting this limitation.

Suggested change

-// - 0x4E00 - 0x9FFF : CJK Unified Ideographs
-// - 0xAC00 - 0xD7A3 : Hangul Syllables
-//
-function isCJK (code) {
-  return (code >= 0x4E00 && code <= 0x9FFF) ||
-         (code >= 0xAC00 && code <= 0xD7A3)
+// - 0x3040 - 0x309F : Hiragana
+// - 0x30A0 - 0x30FF : Katakana
+// - 0x3400 - 0x4DBF : CJK Unified Ideographs Extension A
+// - 0x4E00 - 0x9FFF : CJK Unified Ideographs
+// - 0xAC00 - 0xD7A3 : Hangul Syllables
+//
+function isCJK (code) {
+  return (code >= 0x3040 && code <= 0x309F) || // Hiragana
+         (code >= 0x30A0 && code <= 0x30FF) || // Katakana
+         (code >= 0x3400 && code <= 0x4DBF) || // CJK Unified Ideographs Extension A
+         (code >= 0x4E00 && code <= 0x9FFF) || // CJK Unified Ideographs
+         (code >= 0xAC00 && code <= 0xD7A3)    // Hangul Syllables
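
As a quick sanity check of the suggested ranges, the sketch below runs the extended predicate over a few sample characters. The sample list is an illustrative assumption, not part of the PR. The function body is closed here so the snippet runs on its own.

```javascript
// Extended predicate with the ranges from the suggested change.
function isCJK (code) {
  return (code >= 0x3040 && code <= 0x309F) || // Hiragana
         (code >= 0x30A0 && code <= 0x30FF) || // Katakana
         (code >= 0x3400 && code <= 0x4DBF) || // CJK Unified Ideographs Extension A
         (code >= 0x4E00 && code <= 0x9FFF) || // CJK Unified Ideographs
         (code >= 0xAC00 && code <= 0xD7A3)    // Hangul Syllables
}

// Illustrative samples: one character per range, plus two non-CJK characters.
const samples = ['は', 'カ', '㐀', '中', '는', 'a', ')']

for (const ch of samples) {
  const code = ch.codePointAt(0)
  const hex = 'U+' + code.toString(16).toUpperCase().padStart(4, '0')
  console.log(ch, hex, isCJK(code))
}
```

The first five samples fall inside the listed ranges and the last two do not. Neither version of the helper covers ideographs outside the Basic Multilingual Plane (for example, CJK Unified Ideographs Extension B, which starts at U+20000).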

Copilot · 2026-01-04T10:10:19Z

lib/rules_inline/state_inline.mjs

+  const isNextCJK = isCJK(nextChar)
+
  const left_flanking =
    !isNextWhiteSpace && (!isNextPunctChar || isLastWhiteSpace || isLastPunctChar)
  const right_flanking =
-    !isLastWhiteSpace && (!isLastPunctChar || isNextWhiteSpace || isNextPunctChar)
+    !isLastWhiteSpace && (!isLastPunctChar || isNextWhiteSpace || isNextPunctChar || isNextCJK)


The new CJK character handling in the delimiter scanning logic lacks test coverage. Given that the repository has comprehensive test fixtures for other emphasis scenarios (e.g., in test/fixtures/markdown-it/commonmark_extras.txt), tests should be added for the CJK use cases.

Suggested test cases to add:

- **JWT(JSON Web Token)**는 → <strong>JWT(JSON Web Token)</strong>는
- CJK characters with other delimiters (e.g., *text*는, __text__는)
- Edge cases with multiple CJK characters
- Ensuring existing intra-word emphasis behavior remains unchanged (e.g., in**tra**word)

Tests ensure this fix works correctly and prevent regressions in future changes.

tats-u · 2026-01-15T14:27:55Z

@utact Have you tried markdown-it-cjk-friendly? Such a change must be merged into CommonMark first.

Demo with bench: https://tats-u.github.io/markdown-cjk-friendly/?sc8=KipKV1QoSlNPTiBXZWIgVG9rZW4pKirripQ&gfm=1&engine=markdown-it&bench=1 (you can omit the trailing &bench=1 to suppress bench on load)

- Emphasis with CJK punctuation (commonmark/commonmark-spec#650)
- Meta: Port CJK Friendly Emphasis to CommonMark spec / This Markdown parser will not support CJK Friendly Emphasis as a built-in extension (tats-u/markdown-cjk-friendly#22)

Fix emphasis not closing when followed by CJK character

b2164cb

Copilot AI review requested due to automatic review settings January 4, 2026 10:06

Copilot started reviewing on behalf of utact January 4, 2026 10:06

Copilot AI reviewed Jan 4, 2026
