gh-101180: Fix a bug where iso2022_jp_3 and iso2022_jp_2004 codecs read out of bounds by moriyama · Pull Request #111695 · python/cpython
Merged (3 commits), Nov 6, 2023
gh-101180: Add test for iso2022_jp_3 and iso2022_jp_2004 codecs
iso2022_jp_3 and iso2022_jp_2004 are upward compatible with iso2022_jp.
In addition to testing iso2022_jp, we will test the following characters
added in iso2022_jp_3 and iso2022_jp_2004.

  JIS X 0213        Unicode
  ----------------  ---------------------------------------------
  Plane 1 \x2E\x23  U+3402        Basic Multilingual Plane
  Plane 1 \x2E\x22  U+2000B       Supplementary Ideographic Plane
  Plane 1 \x24\x77  U+304B U+309A Combining Character Sequence
  Plane 2 \x21\x22  U+4E02        Basic Multilingual Plane
  Plane 2 \x7E\x76  U+2A6B2       Supplementary Ideographic Plane

The difference between iso2022_jp_3 and iso2022_jp_2004 is the
difference between JIS X 0213:2000 and JIS X 0213:2004.
We also test the following character, which was added in JIS X 0213:2004
and is absent from JIS X 0213:2000.

  JIS X 0213:2004   Unicode
  ----------------  -------
  Plane 1 \x2E\x21  U+4FF1

Escape sequences that designate a JIS X 0213 character set to G0:

  character set            ESC sequence
  -----------------------  ---------------------------
  JIS X 0213:2000 Plane 1  ESC 2/4 2/8 4/15  ESC $ ( O
  JIS X 0213:2000 Plane 2  ESC 2/4 2/8 5/0   ESC $ ( P
  JIS X 0213:2004 Plane 1  ESC 2/4 2/8 5/1   ESC $ ( Q
  JIS X 0213:2004 Plane 2  ESC 2/4 2/8 5/0   ESC $ ( P
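
The two tables above can be tried directly against Python's built-in codecs. The following is a minimal sketch, not part of the commit: the constant names and the `decode_cell` helper are hypothetical, and the byte values are copied from the tables. It wraps a two-byte JIS X 0213 code in its designation sequence, returns to ASCII with `ESC ( B`, and decodes the result.

```python
ESC = b"\x1b"

# Designation sequences from the escape sequence table (final bytes O, P, Q).
PLANE1_2000 = ESC + b"$(O"   # JIS X 0213:2000 Plane 1
PLANE1_2004 = ESC + b"$(Q"   # JIS X 0213:2004 Plane 1
PLANE2 = ESC + b"$(P"        # JIS X 0213 Plane 2 (same sequence in both editions)
TO_ASCII = ESC + b"(B"       # designate ASCII to G0 again


def decode_cell(designation: bytes, code: bytes, encoding: str) -> str:
    """Decode one two-byte JIS X 0213 code wrapped in its escape sequences.

    Hypothetical helper for illustration only.
    """
    return (designation + code + TO_ASCII).decode(encoding, "strict")


# (designation, two-byte code, codec name), taken from the tables in the
# commit message.
samples = [
    (PLANE1_2000, b"\x2e\x23", "iso2022_jp_3"),     # BMP character
    (PLANE1_2000, b"\x2e\x22", "iso2022_jp_3"),     # supplementary plane
    (PLANE1_2000, b"\x24\x77", "iso2022_jp_3"),     # combining sequence
    (PLANE2, b"\x21\x22", "iso2022_jp_3"),          # Plane 2, BMP
    (PLANE2, b"\x7e\x76", "iso2022_jp_3"),          # Plane 2, supplementary
    (PLANE1_2004, b"\x2e\x21", "iso2022_jp_2004"),  # added in the 2004 edition
]

for designation, code, encoding in samples:
    text = decode_cell(designation, code, encoding)
    code_points = " ".join(f"U+{ord(ch):04X}" for ch in text)
    print(f"{encoding:16} {code.hex()}  {code_points}")
```

Only decoding is exercised here, so the sketch does not touch the encoder path that this pull request fixes.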
moriyama committed Nov 3, 2023
commit ddff542d84d24c82fd23c4137e2c6362c617d22c
46 changes: 46 additions & 0 deletions Lib/test/test_codecencodings_iso2022.py
@@ -24,6 +24,52 @@ class Test_ISO2022_JP2(multibytecodec_support.TestBase, unittest.TestCase):
        (b'ab\x1BNdef', 'replace', 'abdef'),
    )

class Test_ISO2022_JP3(multibytecodec_support.TestBase, unittest.TestCase):
    encoding = 'iso2022_jp_3'
    tstring = multibytecodec_support.load_teststring('iso2022_jp')
    codectests = COMMON_CODEC_TESTS + (
        (b'ab\x1BNdef', 'replace', 'ab\x1BNdef'),
        (b'\x1B$(O\x2E\x23\x1B(B', 'strict', '\u3402'),
        (b'\x1B$(O\x2E\x22\x1B(B', 'strict', '\U0002000B'),
        (b'\x1B$(O\x24\x77\x1B(B', 'strict', '\u304B\u309A'),
        (b'\x1B$(P\x21\x22\x1B(B', 'strict', '\u4E02'),
        (b'\x1B$(P\x7E\x76\x1B(B', 'strict', '\U0002A6B2'),
        ('\u3402', 'strict', b'\x1B$(O\x2E\x23\x1B(B'),
        ('\U0002000B', 'strict', b'\x1B$(O\x2E\x22\x1B(B'),
        ('\u304B\u309A', 'strict', b'\x1B$(O\x24\x77\x1B(B'),
        ('\u4E02', 'strict', b'\x1B$(P\x21\x22\x1B(B'),
        ('\U0002A6B2', 'strict', b'\x1B$(P\x7E\x76\x1B(B'),
        (b'ab\x1B$(O\x2E\x21\x1B(Bdef', 'replace', 'ab\uFFFDdef'),
        ('ab\u4FF1def', 'replace', b'ab?def'),
    )
    xmlcharnametest = (
        '\xAB\u211C\xBB = \u2329\u1234\u232A',
        b'\x1B$(O\x29\x28\x1B(B&real;\x1B$(O\x29\x32\x1B(B = &lang;&#4660;&rang;'
    )

class Test_ISO2022_JP2004(multibytecodec_support.TestBase, unittest.TestCase):
    encoding = 'iso2022_jp_2004'
    tstring = multibytecodec_support.load_teststring('iso2022_jp')
    codectests = COMMON_CODEC_TESTS + (
        (b'ab\x1BNdef', 'replace', 'ab\x1BNdef'),
        (b'\x1B$(Q\x2E\x23\x1B(B', 'strict', '\u3402'),
        (b'\x1B$(Q\x2E\x22\x1B(B', 'strict', '\U0002000B'),
        (b'\x1B$(Q\x24\x77\x1B(B', 'strict', '\u304B\u309A'),
        (b'\x1B$(P\x21\x22\x1B(B', 'strict', '\u4E02'),
        (b'\x1B$(P\x7E\x76\x1B(B', 'strict', '\U0002A6B2'),
        ('\u3402', 'strict', b'\x1B$(Q\x2E\x23\x1B(B'),
        ('\U0002000B', 'strict', b'\x1B$(Q\x2E\x22\x1B(B'),
        ('\u304B\u309A', 'strict', b'\x1B$(Q\x24\x77\x1B(B'),
        ('\u4E02', 'strict', b'\x1B$(P\x21\x22\x1B(B'),
        ('\U0002A6B2', 'strict', b'\x1B$(P\x7E\x76\x1B(B'),
        (b'ab\x1B$(Q\x2E\x21\x1B(Bdef', 'replace', 'ab\u4FF1def'),
        ('ab\u4FF1def', 'replace', b'ab\x1B$(Q\x2E\x21\x1B(Bdef'),
    )
    xmlcharnametest = (
        '\xAB\u211C\xBB = \u2329\u1234\u232A',
        b'\x1B$(Q\x29\x28\x1B(B&real;\x1B$(Q\x29\x32\x1B(B = &lang;&#4660;&rang;'
    )

class Test_ISO2022_KR(multibytecodec_support.TestBase, unittest.TestCase):
    encoding = 'iso2022_kr'
    tstring = multibytecodec_support.load_teststring('iso2022_kr')
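
Each `codectests` entry in the diff is a `(source, error handler, expected)` tuple: a `bytes` source is decoded and a `str` source is encoded. The sketch below illustrates that convention on the one code point where the two codecs differ (Plane 1 `\x2E\x21`, U+4FF1). `run_codec_case` is a hypothetical, simplified stand-in written for this example, not the actual logic of `multibytecodec_support.TestBase`, which may perform additional checks.

```python
def run_codec_case(encoding, source, errors):
    """Apply one codectests-style case (hypothetical simplified helper).

    bytes input is decoded, str input is encoded, using the given
    error handler.
    """
    if isinstance(source, bytes):
        return source.decode(encoding, errors)
    return source.encode(encoding, errors)


# Plane 1 \x2E\x21 is unassigned in JIS X 0213:2000, so the
# iso2022_jp_3 decoder substitutes U+FFFD under 'replace'.
old = run_codec_case("iso2022_jp_3", b"ab\x1b$(O\x2e\x21\x1b(Bdef", "replace")

# The same cell designated with ESC $ ( Q is U+4FF1 in JIS X 0213:2004.
new = run_codec_case("iso2022_jp_2004", b"ab\x1b$(Q\x2e\x21\x1b(Bdef", "replace")

# Encoding U+4FF1 with the older codec falls back to '?' under 'replace'.
fallback = run_codec_case("iso2022_jp_3", "ab\u4ff1def", "replace")

print(ascii(old))      # 'ab\ufffddef'
print(ascii(new))      # 'ab\u4ff1def'
print(fallback)        # b'ab?def'
```

The expected values in the comments are the ones asserted by the corresponding tuples in `Test_ISO2022_JP3` and `Test_ISO2022_JP2004`.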