Fix backslash-escaping multibyte chars in COPY FROM. · postgrespro/postgres_cluster@c87cbd5 · GitHub
Commit c87cbd5

Fix backslash-escaping multibyte chars in COPY FROM.
If a multi-byte character is escaped with a backslash in TEXT mode input, and the encoding is one of the client-only encodings where the bytes after the first one can have an ASCII byte "embedded" in the char, we didn't skip the character correctly. After a backslash, we only skipped the first byte of the next character, so if it was a multi-byte character, we would try to process its second byte as if it was a separate character. If it was one of the characters with special meaning, like '\n', '\r', or another '\\', that would cause trouble.

One such example is the byte sequence '\x5ca45c2e666f6f' in Big5 encoding. That's supposed to be [backslash][two-byte character][.][f][o][o], but because the second byte of the two-byte character is 0x5c, we incorrectly treat it as another backslash. And because the next character is a dot, we parse it as an end-of-copy marker, and throw an "end-of-copy marker corrupt" error.

Backpatch to all supported versions.

Reviewed-by: John Naylor, Kyotaro Horiguchi
Discussion: https://www.postgresql.org/message-id/a897f84f-8dca-8798-3139-07da5bb38728%40iki.fi
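
The mis-parse described above can be reproduced outside PostgreSQL with a small stand-alone sketch. This is not the code in copy.c: `char_len`, `sees_backslash_dot`, and `sample` are hypothetical names, and the rule "a byte with the high bit set starts a two-byte character" is a simplifying assumption about Big5 made only for this illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* The byte sequence from the commit message: \x5c a4 5c 2e 66 6f 6f. */
static const unsigned char sample[] = {0x5c, 0xa4, 0x5c, 0x2e, 0x66, 0x6f, 0x6f};

/*
 * Simplifying assumption for this sketch: a byte with the high bit set
 * starts a two-byte character, anything else is a single byte.
 */
static size_t char_len(unsigned char lead)
{
    return (lead & 0x80) ? 2 : 1;
}

/*
 * Hypothetical scanner, not the PostgreSQL implementation. Returns true if
 * the scan finds a backslash immediately followed by '.', which is how an
 * end-of-copy marker begins. With skip_whole_char == false, only one byte
 * is skipped after a backslash (the behavior the commit message describes
 * as broken); with true, the whole escaped character is skipped.
 */
static bool sees_backslash_dot(const unsigned char *buf, size_t len,
                               bool skip_whole_char)
{
    size_t i = 0;

    while (i < len)
    {
        unsigned char c = buf[i];

        if (c == '\\')
        {
            unsigned char c2;

            if (i + 1 >= len)
                break;          /* trailing backslash, nothing to escape */
            c2 = buf[i + 1];
            if (c2 == '.')
                return true;    /* looks like the end-of-copy marker */

            /* Skip the backslash plus the escaped character. */
            i += 1 + (skip_whole_char ? char_len(c2) : 1);
            continue;
        }

        /* Unescaped characters are always skipped whole. */
        i += char_len(c);
    }
    return false;
}
```

Calling `sees_backslash_dot(sample, sizeof(sample), false)` lands on the embedded 0x5c and then on the dot, so it reports a marker; with `true` as the last argument the two-byte character is skipped as a unit and the dot is read as ordinary data.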
1 parent 9843841 commit c87cbd5

File tree

1 file changed: +9 -1 lines changed

src/backend/commands/copy.c

Lines changed: 9 additions & 1 deletion
@@ -4227,7 +4227,7 @@ CopyReadLineText(CopyState cstate)
 					break;
 				}
 				else if (!cstate->csv_mode)
-
+				{
 					/*
 					 * If we are here, it means we found a backslash followed by
 					 * something other than a period. In non-CSV mode, anything
@@ -4238,8 +4238,16 @@ CopyReadLineText(CopyState cstate)
 					 * backslashes are not special, so we want to process the
 					 * character after the backslash just like a normal character,
 					 * so we don't increment in those cases.
+					 *
+					 * Set 'c' to skip whole character correctly in multi-byte
+					 * encodings. If we don't have the whole character in the
+					 * buffer yet, we might loop back to process it, after all,
+					 * but that's OK because multi-byte characters cannot have any
+					 * special meaning.
 					 */
 					raw_buf_ptr++;
+					c = c2;
+				}
 			}
 
 			/*
