Feature/escape unicode control chars by cpjulia · Pull Request #14805 · arangodb/arangodb · GitHub

Feature/escape unicode control chars #14805


Merged
merged 27 commits into from
Sep 30, 2021
Changes from 1 commit
Commits
4bfa630
Added parser for retaining or escaping control and unicode characters…
cpjulia Sep 18, 2021
600deca
Merge branch 'devel' of https://github.com/arangodb/arangodb into fea…
cpjulia Sep 20, 2021
feb58b7
Added unicode escaping for 4 bytes representation, parsing for broken…
cpjulia Sep 20, 2021
616265e
Merge branch 'devel' of https://github.com/arangodb/arangodb into fea…
cpjulia Sep 20, 2021
f48f87a
Added more tests
cpjulia Sep 20, 2021
4cc1e64
Removed unused functions, updated CHANGELOG, removed unused include i…
cpjulia Sep 20, 2021
2448e9c
Resolved CHANGELOG conflict from merge with devel
cpjulia Sep 20, 2021
3c29365
Update tests/Logger/EscaperTest.cpp
cpjulia Sep 20, 2021
bfa05eb
Update lib/Logger/LoggerFeature.cpp
cpjulia Sep 20, 2021
0ca0d25
Update lib/Logger/LoggerFeature.h
cpjulia Sep 20, 2021
39f97e1
Update lib/Logger/Escaper.h
cpjulia Sep 20, 2021
7a3480f
Update lib/Logger/Escaper.cpp
cpjulia Sep 20, 2021
963921b
Update CHANGELOG
cpjulia Sep 20, 2021
015a188
Update CHANGELOG
cpjulia Sep 21, 2021
a007926
Update CHANGELOG
cpjulia Sep 21, 2021
2f8f3a2
Updated CHANGELOG
cpjulia Sep 21, 2021
3067a5a
Added more tests, updated CHANGELOG
cpjulia Sep 21, 2021
482e186
Update tests/Logger/EscaperTest.cpp
cpjulia Sep 22, 2021
0ea373f
Update tests/Logger/EscaperTest.cpp
cpjulia Sep 22, 2021
c9ad897
Update CHANGELOG
cpjulia Sep 22, 2021
832f569
Update CHANGELOG
cpjulia Sep 22, 2021
7950a3f
Update CHANGELOG
cpjulia Sep 22, 2021
7818ca2
Merge branch 'devel' of github.com:arangodb/arangodb into feature/esc…
jsteemann Sep 22, 2021
a3d9738
Merge branch 'devel' into feature/escape-unicode-control-chars
cpjulia Sep 27, 2021
d9385b1
Merge branch 'devel' of https://github.com/arangodb/arangodb into fea…
cpjulia Sep 29, 2021
432a737
Merge branch 'feature/escape-unicode-control-chars' of https://github…
cpjulia Sep 29, 2021
92f3cbf
Merge branch 'devel' into feature/escape-unicode-control-chars
mchacki Sep 30, 2021
Removed unused functions, updated CHANGELOG, removed unused include in unit test
cpjulia committed Sep 20, 2021
commit 4cc1e64ff62b3d74b1975452ee48b74962d5ca39
26 changes: 26 additions & 0 deletions CHANGELOG
@@ -1,6 +1,32 @@
devel
-----

* feature/escape-unicode-control-chars: the server now has two flags for
retaining or escaping control and Unicode characters in the log. The flag
`--log.escape` is deprecated; the new flags `--log.escape-control-chars` and
`--log.escape-unicode-chars` should be used instead.

- `--log.escape-control-chars`: this flag applies to the control characters,
which have hex codes below `\x20`, and also to the character DEL, with hex
code `\x7F`. When its value is set to false, the control character will be
retained, and its actual value will be displayed when it is a visible
character, or a space ` ` character will be displayed if it is not a visible
character. The same will happen to the `DEL` character (code `\x7F`), even
though its code is not below `\x20`, because it is not visible. For example,
control character `\n` is visible, so a `\n` will be displayed in the log, and
control character `BEL` is not visible, so a space ` ` would be displayed.
When its value is set to true, the hex code for the character is displayed;
for example, the `BEL` character would be displayed as its hex code, `\x07`.
The default value for this flag is `true` for compatibility with
previous versions.

- `--log.escape-unicode-chars`: when its value is set to false, the Unicode
character will be retained, and its actual value will be displayed. For
example, `犬` will be displayed as `犬`. When its value is set to true, the
character is escaped, and the hex code for the character is displayed. For
example, `犬` would be displayed as its hex code, `\u72AC`.
The default value for this flag is set to `false` for compatibility with
previous versions.
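The two flags described above can be illustrated with a minimal C++ sketch. It is not ArangoDB's `lib/Logger/Escaper.cpp`; the function name `escapeForLog` and its two boolean parameters are hypothetical stand-ins for `--log.escape-control-chars` and `--log.escape-unicode-chars`. It follows the CHANGELOG text literally and adds a few labeled assumptions: which control characters count as "visible" (here `\n`, `\r` and `\t`, kept as raw bytes), how code points above U+FFFF are escaped (here as a UTF-16 surrogate pair, as JSON does), and how a malformed UTF-8 byte is shown (here as `?`).

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>
#include <string_view>

// Hypothetical sketch of the two log options from the CHANGELOG entry;
// not ArangoDB's actual Escaper implementation.
//   escapeControl ~ --log.escape-control-chars
//   escapeUnicode ~ --log.escape-unicode-chars
std::string escapeForLog(std::string_view in, bool escapeControl,
                         bool escapeUnicode) {
  std::string out;
  out.reserve(in.size());
  char buf[16];
  std::size_t i = 0;
  while (i < in.size()) {
    auto c = static_cast<unsigned char>(in[i]);
    if (c < 0x20 || c == 0x7F) {  // control characters and DEL
      if (escapeControl) {
        std::snprintf(buf, sizeof(buf), "\\x%02X", static_cast<unsigned>(c));
        out += buf;  // e.g. BEL -> "\x07"
      } else if (c == '\n' || c == '\r' || c == '\t') {
        out += static_cast<char>(c);  // assumption: these count as "visible"
      } else {
        out += ' ';  // not visible: replaced by a space
      }
      ++i;
      continue;
    }
    if (c < 0x80) {  // printable ASCII is copied unchanged
      out += static_cast<char>(c);
      ++i;
      continue;
    }
    // Multi-byte UTF-8 sequence: determine its length and decode the code point.
    std::size_t len = 0;
    std::uint32_t cp = 0;
    if (c >= 0xC2 && c <= 0xDF) { len = 2; cp = c & 0x1F; }
    else if (c >= 0xE0 && c <= 0xEF) { len = 3; cp = c & 0x0F; }
    else if (c >= 0xF0 && c <= 0xF4) { len = 4; cp = c & 0x07; }
    bool ok = len != 0 && i + len <= in.size();
    for (std::size_t k = 1; ok && k < len; ++k) {
      auto cc = static_cast<unsigned char>(in[i + k]);
      if ((cc & 0xC0) != 0x80) {
        ok = false;  // not a continuation byte
      } else {
        cp = (cp << 6) | (cc & 0x3F);
      }
    }
    if (!ok) {  // assumption: a broken byte is shown as '?'
      out += '?';
      ++i;
      continue;
    }
    if (!escapeUnicode) {
      out.append(in.data() + i, len);  // retained as-is
    } else if (cp <= 0xFFFF) {
      std::snprintf(buf, sizeof(buf), "\\u%04X", static_cast<unsigned>(cp));
      out += buf;  // e.g. U+72AC -> "\u72AC"
    } else {  // assumption: UTF-16 surrogate pair, as in JSON
      std::uint32_t v = cp - 0x10000;
      std::snprintf(buf, sizeof(buf), "\\u%04X\\u%04X",
                    static_cast<unsigned>(0xD800 + (v >> 10)),
                    static_cast<unsigned>(0xDC00 + (v & 0x3FF)));
      out += buf;
    }
    i += len;
  }
  return out;
}
```

With both flags off, `犬` passes through unchanged; with `escapeUnicode` set, its three UTF-8 bytes (`E7 8A AC`) become `\u72AC`, and with `escapeControl` set a `BEL` byte becomes `\x07`.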


* APM-60: optionally allow special characters and Unicode characters in
database names.

62 changes: 0 additions & 62 deletions lib/Basics/tri-strings.cpp
@@ -303,68 +303,6 @@ char* TRI_SHA256String(char const* source, size_t sourceLen, size_t* dstLen) {
  return (char*)dst;
}

////////////////////////////////////////////////////////////////////////////////
/// @brief escapes special characters using C escapes
/// the target buffer must have been allocated already and big enough to hold
/// the result of at most (4 * inLength) + 2 bytes!
////////////////////////////////////////////////////////////////////////////////

char* TRI_EscapeControlsCString(char const* in, size_t inLength, char* out,
                                size_t* outLength, bool appendNewline) {
  if (out == nullptr) {
    return nullptr;
  }

  char* qtr = out;
  char const* ptr;
  char const* end;

  for (ptr = in, end = ptr + inLength; ptr < end; ptr++, qtr++) {
    uint8_t n;

    switch (*ptr) {
      case '\n':
        *qtr++ = '\\';
        *qtr = 'n';
        break;

      case '\r':
        *qtr++ = '\\';
        *qtr = 'r';
        break;

      case '\t':
        *qtr++ = '\\';
        *qtr = 't';
        break;

      default:
        n = (uint8_t)(*ptr);

        if (n < 32) {
          uint8_t n1 = n >> 4;
          uint8_t n2 = n & 0x0F;

          *qtr++ = '\\';
          *qtr++ = 'x';
          *qtr++ = (n1 < 10) ? ('0' + n1) : ('A' + n1 - 10);
          *qtr = (n2 < 10) ? ('0' + n2) : ('A' + n2 - 10);
        } else {
          *qtr = *ptr;
        }

        break;
    }
  }

  if (appendNewline) {
    *qtr++ = '\n';
  }

  *qtr = '\0';
  *outLength = static_cast<size_t>(qtr - out);
  return out;
}
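For comparison, the removed function's escaping rules can be expressed without a caller-supplied buffer. The following is a hypothetical sketch (`escapeControls` is not an ArangoDB function) that applies the same per-byte rules as `TRI_EscapeControlsCString` but appends to a `std::string`, so the `(4 * inLength) + 2` buffer-sizing contract is no longer needed. Like the original, it leaves DEL (`\x7F`) and the bytes of multi-byte UTF-8 sequences untouched.

```cpp
#include <cstdint>
#include <string>
#include <string_view>

// Hypothetical std::string-based equivalent of the removed
// TRI_EscapeControlsCString: same per-byte rules, but the result grows as
// needed instead of requiring a preallocated (4 * inLength) + 2 byte buffer.
std::string escapeControls(std::string_view in, bool appendNewline) {
  static constexpr char kHex[] = "0123456789ABCDEF";
  std::string out;
  out.reserve(in.size() + 1);
  for (char ch : in) {
    auto n = static_cast<std::uint8_t>(ch);
    switch (ch) {
      case '\n': out += "\\n"; break;
      case '\r': out += "\\r"; break;
      case '\t': out += "\\t"; break;
      default:
        if (n < 32) {  // other control characters become \xHH (uppercase hex)
          out += "\\x";
          out += kHex[n >> 4];
          out += kHex[n & 0x0F];
        } else {
          out += ch;  // everything else, including DEL and UTF-8 bytes, is copied
        }
        break;
    }
  }
  if (appendNewline) {
    out += '\n';
  }
  return out;
}
```

For example, `escapeControls("a\tb", false)` yields the four characters `a\tb`, and a lone `\x1F` byte yields `\x1F`, the four-byte worst case that the old buffer size had to account for.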

////////////////////////////////////////////////////////////////////////////////
/// @brief unescapes unicode escape sequences
18 changes: 0 additions & 18 deletions lib/Basics/tri-strings.h
@@ -109,24 +109,6 @@ void TRI_FreeString(char*) noexcept;

char* TRI_SHA256String(char const* source, size_t sourceLen, size_t* dstLen);

////////////////////////////////////////////////////////////////////////////////
/// @brief returns the maximum result length for an escaped string
/// (4 * inLength) + 2 bytes!
////////////////////////////////////////////////////////////////////////////////

constexpr size_t TRI_MaxLengthEscapeControlsCString(size_t inLength) {
  return (4 * inLength) + 2;  // for newline and 0 byte
}
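The bound in `TRI_MaxLengthEscapeControlsCString` follows from the per-byte rules of the removed escaper: `\n`, `\r` and `\t` grow to 2 bytes, other bytes below 32 grow to 4 (`\xHH`), and all remaining bytes are copied, so n input bytes need at most 4n bytes, plus 1 for the optional newline and 1 for the terminating NUL. The compile-time check below restates those rules under hypothetical names (`escapedBytesFor`, `maxLengthEscaped`, `worstPerByte`) and confirms that no byte expands past 4.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical copy of the formula in TRI_MaxLengthEscapeControlsCString.
constexpr std::size_t maxLengthEscaped(std::size_t inLength) {
  return (4 * inLength) + 2;  // +1 optional newline, +1 terminating NUL
}

// Bytes the removed TRI_EscapeControlsCString writes for one input byte.
constexpr std::size_t escapedBytesFor(std::uint8_t n) {
  if (n == '\n' || n == '\r' || n == '\t') {
    return 2;  // "\n", "\r", "\t"
  }
  if (n < 32) {
    return 4;  // "\xHH"
  }
  return 1;  // copied unchanged
}

// Largest expansion over all 256 possible byte values.
constexpr std::size_t worstPerByte() {
  std::size_t worst = 0;
  for (unsigned v = 0; v < 256; ++v) {
    std::size_t b = escapedBytesFor(static_cast<std::uint8_t>(v));
    if (b > worst) {
      worst = b;
    }
  }
  return worst;
}

static_assert(worstPerByte() == 4, "no byte expands to more than 4 bytes");
static_assert(maxLengthEscaped(10) == 42, "10 bytes -> 40 + newline + NUL");
```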

////////////////////////////////////////////////////////////////////////////////
/// @brief escapes special characters using C escapes
/// the target buffer must have been allocated already and big enough to hold
/// the result of at most (4 * inLength) + 2 bytes!
////////////////////////////////////////////////////////////////////////////////

char* TRI_EscapeControlsCString(char const* in, size_t inLength, char* out,
                                size_t* outLength, bool appendNewline);

////////////////////////////////////////////////////////////////////////////////
/// @brief unescapes unicode escape sequences
///
1 change: 0 additions & 1 deletion tests/Logger/EscaperTest.cpp
@@ -28,7 +28,6 @@

#include "Logger/Escaper.h"

#include "Logger/LogMacros.h"

#include <string.h>
#include <string>