-
Notifications
You must be signed in to change notification settings - Fork 9
PG-1465 Add ASAN and UBSAN sanitize 8000 rs #285
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: TDE_REL_17_STABLE
Are you sure you want to change the base?
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
❌ Your project status has failed because the head coverage (80.07%) is below the target coverage (90.00%). You can increase the head coverage or adjust the target coverage. Additional details and impacted files@@ Coverage Diff @@
## TDE_REL_17_STABLE #285 +/- ##
==================================================
Coverage 80.07% 80.07%
==================================================
Files 22 22
Lines 2569 2570 +1
Branches 402 403 +1
==================================================
+ Hits 2057 2058 +1
Misses 434 434
Partials 78 78
🚀 New features to boost your workflow:
|
6fa0c9f
to
bf3ad43
Compare
bf3ad43
to
5b2b56e
Compare
5b2b56e
to
42bb329
Compare
4299778
to
b04f73e
Compare
b04f73e
to
4fcf202
Compare
4fcf202
to
ef637b7
Compare
@@ -781,7 +781,10 @@ scan_key_provider_file(ProviderScanType scanType, void *scanKey, Oid dbOid) | |||
providers_list = lappend(providers_list, keyring); | |||
#else | |||
if (providers_list == NULL) | |||
{ | |||
providers_list = palloc_object(SimplePtrList); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should maybe use palloc0_object()
here instead so that everything is null by default?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, make sense. Changed.
Add CI workflow with address and undefined behaviour santitzers.
Sanitizers increase time needed by server to shut down. That's not a problem itself, but pg_ctl has problem that made this tests flacky. If some child processes still alive at the moment when test tries to start new instance whole test may stuck until timeout will be reached. https://www.postgresql.org/message-id/1709793.1644523137%40sss.pgh.pa.us
Use palloc0_object to initialize tail and head pointers to zero value.
ef637b7
to
2fae6b3
Compare
leak:preproc.y | ||
leak:pgtypeslib/common.c | ||
leak:ecpglib/memory.c | ||
leak:kmip.c |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we create a ticket/tickets about this? Some of these supressions definitely seem like issues.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Most of them are not real issues, but "design features" or just allocation functions. I used timescale as a reference and they mute the same things: https://github.com/timescale/timescaledb/blob/main/scripts/suppressions/suppr_leak.txt
But I'm agree that this list should be reviewed. Probably some mutes scope can be reduced. Here is the ticket: https://perconadev.atlassian.net/browse/PG-1588
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The kmip.c
one is likely to be an actual bug. But, yeah, that can be outside the scope of this PR:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, and strdup also seems to be a possible issue, or at least it can hide possible issues, as it suppresses any leaked strings created by strdup - even if it's a leak in postgres itself, we should look at that.
@@ -31,6 +31,7 @@ | |||
|
|||
PGTDE::append_to_result_file("-- kill -9"); | |||
$node->kill9(); | |||
sleep(3); # sanitizers slow down the kill, let server die in peace |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe I am missing something but how can kill -9
be slow?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here is the thread where exact same problem discussed: https://www.postgresql.org/message-id/flat/1709793.1644523137%40sss.pgh.pa.us#c4caef6ad812ecaceeb5b711b02ee8f4
So as I understood kill -9
could wait for program to terminate syscals/coredumps and it looks like ASAN doing something that slightly delays node death.
|
||
sanitize) | ||
echo "Building with sanitize option" | ||
export CFLAGS="-fsanitize=address -fsanitize=undefined -fno-omit-frame-pointer -fno-inline-functions" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason why you disable inlining? LLVM's documentation says nothing about that as far as I can see.
https://clang.llvm.org/docs/UndefinedBehaviorSanitizer.html#stack-traces-and-report-symbolization
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
https://clang.llvm.org/docs/AddressSanitizer.html#usage
To get a reasonable performance add -O1 or higher. To get nicer stack traces in error messages add -fno-omit-frame-pointer. To get perfect stack traces you may need to disable inlining (just use -O1) and tail call elimination (-fno-optimize-sibling-calls).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is for ASAN. BTW it looks like it's worth to add -fno-optimize-sibling-calls
as well.
sanitize) | ||
echo "Building with sanitize option" | ||
export CFLAGS="-fsanitize=address -fsanitize=undefined -fno-omit-frame-pointer -fno-inline-functions" | ||
BUILD_TYPE=debug |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the best practices about optimization and sanitizers? Our production build will be -O2
but maybe that fucks up sanitizers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't see information that -O2
could harm sanitizer checks, however lower optimization levels simplify report reviews. I hope there should be no difference between -O0
and -O2
in terms of leaks and UBs, am I wrong?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approved for contribute and contributing changes to docs
https://perconadev.atlassian.net/browse/PG-1465
This PR adds CI workflow with ASAN(including LSAN) and UBSAN sanitizers. To make this workflow pass it required to modify couple tests that were affected by sanitizers influence. Running
check-world
takes 46 minutes, so it looks like overkill for workflow that we will run for every PR. Because of that I ended withpg_tde
tests + PG regression tests.One real issue were found with sanitizers in
tde_keyring.c
file (access uninitialized pointer)