8000 PG-1465 Add ASAN and UBSAN sanitizers by artemgavrilov · Pull Request #285 · percona/postgres · GitHub
[go: up one dir, main page]

Skip to content

PG-1465 Add ASAN and UBSAN sanitize 8000 rs #285

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 3 commits into
base: TDE_REL_17_STABLE
Choose a base branch
from

Conversation

artemgavrilov
Copy link
Collaborator
@artemgavrilov artemgavrilov commented Apr 29, 2025

https://perconadev.atlassian.net/browse/PG-1465

This PR adds CI workflow with ASAN(including LSAN) and UBSAN sanitizers. To make this workflow pass it required to modify couple tests that were affected by sanitizers influence. Running check-world takes 46 minutes, so it looks like overkill for workflow that we will run for every PR. Because of that I ended with pg_tde tests + PG regression tests.

One real issue were found with sanitizers in tde_keyring.c file (access uninitialized pointer)

@codecov-commenter
Copy link
codecov-commenter commented Apr 29, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 80.07%. Comparing base (7f2f26e) to head (5b7c987).

❌ Your project status has failed because the head coverage (80.07%) is below the target coverage (90.00%). You can increase the head coverage or adjust the target coverage.

Additional details and impacted files
@@                Coverage Diff                 @@
##           TDE_REL_17_STABLE     #285   +/-   ##
==================================================
  Coverage              80.07%   80.07%           
==================================================
  Files                     22       22           
  Lines                   2569     2570    +1     
  Branches                 402      403    +1     
==================================================
+ Hits                    2057     2058    +1     
  Misses                   434      434           
  Partials                  78       78           
Components Coverage Δ
access 83.06% <ø> (ø)
catalog 87.57% <ø> (ø)
common 92.42% <ø> (ø)
encryption 71.90% <ø> (ø)
keyring 72.07% <ø> (ø)
src 64.60% <ø> (ø)
smgr 97.08% <ø> (+0.02%) ⬆️
transam ∅ <ø> (∅)
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@artemgavrilov artemgavrilov force-pushed the PG-1465-sanitizers branch 7 times, most recently from 6fa0c9f to bf3ad43 Compare May 6, 2025 15:38
@artemgavrilov artemgavrilov changed the title [WIP] PG-1465 Sanitizers PG-1465 Add ASAN and UBSAN sanitizers May 7, 2025
@artemgavrilov artemgavrilov marked this pull request as ready for review May 7, 2025 13:54
@artemgavrilov artemgavrilov requested a review from dAdAbird as a code owner May 7, 2025 13:54
@@ -781,7 +781,10 @@ scan_key_provider_file(ProviderScanType scanType, void *scanKey, Oid dbOid)
providers_list = lappend(providers_list, keyring);
#else
if (providers_list == NULL)
{
providers_list = palloc_object(SimplePtrList);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should maybe use palloc0_object() here instead so that everything is null by default?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, make sense. Changed.

Add CI workflow with address and undefined behaviour santitzers.
Sanitizers increase time needed by server to shut down. That's not a
problem itself, but pg_ctl has problem that made this tests flacky. If
some child processes still alive at the moment when test tries to start
new instance whole test may stuck until timeout will be reached.

https://www.postgresql.org/message-id/1709793.1644523137%40sss.pgh.pa.us
Use palloc0_object to initialize tail and head pointers to zero value.
leak:preproc.y
leak:pgtypeslib/common.c
leak:ecpglib/memory.c
leak:kmip.c
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we create a ticket/tickets about this? Some of these supressions definitely seem like issues.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most of them are not real issues, but "design features" or just allocation functions. I used timescale as a reference and they mute the same things: https://github.com/timescale/timescaledb/blob/main/scripts/suppressions/suppr_leak.txt

But I'm agree that this list should be reviewed. Probably some mutes scope can be reduced. Here is the ticket: https://perconadev.atlassian.net/browse/PG-1588

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The kmip.c one is likely to be an actual bug. But, yeah, that can be outside the scope of this PR:

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, and strdup also seems to be a possible issue, or at least it can hide possible issues, as it suppresses any leaked strings created by strdup - even if it's a leak in postgres itself, we should look at that.

@@ -31,6 +31,7 @@

PGTDE::append_to_result_file("-- kill -9");
$node->kill9();
sleep(3); # sanitizers slow down the kill, let server die in peace
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe I am missing something but how can kill -9 be slow?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is the thread where exact same problem discussed: https://www.postgresql.org/message-id/flat/1709793.1644523137%40sss.pgh.pa.us#c4caef6ad812ecaceeb5b711b02ee8f4

So as I understood kill -9 could wait for program to terminate syscals/coredumps and it looks like ASAN doing something that slightly delays node death.


sanitize)
echo "Building with sanitize option"
export CFLAGS="-fsanitize=address -fsanitize=undefined -fno-omit-frame-pointer -fno-inline-functions"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason why you disable inlining? LLVM's documentation says nothing about that as far as I can see.

https://clang.llvm.org/docs/UndefinedBehaviorSanitizer.html#stack-traces-and-report-symbolization

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://clang.llvm.org/docs/AddressSanitizer.html#usage

To get a reasonable performance add -O1 or higher. To get nicer stack traces in error messages add -fno-omit-frame-pointer. To get perfect stack traces you may need to disable inlining (just use -O1) and tail call elimination (-fno-optimize-sibling-calls).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is for ASAN. BTW it looks like it's worth to add -fno-optimize-sibling-calls as well.

sanitize)
echo "Building with sanitize option"
export CFLAGS="-fsanitize=address -fsanitize=undefined -fno-omit-frame-pointer -fno-inline-functions"
BUILD_TYPE=debug
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the best practices about optimization and sanitizers? Our production build will be -O2 but maybe that fucks up sanitizers.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't see information that -O2 could harm sanitizer checks, however lower optimization levels simplify report reviews. I hope there should be no difference between -O0 and -O2 in terms of leaks and UBs, am I wrong?

Copy link
Collaborator
@Andriciuc Andriciuc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved for contribute and contributing changes to docs

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants
0