8000 Enable native memory profiling on the async-profiler by Baraa-Hasheesh · Pull Request #1323 · async-profiler/async-profiler · GitHub
[go: up one dir, main page]

Skip to content

Conversation

Baraa-Hasheesh
Copy link
Contributor

Description

Enable Native memory profiling on the async-profiler shared objects

Previously this profiling was removed as to solve #1266

This PR keeps that fix by reading the GOT entry before it is overwritten by the profiler & uses it to call the correct malloc via the dynamic linker

Related issues

N/A

10BC0

Motivation and context

Allow async-profiler to detect memory leaks inside it self

How has this been tested?

make test


By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@Baraa-Hasheesh Baraa-Hasheesh changed the title Self native mem profiling enable Enable native memory profiling on the async-profiler May 29, 2025
@Baraa-Hasheesh
Copy link
Contributor Author

I've been doing additional testing on this & found that it should also fix #1310.

This is due to how resolution of original memory allocation function is made, as it's done via the original GOT entry value which will contain the correct malloc to call for the shared object provided by the dynamic linker (For one profiler).

  • When the first profiler starts that will always be the real function to call provided by the linker.
  • When the second profiler is loaded that will be modified during the dlopen by the first profiler to call the first profiler hook
  • When the second profiler is started it will find the first profiler hook as the real function to call in the GOT entry

This means any future memory allocation will start to have the following flow after all 3 steps are concluded
malloc -> second profiler hook -> first profiler hook -> original malloc

The order of native memory profiling sessions start between profilers won't be important as the first profiler to start will always have the original malloc in it's GOT entries which will stop any potential infinite recursive chain from ever happening

if (ptr) free(ptr);

ptr = malloc(1);
if (ptr) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

free perfectly works with NULL pointer. Code can be a lot simpler without if checks.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did oppressive some some sigfaults on MacOs that made me add all the existing if statements just for safety

I will double check which if statements can be removed (I believe I can remove all other than the realloc if check)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I reduced the checks here 3dfc520

I've kept checks before calling realloc & after calling posix_memalign

I also moved the symbols to their own function as I noticed that MacOs was optimizing some of them out when removing the if checks,
This new function is Marked to not be optimized

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, newer GCC versions still optimize calloc and aligned_alloc in this code, leading to a crash.
I came up with this implementation that retains all calls even with maximum optimization level.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @apangin the implementation does look great,
I will add it on my branch

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@apangin applied here 376eeb0

ptr = aligned_alloc(1, 1);
if (ptr) free(ptr);

_orig_malloc = (malloc_t)*lib->findImport(im_malloc);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not standard dlsym() approach?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using dlsym will cause an absolute symbol resolution rather than a lazy symbol
The resolved absolute symbol via dlsym could be the wrong one (Not the one we want to call)

Using RTLD_NEXT previously was doing that, as it was resolving the wrong malloc function to call

I did try using RTLD_DEFAULT & a dlopen handle to evaluate the symbols but both resulted in test failures

As such I decided on using the lazy dynamic symbols as it was the safest choice as it will be equivalent to the current state of async profiler of using the dynamic linker.

The dlopen handle caused the following test to fail (evaluating wrong malloc)
NativememTests.preloadMalloc/LD_PRELOAD+profiler_first
NativememTests.preloadMalloc/LD_PRELOAD+profiler_second
NativememTests.preloadMalloc/api_test

The RTLD_DEFAULT caused the following tests to fail (Infinite callback loop)
ativememTests.malloc_plt_dyn

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I checked that malloc_plt_dyn test indeed fails on Linux, depending on the compiler version and flags.
If the tests program has BIND_NOW flag, the test passes; if not, dlsym returns an address of PLT stub instead of the actual libc implementation.

Unfortunately, current code and the proposed findImport approach may also both fail depending on the compiler options. For example, if libasyncProfiler is compiled with -fno-plt flag, findImport will also return an address of PLT stub, resulting in subsequent recursion.

Copy link
Contributor Author
@Baraa-Hasheesh Baraa-Hasheesh Jun 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@apangin thanks for the follow up analysis on malloc_plt_dyn & dlsym

Regarding the compile with -fno-plt, I tried compiling the async-profiler with -fno-plt however I didn't encounter any infinite loops

make clean; CFLAGS_EXTRA=-fno-plt CXXFLAGS_EXTRA=-fno-plt make test

can you provide me what gcc version you're currently using & how the compilation was made?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@apangin I was doing some experimentation & managed to create an infinite loop on async-profiler master branch

This was done using the following command

CFLAGS_EXTRA="-fno-plt -Wl,-Bsymbolic" CXXFLAGS_EXTRA="-fno-plt -Wl,-Bsymbolic" make test TESTS=nativemem;  

Same command doesn't fail on this branch

From what I can tell, this happens as when the PLT stubs were removed, the async-profiler ended up using a reference for malloc that was later patched/changed by the profiler.

However this doesn't happen in this branch as the profiler saves the old Reference before it's changed by the profiler, hence avoiding any infinite loop possibility.

so effectively this change shouldn't have infinite loops as it doesn't depend on stubs existing or not,
It simply depends on the dynamic linker writing a correct value that would end up calling the correct malloc.

As long as a correct value is written, an infinite loop should never happen as we save that value/reference before any patching is done to use it as our final callback (Breaks any potential chain).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried make CXXFLAGS="-O3 -fno-plt" test
CXXFLAGS="-O3" alone breaks some tests, but -fno-plt breaks even more.
(note these are CXXFLAGS, not CXXFLAGS_EXTRA)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean, this was already a problem in existing code. Your change does not make it worse, so I'm happy to accept it once remaining issues are fixed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@apangin Thanks for the provided information

I've applied the latest feedback here 376eeb0

calloc_t _orig_calloc;
realloc_t _orig_realloc;
posix_memalign_t _orig_posix_memalign;
aligned_alloc_t _orig_aligned_alloc;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All file-local variables and functions must be declared static.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

handled here a4a36cd

typedef void* (*calloc_t)(size_t,size_t);
typedef void* (*realloc_t)(void*, size_t);
typedef int (*posix_memalign_t)(void**, size_t, size_t);
typedef void* (*aligned_alloc_t)(size_t, size_t);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Too many typedefs. It's possible to avoid all these definitions just by declaring _orig_* function with a proper type once and then using decltype.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

handled here a4a36cd

_orig_calloc = (calloc_t)*lib->findImport(im_calloc);
_orig_realloc = (realloc_t)*lib->findImport(im_realloc);
_orig_posix_memalign = (posix_memalign_t)*lib->findImport(im_posix_memalign);
_orig_aligned_alloc = (aligned_alloc_t)*lib->findImport(im_aligned_alloc);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Each function name is repeated 3 times, which makes it harder to read and to modify.
This is one the cases where macros are really good (althogh they are generally discouraged).
I'd write something like:

SAVE_ORIG(malloc);
SAVE_ORIG(free);
...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

C630

handled here a4a36cd

@apangin apangin merged commit 19ad42c into async-profiler:master Jun 3, 2025
29 of 34 checks passed
roy-soumadipta pushed a commit to roy-soumadipta/async-profiler that referenced this pull request Jun 20, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants

0