-
Notifications
You must be signed in to change notification settings - Fork 936
Enable native memory profiling on the async-profiler #1323
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Enable native memory profiling on the async-profiler #1323
Conversation
I've been doing additional testing on this & found that it should also fix #1310. This is due to how resolution of original memory allocation function is made, as it's done via the original GOT entry value which will contain the correct malloc to call for the shared object provided by the dynamic linker (For one profiler).
This means any future memory allocation will start to have the following flow after all 3 steps are concluded The order of native memory profiling sessions start between profilers won't be important as the first profiler to start will always have the |
src/mallocTracer.cpp
Outdated
if (ptr) free(ptr); | ||
|
||
ptr = malloc(1); | ||
if (ptr) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
free
perfectly works with NULL pointer. Code can be a lot simpler without if
checks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did oppressive some some sigfaults on MacOs that made me add all the existing if
statements just for safety
I will double check which if statements can be removed (I believe I can remove all other than the realloc
if check)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I reduced the checks here 3dfc520
I've kept checks before calling realloc & after calling posix_memalign
I also moved the symbols to their own function as I noticed that MacOs was optimizing some of them out when removing the if checks,
This new function is Marked to not be optimized
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately, newer GCC versions still optimize calloc
and aligned_alloc
in this code, leading to a crash.
I came up with this implementation that retains all calls even with maximum optimization level.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @apangin the implementation does look great,
I will add it on my branch
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
src/mallocTracer.cpp
Outdated
ptr = aligned_alloc(1, 1); | ||
if (ptr) free(ptr); | ||
|
||
_orig_malloc = (malloc_t)*lib->findImport(im_malloc); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not standard dlsym()
approach?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using dlsym
will cause an absolute symbol resolution rather than a lazy symbol
The resolved absolute symbol via dlsym
could be the wrong one (Not the one we want to call)
Using RTLD_NEXT previously was doing that, as it was resolving the wrong malloc
function to call
I did try using RTLD_DEFAULT & a dlopen handle to evaluate the symbols but both resulted in test failures
As such I decided on using the lazy dynamic symbols as it was the safest choice as it will be equivalent to the current state of async profiler of using the dynamic linker.
The dlopen handle caused the following test to fail (evaluating wrong malloc)
NativememTests.preloadMalloc/LD_PRELOAD+profiler_first
NativememTests.preloadMalloc/LD_PRELOAD+profiler_second
NativememTests.preloadMalloc/api_test
The RTLD_DEFAULT caused the following tests to fail (Infinite callback loop)
ativememTests.malloc_plt_dyn
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I checked that malloc_plt_dyn
test indeed fails on Linux, depending on the compiler version and flags.
If the tests program has BIND_NOW
flag, the test passes; if not, dlsym
returns an address of PLT stub instead of the actual libc implementation.
Unfortunately, current code and the proposed findImport
approach may also both fail depending on the compiler options. For example, if libasyncProfiler is compiled with -fno-plt
flag, findImport
will also return an address of PLT stub, resulting in subsequent recursion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@apangin thanks for the follow up analysis on malloc_plt_dyn
& dlsym
Regarding the compile with -fno-plt
, I tried compiling the async-profiler
with -fno-plt
however I didn't encounter any infinite loops
make clean; CFLAGS_EXTRA=-fno-plt CXXFLAGS_EXTRA=-fno-plt make test
can you provide me what gcc version you're currently using & how the compilation was made?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@apangin I was doing some experimentation & managed to create an infinite loop on async-profiler master branch
This was done using the following command
CFLAGS_EXTRA="-fno-plt -Wl,-Bsymbolic" CXXFLAGS_EXTRA="-fno-plt -Wl,-Bsymbolic" make test TESTS=nativemem;
Same command doesn't fail on this branch
From what I can tell, this happens as when the PLT stubs were removed, the async-profiler ended up using a reference for malloc that was later patched/changed by the profiler.
However this doesn't happen in this branch as the profiler saves the old Reference before it's changed by the profiler, hence avoiding any infinite loop possibility.
so effectively this change shouldn't have infinite loops as it doesn't depend on stubs existing or not,
It simply depends on the dynamic linker writing a correct value that would end up calling the correct malloc.
As long as a correct value is written, an infinite loop should never happen as we save that value/reference before any patching is done to use it as our final callback (Breaks any potential chain).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried make CXXFLAGS="-O3 -fno-plt" test
CXXFLAGS="-O3"
alone breaks some tests, but -fno-plt
breaks even more.
(note these are CXXFLAGS
, not CXXFLAGS_EXTRA
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean, this was already a problem in existing code. Your change does not make it worse, so I'm happy to accept it once remaining issues are fixed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
src/mallocTracer.cpp
Outdated
calloc_t _orig_calloc; | ||
realloc_t _orig_realloc; | ||
posix_memalign_t _orig_posix_memalign; | ||
aligned_alloc_t _orig_aligned_alloc; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All file-local variables and functions must be declared static
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
handled here a4a36cd
src/mallocTracer.cpp
Outdated
typedef void* (*calloc_t)(size_t,size_t); | ||
typedef void* (*realloc_t)(void*, size_t); | ||
typedef int (*posix_memalign_t)(void**, size_t, size_t); | ||
typedef void* (*aligned_alloc_t)(size_t, size_t); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Too many typedefs. It's possible to avoid all these definitions just by declaring _orig_*
function with a proper type once and then using decltype
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
handled here a4a36cd
src/mallocTracer.cpp
Outdated
_orig_calloc = (calloc_t)*lib->findImport(im_calloc); | ||
_orig_realloc = (realloc_t)*lib->findImport(im_realloc); | ||
_orig_posix_memalign = (posix_memalign_t)*lib->findImport(im_posix_memalign); | ||
_orig_aligned_alloc = (aligned_alloc_t)*lib->findImport(im_aligned_alloc); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Each function name is repeated 3 times, which makes it harder to read and to modify.
This is one the cases where macros are really good (althogh they are generally discouraged).
I'd write something like:
SAVE_ORIG(malloc);
SAVE_ORIG(free);
...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
handled here a4a36cd
Description
Enable Native memory profiling on the async-profiler shared objects
Previously this profiling was removed as to solve #1266
This PR keeps that fix by reading the GOT entry before it is overwritten by the profiler & uses it to call the correct malloc via the dynamic linker
Related issues
N/A
10BC0Motivation and context
Allow async-profiler to detect memory leaks inside it self
How has this been tested?
make test
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.