[python3-libraries] #2567
Wow great work, I'll give this a more thorough look soon.
Try passing the ASAN flag to configure like this: oss-fuzz/projects/cpython3/build.sh, lines 12 to 15 in e8df83f
@gpshead might be able to weigh in on that, from a quick look at the diff it doesn't seem too intrusive: python/cpython@master...guidovranken:fuzzing Although stuff like using boost will definitely have to go if you want to merge it upstream. (Edit: whoops, missed all the changes to ceval.c, those might be a fair bit harder to swallow.) |
That |
Thanks, yep that should work. Speed is important, so the question is how to get this data into a C array as efficiently as possible, preferably without a chain of intermediate functions. I suppose I can re-implement |
Changing the |
@ammaraskar I copied part of your build script and the leaks are now gone, thanks.
@gpshead I'm using a sed-based solution to patch code coverage recording into upstream CPython. This ought to work for the foreseeable future. If it breaks, I'll fix it.
@gpshead Would you say supporting Python 2 is worth the hassle in the face of its end-of-life? Is the current CC list correct?
@kcc @jonathanmetzman FYI |
I wouldn't recommend spending any effort doing this for Python 2. |
Not sure why AFL doesn't detect code coverage (in the Travis build). Instrumentation is provided by regular |
Can you rerun Travis? I just changed |
Done! |
Thanks. It passes. Can we merge? |
Who will be fixing the bugs found by this? I see only one maintainer in the CC list: @gpshead. Another question: should we remove view restrictions from this project, e.g. the way it's done for LLVM: https://github.com/google/oss-fuzz/blob/master/projects/llvm/project.yaml#L25 |
I can't answer those questions but I've notified the Python oss-fuzz thread about this https://bugs.python.org/issue29505 |
Good call to ask for that. I've pointed security@python.org at your message as well suggesting that a few responders on there sign up. |
@Dor1s Maybe we can just merge and start fuzzing and add more people later, lest it stalls indefinitely like the Go fuzzers? Like I said I'll forward security bugs myself to security@python.org if needed. |
- "gps@google.com"
- "alex.gaynor@gmail.com"
- "ammar@ammaraskar.com"
sanitizers:
Any particular reason for not building with msan?
When I build with MSAN and run it, I get:
MemorySanitizer:DEADLYSIGNAL
==12==ERROR: MemorySanitizer: SEGV on unknown address 0x7f15101d4000 (pc 0x0000004d9ee2 bp 0x000000000000 sp 0x7ffd18bb2b30 T12)
==12==The signal is caused by a READ memory access.
#0 0x4d9ee1 in __sanitizer::ForEachMappedRegion(link_map*, void (*)(void const*, unsigned long)) /src/llvm/projects/compiler-rt/lib/sanitizer_common/sanitizer_linux.cc:1160:3
#1 0x51fb26 in __interceptor_dlopen /src/llvm/projects/compiler-rt/lib/msan/../sanitizer_common/sanitizer_common_interceptors.inc:6033:3
#2 0xb602e8 in _PyImport_FindSharedFuncptr /src/cpython/./Python/dynload_shlib.c:99:14
#3 0xa4b277 in _PyImport_LoadDynamicModuleWithSpec /src/cpython/./Python/importdl.c:134:18
#4 0xa49d08 in _imp_create_dynamic_impl /src/cpython/Python/import.c:2269:11
#5 0xa46fca in _imp_create_dynamic /src/cpython/Python/clinic/import.c.h:330:20
#6 0x6c09e4 in cfunction_vectorcall_FASTCALL /src/cpython/Objects/methodobject.c:421:24
#7 0x59d8dc in PyVectorcall_Call /src/cpython/Objects/call.c:206:16
#8 0x97a476 in do_call_core /src/cpython/Python/ceval.c:5118:9
#9 0x95840e in _PyEval_EvalFrameDefault /src/cpython/Python/ceval.c:3667:22
#10 0x97eff6 in _PyEval_EvalCodeWithName /src/cpython/Python/ceval.c:4410:14
#11 0x5a015b in _PyFunction_Vectorcall /src/cpython/Objects/call.c:357:12
#12 0x979019 in _PyObject_Vectorcall /src/cpython/./Include/cpython/abstract.h:107:21
#13 0x979468 in call_function /src/cpython/Python/ceval.c:5098:13
#14 0x958893 in _PyEval_EvalFrameDefault /src/cpython/Python/ceval.c:3574:23
#15 0x5a0655 in function_code_fastcall /src/cpython/Objects/call.c:293:14
#16 0x979019 in _PyObject_Vectorcall /src/cpython/./Include/cpython/abstract.h:107:21
#17 0x979468 in call_function /src/cpython/Python/ceval.c:5098:13
#18 0x94a2c3 in _PyEval_EvalFrameDefault /src/cpython/Python/ceval.c:3591:23
#19 0x5a0655 in function_code_fastcall /src/cpython/Objects/call.c:293:14
#20 0x979019 in _PyObject_Vectorcall /src/cpython/./Include/cpython/abstract.h:107:21
#21 0x979468 in call_function /src/cpython/Python/ceval.c:5098:13
#22 0x937937 in _PyEval_EvalFrameDefault /src/cpython/Python/ceval.c:3606:19
#23 0x5a0655 in function_code_fastcall /src/cpython/Objects/call.c:293:14
#24 0x979019 in _PyObject_Vectorcall /src/cpython/./Include/cpython/abstract.h:107:21
#25 0x979468 in call_function /src/cpython/Python/ceval.c:5098:13
#26 0x937937 in _PyEval_EvalFrameDefault /src/cpython/Python/ceval.c:3606:19
#27 0x5a0655 in function_code_fastcall /src/cpython/Objects/call.c:293:14
#28 0x979019 in _PyObject_Vectorcall /src/cpython/./Include/cpython/abstract.h:107:21
#29 0x979468 in call_function /src/cpython/Python/ceval.c:5098:13
#30 0x937937 in _PyEval_EvalFrameDefault /src/cpython/Python/ceval.c:3606:19
#31 0x5a0655 in function_code_fastcall /src/cpython/Objects/call.c:293:14
#32 0x5a4249 in _PyObject_Vectorcall /src/cpython/./Include/cpython/abstract.h:107:21
#33 0x5a5386 in object_vacall /src/cpython/Objects/call.c:793:14
#34 0x5a5a0b in _PyObject_CallMethodIdObjArgs /src/cpython/Objects/call.c:880:24
#35 0xa425ad in import_find_and_load /src/cpython/Python/import.c:1741:11
#36 0xa3f32b in PyImport_ImportModuleLevelObject /src/cpython/Python/import.c:1843:15
#37 0x976338 in import_name /src/cpython/Python/ceval.c:5274:15
#38 0x93fd6d in _PyEval_EvalFrameDefault /src/cpython/Python/ceval.c:3081:19
#39 0x97eff6 in _PyEval_EvalCodeWithName /src/cpython/Python/ceval.c:4410:14
#40 0x93384a in PyEval_EvalCodeEx /src/cpython/Python/ceval.c:4439:12
#41 0x93359a in PyEval_EvalCode /src/cpython/Python/ceval.c:714:12
#42 0x1028b13 in builtin_exec_impl /src/cpython/Python/bltinmodule.c:1032:13
#43 0x101e79e in builtin_exec /src/cpython/Python/clinic/bltinmodule.c.h:396:20
#44 0x6c09e4 in cfunction_vectorcall_FASTCALL /src/cpython/Objects/methodobject.c:421:24
#45 0x59d8dc in PyVectorcall_Call /src/cpython/Objects/call.c:206:16
#46 0x97a476 in do_call_core /src/cpython/Python/ceval.c:5118:9
#47 0x95840e in _PyEval_EvalFrameDefault /src/cpython/Python/ceval.c:3667:22
#48 0x97eff6 in _PyEval_EvalCodeWithName /src/cpython/Python/ceval.c:4410:14
#49 0x5a015b in _PyFunction_Vectorcall /src/cpython/Objects/call.c:357:12
#50 0x979019 in _PyObject_Vectorcall /src/cpython/./Include/cpython/abstract.h:107:21
#51 0x979468 in call_function /src/cpython/Python/ceval.c:5098:13
#52 0x958893 in _PyEval_EvalFrameDefault /src/cpython/Python/ceval.c:3574:23
#53 0x5a0655 in function_code_fastcall /src/cpython/Objects/call.c:293:14
#54 0x979019 in _PyObject_Vectorcall /src/cpython/./Include/cpython/abstract.h:107:21
#55 0x979468 in call_function /src/cpython/Python/ceval.c:5098:13
#56 0x94a2c3 in _PyEval_EvalFrameDefault /src/cpython/Python/ceval.c:3591:23
#57 0x5a0655 in function_code_fastcall /src/cpython/Objects/call.c:293:14
#58 0x979019 in _PyObject_Vectorcall /src/cpython/./Include/cpython/abstract.h:107:21
#59 0x979468 in call_function /src/cpython/Python/ceval.c:5098:13
#60 0x937937 in _PyEval_EvalFrameDefault /src/cpython/Python/ceval.c:3606:19
#61 0x5a0655 in function_code_fastcall /src/cpython/Objects/call.c:293:14
#62 0x979019 in _PyObject_Vectorcall /src/cpython/./Include/cpython/abstract.h:107:21
#63 0x979468 in call_function /src/cpython/Python/ceval.c:5098:13
#64 0x937937 in _PyEval_EvalFrameDefault /src/cpython/Python/ceval.c:3606:19
#65 0x5a0655 in function_code_fastcall /src/cpython/Objects/call.c:293:14
#66 0x5a4249 in _PyObject_Vectorcall /src/cpython/./Include/cpython/abstract.h:107:21
#67 0x5a5386 in object_vacall /src/cpython/Objects/call.c:793:14
#68 0x5a5a0b in _PyObject_CallMethodIdObjArgs /src/cpython/Objects/call.c:880:24
#69 0xa425ad in import_find_and_load /src/cpython/Python/import.c:1741:11
#70 0xa3f32b in PyImport_ImportModuleLevelObject /src/cpython/Python/import.c:1843:15
#71 0x101a979 in builtin___import__ /src/cpython/Python/bltinmodule.c:278:12
#72 0x59ed9e in cfunction_call_varargs /src/cpython/Objects/call.c:385:18
#73 0x59be58 in _PyObject_MakeTpCall /src/cpython/Objects/call.c:170:18
#74 0x979468 in call_function /src/cpython/Python/ceval.c:5098:13
#75 0x94628b in _PyEval_EvalFrameDefault /src/cpython/Python/ceval.c:3622:19
#76 0x5a0655 in function_code_fastcall /src/cpython/Objects/call.c:293:14
#77 0x998074 in _PyObject_Vectorcall /src/cpython/./Include/cpython/abstract.h:107:21
#78 0x98a64f in _PyObject_CallOneArg /src/cpython/./Include/cpython/abstract.h:145:12
#79 0x989696 in _PyCodec_Lookup /src/cpython/Python/codecs.c:150:18
#80 0x98c346 in _PyCodec_LookupTextEncoding /src/cpython/Python/codecs.c:528:13
#81 0x998978 in codec_getitem_checked /src/cpython/Python/codecs.c:569:13
#82 0x98cab2 in _PyCodec_DecodeText /src/cpython/Python/codecs.c:608:15
#83 0x7d2189 in PyUnicode_Decode /src/cpython/Objects/unicodeobject.c:3413:15
#84 0x7d10a5 in PyUnicode_FromEncodedObject /src/cpython/Objects/unicodeobject.c:3262:16
#85 0x57b3f6 in bytes_decode /src/cpython/Objects/clinic/bytesobject.c.h:601:20
#86 0xebad4e in method_vectorcall_FASTCALL_KEYWORDS /src/cpython/Objects/descrobject.c:371:24
#87 0x979019 in _PyObject_Vectorcall /src/cpython/./Include/cpython/abstract.h:107:21
#88 0x979468 in call_function /src/cpython/Python/ceval.c:5098:13
#89 0x94a2c3 in _PyEval_EvalFrameDefault /src/cpython/Python/ceval.c:3591:23
#90 0x5a0655 in function_code_fastcall /src/cpython/Objects/call.c:293:14
#91 0x59d8dc in PyVectorcall_Call /src/cpython/Objects/call.c:206:16
#92 0x59dc6d in PyObject_Call /src/cpython/Objects/call.c:237:16
#93 0x5a0f5e in PyEval_CallObjectWithKeywords /src/cpython/Objects/call.c:452:16
#94 0x54c515 in LLVMFuzzerTestOneInput /src/python-library-fuzzers/fuzzer.cpp:123:14
#95 0x481181 in fuzzer::Fuzzer::ExecuteCallback(unsigned char const*, unsigned long) /src/llvm/projects/compiler-rt/lib/fuzzer/FuzzerLoop.cpp:553:15
#96 0x4809c5 in fuzzer::Fuzzer::RunOne(unsigned char const*, unsigned long, bool, fuzzer::InputInfo*, bool*) /src/llvm/projects/compiler-rt/lib/fuzzer/FuzzerLoop.cpp:469:3
#97 0x483287 in fuzzer::Fuzzer::ReadAndExecuteSeedCorpora(std::Fuzzer::vector<fuzzer::SizedFile, fuzzer::fuzzer_allocator<fuzzer::SizedFile> >&) /src/llvm/projects/compiler-rt/lib/fuzzer/FuzzerLoop.cpp:766:7
#98 0x4835f9 in fuzzer::Fuzzer::Loop(std::Fuzzer::vector<fuzzer::SizedFile, fuzzer::fuzzer_allocator<fuzzer::SizedFile> >&) /src/llvm/projects/compiler-rt/lib/fuzzer/FuzzerLoop.cpp:793:3
#99 0x471748 in fuzzer::FuzzerDriver(int*, char***, int (*)(unsigned char const*, unsigned long)) /src/llvm/projects/compiler-rt/lib/fuzzer/FuzzerDriver.cpp:825:6
#100 0x49abf2 in main /src/llvm/projects/compiler-rt/lib/fuzzer/FuzzerMain.cpp:19:10
#101 0x7f150cd3782f in __libc_start_main (/lib/x86_64-linux
Any idea how to fix that?
MSAN builds are extremely complicated. You technically need to build every library that the MSAN-compiled binary links against with MSAN as well, or else use only libraries whose behavior the memory sanitizer fully understands, so that it can ignore behavior stemming from code within them.
Attempting to get a stable MSAN build and stable MSAN build environment is a project of its own.
Yeah I know, but if lack of instrumentation is the problem, MSAN fuzzers will normally just crash on false positives. This, however, is a segmentation fault inside MSAN which I've never seen before.
Aah yeah, we get away with this in the cpython3 build:
- memory
because we import a very small subset of the modules. I'm guessing what's happening is that one of the libraries you're importing eventually imports a C extension. This causes a dlopen, and some non-MSAN-instrumented library is causing a false positive?
However the stack trace suspiciously looks like it might be an MSAN bug.
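One way to check the C extension guess is to see which shared-object modules an import pulls in, since those are the ones loaded through dlopen. The sketch below is only a diagnostic idea, not part of the fuzzer; the helper name `extension_modules_loaded_by` is made up for illustration.

```python
import importlib
import importlib.machinery
import sys


def extension_modules_loaded_by(module_name):
    """Import module_name and list the C extension modules it newly loaded.

    Hypothetical helper: a module counts as a C extension here if its
    __file__ ends with one of the interpreter's extension suffixes
    (for example ".so" on Linux). Statically linked built-ins have no
    __file__ and are therefore skipped.
    """
    before = set(sys.modules)
    importlib.import_module(module_name)
    suffixes = tuple(importlib.machinery.EXTENSION_SUFFIXES)
    found = []
    for name in sorted(set(sys.modules) - before):
        path = getattr(sys.modules[name], "__file__", None)
        if path and path.endswith(suffixes):
            found.append((name, path))
    return found


if __name__ == "__main__":
    for name, path in extension_modules_loaded_by("json"):
        print(name, path)
```

Note that modules already imported before the call are not reported, so this is best run in a fresh interpreter.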
For what it's worth I'm also fine with triaging bugs found by this fuzzer to upstream python. I like this approach of being able to write fuzzers in python itself. Just for reference, how many executions/second do you get for libfuzzer with asan just doing |
I noticed the fuzzers were building, but not running, presumably because I used hardcoded paths to $OUT, which exist during the build, but not during the run, so I've resolved this in https://github.com/guidovranken/python-library-fuzzers and I expect the fuzzers to run after the next daily build. |
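For reference, a common way around hardcoded build-time paths is to resolve files relative to the running executable. This is a minimal sketch of that idea in Python and not necessarily what the linked repository does; `harness_path` and its `argv0` parameter are hypothetical names.

```python
import os
import sys


def harness_path(filename, argv0=None):
    """Return the path of filename next to the running executable.

    Hypothetical helper: argv0 defaults to sys.argv[0], so the result
    follows the binary wherever it is copied, instead of pointing at a
    directory that only exists during the build.
    """
    exe = argv0 if argv0 is not None else sys.argv[0]
    base = os.path.dirname(os.path.abspath(exe))
    return os.path.join(base, filename)


if __name__ == "__main__":
    print(harness_path("json.py"))
```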
CC @alex @ammaraskar
This is a fuzzer for Python libraries. Code coverage at the Python level is measured by tracking the executed blocks in the interpreter. This is carried over to libFuzzer using extra counters. On top of that, the Python core code is also instrumented (and sanitized).
This should bring to light interpreter memory bugs as well as standard library bugs (like hangs, or slice OOB access).
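To give a rough idea of the coverage mechanism, here is a pure-Python sketch that records executed lines into a fixed-size counter array, similar in spirit to the array handed to libFuzzer as extra counters. It is only an illustration: the real instrumentation lives in the patched interpreter, and `NUM_COUNTERS`, `counters` and `run_with_coverage` are hypothetical names.

```python
import sys

# Assumed size of the counter array; the real value is a design choice.
NUM_COUNTERS = 4096
counters = bytearray(NUM_COUNTERS)


def _tracer(frame, event, arg):
    """Bump one 8-bit counter per executed (file, line) location."""
    if event == "line":
        key = hash((frame.f_code.co_filename, frame.f_lineno))
        idx = key % NUM_COUNTERS
        if counters[idx] < 255:  # saturate instead of wrapping around
            counters[idx] += 1
    return _tracer  # keep tracing inside newly entered frames


def run_with_coverage(func, *args):
    """Call func(*args) with line tracing enabled, then switch it off."""
    sys.settrace(_tracer)
    try:
        return func(*args)
    finally:
        sys.settrace(None)
```

A fuzzing engine that sees new non-zero slots in `counters` after an input can treat that input as having reached new Python-level code. Distinct locations may collide in the same slot, which trades some precision for a fixed memory footprint.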
It is very fast, and harnesses for library code are very simple, e.g. this is the code for fuzzing the HTTP client:
Adding more fuzzers is as easy as creating a new .py file. No additional C code is required.
Here's one bug it found so far: https://bugs.python.org/issue37461
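To illustrate how small a Python-level harness for the HTTP client can be, here is a sketch under stated assumptions. It is not the project's actual harness: the entry-point name `fuzz_one_input` and the `FakeSocket` class are hypothetical, and the set of exceptions treated as expected is a guess.

```python
import io
from http.client import HTTPException, HTTPResponse


class FakeSocket:
    """Hypothetical stand-in for a socket; HTTPResponse only calls makefile()."""

    def __init__(self, data):
        self._data = data

    def makefile(self, mode="rb", *args, **kwargs):
        return io.BytesIO(self._data)


def fuzz_one_input(data):
    """Parse data as a raw HTTP response, ignoring expected parse errors."""
    response = HTTPResponse(FakeSocket(data))
    try:
        response.begin()  # parse the status line and headers
        response.read()   # read the body (plain or chunked)
    except (HTTPException, OSError, ValueError):
        # Malformed input is expected; anything else propagates as a finding.
        pass


if __name__ == "__main__":
    fuzz_one_input(b"HTTP/1.1 200 OK\r\nContent-Length: 5\r\n\r\nhello")
```

Any exception outside the caught set, a hang, or a sanitizer report would then surface as a finding.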
I've copied over the persons listed in https://github.com/google/oss-fuzz/blob/master/projects/cpython3/project.yaml. If this is incorrect I will change it.
When I build and run using infra/helper.py, I get memory leaks for the json and email fuzzers. I think these are false positives. Can we disable leak detection?
This project currently uses a modified CPython to make Python-level instrumentation possible. If there is interest from your side in merging this project, I will reach out to the Python developers to integrate this patch into the master branch.