[python3-libraries] by guidovranken · Pull Request #2567 · google/oss-fuzz · GitHub

Merged
merged 3 commits into from
Jul 10, 2019

Conversation

guidovranken
Contributor
@guidovranken guidovranken commented Jul 2, 2019

CC @alex @ammaraskar

This is a fuzzer for Python libraries. Code coverage at the Python level is measured by tracking the executed blocks in the interpreter. This is carried over to libFuzzer using extra counters. On top of that, the Python core code is also instrumented (and sanitized).
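As a rough illustration of the coverage idea described above, the sketch below maps each executed block to a slot in a fixed-size array of saturating 8-bit counters, which is the shape of data libFuzzer's extra counters expect. It is written in Python for readability only; the project does this inside the interpreter, and the names `NUM_COUNTERS`, `counters`, and `record_block`, as well as the mixing constant, are assumptions made for this example.

```python
# Illustrative sketch only: the real instrumentation lives in the interpreter's C code.
NUM_COUNTERS = 1 << 16  # assumed table size, not taken from the project

# One 8-bit counter per slot, analogous to a libFuzzer extra-counters region.
counters = bytearray(NUM_COUNTERS)


def record_block(code_id, offset):
    """Bump the saturating counter for one executed block and return its slot."""
    # Mix the code object's identity with the offset so that different blocks
    # usually land in different slots (collisions are tolerated).
    index = (code_id * 0x9E3779B1 + offset) % NUM_COUNTERS
    if counters[index] < 255:  # saturate instead of wrapping around
        counters[index] += 1
    return index
```

A fuzzing engine then treats any input that changes the counter pattern as interesting and keeps it in the corpus.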

This should bring to light interpreter memory bugs as well as standard library bugs (like hangs, or slice OOB access).

It is very fast, and harnesses for library code are very simple; e.g., this is the code for fuzzing the HTTP client:

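from http import client
import io

# Minimal stand-in for a socket: HTTPResponse reads from it via makefile().
class Sock(object):
    def __init__(self, data):
        self.data = data
    def makefile(self, mode):
        # Serve the fuzzer input as the raw bytes of the HTTP response.
        return io.BytesIO(self.data)

def FuzzerRunOne(FuzzerInput):
    response = client.HTTPResponse(Sock(FuzzerInput))
    try:
        response.begin()
    except:
        # Errors on malformed input are expected; ignore them.
        pass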

Adding more fuzzers is as easy as creating a new .py file. No additional C code is required.
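To make the "new .py file" claim concrete, here is a minimal sketch of what another harness could look like, following the same `FuzzerRunOne` convention as the HTTP example. It targets `json.loads`; whether the project's own json fuzzer looks like this is an assumption, and the broad `except Exception` mirrors the catch-all in the HTTP harness rather than anything confirmed by the source.

```python
import json


def FuzzerRunOne(FuzzerInput):
    """Feed the raw fuzzer bytes to the JSON decoder and ignore parse errors."""
    try:
        # json.loads accepts bytes directly and detects the encoding itself.
        json.loads(FuzzerInput)
    except Exception:
        # Decode and syntax errors are expected for random input.
        pass
```

Crashes, hangs, and sanitizer reports inside the interpreter still surface, because they do not go through Python exception handling.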

Here's one bug it found so far: https://bugs.python.org/issue37461

I've copied over the people listed in https://github.com/google/oss-fuzz/blob/master/projects/cpython3/project.yaml. If this is incorrect, I will change it.

When I build and run using infra/helper.py, I get memory leaks for the json and email fuzzers. I think these are false positives. Can we disable leak detection?

This project currently uses a modified CPython to make Python-level instrumentation possible. If there is interest from your side in merging this project, I will reach out to the Python developers to integrate this patch into the master branch.

@ammaraskar
Contributor
ammaraskar commented Jul 2, 2019

Wow great work, I'll give this a more thorough look soon.

> I get memory leaks for the json and email fuzzers. I think these are false positives.

Try passing the ASAN flag to configure like this:

case $SANITIZER in
  address)
    FLAGS="--with-address-sanitizer"
    ;;
esac

and see if it still leaks.

> This project currently uses a modified CPython to make Python-level instrumentation possible. If there is interest from your side in merging this project, I will reach out to the Python developers to integrate this patch into the master branch.

@gpshead might be able to weigh in on that. From a quick look at the diff, it doesn't seem too intrusive: python/cpython@master...guidovranken:fuzzing. Stuff like using boost will definitely have to go if you want to merge it upstream, though.

(Edit: whoops, missed all the changes to ceval.c, those might be a fair bit harder to swallow.)

@gpshead
Contributor
gpshead commented Jul 3, 2019

That ceval.c modification is invasive. Instead of manually annotating every bytecode dispatch with a C function call, could you make use of sys.settrace() https://docs.python.org/3.8/library/sys.html#sys.settrace in opcode mode to capture such data?
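For readers unfamiliar with opcode mode, the following is a small sketch of the suggestion above: a trace function installed with `sys.settrace()` sets `f_trace_opcodes` on each new frame and records every `'opcode'` event. The helper name `collect_opcode_coverage` is hypothetical, and the exact delivery of opcode events can differ between CPython versions, so treat this as an illustration rather than a drop-in replacement for the ceval.c patch.

```python
import sys


def collect_opcode_coverage(func, *args):
    """Run func(*args) and return the (function name, bytecode offset) pairs seen."""
    seen = set()

    def tracer(frame, event, arg):
        if event == "call":
            # Per-opcode events are off by default and must be requested per frame.
            frame.f_trace_opcodes = True
        elif event == "opcode":
            seen.add((frame.f_code.co_name, frame.f_lasti))
        return tracer  # keep tracing this frame

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        func(*args)
    finally:
        sys.settrace(previous)  # restore whatever tracer was installed before
    return seen
```

Every event goes through a Python-level callback here, which is the overhead the reply below is concerned about.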

@guidovranken
Contributor Author

> That ceval.c modification is invasive. Instead of manually annotating every bytecode dispatch with a C function call, could you make use of sys.settrace() https://docs.python.org/3.8/library/sys.html#sys.settrace in opcode mode to capture such data?

Thanks, yep that should work. Speed is important, so the question is how to get this data into a C array as efficiently as possible, preferably without a chain of intermediate functions. I suppose I can re-implement sys_settrace, trace_trampoline, call_trampoline and intercept the data I need there. Another idea is to just change the TARGET macro if FUZZING is defined. Or I can do it on the fly with sed :). Do you have any suggestions?

@gpshead
Contributor
gpshead commented Jul 3, 2019

Changing the TARGET macro when building a fuzzing enabled interpreter sounds like a convenient hack...

@guidovranken
Contributor Author

@ammaraskar I copied part of your build script and the leaks are now gone, thanks.

@gpshead I'm using a sed-based solution to patch code coverage recording into upstream CPython. This ought to work for the foreseeable future. If it breaks, I'll fix it.

@gpshead Would you say supporting Python 2 is worth the hassle in the face of its end-of-life?

Is the current CC list correct?
With the current set of fuzzers, I can manually repost bug reports to the Python issue tracker.

@kcc @jonathanmetzman FYI fuzzer-email will currently timeout on the seed corpus due to https://bugs.python.org/issue37461

@gpshead
Contributor
gpshead commented Jul 3, 2019

I wouldn't recommend spending any effort doing this for Python 2.

@guidovranken
Contributor Author
guidovranken commented Jul 3, 2019

Not sure why AFL doesn't detect code coverage (in the Travis build). Instrumentation is provided by regular -fsanitize.... flags and extra counters, and AFL should detect the former. Personally, I can do without AFL for now.

@guidovranken
Contributor Author

Can you rerun Travis? I just changed -fsanitize=fuzzer to $LIB_FUZZING_ENGINE; the hardcoded flag ought to have been the reason AFL failed.

@inferno-chromium
Contributor

Done!

@guidovranken
Contributor Author

> Done!

Thanks. It passes. Can we merge?

@Dor1s
Contributor
Dor1s commented Jul 8, 2019

Who will be fixing the bugs found by this? I see only one maintainer in the CC list: @gpshead

Another question: should we remove view restrictions from this project, e.g. how it's done for LLVM: https://github.com/google/oss-fuzz/blob/master/projects/llvm/project.yaml#L25

@guidovranken
Contributor Author

I can't answer those questions but I've notified the Python oss-fuzz thread about this https://bugs.python.org/issue29505

@gpshead
Contributor
gpshead commented Jul 8, 2019

Good call to ask for that. I've pointed security@python.org at your message as well suggesting that a few responders on there sign up.

@guidovranken
Contributor Author

@Dor1s Maybe we can just merge, start fuzzing, and add more people later, lest it stall indefinitely like the Go fuzzers? Like I said, I'll forward security bugs myself to security@python.org if needed.

- "gps@google.com"
- "alex.gaynor@gmail.com"
- "ammar@ammaraskar.com"
sanitizers:
Contributor

Any particular reason for not building with msan?

Contributor Author

When I build with MSAN and I run I get:

MemorySanitizer:DEADLYSIGNAL
==12==ERROR: MemorySanitizer: SEGV on unknown address 0x7f15101d4000 (pc 0x0000004d9ee2 bp 0x000000000000 sp 0x7ffd18bb2b30 T12)
==12==The signal is caused by a READ memory access.
    #0 0x4d9ee1 in __sanitizer::ForEachMappedRegion(link_map*, void (*)(void const*, unsigned long)) /src/llvm/projects/compiler-rt/lib/sanitizer_common/sanitizer_linux.cc:1160:3
    #1 0x51fb26 in __interceptor_dlopen /src/llvm/projects/compiler-rt/lib/msan/../sanitizer_common/sanitizer_common_interceptors.inc:6033:3
    #2 0xb602e8 in _PyImport_FindSharedFuncptr /src/cpython/./Python/dynload_shlib.c:99:14
    #3 0xa4b277 in _PyImport_LoadDynamicModuleWithSpec /src/cpython/./Python/importdl.c:134:18
    #4 0xa49d08 in _imp_create_dynamic_impl /src/cpython/Python/import.c:2269:11
    #5 0xa46fca in _imp_create_dynamic /src/cpython/Python/clinic/import.c.h:330:20
    #6 0x6c09e4 in cfunction_vectorcall_FASTCALL /src/cpython/Objects/methodobject.c:421:24
    #7 0x59d8dc in PyVectorcall_Call /src/cpython/Objects/call.c:206:16
    #8 0x97a476 in do_call_core /src/cpython/Python/ceval.c:5118:9
    #9 0x95840e in _PyEval_EvalFrameDefault /src/cpython/Python/ceval.c:3667:22
    #10 0x97eff6 in _PyEval_EvalCodeWithName /src/cpython/Python/ceval.c:4410:14
    #11 0x5a015b in _PyFunction_Vectorcall /src/cpython/Objects/call.c:357:12
    #12 0x979019 in _PyObject_Vectorcall /src/cpython/./Include/cpython/abstract.h:107:21
    #13 0x979468 in call_function /src/cpython/Python/ceval.c:5098:13
    #14 0x958893 in _PyEval_EvalFrameDefault /src/cpython/Python/ceval.c:3574:23
    #15 0x5a0655 in function_code_fastcall /src/cpython/Objects/call.c:293:14
    #16 0x979019 in _PyObject_Vectorcall /src/cpython/./Include/cpython/abstract.h:107:21
    #17 0x979468 in call_function /src/cpython/Python/ceval.c:5098:13
    #18 0x94a2c3 in _PyEval_EvalFrameDefault /src/cpython/Python/ceval.c:3591:23
    #19 0x5a0655 in function_code_fastcall /src/cpython/Objects/call.c:293:14
    #20 0x979019 in _PyObject_Vectorcall /src/cpython/./Include/cpython/abstract.h:107:21
    #21 0x979468 in call_function /src/cpython/Python/ceval.c:5098:13
    #22 0x937937 in _PyEval_EvalFrameDefault /src/cpython/Python/ceval.c:3606:19
    #23 0x5a0655 in function_code_fastcall /src/cpython/Objects/call.c:293:14
    #24 0x979019 in _PyObject_Vectorcall /src/cpython/./Include/cpython/abstract.h:107:21
    #25 0x979468 in call_function /src/cpython/Python/ceval.c:5098:13
    #26 0x937937 in _PyEval_EvalFrameDefault /src/cpython/Python/ceval.c:3606:19
    #27 0x5a0655 in function_code_fastcall /src/cpython/Objects/call.c:293:14
    #28 0x979019 in _PyObject_Vectorcall /src/cpython/./Include/cpython/abstract.h:107:21
    #29 0x979468 in call_function /src/cpython/Python/ceval.c:5098:13
    #30 0x937937 in _PyEval_EvalFrameDefault /src/cpython/Python/ceval.c:3606:19
    #31 0x5a0655 in function_code_fastcall /src/cpython/Objects/call.c:293:14
    #32 0x5a4249 in _PyObject_Vectorcall /src/cpython/./Include/cpython/abstract.h:107:21
    #33 0x5a5386 in object_vacall /src/cpython/Objects/call.c:793:14
    #34 0x5a5a0b in _PyObject_CallMethodIdObjArgs /src/cpython/Objects/call.c:880:24
    #35 0xa425ad in import_find_and_load /src/cpython/Python/import.c:1741:11
    #36 0xa3f32b in PyImport_ImportModuleLevelObject /src/cpython/Python/import.c:1843:15
    #37 0x976338 in import_name /src/cpython/Python/ceval.c:5274:15
    #38 0x93fd6d in _PyEval_EvalFrameDefault /src/cpython/Python/ceval.c:3081:19
    #39 0x97eff6 in _PyEval_EvalCodeWithName /src/cpython/Python/ceval.c:4410:14
    #40 0x93384a in PyEval_EvalCodeEx /src/cpython/Python/ceval.c:4439:12
    #41 0x93359a in PyEval_EvalCode /src/cpython/Python/ceval.c:714:12
    #42 0x1028b13 in builtin_exec_impl /src/cpython/Python/bltinmodule.c:1032:13
    #43 0x101e79e in builtin_exec /src/cpython/Python/clinic/bltinmodule.c.h:396:20
    #44 0x6c09e4 in cfunction_vectorcall_FASTCALL /src/cpython/Objects/methodobject.c:421:24
    #45 0x59d8dc in PyVectorcall_Call /src/cpython/Objects/call.c:206:16
    #46 0x97a476 in do_call_core /src/cpython/Python/ceval.c:5118:9
    #47 0x95840e in _PyEval_EvalFrameDefault /src/cpython/Python/ceval.c:3667:22
    #48 0x97eff6 in _PyEval_EvalCodeWithName /src/cpython/Python/ceval.c:4410:14
    #49 0x5a015b in _PyFunction_Vectorcall /src/cpython/Objects/call.c:357:12
    #50 0x979019 in _PyObject_Vectorcall /src/cpython/./Include/cpython/abstract.h:107:21
    #51 0x979468 in call_function /src/cpython/Python/ceval.c:5098:13
    #52 0x958893 in _PyEval_EvalFrameDefault /src/cpython/Python/ceval.c:3574:23
    #53 0x5a0655 in function_code_fastcall /src/cpython/Objects/call.c:293:14
    #54 0x979019 in _PyObject_Vectorcall /src/cpython/./Include/cpython/abstract.h:107:21
    #55 0x979468 in call_function /src/cpython/Python/ceval.c:5098:13
    #56 0x94a2c3 in _PyEval_EvalFrameDefault /src/cpython/Python/ceval.c:3591:23
    #57 0x5a0655 in function_code_fastcall /src/cpython/Objects/call.c:293:14
    #58 0x979019 in _PyObject_Vectorcall /src/cpython/./Include/cpython/abstract.h:107:21
    #59 0x979468 in call_function /src/cpython/Python/ceval.c:5098:13
    #60 0x937937 in _PyEval_EvalFrameDefault /src/cpython/Python/ceval.c:3606:19
    #61 0x5a0655 in function_code_fastcall /src/cpython/Objects/call.c:293:14
    #62 0x979019 in _PyObject_Vectorcall /src/cpython/./Include/cpython/abstract.h:107:21
    #63 0x979468 in call_function /src/cpython/Python/ceval.c:5098:13
    #64 0x937937 in _PyEval_EvalFrameDefault /src/cpython/Python/ceval.c:3606:19
    #65 0x5a0655 in function_code_fastcall /src/cpython/Objects/call.c:293:14
    #66 0x5a4249 in _PyObject_Vectorcall /src/cpython/./Include/cpython/abstract.h:107:21
    #67 0x5a5386 in object_vacall /src/cpython/Objects/call.c:793:14
    #68 0x5a5a0b in _PyObject_CallMethodIdObjArgs /src/cpython/Objects/call.c:880:24
    #69 0xa425ad in import_find_and_load /src/cpython/Python/import.c:1741:11
    #70 0xa3f32b in PyImport_ImportModuleLevelObject /src/cpython/Python/import.c:1843:15
    #71 0x101a979 in builtin___import__ /src/cpython/Python/bltinmodule.c:278:12
    #72 0x59ed9e in cfunction_call_varargs /src/cpython/Objects/call.c:385:18
    #73 0x59be58 in _PyObject_MakeTpCall /src/cpython/Objects/call.c:170:18
    #74 0x979468 in call_function /src/cpython/Python/ceval.c:5098:13
    #75 0x94628b in _PyEval_EvalFrameDefault /src/cpython/Python/ceval.c:3622:19
    #76 0x5a0655 in function_code_fastcall /src/cpython/Objects/call.c:293:14
    #77 0x998074 in _PyObject_Vectorcall /src/cpython/./Include/cpython/abstract.h:107:21
    #78 0x98a64f in _PyObject_CallOneArg /src/cpython/./Include/cpython/abstract.h:145:12
    #79 0x989696 in _PyCodec_Lookup /src/cpython/Python/codecs.c:150:18
    #80 0x98c346 in _PyCodec_LookupTextEncoding /src/cpython/Python/codecs.c:528:13
    #81 0x998978 in codec_getitem_checked /src/cpython/Python/codecs.c:569:13
    #82 0x98cab2 in _PyCodec_DecodeText /src/cpython/Python/codecs.c:608:15
    #83 0x7d2189 in PyUnicode_Decode /src/cpython/Objects/unicodeobject.c:3413:15
    #84 0x7d10a5 in PyUnicode_FromEncodedObject /src/cpython/Objects/unicodeobject.c:3262:16
    #85 0x57b3f6 in bytes_decode /src/cpython/Objects/clinic/bytesobject.c.h:601:20
    #86 0xebad4e in method_vectorcall_FASTCALL_KEYWORDS /src/cpython/Objects/descrobject.c:371:24
    #87 0x979019 in _PyObject_Vectorcall /src/cpython/./Include/cpython/abstract.h:107:21
    #88 0x979468 in call_function /src/cpython/Python/ceval.c:5098:13
    #89 0x94a2c3 in _PyEval_EvalFrameDefault /src/cpython/Python/ceval.c:3591:23
    #90 0x5a0655 in function_code_fastcall /src/cpython/Objects/call.c:293:14
    #91 0x59d8dc in PyVectorcall_Call /src/cpython/Objects/call.c:206:16
    #92 0x59dc6d in PyObject_Call /src/cpython/Objects/call.c:237:16
    #93 0x5a0f5e in PyEval_CallObjectWithKeywords /src/cpython/Objects/call.c:452:16
    #94 0x54c515 in LLVMFuzzerTestOneInput /src/python-library-fuzzers/fuzzer.cpp:123:14
    #95 0x481181 in fuzzer::Fuzzer::ExecuteCallback(unsigned char const*, unsigned long) /src/llvm/projects/compiler-rt/lib/fuzzer/FuzzerLoop.cpp:553:15
    #96 0x4809c5 in fuzzer::Fuzzer::RunOne(unsigned char const*, unsigned long, bool, fuzzer::InputInfo*, bool*) /src/llvm/projects/compiler-rt/lib/fuzzer/FuzzerLoop.cpp:469:3
    #97 0x483287 in fuzzer::Fuzzer::ReadAndExecuteSeedCorpora(std::__Fuzzer::vector<fuzzer::SizedFile, fuzzer::fuzzer_allocator<fuzzer::SizedFile> >&) /src/llvm/projects/compiler-rt/lib/fuzzer/FuzzerLoop.cpp:766:7
    #98 0x4835f9 in fuzzer::Fuzzer::Loop(std::__Fuzzer::vector<fuzzer::SizedFile, fuzzer::fuzzer_allocator<fuzzer::SizedFile> >&) /src/llvm/projects/compiler-rt/lib/fuzzer/FuzzerLoop.cpp:793:3
    #99 0x471748 in fuzzer::FuzzerDriver(int*, char***, int (*)(unsigned char const*, unsigned long)) /src/llvm/projects/compiler-rt/lib/fuzzer/FuzzerDriver.cpp:825:6
    #100 0x49abf2 in main /src/llvm/projects/compiler-rt/lib/fuzzer/FuzzerMain.cpp:19:10
    #101 0x7f150cd3782f in __libc_start_main (/lib/x86_64-linux

Any idea how to fix that?

Contributor

MSAN builds are extremely complicated. You technically need to build every library that the MSAN-compiled binary links against with MSAN as well, or else use only libraries whose behavior the memory sanitizer fully understands, so that it can ignore behavior stemming from code within them.

Attempting to get a stable MSAN build and stable MSAN build environment is a project of its own.

Contributor Author

Yeah I know, but if lack of instrumentation is the problem, MSAN fuzzers will normally just crash on false positives. This, however, is a segmentation fault inside MSAN which I've never seen before.

Contributor
@ammaraskar ammaraskar Jul 10, 2019

Aah yeah, we get away with this in the cpython3 build:


because we import a very small subset of the modules. I'm guessing that one of the libraries you're importing eventually imports a C extension, which triggers a dlopen, and some non-MSAN-instrumented library then causes a false positive?

However the stack trace suspiciously looks like it might be an MSAN bug.
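One way to check the C-extension hypothesis is to list which newly imported modules were loaded from shared objects, since those are the imports that go through dlopen. The sketch below uses only standard `importlib` machinery; the helper name `extension_modules_loaded_by` is made up for this example, and it only reports modules that were not already imported before the call.

```python
import importlib
import importlib.machinery
import sys


def extension_modules_loaded_by(module_name):
    """Import module_name and return {name: path} for new C extension modules."""
    before = set(sys.modules)
    importlib.import_module(module_name)
    found = {}
    for name in sorted(set(sys.modules) - before):
        module = sys.modules.get(name)
        spec = getattr(module, "__spec__", None)
        # ExtensionFileLoader is the loader used for shared-object modules.
        if spec is not None and isinstance(
            spec.loader, importlib.machinery.ExtensionFileLoader
        ):
            found[name] = spec.origin
    return found
```

Running it in a fresh interpreter for each fuzzed library would show which harnesses pull in uninstrumented shared objects; an empty result means the import stayed within built-in or pure-Python modules.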

@ammaraskar
Contributor

For what it's worth I'm also fine with triaging bugs found by this fuzzer to upstream python. I like this approach of being able to write fuzzers in python itself.

Just for reference, how many executions/second do you get for libFuzzer with ASAN just doing json.loads(x), where x is the fuzzing input as a Python bytes object? (Could you compare it with the current fuzz_json_loads?) I'm curious whether calling the target through a Python method and instrumenting ceval incurs much overhead.
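A quick way to get a ballpark for the pure-Python side of that question is to time repeated calls to the target outside the fuzzer. The sketch below is not libFuzzer's exec/s figure, since it excludes mutation, sanitizer, and coverage overhead; the helper name `executions_per_second` and the default duration are assumptions for this example.

```python
import json
import time


def executions_per_second(target, data, duration=0.2):
    """Call target(data) repeatedly for about `duration` seconds; return calls/sec."""
    start = time.perf_counter()
    deadline = start + duration
    runs = 0
    while True:
        target(data)
        runs += 1
        if time.perf_counter() >= deadline:
            break
    return runs / (time.perf_counter() - start)


if __name__ == "__main__":
    rate = executions_per_second(json.loads, b'{"key": [1, 2, 3]}')
    print(f"{rate:.0f} calls/second")
```

Comparing this number against the rate the fuzzer reports for the same input gives a rough sense of how much the harness and instrumentation add.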

@Dor1s Dor1s merged commit 8aee789 into google:master Jul 10, 2019
@guidovranken
Contributor Author

I noticed the fuzzers were building but not running, presumably because I used hardcoded paths to $OUT, which exists during the build but not during the run. I've resolved this in https://github.com/guidovranken/python-library-fuzzers and I expect the fuzzers to run after the next daily build.
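The general remedy for this class of problem is to locate companion files relative to the running binary instead of a build-time directory. The sketch below shows that idea in Python; it is not the actual fix from the linked repository, and the helper name `resolve_beside_executable` and the file names in the usage line are hypothetical.

```python
import os
import sys


def resolve_beside_executable(argv0, filename):
    """Return the path of `filename` in the same directory as the running program."""
    base = os.path.dirname(os.path.abspath(argv0))
    return os.path.join(base, filename)


if __name__ == "__main__":
    # Hypothetical usage: find a harness script shipped next to the fuzzer binary.
    print(resolve_beside_executable(sys.argv[0], "json.py"))
```

Because the path is computed at run time, it stays valid wherever the build output is copied.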
