8000 bpo-29505: Add fuzz tests for float(str), int(str), unicode(str) by ssbr · Pull Request #2878 · python/cpython · GitHub
[go: up one dir, main page]

Skip to content

bpo-29505: Add fuzz tests for float(str), int(str), unicode(str) #2878

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 23 commits into from
Sep 6, 2017
Merged
Show file tree
Hide file tree
Changes from 6 commits
Commits
Show all changes
23 commits
Select commit Hold shift + click to select a range
8316e8d
Add basic fuzz tests for a few common builtin functions.
ssbr Jul 21, 2017
f47f875
Remove fuzzing of hash() per comment by kcc / oss-fuzz-team.
ssbr Jul 25, 2017
d112252
Move LLVMFuzzerTestOneInput into cpython and tweak how test discovery…
ssbr Jul 25, 2017
cb9cdc0
Move the fuzzer to C++ so that it builds.
ssbr Jul 25, 2017
7028614
Run the fuzz smoke tests on a little more input, just for kicks.
ssbr Jul 26, 2017
2b34f07
Make the _fuzz module optional.
ssbr Jul 26, 2017
f77be65
Actually run the fuzz tests...
ssbr Jul 26, 2017
778c827
Use unittest.main instead of test.support.
ssbr Jul 26, 2017
bc3d33f
Use C-style comments. (For porting to Python 2.7 and C89).
ssbr Jul 27, 2017
dee024f
Fix build break I accidentally introduced in f77be65.
ssbr Jul 27, 2017
9a00b23
Add a little detail on what the heck this stuff even is, in readme.
ssbr Jul 27, 2017
d542130
s/fuzzer.c/fuzzer.cpp/
ssbr Jul 27, 2017
9f16dd4
fuzzer.cpp -> fuzzer.c in faith that I can make it build.
8000 ssbr Aug 3, 2017
aa6d784
Make _fuzz required, so that tests never fail on a successful build.
ssbr Aug 23, 2017
fa0af73
Fix the windows test failure by just not testing on windows. :)
ssbr Aug 23, 2017
4af5c11
NEWS entry thingermajig.
ssbr Aug 24, 2017
fb017ed
Blacklist test_fuzz from coverage tests. It seems to fail with a Memo…
ssbr Aug 24, 2017
c4eda5d
Remove outdated comment.
ssbr Aug 24, 2017
83967f9
Fix incorrect mod for base (should be % 37), and document why we're t…
ssbr Aug 25, 2017
52dccc2
Use more idiomatic NULL checks rather than PyErr_Occurred().
ssbr Aug 25, 2017
4327fd9
Attempt to explain a little more why these exist / what the relations…
ssbr Aug 25, 2017
6da8e97
Renamed _fuzz to _xxtestfuzz and cleanup.
gpshead Sep 6, 2017
43620ae
don't skip test_fuzz (besides, it was renamed)
gpshead Sep 6, 2017
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
20 changes: 20 additions & 0 deletions Lib/test/test_fuzz.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,20 @@
import unittest
from test import support
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see below if needed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Deleted support import.)

8000
import _fuzz
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This hard dependency on the _fuzz module suggests that this file simply won't work unless _fuzz is available, and AFAICT it won't be available when compiling with gcc or MSVC++. You need to make sure these tests are skipped when the _fuzz module is unavailable.


class TestFuzz(unittest.TestCase):

def test_fuzz(self):
"""Run the fuzz tests on sample input.

This isn't meaningful and only checks it doesn't crash.
"""
_fuzz.run(b"")
_fuzz.run(b"\0")
_fuzz.run(b"{")
_fuzz.run(b" ")
_fuzz.run(b"x")
_fuzz.run(b"1")

if __name__ == "__main__":
support.run_unittest(TestFuzz)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will the usual unittest.main(verbosity=<select>) not work?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I cribbed this from an existing test, (I assumed CPython had some weird test framework), but I see I must've picked the wrong test to crib from. :)

Fixed.

36 changes: 36 additions & 0 deletions Modules/_fuzz/README.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,36 @@
Fuzz Tests for CPython
======================

These fuzz tests are designed to be included in Google's `oss-fuzz`_ project.

Adding a new fuzz test
----------------------

Add the test name on a new line in ``fuzz_tests.txt``.

In ``fuzzer.c``, add a function to be run::

int $test_name (const char* data, size_t size) {
...
return 0;
}


And invoke it from ``LLVMFuzzerTestOneInput``::

#if _Py_FUZZ_YES(fuzz_builtin_float)
rv |= _run_fuzz(data, size, fuzz_builtin_float);
#endif

``LLVMFuzzerTestOneInput`` will run in oss-fuzz, with each test in
``fuzz_tests.txt`` run separately.

What makes a good fuzz test
---------------------------

Libraries written in C that might handle untrusted data are worthwhile. The
more complex the logic (e.g. parsing), the more likely this is to be a useful
fuzz test. See the existing examples for reference, and refer to the
`oss-fuzz`_ docs.

.. _oss-fuzz: https://github.com/google/oss-fuzz
52 changes: 52 additions & 0 deletions Modules/_fuzz/_fuzzmodule.c
Original file line number Diff line number Diff line change
@@ -0,0 +1,52 @@
#include <Python.h>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really need this module? It's only usage seems to be for the tests, and it looks like it has as much code as the fuzzers themselves!

Copy link
Contributor Author
@ssbr ssbr Jul 26, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, but this would be like 2 lines in Cython. It's just the typical boilerplate of extension modules, nothing to write home about, right?

We don't need it. The thing we need is to make sure that the fuzz tests are maintained and don't break. That is, they need to be in CI for Python, not just run in the fuzz suite in oss-fuzz.

For most ways it could break, it would be enough just to build them, but not run them. I think it is a good idea to also run them. Although there's a cost: the macro silliness to make them runnable individually and collectively probably makes it easy to screw up, so I'm not happy about it. (Every fuzz test name is repeated like four times.) The strategy you mentioned above would make it less error prone, at the cost of losing the ability to call them from a unit test like this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To me what I see in this code is pretty standard CPython API calls. That raises the question of if there is a good reason to check these in and keep them working as part of CPython or if this code should be a stand alone module available on pypi that people could use to launch fuzz tests of any Python interpreter it is built with. The answer to that question likely comes from answering how oss-fuzz the fuzz testing service works. If it monitors our github repo and reruns fuzz tests as things change in cpython, including these here may make sense. I think a comment describing the situation and why this exists here with links to relevant oss-fuzz documentation (either in here or in the _fuzz/README.rst) would help that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added a little to README.rst, but to summarize and expand here:

You're correct that oss-fuzz regularly pulls from CPython and runs the fuzz tests locally.

oss-fuzz prefers to have the tests be a tested part of the upstream project so that they don't break. Certainly not all of these fuzz tests are using the public API (e.g._Py_HashBytes), and there will be more in the future. Those are more likely to they might break in the future -- having them here ensures that breaking them would break the tests, and they will always be good to run, hopefully.

We could keep the public-API-using fuzz tests separate in oss-fuzz, but they wouldn't really like that, and that actually splits up the tests in a confusing way -- some would be here, some would be in the oss-fuzz repo. I think they're better off all being here.

#include <stdlib.h>
#include <inttypes.h>

int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size);

static PyObject* _fuzz_run(PyObject* self, PyObject* args) {
const char* buf;
size_t size;
if (!PyArg_ParseTuple(args, "s#", &buf, &size)) {
return NULL;
}
int rv = LLVMFuzzerTestOneInput((const uint8_t*)buf, size);
if (PyErr_Occurred()) {
return NULL;
}
if (rv != 0) {
// Nonzero return codes are reserved for future use.
PyErr_Format(
PyExc_RuntimeError, "Nonzero return code from fuzzer: %d", rv);
return NULL;
}
Py_RETURN_NONE;
}

static PyMethodDef module_methods[] = {
{"run", (PyCFunction)_fuzz_run, METH_VARARGS, ""},
{NULL},
};

static struct PyModuleDef _fuzzmodule = {
PyModuleDef_HEAD_INIT,
"_fuzz",
NULL,
0,
module_methods,
NULL,
NULL,
NULL,
NULL
};

PyMODINIT_FUNC
PyInit__fuzz(void)
{
PyObject *m = NULL;

if ((m = PyModule_Create(&_fuzzmodule)) == NULL) {
return NULL;
}
return m;
}
3 changes: 3 additions & 0 deletions Modules/_fuzz/fuzz_tests.txt
Original file line number Diff line number Diff line change
@@ -0,0 +1,3 @@
fuzz_builtin_float
fuzz_builtin_int
fuzz_builtin_unicode
118 changes: 118 additions & 0 deletions Modules/_fuzz/fuzzer.cpp
Original file line number Diff line number Diff line change
@@ -0,0 +1,118 @@
// A fuzz test for CPython.
//
// Unusually for CPython, this is written in C++ for the benefit of linking with
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Unusually"? This would be literally the first C++ code shipped by CPython that ran inside the interpreter. (The existing CPP files shipped by CPython are all for Windows support, and all run outside the interpreter.) It's my understanding that CPython has zero C++ in it very much on purpose. I understand why you use C++ here, but I'd want to get BDFL blessing before I merged this file.

Also, a quick perusal of the documentation for libfuzzer suggests a strong dependency on Clang. CPython supports at least two other compilers--gcc and Microsoft Visual C++--and supports at least one major platform where Clang is not a supported compiler (Windows). Can you confirm that the build process and test suite will degrade gracefully when compiling with gcc or MSVC++?

Copy link
Contributor Author
@ssbr ssbr Jul 26, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would be literally the first C++ code shipped by CPython that ran inside the interpreter.

The only reason it even "ships" is because I want to run it in tests. Is there a way to include it in test builds but exclude it from the release, or does it not make much difference?

I understand why you use C++ here

To be clear: I actually don't understand myself why I use C++ here. Isn't there a way to run the compiler on a C file and link it with something else that uses C++? My experience with running the C/C++ compilers is minimal, and I am pretty sure we can use C here, somehow.

I definitely don't want this PR to come across as "we must use C++", but rather -- I did it in C++ to get it working so I could mail something out, but I'd really appreciate learning how to do it in C instead. (Most of my time in this PR was actually spent getting it building with oss-fuzz's environment, sadly.)

See commit cb9cdc0 for more details on what fails to build and why.

Also, a quick perusal of the documentation for libfuzzer suggests a strong dependency on Clang. CPython supports at least two other compilers--gcc and Microsoft Visual C++--and supports at least one major platform where Clang is not a supported compiler (Windows). Can you confirm that the build process and test suite will degrade gracefully when compiling with gcc or MSVC++?

I can do one better: nothing about this actually depends on libFuzzer, so this should compile and run on every platform (that has C++ support).

Of course, adding a dependency on C++ support to pass tests is, per above, probably not acceptable. My hope is we can get it working in C, otherwise I will make the test fail gracefully in the absence of a working C++ compiler.

Some background might help: tl;dr is that to support fuzzing / libFuzzer, all you need to do is export a specific function, LLVMFuzzerTestOneInput. libFuzzer actually provides a main() which calls that function for fuzz tests. We aren't doing that here though, we're just defining and testing LLVMFuzzerTestOneInput.

In other words, to write a fuzzer, you define LLVMFuzzerTestOneInput in foo.cpp and then use clang++ foo.cpp -lfuzzer to compile the fuzz test to an executable binary. If you want to test that LLVMFuzzerTestOneInput is remotely valid in your continuous integration, separately from running it in a fuzz test, you can call it from your unit tests.

So the only relationship this code has with libFuzzer is that it exports a function that libFuzzer expects to exist. It doesn't actually depend on libFuzzer, and isn't built against libFuzzer within Python continuous integration. It works on all platforms you'd expect it to -- it's just a C++ file that calls the regular CPython API. It is only built against libFuzzer in the oss-fuzz project, which will actually check out a copy of cpython at the master revision and compile the fuzz tests against libFuzzer, and run them.

And to go full circle, this is the core of why it was written in C++ instead of C -- if I compile a C file with "-lfuzzer" it completely blows up because libFuzzer still depends on libstdc++ and I don't know how to compile with C but still link with C++ stuff. (Or something like that. I'm a C++/C noob, I use blaze/bazel for everything.)

I hope that gives enough background that this makes a little more sense. And I'll try to write this up a bit in README.rst once I get back to my work computer tomorrow!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a small note, but it's probably insufficient. Fortunately, we do link to the oss-fuzz docs if people are confused.

// libFuzzer.
//
// The only exposed function is LLVMFuzzerTestOneInput, which is called by
// fuzzers and by the _fuzz module for smoke tests.
//
// To build exactly one fuzz test, as when running in oss-fuzz etc.,
// build with -D _Py_FUZZ_ONE and -D _Py_FUZZ_<test_name>. e.g. to build
// LLVMFuzzerTestOneInput to only run "fuzz_builtin_float", build this file with
// -D _Py_FUZZ_ONE -D _Py_FUZZ_fuzz_builtin_float.
//
// See the source code for LLVMFuzzerTestOneInput for details.

#include <Python.h>
#include <stdlib.h>
#include <inttypes.h>

// Fuzz PyFloat_FromString as a proxy for float(str).
static int fuzz_builtin_float(const char* data, size_t size) {
PyObject* s = PyBytes_FromStringAndSize(data, size);
if (s == NULL) return 0;
PyObject* f = PyFloat_FromString(s);
if (PyErr_Occurred() && PyErr_ExceptionMatches(PyExc_ValueError)) {
PyErr_Clear();
}

Py_XDECREF(f);
Py_DECREF(s);
return 0;
}

// Fuzz PyLong_FromUnicodeObject as a proxy for int(str).
static int fuzz_builtin_int(const char* data, size_t size) {
int base = _Py_HashBytes(data, size) % 36;
if (base == 1) {
base = 0;
}
if (base == -1) {
return 0; // An error occurred, bail early.
}
if (base < 0) {
base = -base;
}

PyObject* s = PyUnicode_FromStringAndSize(data, size);
if (PyErr_Occurred()) {
if (PyErr_ExceptionMatches(PyExc_UnicodeDecodeError)) {
PyErr_Clear();
}
return 0;
}
PyObject* l = PyLong_FromUnicodeObject(s, base);
if (PyErr_Occurred() && PyErr_ExceptionMatches(PyExc_ValueError)) {
PyErr_Clear();
}
PyErr_Clear();
Py_XDECREF(l);
Py_DECREF(s);
return 0;
}
10000
// Fuzz PyUnicode_FromStringAndSize as a proxy for unicode(str).
static int fuzz_builtin_unicode(const char* data, size_t size) {
PyObject* s = PyUnicode_FromStringAndSize(data, size);
if (PyErr_Occurred() && PyErr_ExceptionMatches(PyExc_UnicodeDecodeError)) {
PyErr_Clear();
}
Py_XDECREF(s);
return 0;
}

// Run fuzzer and abort on failure.
static int _run_fuzz(const uint8_t *data, size_t size, int(*fuzzer)(const char* , size_t)) {
int rv = fuzzer("", 0);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume this should be calling it with (data, size)?

Copy link
Contributor Author
@ssbr ssbr Jul 26, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yikes. Worst kind of refactoring error. Thanks for the catch! (Fixed in latest commit.)

if (PyErr_Occurred()) {
// Fuzz tests should handle expected errors for themselves.
PyErr_Print();
abort();
}
// Someday the return value might mean something, propagate it.
return rv;
}

// CPython generates a lot of leak warnings for whatever reason.
extern "C" int __lsan_is_turned_off(void) { return 1; }

// Fuzz test interface.
// This returns the bitwise or of all fuzz test's return values.
//
// All fuzz tests must return 0, as all nonzero return codes are reserved for
// future use -- we propagate the return values for that future case.
// (And we bitwise or when running multiple tests to verify that normally we
// only return 0.)
extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
if (!Py_IsInitialized()) {
// LLVMFuzzerTestOneInput is called repeatedly from the same process, with
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is LLVMFuzzerInitialize not usable?

Copy link
Contributor Author
@ssbr ssbr Jul 26, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK so, for the record: I didn't know about LLVMFuzzerInitialize / it wasn't documented where I was looking.

I googled it to see how dumb I was, and truthfully I still can't find where it's documented, but I did find this thread:

https://codereview.chromium.org/2900373002

To quote it:

This function should only be used if argv is needed, otherwise libfuzzer
best practice is to just use static initialization in
LLVMFuzzerTestOneInput [1].

So actually, I think this is reasonable: we would use a static variable to guard the initialization, but actually, it's useful to do the Py_IsInitialized check instead, because then we can call it from CPython in the unit test without double-initializing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually I didn't even notice the link to docs.

http://llvm.org/docs/LibFuzzer.html#startup-initialization

So the official word is:

Alternatively, you may define an optional init function and it will receive the program arguments that you can read and modify. Do this only if you really need to access argv/argc.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 -- I wonder where we file a bug on libFuzzer to document this, the OpenSSL fuzzers use LLVMFuzzerInitialize in precisely the way this counsels against.

// no separate initialization phase, sadly, so we need to initialize CPython
// ourselves on the first run.
Py_InitializeEx(0);
}

int rv = 0;

#define _Py_FUZZ_YES(test_name) (defined(_Py_FUZZ_##test_name) || !defined(_Py_FUZZ_ONE))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason not to make these multiple C files that define their fuzz function and then #include the common bits?

Copy link
Contributor Author
@ssbr ssbr Jul 26, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think I understand the suggestion. Care to elaborate a bit?

It might be worth looking at commit d112252 to see how test discovery was when I first hacked something together. I couldn't find any solution I liked, but I'm no C expert. (In Python this would be no sweat!)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My suggestion was to have one file per fuzzer, and each of which defines:

static int  pyfuzz(const char* data, size_t size) {
    // ...
}

#include "fuzz_template.c"

Where fuzz_template.c defines LLVMFuzzerTestOneInput and calls pyfuzz.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These won't link together due to ODR violation -- so, the reason I didn't do it that way was so that I could run the code at test-time. (Which leaves your other question below...)

#if _Py_FUZZ_YES(fuzz_builtin_float)
rv |= _run_fuzz(data, size, fuzz_builtin_float);
#endif
#if _Py_FUZZ_YES(fuzz_builtin_int)
rv |= _run_fuzz(data, size, fuzz_builtin_int);
#endif
#if _Py_FUZZ_YES(fuzz_builtin_unicode)
rv |= _run_fuzz(data, size, fuzz_builtin_unicode);
#endif
#undef _Py_FUZZ_YES
return rv;
}
7 changes: 7 additions & 0 deletions setup.py
Original file line number Diff line number Diff line change
Expand Up @@ -715,6 +715,13 @@ def detect_modules(self):
# syslog daemon interface
exts.append( Extension('syslog', ['syslogmodule.c']) )

# Fuzz tests.
exts.append( Extension(
'_fuzz',
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What prevents the _fuzz extension module from being installed and shipped with CPython? It seems like something needed only at build + test time in order to prevent bitrot of code that is primarily used outside of our own test suite. Not something to be distributed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nothing. I don't think it should be distributed with Python, but I don't know how to stop that while still keeping it built for tests. Any suggestions?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure the _fuzz module and it's corresponding test_fuzz.py add any real value. I expect you could make fuzzer.c be compile-only (with no attempt to link it into the main binary) with a Makefile.pre.in modification which should be good enough. At that point it could live in the Programs/ directory next to with _testembed.c with which it seems more akin to instead of under Modules/

A test that calling the fuzz function with a trivial input in our own test suite not crashing is not so important as the oss-fuzz project by its very nature is going to be doing that so we'd be notified. That we at least keep the code compiling is pretty good.

Copy link
Contributor Author
@ssbr ssbr Aug 29, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean, we can do this. It isn't ideal / isn't what oss-fuzz would like us to do. See e.g. this doc: https://github.com/google/oss-fuzz/blob/master/docs/ideal_integration.md#regression-testing

AIUI the oss-fuzz project doesn't want to be in the game of reporting broken tests, only of reporting actual security issues. (Unfortunately, I'm already deviating fairly substantially from what's ideal, but I'd like to be as close as possible).

(In particular, I think it is adding value: we should separate fuzz tests from unit tests / smoke tests so that reporting is easier and breakages less common.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright, talking to @ncoghlan at the dev. sprint this week, we've got other modules that are built and effectively "shipped" but are really used for testing/development only. so this isn't the first, second, or even third such thing. _test_capi is one of them. xxlimited is another. so i'm going to propose _xx_test_fuzz as the name to "Use all the naming idioms!" . Python distribution creators are expected to realize that isn't needed in the main interpreter runtime package.

... let me rename things and look at this in whole again from that point of view.

['_fuzz/_fuzzmodule.c', '_fuzz/fuzzer.cpp'],
optional=True)
)

#
# Here ends the simple stuff. From here on, modules need certain
# libraries, are platform-specific, or present other surprises.
Expand Down
0