-
-
Notifications
You must be signed in to change notification settings - Fork 32.4k
bpo-29505: Add fuzz tests for float(str), int(str), unicode(str) #2878
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 6 commits
8316e8d
f47f875
d112252
cb9cdc0
7028614
2b34f07
f77be65
778c827
bc3d33f
dee024f
9a00b23
d542130
9f16dd4
aa6d784
fa0af73
4af5c11
fb017ed
c4eda5d
83967f9
52dccc2
4327fd9
6da8e97
43620ae
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,20 @@ | ||
import unittest | ||
from test import support | ||
import _fuzz | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This hard dependency on the _fuzz module suggests that this file simply won't work unless _fuzz is available, and AFAICT it won't be available when compiling with gcc or MSVC++. You need to make sure these tests are skipped when the _fuzz module is unavailable. |
||
|
||
class TestFuzz(unittest.TestCase): | ||
|
||
def test_fuzz(self): | ||
"""Run the fuzz tests on sample input. | ||
|
||
This isn't meaningful and only checks it doesn't crash. | ||
""" | ||
_fuzz.run(b"") | ||
_fuzz.run(b"\0") | ||
_fuzz.run(b"{") | ||
_fuzz.run(b" ") | ||
_fuzz.run(b"x") | ||
_fuzz.run(b"1") | ||
|
||
if __name__ == "__main__": | ||
support.run_unittest(TestFuzz) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Will the usual There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I cribbed this from an existing test, (I assumed CPython had some weird test framework), but I see I must've picked the wrong test to crib from. :) Fixed. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,36 @@ | ||
Fuzz Tests for CPython | ||
====================== | ||
|
||
These fuzz tests are designed to be included in Google's `oss-fuzz`_ project. | ||
|
||
Adding a new fuzz test | ||
---------------------- | ||
|
||
Add the test name on a new line in ``fuzz_tests.txt``. | ||
|
||
In ``fuzzer.c``, add a function to be run:: | ||
|
||
int $test_name (const char* data, size_t size) { | ||
... | ||
return 0; | ||
} | ||
|
||
|
||
And invoke it from ``LLVMFuzzerTestOneInput``:: | ||
|
||
#if _Py_FUZZ_YES(fuzz_builtin_float) | ||
rv |= _run_fuzz(data, size, fuzz_builtin_float); | ||
#endif | ||
|
||
``LLVMFuzzerTestOneInput`` will run in oss-fuzz, with each test in | ||
``fuzz_tests.txt`` run separately. | ||
|
||
What makes a good fuzz test | ||
--------------------------- | ||
|
||
Libraries written in C that might handle untrusted data are worthwhile. The | ||
more complex the logic (e.g. parsing), the more likely this is to be a useful | ||
fuzz test. See the existing examples for reference, and refer to the | ||
`oss-fuzz`_ docs. | ||
|
||
.. _oss-fuzz: https://github.com/google/oss-fuzz |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,52 @@ | ||
#include <Python.h> | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Do we really need this module? It's only usage seems to be for the tests, and it looks like it has as much code as the fuzzers themselves! There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Sure, but this would be like 2 lines in Cython. It's just the typical boilerplate of extension modules, nothing to write home about, right? We don't need it. The thing we need is to make sure that the fuzz tests are maintained and don't break. That is, they need to be in CI for Python, not just run in the fuzz suite in oss-fuzz. For most ways it could break, it would be enough just to build them, but not run them. I think it is a good idea to also run them. Although there's a cost: the macro silliness to make them runnable individually and collectively probably makes it easy to screw up, so I'm not happy about it. (Every fuzz test name is repeated like four times.) The strategy you mentioned above would make it less error prone, at the cost of losing the ability to call them from a unit test like this. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. To me what I see in this code is pretty standard CPython API calls. That raises the question of if there is a good reason to check these in and keep them working as part of CPython or if this code should be a stand alone module available on pypi that people could use to launch fuzz tests of any Python interpreter it is built with. The answer to that question likely comes from answering how oss-fuzz the fuzz testing service works. If it monitors our github repo and reruns fuzz tests as things change in cpython, including these here may make sense. I think a comment describing the situation and why this exists here with links to relevant oss-fuzz documentation (either in here or in the _fuzz/README.rst) would help that. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I've added a little to README.rst, but to summarize and expand here: You're correct that oss-fuzz regularly pulls from CPython and runs the fuzz tests locally. oss-fuzz prefers to have the tests be a tested part of the upstream project so that they don't break. Certainly not all of these fuzz tests are using the public API (e.g._Py_HashBytes), and there will be more in the future. Those are more likely to they might break in the future -- having them here ensures that breaking them would break the tests, and they will always be good to run, hopefully. We could keep the public-API-using fuzz tests separate in oss-fuzz, but they wouldn't really like that, and that actually splits up the tests in a confusing way -- some would be here, some would be in the oss-fuzz repo. I think they're better off all being here. |
||
#include <stdlib.h> | ||
#include <inttypes.h> | ||
|
||
int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size); | ||
|
||
static PyObject* _fuzz_run(PyObject* self, PyObject* args) { | ||
const char* buf; | ||
size_t size; | ||
if (!PyArg_ParseTuple(args, "s#", &buf, &size)) { | ||
return NULL; | ||
} | ||
int rv = LLVMFuzzerTestOneInput((const uint8_t*)buf, size); | ||
if (PyErr_Occurred()) { | ||
return NULL; | ||
} | ||
if (rv != 0) { | ||
// Nonzero return codes are reserved for future use. | ||
PyErr_Format( | ||
PyExc_RuntimeError, "Nonzero return code from fuzzer: %d", rv); | ||
return NULL; | ||
} | ||
Py_RETURN_NONE; | ||
} | ||
|
||
static PyMethodDef module_methods[] = { | ||
{"run", (PyCFunction)_fuzz_run, METH_VARARGS, ""}, | ||
{NULL}, | ||
}; | ||
|
||
static struct PyModuleDef _fuzzmodule = { | ||
PyModuleDef_HEAD_INIT, | ||
"_fuzz", | ||
NULL, | ||
0, | ||
module_methods, | ||
NULL, | ||
NULL, | ||
NULL, | ||
NULL | ||
}; | ||
|
||
PyMODINIT_FUNC | ||
PyInit__fuzz(void) | ||
{ | ||
PyObject *m = NULL; | ||
|
||
if ((m = PyModule_Create(&_fuzzmodule)) == NULL) { | ||
return NULL; | ||
} | ||
return m; | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,3 @@ | ||
fuzz_builtin_float | ||
fuzz_builtin_int | ||
fuzz_builtin_unicode |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,118 @@ | ||
// A fuzz test for CPython. | ||
// | ||
// Unusually for CPython, this is written in C++ for the benefit of linking with | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. "Unusually"? This would be literally the first C++ code shipped by CPython that ran inside the interpreter. (The existing CPP files shipped by CPython are all for Windows support, and all run outside the interpreter.) It's my understanding that CPython has zero C++ in it very much on purpose. I understand why you use C++ here, but I'd want to get BDFL blessing before I merged this file. Also, a quick perusal of the documentation for libfuzzer suggests a strong dependency on Clang. CPython supports at least two other compilers--gcc and Microsoft Visual C++--and supports at least one major platform where Clang is not a supported compiler (Windows). Can you confirm that the build process and test suite will degrade gracefully when compiling with gcc or MSVC++? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
The only reason it even "ships" is because I want to run it in tests. Is there a way to include it in test builds but exclude it from the release, or does it not make much difference?
To be clear: I actually don't understand myself why I use C++ here. Isn't there a way to run the compiler on a C file and link it with something else that uses C++? My experience with running the C/C++ compilers is minimal, and I am pretty sure we can use C here, somehow. I definitely don't want this PR to come across as "we must use C++", but rather -- I did it in C++ to get it working so I could mail something out, but I'd really appreciate learning how to do it in C instead. (Most of my time in this PR was actually spent getting it building with oss-fuzz's environment, sadly.) See commit cb9cdc0 for more details on what fails to build and why.
I can do one better: nothing about this actually depends on libFuzzer, so this should compile and run on every platform (that has C++ support). Of course, adding a dependency on C++ support to pass tests is, per above, probably not acceptable. My hope is we can get it working in C, otherwise I will make the test fail gracefully in the absence of a working C++ compiler. Some background might help: tl;dr is that to support fuzzing / libFuzzer, all you need to do is export a specific function, In other words, to write a fuzzer, you define So the only relationship this code has with libFuzzer is that it exports a function that libFuzzer expects to exist. It doesn't actually depend on libFuzzer, and isn't built against libFuzzer within Python continuous integration. It works on all platforms you'd expect it to -- it's just a C++ file that calls the regular CPython API. It is only built against libFuzzer in the oss-fuzz project, which will actually check out a copy of cpython at the master revision and compile the fuzz tests against libFuzzer, and run them. And to go full circle, this is the core of why it was written in C++ instead of C -- if I compile a C file with "-lfuzzer" it completely blows up because libFuzzer still depends on libstdc++ and I don't know how to compile with C but still link with C++ stuff. (Or something like that. I'm a C++/C noob, I use blaze/bazel for everything.) I hope that gives enough background that this makes a little more sense. And I'll try to write this up a bit in README.rst once I get back to my work computer tomorrow! There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I added a small note, but it's probably insufficient. Fortunately, we do link to the oss-fuzz docs if people are confused. |
||
// libFuzzer. | ||
// | ||
// The only exposed function is LLVMFuzzerTestOneInput, which is called by | ||
// fuzzers and by the _fuzz module for smoke tests. | ||
// | ||
// To build exactly one fuzz test, as when running in oss-fuzz etc., | ||
// build with -D _Py_FUZZ_ONE and -D _Py_FUZZ_<test_name>. e.g. to build | ||
// LLVMFuzzerTestOneInput to only run "fuzz_builtin_float", build this file with | ||
// -D _Py_FUZZ_ONE -D _Py_FUZZ_fuzz_builtin_float. | ||
// | ||
// See the source code for LLVMFuzzerTestOneInput for details. | ||
|
||
#include <Python.h> | ||
#include <stdlib.h> | ||
#include <inttypes.h> | ||
|
||
// Fuzz PyFloat_FromString as a proxy for float(str). | ||
static int fuzz_builtin_float(const char* data, size_t size) { | ||
PyObject* s = PyBytes_FromStringAndSize(data, size); | ||
if (s == NULL) return 0; | ||
PyObject* f = PyFloat_FromString(s); | ||
if (PyErr_Occurred() && PyErr_ExceptionMatches(PyExc_ValueError)) { | ||
PyErr_Clear(); | ||
} | ||
|
||
Py_XDECREF(f); | ||
Py_DECREF(s); | ||
return 0; | ||
} | ||
|
||
// Fuzz PyLong_FromUnicodeObject as a proxy for int(str). | ||
static int fuzz_builtin_int(const char* data, size_t size) { | ||
int base = _Py_HashBytes(data, size) % 36; | ||
if (base == 1) { | ||
base = 0; | ||
} | ||
if (base == -1) { | ||
return 0; // An error occurred, bail early. | ||
} | ||
if (base < 0) { | ||
base = -base; | ||
} | ||
|
||
PyObject* s = PyUnicode_FromStringAndSize(data, size); | ||
if (PyErr_Occurred()) { | ||
if (PyErr_ExceptionMatches(PyExc_UnicodeDecodeError)) { | ||
PyErr_Clear(); | ||
} | ||
return 0; | ||
} | ||
PyObject* l = PyLong_FromUnicodeObject(s, base); | ||
if (PyErr_Occurred() && PyErr_ExceptionMatches(PyExc_ValueError)) { | ||
PyErr_Clear(); | ||
} | ||
PyErr_Clear(); | ||
Py_XDECREF(l); | ||
Py_DECREF(s); | ||
return 0; | ||
} | ||
10000
|
||
// Fuzz PyUnicode_FromStringAndSize as a proxy for unicode(str). | ||
static int fuzz_builtin_unicode(const char* data, size_t size) { | ||
PyObject* s = PyUnicode_FromStringAndSize(data, size); | ||
if (PyErr_Occurred() && PyErr_ExceptionMatches(PyExc_UnicodeDecodeError)) { | ||
PyErr_Clear(); | ||
} | ||
Py_XDECREF(s); | ||
return 0; | ||
} | ||
|
||
// Run fuzzer and abort on failure. | ||
static int _run_fuzz(const uint8_t *data, size_t size, int(*fuzzer)(const char* , size_t)) { | ||
int rv = fuzzer("", 0); | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I assume this should be calling it with There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yikes. Worst kind of refactoring error. Thanks for the catch! (Fixed in latest commit.) |
||
if (PyErr_Occurred()) { | ||
// Fuzz tests should handle expected errors for themselves. | ||
PyErr_Print(); | ||
abort(); | ||
} | ||
// Someday the return value might mean something, propagate it. | ||
return rv; | ||
} | ||
|
||
// CPython generates a lot of leak warnings for whatever reason. | ||
extern "C" int __lsan_is_turned_off(void) { return 1; } | ||
|
||
// Fuzz test interface. | ||
// This returns the bitwise or of all fuzz test's return values. | ||
// | ||
// All fuzz tests must return 0, as all nonzero return codes are reserved for | ||
// future use -- we propagate the return values for that future case. | ||
// (And we bitwise or when running multiple tests to verify that normally we | ||
// only return 0.) | ||
extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) { | ||
if (!Py_IsInitialized()) { | ||
// LLVMFuzzerTestOneInput is called repeatedly from the same process, with | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Is There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. OK so, for the record: I didn't know about I googled it to see how dumb I was, and truthfully I still can't find where it's documented, but I did find this thread: https://codereview.chromium.org/2900373002 To quote it:
So actually, I think this is reasonable: we would use a static variable to guard the initialization, but actually, it's useful to do the There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Actually I didn't even notice the link to docs. http://llvm.org/docs/LibFuzzer.html#startup-initialization So the official word is:
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 👍 -- I wonder where we file a bug on libFuzzer to document this, the OpenSSL fuzzers use |
||
// no separate initialization phase, sadly, so we need to initialize CPython | ||
// ourselves on the first run. | ||
Py_InitializeEx(0); | ||
} | ||
|
||
int rv = 0; | ||
|
||
#define _Py_FUZZ_YES(test_name) (defined(_Py_FUZZ_##test_name) || !defined(_Py_FUZZ_ONE)) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Is there a reason not to make these multiple C files that define their There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I don't think I understand the suggestion. Care to elaborate a bit? It might be worth looking at commit d112252 to see how test discovery was when I first hacked something together. I couldn't find any solution I liked, but I'm no C expert. (In Python this would be no sweat!) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. My suggestion was to have one file per fuzzer, and each of which defines: static int pyfuzz(const char* data, size_t size) {
// ...
}
#include "fuzz_template.c" Where There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. These won't link together due to ODR violation -- so, the reason I didn't do it that way was so that I could run the code at test-time. (Which leaves your other question below...) |
||
#if _Py_FUZZ_YES(fuzz_builtin_float) | ||
rv |= _run_fuzz(data, size, fuzz_builtin_float); | ||
#endif | ||
#if _Py_FUZZ_YES(fuzz_builtin_int) | ||
rv |= _run_fuzz(data, size, fuzz_builtin_int); | ||
#endif | ||
#if _Py_FUZZ_YES(fuzz_builtin_unicode) | ||
rv |= _run_fuzz(data, size, fuzz_builtin_unicode); | ||
#endif | ||
#undef _Py_FUZZ_YES | ||
return rv; | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -715,6 +715,13 @@ def detect_modules(self): | |
# syslog daemon interface | ||
exts.append( Extension('syslog', ['syslogmodule.c']) ) | ||
|
||
# Fuzz tests. | ||
exts.append( Extension( | ||
'_fuzz', | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. What prevents the _fuzz extension module from being installed and shipped with CPython? It seems like something needed only at build + test time in order to prevent bitrot of code that is primarily used outside of our own test suite. Not something to be distributed. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Nothing. I don't think it should be distributed with Python, but I don't know how to stop that while still keeping it built for tests. Any suggestions? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'm not sure the A test that calling the fuzz function with a trivial input in our own test suite not crashing is not so important as the oss-fuzz project by its very nature is going to be doing that so we'd be notified. That we at least keep the code compiling is pretty good. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I mean, we can do this. It isn't ideal / isn't what oss-fuzz would like us to do. See e.g. this doc: https://github.com/google/oss-fuzz/blob/master/docs/ideal_integration.md#regression-testing AIUI the oss-fuzz project doesn't want to be in the game of reporting broken tests, only of reporting actual security issues. (Unfortunately, I'm already deviating fairly substantially from what's ideal, but I'd like to be as close as possible). (In particular, I think it is adding value: we should separate fuzz tests from unit tests / smoke tests so that reporting is easier and breakages less common.) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Alright, talking to @ncoghlan at the dev. sprint this week, we've got other modules that are built and effectively "shipped" but are really used for testing/development only. so this isn't the first, second, or even third such thing. ... let me rename things and look at this in whole again from that point of view. |
||
['_fuzz/_fuzzmodule.c', '_fuzz/fuzzer.cpp'], | ||
optional=True) | ||
) | ||
|
||
# | ||
# Here ends the simple stuff. From here on, modules need certain | ||
# libraries, are platform-specific, or present other surprises. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see below if needed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Deleted support import.)