-
-
Notifications
You must be signed in to change notification settings - Fork 10.9k
WIP, TST: add static analysis to CI #12295
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
The static analysis complains about SWIG a lot -- is that still being used? If so, tests needed? |
@@ -247,7 +247,6 @@ arr__monotonicity(PyObject *NPY_UNUSED(self), PyObject *args, PyObject *kwds) | |||
monotonic = check_array_monotonic( | |||
(const double *)PyArray_DATA(arr_x), len_x); | |||
NPY_END_THREADS | |||
Py_DECREF(arr_x); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was flagged, but maybe incorrectly.
@@ -258,7 +258,7 @@ NPY_TITLE_KEY_check(PyObject *key, PyObject *value) | |||
} | |||
|
|||
/* Macro, for backward compat with "if NPY_TITLE_KEY(key, value) { ..." */ | |||
#define NPY_TITLE_KEY(key, value) (NPY_TITLE_KEY_check((key), (value))) | |||
#define NPY_TITLE_KEY(key, value) NPY_TITLE_KEY_check((key), (value)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This breaks the backwards compat that is the reason for this existing in the first place
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As in users or downstream projects needs to be able to use the macro without parentheses in the C language?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume you mean the |
Yeah @stefanv was asking about that |
It is still being used and maintained, there have been six commits so far this year. |
numpy/core/src/multiarray/alloc.c
Outdated
_PyDataMem_eventhook_user_data); | ||
} | ||
(*_PyDataMem_eventhook)(NULL, result, size, | ||
_PyDataMem_eventhook_user_data); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These changes aren't safe. This is https://en.wikipedia.org/wiki/Double-checked_locking, because NPY_ALLOW_C_API
is essentially PyGILState_Ensure()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool, never heard of it -- the linked article isn't exactly a big endorsement though:
The pattern, when implemented in some language/hardware combinations, can be unsafe. At times, it can be considered an anti-pattern
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All I can find arguing against it is that it's not suitable for safe static initialization. We're using it in the following context though:
// only acquire the GIL if we need it
if (_python_callback) {
acquire_gil();
// check that the callback was not deregistered while waiting for the GIL
if (_python_callback) {
_python_callback()
}
release_gil();
}
If we wanted to drop the idiom, it would become the slower
acquire_gil();
// check that the callback was not deregistered while waiting for the GIL
if (_python_callback) {
_python_callback()
}
release_gil();
Removing the outer if
is a performance hit, removing the inner if
is completely unsafe.
numpy/linalg/lapack_lite/f2c_blas.c
Outdated
@@ -380,7 +380,6 @@ static doublecomplex c_b1078 = {1.,0.}; | |||
static logical nota, notb; | |||
static complex temp; | |||
static logical conja, conjb; | |||
static integer ncola; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is generated code - it doesn't make any sense to:
- Run the static analyzer against it
- Change it based on the results of the analyzer
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The only fix here that we w 8000 ant to keep is:
-if NPY_TITLE_KEY(key, value) {
+if (NPY_TITLE_KEY(key, value)) {
All the other fixes seem harmful to varying degrees.
azure-pipelines.yml
Outdated
@@ -61,6 +61,44 @@ jobs: | |||
inputs: | |||
testResultsFiles: '**/test-*.xml' | |||
testRunTitle: 'Publish test results for Python $(python.version)' | |||
- job: static_analysis | |||
pool: | |||
vmIMage: macOS-10.13 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any particular reason for this being macOS?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was testing locally on that platform. But yeah, I share a lot of the concerns noted here. I guess we could move to linux if we want, although maybe better to prioritize showing that this can actually do something useful first.
azure-pipelines.yml
Outdated
displayName: 'run cppcheck static analyzer' | ||
# start by checking for bugs flagged by cppcheck with (error), but | ||
# ignore the large number of "order of evaluation of side effects" issues | ||
# detected in lapack_lite (NOTE: are those real issues?) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you give an example of this? It's very plausible that f2c
produces bad output, and also very plausible that the original fortran (which is a fairly old version of lapack) has the same problem.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[numpy/linalg/lapack_lite/f2c_c_lapack.c:24393]: (error) Expression 'r__1=d__[finish],(double)((r__1)>=0?r__1:-(r__1))' depends on order of evaluation of side effects
(error)
should mean bug
The full raw cppcheck
output was linked in the original PR message
# ignore the large number of "order of evaluation of side effects" issues | ||
# detected in lapack_lite (NOTE: are those real issues?) | ||
# we also ignore "Shifting signed 32-bit value by 31 bits is undefined behaviour" | ||
# based on comment in arrayobject.h, although this is probably bad |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we use // cppcheck-suppress
for this instead?
I think it might make more sense longer term to parse the xml output format |
I've labelled this with a few "needs work"-style tags. The changes were required for the CI to pass so the grepping needs to be adjusted accordingly when most of the changes are reversed. Maybe there are better things to target in the static analysis, or maybe there just isn't that much to be gained by it anyway given the complication of the Python C API that the check doesn't know much / anything about. |
Note that LGTM already provides some very basic static analysis: https://lgtm.com/projects/g/numpy/numpy/alerts/?mode=tree&lang=cpp |
* add open source cppcheck analysis to Azure CI; the results of the static analysis are parsed using grep with success based on return codes indicating absence of various issues in the code base * there is still room to expand the issues we parse for--initially limited in scope to a few select static analysis issues, for which the code has been adjusted accordingly in this PR
566b080
to
940ac36
Compare
Major revisions based on feedback. No longer needs to add a new CI job since it runs faster with various path exclusions. Could still do with some more compelling identification of issues though--I spent most of my time adding things we want to ignore to the suppressions file. |
In that case, should we add some testing for it, in the form of building a simple SWIG-wrapped extension? |
That's very low on the prio list I'd say. Given the very low rate of change, testing changes by hand works fine. We average 1-2 commits on numpy.i per year (excluding typos/whitespace stuff). |
The analyzer found a case where |
This PR of mine was basically a disaster. I'd like to see static analysis working in a more helpful way, but will have to get back to that with a well thought out approach. So, I'll close this. |
Adds the open source cppcheck static code analyzer to an Azure CI job & parses a subset of the issues that can be detected for now using
grep
exit codes for success, a bit like our compiler warning checks.This PR also includes the attempted fixes to satisfy the above subset of the static analysis checks, and the raw static analysis log file contains more issues with tags in parentheses like
(performance)
,(portability)
and so on.The full static analysis job is a few minutes faster than the full test suite jobs, with
cppcheck
itself requiring around 6 minutes to recursively probe our codebase with all flags active.There are some corners of our codebase like
lapack_lite
,swig
, andrandom
for which I've intentionally avoided diving in too much, at least for now, despite the static analyzer going nuts on those.