bpo-36876: Add a tool that identifies unsupported global C variables. by ericsnowcurrently · Pull Request #15877 · python/cpython · GitHub

bpo-36876: Add a tool that identifies unsupported global C variables. #15877

Merged
merged 118 commits into from
Sep 11, 2019
Commits
9cdba01
Bootstrap the "c-statics" script under the test suite.
ericsnowcurrently Jun 21, 2019
cfdf57f
Stub out the "check" and "show" commands.
ericsnowcurrently Jun 22, 2019
716b13d
Add a README.
ericsnowcurrently Jun 22, 2019
ff1b447
Fix a filename.
ericsnowcurrently Jun 22, 2019
7e7d52d
Run the full check as part of the test suite.
ericsnowcurrently Jun 22, 2019
74d03d4
Frame out the functional tests.
ericsnowcurrently Jun 22, 2019
34d2c7c
Ignore the argparse output in tests.
ericsnowcurrently Jun 22, 2019
e8e2227
Move relevant tests to test_cg/.
ericsnowcurrently Jun 22, 2019
69541bf
Add StaticVar.
ericsnowcurrently Jun 22, 2019
cab1fc8
Add a fake find.statics().
ericsnowcurrently Jun 22, 2019
4bf5299
Add show.basic().
ericsnowcurrently Jun 22, 2019
c74360f
Implement cmd_show().
ericsnowcurrently Jun 22, 2019
b94833d
Add a note about the normalized vartype format.
ericsnowcurrently Jun 22, 2019
6fecaad
Do not run tests for find.statics() yet.
ericsnowcurrently Jun 22, 2019
8a53e69
Implement cmd_check().
ericsnowcurrently Jun 22, 2019
98d3737
Add scan.iter_statics().
ericsnowcurrently Jun 22, 2019
28e0f99
Add supported.is_supported().
ericsnowcurrently Jun 22, 2019
1d6696c
Add find.statics().
ericsnowcurrently Jun 22, 2019
7e09aa4
Sort the output.
ericsnowcurrently Jun 23, 2019
265098d
Implement scan.iter_statics().
ericsnowcurrently Jun 23, 2019
ac70e2c
Add cg.files.iter_files().
ericsnowcurrently Jun 29, 2019
587907c
Add cg.info.Symbol.
ericsnowcurrently Jun 29, 2019
6ff136f
Properly handle local variables.
ericsnowcurrently Jun 29, 2019
9faeba9
Implement cg.parse.iter_variables().
ericsnowcurrently Jun 29, 2019
ae5c872
Start the implementation for iter_statements().
ericsnowcurrently Jul 1, 2019
0687e09
Deal with comments.
ericsnowcurrently Jul 1, 2019
a50aa32
Handle functions in iter_global_declarations().
ericsnowcurrently Jul 2, 2019
044a1f4
Implement basic variable support.
ericsnowcurrently Jul 2, 2019
c7833c1
Implement parse_func().
ericsnowcurrently Jul 2, 2019
2c164cd
Implement parse_var().
ericsnowcurrently Jul 2, 2019
84cd77e
Ignore files in Include/cpython.
ericsnowcurrently Jul 4, 2019
dc471a3
Rename the c-globals tool directory to be more generic.
ericsnowcurrently Jul 5, 2019
a9c27c5
Add imports_under_tool() CM.
ericsnowcurrently Jul 5, 2019
60dee31
Move the code under Tools (and the tests under test_tools).
ericsnowcurrently Jul 5, 2019
c0f1313
Split up the files properly.
ericsnowcurrently Jul 5, 2019
5410467
Factor out c_parser/{source,util}.py and c_symbols/local.py.
ericsnowcurrently Jul 5, 2019
caac1a9
Factor out preprocessor.py.
ericsnowcurrently Jul 5, 2019
928cc96
Factor out _NTBase.
ericsnowcurrently Jul 5, 2019
baf2bce
Add the wrapped_arg_combos() testing helper.
ericsnowcurrently Jul 10, 2019
57ec889
Add preprocessor.iter_lines().
ericsnowcurrently Jul 26, 2019
400b388
Remove line continuations and comments in preprocessor.iter_lines().
ericsnowcurrently Jul 27, 2019
13eb7b1
Fix minor formatting.
ericsnowcurrently Aug 2, 2019
182e3f9
Handle the "ignored" and "known" args to statics().
ericsnowcurrently Aug 2, 2019
ec88a4c
Factor out statics_from_symbols().
ericsnowcurrently Aug 2, 2019
a269b15
StaticVar -> Variable.
ericsnowcurrently Aug 2, 2019
842ef7c
Add statics_from_declarations().
ericsnowcurrently Aug 2, 2019
853768a
Track the per-variable preprocessor conditions.
ericsnowcurrently Aug 2, 2019
5f1b8b9
Add util.Slot (a descriptor).
ericsnowcurrently Aug 23, 2019
b850c05
Add util.classonly (a la classmethod).
ericsnowcurrently Aug 23, 2019
cde5b32
Add util._NOT_SET.
ericsnowcurrently Aug 23, 2019
11377dd
Add _NTBase.from_raw().
ericsnowcurrently Aug 23, 2019
acbedc7
Add info.ID.
ericsnowcurrently Aug 23, 2019
08310bf
Use ID in Symbol.
ericsnowcurrently Aug 23, 2019
1d5e9bf
Use ID in Variable.
ericsnowcurrently Aug 23, 2019
0c56fd9
Add a note about a "conditions" slot for info.ID.
ericsnowcurrently Aug 23, 2019
9563689
Require Variable to have filename set.
ericsnowcurrently Aug 23, 2019
7ad89b2
"???" means "unknown".
ericsnowcurrently Aug 23, 2019
2b2bc66
Make find_local_symbol() a "public" argument.
ericsnowcurrently Aug 24, 2019
1316a3b
Add info.ID.islocal.
ericsnowcurrently Aug 24, 2019
4bdd9a6
Factor out c_analyzer_common package.
ericsnowcurrently Aug 24, 2019
5d52d9f
Move info.Symbol under c_symbols.
ericsnowcurrently Aug 24, 2019
d7d0d91
Do not allow Variable.id to have UNKNOWN in it.
ericsnowcurrently Aug 24, 2019
f662b63
Cache Variable.isstatic.
ericsnowcurrently Aug 24, 2019
e963103
Move known_from_file() to common/known.py.
ericsnowcurrently Aug 26, 2019
2cf6d6a
Add look_up_known_symbol() and symbols_to_variables().
ericsnowcurrently Aug 26, 2019
3d80342
Move files.py under common.
ericsnowcurrently Aug 26, 2019
d58c7c4
known_from_file() -> known.from_file().
ericsnowcurrently Aug 26, 2019
2fca1e7
Make dirnames optional.
ericsnowcurrently Aug 26, 2019
1e34a1a
Drop c_statics.scan.
ericsnowcurrently Aug 26, 2019
3f2e4f6
Implement known.from_file() and ignored_from_file().
ericsnowcurrently Aug 26, 2019
3e1657a
Fix REPO_ROOT.
ericsnowcurrently Aug 27, 2019
a2779b4
Various minor fixes to get to stable.
ericsnowcurrently Aug 27, 2019
89b887a
Add util.read_tsv() and util.write_tsv().
ericsnowcurrently Aug 27, 2019
89a02c3
Treat "-" in .tsv file as None.
ericsnowcurrently Aug 27, 2019
697ab59
Fix the statics_from_binary() tests.
ericsnowcurrently Aug 27, 2019
66a69c4
Add code to generate known.tsv.
ericsnowcurrently Aug 27, 2019
411aadd
Fix a typo in get_resolver().
ericsnowcurrently Aug 27, 2019
3bfb345
Fix a typo in _find_statics().
ericsnowcurrently Aug 27, 2019
9d731e7
Implement is_supported() (first pass).
ericsnowcurrently Aug 27, 2019
021be3c
Show the vartype in the basic output format.
ericsnowcurrently Aug 27, 2019
f620135
Use the underlying ID for the hash of Symbol and Variable.
ericsnowcurrently Aug 28, 2019
e7a029d
Move constants out of c_statics.__init__.
ericsnowcurrently Aug 28, 2019
b99c9f2
Fix a typo.
ericsnowcurrently Aug 28, 2019
3aeb32d
Clean up iter_files().
ericsnowcurrently Aug 28, 2019
8a97e50
Fix typos.
ericsnowcurrently Aug 31, 2019
51d5a19
Minor fix to find/resolve.
ericsnowcurrently Sep 2, 2019
19b9f97
Include ID in error message.
ericsnowcurrently Sep 2, 2019
a9c68c5
Add some "naive" parsing tools.
ericsnowcurrently Sep 2, 2019
2af6781
Expand the capability of the "known" generator.
ericsnowcurrently Sep 2, 2019
aa8495a
Update the "known" variables with generated values.
ericsnowcurrently Sep 2, 2019
921a465
Consider all known variables as static.
ericsnowcurrently Sep 2, 2019
48b2460
Distinguish "static" vars in output.
ericsnowcurrently Sep 3, 2019
c0a631b
Special-case variables named "id".
ericsnowcurrently Sep 6, 2019
3989a69
Fix a test.
ericsnowcurrently Sep 6, 2019
8118454
Print totals.
ericsnowcurrently Sep 6, 2019
cae5e3a
Fail if we couldn't find any of the symbols.
ericsnowcurrently Sep 6, 2019
87ebe87
Factor out _check_results().
ericsnowcurrently Sep 8, 2019
91e4fb1
Fill in gaps in known.tsv.
ericsnowcurrently Sep 8, 2019
21c4175
Mark _Py_IDENTIFIER() as unsupported.
ericsnowcurrently Sep 8, 2019
a245849
Keep "static" in output.
ericsnowcurrently Sep 8, 2019
1437d56
Fix a typo in __main__.py.
ericsnowcurrently Sep 8, 2019
7feea9d
Honor provided dirnames.
ericsnowcurrently Sep 8, 2019
ae23801
Fix tests.
ericsnowcurrently Sep 8, 2019
26aaccb
Deal with default dirnames properly.
ericsnowcurrently Sep 8, 2019
4aaad5f
Supporting hiding objects in output.
ericsnowcurrently Sep 8, 2019
3d36cfc
Recognize more object types.
ericsnowcurrently Sep 8, 2019
b7420d5
Ignore known non-statics.
ericsnowcurrently Sep 8, 2019
da151a7
Always support "static const" (non-object) variables.
ericsnowcurrently Sep 9, 2019
6470eb1
"statics" -> "globals".
ericsnowcurrently Sep 9, 2019
1233cf9
Maybe limit the variables in known.tsv.
ericsnowcurrently Sep 10, 2019
fc6c97e
Generate the ignored.tsv file.
ericsnowcurrently Sep 10, 2019
6f8a223
Ignore variables with benign races.
ericsnowcurrently Sep 10, 2019
6c1db44
Mark more variables as PyObject.
ericsnowcurrently Sep 10, 2019
5888d48
Consider private, non-static globals.
ericsnowcurrently Sep 11, 2019
cdeb1d9
Update ignored global variables.
ericsnowcurrently Sep 11, 2019
569c57e
Skip the check where "nm" isn't available.
ericsnowcurrently Sep 11, 2019
7b0745f
Ignore REPL-related variables.
ericsnowcurrently Sep 12, 2019
b5dd31b
Fix whitespace.
ericsnowcurrently Sep 11, 2019
Generate the ignored.tsv file.
ericsnowcurrently committed Sep 11, 2019
commit fc6c97ec9955794586978bbb0f4e0e23ecf0ca1d
18 changes: 7 additions & 11 deletions Tools/c-analyzer/c_analyzer_common/_generate.py
@@ -1,10 +1,5 @@
# The code here consists of hacks for pre-populating the known.tsv file.

import contextlib
import glob
import os.path
import re

from c_parser.preprocessor import _iter_clean_lines
from c_parser.naive import (
iter_variables, parse_variable_declaration, find_variables,
@@ -14,7 +9,7 @@
from . import SOURCE_DIRS, REPO_ROOT
from .known import DATA_FILE as KNOWN_FILE, HEADER as KNOWN_HEADER
from .info import UNKNOWN, ID
from .util import run_cmd, write_tsv
from .util import write_tsv
from .files import iter_cpython_files


@@ -311,14 +306,15 @@ def known_rows(symbols, *,
yield _as_known(variable.id, variable.vartype)


def known_file(symbols, filename=None, *,
_generate_rows=known_rows,
):
def generate(symbols, filename=None, *,
_generate_rows=known_rows,
_write_tsv=write_tsv,
):
if not filename:
filename = KNOWN_FILE + '.new'

rows = _generate_rows(symbols)
write_tsv(filename, KNOWN_HEADER, rows)
_write_tsv(filename, KNOWN_HEADER, rows)


if __name__ == '__main__':
@@ -327,4 +323,4 @@ def known_file(symbols, filename=None, *,
binary.PYTHON,
find_local_symbol=None,
)
known_file(symbols)
generate(symbols)
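
The hunk above renames `known_file()` to `generate()` and passes the TSV writer in as a keyword-only default (`_write_tsv=write_tsv`), presumably so tests can swap in a fake. A minimal sketch of that pattern follows. It makes some assumptions: the `write_tsv()` here is a hypothetical stand-in that takes a file object (the real helper in `c_analyzer_common.util` is called with a filename and its implementation is not shown in this diff), and `known_rows()` and the header are simplified for illustration.

```python
import csv
import io


def write_tsv(outfile, header, rows):
    """Hypothetical stand-in for the real write_tsv(); writes to a file object."""
    writer = csv.writer(outfile, delimiter='\t', lineterminator='\n')
    writer.writerow(header.split('\t'))
    writer.writerows(rows)


def known_rows(symbols):
    """Simplified row generator: one (name, vartype) row per symbol."""
    for name, vartype in symbols:
        yield (name, vartype)


def generate(symbols, outfile, *,
             _generate_rows=known_rows,
             _write_tsv=write_tsv,
             ):
    # Collaborators are keyword-only defaults, so callers can override them.
    rows = _generate_rows(symbols)
    _write_tsv(outfile, 'name\tvartype', rows)


# Normal use: the default writer produces real TSV text.
buf = io.StringIO()
generate([('runtime_initialized', 'static int')], buf)

# In a test: a fake writer records the call instead of writing anything.
calls = []
generate([('x', 'int')], 'known.tsv.new',
         _write_tsv=lambda *args: calls.append(args))
```

Injecting the writer this way keeps the test free of filesystem access and of patching module globals; the cost is a slightly longer signature.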
2 changes: 1 addition & 1 deletion Tools/c-analyzer/c_globals/__main__.py
@@ -77,7 +77,7 @@ def _find_globals(dirnames, known, ignored):
unknown = set()
knownvars = (known or {}).get('variables')
for variable in find.globals_from_binary(knownvars=knownvars,
dirnames=dirnames):
dirnames=dirnames):
#for variable in find.globals(dirnames, known, kind='platform'):
if variable.vartype == UNKNOWN:
unknown.add(variable)
205 changes: 135 additions & 70 deletions Tools/c-analyzer/c_globals/supported.py
@@ -1,133 +1,147 @@
import csv
import os.path
import re

from c_analyzer_common import DATA_DIR
from c_analyzer_common.info import ID
from c_analyzer_common.util import read_tsv
from c_analyzer_common.util import read_tsv, write_tsv


def is_supported(variable, ignored=None, known=None):
"""Return True if the given global variable is okay in CPython."""
if _is_ignored(variable, ignored and ignored.get('variables')):
return True
elif _is_vartype_okay(variable.vartype, ignored.get('types')):
return True
else:
return False
IGNORED_FILE = os.path.join(DATA_DIR, 'ignored.tsv')

IGNORED_COLUMNS = ('filename', 'funcname', 'name', 'kind', 'reason')
IGNORED_HEADER = '\t'.join(IGNORED_COLUMNS)

# XXX Move these to ignored.tsv.
IGNORED = {
# global
'PyImport_FrozenModules',
'M___hello__',
'inittab_copy',
'PyHash_Func',
'_Py_HashSecret_Initialized',
'_TARGET_LOCALES',
'runtime_initialized',
'PyImport_FrozenModules': 'process-global',
'M___hello__': 'process-global',
'inittab_copy': 'process-global',
'PyHash_Func': 'process-global',
'_Py_HashSecret_Initialized': 'process-global',
'_TARGET_LOCALES': 'process-global',

# startup
'static_arg_parsers',
'orig_argv',
'opt_ptr',
'_preinit_warnoptions',
'_Py_StandardStreamEncoding',
'_Py_StandardStreamErrors',
'runtime_initialized': 'runtime startup',
'static_arg_parsers': 'runtime startup',
'orig_argv': 'runtime startup',
'opt_ptr': 'runtime startup',
'_preinit_warnoptions': 'runtime startup',
'_Py_StandardStreamEncoding': 'runtime startup',
'_Py_StandardStreamErrors': 'runtime startup',

# should be const
'tracemalloc_empty_traceback',
'_empty_bitmap_node',
'posix_constants_pathconf',
'posix_constants_confstr',
'posix_constants_sysconf',
'tracemalloc_empty_traceback': 'const',
'_empty_bitmap_node': 'const',
'posix_constants_pathconf': 'const',
'posix_constants_confstr': 'const',
'posix_constants_sysconf': 'const',

# signals are main-thread only
'faulthandler_handlers',
'user_signals',
'faulthandler_handlers': 'signals are main-thread only',
'user_signals': 'signals are main-thread only',
}


def _is_ignored(variable, ignoredvars=None):
if variable.name in IGNORED:
def is_supported(variable, ignored=None, known=None, *,
_ignored=(lambda *a, **k: _is_ignored(*a, **k)),
_vartype_okay=(lambda *a, **k: _is_vartype_okay(*a, **k)),
):
"""Return True if the given global variable is okay in CPython."""
if _ignored(variable,
ignored and ignored.get('variables')):
return True

if ignoredvars and variable.id in ignoredvars:
elif _vartype_okay(variable.vartype,
ignored.get('types')):
return True
else:
return False


def _is_ignored(variable, ignoredvars=None, *,
_IGNORED=IGNORED,
):
"""Return the reason if the variable is a supported global.

Return None if the variable is not a supported global.
"""
if ignoredvars and (reason := ignoredvars.get(variable.id)):
return reason

if variable.funcname is None:
if reason := _IGNORED.get(variable.name):
return reason

# compiler
if variable.filename == 'Python/graminit.c':
if variable.vartype.startswith('static state '):
return True
return 'compiler'
if variable.filename == 'Python/symtable.c':
if variable.vartype.startswith('static identifier '):
return True
return 'compiler'
if variable.filename == 'Python/Python-ast.c':
# These should be const.
if variable.name.endswith('_field'):
return True
return 'compiler'
if variable.name.endswith('_attribute'):
return True
return 'compiler'

# other
if variable.filename == 'Python/dtoa.c':
# guarded by lock?
if variable.name in ('p5s', 'freelist'):
return True
return 'dtoa is thread-safe?'
if variable.name in ('private_mem', 'pmem_next'):
return True
return 'dtoa is thread-safe?'

return False
return None


def _is_vartype_okay(vartype, ignoredtypes=None):
if _is_object(vartype):
return False
return None

if vartype.startswith('static const '):
return True
return 'const'

# components for TypeObject definitions
for name in ('PyMethodDef', 'PyGetSetDef', 'PyMemberDef'):
if name in vartype:
return True
return 'const'
for name in ('PyNumberMethods', 'PySequenceMethods', 'PyMappingMethods',
'PyBufferProcs', 'PyAsyncMethods'):
if name in vartype:
return True
return 'const'
for name in ('slotdef', 'newfunc'):
if name in vartype:
return True
return 'const'

# structseq
for name in ('PyStructSequence_Desc', 'PyStructSequence_Field'):
if name in vartype:
return True
return 'const'

# other definitions
if 'PyModuleDef' in vartype:
return True
return 'const'

# thread-safe
if '_Py_atomic_int' in vartype:
return True
return 'thread-safe'
if 'pthread_condattr_t' in vartype:
return True
return 'thread-safe'

# startup
if '_Py_PreInitEntry' in vartype:
return True
return 'startup'

# global
if 'PyMemAllocatorEx' in vartype:
return True
# if 'PyMemAllocatorEx' in vartype:
# return True

# others
if 'PyThread_type_lock' in vartype:
return True
#if '_Py_hashtable_t' in vartype:
# return True # ???
# if 'PyThread_type_lock' in vartype:
# return True

# XXX ???
# _Py_tss_t
@@ -137,12 +151,12 @@ def _is_vartype_okay(vartype, ignoredtypes=None):

# functions
if '(' in vartype and '[' not in vartype:
return True
return 'function pointer'

# XXX finish!
# * allow const values?
#raise NotImplementedError
return False
return None


def _is_object(vartype):
@@ -172,26 +186,17 @@ def _is_object(vartype):
return False


#############################
# ignored

IGNORED_FILE = os.path.join(DATA_DIR, 'ignored.tsv')

COLUMNS = ('filename', 'funcname', 'name', 'kind', 'reason')
HEADER = '\t'.join(COLUMNS)


def ignored_from_file(infile, *,
_read_tsv=read_tsv,
):
"""Yield StaticVar for each ignored var in the file."""
"""Yield a Variable for each ignored var in the file."""
ignored = {
'variables': {},
#'types': {},
#'constants': {},
#'macros': {},
}
for row in _read_tsv(infile, HEADER):
for row in _read_tsv(infile, IGNORED_HEADER):
filename, funcname, name, kind, reason = row
if not funcname or funcname == '-':
funcname = None
@@ -202,3 +207,63 @@ def ignored_from_file(infile, *,
raise ValueError(f'unsupported kind in row {row}')
values[id] = reason
return ignored


##################################
# generate

def _get_row(varid, reason):
return (
varid.filename,
varid.funcname or '-',
varid.name,
'variable',
str(reason),
)


def _get_rows(variables, ignored=None, *,
_as_row=_get_row,
_is_ignored=_is_ignored,
_vartype_okay=_is_vartype_okay,
):
count = 0
for variable in variables:
reason = _is_ignored(variable,
ignored and ignored.get('variables'),
)
if not reason:
reason = _vartype_okay(variable.vartype,
ignored and ignored.get('types'))
if not reason:
continue

print(' ', variable, repr(reason))
yield _as_row(variable.id, reason)
count += 1
print(f'total: {count}')


def _generate_ignored_file(variables, filename=None, *,
_generate_rows=_get_rows,
_write_tsv=write_tsv,
):
if not filename:
filename = IGNORED_FILE + '.new'
rows = _generate_rows(variables)
_write_tsv(filename, IGNORED_HEADER, rows)


if __name__ == '__main__':
from c_analyzer_common import SOURCE_DIRS
from c_analyzer_common.known import (
from_file as known_from_file,
DATA_FILE as KNOWN_FILE,
)
from . import find
known = known_from_file(KNOWN_FILE)
knownvars = (known or {}).get('variables')
variables = find.globals_from_binary(knownvars=knownvars,
dirnames=SOURCE_DIRS)

_generate_ignored_file(variables)
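
In this commit the support checks stop returning `True`/`False` and return a reason string (or `None`), which is what lets `_get_rows()` emit one `ignored.tsv` row per supported variable. The sketch below shows that flow in a reduced, self-contained form. The `ID` and `Variable` namedtuples, the function names, and the sample file paths are hypothetical stand-ins rather than the tool's real classes, and the `_is_object()` check and most type rules are left out.

```python
from collections import namedtuple

# Hypothetical stand-ins for c_analyzer_common.info.ID and Variable.
ID = namedtuple('ID', 'filename funcname name')
Variable = namedtuple('Variable', 'id vartype')

# Name -> reason, as in the IGNORED dict in the diff (one entry shown).
IGNORED = {'runtime_initialized': 'runtime startup'}


def ignore_reason(variable, ignoredvars=None):
    """Return why the variable is ignored, or None if it is not."""
    if ignoredvars:
        reason = ignoredvars.get(variable.id)
        if reason:
            return reason
    if variable.id.funcname is None:  # only true globals match by name
        return IGNORED.get(variable.id.name)
    return None


def vartype_reason(vartype):
    """Return why the type is acceptable, or None if it is not."""
    if vartype.startswith('static const '):
        return 'const'
    if '(' in vartype and '[' not in vartype:
        return 'function pointer'
    return None


def as_row(varid, reason):
    # '-' marks a missing funcname, matching the .tsv convention in the PR.
    return (varid.filename, varid.funcname or '-', varid.name,
            'variable', str(reason))


def iter_rows(variables, ignoredvars=None):
    for variable in variables:
        reason = (ignore_reason(variable, ignoredvars)
                  or vartype_reason(variable.vartype))
        if reason:
            yield as_row(variable.id, reason)


variables = [
    Variable(ID('Python/pylifecycle.c', None, 'runtime_initialized'), 'static int'),
    Variable(ID('Modules/spam.c', 'func', 'cache'), 'static PyObject *'),
    Variable(ID('Modules/spam.c', None, 'names'), 'static const char *'),
]
rows = list(iter_rows(variables))
```

Because a non-empty reason string is truthy and `None` is falsy, callers that only need a yes/no answer can keep using the result in an `if`, while the generator can record the reason.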