8000 Code duplication between stat.py and _stat.c · Issue #114081 · python/cpython · GitHub
[go: up one dir, main page]

Skip to content

Code duplication between stat.py and _stat.c #114081

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
ronaldoussoren opened this issue Jan 15, 2024 · 5 comments
Open

Code duplication between stat.py and _stat.c #114081

ronaldoussoren opened this issue Jan 15, 2024 · 5 comments
Labels
stdlib Python modules in the Lib dir

Comments

@ronaldoussoren
Copy link
Contributor

There's code duplication between stat.py and _stat.c and a large subset of that is in definitions that are at least in theory platform dependent.

The C file also contains a lot a local fallback definitions of constants, like:

#ifndef S_IROTH
#  define S_IROTH 00004
#endif

Some of these are not really useful, e.g.:

#ifndef S_IFDOOR
#  define S_IFDOOR 0
#endif

We don't do this in other modules, and these fallback definition make it harder to determine which constants are relevant for the current platform.

Proposal:

  1. Remove the definition of platform constants from stat.py
  2. Remove fallback definitions in _stat.c and export only those names that are actually available on platforms
  3. Add explicit tests for those constants that we require to be available on all platforms

The 2nd bullet might break backward compatibility by removing flags on platforms that don't have them (in particular flags like UF_NODUMP that are likely only available on BSDs).

@ronaldoussoren ronaldoussoren added the stdlib Python modules in the Lib dir label Jan 15, 2024
@terryjreedy
Copy link
Member
terryjreedy commented Jan 15, 2024

Perhaps the duplication is intentional, so that a stat module is available even if _stat is not. Even if _stat is always available for CPython, the duplication might be present to cater to non-C implementations. I don't know if the latter is both the reason for duplication and sensible. If so, I think there should be a note saying so. If the duplication is needed, it could be put in the import else clause instead of pass. (EDIT: This can be tricky unless test_coverage is 100%, so the performance gain may not be worth the trouble.)

I would consider separate PRs for handling duplication between the files and fallbacks within _stat, as the back compatibility considerations are different for the two types of changes.

@terryjreedy
Copy link
Member

'Perhaps' is correct; test.test_stat tests everything both with and without _stat.

c_stat = import_fresh_module('stat', fresh=['_stat'])
py_stat = import_fresh_module('stat', blocked=['_stat'])

class TestFilemode:
...
@unittest.skipIf(c_stat is None, 'need _stat extension')
class TestFilemodeCStat(TestFilemode, unittest.TestCase):
    statmod = c_stat

class TestFilemodePyStat(TestFilemode, unittest.TestCase):
    statmod = py_stat

@ronaldoussoren
Copy link
Contributor Author

Both implementations are tested, that doesn't necessarily mean that the split is well thought out ;-).

A large portion of the duplication is in constant definitions (e.g. UF_SETTABLE). Some of these have a prescribed value according to standards (e.g. S_IRUSR must be 0o400 according to POSIX), but others like the UF_ and SF_ flags are platform dependent although the values are remarkably constant in practice.

To match the implementation pattern for other modules binding existing C APIs we should IMHO drop the python implementation and only keep the C one. That requires discussion outside of the tracker though, and possibly even a PEP. That will have to wait a bit, I don't have the bandwidth for this at this time.

I would consider separate PRs for handling duplication between the files and fallbacks within _stat, as the back compatibility considerations are different for the two types of changes.

Definitely.

@serhiy-storchaka
Copy link
Member

Option 2 is what usually used in other modules. For end user there should not be difference if they do not use _stat module directly. If the constant is available, they'll get its value indirectly imported from _stat module, if it is not available, they'll get the fallback value from stat.py instead of stat.c.

The only difference is when _stat module is used directly instead of stat module. stat.py is relatively lightweight, so I do not see reasons to do this. Even Lib/importlib/_bootstrap_external.py does not use _stat module, it just hardcodes some constants.

@hauntsaninja
Copy link
Contributor

@ronaldoussoren specific to #113667 , I noticed that SF_RESTRICTED is only in Python stat and not in C _stat

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
stdlib Python modules in the Lib dir
Projects
None yet
Development

No branches or pull requests

4 participants
0