-
-
Notifications
You must be signed in to change notification settings - Fork 25.9k
[MRG+2] Rewrite setup.py files to handle cython dependencies #7719
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[MRG+2] Rewrite setup.py files to handle cython dependencies #7719
Conversation
+10000 Thanks heaps for working on this... |
2f13e88
to
ed6b671
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few inline comments to help the review
is_cpp_filename = filename.endswith('.cpp') | ||
|
||
# files in src are .c and .cpp files that are not cython-generated | ||
if 'src/' in filename and (is_c_filename or is_cpp_filename): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the most brittle part of the PR I think. I don't know whether there is a better way of doing it, suggestions welcome.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess one possible improvement would be to add an additional argument in add_cython_extension like cython_generated_sources
so that we can discriminate more easily betwen the .c files coming from cython and the ones in the repo.
sources = [get_cython_source(filename) for filename in sources] | ||
config.add_extension(name, sources, **kwargs) | ||
|
||
config.ext_modules = cythonize(config.ext_modules) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmmm I need to double-check this, maybe it should be inside an if is_dev_version
raise ValueError('Please install cython in order ' | ||
'to build a scikit-learn development version') | ||
|
||
sources = [get_cython_source(filename) for filename in sources] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the part that switches between using .c{,pp} files when building a release and the .pyx when building a dev version.
By using Cython.Build.cythonize and switching between .c and .pyx files as appropriate cython dependencies are correctly taken into account.
ed6b671
to
1eade7d
Compare
Just to note that sometimes there are inter module dependencies that needs to be recythonized too... For instance any change in the tree's cython sources needs a recythonizing of the gradient boosting... Does this PR handle that? |
It seems like it does:
Excerpt of the output:
|
Amazing |
This is nice ! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a few of my comments that got folded in outdated diffs.
is_cpp_filename = filename.endswith('.cpp') | ||
|
||
# files in src are .c and .cpp files that are not cython-generated | ||
if ('src' + os.sep) in filename and (is_c_filename or is_cpp_filename): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the most brittle part of the PR I think. I don't know whether 8000 there is a better way of doing it, suggestions welcome.
I guess one possible improvement would be to add an additional argument in add_cython_extension
like cython_generated_sources
so that we can discriminate more easily betwen the .c files coming from cython and the ones in the repo.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is fine I think. Optionally you could just hardcode a list with all the non cython source paths...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about checking if there exists a pyx
file with the same name in the same folder?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a simple way of doing it indeed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or we could add .pyx
extension when it's a cython file, and add a function like:
import os.path
def no_cythonize(extensions, **_ignore):
for extension in extensions:
sources = []
for sfile in extension.sources:
path, ext = os.path.splitext(sfile)
if ext in ('.pyx', '.py'):
if extension.language == 'c++':
ext = '.cpp'
else:
ext = '.c'
sfile = path + ext
sources.append(sfile)
extension.sources[:] = sources
return extensions
(copy-paste from cython doc)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would make more sense this way around indeed. The .pyx would be mentioned in the setup.py and you would do the switching to .c or .cpp only for releases. I have to say I didn't fully understand the no_cythonize function when I read the doc but it makes more sense now (in particular the naming is confusing because it's not really related to Cython.Build.cythonize).
is_dev_version = not os.path.exists(os.path.join(top_path, 'PKG-INFO')) | ||
|
||
if is_dev_version: | ||
sources = [get_cython_source(filename) for filename in sources] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the part that switches between using .c{,pp} files when building a release and the .pyx when building a dev version.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks pretty good if it works as expected. Also, I feel like this shouldn't live in scikit-learn alone. I feel like at least numpy, pandas and scikit-image need the same. Shouldn't we share a solution?
is_cpp_filename = filename.endswith('.cpp') | ||
|
||
# files in src are .c and .cpp files that are not cython-generated | ||
if ('src' + os.sep) in filename and (is_c_filename or is_cpp_filename): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about checking if there exists a pyx
file with the same name in the same folder?
raise ValueError('Please install cython in order ' | ||
'to build a scikit-learn development version') | ||
|
||
config.ext_modules = cythonize(config.ext_modules) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So that takes care of the caching?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From https://cython.readthedocs.io/en/latest/src/reference/compilation.html#compiling-with-distutils:
The cythonize command also allows for multi-threaded compilation and dependency resolution. Recompilation will be skipped if the target file is up to date with its main source file and dependencies.
It does work as expected in the few tests I run like touching sklearn/tree/_criterion.pxd and making sure that all the dependents are recompiled (see #7719 (comment)). If you have more complicated scenarios that could go wrong I'd be happy to try them out.
Maybe we could have this in scikit-learn for some time so that it gets stress-tested a bit before consolidating across multiple projects? |
Yes, but we could also try reaching out to others and see what their approaches and requirements are? |
Why WIP? |
+:100: |
As I said in the issue description:
No one complained about the Travis cache being useless as a side effect of this PR so I just pushed a commit that removes the build products from theTravis cache and changed the status to MRG. |
Cython dependencies are taken care of by Cython.Build.cythonize and based on file timestamps, so .C and .so files will always be rebuild from scratch on each build in Travis.
23ee373
to
7c731a6
Compare
More natural this way. Tweak the extensions to generate from .c and .cpp files for a release.
043f6d6
to
69c5ae1
Compare
+1. And then contribute it to Cython, IMHO. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've not tested that this works. Otherwise, it looks good to me...
@@ -1,21 +1,18 @@ | |||
import numpy | |||
|
|||
|
|||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please remove added newline
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interestingly flake8_diff.sh is not able to spot these. Quickly looking at it this is because we do the diff without context (--unified=0
) in order to avoid noise coming from lines that the PR hasn't touched.
try: | ||
from Cython.Build import cythonize | ||
except ImportError: | ||
raise ValueError('Please install cython in order ' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given that we keep having issues related to cython versioning, I think we should be checking for or specifying version.
extension.sources = sources | ||
|
||
|
||
def tweak_extensions(top_path, config): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we call this something less opaque: conditional_cythonize
or cythonize_if_development
?
@@ -33,3 +35,38 @@ def atlas_not_found(blas_info_): | |||
cblas_libs = blas_info.pop('libraries', []) | |||
|
|||
return cblas_libs, blas_info | |||
|
|||
|
|||
def tweak_extensions_to_build_from_c_and_cpp_files(extensions): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe just rename_cython_extensions
?
@jnothman I have tackled your comments. |
else: | ||
message = ('Please install cython with a version >= 0.23' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should really pull this min version out as a constant.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea, done.
Otherwise LGTM. |
now that cython >= 0.23 requirement is enforced in setup.py
No easy way to put comments inside multi-line command
You can merge after CI approves. thanks! |
Thank a lot for this @lesteve! |
Is there a benefit to having this in 0.18.1? On 2 November 2016 at 23:00, Raghav RV notifications@github.com wrote:
|
Having this included in the backports will help avoid travis cache based error in the backport PR #7672 :D |
…cikit-learn#7719) * Rewriting of cythonization in setup.py By using Cython.Build.cythonize and switching between .c and .pyx files as appropriate cython dependencies are correctly taken into account. * Use cythonize once on the root config rather than in each subpackage * Fix for Windows * Remove caching from Travis Cython dependencies are taken care of by Cython.Build.cythonize and based on file timestamps, so .C and .so files will always be rebuild from scratch on each build in Travis. * Specify .pyx in setup.files for cython generated extensions More natural this way. Tweak the extensions to generate from .c and .cpp files for a release. * COSMIT Remove commented out code * Check cython version is greater than 0.23 * COSMIT better names for functions * flake8 fix (imported module not at top of file) * Install cython 0.23 for Python 2.6 now that cython >= 0.23 requirement is enforced in setup.py * Use module constant for minimum required cython version * Fix Travis install.sh No easy way to put comments inside multi-line command
Really enjoying this change, compiling is very much faster ! |
…cikit-learn#7719) * Rewriting of cythonization in setup.py By using Cython.Build.cythonize and switching between .c and .pyx files as appropriate cython dependencies are correctly taken into account. * Use cythonize once on the root config rather than in each subpackage * Fix for Windows * Remove caching from Travis Cython dependencies are taken care of by Cython.Build.cythonize and based on file timestamps, so .C and .so files will always be rebuild from scratch on each build in Travis. * Specify .pyx in setup.files for cython generated extensions More natural this way. Tweak the extensions to generate from .c and .cpp files for a release. * COSMIT Remove commented out code * Check cython version is greater than 0.23 * COSMIT better names for functions * flake8 fix (imported module not at top of file) * Install cython 0.23 for Python 2.6 now that cython >= 0.23 requirement is enforced in setup.py * Use module constant for minimum required cython version * Fix Travis install.sh No easy way to put comments inside multi-line command
…cikit-learn#7719) * Rewriting of cythonization in setup.py By using Cython.Build.cythonize and switching between .c and .pyx files as appropriate cython dependencies are correctly taken into account. * Use cythonize once on the root config rather than in each subpackage * Fix for Windows * Remove caching from Travis Cython dependencies are taken care of by Cython.Build.cythonize and based on file timestamps, so .C and .so files will always be rebuild from scratch on each build in Travis. * Specify .pyx in setup.files for cython generated extensions More natural this way. Tweak the extensions to generate from .c and .cpp files for a release. * COSMIT Remove commented out code * Check cython version is greater than 0.23 * COSMIT better names for functions * flake8 fix (imported module not at top of file) * Install cython 0.23 for Python 2.6 now that cython >= 0.23 requirement is enforced in setup.py * Use module constant for minimum required cython version * Fix Travis install.sh No easy way to put comments inside multi-line command
…cikit-learn#7719) * Rewriting of cythonization in setup.py By using Cython.Build.cythonize and switching between .c and .pyx files as appropriate cython dependencies are correctly taken into account. * Use cythonize once on the root config rather than in each subpackage * Fix for Windows * Remove caching from Travis Cython dependencies are taken care of by Cython.Build.cythonize and based on file timestamps, so .C and .so files will BA14 always be rebuild from scratch on each build in Travis. * Specify .pyx in setup.files for cython generated extensions More natural this way. Tweak the extensions to generate from .c and .cpp files for a release. * COSMIT Remove commented out code * Check cython version is greater than 0.23 * COSMIT better names for functions * flake8 fix (imported module not at top of file) * Install cython 0.23 for Python 2.6 now that cython >= 0.23 requirement is enforced in setup.py * Use module constant for minimum required cython version * Fix Travis install.sh No easy way to put comments inside multi-line command
…cikit-learn#7719) * Rewriting of cythonization in setup.py By using Cython.Build.cythonize and switching between .c and .pyx files as appropriate cython dependencies are correctly taken into account. * Use cythonize once on the root config rather than in each subpackage * Fix for Windows * Remove caching from Travis Cython dependencies are taken care of by Cython.Build.cythonize and based on file timestamps, so .C and .so files will always be rebuild from scratch on each build in Travis. * Specify .pyx in setup.files for cython generated extensions More natural this way. Tweak the extensions to generate from .c and .cpp files for a release. * COSMIT Remove commented out code * Check cython version is greater than 0.23 * COSMIT better names for functions * flake8 fix (imported module not at top of file) * Install cython 0.23 for Python 2.6 now that cython >= 0.23 requirement is enforced in setup.py * Use module constant for minimum required cython version * Fix Travis install.sh No easy way to put comments inside multi-line command
Fixes #7094 and maybe related issues if there are other. Closes #7708.
What does this implement/fix? Explain your changes.
Basically build_tools/cythonize.py doesn't take into account cython files dependencies. For example if you change _criterion.pxd the dependents .pyx don't get recompiled, potentially yielding to confusing segmentation faults.
The thing is that cython takes into acount correctly if you write your setup.py carefull (mostly by using Cython.Build.cythonize and switching sources between .c and .pyx as appropriate). Compared to #7708 which is a work-around this PR is trying to tackle the root problem.
Upsides:
Downsides:
This is WIP because I haven't disabled caching on Travis yet. Just wanted to show the changes in setup.py files and get feed-back for now.
ping @raghavrv, @jnothman, @amueller.