MNT Add asv benchmark suite #17026
Merged: NicolasHug merged 34 commits into scikit-learn:master from jeremiedbb:add-asv-benchmarks on Jul 29, 2020.
Commits (34):

- c5b9c22 move asv benchmark suite to scikit-learn (jeremiedbb)
- 1836d36 cln (jeremiedbb)
- df88036 don't track cache (jeremiedbb)
- 55c8149 config (jeremiedbb)
- 3cf50b7 add doc (jeremiedbb)
- ad13f0f fix path (jeremiedbb)
- 6a21314 commited wrong stuff (jeremiedbb)
- 0b4a141 remove classmethod (jeremiedbb)
- 6c6d040 typo (jeremiedbb)
- 03f2a43 mention pandas doc (jeremiedbb)
- 24b3eb2 tst broken doc (jeremiedbb)
- fd500b0 reorder (jeremiedbb)
- d4dc45a reorg (jeremiedbb)
- 7e32cb6 remove n_jobs from kmeans param names (jeremiedbb)
- 6b40ae8 origin -> upstream (jeremiedbb)
- deb43c2 footnote (jeremiedbb)
- d3924ae docstrings (jeremiedbb)
- 913d850 docstrings (jeremiedbb)
- 2dc1246 virtualenv (jeremiedbb)
- e6a7558 rename classes (jeremiedbb)
- 4accc0b fix link (jeremiedbb)
- 405871b pytest ignore asv_benchmarks (jeremiedbb)
- dab34b8 env vars for config (jeremiedbb)
- 15342aa env vars for config (jeremiedbb)
- 86210c7 pathlib (jeremiedbb)
- e84d3d2 simpler data cache (jeremiedbb)
- a3d82d2 simpler data cache (jeremiedbb)
- 56b4324 threadpoolctl in deps (jeremiedbb)
- 2392631 cln docstring (jeremiedbb)
- 7be77f3 add MiniBatchKMeans (jeremiedbb)
- 8f1ba71 empty commit for docs to build (NicolasHug)
- 898a333 Merge remote-tracking branch 'upstream/master' into add-asv-benchmarks (jeremiedbb)
- 4121522 additional instructions (jeremiedbb)
- 33de322 fix link (jeremiedbb)
@@ -0,0 +1,6 @@

```
*__pycache__*
env/
html/
results/
scikit-learn/
benchmarks/cache/
```
@@ -0,0 +1,162 @@

```json
{
    // The version of the config file format. Do not change, unless
    // you know what you are doing.
    "version": 1,

    // The name of the project being benchmarked
    "project": "scikit-learn",

    // The project's homepage
    "project_url": "scikit-learn.org/",

    // The URL or local path of the source code repository for the
    // project being benchmarked
    "repo": "..",

    // The Python project's subdirectory in your repo. If missing or
    // the empty string, the project is assumed to be located at the root
    // of the repository.
    // "repo_subdir": "",

    // Customizable commands for building, installing, and
    // uninstalling the project. See asv.conf.json documentation.
    //
    // "install_command": ["python -mpip install {wheel_file}"],
    // "uninstall_command": ["return-code=any python -mpip uninstall -y {project}"],
    // "build_command": [
    //     "python setup.py build",
    //     "PIP_NO_BUILD_ISOLATION=false python -mpip wheel --no-deps --no-index -w {build_cache_dir} {build_dir}"
    // ],

    // List of branches to benchmark. If not provided, defaults to "master"
    // (for git) or "default" (for mercurial).
    // "branches": ["master"], // for git
    // "branches": ["default"], // for mercurial

    // The DVCS being used. If not set, it will be automatically
    // determined from "repo" by looking at the protocol in the URL
    // (if remote), or by looking for special directories, such as
    // ".git" (if local).
    // "dvcs": "git",

    // The tool to use to create environments. May be "conda",
    // "virtualenv" or other value depending on the plugins in use.
    // If missing or the empty string, the tool will be automatically
    // determined by looking for tools on the PATH environment
    // variable.
    "environment_type": "conda",

    // timeout in seconds for installing any dependencies in environment
    // defaults to 10 min
    //"install_timeout": 600,

    // the base URL to show a commit for the project.
    "show_commit_url": "https://github.com/scikit-learn/scikit-learn/commit/",

    // The Pythons you'd like to test against. If not provided, defaults
    // to the current version of Python used to run `asv`.
    // "pythons": ["3.6"],

    // The list of conda channel names to be searched for benchmark
    // dependency packages in the specified order
    // "conda_channels": ["conda-forge", "defaults"]

    // The matrix of dependencies to test. Each key is the name of a
    // package (in PyPI) and the values are version numbers. An empty
    // list or empty string indicates to just test against the default
    // (latest) version. null indicates that the package is to not be
    // installed. If the package to be tested is only available from
    // PyPi, and the 'environment_type' is conda, then you can preface
    // the package name by 'pip+', and the package will be installed via
    // pip (with all the conda available packages installed first,
    // followed by the pip installed packages).
    //
    "matrix": {
        "numpy": [],
        "scipy": [],
        "cython": [],
        "joblib": [],
        "threadpoolctl": []
    },

    // Combinations of libraries/python versions can be excluded/included
    // from the set to test. Each entry is a dictionary containing additional
    // key-value pairs to include/exclude.
    //
    // An exclude entry excludes entries where all values match. The
    // values are regexps that should match the whole string.
    //
    // An include entry adds an environment. Only the packages listed
    // are installed. The 'python' key is required. The exclude rules
    // do not apply to includes.
    //
    // In addition to package names, the following keys are available:
    //
    // - python
    //     Python version, as in the *pythons* variable above.
    // - environment_type
    //     Environment type, as above.
    // - sys_platform
    //     Platform, as in sys.platform. Possible values for the common
    //     cases: 'linux2', 'win32', 'cygwin', 'darwin'.
    //
    // "exclude": [
    //     {"python": "3.2", "sys_platform": "win32"}, // skip py3.2 on windows
    //     {"environment_type": "conda", "six": null}, // don't run without six on conda
    // ],
    //
    // "include": [
    //     // additional env for python2.7
    //     {"python": "2.7", "numpy": "1.8"},
    //     // additional env if run on windows+conda
    //     {"platform": "win32", "environment_type": "conda", "python": "2.7", "libpython": ""},
    // ],

    // The directory (relative to the current directory) that benchmarks are
    // stored in. If not provided, defaults to "benchmarks"
    // "benchmark_dir": "benchmarks",

    // The directory (relative to the current directory) to cache the Python
    // environments in. If not provided, defaults to "env"
    // "env_dir": "env",

    // The directory (relative to the current directory) that raw benchmark
    // results are stored in. If not provided, defaults to "results".
    // "results_dir": "results",

    // The directory (relative to the current directory) that the html tree
    // should be written to. If not provided, defaults to "html".
    // "html_dir": "html",

    // The number of characters to retain in the commit hashes.
    // "hash_length": 8,

    // `asv` will cache results of the recent builds in each
    // environment, making them faster to install next time. This is
    // the number of builds to keep, per environment.
    // "build_cache_size": 2,

    // The commits after which the regression search in `asv publish`
    // should start looking for regressions. Dictionary whose keys are
    // regexps matching to benchmark names, and values corresponding to
    // the commit (exclusive) after which to start looking for
    // regressions. The default is to start from the first commit
    // with results. If the commit is `null`, regression detection is
    // skipped for the matching benchmark.
    //
    // "regressions_first_commits": {
    //     "some_benchmark": "352cdf",  // Consider regressions only after this commit
    //     "another_benchmark": null,   // Skip regression detection altogether
    // },

    // The thresholds for relative change in results, after which `asv
    // publish` starts reporting regressions. Dictionary of the same
    // form as in ``regressions_first_commits``, with values
    // indicating the thresholds. If multiple entries match, the
    // maximum is taken. If no entry matches, the default is 5%.
    //
    // "regressions_thresholds": {
    //     "some_benchmark": 0.01,     // Threshold of 1%
    //     "another_benchmark": 0.5,   // Threshold of 50%
    // },
}
```

Inline review comment (next to the `build_cache_size` setting): It's funny they went with json which specifically doesn't allow comments in the spec.
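As that review comment points out, strict JSON has no comments, so Python's `json.load` rejects this file as written; it also leaves a trailing comma after the `"matrix"` block once the commented-out entries are ignored. asv reads the file with its own parser, but for ad-hoc inspection from Python, a minimal sketch like the one below would do. It assumes only `//` line comments are used (no `/* */` blocks), and the helper names are hypothetical.

```python
import json
import re

# Optional whitespace followed by a closing brace or bracket.
_CLOSER_AHEAD = re.compile(r"\s*[}\]]")


def _strip_line_comments(text):
    """Remove // comments that appear outside JSON string literals."""
    out, i, in_string = [], 0, False
    while i < len(text):
        ch = text[i]
        if in_string:
            out.append(ch)
            if ch == "\\":
                # Copy the escaped character too, so \" does not end the string.
                out.append(text[i + 1:i + 2])
                i += 2
                continue
            if ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
            out.append(ch)
        elif text.startswith("//", i):
            # Jump to the end of the line; the newline itself is kept.
            end = text.find("\n", i)
            i = len(text) if end == -1 else end
            continue
        else:
            out.append(ch)
        i += 1
    return "".join(out)


def _strip_trailing_commas(text):
    """Remove commas outside strings that are directly followed by } or ]."""
    out, i, in_string = [], 0, False
    while i < len(text):
        ch = text[i]
        if in_string:
            out.append(ch)
            if ch == "\\":
                out.append(text[i + 1:i + 2])
                i += 2
                continue
            if ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
            out.append(ch)
        elif ch == "," and _CLOSER_AHEAD.match(text, i + 1):
            pass  # trailing comma before } or ]: drop it
        else:
            out.append(ch)
        i += 1
    return "".join(out)


def loads_asv_config(text):
    """Parse JSON-with-comments text (such as asv.conf.json) into a dict."""
    return json.loads(_strip_trailing_commas(_strip_line_comments(text)))


def load_asv_config(path):
    """Read a JSON-with-comments file from disk and parse it."""
    with open(path, encoding="utf-8") as f:
        return loads_asv_config(f.read())
```

Comment stripping is string-aware so that URLs such as the `show_commit_url` value keep their `//`. Applied to the config above, `["environment_type"]` would come back as `"conda"`; the file path you pass in depends on where the suite lives in your checkout.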
@@ -0,0 +1 @@

```python
"""Benchmark suite for scikit-learn using ASV"""
```
@@ -0,0 +1,100 @@

```python
from sklearn.cluster import KMeans, MiniBatchKMeans

from .common import Benchmark, Estimator, Predictor, Transformer
from .datasets import _blobs_dataset, _20newsgroups_highdim_dataset
from .utils import neg_mean_inertia


class KMeansBenchmark(Predictor, Transformer, Estimator, Benchmark):
    """
    Benchmarks for KMeans.
    """

    param_names = ['representation', 'algorithm', 'init']
    params = (['dense', 'sparse'], ['full', 'elkan'], ['random', 'k-means++'])

    def setup_cache(self):
        super().setup_cache()

    def make_data(self, params):
        representation, algorithm, init = params

        if representation == 'sparse':
            data = _20newsgroups_highdim_dataset(n_samples=8000)
        else:
            data = _blobs_dataset(n_clusters=20)

        return data

    def make_estimator(self, params):
        representation, algorithm, init = params

        max_iter = 30 if representation == 'sparse' else 100

        estimator = KMeans(n_clusters=20,
                           algorithm=algorithm,
                           init=init,
                           n_init=1,
                           max_iter=max_iter,
                           tol=-1,
                           random_state=0)

        return estimator

    def make_scorers(self):
        self.train_scorer = (
            lambda _, __: neg_mean_inertia(self.X,
                                           self.estimator.predict(self.X),
                                           self.estimator.cluster_centers_))
        self.test_scorer = (
            lambda _, __: neg_mean_inertia(self.X_val,
                                           self.estimator.predict(self.X_val),
                                           self.estimator.cluster_centers_))
```
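asv runs a parameterized benchmark once for every combination of the value lists in `params`, with `param_names` labelling each position. The sketch below simply enumerates the grid `KMeansBenchmark` declares with `itertools.product`. How the suite's `Benchmark` base class hands each combination to `make_data` and `make_estimator` is not shown in this diff; the one-tuple-per-call shape is an assumption read off the `representation, algorithm, init = params` unpacking above.

```python
import itertools

# Parameter grid copied from KMeansBenchmark above.
param_names = ['representation', 'algorithm', 'init']
params = (['dense', 'sparse'], ['full', 'elkan'], ['random', 'k-means++'])

# Every combination, in the order itertools.product yields them.
combos = list(itertools.product(*params))

for combo in combos:
    representation, algorithm, init = combo  # same unpacking as make_data
    print(dict(zip(param_names, combo)))

print(len(combos))  # 2 * 2 * 2 = 8
```

By the same reasoning, `MiniBatchKMeansBenchmark` below declares 2 × 2 = 4 combinations, since it has no `algorithm` axis.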
```python
class MiniBatchKMeansBenchmark(Predictor, Transformer, Estimator, Benchmark):
    """
    Benchmarks for MiniBatchKMeans.
    """

    param_names = ['representation', 'init']
    params = (['dense', 'sparse'], ['random', 'k-means++'])

    def setup_cache(self):
        super().setup_cache()

    def make_data(self, params):
        representation, init = params

        if representation == 'sparse':
            data = _20newsgroups_highdim_dataset()
        else:
            data = _blobs_dataset(n_clusters=20)

        return data

    def make_estimator(self, params):
        representation, init = params

        max_iter = 5 if representation == 'sparse' else 2

        estimator = MiniBatchKMeans(n_clusters=20,
                                    init=init,
                                    n_init=1,
                                    max_iter=max_iter,
                                    batch_size=1000,
                                    max_no_improvement=None,
                                    compute_labels=False,
                                    random_state=0)

        return estimator

    def make_scorers(self):
        self.train_scorer = (
            lambda _, __: neg_mean_inertia(self.X,
                                           self.estimator.predict(self.X),
                                           self.estimator.cluster_centers_))
        self.test_scorer = (
            lambda _, __: neg_mean_inertia(self.X_val,
                                           self.estimator.predict(self.X_val),
                                           self.estimator.cluster_centers_))
```
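Both scorers call `neg_mean_inertia(X, labels, centers)`, which is imported from the suite's `utils` module and is not part of this diff. The following is a hypothetical reimplementation of what the name and the call sites suggest: the mean squared distance from each sample to the center it was assigned to, negated so that higher is better, as with other `neg_*` scores. It is an assumption, not the PR's code.

```python
import numpy as np


def neg_mean_inertia(X, labels, centers):
    """Hypothetical sketch: minus the mean squared distance to assigned centers.

    X may be a dense array or a SciPy sparse matrix; subtracting a dense
    array from a sparse matrix yields a dense result, which np.asarray
    turns into a plain ndarray (this densifies X, so it costs memory).
    """
    centers = np.asarray(centers)
    # Row i holds X[i] minus the center that sample i was assigned to.
    diffs = np.asarray(X - centers[labels])
    # Squared Euclidean distance per sample, averaged, then negated.
    return -np.mean(np.sum(diffs ** 2, axis=1))
```

For example, samples `[0, 0]` and `[2, 0]` assigned to center `[1, 0]` plus a sample sitting exactly on a second center give squared distances 1, 1 and 0, so the score is -2/3.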
I don't think we should expect contributors to have conda.
In the config file it's the default env. I added an explanation of how to use virtualenv instead.
Should we leave it empty?
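If `environment_type` were left empty, the config comments say asv determines the tool by looking for it on the PATH. The sketch below only illustrates that idea with `shutil.which`; it is not asv's implementation, and both the helper name and the candidate order (conda before virtualenv) are assumptions.

```python
import shutil


def guess_environment_type(candidates=("conda", "virtualenv"), path=None):
    """Hypothetical helper: return the first candidate tool found on PATH.

    `path` defaults to the PATH environment variable (shutil.which's
    behaviour); returns None when no candidate is found.
    """
    for tool in candidates:
        if shutil.which(tool, path=path) is not None:
            return tool
    return None
```

On a machine with only virtualenv installed this would return `"virtualenv"`, which is the situation the first comment in this thread is concerned about.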