Generic benchmarking/profiling tool #10289
sounds useful
|
Definitely agree with that. I think there are cases where …
That would also depend on whether the user is interested in the asymptotic scaling or the scaling in some particular range of parameters. Memory profiling would require an additional optional dependency. |
I think we can accept such an optional dependency. Also, I'm fine to see
this implemented and put into a separate package, but I think it would be a
great resource for our users, and potentially for building our
documentation and benchmarking enhancements.
|
Can I try working on this feature? |
it's a pretty substantial piece of work. Propose an algorithm?
|
We have to call fit with the parameters in … How do we plan to vary the number of features? Please let me know if there's anything I'm missing out on. |
yes, that's the question: how do we vary n_samples / n_features?
|
We can use np.logspace to vary n_samples, and we can also let users use linearly spaced n_samples for smaller datasets. After fitting on different values of n_samples we can feed fit_time, peak_memory and model_memory to a GP regressor (and let users choose the kernel). We can also provide an option for KFold cross-validation at each value of n_samples, at the cost of a smaller maximum possible value for n_samples, to provide higher accuracy. |
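As a rough illustration of feeding such measurements to a GP regressor (the timings below are placeholders and the kernel choice is only an example, not a settled design):

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

# Hypothetical measurements: n_samples used for each fit and the observed fit times.
n_samples = np.logspace(2, 4, num=8, dtype=int)
fit_times = np.array([0.02, 0.03, 0.06, 0.11, 0.24, 0.55, 1.3, 3.1])

# Model log(time) as a function of log(n_samples); the kernel would be user-configurable.
X = np.log(n_samples).reshape(-1, 1)
y = np.log(fit_times)
gpr = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(), normalize_y=True)
gpr.fit(X, y)

# Extrapolate to a larger dataset size, with an uncertainty estimate.
pred, std = gpr.predict(np.log([[50_000]]), return_std=True)
print(np.exp(pred), std)
```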
kfold isn't relevant if we are not interested in a fair estimate of
accuracy. random samples should do. I'm not yet sure what policy we should
have for repeated trials...
rather than logspace, I'd start with a fairly small number of samples and
double them. Or multiply by 4 initially and fill in the gaps later to
reduce uncertainty in the learnt function. Not really sure how to sensibly
vary both samples and features and perhaps we should leave features until
later.
I'm okay if you give this a go, but expect it to be a bit of a process
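To make that idea concrete, here is a rough sketch of the "coarse pass first, fill in the gaps later" schedule; the function name, starting size and factor are illustrative only:

```python
def sample_size_schedule(n_total, start=64, factor=4):
    """Yield n_samples values: a coarse geometric pass, then midpoints.

    Only a sketch of the 'multiply first, fill gaps later' idea; a real
    implementation would also stop once a time budget is exhausted.
    """
    coarse, n = [], start
    while n < n_total:
        coarse.append(n)
        n *= factor
    coarse.append(n_total)
    for n in coarse:                        # first, the coarse pass
        yield n
    for lo, hi in zip(coarse, coarse[1:]):  # then geometric midpoints
        mid = int((lo * hi) ** 0.5)
        if mid not in (lo, hi):
            yield mid

print(list(sample_size_schedule(100_000)))
```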
|
Thanks a lot. I've been wanting to work on something like this for quite some time now. If we use logspace, n_fits becomes a more meaningful parameter as it then becomes the number of distinct n_samples values generated. It might also be more intuitive for users, as it relates to the accuracy of the models generated here. Thoughts? |
yes, but: logspace produces floats and we need ints; and we might run out
of time before all n_fits are completed, so we need to take care to try to get
the shape of the function within constraints.
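For the float/int issue, rounding a geometric grid and dropping duplicates is probably enough; a tiny sketch:

```python
import numpy as np

# Round a geometric grid to ints and drop any duplicates at the small end.
sizes = np.unique(np.geomspace(8, 100_000, num=10).astype(int))
print(sizes)
```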
|
should we use OpenML, for example OpenML 100? It is only classification right now but covers n_features and n_samples pretty well (looks kinda uniform on a logscale). |
Or did you want to use synthetic data? |
This really sounds like a relatively sophisticated automl problem.... |
I don't like how we always benchmark on canned data. I want the user to get
some understanding of how it will perform on their chosen (dummy or real)
dataset.
This is not really automl, but I think what you mean is that the function
for choosing the next point to try is a lot like the kinds of constrained
searches performed in automl
|
Well trying to estimate runtime from hyper parameters is also pretty typical in automl. And there's work on extrapolating learning curves for neural nets, for example. |
okay. this issue currently focuses on sample size rather than parameter
variation...
|
ah, sorry, didn't read correctly. Still related but easier ;) |
How about we first fit on all samples, then half, then a fourth and 3 fourths and so on? So even with just a few iterations our model can be fed data for a varied number of n_samples. |
All samples may be much larger than is feasible to run in the benchmark
time... It is also least informative, given the budget, as to the
functional shape of the scaling
|
I have started some work on this. Should I put in a PR with n_samples starting at 8 and doubling for each fit? |
Or we could let users select a base and a multiplier? |
I don't mind starting at 8 for now. Might be a bad pick if the dataset is
big and has many classes...
|
Agreed. Is the final goal a script or something similar to an estimator? |
A function + an example run with plots.
|
Cool. Where should I put the function in the repo? |
start with n_classes * 2 for classifiers? ;) |
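That starting-size heuristic could be as simple as the following sketch (the helper name and default floor are invented here for illustration):

```python
import numpy as np

# Hypothetical helper for the smallest benchmark size: a fixed floor of
# samples, and at least two samples per class for classifiers.
def initial_n_samples(y=None, floor=8):
    if y is None:
        return floor                      # regression / unsupervised case
    n_classes = len(np.unique(y))
    return max(floor, 2 * n_classes)      # the "n_classes * 2" suggestion above
```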
Perhaps sklearn.benchmark??
|
There are also some low-level concerns that need to be addressed for this to work reliably IMO, in particular the finite resolution of the timer: on Windows, time.time has a resolution of 16 ms. In a somewhat orthogonal direction to this issue, I have been experimenting with benchmarking lately in this repo, and I'm wondering whether an API similar to the following could work:
```python
import numpy as np
from sklearn.cluster import KMeans
from neurtu import delayed, timeit

rng = np.random.RandomState(42)

n_samples_max, n_features = 10000, 10

timeit(delayed(KMeans, tags={'n_samples': n_samples})(n_clusters=8)
       .fit(rng.rand(n_samples, n_features))
       for n_samples in np.geomspace(100, n_samples_max, 5, dtype='int'))
```
which here produces a DataFrame of timings indexed by the n_samples tag, that can then be sent to a GP regressor or just visualized. The advantage of such an approach is that it can be used to benchmark and compare anything the user might be interested in. Here is a more complete example that includes runtime and peak memory usage of LogisticRegression for different … |
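Reusing the delayed/tags pattern shown above, a sketch of timing LogisticRegression over a range of training sizes might look like this (assuming the neurtu API demonstrated above; the peak-memory part of the linked example is not reproduced, and the synthetic data and parameters are illustrative):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from neurtu import delayed, timeit

rng = np.random.RandomState(42)
n_features = 50

# Time LogisticRegression.fit for several training-set sizes, reusing the
# delayed/tags pattern from the KMeans snippet above.
df = timeit(
    delayed(LogisticRegression, tags={'n_samples': n})(solver='lbfgs')
    .fit(rng.rand(n, n_features), rng.randint(0, 2, size=n))
    for n in np.geomspace(1_000, 50_000, 5, dtype='int')
)
print(df)
```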
@jeremiedbb, @jnothman has #17026 solved this issue? Thanks! |
I'm not sure how well #17026 solves the need of a user estimating how well an algorithm will scale on their specific data. If it does, a tutorial would be beneficial! |
@jnothman #17026 is an implementation of a benchmarking tool for the sample datasets we use in the sklearn examples; it doesn't exactly cover the use case that was in mind for this profiling tool, which was intended to model the change in performance of estimators as their hyperparams change. |
We have not been proficient at documenting the estimated runtime or space complexity of our estimators and algorithms. Even were we to document asymptotic complexity functions, it would not give a realistic estimate for all parameter settings, etc. for a particular kind of data. Rather we could assist users in estimating complexity functions empirically.
I would like to see a function something like the following:
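A purely hypothetical sketch of such a function; every name, parameter and default here is invented for illustration, and nothing below was settled in the discussion:

```python
import time
import numpy as np
from sklearn.base import clone

# Purely hypothetical sketch: signature, names and defaults are all invented.
def estimate_fit_complexity(estimator, X, y=None, *, time_budget=60.0,
                            min_samples=8, random_state=0):
    """Fit clones of `estimator` on growing random subsamples of (X, y) and
    record (n_samples, fit_time) pairs until the time budget is exhausted."""
    rng = np.random.RandomState(random_state)
    results, spent, n = [], 0.0, min_samples
    while n <= X.shape[0] and spent < time_budget:
        idx = rng.choice(X.shape[0], size=n, replace=False)
        est = clone(estimator)
        tic = time.perf_counter()
        est.fit(X[idx], y[idx] if y is not None else None)
        elapsed = time.perf_counter() - tic
        results.append((n, elapsed))
        spent += elapsed
        n *= 2  # simple doubling; smarter schedules are debated in the thread
    return results
```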
This would run fit successively for different values of n_samples (logarithmically spaced, perhaps guided by a Gaussian process) to estimate the function for fitting complexity, within budget. I have not thought extensively about exactly what sampling strategy would be followed. If this is implemented for the library, we would consider it experimental and the algorithm subject to change for a little while. What do others think?