Request for project inclusion: scitime #38
Interesting! We had ideas for a similar tool at scikit-learn/scikit-learn#10289 (and an initial implementation). You might want to also benchmark for memory consumption.
Thanks for taking a look @jnothman! We did look at the initial implementation of the benchmarking tool you pointed us to. Building a memory consumption estimator would be a great next step as we continue working on this package. Let us know if you think we satisfy the scikit-learn-contrib requirements, we're looking forward to continuing our work!
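For reference, a minimal sketch of how the memory side could be benchmarked with memory_profiler (an assumed extra dependency, not something scitime uses today):

```python
# Minimal sketch, not scitime code: sampling peak memory of a fit call with
# memory_profiler (assumed installed via `pip install memory_profiler`).
import numpy as np
from memory_profiler import memory_usage
from sklearn.ensemble import RandomForestClassifier

X = np.random.rand(10_000, 20)
y = np.random.randint(0, 2, size=10_000)
clf = RandomForestClassifier(n_estimators=100, max_depth=10)

# memory_usage runs the callable and samples process memory (MiB) while it executes.
samples = memory_usage((clf.fit, (X, y)), interval=0.1)
print(f"peak memory during fit: {max(samples):.1f} MiB")
```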
Hi @jnothman, just wanted to follow up on our request and make sure it does not get forgotten.
We seem to be a bit stuck on scikit-learn-contrib: we do not have a clear process for review. I tend to focus on the core library, but I hope we discuss how to better maintain contrib at the sprint this week.
I think that rather than a problem with the process, we simply lack people.
Hi @jnothman and @GaelVaroquaux - thanks for the update, it makes perfect sense. We just published a Medium article featured on freeCodeCamp describing our process, if that helps.
@GaelVaroquaux I don't understand your point. We need a process that works with the resources we have. I don't see why we could hope to have "enough" people at some point. We should probably discuss how much of a review we want to do.
@joaquinvanschoren and @janvanrijn might be interested in this as well.
Hi all - just wanted to follow up on this and see if there was any update.
@nathan-toubiana Sorry for the delay, I hope I can get to this in May, maybe someone else will get to it earlier.
Thanks for the update! We're very excited to hear that.
btw have you compared to the model that's within oboe? https://github.com/udellgroup/oboe |
Thanks for the reference. Based on their paper, it seems that their 'meta models' only account for the number of observations and features as meta inputs (we do add model hyperparameters and machine performance data as meta inputs) - other than that, and the fact that their meta models are polynomial regressions, their logic seems pretty similar.
I think their model is per hyper-parameter setting, but it's not entirely clear to me. I need to check the code. They say they have relatively accurate results with a simple model. I don't think they look across machines at all, though. Anyway, the paper seemed cool and I thought you might be interested.
Oh, that makes sense - not sure how they handle non-categorical hyper-parameters though, I'll look at the code. Definitely super interesting! Thanks a lot.
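For context, a rough and purely illustrative sketch of the kind of polynomial meta-model discussed above, using only (n_samples, n_features) as meta-inputs; this is not oboe's or scitime's actual code, and the grids and timings are toy choices:

```python
# Illustrative only: a degree-2 polynomial regression meta-model that predicts
# fit time from (n_samples, n_features) alone, roughly in the spirit of the
# approach discussed above.
import time
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rows = []
for n_samples in (1_000, 5_000, 10_000):
    for n_features in (10, 50, 100):
        X = np.random.rand(n_samples, n_features)
        y = np.random.randint(0, 2, size=n_samples)
        start = time.perf_counter()
        RandomForestClassifier(n_estimators=50).fit(X, y)
        rows.append((n_samples, n_features, time.perf_counter() - start))

data = np.asarray(rows)
meta_X, runtimes = data[:, :2], data[:, 2]

# Degree-2 polynomial regression as the meta-model.
meta_model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
meta_model.fit(meta_X, runtimes)
print(meta_model.predict([[20_000, 30]]))  # rough predicted fit time, in seconds
```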
Hi @amueller, we wanted to know if you had a chance to look through our submission?
Sorry :-/
Hi @amueller, hope you are well. I'm following up on last year's request for our package scitime to be part of scikit-learn-contrib. We've had a significant number of requests and a lot of activity on our repo over the last few months, so I thought it could be a good time to reopen our discussion. We'd love to hear how/if we could improve our package to be part of the scikit-learn community. Thanks!
Thanks @nathan-toubiana, the project looks interesting and it would make sense to have it in scikit-learn-contrib. However, I have a few questions / comments, for instance regarding the usage example:
The above comments are related to the inclusion process. For the following ones it's just personal curiosity,
If you use a linear model to predict the
Yeah, naively I would have thought that RF would not be the best for this use case, particularly if you are not sure if you are going to be extrapolating.
Predicting the log of the duration might help.
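A minimal sketch of how the log-duration suggestion could be wired in with scikit-learn's TransformedTargetRegressor; the meta-features and runtimes below are placeholders, not scitime's real training data:

```python
# Minimal sketch, assuming placeholder meta-features: fit the meta-model on
# log(duration) and get predictions back in seconds via the inverse transform.
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
meta_X = rng.random((200, 4))  # e.g. n_samples, n_features, max_depth, n_jobs
runtimes = np.exp(meta_X @ np.array([3.0, 1.5, 0.5, 0.2]))  # toy durations spanning orders of magnitude

meta_model = TransformedTargetRegressor(
    regressor=RandomForestRegressor(n_estimators=100),
    func=np.log,          # train on log(duration)
    inverse_func=np.exp,  # predictions come back in seconds
)
meta_model.fit(meta_X, runtimes)
print(meta_model.predict(meta_X[:3]))
```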
Hi @rth, thanks for your prompt answer.
We did a PR to add descriptions in the documentation (section
Once the runtime data is generated, it’s very quick to update the pkls (see the documentation here) and we actually did this last week when we bumped the package version to 0.1.0, see our PR here. Users of scitime can build their own pkls by generating their own data and we also plan to make our training data public.
We renamed
Unfortunately, n_samples and n_features are often not the only parameters having a significant impact on runtime. For instance, in RandomForestClassifier, max_depth can significantly change the runtime. This is why we went with this approach (see the sketch after this comment). Number of CPUs and available memory can also make a difference.
Yes, this is the reason why we decided to keep both meta algos. NN suits best for extrapolations while RF has been trained on a large number of datapoints and provides good estimations for cases that are similar to our training data.
Thanks! We’ll try to retrain doing that and see if it improves the accuracy.
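As a rough illustration of the point above about meta-inputs beyond n_samples and n_features, a sketch of assembling one meta-feature row from dataset shape, runtime-relevant hyperparameters, and basic machine specs; the field names are hypothetical, psutil is an assumed dependency, and this is not scitime's internal schema:

```python
# Hypothetical meta-feature row combining dataset shape, runtime-relevant
# hyperparameters and machine specs. Not scitime's actual schema;
# psutil is an assumed extra dependency.
import os
import psutil
from sklearn.ensemble import RandomForestClassifier

def meta_features(estimator, n_samples, n_features):
    params = estimator.get_params()
    return {
        "n_samples": n_samples,
        "n_features": n_features,
        "n_estimators": params["n_estimators"],
        "max_depth": params["max_depth"],   # hyperparameters that drive runtime
        "n_jobs": params["n_jobs"],
        "num_cpu": os.cpu_count(),          # machine performance inputs
        "memory_gb": psutil.virtual_memory().total / 1e9,
    }

row = meta_features(RandomForestClassifier(max_depth=10), n_samples=100_000, n_features=50)
print(row)
```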
Hi @rth, just wanted to follow up on our conversation, we're ready to make these changes and more if needed. Thanks!
Hi, We just released a new version (v0.1.1) with all the changes discussed above. |
Hi, just following up on this since it has been some time. Would love to understand next steps needed to get approved.
The repo hasn't seen any updates in the past 2 years. The repo also includes pickle files, which raises a lot of issues, both in terms of security, and in terms of version compatibility. I don't think we should include this in the contrib org.
Hi @adrinjalali, thanks for getting back to us. The reason why the repo hasn't been updated lately is that we haven't heard back on our request since our last back and forth (as you can see in the above thread). However, we are still seeing some usage (~1k weekly downloads) and are more than happy to work on more updates as needed, if we are still being considered for inclusion. Looking forward to hearing from you.
The repo being active is usually a requirement for it to be moved here, and not the other way around. Otherwise we have no way of knowing if after inclusion the repo will go stale or not. Also regarding pickles, I would need to know why exactly they're included. Pickle files are executables, and nobody should load any pickle files unless they really really trust the source. So we really need to find an alternative here. |
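One possible direction for the pickle concern, sketched here under the assumption that skops (a scikit-learn-adjacent persistence library) would be acceptable; this is not what scitime currently ships:

```python
# Sketch of persisting the meta-model with skops instead of pickle: skops only
# loads types the caller explicitly trusts. Assumes `pip install skops`.
import numpy as np
from skops.io import dump, load, get_untrusted_types
from sklearn.ensemble import RandomForestRegressor

meta_model = RandomForestRegressor().fit(np.random.rand(50, 3), np.random.rand(50))
dump(meta_model, "meta_model.skops")

# At load time, inspect anything skops does not already trust before allowing it.
untrusted = get_untrusted_types(file="meta_model.skops")
print(untrusted)  # review this list, then pass it explicitly only if it looks safe
restored = load("meta_model.skops", trusted=untrusted)
```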