Unit test times #339
The benchmark is really useful. Assigning milestone 0.10.
@robertlayton @fabianp Per Robert's suggestion on the list, I looked through a couple of issues and landed here. I'd like to improve a couple of these unit tests. I'm assuming I can ignore the HMM tests for now, since the module might be deprecated soon?
It's a good place to start (if you don't mind a bit of a steep learning curve!). Have you done Python profiling before? If not, I'm happy to explain the format of the results.
No clue--but like I mentioned before, I'm here to learn. :D What would you recommend as a reference on Python profilers? Is this a good start?
That's a good reference, much better than other docs. I wouldn't worry too much about the detail, though; a single read to get the gist is fine. The report I've got is sorted by cumulative time, which is the total amount of time spent within that function, including everything it calls. The real meat is the next line:
This means that, of the 54 seconds it took to run all of the tests, 13 seconds were spent in the
If we ignore the hmm and gmm ones, the most likely candidate for optimisation is
This one test takes 2.4 seconds. It is the
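(For anyone following along, a minimal sketch of how to read such a report with the standard library's pstats module. The profile filename below is a placeholder I've invented, not something from this thread.)

import pstats

# Load a saved cProfile dump and print the 20 entries with the largest
# cumulative time (time spent in the function plus everything it calls).
stats = pstats.Stats("sklearn_tests.prof")  # placeholder filename
stats.sort_stats("cumulative").print_stats(20)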
This is a great rundown--thanks for taking the time to explain! I'll review over the next couple of days and let you know if I get stuck on something.
Good luck :)
Picking this up after a delay. The good news is that I'm now comfortable enough with Python to knock this out. Here are the most recent results:
This function in test_hmm.py
is responsible for about 3.5s of build time. It gets called 16 times while building, with a variety of values for n_iter--if we standardize that to, say, n_iter=5, I'm sure we'll see a few seconds of savings. We also see sklearn.covariance.tests.test_robust_covariance.launch_mcd_on_dataset() taking about 2s of build time over 6 calls. I'm not too sure what's happening there, exactly; sklearn.covariance.robust_covariance cites a paper for FastMCD, and I'll take a look at that.
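(A minimal sketch of the kind of standardization being proposed, since the test body itself isn't shown here. The hmm API of that era is assumed--n_iter moved between fit() and the constructor across releases--so every name below should be checked against the actual module.)

import numpy as np
from sklearn import hmm  # present at the time; since removed from scikit-learn

def test_hmm_fit_fixed_budget():
    # Assumed input format: a list holding one small synthetic sequence.
    rng = np.random.RandomState(0)
    obs = [rng.randn(50, 2)]
    # Instead of re-fitting with many different n_iter values, pin one
    # small budget; the EM loop is still exercised in a fraction of the time.
    model = hmm.GaussianHMM(n_components=3, n_iter=5)
    model.fit(obs)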
Thanks for tackling this :) I think it would be better to sort by tottime than cumtime--well, at least looking at both is a good idea.
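(The difference, briefly: tottime counts only time spent in the function body itself, while cumtime also includes its callees, so a test with high cumtime but low tottime is slow only because of what it calls. A sketch using the same placeholder profile file as above:)

import pstats

# Sorting by tottime surfaces test bodies that are themselves expensive,
# rather than tests that merely call into slow library code.
stats = pstats.Stats("sklearn_tests.prof")  # placeholder filename
stats.sort_stats("tottime").print_stats(20)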
I'd be happy to close this bug, as we now have much faster tests overall and a new benchmarking framework that will help in the future. Is everyone happy with that?
Actually I'm still unhappy with the timing ;)
This utility is nice for finding the worst-offending tests: https://github.com/patjenk/nose-timer
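(Usage, from memory of that project's README--worth double-checking there--is along these lines:)

$ pip install nose-timer
$ nosetests --with-timer sklearn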
Closing, as this issue is too open-ended to be properly fixed. The codebase changes all the time, and so do the testing times.
Following a discussion on the mailing list about the total build time, including testing, it was suggested that some of the unit tests are slow. The following functions take the most cumulative time in running the tests and give insight into which tests should be addressed first. The hmm/gmm tests are currently the worst offenders. Can someone please have a look at these tests and work out whether it's possible to make them faster?
I've taken some liberty with the following output, but it's basically the result of running
$ python profile_sklearn.py | grep "test"
For reference, profile_sklearn.py is:
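(The script itself was not preserved in this copy of the thread. A rough reconstruction under stated assumptions--cProfile driving the test suite through nose, which matches the grep pipeline above--might look like this:)

# profile_sklearn.py -- hypothetical reconstruction, not the original file.
import cProfile
import pstats

import nose

def run_tests():
    # Run the scikit-learn test suite in-process so cProfile can observe it.
    nose.run(argv=["nosetests", "sklearn"])

# cProfile.run executes the statement in __main__'s namespace, so this
# works when the file is run as a script; stats are saved to disk.
cProfile.run("run_tests()", "sklearn_tests.prof")

# Print everything sorted by cumulative time; piping stdout through
# grep "test" (as above) keeps only the test functions.
pstats.Stats("sklearn_tests.prof").sort_stats("cumulative").print_stats()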