time.clock() vs time.time() #2844

Closed · mblondel opened this issue Feb 11, 2014 · 22 comments
Labels: Easy, Enhancement

@mblondel (Member)

git grep indicates that we are using time.time() in a few places, but in general time.clock() is more appropriate for benchmarking:
http://stackoverflow.com/questions/85451/python-time-clock-vs-time-time-accuracy

@GaelVaroquaux (Member)

Thanks for pointing this out. I have labeled the issue Easy and Enhancement. Hopefully issue labeling will help us keep track of all the issues on scikit-learn.

@arjoly (Member) commented Feb 12, 2014

Why not use timeit.timeit instead?

@GaelVaroquaux (Member)

> Why not use timeit.timeit instead?

The usage of timeit leads to really ugly code. In addition, we are using these calls not for precise timing, but for annotating our examples with a bit of timing information.
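
For reference, a minimal sketch of what timeit usage looks like (the statement under test has to be passed in as a string or a zero-argument callable, which is part of what makes inlining it in library code awkward):

import timeit

# timeit runs the statement repeatedly in a fresh namespace and returns
# the total elapsed time, so the code under test must be self-contained.
elapsed = timeit.timeit("sorted(data)",
                        setup="data = list(range(1000))[::-1]",
                        number=1000)
print(elapsed / 1000)  # average seconds per call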

@mblondel (Member, Author)

And in some places timeit cannot be used, e.g., in the cross-validation module.

@kaushik94 (Contributor)

Hi, I am a little new to the community, but I am pretty sure there are some misconceptions about time() and clock(), and I feel the official documentation has a flaw. Here's a sample Python script that measures the precision of time() and clock(): it turns out that time() is always better on Unix-like machines, while on Windows clock() is better.

import time

def measure_time():
    # Smallest tick observable with time.time(): spin until the value changes.
    t0 = time.time()
    t1 = time.time()
    while t1 == t0:
        t1 = time.time()
    return t1 - t0

def measure_clock():
    # Smallest tick observable with time.clock().
    t0 = time.clock()
    t1 = time.clock()
    while t1 == t0:
        t1 = time.clock()
    return t1 - t0

print("time result : " + str(measure_time()))
print("clock result : " + str(measure_clock()))

@kaushik94 (Contributor)

Moreover, there is a big drawback to clock(): it works only within a single process. Its reference point is the start of that particular process, whereas time() is referenced to the epoch, a fixed point in the past. We might have a problem benchmarking multi-process code with clock().
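
For illustration, a minimal sketch of that difference (assuming Python 2, or Python 3 before 3.8, where time.clock() still exists):

import time

t0, c0 = time.time(), time.clock()
time.sleep(1)
print(time.time() - t0)   # ~1.0: wall-clock seconds, referenced to the epoch
print(time.clock() - c0)  # ~0.0 on Unix: almost no CPU was used while asleep
                          # (on Windows, clock() measures wall time, so ~1.0)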

@mblondel (Member, Author)

I'm not sure whether your example proves anything. Also, I think clock() should be more precise on machines with high load.

@kaushik94 (Contributor)

Hi mblondel, I am measuring the precision of clock() and time() here, and it differs between Windows and Linux (tried on Windows 7 and Ubuntu 13.04). The official Python documentation states that clock() is always better, but if you run the code, time() turns out to be 10 times more precise than clock(). I don't understand what you mean by load, but I'm sure clock() and time() are OS-dependent. Please check!

@kaushik94 (Contributor)

And by the way, my code calculates the minimum measurable time difference of the clock() and time() functions and returns that value.


@mblondel (Member, Author)

What I mean by a machine with high load is one where several computationally expensive processes run concurrently (e.g., a server used by several people for ML experiments). I used to use time.time() but found that it is not reliable on machines with high load.

@GaelVaroquaux (Member)

> I used to use time.time() but found that it is not reliable on machines with high load.

It really depends on what question you are asking. I personally am usually interested in knowing how long a computation will take in real time, not in CPU time, because the time that I have to take a coffee is better expressed in real time than CPU time.

@mblondel (Member, Author)

Yes, it depends on what question you're asking. If you want to tell whether algorithm A is faster than algorithm B, clock() is better. With time.time() you might get a different answer depending on how busy the machine is.
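
For illustration, a rough sketch of that effect (a sketch only, assuming a Unix machine where clock() measures per-process CPU time; busy_loop and workload are made-up helpers): the CPU time of a fixed workload stays roughly constant under load, while its wall time inflates.

import multiprocessing as mp
import time

def busy_loop():
    # Burn CPU indefinitely to simulate other users on the machine.
    while True:
        pass

def workload():
    # A fixed amount of computation to benchmark.
    return sum(i * i for i in range(2000000))

if __name__ == "__main__":
    for n_hogs in (0, mp.cpu_count()):
        hogs = [mp.Process(target=busy_loop) for _ in range(n_hogs)]
        for p in hogs:
            p.start()
        t0, c0 = time.time(), time.clock()
        workload()
        wall, cpu = time.time() - t0, time.clock() - c0
        for p in hogs:
            p.terminate()
        print("%d competing processes: wall=%.3fs cpu=%.3fs" % (n_hogs, wall, cpu))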

@AlexanderFabisch (Member)

Apart from that, clock() is deprecated since Python 3.3. This has already been mentioned in a comment on the Stack Overflow post:

> Deprecated since version 3.3: The behaviour of this function depends on the platform: use perf_counter() or process_time() instead, depending on your requirements, to have a well defined behaviour.
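
For reference, a minimal sketch of the two replacements on Python >= 3.3: perf_counter() is a high-resolution wall clock, while process_time() counts only the CPU time of the current process (time spent sleeping is excluded):

import time

t0 = time.perf_counter()   # wall-clock reference
c0 = time.process_time()   # CPU-time reference
sum(range(10 ** 6))        # some work to time
print("wall:", time.perf_counter() - t0)
print("cpu: ", time.process_time() - c0)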

@GaelVaroquaux (Member)

OK, how about we just stick with time.time()?

@mblondel (Member, Author)

For the examples, it's no big deal to keep using time.time(). For the cross-validation code, I'm planning to add an option to let the user choose which function to use in PR #2759.

Closing.

@GaelVaroquaux (Member)

> For the cross-validation code, I'm planning to add an option to let the user choose which function to use in PR #2759.

I am not even sure that it is a good idea: as a researcher you might care to measure only CPU time to choose the algorithm to use, but as an end-user you really care about the total time. In other words, the time it takes to fork or do I/O matters to you. Besides, two other problems with time.clock are the fact that it is deprecated (we really shouldn't be using things that are deprecated) and the fact that it cannot follow forks.

@mblondel (Member, Author)

Well, some users do use scikit-learn for research purposes. The new functions introduced in PR #2759 make it possible to compare different algorithms w.r.t. both accuracy and training time. Moreover, it is fine to use clock() within a given process. So it is perfectly fine to use it in the cross-validation code, since the time is measured for each train/test split (one split = one process), and not for the entire procedure. My plan is to do something like time_func = getattr(time, time_func), where time is the time module and time_func is a string parameter. So my code won't even import clock() (which is not deprecated in the 2.7 branch, BTW).
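
For illustration, a minimal sketch of that lookup pattern (make_timer is a made-up name; the point is that getattr resolves the string to a callable without importing clock() here):

import time

def make_timer(time_func):
    # time_func is a string such as "time" or "clock"; getattr pulls the
    # matching callable off the time module.
    return getattr(time, time_func)

timer = make_timer("time")
start = timer()
sum(range(10 ** 6))  # work being timed, e.g. one train/test split
print(timer() - start)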

@jnothman (Member)

Nor am I persuaded, @mblondel. Have you found that they differ substantially in their evaluation in general? And wouldn't a user concerned with this level of detail be comfortable adapting the CV code to their needs?


@kaushik94 (Contributor)

Hi, I actually agree with jnothman.

@GaelVaroquaux (Member)

> I actually agree with jnothman.

So do I. The reasoning behind this is that adding too many options makes the tool worse both for the developer (additional branching in the code) and for the user (too many choices, and the important information gets lost in the details).

@mblondel (Member, Author)

The philosophy of the project has always been to give options to the user and provide sane defaults. I really care about measuring CPU time. If I can't even use my code for my purposes, I think I'll just close PR #2759 and maintain the code privately.

@AlexanderFabisch (Member)

I think nobody wanted to provoke that. Support for multiple metrics would be a great feature.

However, in other languages it is often considered bad practice to have more than 3 function arguments. In Python this is less of a problem because you have named arguments, but I still think good code should aim for a reasonable compromise.

Before you give up on the PR, we should try to find a better solution. Maybe we can postpone that to another PR?
