time.clock() vs time.time() #2844

Closed · mblondel opened this issue Feb 11, 2014 · 22 comments
Labels: Easy, Enhancement

@mblondel (Member)

git grep indicates that we are using time.time() in a few places, but in general time.clock() is more appropriate for benchmarking:
http://stackoverflow.com/questions/85451/python-time-clock-vs-time-time-accuracy

@GaelVaroquaux (Member)

Thanks for pointing this out. I have labeled the issue Easy and Enhancement. Hopefully issue labeling will help us keep track of all the issues on scikit-learn.

@arjoly (Member) commented Feb 12, 2014

Why not use timeit.timeit instead?

@GaelVaroquaux (Member)

> Why not use timeit.timeit instead?

The usage of timeit leads to really ugly code. In addition, we are using these calls not for precise timing, but for annotating our examples with a bit of timing information.
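
For reference, a minimal sketch of what timeit usage looks like (the statement under test has to be passed in as a string or a zero-argument callable, which is part of what makes inlining it in library code awkward):

import timeit

# timeit runs the statement repeatedly in a fresh namespace and returns
# the total elapsed time, so the code under test must be self-contained.
elapsed = timeit.timeit("sorted(data)",
                        setup="data = list(range(1000))[::-1]",
                        number=1000)
print(elapsed / 1000)  # average seconds per call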

@mblondel (Member, Author)

And in some places timeit cannot be used, e.g., in the cross-validation module.

@kaushik94 (Contributor)

Hi, I am a little new to the community, but I am pretty sure there are some misconceptions about time() and clock(), and I feel the official documentation has a flaw. Here's a sample Python script that measures the precision of time() and clock(): it turns out that time() is always better on Unix-like machines, while on Windows clock() is better.

import time

def measure_time():
    # Smallest tick observable with time.time(): spin until the value changes.
    t0 = time.time()
    t1 = time.time()
    while t1 == t0:
        t1 = time.time()
    return t1 - t0

def measure_clock():
    # Smallest tick observable with time.clock().
    t0 = time.clock()
    t1 = time.clock()
    while t1 == t0:
        t1 = time.clock()
    return t1 - t0

print("time result : " + str(measure_time()))
print("clock result : " + str(measure_clock()))

@kaushik94 (Contributor)

Moreover, there is a big drawback to clock(): it works only within a single process. Its reference point is the start of that particular process, whereas time() is referenced to the epoch, a fixed point in the past. We might have a problem benchmarking multi-process code with clock().
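
For illustration, a minimal sketch of that difference (assuming Python 2, or Python 3 before 3.8, where time.clock() still exists):

import time

t0, c0 = time.time(), time.clock()
time.sleep(1)
print(time.time() - t0)   # ~1.0: wall-clock seconds, referenced to the epoch
print(time.clock() - c0)  # ~0.0 on Unix: almost no CPU was used while asleep
                          # (on Windows, clock() measures wall time, so ~1.0)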

@mblondel (Member, Author)

I'm not sure whether your example proves anything. Also, I think clock() should be more precise on machines with high load.

@kaushik94 (Contributor)

Hi mblondel, I am measuring the precision of clock() and time() here, and it differs between Windows and Linux (tried on Windows 7 and Ubuntu 13.04). The official Python documentation states that clock() is always better, but if you run the code, time() turns out to be 10 times more precise than clock(). I don't understand what you mean by load, but I'm sure clock() and time() are OS-dependent. Please check!

@kaushik94 (Contributor)

And by the way, my code calculates the minimum measurable time difference of the clock() and time() functions and returns that value.


@mblondel (Member, Author)

What I mean by a machine with high load is one where several computationally expensive processes run concurrently (e.g., a server used by several people for ML experiments). I used to use time.time() but found that it is not reliable on machines with high load.

@GaelVaroquaux (Member)

> I used to use time.time() but found that it is not reliable on machines with high load.

It really depends on what question you are asking. I personally am usually interested in knowing how long a computation will take in real time, not in CPU time, because the time that I have to take a coffee is better expressed in real time than CPU time.

@mblondel (Member, Author)

Yes, it depends on what question you're asking. If you want to tell whether algorithm A is faster than algorithm B, clock() is better. With time.time() you might get a different answer depending on how busy the machine is.
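
For illustration, a rough sketch of that effect (a sketch only, assuming a Unix machine where clock() measures per-process CPU time; busy_loop and workload are made-up helpers): the CPU time of a fixed workload stays roughly constant under load, while its wall time inflates.

import multiprocessing as mp
import time

def busy_loop():
    # Burn CPU indefinitely to simulate other users on the machine.
    while True:
        pass

def workload():
    # A fixed amount of computation to benchmark.
    return sum(i * i for i in range(2000000))

if __name__ == "__main__":
    for n_hogs in (0, mp.cpu_count()):
        hogs = [mp.Process(target=busy_loop) for _ in range(n_hogs)]
        for p in hogs:
            p.start()
        t0, c0 = time.time(), time.clock()
        workload()
        wall, cpu = time.time() - t0, time.clock() - c0
        for p in hogs:
            p.terminate()
        print("%d competing processes: wall=%.3fs cpu=%.3fs" % (n_hogs, wall, cpu))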

@AlexanderFabisch (Member)

Apart from that, clock() is deprecated since Python 3.3. This has already been mentioned in a comment on the Stack Overflow post:

> Deprecated since version 3.3: The behaviour of this function depends on the platform: use perf_counter() or process_time() instead, depending on your requirements, to have a well defined behaviour.
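
For reference, a minimal sketch of the two replacements on Python >= 3.3: perf_counter() is a high-resolution wall clock, while process_time() counts only the CPU time of the current process (time spent sleeping is excluded):

import time

t0 = time.perf_counter()   # wall-clock reference
c0 = time.process_time()   # CPU-time reference
sum(range(10 ** 6))        # some work to time
print("wall:", time.perf_counter() - t0)
print("cpu: ", time.process_time() - c0)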

@GaelVaroquaux (Member)

OK, how about we just stick with time.time()?

@mblondel (Member, Author)

For the examples, it's no big deal to keep using time.time(). For the cross-validation code, I'm planning to add an option to let the user choose which function to use in PR #2759.

Closing.

@GaelVaroquaux (Member)

> For the cross-validation code, I'm planning to add an option to let the user choose which function to use in PR #2759.

I am not even sure that it is a good idea: as a researcher you might care to measure only CPU time to choose the algorithm to use, but as an end-user you really care about the total time. In other words, the time it takes to fork or do I/O matters to you. Besides, two other problems with time.clock are the fact that it is deprecated (we really shouldn't be using things that are deprecated) and the fact that it cannot follow forks.

@mblondel (Member, Author)

Well, some users do use scikit-learn for research purposes. The new functions introduced in PR #2759 make it possible to compare different algorithms w.r.t. both accuracy and training time. Moreover, it is fine to use clock() within a given process. So it is perfectly fine to use it in the cross-validation code, since the time is measured for each train/test split (one split = one process), and not for the entire procedure. My plan is to do something like time_func = getattr(time, time_func), where time is the time module and time_func is a string parameter. So my code won't even import clock() (which is not deprecated in the 2.7 branch, BTW).
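
For illustration, a minimal sketch of that lookup pattern (make_timer is a made-up name; the point is that getattr resolves the string to a callable without importing clock() here):

import time

def make_timer(time_func):
    # time_func is a string such as "time" or "clock"; getattr pulls the
    # matching callable off the time module.
    return getattr(time, time_func)

timer = make_timer("time")
start = timer()
sum(range(10 ** 6))  # work being timed, e.g. one train/test split
print(timer() - start)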

@jnothman (Member)

Nor am I persuaded, @mblondel. Have you found that they differ substantially in their evaluation in general? And wouldn't a user concerned with this level of detail be comfortable adapting the CV code to their needs?


@kaushik94 (Contributor)

Hi, I actually agree with jnothman.

@GaelVaroquaux (Member)

> I actually agree with jnothman.

So do I. The reasoning behind this is that adding too many options makes the tool worse both for the developer (additional branching in the code) and for the user (too many choices, and the important information gets lost in the details).

@mblondel (Member, Author)

The philosophy of the project has always been to give options to the user and provide sane defaults. I really care about measuring CPU time. If I can't even use my code for my purposes, I think I'll just close PR #2759 and maintain the code privately.

@AlexanderFabisch (Member)

I think nobody wanted to provoke that. Support for multiple metrics would be a great feature.

However, in other languages it is often considered bad practice to have more than 3 function arguments. In Python this is less of a problem because you have named arguments, but I still think good code should aim for a reasonable compromise.

Before you give up on the PR, we should try to find a better solution. Maybe we can postpone that to another PR?
