ENH: provide monitoring of model selection searches by stsievert · Pull Request #528 · dask/dask-ml

Merged · 78 commits · Apr 17, 2020

Conversation

@stsievert (Member) commented on Jun 24, 2019

What does this PR implement?
It lets the user monitor the model selection search: it prints information on the scores received and allows the user to stop the search cleanly via Ctrl-C.

  • If verbose is False (the default): always log, but don't print to stdout.
  • If verbose is True: always log and print.
  • If verbose is an int/float with 0 < verbose <= 1: log and print a verbose fraction of the messages.
  • If verbose is zero: do not log past initialization.

This allows:

  • The user to see logs in Jupyter notebooks by specifying verbose=True
  • Advanced users to configure logging themselves by default with verbose=False.
  • Advanced users to turn off logging with verbose=0.

All types except bool are cast to float; valid values include 0.0, 1, True, False, and 0.5. An error is raised unless 0 <= verbose <= 1.
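A minimal sketch of how these casting and validation rules might be written (the helper name _normalize_verbose is illustrative, not the PR's actual code):

def _normalize_verbose(verbose):
    # Booleans pass through: True means "always log and print",
    # False means "always log, never print".
    if isinstance(verbose, bool):
        return verbose
    # Everything else is cast to float and must lie in [0, 1].
    verbose = float(verbose)
    if not 0 <= verbose <= 1:
        raise ValueError(
            "verbose must satisfy 0 <= verbose <= 1, got {}".format(verbose)
        )
    return verbose

Under these rules, verbose=0.5 would log and print roughly half of the messages.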

@stsievert (Member, Author)

I'm not sure this PR has the best method of printing/logging information:

def _show_msg(msg, verbose):
    if verbose:
        print(msg)
    logger.info(msg)
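For comparison, a common logging-only pattern (just a sketch; the handler setup and the "[CV]" prefix are illustrative) attaches a stdout handler when verbose output is requested, so library code only ever calls logger.info:

import logging
import sys

logger = logging.getLogger(__name__)

def _configure_verbosity(verbose):
    # Attach a stdout handler only when the user asked for printed output;
    # everything else in the module then just calls logger.info(...).
    if verbose and not logger.handlers:
        logger.setLevel(logging.INFO)
        handler = logging.StreamHandler(sys.stdout)
        handler.setFormatter(logging.Formatter("[CV] %(message)s"))
        logger.addHandler(handler)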

I'm also unsure how to test KeyboardInterrupts.

@TomAugspurger (Member) left a comment

You may be able to write a test for the KeyboardInterrupt using signal.SIGINT. LMK if you want help with that.
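One way such a test could be written (a sketch only: make_search is a hypothetical fixture standing in for the real IncrementalSearchCV setup, and delivering SIGINT via os.kill assumes a Unix-like platform):

import os
import signal
import threading

def _send_sigint_after(delay):
    # Deliver SIGINT to this process after `delay` seconds, simulating the
    # user pressing Ctrl-C while fit() is running.
    timer = threading.Timer(delay, lambda: os.kill(os.getpid(), signal.SIGINT))
    timer.start()
    return timer

def test_fit_survives_keyboard_interrupt():
    search, X, y = make_search()  # hypothetical fixture
    timer = _send_sigint_after(0.5)
    try:
        search.fit(X, y)  # should return cleanly despite the interrupt
    finally:
        timer.cancel()
    assert hasattr(search, "best_estimator_")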

@stsievert (Member, Author)

I'm having a difficult time with an implementation that catches the KeyboardInterrupt. When I test it manually in a notebook, the code raises a KeyboardInterrupt when I send a single keyboard interrupt, and it does not stop the training cleanly. Here's the traceback of the error I get:

KeyboardInterrupt               Traceback (most recent call last)
<ipython-input-5-3748455ece57> in <module>()
----> 9 df = _test_keyboardinterrupt()
<ipython-input-4-135866acf811> in _test_keyboardinterrupt()
---> 13     search.fit(X, y)
~/Developer/stsievert/dask-ml/dask_ml/model_selection/_incremental.py in fit(self, X, y, **fit_params)
--> 641         return default_client().sync(self._fit, X, y, **fit_params)
~/anaconda3/lib/python3.6/site-packages/distributed/client.py in sync(self, func, *args, **kwargs)
--> 753             return sync(self.loop, func, *args, **kwargs)
~/anaconda3/lib/python3.6/site-packages/distributed/utils.py in sync(loop, func, *args, **kwargs)
--> 329             e.wait(10)
~/anaconda3/lib/python3.6/threading.py in wait(self, timeout)
--> 551                 signaled = self._cond.wait(timeout)
~/anaconda3/lib/python3.6/threading.py in wait(self, timeout)
--> 299                     gotit = waiter.acquire(True, timeout)

KeyboardInterrupt: 

@TomAugspurger (Member)

I'm learning a bit about this for a distributed issue. Will take a look later today.

@TomAugspurger (Member)

Can you talk a bit about what the expected behavior of a KeyboardInterrupt halfway through is? As a user, I might expect the following:

  1. All currently scheduled fits are cancelled (cluster should become idle)
  2. The usual attributes like best_estimator_ are present, and reflect the state of the world when the training was interrupted.
  3. I have some way of knowing that the training was stopped early? An attribute on the model?

Overall, this looks a bit more complicated than adding better logging. It may be worth splitting into its own PR.

@stsievert (Member, Author)
  1. All currently scheduled fits are cancelled (cluster should become idle)
  2. The usual attributes like best_estimator_ are present, and reflect the state of the world when the training was interrupted.
  3. I have some way of knowing that the training was stopped early? An attribute on the model?

I would certainly expect 1 and 2. I intend the KeyboardInterrupt to be a mechanism for the busy data scientist to get on with their workflow (using the best estimator, etc).
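A rough sketch of that intended behaviour (expectations 1 and 2), where _fit_loop and _pending_futures are purely illustrative names rather than the actual implementation:

def fit_with_interrupt(search, X, y):
    # On Ctrl-C: cancel outstanding work, keep whatever has been computed.
    try:
        search._fit_loop(X, y)  # stands in for the real training loop
    except KeyboardInterrupt:
        for future in getattr(search, "_pending_futures", []):  # hypothetical bookkeeping
            future.cancel()  # distributed futures support .cancel()
        # best_estimator_, cv_results_, etc. keep their last-updated values,
        # so the user can carry on with the best model found so far.
    return search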

@TomAugspurger (Member)

Looks like a merge conflict.

@stsievert (Member, Author) commented on Apr 14, 2020

verbose is a float as before, but now I always cast it to a float (unless a bool is given). I've updated #528 (comment) to reflect this change.

@stsievert (Member, Author)

The CI is failing because contextlib.nullcontext is new in Python 3.7 (source). Do we want to upgrade the minimum Python version or use a NullContext in dask_ml._utils.py?

@TomAugspurger (Member)

Ah, I didn't realize it was that new. I think defining something like this in dask_ml._compat

@contextlib.contextmanager
def nullcontext():
    yield

should suffice.
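Put together, the shim might look something like this (a sketch; branching on the Python version is one option, not something decided in this thread):

import contextlib
import sys

if sys.version_info >= (3, 7):
    from contextlib import nullcontext
else:
    @contextlib.contextmanager
    def nullcontext():
        # No-op stand-in for contextlib.nullcontext on Python < 3.7.
        yield

Callers can then use nullcontext() (or swap in a real context manager) regardless of the Python version.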

@stsievert (Member, Author)

I've also deprecated the scores_per_fit parameter in IncrementalSearchCV; the name should be fits_per_score.

Review thread on this diff hunk (in IncrementalSearchCV.__init__):

):
    if scores_per_fit is not None and fits_per_score != 1:
@TomAugspurger (Member):

Sorry, didn't realize this deprecation was in the __init__. This should be done in the fit. __init__ isn't supposed to do any validation.

This will also fix the issue you saw with the warnings in the doc build.
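A minimal sketch of what moving the check into fit could look like (the warning text and structure are illustrative, not the PR's exact code):

import warnings

def fit(self, X, y=None, **fit_params):
    # Deprecation handled in fit rather than __init__, since __init__ should
    # only store parameters and do no validation.
    fits_per_score = self.fits_per_score
    if self.scores_per_fit is not None:
        warnings.warn(
            "scores_per_fit is deprecated; use fits_per_score instead.",
            FutureWarning,
        )
        fits_per_score = self.scores_per_fit
    # ... proceed with fitting, using fits_per_score ...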

@stsievert (Member, Author):

D'oh, of course. I've made that change, and removed the :okwarning:s. This PR should be ready to merge now.

@TomAugspurger (Member)

Thanks!
