Feature Request - Parallel Coordinates Plot for GridSearch result analysis · Issue #24281 · scikit-learn/scikit-learn · GitHub





Open
julien-blanchon opened this issue Aug 27, 2022 · 4 comments
Labels: module:inspection, Needs Decision - Include Feature, New Feature

Comments

@julien-blanchon
julien-blanchon commented Aug 27, 2022

Describe the workflow you want to enable

GridSearch results are hard to analyze, especially when param_grid is very large.

The current documentation shows the following approaches:

  • matrix plot / pivot table. This can visualize the relationship between 2 parameters and 1 metric (2D).
  • line plot. Each line represents a 1D relationship (1 parameter vs 1 metric); by drawing multiple lines we can visualize a 2D relationship, but this becomes messy beyond that.
  • box plot. Same conclusion as for the line plot, but using boxes instead.
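For reference, the pivot-table approach from the first bullet can be sketched as follows. This assumes pandas is installed; the SVC estimator, the iris dataset and the grid values are illustrative choices, not taken from the documentation. Only two parameters fit in the table: adding a third searched parameter (for example kernel) would require one such table per value of that parameter, or aggregating over it.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
param_grid = {"C": [0.1, 1, 10], "gamma": [0.01, 0.1, 1]}  # illustrative grid
search = GridSearchCV(SVC(), param_grid, cv=3).fit(X, y)

# cv_results_ stores each searched parameter in a "param_<name>" column.
results = pd.DataFrame(search.cv_results_)

# 2 parameters -> rows/columns, 1 metric -> cell values (the "2D" view).
table = results.pivot(index="param_C", columns="param_gamma", values="mean_test_score")
print(table.round(3))
```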

None of these plots can nicely represent more than a 2D relationship. 3D, 4D and higher relationships can be visualized by making a bunch of basic 2D plots, but the number of plots easily becomes huge and we lose the relationships between some parameters. This gets even worse with RandomizedSearchCV, as the parameters are not sampled on a regular grid.
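To put a number on "huge": assuming one 2D plot (for example a heatmap) per pair of parameters, the count grows quadratically as C(n, 2) = n(n-1)/2, and each plot still hides the remaining n-2 parameters. A parallel coordinates plot instead uses a single figure with one vertical axis per parameter plus one for the score.

```python
from math import comb


def n_pairwise_plots(n_params: int) -> int:
    """Number of 2-parameter plots needed to cover every pair of parameters."""
    return comb(n_params, 2)


for n in (2, 4, 6, 8):
    print(f"{n} parameters -> {n_pairwise_plots(n)} pairwise plots")
# 2 parameters -> 1 pairwise plots
# 4 parameters -> 6 pairwise plots
# 6 parameters -> 15 pairwise plots
# 8 parameters -> 28 pairwise plots
```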

One common way to represent an N-D relationship without loss is a Parallel Coordinates (PC) plot:

[Image: example Parallel Coordinates Plot produced by the proposed implementation]

Existing implementations of PC plots are:

  • Plotly Express. However, it doesn't integrate well with the param_grid format, and it is not matplotlib compatible.
  • TensorBoard and deep learning experiment trackers (wandb ...). Same problem: they don't integrate well, are not part of the matplotlib stack, and need even more conversion work plus a web server (or a cloud account) :(
  • Handmade matplotlib implementations. These are matplotlib compatible, but at the moment there is no easy-to-use implementation in a common package, let alone one compatible with the param_grid format.

Describe your proposed solution

I currently use my own matplotlib implementation of a PC plot (see the image above) for my work.
The current interface is:

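from typing import Optional, Sequence

import matplotlib.pyplot as plt
from sklearn.model_selection import GridSearchCV

def plot_parallel_coordinates(
        grid_model: GridSearchCV,
        params: Sequence[str] = ("params",),
        scoring: str = "mean_test_score",
        cmap: Optional[str] = None,
        ax: Optional[plt.Axes] = None) -> plt.Axes: ...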
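A minimal matplotlib sketch of what a function with this interface could look like. It is not the implementation behind the image above, and it rests on stated assumptions: grid_model is any fitted search object exposing cv_results_ (duck-typed, so GridSearchCV and RandomizedSearchCV both work), params=None means "every searched parameter" (an assumption, since the meaning of the default is not spelled out), categorical parameters are mapped to sorted integer codes, and every axis is min-max scaled to [0, 1] so all axes share one vertical range.

```python
from typing import Optional, Sequence

import matplotlib.pyplot as plt
import numpy as np
from matplotlib.collections import LineCollection


def plot_parallel_coordinates(
    grid_model,
    params: Optional[Sequence[str]] = None,
    scoring: str = "mean_test_score",
    cmap: Optional[str] = None,
    ax: Optional[plt.Axes] = None,
) -> plt.Axes:
    """Draw one polyline per candidate found in grid_model.cv_results_.

    Hypothetical sketch: params are parameter names without the "param_"
    prefix (None = every searched parameter); the last axis is `scoring`.
    """
    results = grid_model.cv_results_
    if params is None:
        params = [key[len("param_"):] for key in results if key.startswith("param_")]

    columns, categories = [], []
    for name in params:
        values = list(results[f"param_{name}"])
        try:
            # Numeric parameter (int, float, bool): use the values directly.
            columns.append(np.asarray(values, dtype=float))
            categories.append(None)
        except (TypeError, ValueError):
            # Categorical parameter (e.g. kernel="rbf"): map values to integer codes.
            cats = sorted({str(v) for v in values})
            columns.append(np.array([cats.index(str(v)) for v in values], dtype=float))
            categories.append(cats)
    columns.append(np.asarray(results[scoring], dtype=float))
    categories.append(None)
    labels = list(params) + [scoring]
    data = np.column_stack(columns)  # shape: (n_candidates, n_axes)

    # Min-max scale every axis to [0, 1] so all axes share one vertical range.
    lo, hi = data.min(axis=0), data.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # constant axes: avoid dividing by zero
    scaled = (data - lo) / span

    if ax is None:
        _, ax = plt.subplots(figsize=(2 + 1.5 * len(labels), 4))
    x = np.arange(len(labels))
    lines = LineCollection(
        [np.column_stack([x, row]) for row in scaled], cmap=cmap, linewidths=1.0
    )
    lines.set_array(data[:, -1])  # colour each candidate by its score
    ax.add_collection(lines)
    ax.figure.colorbar(lines, ax=ax, label=scoring)

    # Draw each vertical axis and annotate it with the original (unscaled) values.
    for i, cats in enumerate(categories):
        ax.axvline(i, color="black", linewidth=0.8)
        if cats is None:
            marks = [(0.0, f"{lo[i]:.3g}"), ((hi[i] - lo[i]) / span[i], f"{hi[i]:.3g}")]
        else:
            marks = [(code / span[i], cat) for code, cat in enumerate(cats)]
        for y, text in marks:
            ax.text(i - 0.03, y, text, ha="right", va="center", fontsize=8)

    ax.set_xticks(x)
    ax.set_xticklabels(labels)
    ax.set_yticks([])
    ax.set_xlim(-0.5, len(labels) - 0.8)
    ax.set_ylim(-0.05, 1.05)
    return ax
```

With a search fitted as, say, GridSearchCV(SVC(), {"C": [0.1, 1, 10], "kernel": ["linear", "rbf"]}).fit(X, y), calling plot_parallel_coordinates(search) would draw six lines across the C, kernel and mean_test_score axes. Masked entries (parameters absent from some sub-grids when param_grid is a list of dicts) are not handled in this sketch, and log-spaced parameters such as C would likely read better on a log-scaled axis.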

I think the community could benefit from a common implementation integrated into scikit-learn (for example metrics.ParallelCoordinatesPlot or inspection.ParallelCoordinatesPlot, together with a plot_parallel_coordinates function).

Describe alternatives you've considered, if relevant

No response

Additional context

Please give a 👍 if you think this would be beneficial, so we can start discussing the implementation.

julien-blanchon added the Needs Triage and New Feature labels Aug 27, 2022
@morganmcg1

Hey @julien-blanchon! I work at wandb, curious what you meant by:

> deep learning experiement tracker (wandb ...). Same don't integrate weel

I think it should be possible to create your own parallel coordinates plot with wandb no? Or is it the sklearn & wandb integration that isn't working for you? Happy to see if I can help here :)

@thomasjpfan
Member

During the triaging meeting, we decided that this feature is interesting, but not high priority. We do welcome upvotes on this issue to gauge interest. On this note, PR #23740 includes an example that demonstrates how to use plotly's parallel_coordinates with grid search results.

For me, I think parallel coordinates plots work well only if they are interactive, which means depending on a library such as plotly. A matplotlib version would work within scikit-learn's current constraints, but it creates a static image. There are ways to make matplotlib interactive, but they only work with specific backends.

thomasjpfan added the Needs Decision - Include Feature and module:inspection labels and removed the Needs Triage label Sep 9, 2022
@julien-blanchon
Author

> Hey @julien-blanchon! I work at wandb, curious what you meant by:
>
> > deep learning experiement tracker (wandb ...). Same don't integrate weel
>
> I think it should be possible to create your own parallel coordinates plot with wandb no? Or is it the sklearn & wandb integration that isn't working for you? Happy to see if I can help here :)

Hi @morganmcg1, the sklearn wandb integration works well! I was talking about the parallel coordinates chart built into Hyperparameter Sweeps (https://wandb.ai/site/articles/introduction-hyperparameter-sweeps). It does the job, but it takes "extensive" work to set up, and you need wandb (with a bunch of other things) anyway to visualize your parallel plot. My proposal was more about adding a small function to sklearn's plotting utilities to easily generate a matplotlib/JPEG plot.

@julien-blanchon
Author

Hi @thomasjpfan, I'm happy to hear the news!
The example in PR #23740 is an excellent solution. However, it still relies on plotly. I wanted to build a matplotlib solution in the first place, because it is much more in line with the "sklearn" philosophy (as far as I know) and integrates well with the current workflow, such as creating multiple matplotlib axes and then passing them to different functions as arguments.

> For me, I think parallel coordinate plots work well only if it is interactive, which means depending on a library such as plotly. A matplotlib version would work within scikit-learn's current constraints, but it creates a static image. There are ways to make matplotlib interactive, but it only works for specific backends.

I agree with you; in this case interactivity helps a lot. But even as a static image, a parallel coordinates plot could be extremely useful. I'm not a big fan of matplotlib interactive backends either.

Anyway, in case of increasing interest, I have some code snippets for this and would be quite happy to help with this issue ;)

Projects
Status: Discussion
Development

No branches or pull requests

3 participants