[MRG+1] DOC Cleaning up what's new for 0.20 by jnothman · Pull Request #11734 · scikit-learn/scikit-learn

[MRG+1] DOC Cleaning up what's new for 0.20 #11734

Merged: 15 commits, Aug 7, 2018

jnothman (Member) commented Aug 2, 2018

TODO

  1. put things under appropriate headings
  2. copy editing
  3. reordering for improved information structure?
  4. identify omissions from commit history (help welcome, let me know)
    - [x] July 13, 2017 (0.19 release) to August 31, 2017: done in #11734 (comment)

jnothman (Member, Author) commented Aug 3, 2018

I've lazily decided to just group entries by their module (at least in Enhancements, Bug fixes, API changes) where possible. A more coarse-grained scheme was annoying to maintain.

Here's some automation:

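import re
import sys
from collections import defaultdict

# Read the what's new text to be regrouped from stdin.
text = sys.stdin.read()

bucketed = defaultdict(list)

# Each entry starts with "- " at the beginning of a line.
for entry in re.split('\n(?=- )', text.strip()):
    # Extract the top-level module from each Sphinx role,
    # e.g. :class:`pipeline.Pipeline` gives "pipeline".
    modules = re.findall(
        r':(?:func|meth|mod|class|obj):`(?:[^<`]*<)?(?:sklearn\.)?([a-z]\w+)',
        entry)
    modules = set(modules)
    if len(modules) > 1:
        key = 'Multiple modules'
    elif modules:
        key = ':mod:`%s`' % next(iter(modules))
    else:
        key = 'Miscellaneous'
    # Normalise surrounding whitespace before storing the entry
    # (the original did this after appending, so it had no effect).
    bucketed[key].append(entry.strip() + '\n')


for key, bucket in sorted(bucketed.items()):
    print(key)
    print()
    print('\n'.join(bucket))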
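
As a rough illustration of how the grouping behaves, here is a minimal sketch that wraps the same regular expression in two helper functions and runs it on a few entries. The helper names `module_key` and `bucket_entries` are hypothetical (they are not part of the script above), and the third sample entry is made up; the first two are taken from this thread.

```python
import re
from collections import defaultdict

# Same pattern as the script above, with the dot in "sklearn." escaped.
ROLE_RE = re.compile(
    r':(?:func|meth|mod|class|obj):`(?:[^<`]*<)?(?:sklearn\.)?([a-z]\w+)')


def module_key(entry):
    """Return the heading an entry would be filed under (hypothetical helper)."""
    modules = set(ROLE_RE.findall(entry))
    if len(modules) > 1:
        return 'Multiple modules'
    if modules:
        return ':mod:`%s`' % next(iter(modules))
    return 'Miscellaneous'


def bucket_entries(text):
    """Group "- " list entries by the heading from module_key."""
    bucketed = defaultdict(list)
    for entry in re.split(r'\n(?=- )', text.strip()):
        bucketed[module_key(entry)].append(entry.strip())
    return dict(bucketed)


sample = """\
- Deprecate ``size_threshold`` parameter in :func:`metrics.manhattan_distances`.
  :issue:`9295` by :user:`Pravar D Mahajan <pravarmahajan>`.
- Fixed a bug in :class:`pipeline.Pipeline` due to ``steps`` parameter being a tuple.
  It is now always a list. :issue:`9604` by `Joris Van den Bossche`_.
- A made-up entry that names no module.
"""

buckets = bucket_entries(sample)
for key in sorted(buckets):
    print(key, len(buckets[key]))
```

Note that `:issue:` and `:user:` roles are deliberately not matched, so contributor names and issue numbers do not affect which heading an entry lands under.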

jnothman (Member, Author) commented Aug 3, 2018

I hope others don't mind that.

rth (Member) commented Aug 3, 2018

> identify omissions from commit history (help welcome, let me know)

That's 1006 commits to review, sigh. Maybe splitting it by month among multiple people could help...

rth (Member) commented Aug 3, 2018

Possible omissions from the git history for the period July 13, 2017 (first commit after the 0.19 release) to August 31, 2017 (listed with git log 0.19.X..master):

:mod:`preprocessing`

- All scalers (i.e. :func:`preprocessing.scale`, :func:`preprocessing.minmax_scale`,
  :func:`preprocessing.maxabs_scale`, and :func:`preprocessing.robust_scale`)
  accept 1D arrays. :issue:`9596` by :user:`Guillaume Lemaitre <glemaitre>`. 

:mod:`metrics`

- Deprecate ``size_threshold`` parameter in :func:`metrics.manhattan_distances`.
  :issue:`9295` by :user:`Pravar D Mahajan <pravarmahajan>`.

:mod:`semi_supervised`

- Increased the default ``max_iter`` in :class:`semi_supervised.LabelPropagation` 
  from 30 to 1000 to reach better convergence.
  :issue:`9441` by :user:`Utkarsh Upadhyay <musically-ut>`.

:mod:`pipeline`

- Fixed a bug in :class:`pipeline.Pipeline` due to ``steps`` parameter being a tuple.
  It is now always a list. :issue:`9604` by `Joris Van den Bossche`_.

In addition, I am not sure whether the following PRs (also from the same time range) should have an entry,

rth (Member) commented Aug 3, 2018

> I've lazily decided to just group entries by their module (at least in Enhancements, Bug fixes, API changes) where possible. A more coarse-grained scheme was annoying to maintain.

+1. I think it might also be more readable, since users don't have to look in 3 separate sections to discover changes to a particular estimator.

jnothman (Member, Author) commented Aug 4, 2018

Oh, I wasn't talking about removing the distinction between Enhancements / Bug Fixes / API Changes (only between Linear and Ensembles etc.). Do you really think we should do that?

jnothman (Member, Author) commented Aug 4, 2018

> That's 1006 commits to review, sigh. Maybe splitting it by month among multiple people could help...

No, you do it with a diff!

Something like:

diff <(
    git log --oneline 0.19.1..master |
        grep -wv -e DOC -e MNT -e BLD -e COSMIT -e EXA -e examples -e example -e minor |
        grep -o '(#[0-9][0-9]\+)$' |
        grep -o '[0-9]\+' |
        sort) <(
    cat doc/whats_new/v0.20.rst |
        grep -o 'issue:`[0-9]\+`' |
        grep -o '[0-9]\+' | sort) |
    grep '<' |
    cut -c3- |
    sed 's/.*/--grep (#&)/g' |
    xargs git log

Though this is still 420 commits...

Ideally, this would also check, for any PRs that fail to match what's new, whether the issue numbers closed by those PRs are also absent from what's new...
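
For readers who prefer Python to shell, the core of the pipeline above (PR numbers in the commit log, minus issue numbers cited in what's new) can be sketched as a set difference. This is only an approximation under stated assumptions: the function names are hypothetical, the inputs are passed as strings rather than read from `git log` or `doc/whats_new/v0.20.rst`, the sample commit hashes and PR numbers other than 9604 are made up, and the follow-up check on closed issue numbers is not attempted.

```python
import re

# Whole words that mark a commit as not needing a what's new entry,
# mirroring the `grep -wv` filter in the shell pipeline.
SKIP_WORDS = {'DOC', 'MNT', 'BLD', 'COSMIT', 'EXA',
              'examples', 'example', 'minor'}


def prs_in_log(log_text):
    """PR numbers from `git log --oneline` lines ending in "(#NNNN)"."""
    numbers = set()
    for line in log_text.splitlines():
        if set(re.findall(r'\w+', line)) & SKIP_WORDS:
            continue  # skipped, like grep -wv
        match = re.search(r'\(#(\d{2,})\)$', line)
        if match:
            numbers.add(match.group(1))
    return numbers


def issues_in_whats_new(rst_text):
    """Numbers cited as :issue:`NNNN` in the what's new text."""
    return set(re.findall(r'issue:`(\d+)`', rst_text))


def missing_prs(log_text, rst_text):
    """PR numbers present in the log but never cited in what's new."""
    return sorted(prs_in_log(log_text) - issues_in_whats_new(rst_text),
                  key=int)


# Made-up log lines for illustration only.
log = """\
abc1234 [MRG+1] Fix Pipeline steps given as tuple (#9604)
def5678 DOC fix typo in user guide (#9999)
0123abc ENH made-up enhancement (#12345)
"""
rst = "- Fixed a bug in :class:`pipeline.Pipeline`. :issue:`9604` by someone."

print(missing_prs(log, rst))
```

In a real run, `log_text` would come from something like `git log --oneline 0.19.1..master` and `rst_text` from the what's new file, as in the shell version.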

jnothman (Member, Author) commented Aug 6, 2018

rth (Member) commented Aug 6, 2018

> No, you do it with a diff!

Right, that is nice

Check it out: http://scikit-learn.org/circle?30718/whats_new/v0.20.html

I think the output looks very pretty!

jnothman (Member, Author) commented Aug 6, 2018 via email

rth (Member) commented Aug 6, 2018

http://scikit-learn.org/circle?30718/whats_new/v0.20.html

I was looking for an opinion on whether it is more useful or accessible, or
at least more maintainable than the status quo,

I am +1 on this change. I do think it helps readability, though most other packages I looked at (scipy, pandas) seem to have separate sections for deprecations and bug fixes, so maybe users would be more used to that.

Maybe @glemaitre @qinhanmin2014 or @jorisvandenbossche would have a second opinion?

qinhanmin2014 (Member) commented

Great work, though it seems that many packages (AFAIK) use separate sections.
There are still a few formatting issues. Feel free to resolve them, or we can merge as is and I'll push a commit to take care of them.

qinhanmin2014 (Member) left a comment

Maybe we can first merge this one and then identify omissions.

@qinhanmin2014 qinhanmin2014 changed the title DOC WIP Cleaning up what's new for 0.20 [MRG+1] DOC Cleaning up what's new for 0.20 Aug 6, 2018
qinhanmin2014 (Member) commented

ping @rth can we merge here? I'll take care of a few formatting issues in the PR.

rth (Member) commented Aug 6, 2018

Maybe we should wait a bit for other comments (a day or so), as this is potentially a significant refactoring of the release notes, and then merge? I'm +1 for merging generally. It will create merge conflicts for some existing PRs though.

jnothman (Member, Author) commented Aug 7, 2018

A nice thing about this design is that it's relatively easy to resolve such conflicts.

ogrisel (Member) left a comment

LGTM.

rth (Member) commented Aug 7, 2018

Merging, thanks @jnothman !

The omissions and a few formatting issues could be fixed in a subsequent PR to make the diff more readable, as suggested in #11734 (review)

@rth rth merged commit fc2503f into scikit-learn:master Aug 7, 2018
amueller (Member) commented Aug 7, 2018

Sorry for coming late to the party. I think it would be great to have an overview of changed modules with links to the sections (i.e. a table of contents). Right now there's a lot of scrolling.
And the "changed models" should probably link to the description of the actual changes.

jnothman (Member, Author) commented Aug 7, 2018 via email
