8000 OPTICS plot fix by espg · Pull Request #11849 · scikit-learn/scikit-learn · GitHub
[go: up one dir, main page]

Skip to content

OPTICS plot fix #11849

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 141 commits into from
Closed

OPTICS plot fix #11849

wants to merge 141 commits into from

Conversation

espg
Copy link
Contributor
@espg espg commented Aug 17, 2018

Reference Issues/PRs

What does this implement/fix? Explain your changes.

Fixes plot. Current plot doesn't include lines because np.full_like() defaults to integer and everything is set at zero. It looks like this:

image

This fix will make it look like this:

image

Any other comments?

Hopefully this will fix my contribution graph too. I didn't adhere to best git practices when I opened #1984 , so everything was in master instead of a separate branch. Once this is merged I should be able to open concurrent PR's in sklearn without things being broken (and/or showing unrelated commits from OPTICS).

espg and others added 30 commits May 21, 2013 17:26
Equivalent results to DBSCAN, but allows execution on arbitrarily large datasets. After initial construction, allows multiple 'scans' to quickly extract DBSCAN clusters at variable epsilon distances
Shows example usage of OPTICS to extract clustering structure
Mainly issues in long lines, long comments
OPTICS object, with fit method. Documentation updates.
extract method
new plot example that matches the updated API
updated n_cluster attribute with reruns of extract
removed scaling factor on first ‘fit’ run
should pass unit test now?
Scales eps to give stable results from first input distance. Extraction
above scaled eps is not allowed; extraction below scaled eps but
greater than input distance prints stability warning but still will run
clustering extraction. All distances below initial input distance are
stable; no warning printed
Fixed noise points from being initialized as type ‘core point’
Fixed initialization for first ‘fit’ call
Decoupled eps and eps_prime (deep copy)

Matched plot example to same random state as dbscan
Added second plot to show ‘extract’ usage
eps is not modified in init; kwargs fix
Why do I have to do this?
Fit now returns self; test fix for unit test fail in ball tree (asarray
problem)
labels to labels_
Temporary fix until balltree can be updated to deal with sparse
matrices.
Using ‘check_array’
Removed extraneous lines and comments (old commented out code has been
removed)
Checking module compatibility with current sklearn build
Added the unit tests in test_optics for fit, extract, and declaration.
Should bring coverage to ~100%. Additionally, fixed a small bug that
cropped up in the extract function during testing.
Removed unused imports, added to get warning.
Result should be 100% coverage
All public methods now have doc strings; forcing a rebuild of OPTICS so
that build tests pass (last round failed due to external module)
99% pep8 now… line 138 isn’t, but reads better with the long variable
names
Includes general description, discussion of algorithm output, and
comparison with DBSCAN. References and implementation notes are included
Following suggestion from jnothman for doing nneighbors queries enmass.
Added OPTICS to cluster init
espg added 25 commits May 22, 2018 19:26
narrative changes to tests. Normalized reachability distances for significant_min parameter. Cleaned up plots; small changes to documentation with optics_.py
also minor documentation updates
since labels are 0 indexed, max of labels is (1 - total number of clusters). We can't take len(set(clusters)) because of noise (will be 4, not 3).
Fixed float division, condensed variable names to be shorter, renamed bools to True and False, removed un-needed code block. Still need to add tests for cluster tree extraction :-(
coverage should be complete now; fixed minor bug; removed un-needed check.
also fixed documentation link
entry was supposed to be '1.0' not '10'; the test is supposed to posit 3 clusters, 2 of which are too small and are merged. Old version posited 4 clusters, two of which were merged as intended, and two of which were discarded (cluster merging requires one of the clusters to be large enough to be an independent cluster; with 4 instead of three, this case did not happen for either of the first two clusters).
merge conflict in clustering.rst in previous commit; this push updates the doctest values to current correct values, and resolves the conflict
Restructured documentation. Small unit test fixes. Added test to ensure clustering metrics between OPTICS dbscan_extract and DBSCAN are within 2% of each other.
renamed reachability_ordering --> ordering for consistency
changes unit tests to check for specific error message (instead of 'a' error); minor updates to documentation. This also fixes a bug in the extract_dbscan function whereby some core points were erroneously marked as periphery... this is fixed by reverting to a previous extraction code block that initalizes all points as core and then demotes noise and periphery points during the extract scan. Parameterized unit test.
New invarient test between optics and dbscan
Vectorized extract dbscan function to see if would improve performance of periphery point labeling; it did not, but the function is vectorized. Changed unit test with min_samples=1 to min_samples=3, as at min_samples=1 the test isn't meaningful (no noise is possible, all points are marked core). Parameterized parity test. Changed parity test to assert ~5% or better mismatch, instead of 5 points (this is needed for larger clusters, as the starting point mismatch effect scales with cluster size).
@agramfort
Copy link
Member

@espg you should be able to reset your master branch to upstream/master after backing up your master

git checkout master
git checkout -b master_bck
git checkout master
git fetch upstream
git reset --hard upstream/master
git checkout plotfix
git reset --hard upstream/master
# fix the script
git push origin plotfix -f

Copy link
Member
@qinhanmin2014 qinhanmin2014 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks @espg
The diff looks strange, so I'll push to master directly.

qinhanmin2014 added a commit that referenced this pull request Aug 30, 2018
jnothman pushed a commit to jnothman/scikit-learn that referenced this pull request Sep 2, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants
0