Merge branch 'master' into circle-noplot · scikit-learn/scikit-learn@2d001a6 · GitHub
Commit 2d001a6

Merge branch 'master' into circle-noplot

2 parents 7096f3d + c171561

15 files changed, +101 −66 lines changed

.gitattributes

Lines changed: 1 addition & 30 deletions
@@ -1,30 +1 @@
-/sklearn/__check_build/_check_build.c -diff
-/sklearn/_isotonic.c -diff
-/sklearn/cluster/_dbscan_inner.cpp -diff
-/sklearn/cluster/_hierarchical.cpp -diff
-/sklearn/cluster/_k_means.c -diff
-/sklearn/cluster/_k_means_elkan.c -diff
-/sklearn/datasets/_svmlight_format.c -diff
-/sklearn/decomposition/_online_lda.c -diff
-/sklearn/decomposition/cdnmf_fast.c -diff
-/sklearn/ensemble/_gradient_boosting.c -diff
-/sklearn/feature_extraction/_hashing.c -diff
-/sklearn/linear_model/cd_fast.c -diff
-/sklearn/linear_model/sgd_fast.c -diff
-/sklearn/linear_model/sag_fast.c -diff
-/sklearn/metrics/pairwise_fast.c -diff
-/sklearn/neighbors/ball_tree.c -diff
-/sklearn/neighbors/kd_tree.c -diff
-/sklearn/svm/liblinear.c -diff
-/sklearn/svm/libsvm.c -diff
-/sklearn/svm/libsvm_sparse.c -diff
-/sklearn/tree/_tree.c -diff
-/sklearn/tree/_utils.c -diff
-/sklearn/utils/arrayfuncs.c -diff
-/sklearn/utils/graph_shortest_path.c -diff
-/sklearn/utils/lgamma.c -diff
-/sklearn/utils/_logistic_sigmoid.c -diff
-/sklearn/utils/murmurhash.c -diff
-/sklearn/utils/seq_dataset.c -diff
-/sklearn/utils/sparsefuncs_fast.c -diff
-/sklearn/utils/weight_vector.c -diff
+/doc/whats_new.rst merge=union

build_tools/circle/push_doc.sh

Lines changed: 3 additions & 1 deletion
@@ -24,9 +24,11 @@ MSG="Pushing the docs to $dir/ for branch: $CIRCLE_BRANCH, commit $CIRCLE_SHA1"

 cd $HOME
 if [ ! -d $DOC_REPO ];
-then git clone "git@github.com:scikit-learn/"$DOC_REPO".git";
+then git clone --depth 1 --no-checkout "git@github.com:scikit-learn/"$DOC_REPO".git";
 fi
 cd $DOC_REPO
+git config core.sparseCheckout true
+echo $dir > .git/info/sparse-checkout
 git checkout $CIRCLE_BRANCH
 git reset --hard origin/$CIRCLE_BRANCH
 git rm -rf $dir/ && rm -rf $dir/

circle.yml

Lines changed: 2 additions & 2 deletions
@@ -9,9 +9,9 @@ dependencies:
     - ./build_tools/circle/build_doc.sh:
         timeout: 3600 # seconds
 test:
-  # Grep error on the documentation
   override:
-    - cat ~/log.txt && if grep -q "Traceback (most recent call last):" ~/log.txt; then false; else true; fi
+    # override is needed otherwise nosetests is run by default
+    - echo "Documentation has been built in the 'dependencies' step. No additional test to run"
 deployment:
   push:
     branch: /^master$|^[0-9]+\.[0-9]+\.X$/

doc/modules/clustering.rst

Lines changed: 10 additions & 9 deletions
@@ -746,17 +746,18 @@ by black points below.

 .. topic:: Implementation

-    The algorithm is non-deterministic, but the core samples will
-    always belong to the same clusters (although the labels may be
-    different). The non-determinism comes from deciding to which cluster a
-    non-core sample belongs. A non-core sample can have a distance lower
-    than ``eps`` to two core samples in different clusters. By the
+    The DBSCAN algorithm is deterministic, always generating the same clusters
+    when given the same data in the same order. However, the results can differ when
+    data is provided in a different order. First, even though the core samples
+    will always be assigned to the same clusters, the labels of those clusters
+    will depend on the order in which those samples are encountered in the data.
+    Second and more importantly, the clusters to which non-core samples are assigned
+    can differ depending on the data order. This would happen when a non-core sample
+    has a distance lower than ``eps`` to two core samples in different clusters. By the
     triangular inequality, those two core samples must be more distant than
     ``eps`` from each other, or they would be in the same cluster. The non-core
-    sample is assigned to whichever cluster is generated first, where
-    the order is determined randomly. Other than the ordering of
-    the dataset, the algorithm is deterministic, making the results relatively
-    stable between runs on the same data.
+    sample is assigned to whichever cluster is generated first in a pass
+    through the data, and so the results will depend on the data ordering.

     The current implementation uses ball trees and kd-trees
     to determine the neighborhood of points,
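The data-order sensitivity described in the new text is easy to observe directly. Below is a minimal sketch, not part of this commit, using the public sklearn.cluster.DBSCAN API; the toy dataset and the eps/min_samples values are illustrative assumptions chosen so that one non-core sample sits within eps of core samples from two different clusters.

import numpy as np
from sklearn.cluster import DBSCAN

# Two dense groups of core samples, plus one borderline point (0.65)
# that lies within eps of core samples in both groups but has too few
# neighbors to be a core sample itself.
X = np.array([[0.0], [0.1], [0.2], [0.3],   # group A
              [1.0], [1.1], [1.2], [1.3],   # group B
              [0.65]])                      # borderline non-core sample

db = DBSCAN(eps=0.36, min_samples=4)
labels_fwd = db.fit_predict(X)        # original ordering
labels_rev = db.fit_predict(X[::-1])  # reversed ordering

# The core samples form the same two clusters either way, but the
# borderline point may be claimed by a different cluster (and the
# integer labels may be permuted) depending on the data order.
print(labels_fwd)
print(labels_rev[::-1])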

doc/modules/grid_search.rst

Lines changed: 1 addition & 1 deletion
@@ -41,7 +41,7 @@ distribution. After describing these tools we detail

 Note that it is common that a small subset of those parameters can have a large
 impact on the predictive or computation performance of the model while others
-can be left to their default values. It is recommend to read the docstring of
+can be left to their default values. It is recommended to read the docstring of
 the estimator class to get a finer understanding of their expected behavior,
 possibly by reading the enclosed reference to the literature.

doc/modules/model_evaluation.rst

Lines changed: 12 additions & 0 deletions
@@ -1133,6 +1133,12 @@ are predicted. This is useful if you want to know how many top-scored-labels
 you have to predict in average without missing any true one. The best value
 of this metrics is thus the average number of true labels.

+.. note::
+
+    Our implementation's score is 1 greater than the one given in Tsoumakas
+    et al., 2010. This extends it to handle the degenerate case in which an
+    instance has 0 true labels.
+
 Formally, given a binary indicator matrix of the ground truth labels
 :math:`y \in \left\{0, 1\right\}^{n_\text{samples} \times n_\text{labels}}` and the
 score associated with each label
@@ -1236,6 +1242,12 @@ Here is a small example of usage of this function::

     >>> label_ranking_loss(y_true, y_score)
     0.0

+
+.. topic:: References:
+
+  * Tsoumakas, G., Katakis, I., & Vlahavas, I. (2010). Mining multi-label data. In
+    Data mining and knowledge discovery handbook (pp. 667-685). Springer US.
+
 .. _regression_metrics:

 Regression metrics
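As a quick illustration of the convention pinned down by the new note, here is a small sketch (not part of the commit) using sklearn.metrics.coverage_error; the input arrays are illustrative.

import numpy as np
from sklearn.metrics import coverage_error

y_true = np.array([[1, 0, 0], [0, 0, 1]])
y_score = np.array([[0.75, 0.5, 1], [1, 0.2, 0.1]])

# The true label of the first sample is ranked 2nd by its score, and the
# true label of the second sample is ranked 3rd, so the average coverage
# is (2 + 3) / 2 = 2.5. Under the Tsoumakas et al. (2010) convention the
# same example would score 1.5, i.e. exactly 1 less, as the note states.
print(coverage_error(y_true, y_score))  # 2.5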

doc/modules/model_persistence.rst

Lines changed: 4 additions & 0 deletions
@@ -81,6 +81,10 @@ additional metadata should be saved along the pickled model:
 This should make it possible to check that the cross-validation score is in the
 same range as before.

+Since a model internal representation may be different on two different
+architectures, dumping a model on one architecture and loading it on
+another architecture is not supported.
+
 If you want to know more about these issues and explore other possible
 serialization methods, please refer to this
 `talk by Alex Gaynor <http://pyvideo.org/video/2566/pickles-are-for-delis-not-software>`_.
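The added paragraph complements the section's advice to store metadata next to the pickled model. A minimal sketch of one way to do that follows; the bundle layout, the metadata fields, and the file name are illustrative assumptions, not an API defined by this commit.

import pickle
import platform

import sklearn
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
model = LogisticRegression().fit(X, y)

# Record the library version, a reference score, and the architecture the
# model was trained on, since pickles are not portable across architectures.
bundle = {
    "model": model,
    "sklearn_version": sklearn.__version__,
    "training_score": model.score(X, y),
    "architecture": platform.machine(),
}
with open("model_bundle.pkl", "wb") as f:
    pickle.dump(bundle, f)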

doc/whats_new.rst

Lines changed: 5 additions & 0 deletions
@@ -85,6 +85,11 @@ Enhancements
   do not set attributes on the estimator.
   :issue:`7533` by :user:`Ekaterina Krivich <kiote>`.

+- For sparse matrices, :func:`preprocessing.normalize` with ``return_norm=True``
+  will now raise a ``NotImplementedError`` with 'l1' or 'l2' norm and with norm 'max'
+  the norms returned will be the same as for dense matrices (:issue:`7771`).
+  By `Ang Lu <https://github.com/luang008>`_.
+
 Bug fixes
 .........

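The behaviour described in this changelog entry can be exercised directly; a minimal sketch (the matrix values are illustrative):

import numpy as np
from scipy import sparse
from sklearn.preprocessing import normalize

X = sparse.csr_matrix(np.array([[1.0, 2.0], [0.0, 3.0]]))

# With norm='max', the returned norms now match the dense case:
# the maximum absolute value of each row, here [2. 3.].
X_normed, norms = normalize(X, norm='max', return_norm=True)
print(norms)

# With 'l1' or 'l2', return_norm=True is not implemented for sparse input.
try:
    normalize(X, norm='l2', return_norm=True)
except NotImplementedError as exc:
    print(exc)
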
examples/classification/plot_lda_qda.py

Lines changed: 6 additions & 2 deletions
@@ -1,9 +1,13 @@
 """
 ====================================================================
-Linear and Quadratic Discriminant Analysis with confidence ellipsoid
+Linear and Quadratic Discriminant Analysis with covariance ellipsoid
 ====================================================================

-Plot the confidence ellipsoids of each class and decision boundary
+This example plots the covariance ellipsoids of each class and
+decision boundary learned by LDA and QDA. The ellipsoids display
+the double standard deviation for each class. With LDA, the
+standard deviation is the same for all the classes, while each
+class has its own standard deviation with QDA.
 """
 print(__doc__)

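For context on what the reworded docstring describes: the ellipsoids in this example are drawn at two standard deviations from each class mean. Below is a minimal, self-contained sketch of that construction; it is standalone illustration, not the example's own code, and the covariance values are made up.

import numpy as np
import matplotlib.pyplot as plt
from matplotlib.patches import Ellipse

# Illustrative Gaussian class: mean and covariance matrix.
mean = np.array([0.0, 0.0])
cov = np.array([[2.0, 0.8], [0.8, 1.0]])

# Principal axes of the covariance (eigh returns ascending eigenvalues).
eigvals, eigvecs = np.linalg.eigh(cov)
angle = np.degrees(np.arctan2(eigvecs[1, 1], eigvecs[0, 1]))
# A "2 standard deviations" ellipse has semi-axes 2 * sqrt(eigenvalue),
# so the full width/height passed to Ellipse is 4 * sqrt(eigenvalue).
width, height = 4 * np.sqrt(eigvals[1]), 4 * np.sqrt(eigvals[0])

fig, ax = plt.subplots()
ax.add_patch(Ellipse(mean, width, height, angle=angle, alpha=0.3))
ax.set_xlim(-5, 5)
ax.set_ylim(-5, 5)
plt.show()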

examples/datasets/plot_iris_dataset.py

Lines changed: 3 additions & 3 deletions
@@ -31,7 +31,7 @@
 # import some data to play with
 iris = datasets.load_iris()
 X = iris.data[:, :2]  # we only take the first two features.
-Y = iris.target
+y = iris.target

 x_min, x_max = X[:, 0].min() - .5, X[:, 0].max() + .5
 y_min, y_max = X[:, 1].min() - .5, X[:, 1].max() + .5
@@ -40,7 +40,7 @@
 plt.clf()

 # Plot the training points
-plt.scatter(X[:, 0], X[:, 1], c=Y, cmap=plt.cm.Paired)
+plt.scatter(X[:, 0], X[:, 1], c=y, cmap=plt.cm.Paired)
 plt.xlabel('Sepal length')
 plt.ylabel('Sepal width')

@@ -54,7 +54,7 @@
 fig = plt.figure(1, figsize=(8, 6))
 ax = Axes3D(fig, elev=-150, azim=110)
 X_reduced = PCA(n_components=3).fit_transform(iris.data)
-ax.scatter(X_reduced[:, 0], X_reduced[:, 1], X_reduced[:, 2], c=Y,
+ax.scatter(X_reduced[:, 0], X_reduced[:, 1], X_reduced[:, 2], c=y,
            cmap=plt.cm.Paired)
 ax.set_title("First three PCA directions")
 ax.set_xlabel("1st eigenvector")
