Merge branch 'master' into circle-noplot · scikit-learn/scikit-learn@2d001a6 · GitHub

Commit 2d001a6
Merge branch 'master' into circle-noplot
2 parents: 7096f3d + c171561
File tree: 15 files changed, +101 lines, -66 lines

.gitattributes

Lines changed: 1 addition & 30 deletions
```diff
@@ -1,30 +1 @@
-/sklearn/__check_build/_check_build.c -diff
-/sklearn/_isotonic.c -diff
-/sklearn/cluster/_dbscan_inner.cpp -diff
-/sklearn/cluster/_hierarchical.cpp -diff
-/sklearn/cluster/_k_means.c -diff
-/sklearn/cluster/_k_means_elkan.c -diff
-/sklearn/datasets/_svmlight_format.c -diff
-/sklearn/decomposition/_online_lda.c -diff
-/sklearn/decomposition/cdnmf_fast.c -diff
-/sklearn/ensemble/_gradient_boosting.c -diff
-/sklearn/feature_extraction/_hashing.c -diff
-/sklearn/linear_model/cd_fast.c -diff
-/sklearn/linear_model/sgd_fast.c -diff
-/sklearn/linear_model/sag_fast.c -diff
-/sklearn/metrics/pairwise_fast.c -diff
-/sklearn/neighbors/ball_tree.c -diff
-/sklearn/neighbors/kd_tree.c -diff
-/sklearn/svm/liblinear.c -diff
-/sklearn/svm/libsvm.c -diff
-/sklearn/svm/libsvm_sparse.c -diff
-/sklearn/tree/_tree.c -diff
-/sklearn/tree/_utils.c -diff
-/sklearn/utils/arrayfuncs.c -diff
-/sklearn/utils/graph_shortest_path.c -diff
-/sklearn/utils/lgamma.c -diff
-/sklearn/utils/_logistic_sigmoid.c -diff
-/sklearn/utils/murmurhash.c -diff
-/sklearn/utils/seq_dataset.c -diff
-/sklearn/utils/sparsefuncs_fast.c -diff
-/sklearn/utils/weight_vector.c -diff
+/doc/whats_new.rst merge=union
```

build_tools/circle/push_doc.sh

Lines changed: 3 additions & 1 deletion
```diff
@@ -24,9 +24,11 @@ MSG="Pushing the docs to $dir/ for branch: $CIRCLE_BRANCH, commit $CIRCLE_SHA1"
 
 cd $HOME
 if [ ! -d $DOC_REPO ];
-then git clone "git@github.com:scikit-learn/"$DOC_REPO".git";
+then git clone --depth 1 --no-checkout "git@github.com:scikit-learn/"$DOC_REPO".git";
 fi
 cd $DOC_REPO
+git config core.sparseCheckout true
+echo $dir > .git/info/sparse-checkout
 git checkout $CIRCLE_BRANCH
 git reset --hard origin/$CIRCLE_BRANCH
 git rm -rf $dir/ && rm -rf $dir/
```
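The clone is now shallow and checkout-free, with a sparse-checkout filter so only the directory being updated is materialized. As a hedged illustration of the same sequence outside the script, here is a minimal Python sketch; the repository name, target directory, and branch are placeholder assumptions, not values taken from the commit.

```python
# Illustrative sketch of the sparse-checkout sequence push_doc.sh now uses.
import subprocess

def run(*cmd, **kwargs):
    subprocess.check_call(list(cmd), **kwargs)

repo = "scikit-learn.github.io"  # assumed stand-in for $DOC_REPO
target_dir = "dev"               # assumed stand-in for $dir
branch = "master"                # assumed stand-in for $CIRCLE_BRANCH

# Shallow clone without checking out any files yet.
run("git", "clone", "--depth", "1", "--no-checkout",
    "git@github.com:scikit-learn/" + repo + ".git")
# Restrict the working tree to the one directory we will update.
run("git", "config", "core.sparseCheckout", "true", cwd=repo)
with open(repo + "/.git/info/sparse-checkout", "w") as f:
    f.write(target_dir + "\n")
run("git", "checkout", branch, cwd=repo)
```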

circle.yml

Lines changed: 2 additions & 2 deletions
```diff
@@ -9,9 +9,9 @@ dependencies:
     - ./build_tools/circle/build_doc.sh:
         timeout: 3600 # seconds
 test:
-  # Grep error on the documentation
   override:
-    - cat ~/log.txt && if grep -q "Traceback (most recent call last):" ~/log.txt; then false; else true; fi
+    # override is needed otherwise nosetests is run by default
+    - echo "Documentation has been built in the 'dependencies' step. No additional test to run"
 deployment:
   push:
     branch: /^master$|^[0-9]+\.[0-9]+\.X$/
```

doc/modules/clustering.rst

Lines changed: 10 additions & 9 deletions
```diff
@@ -746,17 +746,18 @@ by black points below.
 
 .. topic:: Implementation
 
-    The algorithm is non-deterministic, but the core samples will
-    always belong to the same clusters (although the labels may be
-    different). The non-determinism comes from deciding to which cluster a
-    non-core sample belongs. A non-core sample can have a distance lower
-    than ``eps`` to two core samples in different clusters. By the
+    The DBSCAN algorithm is deterministic, always generating the same clusters
+    when given the same data in the same order. However, the results can differ when
+    data is provided in a different order. First, even though the core samples
+    will always be assigned to the same clusters, the labels of those clusters
+    will depend on the order in which those samples are encountered in the data.
+    Second and more importantly, the clusters to which non-core samples are assigned
+    can differ depending on the data order. This would happen when a non-core sample
+    has a distance lower than ``eps`` to two core samples in different clusters. By the
     triangular inequality, those two core samples must be more distant than
     ``eps`` from each other, or they would be in the same cluster. The non-core
-    sample is assigned to whichever cluster is generated first, where
-    the order is determined randomly. Other than the ordering of
-    the dataset, the algorithm is deterministic, making the results relatively
-    stable between runs on the same data.
+    sample is assigned to whichever cluster is generated first in a pass
+    through the data, and so the results will depend on the data ordering.
 
 The current implementation uses ball trees and kd-trees
 to determine the neighborhood of points,
```
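The rewritten passage can be checked empirically. Below is a small hypothetical sketch (not part of the commit) showing that reversing the input order leaves the partition intact but can swap the integer labels; the data, eps, and min_samples values are chosen only for illustration.

```python
# Illustration: DBSCAN cluster labels depend on the order in which
# samples are encountered, even though the partition is the same.
import numpy as np
from sklearn.cluster import DBSCAN

X = np.array([[0.0], [1.0], [2.0], [10.0], [11.0], [12.0]])

labels_forward = DBSCAN(eps=1.5, min_samples=2).fit(X).labels_
labels_reversed = DBSCAN(eps=1.5, min_samples=2).fit(X[::-1]).labels_[::-1]

# Same two clusters, but the integer labels can be swapped: the cluster
# whose core sample is seen first gets label 0.
print(labels_forward)   # e.g. [0 0 0 1 1 1]
print(labels_reversed)  # e.g. [1 1 1 0 0 0]
```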

doc/modules/grid_search.rst

Lines changed: 1 addition & 1 deletion
```diff
@@ -41,7 +41,7 @@ distribution. After describing these tools we detail
 
 Note that it is common that a small subset of those parameters can have a large
 impact on the predictive or computation performance of the model while others
-can be left to their default values. It is recommend to read the docstring of
+can be left to their default values. It is recommended to read the docstring of
 the estimator class to get a finer understanding of their expected behavior,
 possibly by reading the enclosed reference to the literature.
 
```
doc/modules/model_evaluation.rst

Lines changed: 12 additions & 0 deletions
```diff
@@ -1133,6 +1133,12 @@ are predicted. This is useful if you want to know how many top-scored-labels
 you have to predict in average without missing any true one. The best value
 of this metrics is thus the average number of true labels.
 
+.. note::
+
+    Our implementation's score is 1 greater than the one given in Tsoumakas
+    et al., 2010. This extends it to handle the degenerate case in which an
+    instance has 0 true labels.
+
 Formally, given a binary indicator matrix of the ground truth labels
 :math:`y \in \left\{0, 1\right\}^{n_\text{samples} \times n_\text{labels}}` and the
 score associated with each label
@@ -1236,6 +1242,12 @@ Here is a small example of usage of this function::
     >>> label_ranking_loss(y_true, y_score)
     0.0
 
+
+.. topic:: References:
+
+    * Tsoumakas, G., Katakis, I., & Vlahavas, I. (2010). Mining multi-label data. In
+      Data mining and knowledge discovery handbook (pp. 667-685). Springer US.
+
 .. _regression_metrics:
 
 Regression metrics
```
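To make the new note concrete: coverage_error reports the average ranking depth needed to cover all true labels of a sample. A small illustrative computation (values chosen for the example, not taken from the commit):

```python
import numpy as np
from sklearn.metrics import coverage_error

y_true = np.array([[1, 0, 0],
                   [0, 0, 1]])
y_score = np.array([[0.75, 0.5, 1.0],
                    [1.0, 0.2, 0.1]])

# Sample 1: its true label is ranked 2nd by score; sample 2: its true
# label is ranked 3rd. Average depth = (2 + 3) / 2 = 2.5.
print(coverage_error(y_true, y_score))  # 2.5
```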

doc/modules/model_persistence.rst

Lines changed: 4 additions & 0 deletions
```diff
@@ -81,6 +81,10 @@ additional metadata should be saved along the pickled model:
 This should make it possible to check that the cross-validation score is in the
 same range as before.
 
+Since a model internal representation may be different on two different
+architectures, dumping a model on one architecture and loading it on
+another architecture is not supported.
+
 If you want to know more about these issues and explore other possible
 serialization methods, please refer to this
 `talk by Alex Gaynor <http://pyvideo.org/video/2566/pickles-are-for-delis-not-software>`_.
```
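As a hedged sketch of the recommendation above (the payload keys are invented for this example, not prescribed by the docs): store the scikit-learn version, a reference to the training data, and the build architecture next to the pickle, so the environment can be reproduced and cross-architecture loads avoided.

```python
# Sketch: persist an estimator together with reproducibility metadata.
import pickle
import platform

import sklearn
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
model = LogisticRegression().fit(X, y)

payload = {
    "model": model,
    "sklearn_version": sklearn.__version__,  # reload with the same version
    "training_data": "iris",                 # or a reference to the data itself
    "architecture": platform.machine(),      # pickles are not portable across architectures
}

with open("model.pkl", "wb") as f:
    pickle.dump(payload, f)
```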

doc/whats_new.rst

Lines changed: 5 additions & 0 deletions
```diff
@@ -85,6 +85,11 @@ Enhancements
      do not set attributes on the estimator.
      :issue:`7533` by :user:`Ekaterina Krivich <kiote>`.
 
+   - For sparse matrices, :func:`preprocessing.normalize` with ``return_norm=True``
+     will now raise a ``NotImplementedError`` with 'l1' or 'l2' norm and with norm 'max'
+     the norms returned will be the same as for dense matrices (:issue:`7771`).
+     By `Ang Lu <https://github.com/luang008>`_.
+
 Bug fixes
 .........
```
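A small sketch of the behaviour this changelog entry describes, assuming a SciPy CSR input (illustrative, not part of the commit):

```python
import numpy as np
from scipy import sparse
from sklearn.preprocessing import normalize

X = sparse.csr_matrix(np.array([[1.0, 2.0],
                                [3.0, 4.0]]))

# norm='max' also returns the norms, matching the dense result.
X_norm, norms = normalize(X, norm="max", return_norm=True)
print(norms)  # [2. 4.]

# norm='l1' or 'l2' with return_norm=True is not implemented for sparse input.
try:
    normalize(X, norm="l2", return_norm=True)
except NotImplementedError as exc:
    print(exc)
```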

examples/classification/plot_lda_qda.py

Lines changed: 6 additions & 2 deletions
```diff
@@ -1,9 +1,13 @@
 """
 ====================================================================
-Linear and Quadratic Discriminant Analysis with confidence ellipsoid
+Linear and Quadratic Discriminant Analysis with covariance ellipsoid
 ====================================================================
 
-Plot the confidence ellipsoids of each class and decision boundary
+This example plots the covariance ellipsoids of each class and
+decision boundary learned by LDA and QDA. The ellipsoids display
+the double standard deviation for each class. With LDA, the
+standard deviation is the same for all the classes, while each
+class has its own standard deviation with QDA.
 """
 print(__doc__)
 
```
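The new docstring says the ellipsoids show two standard deviations per class. As a hedged aside, here is a minimal sketch of drawing such an ellipse from an empirical 2-D covariance; the helper name and data are invented for illustration.

```python
# Sketch: draw a 2-standard-deviation ellipse from a 2-D covariance matrix.
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.patches import Ellipse

def plot_cov_ellipse(mean, cov, ax, n_std=2.0, **kwargs):
    """Add an ellipse covering n_std standard deviations of a 2-D Gaussian."""
    vals, vecs = np.linalg.eigh(cov)  # ascending eigenvalues (variances)
    angle = np.degrees(np.arctan2(vecs[1, 1], vecs[0, 1]))  # major-axis angle
    width, height = 2 * n_std * np.sqrt(vals[::-1])  # largest axis first
    ax.add_patch(Ellipse(mean, width, height, angle=angle, alpha=0.3, **kwargs))

rng = np.random.RandomState(0)
X = rng.multivariate_normal([0, 0], [[2.0, 0.8], [0.8, 1.0]], size=200)

fig, ax = plt.subplots()
ax.scatter(X[:, 0], X[:, 1], s=10)
plot_cov_ellipse(X.mean(axis=0), np.cov(X.T), ax)
ax.set_aspect("equal")
plt.show()
```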

examples/datasets/plot_iris_dataset.py

Lines changed: 3 additions & 3 deletions
```diff
@@ -31,7 +31,7 @@
 # import some data to play with
 iris = datasets.load_iris()
 X = iris.data[:, :2]  # we only take the first two features.
-Y = iris.target
+y = iris.target
 
 x_min, x_max = X[:, 0].min() - .5, X[:, 0].max() + .5
 y_min, y_max = X[:, 1].min() - .5, X[:, 1].max() + .5
@@ -40,7 +40,7 @@
 plt.clf()
 
 # Plot the training points
-plt.scatter(X[:, 0], X[:, 1], c=Y, cmap=plt.cm.Paired)
+plt.scatter(X[:, 0], X[:, 1], c=y, cmap=plt.cm.Paired)
 plt.xlabel('Sepal length')
 plt.ylabel('Sepal width')
 
@@ -54,7 +54,7 @@
 fig = plt.figure(1, figsize=(8, 6))
 ax = Axes3D(fig, elev=-150, azim=110)
 X_reduced = PCA(n_components=3).fit_transform(iris.data)
-ax.scatter(X_reduced[:, 0], X_reduced[:, 1], X_reduced[:, 2], c=Y,
+ax.scatter(X_reduced[:, 0], X_reduced[:, 1], X_reduced[:, 2], c=y,
            cmap=plt.cm.Paired)
 ax.set_title("First three PCA directions")
 ax.set_xlabel("1st eigenvector")
```
