Merge tag '0.11' (theirs) into releases · seckcoder/scikit-learn@5b136ab · GitHub

Commit 5b136ab

Merge tag '0.11' (theirs) into releases
* tag '0.11': (1482 commits)
  DOC: updated testing instructions
  BUG: remove n_jobs=-1 from examples
  COSMIT typo in whatsnew
  RELEASE 0.11
  FIX: wrong cover-package, misleading coverage as 100%
  ENH: update joblib
  FIX: bug in test_setup. Actually avoid multiprocessing now.
  DOC: image to graph utilities
  DOC: Feature extraction vs feature selection
  DOC: more readable title
  DOC: avoid 2 rows of images
  ENH: prevent multiprocessing in tests under Windows
  DOC: faster and more meaningful example
  DOC fix last docstring error. Don't remove redundant docstring. I dare you, I double dare you mother******!
  DOC: instructions on testing
  DOC more minor fixes
  DOC banner 14 duplication?
  DOC minor fixes to rst and image paths
  Revert "ENH: avoid an underflow"
  ENH: avoid an underflow
  ...
2 parents 55e3226 + 4ae44b0 commit 5b136ab


420 files changed (+60069, −18794 lines)


.gitignore

Lines changed: 1 addition & 1 deletion
@@ -3,6 +3,7 @@
 *~
 .#*
 *.swp
+*.swo
 .DS_Store
 build
 sklearn/datasets/__config__.py
@@ -36,4 +37,3 @@ nips2010_pdf/
 *.nt.bz2
 *.tar.gz
 *.tgz
-joblib

AUTHORS.rst

Lines changed: 3 additions & 2 deletions
@@ -50,8 +50,7 @@ People

 * `Jake VanderPlas <http://www.astro.washington.edu/users/vanderplas/>`_

-* `Alexandre Gramfort
-  <http://www-sop.inria.fr/members/Alexandre.Gramfort/index.fr.html>`_
+* `Alexandre Gramfort <http://alexandre.gramfort.net>`_

 * `Olivier Grisel <http://twitter.com/ogrisel>`_

@@ -99,6 +98,8 @@ People

 * `Andreas Müller <http://www.ais.uni-bonn.de/~amueller/>`_

+* `Satra Ghosh <www.mit.edu/~satra>`_
+

 If I forgot anyone, do not hesitate to send me an email to
 fabian.pedregosa@inria.fr and I'll include you in the list.

COPYING

Lines changed: 1 addition & 2 deletions
@@ -1,7 +1,6 @@
-
 New BSD License

-Copyright (c) 2007 - 2011 The scikit-learn developers.
+Copyright (c) 2007 - 2012 The scikit-learn developers.
 All rights reserved.


Makefile

Lines changed: 2 additions & 1 deletion
@@ -32,7 +32,8 @@ test-code: in
 	$(NOSETESTS) -s sklearn
 test-doc:
 	$(NOSETESTS) -s --with-doctest --doctest-tests --doctest-extension=rst \
-	--doctest-fixtures=_fixture doc/ doc/modules/
+	--doctest-extension=inc --doctest-fixtures=_fixture doc/ doc/modules/ \
+	doc/developers doc/tutorial/basic doc/tutorial/statistical_inference

 test-coverage:
 	$(NOSETESTS) -s --with-coverage --cover-html --cover-html-dir=coverage \

README.rst

Lines changed: 2 additions & 2 deletions
@@ -1,7 +1,7 @@
 .. -*- mode: rst -*-

-About
-=====
+scikit-learn
+============

 scikit-learn is a Python module for machine learning built on top of
 SciPy and distributed under the 3-Clause BSD license.

benchmarks/bench_sgd_covertype.py renamed to benchmarks/bench_covertype.py

Lines changed: 53 additions & 33 deletions
@@ -1,7 +1,7 @@
 """
-================================
-Covertype dataset with dense SGD
-================================
+===========================
+Covertype dataset benchmark
+===========================

 Benchmark stochastic gradient descent (SGD), Liblinear, and Naive Bayes, CART
 (decision tree), RandomForest and Extra-Trees on the forest covertype dataset
@@ -40,10 +40,6 @@

 [1] http://archive.ics.uci.edu/ml/datasets/Covertype

-To run this example use your favorite python shell::
-
-    % ipython benchmark/bench_sgd_covertype.py
-
 """
 from __future__ import division

@@ -57,6 +53,7 @@
 from time import time
 import os
 import numpy as np
+from optparse import OptionParser

 from sklearn.svm import LinearSVC
 from sklearn.linear_model import SGDClassifier
@@ -65,6 +62,20 @@
 from sklearn.ensemble import RandomForestClassifier, ExtraTreesClassifier
 from sklearn import metrics

+op = OptionParser()
+op.add_option("--classifiers",
+              dest="classifiers", default='liblinear,GaussianNB,SGD,CART',
+              help="comma-separated list of classifiers to benchmark. "
+                   "default: %default. available: "
+                   "liblinear,GaussianNB,SGD,CART,ExtraTrees,RandomForest")
+
+op.print_help()
+
+(opts, args) = op.parse_args()
+if len(args) > 0:
+    op.error("this script takes no arguments.")
+    sys.exit(1)
+
 ######################################################################
 ## Download the data, if not already on disk
 if not os.path.exists('covtype.data.gz'):
@@ -133,9 +144,9 @@
 print("%s %d (%d, %d)" % ("number of test samples:".ljust(25),
                           X_test.shape[0], np.sum(y_test == 1),
                           np.sum(y_test == -1)))
-print("")
-print("Training classifiers...")
-print("")
+
+
+classifiers = dict()


 ######################################################################
@@ -159,41 +170,54 @@ def benchmark(clf):
     'dual': False,
     'tol': 1e-3,
 }
-liblinear_res = benchmark(LinearSVC(**liblinear_parameters))
-liblinear_err, liblinear_train_time, liblinear_test_time = liblinear_res
+classifiers['liblinear'] = LinearSVC(**liblinear_parameters)

 ######################################################################
 ## Train GaussianNB model
-gnb_err, gnb_train_time, gnb_test_time = benchmark(GaussianNB())
+classifiers['GaussianNB'] = GaussianNB()

 ######################################################################
 ## Train SGD model
 sgd_parameters = {
     'alpha': 0.001,
     'n_iter': 2,
 }
-sgd_err, sgd_train_time, sgd_test_time = benchmark(SGDClassifier(
-    **sgd_parameters))
+classifiers['SGD'] = SGDClassifier( **sgd_parameters)

 ######################################################################
 ## Train CART model
-cart_err, cart_train_time, cart_test_time = benchmark(
-    DecisionTreeClassifier(min_split=5,
-                           max_depth=None))
+classifiers['CART'] = DecisionTreeClassifier(min_samples_split=5,
+                                             max_depth=None)

 ######################################################################
 ## Train RandomForest model
-rf_err, rf_train_time, rf_test_time = benchmark(
-    RandomForestClassifier(n_estimators=20,
-                           min_split=5,
-                           max_depth=None))
+classifiers['RandomForest'] = RandomForestClassifier(n_estimators=20,
+                                                     min_samples_split=5,
+                                                     max_features=None,
+                                                     max_depth=None)

 ######################################################################
 ## Train Extra-Trees model
-et_err, et_train_time, et_test_time = benchmark(
-    ExtraTreesClassifier(n_estimators=20,
-                         min_split=5,
-                         max_depth=None))
+classifiers['ExtraTrees'] = ExtraTreesClassifier(n_estimators=20,
+                                                 min_samples_split=5,
+                                                 max_features=None,
+                                                 max_depth=None)
+
+
+selected_classifiers = opts.classifiers.split(',')
+for name in selected_classifiers:
+    if name not in classifiers:
+        op.error('classifier %r unknwon')
+        sys.exit(1)
+
+print("")
+print("Training Classifiers")
+print("====================")
+print("")
+err, train_time, test_time = {}, {}, {}
+for name in sorted(selected_classifiers):
+    print("Training %s ..." % name)
+    err[name], train_time[name], test_time[name] = benchmark(classifiers[name])

 ######################################################################
 ## Print classification performance
@@ -212,12 +236,8 @@ def print_row(clf_type, train_time, test_time, err):
 print("%s %s %s %s" % ("Classifier ", "train-time", "test-time",
                        "error-rate"))
 print("-" * 44)
-print_row("Liblinear", liblinear_train_time, liblinear_test_time,
-          liblinear_err)
-print_row("GaussianNB", gnb_train_time, gnb_test_time, gnb_err)
-print_row("SGD", sgd_train_time, sgd_test_time, sgd_err)
-print_row("CART", cart_train_time, cart_test_time, cart_err)
-print_row("RandomForest", rf_train_time, rf_test_time, rf_err)
-print_row("Extra-Trees", et_train_time, et_test_time, et_err)
+
+for name in sorted(selected_classifiers, key=lambda name: err[name]):
+    print_row(name, train_time[name], test_time[name], err[name])
 print("")
 print("")
Lines changed: 44 additions & 0 deletions
@@ -0,0 +1,44 @@
+# Author: Mathieu Blondel <mathieu@mblondel.org>
+# License: BSD Style.
+import time
+
+import pylab as pl
+
+from sklearn.utils import check_random_state
+from sklearn.metrics.pairwise import pairwise_distances
+from sklearn.metrics.pairwise import pairwise_kernels
+
+def plot(func):
+    random_state = check_random_state(0)
+    one_core = []
+    multi_core = []
+    sample_sizes = range(1000, 6000, 1000)
+
+    for n_samples in sample_sizes:
+        X = random_state.rand(n_samples, 300)
+
+        start = time.time()
+        func(X, n_jobs=1)
+        one_core.append(time.time() - start)
+
+        start = time.time()
+        func(X, n_jobs=-1)
+        multi_core.append(time.time() - start)
+
+    pl.figure()
+    pl.plot(sample_sizes, one_core, label="one core")
+    pl.plot(sample_sizes, multi_core, label="multi core")
+    pl.xlabel('n_samples')
+    pl.ylabel('time')
+    pl.title('Parallel %s' % func.__name__)
+    pl.legend()
+
+def euclidean_distances(X, n_jobs):
+    return pairwise_distances(X, metric="euclidean", n_jobs=n_jobs)
+
+def rbf_kernels(X, n_jobs):
+    return pairwise_kernels(X, metric="rbf", n_jobs=n_jobs, gamma=0.1)
+
+plot(euclidean_distances)
+plot(rbf_kernels)
+pl.show()
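
This new benchmark times the same pairwise computation on one core versus all cores through the n_jobs keyword of pairwise_distances and pairwise_kernels (n_jobs=-1 fans the work out over all available cores via joblib). A quick standalone sanity check of those two calls; the array sizes here are illustrative only:

    import numpy as np
    from sklearn.metrics.pairwise import pairwise_distances, pairwise_kernels

    X = np.random.RandomState(0).rand(300, 50)

    # n_jobs=1 computes serially; n_jobs=-1 would use every core.
    D = pairwise_distances(X, metric="euclidean", n_jobs=1)
    K = pairwise_kernels(X, metric="rbf", gamma=0.1, n_jobs=1)
    print(D.shape, K.shape)  # both (300, 300)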

benchmarks/bench_plot_ward.py

Lines changed: 1 addition & 1 deletion
@@ -10,7 +10,7 @@

 from sklearn.cluster import Ward

-ward = Ward(n_clusters=15)
+ward = Ward(n_clusters=3)

 n_samples = np.logspace(.5, 3, 9)
 n_features = np.logspace(1, 3.5, 7)
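
For context on the one-line tweak above: Ward, as of this release, performs bottom-up agglomerative clustering, and n_clusters fixes where the merge tree is cut. A toy run with illustrative data (the benchmark itself sweeps n_samples and n_features over the logspace grids shown):

    import numpy as np
    from sklearn.cluster import Ward  # hierarchical clustering, 0.11-era API

    X = np.random.RandomState(0).rand(60, 2)
    ward = Ward(n_clusters=3)
    labels = ward.fit(X).labels_
    print(np.bincount(labels))  # sizes of the three clusters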

benchmarks/bench_sgd_regression.py

Lines changed: 4 additions & 4 deletions
@@ -18,7 +18,7 @@
 from time import time

 from sklearn.linear_model import Ridge, SGDRegressor, ElasticNet
-from sklearn.metrics import mean_square_error
+from sklearn.metrics import mean_squared_error
 from sklearn.datasets.samples_generator import make_regression

 if __name__ == "__main__":
@@ -68,7 +68,7 @@
             clf = ElasticNet(alpha=alpha, rho=0.5, fit_intercept=False)
             tstart = time()
             clf.fit(X_train, y_train)
-            elnet_results[i, j, 0] = mean_square_error(clf.predict(X_test),
+            elnet_results[i, j, 0] = mean_squared_error(clf.predict(X_test),
                                                        y_test)
             elnet_results[i, j, 1] = time() - tstart

@@ -81,7 +81,7 @@

             tstart = time()
             clf.fit(X_train, y_train)
-            sgd_results[i, j, 0] = mean_square_error(clf.predict(X_test),
+            sgd_results[i, j, 0] = mean_squared_error(clf.predict(X_test),
                                                      y_test)
             sgd_results[i, j, 1] = time() - tstart

@@ -90,7 +90,7 @@
             clf = Ridge(alpha=alpha, fit_intercept=False)
             tstart = time()
             clf.fit(X_train, y_train)
-            ridge_results[i, j, 0] = mean_square_error(clf.predict(X_test),
+            ridge_results[i, j, 0] = mean_squared_error(clf.predict(X_test),
                                                        y_test)
             ridge_results[i, j, 1] = time() - tstart

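
The four identical edits above track a metrics rename: mean_square_error became mean_squared_error. A minimal check of the renamed function, with made-up values:

    from sklearn.metrics import mean_squared_error

    y_true = [3.0, -0.5, 2.0]
    y_pred = [2.5, 0.0, 2.0]
    # mean of squared residuals: ((-0.5)**2 + (0.5)**2 + 0.0) / 3
    print(mean_squared_error(y_true, y_pred))  # ~0.1667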

doc/about.rst

Lines changed: 1 addition & 1 deletion
@@ -24,7 +24,7 @@ citations to the following paper:
  author={Pedregosa, F. and Varoquaux, G. and Gramfort, A. and Michel, V.
          and Thirion, B. and Grisel, O. and Blondel, M. and Prettenhofer, P.
          and Weiss, R. and Dubourg, V. and Vanderplas, J. and Passos, A. and
-         Cournapeau, D. and Brucher, M. and Perrot, M. and Duchesnay E.},
+         Cournapeau, D. and Brucher, M. and Perrot, M. and Duchesnay, E.},
 journal={Journal of Machine Learning Research},
 volume={12},
 pages={2825--2830},

doc/conf.py

Lines changed: 3 additions & 2 deletions
@@ -73,7 +73,7 @@
 # built documents.
 #
 # The short X.Y version.
-version = '0.10'
+version = '0.11'
 # The full version, including alpha/beta/rc tags.
 import sklearn
 release = sklearn.__version__
@@ -126,11 +126,12 @@
 # Theme options are theme-specific and customize the look and feel of a theme
 # further. For a list of options available for each theme, see the
 # documentation.
-#html_theme_options = {}
+html_theme_options = {'oldversion':False, 'collapsiblesidebar': True}

 # Add any paths that contain custom themes here, relative to this directory.
 html_theme_path = ['themes']

+
 # The name for this set of Sphinx documents. If None, it defaults to
 # "<project> v<release> documentation".
 #html_title = None

doc/datasets/index.rst

Lines changed: 7 additions & 6 deletions
@@ -108,8 +108,8 @@ Sample generators
 In addition, scikit-learn includes various random sample generators that
 can be used to build artifical datasets of controled size and complexity.

-.. image:: ../auto_examples/images/plot_random_dataset_1.png
-   :target: ../auto_examples/plot_random_dataset.html
+.. image:: ../auto_examples/datasets/images/plot_random_dataset_1.png
+   :target: ../auto_examples/datasets/plot_random_dataset.html
    :scale: 50
    :align: center

@@ -125,6 +125,7 @@ can be used to build artifical datasets of controled size and complexity.
    make_friedman1
    make_friedman2
    make_friedman3
+   make_hastie_10_2
    make_low_rank_matrix
    make_sparse_coded_signal
    make_sparse_uncorrelated
@@ -171,11 +172,11 @@ features::
 _`Faster API-compatible implementation`: https://github.com/mblondel/svmlight-loader


-.. include:: olivetti_faces.rst
+.. include:: olivetti_faces.inc

-.. include:: twenty_newsgroups.rst
+.. include:: twenty_newsgroups.inc

-.. include:: mldata.rst
+.. include:: mldata.inc

-.. include:: labeled_faces.rst
+.. include:: labeled_faces.inc

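
Among the edits above, make_hastie_10_2 joins the sample-generator listing. It draws the 10-feature Gaussian binary problem from Hastie et al., Elements of Statistical Learning, labeling points ±1 by a threshold on the squared feature norm; a small draw for illustration:

    from sklearn.datasets import make_hastie_10_2

    X, y = make_hastie_10_2(n_samples=100, random_state=0)
    print(X.shape)         # (100, 10)
    print(sorted(set(y)))  # [-1.0, 1.0]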
