Merge branch 'master' into ctranslist · scikit-learn/scikit-learn@1fc61a5

Commit 1fc61a5
Merge branch 'master' into ctranslist
2 parents 5cca32e + 0c0a9e8

17 files changed: +168 -89 lines changed

.circleci/config.yml

Lines changed: 30 additions & 0 deletions
@@ -65,6 +65,21 @@ jobs:
           path: ~/log.txt
           destination: log.txt
 
+  pypy3:
+    docker:
+      - image: pypy:3-6.0.0
+    steps:
+      - restore_cache:
+          keys:
+            - pypy3-ccache-{{ .Branch }}
+            - pypy3-ccache
+      - checkout
+      - run: ./build_tools/circle/build_test_pypy.sh
+      - save_cache:
+          key: pypy3-ccache-{{ .Branch }}-{{ .BuildNum }}
+          paths:
+            - ~/.ccache
+            - ~/.cache/pip
 
   deploy:
     docker:
@@ -89,6 +104,21 @@ workflows:
     jobs:
       - python3
      - python2
+      - pypy3:
+          filters:
+            branches:
+              only:
+                - 0.20.X
       - deploy:
           requires:
             - python3
+  pypy:
+    triggers:
+      - schedule:
+          cron: "0 0 * * *"
+          filters:
+            branches:
+              only:
+                - master
+    jobs:
+      - pypy3

build_tools/circle/build_test_pypy.sh

Lines changed: 6 additions & 3 deletions
@@ -18,13 +18,16 @@ source pypy-env/bin/activate
 python --version
 which python
 
-pip install --extra-index https://antocuni.github.io/pypy-wheels/ubuntu numpy==1.14.4 Cython pytest
+pip install --extra-index https://antocuni.github.io/pypy-wheels/ubuntu numpy Cython pytest
 pip install "scipy>=1.1.0" sphinx numpydoc docutils
 
 ccache -M 512M
 export CCACHE_COMPRESS=1
 export PATH=/usr/lib/ccache:$PATH
+export LOKY_MAX_CPU_COUNT="2"
 
-pip install -e .
+pip install -vv -e .
 
-make test
+python -m pytest sklearn/
+python -m pytest doc/sphinxext/
+python -m pytest $(find doc -name '*.rst' | sort)
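For context, LOKY_MAX_CPU_COUNT caps the CPU count that loky (joblib's process backend) reports, so parallel scikit-learn code spawns at most that many workers on the CI box. A minimal sketch, assuming joblib is installed and the variable is set before loky first inspects the machine:

    import os
    os.environ["LOKY_MAX_CPU_COUNT"] = "2"  # must be set before loky reads the CPU count

    from joblib import Parallel, delayed

    # With the cap in place, n_jobs=-1 should resolve to at most 2 worker processes.
    print(Parallel(n_jobs=-1)(delayed(pow)(i, 2) for i in range(8)))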

conftest.py

Lines changed: 3 additions & 1 deletion
@@ -32,7 +32,9 @@ def pytest_collection_modifyitems(config, items):
     skip_marker = pytest.mark.skip(
         reason='FeatureHasher is not compatible with PyPy')
     for item in items:
-        if item.name == 'sklearn.feature_extraction.hashing.FeatureHasher':
+        if item.name in (
+                'sklearn.feature_extraction.hashing.FeatureHasher',
+                'sklearn.feature_extraction.text.HashingVectorizer'):
             item.add_marker(skip_marker)
 
     # Skip tests which require internet if the flag is provided
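The hunk above extends an existing pytest collection hook. As a minimal, self-contained sketch of the pattern (the PyPy guard is an assumption here, not shown in the hunk):

    # conftest.py -- sketch; skips the named doctest items only under PyPy
    import platform

    import pytest

    def pytest_collection_modifyitems(config, items):
        if platform.python_implementation() != 'PyPy':
            return
        skip_marker = pytest.mark.skip(
            reason='FeatureHasher is not compatible with PyPy')
        for item in items:
            if item.name in (
                    'sklearn.feature_extraction.hashing.FeatureHasher',
                    'sklearn.feature_extraction.text.HashingVectorizer'):
                item.add_marker(skip_marker)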

doc/developers/advanced_installation.rst

Lines changed: 10 additions & 26 deletions
@@ -34,7 +34,7 @@ Building from source
 
 Scikit-learn requires:
 
-- Python (>= 2.7 or >= 3.4),
+- Python (>= 3.5),
 - NumPy (>= 1.8.2),
 - SciPy (>= 0.13.3).
 
@@ -110,18 +110,11 @@ Linux
 
 Installing from source requires you to have installed the scikit-learn runtime
 dependencies, Python development headers and a working C/C++ compiler.
-Under Debian-based operating systems, which include Ubuntu, if you have
-Python 2 you can install all these requirements by issuing::
-
-    sudo apt-get install build-essential python-dev python-setuptools \
-                     python-numpy python-scipy \
-                     libatlas-dev libatlas3-base
-
-If you have Python 3::
-
+Under Debian-based operating systems, which include Ubuntu::
+
     sudo apt-get install build-essential python3-dev python3-setuptools \
-                     python3-numpy python3-scipy \
-                     libatlas-dev libatlas3-base
+                python3-numpy python3-scipy \
+                libatlas-dev libatlas3-base
 
 On recent Debian and Ubuntu (e.g. Ubuntu 14.04 or later) make sure that ATLAS
 is used to provide the implementation of the BLAS and LAPACK linear algebra
@@ -190,9 +183,7 @@ PATH environment variable.
 32-bit Python
 -------------
 
-For 32-bit python it is possible use the standalone installers for
-`microsoft visual c++ express 2008 <http://download.microsoft.com/download/A/5/4/A54BADB6-9C3F-478D-8657-93B3FC9FE62D/vcsetup.exe>`_
-for Python 2 or Microsoft Visual C++ Express 2010 for Python 3.
+For 32-bit Python use Microsoft Visual C++ Express 2010.
 
 Once installed you should be able to build scikit-learn without any
 particular configuration by running the following command in the scikit-learn
@@ -211,34 +202,27 @@ The Windows SDKs include the MSVC compilers both for 32 and 64-bit
 architectures. They come as a ``GRMSDKX_EN_DVD.iso`` file that can be mounted
 as a new drive with a ``setup.exe`` installer in it.
 
-- For Python 2 you need SDK **v7.0**: `MS Windows SDK for Windows 7 and .NET
-  Framework 3.5 SP1
-  <https://www.microsoft.com/en-us/download/details.aspx?id=18950>`_
-
-- For Python 3 you need SDK **v7.1**: `MS Windows SDK for Windows 7 and .NET
+- For Python you need SDK **v7.1**: `MS Windows SDK for Windows 7 and .NET
   Framework 4
   <https://www.microsoft.com/en-us/download/details.aspx?id=8442>`_
 
 Both SDKs can be installed in parallel on the same host. To use the Windows
 SDKs, you need to setup the environment of a ``cmd`` console launched with the
-following flags (at least for SDK v7.0)::
+following flags ::
 
     cmd /E:ON /V:ON /K
 
 Then configure the build environment with::
 
     SET DISTUTILS_USE_SDK=1
     SET MSSdk=1
-    "C:\Program Files\Microsoft SDKs\Windows\v7.0\Setup\WindowsSdkVer.exe" -q -version:v7.0
-    "C:\Program Files\Microsoft SDKs\Windows\v7.0\Bin\SetEnv.cmd" /x64 /release
+    "C:\Program Files\Microsoft SDKs\Windows\v7.1\Setup\WindowsSdkVer.exe" -q -version:v7.1
+    "C:\Program Files\Microsoft SDKs\Windows\v7.1\Bin\SetEnv.cmd" /x64 /release
 
 Finally you can build scikit-learn in the same ``cmd`` console::
 
     python setup.py install
 
-Replace ``v7.0`` by the ``v7.1`` in the above commands to do the same for
-Python 3 instead of Python 2.
-
 Replace ``/x64`` by ``/x86`` to build for 32-bit Python instead of 64-bit
 Python.
 
doc/developers/utilities.rst

Lines changed: 1 addition & 2 deletions
@@ -175,8 +175,7 @@ Graph Routines
 Benchmarking
 ------------
 
-- :func:`bench.total_seconds` (back-ported from ``timedelta.total_seconds``
-  in Python 2.7). Used in ``benchmarks/bench_glm.py``.
+- :func:`bench.total_seconds`: Used in ``benchmarks/bench_glm.py``.
 
 
 Testing Functions
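The back-port note is obsolete because ``total_seconds`` has been a native ``timedelta`` method since Python 2.7/3.2:

    from datetime import timedelta

    delta = timedelta(minutes=1, seconds=30)
    print(delta.total_seconds())  # 90.0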

doc/modules/feature_extraction.rst

Lines changed: 2 additions & 2 deletions
@@ -735,9 +735,9 @@ decide better::
     array([[1, 1, 1, 0, 1, 1, 1, 0],
            [1, 1, 0, 1, 1, 1, 0, 1]])
 
-In the above example, ``'char_wb`` analyzer is used, which creates n-grams
+In the above example, ``char_wb`` analyzer is used, which creates n-grams
 only from characters inside word boundaries (padded with space on each
-side). The ``'char'`` analyzer, alternatively, creates n-grams that
+side). The ``char`` analyzer, alternatively, creates n-grams that
 span across words::
 
     >>> ngram_vectorizer = CountVectorizer(analyzer='char_wb', ngram_range=(5, 5))
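A quick way to see the difference the corrected text describes (a small sketch; the exact vocabularies depend on the input text):

    from sklearn.feature_extraction.text import CountVectorizer

    text = ['jumpy fox']
    wb = CountVectorizer(analyzer='char_wb', ngram_range=(5, 5)).fit(text)
    ch = CountVectorizer(analyzer='char', ngram_range=(5, 5)).fit(text)
    print(sorted(wb.vocabulary_))  # 5-grams stay inside space-padded words
    print(sorted(ch.vocabulary_))  # 5-grams such as 'py fo' cross the space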

doc/other_distributions.rst

Lines changed: 1 addition & 8 deletions
@@ -36,20 +36,13 @@ Arch Linux
 
 Arch Linux's package is provided through the `official repositories
 <https://www.archlinux.org/packages/?q=scikit-learn>`_ as
-``python-scikit-learn`` for Python 3 and ``python2-scikit-learn`` for Python 2.
+``python-scikit-learn`` for Python.
 It can be installed by typing the following command:
 
 .. code-block:: none
 
     # pacman -S python-scikit-learn
 
-or:
-
-.. code-block:: none
-
-    # pacman -S python2-scikit-learn
-
-depending on the version of Python you use.
 
 
 NetBSD

examples/compose/plot_column_transformer.py

Lines changed: 1 addition & 1 deletion
@@ -89,7 +89,7 @@ def transform(self, posts):
     # Extract the subject & body
     ('subjectbody', SubjectBodyExtractor()),
 
-    # Use C toolumnTransformer to combine the features from subject and body
+    # Use ColumnTransformer to combine the features from subject and body
     ('union', ColumnTransformer(
         [
             # Pulling features from the post's subject line (first column)
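For readers new to ColumnTransformer, a stripped-down sketch of the same idea, with hypothetical two-column text data (the names and data here are illustrative, not from the example):

    import numpy as np
    from sklearn.compose import ColumnTransformer
    from sklearn.feature_extraction.text import CountVectorizer

    # Column 0 holds subjects, column 1 holds bodies; each gets its own
    # vectorizer and the resulting feature blocks are concatenated.
    union = ColumnTransformer([
        ('subject', CountVectorizer(), 0),
        ('body', CountVectorizer(), 1),
    ])
    X = np.array([['free money', 'click the link below'],
                  ['meeting notes', 'agenda attached']])
    Xt = union.fit_transform(X)
    print(Xt.shape)  # (2, n_subject_features + n_body_features)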

sklearn/cluster/optics_.py

Lines changed: 38 additions & 5 deletions
@@ -14,6 +14,7 @@
 import numpy as np
 
 from ..utils import check_array
+from ..utils import gen_batches, get_chunk_n_rows
 from ..utils.validation import check_is_fitted
 from ..neighbors import NearestNeighbors
 from ..base import BaseEstimator, ClusterMixin
@@ -395,8 +396,6 @@ def fit(self, X, y=None):
         # Start all points as 'unprocessed' ##
         self.reachability_ = np.empty(n_samples)
         self.reachability_.fill(np.inf)
-        self.core_distances_ = np.empty(n_samples)
-        self.core_distances_.fill(np.nan)
         # Start all points as noise ##
         self.labels_ = np.full(n_samples, -1, dtype=int)
 
@@ -407,9 +406,7 @@ def fit(self, X, y=None):
                                 n_jobs=self.n_jobs)
 
         nbrs.fit(X)
-        self.core_distances_[:] = nbrs.kneighbors(X,
-                                                  self.min_samples)[0][:, -1]
-
+        self.core_distances_ = self._compute_core_distances_(X, nbrs)
         self.ordering_ = self._calculate_optics_order(X, nbrs)
 
         indices_, self.labels_ = _extract_optics(self.ordering_,
@@ -425,6 +422,42 @@ def fit(self, X, y=None):
 
     # OPTICS helper functions
 
+    def _compute_core_distances_(self, X, neighbors, working_memory=None):
+        """Compute the k-th nearest neighbor of each sample
+
+        Equivalent to neighbors.kneighbors(X, self.min_samples)[0][:, -1]
+        but with more memory efficiency.
+
+        Parameters
+        ----------
+        X : array, shape (n_samples, n_features)
+            The data.
+        neighbors : NearestNeighbors instance
+            The fitted nearest neighbors estimator.
+        working_memory : int, optional
+            The sought maximum memory for temporary distance matrix chunks.
+            When None (default), the value of
+            ``sklearn.get_config()['working_memory']`` is used.
+
+        Returns
+        -------
+        core_distances : array, shape (n_samples,)
+            Distance at which each sample becomes a core point.
+            Points which will never be core have a distance of inf.
+        """
+        n_samples = len(X)
+        core_distances = np.empty(n_samples)
+        core_distances.fill(np.nan)
+
+        chunk_n_rows = get_chunk_n_rows(row_bytes=16 * self.min_samples,
+                                        max_n_rows=n_samples,
+                                        working_memory=working_memory)
+        slices = gen_batches(n_samples, chunk_n_rows)
+        for sl in slices:
+            core_distances[sl] = neighbors.kneighbors(
+                X[sl], self.min_samples)[0][:, -1]
+        return core_distances
+
     def _calculate_optics_order(self, X, nbrs):
         # Main OPTICS loop. Not parallelizable. The order that entries are
         # written to the 'ordering_' list is important!
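The new helper trades one large kneighbors call for several chunked ones, so the temporary distance matrix stays within the working-memory budget. A self-contained sketch of that pattern (a fixed chunk size stands in for get_chunk_n_rows, whose public location varies across scikit-learn versions):

    import numpy as np
    from sklearn.neighbors import NearestNeighbors
    from sklearn.utils import gen_batches

    rng = np.random.RandomState(0)
    X = rng.rand(1000, 3)
    min_samples = 5

    nbrs = NearestNeighbors(n_neighbors=min_samples).fit(X)

    chunk_n_rows = 256  # stand-in for the working-memory computation above
    core_distances = np.full(len(X), np.nan)
    for sl in gen_batches(len(X), chunk_n_rows):  # sl is a slice object
        core_distances[sl] = nbrs.kneighbors(X[sl], min_samples)[0][:, -1]

    # Same values as the unchunked call, with bounded peak memory.
    assert np.allclose(core_distances,
                       nbrs.kneighbors(X, min_samples)[0][:, -1])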
