Merge branch 'bayesDocstring' of https://github.com/qdeffense/scikit-… · qdeffense/scikit-learn@cc5f94f

Commit cc5f94f

Merge branch 'bayesDocstring' of https://github.com/qdeffense/scikit-learn into bayesDocstring
2 parents b94c3a6 + 1dd3a6f commit cc5f94f

113 files changed: +1974 -759 lines changed

README.rst

Lines changed: 4 additions & 4 deletions
@@ -55,10 +55,10 @@ scikit-learn requires:
 **Scikit-learn 0.20 was the last version to support Python 2.7 and Python 3.4.**
 scikit-learn 0.21 and later require Python 3.5 or newer.
 
-Scikit-learn plotting capabilities (i.e., functions start with "plot_") require
-Matplotlib (>= 1.5.1). For running the examples Matplotlib >= 1.5.1 is
-required. A few examples require scikit-image >= 0.12.3, a few examples require
-pandas >= 0.18.0.
+Scikit-learn plotting capabilities (i.e., functions start with "plot_"
+and classes end with "Display") require Matplotlib (>= 1.5.1). For running the
+examples Matplotlib >= 1.5.1 is required. A few examples require
+scikit-image >= 0.12.3, a few examples require pandas >= 0.18.0.
 
 User installation
 ~~~~~~~~~~~~~~~~~

benchmarks/bench_covertype.py

Lines changed: 1 addition & 1 deletion
@@ -101,7 +101,7 @@ def load_data(dtype=np.float32, order='C', random_state=13):
     'ExtraTrees': ExtraTreesClassifier(n_estimators=20),
     'RandomForest': RandomForestClassifier(n_estimators=20),
     'CART': DecisionTreeClassifier(min_samples_split=5),
-    'SGD': SGDClassifier(alpha=0.001, max_iter=1000, tol=1e-3),
+    'SGD': SGDClassifier(alpha=0.001),
     'GaussianNB': GaussianNB(),
     'liblinear': LinearSVC(loss="l2", penalty="l2", C=1000, dual=False,
                            tol=1e-3),

benchmarks/bench_sparsify.py

Lines changed: 1 addition & 1 deletion
@@ -77,7 +77,7 @@ def sparsity_ratio(X):
 print("test data sparsity: %f" % sparsity_ratio(X_test))
 
 ###############################################################################
-clf = SGDRegressor(penalty='l1', alpha=.2, fit_intercept=True, max_iter=2000,
+clf = SGDRegressor(penalty='l1', alpha=.2, max_iter=2000,
                    tol=None)
 clf.fit(X_train, y_train)
 print("model sparsity: %f" % sparsity_ratio(clf.coef_))

build_tools/circle/build_doc.sh

Lines changed: 7 additions & 7 deletions
@@ -117,11 +117,11 @@ export PATH="/usr/lib/ccache:$MINICONDA_PATH/bin:$PATH"
 ccache -M 512M
 export CCACHE_COMPRESS=1
 
-# Configure the conda environment and put it in the path using the
-# provided versions
-
-# Adds older packages for python 3.5
-if [[ "$PYTHON_VERSION" == "3.5" ]]; then
+# Old packages coming from the 'free' conda channel have been removed but we
+# are using them for our min-dependencies doc generation. See
+# https://www.anaconda.com/why-we-removed-the-free-channel-in-conda-4-7/ for
+# more details.
+if [[ "$CIRCLE_JOB" == "doc-min-dependencies" ]]; then
     conda config --set restore_free_channel true
 fi
 
@@ -130,10 +130,10 @@ conda create -n $CONDA_ENV_NAME --yes --quiet python="${PYTHON_VERSION:-*}" \
   cython="${CYTHON_VERSION:-*}" pytest coverage \
   matplotlib="${MATPLOTLIB_VERSION:-*}" sphinx=2.1.2 pillow \
   scikit-image="${SCIKIT_IMAGE_VERSION:-*}" pandas="${PANDAS_VERSION:-*}" \
-  joblib
+  joblib memory_profiler
 
 source activate testenv
-pip install "sphinx-gallery>=0.2,<0.3"
+pip install sphinx-gallery==0.3.1
 pip install numpydoc==0.9
 
 # Build and install scikit-learn in dev mode

doc/conf.py

Lines changed: 1 addition & 0 deletions
@@ -250,6 +250,7 @@
 sphinx_gallery_conf = {
     'doc_module': 'sklearn',
     'backreferences_dir': os.path.join('modules', 'generated'),
+    'show_memory': True,
     'reference_url': {
         'sklearn': None}
 }

doc/developers/contributing.rst

Lines changed: 82 additions & 4 deletions
@@ -603,6 +603,33 @@ Finally, follow the formatting rules below to make it consistently good:
         SelectKBest : Select features based on the k highest scores.
         SelectFpr : Select features based on a false positive rate test.
 
+* When documenting the parameters and attributes, here is a list of some
+  well-formatted examples::
+
+    n_clusters : int, default=3
+        The number of clusters detected by the algorithm.
+
+    some_param : {'hello', 'goodbye'}, bool or int, default=True
+        The parameter description goes here, which can be either a string
+        literal (either `hello` or `goodbye`), a bool, or an int. The default
+        value is True.
+
+    array_parameter : {array-like, sparse matrix, dataframe} of shape (n_samples, n_features) or (n_samples,)
+        This parameter accepts data in either of the mentioned forms, with one
+        of the mentioned shapes. The default value is
+        `np.ones(shape=(n_samples,))`.
+
+In general have the following in mind:
+
+    1. Use Python basic types. (``bool`` instead of ``boolean``)
+    2. Use parenthesis for defining shapes: ``array-like of shape (n_samples,)``
+       or ``array-like of shape (n_samples, n_features)``
+    3. For strings with multiple options, use brackets:
+       ``input: {'log', 'squared', 'multinomial'}``
+    4. 1D or 2D data can be a subset of
+       ``{array-like, ndarray, sparse matrix, dataframe}``. Note that ``array-like``
+       can also be a ``list``, while ``ndarray`` is explicitly only a ``numpy.ndarray``.
+
 * For unwritten formatting rules, try to follow existing good works:
 
     * For "References" in docstrings, see the Silhouette Coefficient
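
Reviewer note: the parameter conventions added in this hunk can be illustrated with a small, dependency-free function. This is only a sketch. `scale_rows` and its parameters are hypothetical names chosen for illustration and are not part of scikit-learn; only the docstring layout follows the rules listed in the hunk.

```python
def scale_rows(X, factor=1.0, mode='multiply'):
    """Scale every value of X by a constant (hypothetical helper).

    Parameters
    ----------
    X : array-like of shape (n_samples, n_features)
        Input rows. Any nested sequence of numbers is accepted.

    factor : float, default=1.0
        The constant applied to every value.

    mode : {'multiply', 'divide'}, default='multiply'
        Whether values are multiplied or divided by ``factor``.

    Returns
    -------
    X_scaled : list of shape (n_samples, n_features)
        A new nested list; the input is left untouched.
    """
    if mode not in ('multiply', 'divide'):
        raise ValueError("mode must be 'multiply' or 'divide', got %r" % mode)
    if mode == 'multiply':
        return [[value * factor for value in row] for row in X]
    return [[value / factor for value in row] for row in X]
```

The docstring uses a basic Python type (`float`), parentheses for the shape, and braces for the string options, matching items 1 to 3 of the list in the diff.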
@@ -912,8 +939,8 @@ In the following example, k is deprecated and renamed to n_clusters::
 
     import warnings
 
-    def example_function(n_clusters=8, k='not_used'):
-        if k != 'not_used':
+    def example_function(n_clusters=8, k='deprecated'):
+        if k != 'deprecated':
             warnings.warn("'k' was renamed to n_clusters in version 0.13 and "
                           "will be removed in 0.15.", DeprecationWarning)
             n_clusters = k
@@ -923,12 +950,12 @@ When the change is in a class, we validate and raise warning in ``fit``::
     import warnings
 
     class ExampleEstimator(BaseEstimator):
-        def __init__(self, n_clusters=8, k='not_used'):
+        def __init__(self, n_clusters=8, k='deprecated'):
             self.n_clusters = n_clusters
             self.k = k
 
         def fit(self, X, y):
-            if self.k != 'not_used':
+            if self.k != 'deprecated':
                 warnings.warn("'k' was renamed to n_clusters in version 0.13 and "
                               "will be removed in 0.15.", DeprecationWarning)
                 self._n_clusters = self.k
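
Reviewer note: a runnable variant of the class example in this hunk may help when checking the `'deprecated'` sentinel. It is a minimal sketch under two assumptions: a plain class stands in for `BaseEstimator` so that it runs without scikit-learn, and the `else` branch (which the hunk does not show) simply falls back to `n_clusters`.

```python
import warnings


class ExampleEstimator:
    """Toy estimator; a plain class stands in for BaseEstimator here."""

    def __init__(self, n_clusters=8, k='deprecated'):
        # __init__ only stores the parameters; validation is deferred to fit.
        self.n_clusters = n_clusters
        self.k = k

    def fit(self, X, y=None):
        if self.k != 'deprecated':
            # The caller passed the old name explicitly: warn and honour it.
            warnings.warn("'k' was renamed to n_clusters in version 0.13 and "
                          "will be removed in 0.15.", DeprecationWarning)
            self._n_clusters = self.k
        else:
            # Assumed fallback, not shown in the diff hunk.
            self._n_clusters = self.n_clusters
        return self


# Record warnings so both code paths can be inspected.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    old_style = ExampleEstimator(k=3).fit([[0.0], [1.0]])
    new_style = ExampleEstimator(n_clusters=5).fit([[0.0], [1.0]])

deprecation_messages = [str(w.message) for w in caught
                        if issubclass(w.category, DeprecationWarning)]
```

Only the call that passes `k` emits a `DeprecationWarning`; the call that uses `n_clusters` stays silent.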
@@ -1647,3 +1674,54 @@ make this task easier and faster (in no particular order).
   <https://git-scm.com/docs/git-grep#_examples>`_) is also extremely
   useful to see every occurrence of a pattern (e.g. a function call or a
   variable) in the code base.
+
+
+.. _plotting_api:
+
+Plotting API
+============
+
+Scikit-learn defines a simple API for creating visualizations for machine
+learning. The key features of this API is to run calculations once and to have
+the flexibility to adjust the visualizations after the fact. This logic is
+encapsulated into a display object where the computed data is stored and
+the plotting is done in a `plot` method. The display object's `__init__`
+method contains only the data needed to create the visualization. The `plot`
+method takes in parameters that only have to do with visualization, such as a
+matplotlib axes. The `plot` method will store the matplotlib artists as
+attributes allowing for style adjustments through the display object. A
+`plot_*` helper function accepts parameters to do the computation and the
+parameters used for plotting. After the helper function creates the display
+object with the computed values, it calls the display's plot method. Note
+that the `plot` method defines attributes related to matplotlib, such as the
+line artist. This allows for customizations after calling the `plot` method.
+
+For example, the `RocCurveDisplay` defines the following methods and
+attributes:
+
+.. code-block:: python
+
+    class RocCurveDisplay:
+        def __init__(self, fpr, tpr, roc_auc, estimator_name):
+            ...
+            self.fpr = fpr
+            self.tpr = tpr
+            self.roc_auc = roc_auc
+            self.estimator_name = estimator_name
+
+        def plot(self, ax=None, name=None, **kwargs):
+            ...
+            self.line_ = ...
+            self.ax_ = ax
+            self.figure_ = ax.figure_
+
+    def plot_roc_curve(estimator, X, y, pos_label=None, sample_weight=None,
+                       drop_intermediate=True, response_method="auto",
+                       name=None, ax=None, **kwargs):
+        # do computation
+        viz = RocCurveDisplay(fpr, tpr, roc_auc,
+                              estimator.__class__.__name__)
+        return viz.plot(ax=ax, name=name, **kwargs)
+```
+
+Read more in the :ref:`User Guide <visualizations>`.
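
Reviewer note: the "compute once, plot many times" pattern described in the new Plotting API section can be sketched without matplotlib. Everything below is hypothetical and for illustration only: `FakeAxes` is a stand-in that merely records what was drawn, and `CurveDisplay` and `plot_curve` are made-up names mirroring the roles of `RocCurveDisplay` and `plot_roc_curve`. The sketch reads the parent figure from an attribute named `figure`, which is how matplotlib axes expose it.

```python
class FakeAxes:
    """Hypothetical stand-in for a matplotlib Axes; it only records calls."""

    def __init__(self):
        self.figure = "fake-figure"
        self.lines = []

    def plot(self, x, y, label=None, **kwargs):
        line = {"x": list(x), "y": list(y), "label": label, **kwargs}
        self.lines.append(line)
        return [line]  # like matplotlib, return a list of line artists


class CurveDisplay:
    """Stores precomputed curve data; ``plot`` only handles rendering."""

    def __init__(self, x, y, score, estimator_name):
        self.x = x
        self.y = y
        self.score = score
        self.estimator_name = estimator_name

    def plot(self, ax, name=None, **kwargs):
        name = self.estimator_name if name is None else name
        label = "{} (score = {:0.2f})".format(name, self.score)
        # Keep the artists as attributes so they can be restyled later.
        self.line_ = ax.plot(self.x, self.y, label=label, **kwargs)[0]
        self.ax_ = ax
        self.figure_ = ax.figure
        return self


def plot_curve(estimator_name, x, y, ax, name=None, **kwargs):
    # Computation step, done once: trapezoidal area under the curve.
    score = sum((x[i + 1] - x[i]) * (y[i] + y[i + 1]) / 2
                for i in range(len(x) - 1))
    viz = CurveDisplay(x, y, score, estimator_name)
    return viz.plot(ax=ax, name=name, **kwargs)


disp = plot_curve("ToyModel", [0.0, 0.5, 1.0], [0.0, 0.8, 1.0], FakeAxes())

# Re-plot on a second axes under a new name, without recomputing the score.
second_ax = FakeAxes()
disp.plot(second_ax, name="Renamed")
```

The helper performs the calculation and returns the display object, so later calls to `disp.plot` reuse the stored values and only change presentation.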

doc/developers/performance.rst

Lines changed: 1 addition & 5 deletions
@@ -403,11 +403,7 @@ kcachegrind
 Multi-core parallelism using ``joblib.Parallel``
 ================================================
 
-TODO: give a simple teaser example here.
-
-Checkout the official joblib documentation:
-
-- https://joblib.readthedocs.io
+See `joblib documentation <https://joblib.readthedocs.io>`_
 
 
 .. _warm-restarts:

doc/install.rst

Lines changed: 4 additions & 4 deletions
@@ -22,10 +22,10 @@ Scikit-learn requires:
 - SciPy (>= 0.17.0)
 - joblib (>= 0.11)
 
-Scikit-learn plotting capabilities (i.e., functions start with "plot_") require
-Matplotlib (>= 1.5.1). For running the examples Matplotlib >= 1.5.1 is
-required. A few examples require scikit-image >= 0.12.3, a few examples require
-pandas >= 0.18.0.
+Scikit-learn plotting capabilities (i.e., functions start with "plot_"
+and classes end with "Display") require Matplotlib (>= 1.5.1). For running the
+examples Matplotlib >= 1.5.1 is required. A few examples require
+scikit-image >= 0.12.3, a few examples require pandas >= 0.18.0.
 
 .. warning::
 

doc/modules/classes.rst

Lines changed: 41 additions & 3 deletions
@@ -99,11 +99,11 @@ Classes
    cluster.AgglomerativeClustering
    cluster.Birch
    cluster.DBSCAN
-   cluster.OPTICS
    cluster.FeatureAgglomeration
    cluster.KMeans
    cluster.MiniBatchKMeans
    cluster.MeanShift
+   cluster.OPTICS
    cluster.SpectralClustering
 
 Functions
@@ -658,8 +658,17 @@ Kernels:
 
    inspection.partial_dependence
    inspection.permutation_importance
-   inspection.plot_partial_dependence
 
+Plotting
+--------
+
+.. currentmodule:: sklearn
+
+.. autosummary::
+   :toctree: generated/
+   :template: function.rst
+
+   inspection.plot_partial_dependence
 
 .. _isotonic_ref:
 
@@ -1007,6 +1016,26 @@ See the :ref:`metrics` section of the user guide for further details.
    metrics.pairwise_distances_chunked
 
 
+Plotting
+--------
+
+See the :ref:`visualizations` section of the user guide for further details.
+
+.. currentmodule:: sklearn
+
+.. autosummary::
+   :toctree: generated/
+   :template: function.rst
+
+   metrics.plot_roc_curve
+
+.. autosummary::
+   :toctree: generated/
+   :template: class.rst
+
+   metrics.RocCurveDisplay
+
+
 .. _mixture_ref:
 
 :mod:`sklearn.mixture`: Gaussian Mixture Models
@@ -1435,9 +1464,18 @@ Low-level methods
    :template: function.rst
 
    tree.export_graphviz
-   tree.plot_tree
    tree.export_text
 
+Plotting
+--------
+
+.. currentmodule:: sklearn
+
+.. autosummary::
+   :toctree: generated/
+   :template: function.rst
+
+   tree.plot_tree
 
 .. _utils_ref:
 
