[MRG+1] Issue #7779 Fixed with new function (datasets.load_wine) added. by tylerlanigan · Pull Request #7912 · scikit-learn/scikit-learn · GitHub

Merged
26 commits merged into scikit-learn:master on Feb 13, 2017

Conversation

tylerlanigan (Contributor)

Reference Issue

Fixes #7779

What does this implement/fix? Explain your changes.

Provides a new example called plot_scaling_importance.py

In order to implement the example, the wine dataset from UCI is used.

A load_wine function is added to sklearn.datasets that loads the wine dataset. The dataset is stored in the datasets/data folder. base.py and __init__.py are updated to reflect this.

The following has been tested and works:
from sklearn.datasets import load_wine

A test is added in datasets/tests/test_base.py to test the new load_wine() function. The new function has been tested in the style of load_iris().
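
For reference, here is a minimal sketch of how the new loader is meant to be used (the shape and class names below are the ones discussed later in this thread; treat the snippet as illustrative rather than as the PR's test code):

from sklearn.datasets import load_wine

# Load the bundled wine dataset: 178 samples, 13 features, 3 classes.
data = load_wine()
print(data.data.shape)    # expected: (178, 13)
print(data.target_names)  # expected: ['class_0' 'class_1' 'class_2']

# return_X_y=True returns a (data, target) tuple instead of a Bunch.
X, y = load_wine(return_X_y=True)
print(X.shape, y.shape)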

Any other comments?

@jnothman (Member)

Test failure:

======================================================================
FAIL: sklearn.datasets.tests.test_base.test_load_wine
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/travis/build/scikit-learn/scikit-learn/testvenv/local/lib/python2.7/site-packages/nose/case.py", line 197, in runTest
    self.test(*self.arg)
  File "/home/travis/build/scikit-learn/scikit-learn/sklearn/datasets/tests/test_base.py", line 215, in test_load_wine
    assert_equal(res.data.shape, (178, 13))
AssertionError: Tuples differ: (150, 4) != (178, 13)
First differing element 0:
150
178
- (150, 4)
+ (178, 13)
    """Fail immediately, with the given message."""
>>  raise self.failureException('Tuples differ: (150, 4) != (178, 13)\n\nFirst differing element 0:\n150\n178\n\n- (150, 4)\n+ (178, 13)')

@tylerlanigan (Contributor Author)

Ok that was dumb of me. I did all the testing outside of that folder and forgot to update the original load_iris to load_wine. Fixed.

In future pull requests I'll figure out my testing bug. Maybe I could ask you for a little guidance on how you set up your testing environment?

@tylerlanigan (Contributor Author)

@jnothman I've submitted a new commit called "fixed test_base.py" which I believe addresses the error that popped up before. I'm confused about the new error message that is now showing up in continuous integration. How do I go about fixing it?

@jnothman (Member)

The current error is from flake8: it's about style, not a test failure.


./examples/preprocessing/plot_scaling_importance.py:32:80: E501 line too long (97 > 79 characters)
which has been standardized using :class:`StandardScaler <sklearn.preprocessing.StandardScaler>`,
                                                                               ^
./examples/preprocessing/plot_scaling_importance.py:49:1: E402 module level import not at top of file
from sklearn.cross_validation import train_test_split
^
./examples/preprocessing/plot_scaling_importance.py:50:1: E402 module level import not at top of file
from sklearn import preprocessing
^
./examples/preprocessing/plot_scaling_importance.py:51:1: E402 module level import not at top of file
from sklearn.decomposition import PCA
^
./examples/preprocessing/plot_scaling_importance.py:52:1: E402 module level import not at top of file
from sklearn.preprocessing import StandardScaler
^
./examples/preprocessing/plot_scaling_importance.py:52:1: F401 'sklearn.preprocessing.StandardScaler' imported but unused
from sklearn.preprocessing import StandardScaler
^
./examples/preprocessing/plot_scaling_importance.py:53:1: E402 module level import not at top of file
from sklearn.naive_bayes import GaussianNB
^
./examples/preprocessing/plot_scaling_importance.py:54:1: E402 module level import not at top of file
from sklearn import metrics
^
./examples/preprocessing/plot_scaling_importance.py:55:1: E402 module level import not at top of file
import matplotlib.pyplot as plt
^
./examples/preprocessing/plot_scaling_importance.py:56:1: E402 module level import not at top of file
from sklearn.datasets import load_wine
^
./examples/preprocessing/plot_scaling_importance.py:67:80: E501 line too long (91 > 79 characters)
                                                    test_size=0.30, random_state=RAN_STATE)
                                                                               ^
./sklearn/datasets/base.py:321:43: W291 trailing whitespace
                 feature_names=['alcohol',
                                          ^
./sklearn/datasets/tests/test_base.py:213:1: E302 expected 2 blank lines, found 1
def test_load_wine():

@tylerlanigan (Contributor Author)

@jnothman Does this need to be corrected before it is merged? I was looking at another example when I made this one, and it generated a few PEP8 errors because the imports weren't at the top. I assumed this needed to be the case for the example to render correctly on the webpage.

@jnothman (Member)

I'm not sure, but that might be an old convention:
http://scikit-learn.org/stable/auto_examples/plot_compare_reduction.html
seems to work okay despite printing doc after imports


@@ -1,6 +1,13 @@
#!/usr/bin/python
# -*- coding: utf-8 -*-

from sklearn.cross_validation import train_test_split
Member:

No, the docstring must come before imports

Contributor Author:

@jnothman Alright I'll change it back.

@tylerlanigan (Contributor Author)

@jnothman I've moved the import statements back and looked at the logs for the errors. It seems that the only flake8 errors are that the import statements need to be at the top. Is there anything else that is required of me before this can be merged?

./examples/preprocessing/plot_scaling_importance.py:41:1: E402 module level import not at top of file
from sklearn.cross_validation import train_test_split
^
./examples/preprocessing/plot_scaling_importance.py:42:1: E402 module level import not at top of file
from sklearn import preprocessing
^
./examples/preprocessing/plot_scaling_importance.py:43:1: E402 module level import not at top of file
from sklearn.decomposition import PCA
^
./examples/preprocessing/plot_scaling_importance.py:44:1: E402 module level import not at top of file
from sklearn.naive_bayes import GaussianNB
^
./examples/preprocessing/plot_scaling_importance.py:45:1: E402 module level import not at top of file
from sklearn import metrics
^
./examples/preprocessing/plot_scaling_importance.py:46:1: E402 module level import not at top of file
import matplotlib.pyplot as plt
^
./examples/preprocessing/plot_scaling_importance.py:47:1: E402 module level import not at top of file
from sklearn.datasets import load_wine
^

@jnothman (Member)

Well, two core devs need to review and approve it... so probably there is more work to do, but what that is will depend on someone finding time to review :)


@lesteve (Member) commented Nov 21, 2016

The flake8 linting is done on the PR diff, so that explains why it is complaining on your example and not on old ones.

To be honest I lean slightly towards changing setup.cfg to ignore E402 (imports not at the top of the file). This would make sense especially for examples where you tend to have some kind of interactive-like style and do your imports only when you need them.

@lesteve (Member) commented Nov 21, 2016

In your particular case you can move print(__doc__) after the imports if you want to get rid of the flake8 failures.
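
For illustration, a minimal sketch of that layout, with the docstring first, then the imports, then print(__doc__). The docstring text here is a placeholder, and the import list follows the flake8 output above (with sklearn.model_selection in place of the deprecated sklearn.cross_validation, as suggested below):

"""
Importance of feature scaling (placeholder docstring).

The module docstring has to stay at the very top of the example file.
"""
from sklearn.model_selection import train_test_split
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
from sklearn.naive_bayes import GaussianNB
from sklearn.datasets import load_wine
from sklearn import metrics
import matplotlib.pyplot as plt

# Printing the docstring after the imports avoids the flake8 E402 warnings.
print(__doc__)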

@GaelVaroquaux (Member) commented Nov 21, 2016 via email

@lesteve (Member) left a comment:

First pass

=========================================================

Features scaling though standardization (or Z-score normalization)
can be an importance preprocessing step for many machine learning
Member:

important

to the covariance matrix.

In order to illustrate this in an example, PCA will be performed on a dataset
which has been standardized using StandardScalerand a copy which has remained
Member:

missing space: StandardScaler and

from __future__ import print_function
print(__doc__)

from sklearn.cross_validation import train_test_split
Member:

Use sklearn.model_selection instead since sklearn.cross_validation is deprecated in 0.18.
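
i.e. a one-line swap, roughly:

# sklearn.cross_validation is deprecated as of 0.18; use model_selection instead.
from sklearn.model_selection import train_test_split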

ax2.set_title('Standardized training dataset after PCA')

for ax in (ax1, ax2):

Member:

Minor comment: I would remove this blank line and add one before plt.tight_layout.

@@ -68,6 +69,7 @@
'fetch_rcv1',
'fetch_kddcup99',
'get_data_home',
'load_wine',
Member:

the load_ functions seem alphabetically ordered; please respect that (both here and in the imports)


Examples
--------
Let's say you are interested in the samples 10, 25, and 50, and want to
Member:

Hmmm, you seem to take the samples [10, 80, 140] below, is that intended?


The copy of UCI ML Wine Data Set dataset is
downloaded from:
https://archive.ics.uci.edu/ml/machine-learning-databases/wine/wine.data
Member:

Have you modified the class target labels as mentioned in the comments? If yes you may want to mention that.



# Contants
RAN_STATE = 42
Member:

No strong reason to save three letters -> RANDOM_STATE = 42

weight) because of their respective scales (meters vs. kilos) it can
be seen how not scaling the features would cause PCA to determine that
the direction of maximal variance more closely corresponds with the
‘weight’ axis. As a change in height of one meter can be considered much
Member:

weird quotes around weight, did you mean to use '

# License: BSD 3 clause


# Contants
Member:

Constants

To be honest this is the kind of comments I would remove, since it doesn't bring much added value

@tylerlanigan (Contributor Author)

@lesteve Sounds good. I can get to this in a couple of days. My computer is in the shop.

@jnothman (Member)

@amueller (Member)

maybe actually print the first principal component with and without scaling? Without scaling, it will be dominated by the feature with the largest scale (I would imagine) and you could explain that in the docstring?
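
For illustration, a rough sketch of what that could look like (variable names and the number of components are illustrative, not the example's actual code):

from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X, y = load_wine(return_X_y=True)

# First principal component on the raw data: typically dominated by
# whichever feature has the largest raw scale.
pca_raw = PCA(n_components=2).fit(X)
print("First PC (unscaled):", pca_raw.components_[0])

# Same thing after standardization: the weights should be spread far
# more evenly across the 13 features.
pca_std = PCA(n_components=2).fit(StandardScaler().fit_transform(X))
print("First PC (standardized):", pca_std.components_[0])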

@tylerlanigan (Contributor Author)

@amueller Yeah I can do that no problem.

@tylerlanigan (Contributor Author)

@lesteve I have also completed your requested changes

@tylerlanigan (Contributor Author)

@lesteve How do we proceed? I have made all of the changes that you requested, and all of the checks have passed.

@jnothman (Member) left a comment:

The loader also needs a unit test in datasets/tests/test_base.py
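
For reference, a minimal sketch of what such a test might look like, modelled on the existing load_iris test and on the shape asserted in the earlier traceback (the PR's actual test may differ):

from sklearn.datasets import load_wine
from sklearn.utils.testing import assert_equal, assert_true


def test_load_wine():
    res = load_wine()
    assert_equal(res.data.shape, (178, 13))
    assert_equal(res.target.size, 178)
    assert_equal(res.target_names.size, 3)
    assert_true(res.DESCR)

    # test return_X_y option
    X_y_tuple = load_wine(return_X_y=True)
    bunch = load_wine()
    assert_true(isinstance(X_y_tuple, tuple))
    assert_equal(X_y_tuple[0].shape, bunch.data.shape)
    assert_equal(X_y_tuple[1].shape, bunch.target.shape)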

features.

The results will then be used to train a naive Bayes classifier, and a clear
difference the prediction accuracies will be observed.
Member:

"difference the" -> "difference in"


================= ==============
Classes 3
Samples per class [59,71,48]
Member:

alignment

Contributor Author:

I'm unsure what you mean. Do you mean that the [59,71,48] needs to line up, center-wise, with the 3? Please clarify.

Member:

everything else is right aligned

If True, returns ``(data, target)`` instead of a Bunch object.
See below for more information about the `data` and `target` object.


Member:

remove blank line


(data, target) : tuple if ``return_X_y`` is True


Member:

remove blank line

https://archive.ics.uci.edu/ml/machine-learning-databases/wine/wine.data

The file has been modified:
-to include class labels class_0, class_1 and class_2;
Member:

I don't think this will render correctly. Please check, at http://scikit-learn.org/circle?CIRCLE_BUILD_NO/modules/generated/sklearn.datasets.load_wine.html

Then again, I'm not sure this detail needs to be here as long as we're assured these changes are valid.

-to rename target variables from 1, 2, and 3 to 0, 1 and 2;
-to include to amount of datapoints and class labels.


Member:

one blank line

think of Principle Component Analysis (PCA) as being a prime example
of when normalization is important. In PCA we are interested in the
components that maximize the variance. If there exists components
(e.g human height) that vary less then other components (e.g human
Member:

then -> than


================= ==============
Classes 3
Samples per class [59,71,48]
Member:

everything else is right aligned

@tylerlanigan (Contributor Author)

@jnothman, Ok I've completed your requested changes. The unit test was done in a previous commit.

@tylerlanigan (Contributor Author)

@jnothman @lesteve Hey guys, I've completed your changes and pushed to the pull request. I think it's waiting on you to approve...?

@jnothman (Member) commented Dec 13, 2016 via email

def load_wine(return_X_y=False):
"""Load and return the wine dataset (classification).

.. versionadded:: 0.18
Member:

Reduce indent to 4 spaces

Contributor Author:

done.

more important than the change in weight of one kilogram, it is easily
seen that this determination is incorrect. In the case of PCA, scaling
features using normalization is preferred over using min-max scaling as
the primary components are computed using the correlation matrix as
Member:

I'm not sure I get this statement

Contributor Author:

This statement doesn't need to be here. I removed it for clarity.

regression) require features to be normalized, intuitively we can
think of Principle Component Analysis (PCA) as being a prime example
of when normalization is important. In PCA we are interested in the
components that maximize the variance. If there exists components
Member:

How about, more succinctly: "If one component (e.g. human height) varies less than another (e.g. weight) because of their respective scales (meters vs. kilos), PCA might determine that the direction of maximal variance more closely corresponds with the 'weight' axis, if those features are not scaled."

be seen how not scaling the features would cause PCA to determine that
the direction of maximal variance more closely corresponds with the
'weight' axis. As a change in height of one meter can be considered much
more important than the change in weight of one kilogram, it is easily
Member:

"it is easily seen that this determination is incorrect" -> "this is clearly incorrect".

Contributor Author:

Done.

opposed to the covariance matrix.

In order to illustrate this in an example, PCA will be performed on a
dataset which has been standardized using Standard Scaler and a copy
Member:

:class:`preprocessing.StandardScaler`

Member:

"comparing the use of the unscaled data against the same with StandardScaler applied"

Member:

Seeing as we have added a specialised dataset for this illustration, please briefly describe the dataset and the reason for its heterogeneous scale.

Contributor Author:

done.

downloaded and modified to fit standard format from:
https://archive.ics.uci.edu/ml/machine-learning-databases/wine/wine.data

Examples
Member:

Needs a References section citing the original PARVUS

Member:

I'm also not sure whether we're obliged to cite UCI... Or indeed whether we're licenced to copy the UCI data. Nothing at https://archive.ics.uci.edu/ml/datasets/Wine suggests that we are.

@tylerlanigan (Contributor Author) Dec 14, 2016:

@jnothman The load_breast_cancer function is also from UCI...

The copy of UCI ML Breast Cancer Wisconsin (Diagnostic) dataset is
    downloaded from:
    https://goo.gl/U2Uwz2

That is the only citation I found for it. I also suspect that load_iris got its data from UCI as well, but it doesn't mention anything.

I can add a reference anyway, to set a better example.

"""
module_path = dirname(__file__)
with open(join(module_path, 'data', 'wine_data.csv')) as csv_file:
data_file = csv.reader(csv_file)
Member:

I'd appreciate if this was refactored wrt load_iris

Contributor Author:

@jnothman I don't see a difference in those four lines between load_iris and load_wine. Please clarify what you mean

Member:

that's why you should factor it out into a helper function

Contributor Author:

@jnothman I can do this, but should I also edit all the other load functions to use this new refactored one? I'm concerned that this is getting a little outside of the scope of the original issue.

Member:

Sigh. If there are really many of them that need refactoring, perhaps not. But I only really think it's this, iris and breast_cancer that share the same loading.
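
For context, a rough sketch of what such a shared helper could look like, assuming the bundled CSV's header row encodes the number of samples, the number of features and the class names, as the existing loaders do (this is not the PR's exact code):

import csv
from os.path import join

import numpy as np


def load_data(module_path, data_file_name):
    # Header row: n_samples, n_features, class_name_0, class_name_1, ...
    with open(join(module_path, 'data', data_file_name)) as csv_file:
        data_file = csv.reader(csv_file)
        temp = next(data_file)
        n_samples = int(temp[0])
        n_features = int(temp[1])
        target_names = np.array(temp[2:])

        data = np.empty((n_samples, n_features))
        target = np.empty((n_samples,), dtype=int)

        # Remaining rows: feature values followed by the integer class label.
        for i, row in enumerate(data_file):
            data[i] = np.asarray(row[:-1], dtype=np.float64)
            target[i] = np.asarray(row[-1], dtype=int)

    return data, target, target_names

Each of load_iris, load_wine and load_breast_cancer could then call this helper and only supply its own DESCR and feature_names.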

@@ -0,0 +1,85 @@
Wine Data Database
Member:

A lot of this is copied verbatim without licence.

Note that this does not quite conform to their citation policy either: https://archive.ics.uci.edu/ml/citation_policy.html

Contributor Author:

I added the citation and reworded the paragraph a little.

assert_true(res.DESCR)

# test return_X_y option
X_y_tuple = load_iris(return_X_y=True)
Member:

load_wine?

Contributor Author:

yup


# test return_X_y option
X_y_tuple = load_iris(return_X_y=True)
bunch = load_iris()
Member:

load_wine?

Contributor Author:

yup :S

regression) require features to be normalized, intuitively we can
think of Principle Component Analysis (PCA) as being a prime example
of when normalization is important. In PCA we are interested in the
components that maximize the variance. If there exists components
Member:

exists -> exist

Member:

How about, more succinctly: "If one component (e.g. human height) varies less than another (e.g. weight) because of their respective scales (meters vs. kilos), PCA might determine that the direction of maximal variance more closely corresponds with the 'weight' axis, if those features are not scaled."

"""
module_path = dirname(__file__)
with open(join(module_path, 'data', 'wine_data.csv')) as csv_file:
data_file = csv.reader(csv_file)
Member:

that's why you should factor it out into a helper function

print(__doc__)

# Code source: Tyler Lanigan <tylerlanigan@gmail.com>
# Sebastian Raschka <mail@sebastianraschka.com>
Member:

alignment

@jnothman (Member)

@amueller know anything about our licence to redistribute data from UCI??

@tylerlanigan (Contributor Author)

@jnothman Hey, just a heads up, I haven't pushed these changes yet. I'm just marking them as "done" as I go to keep track of them myself.

@amueller (Member)

@jnothman our license? My guess would be that iris is public domain. I'm not sure we did any research for the other datasets. UCI doesn't specify anything. If someone uploads their data, they "donate" it, but there is nothing on licensing terms. Given that it's maintained by the library, they should know better lol.

We could bug them to be better about specifying licenses, or we could redistribute until someone tells us not to?

@jnothman (Member)

Looks like you botched the merge.

@tylerlanigan (Contributor Author)

Yeah, I'm a little unsure how to fix this..

@jnothman (Member)

Firstly you can git reset --hard your branch back to the commit you were at before, so that you haven't lost anything. Then the goal is to fetch the updated upstream/master (i.e. master as at scikit-learn central) and, when in your branch, git merge upstream/master (or git rebase upstream/master).

@jnothman (Member)

294af08 was one of the most recent working HEADs of this branch.

@tylerlanigan (Contributor Author)

Is there anything else?

@jnothman (Member) commented Jan 26, 2017 via email

@MechCoder (Member) left a comment:

LGTM otherwise

@@ -242,6 +242,122 @@ def load_files(container_path, description=None, categories=None,
DESCR=description)


def load_data(module_path, data_file_name):
Member:

Any reason why you did not use the np.genfromtxt utility directly? I use it to load CSVs. I feel this entire block is just:

path = os.path.join(module_path, data_file_name)
X_y = np.genfromtxt(path, delimiter=",", skip_header=1)
data = X_y[:, :-1]
y = np.asarray(X_y[:, -1], dtype=np.int)

(or something similar)

Member:

whether or not rightly, we've used the header line to encode extra information

Member:

Don't know if that is a valid excuse. The target_names can be hard-coded.

@jnothman (Member) commented Feb 7, 2017 via email

@MechCoder (Member)

Me neither, but this part of the codebase can do with some code-refactoring. Please merge if you are happy.

@jnothman (Member)

Thanks @t-lanigan

@jnothman jnothman merged commit eb9fe80 into scikit-learn:master Feb 13, 2017
@tylerlanigan (Contributor Author)

@jnothman Thanks as well. When can I see the page on the website?

@lesteve (Member) commented Feb 13, 2017

@jnothman Thanks as well. When can I see the page on the website?

You want to look at scikit-learn.org/dev which is updated on each push into master and go to the examples page.

sergeyf pushed a commit to sergeyf/scikit-learn that referenced this pull request Feb 28, 2017
@Przemo10 Przemo10 mentioned this pull request Mar 17, 2017
Sundrique pushed a commit to Sundrique/scikit-learn that referenced this pull request Jun 14, 2017
NelleV pushed a commit to NelleV/scikit-learn that referenced this pull request Aug 11, 2017
paulha pushed a commit to paulha/scikit-learn that referenced this pull request Aug 19, 2017
maskani-moh pushed a commit to maskani-moh/scikit-learn that referenced this pull request Nov 15, 2017
lemonlaug pushed a commit to lemonlaug/scikit-learn that referenced this pull request Jan 6, 2021
Development

Successfully merging this pull request may close these issues.

Examples should better illustrate the value of scaling
6 participants