[MRG] ENH: Added block_size parameter for lesser memory consumption #7979
Conversation
I'd think the main thing needing work in the docstring is the return value: we can't return a monolithic matrix. |
Yes, that needs to be changed. But since it depends on the block_size, I am unable to assign a specific value to it. Any suggestion you may have? The workaround I could think of was to assign n_samples_x and n_samples_y, which depend on the block_size parameter. |
I would have thought you'd return a generator of chunks which, when concatenated, would produce the full matrix.
|
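The generator-of-chunks idea can be sketched as follows. This is a pure-NumPy stand-in: `chunked_distances` and `chunk_rows` are illustrative names, not the PR's API, and the Euclidean computation stands in for `pairwise_distances`.

```python
import numpy as np

def chunked_distances(X, Y=None, chunk_rows=100):
    # Yield horizontal slabs of the full Euclidean distance matrix, one per
    # chunk of rows of X; vstacking all slabs reproduces the full matrix.
    Y = X if Y is None else Y
    for start in range(0, X.shape[0], chunk_rows):
        chunk = X[start:start + chunk_rows]
        # (n_chunk, n_Y) Euclidean distances via broadcasting
        yield np.sqrt(((chunk[:, None, :] - Y[None, :, :]) ** 2).sum(-1))

X = np.random.RandomState(0).random_sample((250, 4))
full = np.vstack(list(chunked_distances(X, chunk_rows=100)))
assert full.shape == (250, 250)
```

Only one `chunk_rows`-row slab of distances exists at a time unless the caller chooses to stack them.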
@jnothman I currently don't have any experience with multithreading (in practice) but am able to understand most of your changes in #7177. So, do you suggest I follow up from your changes to get started and then try to fix anything that breaks? |
Multithreading isn't really the issue here. If you're uncertain, write a test case and ask for a review of that.
|
@jnothman I added the code for the generator and checked it by running an example. It seemed to work fine. I could not add tests or check with nosetests because of the following error:
======================================================================
ERROR: Failure: ValueError (numpy.dtype has the wrong size, try recompiling. Expected 88, got 96)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/usr/lib/python2.7/dist-packages/nose/loader.py", line 414, in loadTestsFromName
addr.filename, addr.module)
File "/usr/lib/python2.7/dist-packages/nose/importer.py", line 47, in importFromPath
return self.importFromDir(dir_path, fqname)
File "/usr/lib/python2.7/dist-packages/nose/importer.py", line 94, in importFromDir
mod = load_module(part_fqname, fh, filename, desc)
File "/media/aman/BE66ECBA66EC7515/Open Source/scikit-learn/sklearn/__init__.py", line 57, in <module>
from .base import clone
File "/media/aman/BE66ECBA66EC7515/Open Source/scikit-learn/sklearn/base.py", line 12, in <module>
from .utils.fixes import signature
File "/media/aman/BE66ECBA66EC7515/Open Source/scikit-learn/sklearn/utils/__init__.py", line 10, in <module>
from .murmurhash import murmurhash3_32
File "__init__.pxd", line 155, in init sklearn.utils.murmurhash (sklearn/utils/murmurhash.c:6319)
ValueError: numpy.dtype has the wrong size, try recompiling. Expected 88, got 96
----------------------------------------------------------------------
Ran 1 test in 0.001s
FAILED (errors=1)
Please help me with this error. |
Build scikit-learn first: run `make in`
|
I still get the same error. I get this while compiling:
Any suggestions? @jnothman |
It's fixed now. |
flake8 errors |
Yes, I had noticed those errors before committing. But I didn't change it, as I had just copied the existing docstring:
Y : array [n_samples_b, n_features], optional
    An optional second feature array. Only allowed if metric != "precomputed". |
flake8 is run in diff mode so that we only encourage you to fix errors you have introduced to the code (although it doesn't catch everything). You are still required to fix those issues. |
I have added the tests for |
sklearn/metrics/pairwise.py
Outdated
pairwise_distances(X, y, metric, n_jobs)
but uses much less memory.
"but may use less memory"
sklearn/metrics/pairwise.py
Outdated
def pairwise_distances_blockwise(X, Y=None, metric='euclidean', n_jobs=1,
                                 block_size=DEFAULT_BLOCK_SIZE, **kwds):
    """ Compute the distance matrix from a vector array X and optional Y.
remove space between """ and text (PEP 257)
sklearn/metrics/pairwise.py
Outdated
Returns
-------
D : generator of blocks based on the ``block_size`` parameter. The blocks,
first line is reserved for type description
sklearn/metrics/pairwise.py
Outdated
Parameters
----------
X : array [n_samples_a, n_samples_a] if metric == "precomputed", or, \
Remove , after "or"
sklearn/metrics/pairwise.py
Outdated
This method takes either a vector array or a distance matrix, and returns
a distance matrix. If the input is a vector array, the distances are
computed. If the input is a distances matrix, it is returned instead.
"returned in blocks"
sklearn/metrics/pairwise.py
Outdated
block_size=DEFAULT_BLOCK_SIZE, **kwds):
    """ Compute the distance matrix from a vector array X and optional Y.

This method takes either a vector array or a distance matrix, and returns
"returns" -> "generates blocks of"
def test_pairwise_distances_blockwise_invalid_block_size():
    rng = np.random.RandomState(0)
    X = rng.random_sample((400, 4))
this might as well be empty or zeros
rng = np.random.RandomState(0)
X = rng.random_sample((400, 4))
y = rng.random_sample((200, 4))
gen = pairwise_distances_blockwise(X, y, block_size=0, metric='euclidean')
I'd rather see a test where block_size=1 or greater
I was testing this for invalid block size hence used 0.
but 1 is invalid too if X has many samples
X = rng.random_sample((400, 4))
y = rng.random_sample((200, 4))
gen = pairwise_distances_blockwise(X, y, block_size=0, metric='euclidean')
assert_raise_message(ValueError, 'block_size should be at least n_samples '
I'd rather this error be raised upon calling pairwise_distances_blockwise. That means you need to not use a generator function immediately, but use a secondary function call.
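The "secondary function call" pattern the reviewer describes can be sketched generically: validation in the outer function runs eagerly at call time, while the generator body is deferred until iteration. Names here (`blockwise`, `_generate`) are illustrative, not the PR's.

```python
def blockwise(X, block_size):
    # Eager: runs as soon as blockwise(...) is called, so an invalid
    # block_size raises here rather than on the first next().
    if block_size < 1:
        raise ValueError('block_size should be at least 1')
    return _generate(X, block_size)

def _generate(X, block_size):
    # Deferred: this body only runs once the generator is iterated.
    for start in range(0, len(X), block_size):
        yield X[start:start + block_size]

try:
    blockwise([1, 2, 3], block_size=0)  # raises immediately, no iteration needed
    raised_eagerly = False
except ValueError:
    raised_eagerly = True
```

Had `blockwise` itself contained the `yield`, the `ValueError` would only surface once the caller started consuming the generator.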
sklearn/metrics/pairwise.py
Outdated
for start in range(0, n_samples, block_n_rows):
    # get distances from block to every other sample
    stop = min(start + block_n_rows, X.shape[0])
    yield pairwise_distances(X[start:stop], Y, metric, n_jobs, **kwds)
This will use at most block_size memory overall, not per job. I'm not sure this is the best way to do parallelism here, though. I.e. ideally we would use a parallel generator, but I don't think joblib.Parallel supports that atm.
X = rng.random_sample((400, 4))
gen = pairwise_distances_blockwise(X, block_size=1, metric="euclidean")
S = np.empty((0, X.shape[0]))
for row in gen:
just use np.vstack(list(gen))
gen = pairwise_distances_blockwise(X, block_size=1, metric="euclidean")
S = np.empty((0, X.shape[0]))
for row in gen:
    S = np.vstack((S, row))
Firstly, please check that block_size is upheld for each block. Secondly, instead of repeated vstack, add to a list, then vstack the list.
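The vstack suggestion in a small self-contained sketch: repeated vstack copies the accumulated array on every iteration (quadratic copying), while collecting blocks in a list and stacking once copies each element only once.

```python
import numpy as np

# stand-in for the blockwise generator: three 2x3 blocks
blocks = (np.full((2, 3), i, dtype=float) for i in range(3))

# Anti-pattern (O(n^2) copying): growing S by vstack inside the loop
#     S = np.empty((0, 3))
#     for block in blocks:
#         S = np.vstack((S, block))

# Preferred: one concatenation at the end
S = np.vstack(list(blocks))
assert S.shape == (6, 3)
```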
@@ -370,6 +372,49 @@ def test_pairwise_distances_argmin_min():
    np.testing.assert_almost_equal(dist_orig_val, dist_chunked_val, decimal=7)


def check_invalid_block_size_generator(generator):
isn't this the same as next?
rng = np.random.RandomState(0)
# Euclidean distance should be equivalent to calling the function.
X = rng.random_sample((400, 4))
gen = pairwise_distances_blockwise(X, block_size=1, metric="euclidean")
we should be testing at least one other metric. I think you would be best with a helper function like check_pairwise_distances_blockwise, then call it for different sets of parameters including metric, Y (None or otherwise), and block_size.
@jnothman Thanks for the review. I've mostly implemented the changes you mentioned. However, the part about using a parallel generator was a bit unclear to me. Since you say it's not supported atm, should I be adding anything else there? |
In terms of parallelism: the current approach to parallelism means that … The parallelism used in #7177 was performed across blocks, but on the basis that each block was reduced to a negligible memory overhead before returning. For the sake of parallelism or otherwise, a more useful interface may be something like:
def pairwise_distances_reduce(X, Y, reduce_func, metric, n_jobs, block_size):
    ...
wherein … |
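A hedged sketch of what the proposed pairwise_distances_reduce interface could look like. The name and argument order come from the comment above; the body, the pure-NumPy distance helper, the row-count `block_n_rows` (in place of the PR's memory-based block_size, with metric and n_jobs dropped), and the example reduce_func are all illustrative, not code from this PR.

```python
import numpy as np

def euclidean(a, b):
    # dense Euclidean distances between rows of a and rows of b
    return np.sqrt(((a[:, None, :] - b[None, :, :]) ** 2).sum(-1))

def pairwise_distances_reduce(X, Y, reduce_func, block_n_rows=100):
    # Apply reduce_func to each block of distances so only the (small)
    # reduced results are retained, never the full n_X * n_Y matrix.
    out = []
    for start in range(0, X.shape[0], block_n_rows):
        D_block = euclidean(X[start:start + block_n_rows], Y)
        out.append(reduce_func(D_block))
    return np.concatenate(out)

rng = np.random.RandomState(0)
X, Y = rng.random_sample((250, 4)), rng.random_sample((30, 4))
# e.g. index of each row's nearest neighbour, without the full 250 x 30 matrix
argmins = pairwise_distances_reduce(X, Y, lambda D: D.argmin(axis=1))
assert argmins.shape == (250,)
```

The key property is that peak memory is bounded by one block's distances plus the reduced outputs, which is what makes the interface useful for parallelising across blocks.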
@jnothman Thanks for the detailed explanation. I mostly understood what you are explaining. However, looking at #7177 I couldn't see where in the code each block has been reduced to a negligible memory overhead. Could you please point me to it so that I can understand better and implement the same? |
It gets translated into n_samples * n_clusters (intra_clust_dists, inter_clust_dists), rather than n_samples * n_samples.
|
I understood it now. So, do you have any such reduce_func in mind for pairwise_distances, or should I try incorporating it for silhouette_samples directly? |
The pairwise_distances case has no reduce_func.
|
@jnothman Oh! So, should I get started on incorporating it for silhouette_samples? |
@jnothman I intended to ask whether I should get started on the rewrite of silhouette_samples via pairwise_distances_reduce. |
Sure, go ahead, but I think using this helper in nearest neighbors would be more practically useful.
|
@jnothman I'd been trying to think about what to do next here, but am a bit confused (as your PR for silhouette_samples is not yet merged). I assume you are saying that I replace pairwise_distances in nearest neighbors with pairwise_distances_blockwise, since we don't have a reduce_func for pairwise_distances. Could you please brief me on what I should be doing next in this PR? |
Nearest neighbors reduces pairwise_distances's output to the nearest neighbors. So it can be implemented with a reduce_func.
|
In nearest neighbors, approximate.py, base.py and nearest_centroid.py use pairwise_distances. Just so I understand what I need to do clearly, nearest_centroid.py uses it in the following way:
return self.classes_[pairwise_distances(
    X, self.centroids_, metric=self.metric).argmin(axis=1)]
Could you please give me a head start on how I could replace this with a generator of blocks with a reduce_func? |
Well, that one can directly make use of pairwise_distances_argmin, which uses blocks.
|
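Applied to the nearest_centroid.py snippet quoted above, the point is that the argmin over centroids can be taken one block of rows at a time, so the full (n_samples, n_centroids) distance matrix never needs to exist at once. This is a pure-NumPy sketch; in scikit-learn the blockwise argmin is what pairwise_distances_argmin provides, and `blockwise_argmin` plus the stand-in data below are my illustration.

```python
import numpy as np

rng = np.random.RandomState(0)
X = rng.random_sample((200, 4))          # samples to classify
centroids = rng.random_sample((3, 4))    # stand-in for self.centroids_
classes = np.array([10, 20, 30])         # stand-in for self.classes_

def blockwise_argmin(X, Y, block_n_rows=50):
    # argmin over rows of Y for each row of X, one block of X rows at a time
    out = []
    for start in range(0, X.shape[0], block_n_rows):
        chunk = X[start:start + block_n_rows]
        D = np.sqrt(((chunk[:, None, :] - Y[None, :, :]) ** 2).sum(-1))
        out.append(D.argmin(axis=1))
    return np.concatenate(out)

# equivalent to classes[pairwise_distances(X, centroids).argmin(axis=1)]
pred = classes[blockwise_argmin(X, centroids)]
assert pred.shape == (200,)
```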
Seems great, working on them. One doubt, though: by tests, do you mean apart from those in the gist linked above? (which I have added in |
Ha I'd forgotten I actually wrote a gist with unit tests :P |
Changing to WIP while you make those substantial additions to the PR.
Thanks for this, and sorry for the slow review. There continues to be interest in this kind of feature, with issues raised wrt silhouette_score and in the implementation of large margin nearest neighbors.
I don't think you need to integrate #7177 here; I think we can mark it MRG again.
if batch_size is not None:
    warnings.warn("'batch_size' was deprecated in version 0.19 and will "
                  "be removed in version 0.21.", DeprecationWarning)
You should set block_size according to batch_size, i.e. block_size = int(ceil(batch_size ** 2 / BYTES_PER_FLOAT / 1024 / 1024)) (I think).
@@ -457,10 +431,13 @@ def pairwise_distances_argmin(X, Y, axis=1, metric="euclidean",
    sklearn.metrics.pairwise_distances
    sklearn.metrics.pairwise_distances_argmin_min
    """
    if batch_size is not None:
Just forward the parameters as provided onto argmin_min and let it do the warning and conversions.
@@ -256,8 +257,19 @@ def euclidean_distances(X, Y=None, Y_norm_squared=None, squared=False,
    return distances if squared else np.sqrt(distances, out=distances)


def _argmin_min_reduce_min(dist):
Should this just be _argmin_min_reduce?
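For context, the kind of reduce function being discussed for argmin_min would presumably look something like this. This is my sketch of the idea, not necessarily the PR's exact code.

```python
import numpy as np

def _argmin_min_reduce(dist):
    # Reduce a block of distances to (index of nearest, nearest distance)
    # per row: an (n_rows, n_Y) block shrinks to two length-n_rows arrays.
    indices = dist.argmin(axis=1)
    values = dist[np.arange(dist.shape[0]), indices]
    return indices, values

D = np.array([[3.0, 1.0, 2.0],
              [0.5, 4.0, 4.0]])
idx, val = _argmin_min_reduce(D)
assert idx.tolist() == [1, 0]
assert val.tolist() == [1.0, 0.5]
```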
Returns
-------
D : array-like or sparse matrix or tuple
    A distance matrix D such that D_{i, j} is the distance between the
This is not accurate. The only thing we can say about the output here is that its first axis (or the first axis of each entry in a tuple) corresponds to rows in X.
are computed. If the input is a distances matrix, it is returned in blocks
instead.

This is equivalent to calling:
"vstacking the chunks generated by this function is equivalent to calling:"
pairwise_distances(X, y, metric, n_jobs)

but may use less memory.
but avoids storing the entire distance matrix in memory
@@ -1131,6 +1109,214 @@ def _pairwise_callable(X, Y, metric, **kwds):
    'sokalsneath', 'sqeuclidean', 'yule', "wminkowski"]


def _generate_pairwise_distances_blockwise(X, Y=None, metric='euclidean',
                                           n_jobs=1,
                                           block_size=DEFAULT_BLOCK_SIZE,
block_size isn't used here
D : generator of blocks based on the ``block_size`` parameter.

"""
if metric != 'precomputed' and Y is None:
I'd do this in pairwise_distances_blockwise
n_samples = X.shape[0]
for start in range(0, n_samples, block_n_rows):
    # get distances from block to every other sample
    stop = min(start + block_n_rows, X.shape[0])
does just using start + block_n_rows break anything? Usually slice notation does not raise an error if stop is greater than the length.
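The reviewer's point in two lines: Python (and NumPy) slicing silently clamps an out-of-range stop instead of raising, so the min() guard is redundant for slicing purposes.

```python
import numpy as np

X = np.arange(10)
# a stop past the end is clamped; no IndexError is raised
assert X[8:12].tolist() == [8, 9]
assert X[8:min(8 + 4, len(X))].tolist() == [8, 9]  # identical result
# (plain indexing, by contrast, does raise: X[12] -> IndexError)
```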
@@ -1131,6 +1109,214 @@ def _pairwise_callable(X, Y, metric, **kwds):
    'sokalsneath', 'sqeuclidean', 'yule', "wminkowski"]


def _generate_pairwise_distances_blockwise(X, Y=None, metric='euclidean',
I think you should inline this function in pairwise_distances_blockwise.
@dalmia if this is considered for attention during an upcoming sprint, would you like to help finish it, or let someone else take it on? |
@dalmia, I will mark this for someone else to finish. |
@jnothman please do the same. I won't be able to continue work on this |
As noted at #8602 (comment), this continues to be an in-demand feature, but I've received next-to-no feedback from other core devs on whether I've led this PR in the right direction. I've also wondered whether we should make the batch size a global option (e.g. |
Closing as superseded by #10280 |
Reference Issue
Fixes #7287
What does this implement/fix? Explain your changes.
This intends to add a function
pairwise_distances_blockwise
with an additional block_size parameter to avoid storing all the O(n^2) pairs' distances. This would then be used byneighbors
,silhouette_score
andpairwise_distances_argmin_min
.Any other comments?
- pairwise_distances_reduce
- sklearn/neighbors
- pairwise_distances_reduce
- pairwise_distances_reduce