correct and reasonable new example to replace the old one #25409

Song-Pingfan · 2023-01-16T16:44:55Z

scikit-learn/sklearn/feature_extraction/image.py

Line 496 in 98cf537

>>> X = load_sample_images().images[1]

The problem with the old example is that it ignores the "n_samples" dimension expected by sklearn.feature_extraction.image.PatchExtractor, so it produces a confusing result with no channel dimension.

The new example corrects the issue and gives a clear demonstration of how to use PatchExtractor. It is shown below:

>>> import numpy as np
>>> from sklearn.datasets import load_sample_images
>>> from sklearn.feature_extraction import image
>>> # Use the array data from the second image in this dataset:
>>> X = load_sample_images().images[1]
>>> # Add a leading axis so X has shape (n_samples, image_height, image_width, n_channels).
>>> X = X[np.newaxis, :]
>>> print('Image shape: {}'.format(X.shape))
Image shape: (1, 427, 640, 3)
>>> pe = image.PatchExtractor(patch_size=(10, 10))
>>> pe_fit = pe.fit(X)  # Does nothing and returns the estimator unchanged.
>>> pe_trans = pe.transform(X)
>>> print('Patches shape: {}'.format(pe_trans.shape))
Patches shape: (263758, 10, 10, 3)

Note that the line X = X[np.newaxis, :] adds a new leading axis as the "n_samples" dimension, which gives X the correct shape (n_samples, image_height, image_width, n_channels). This is the key difference from the old example. Without it, "image_height" is treated as "n_samples", "image_width" as the image height, and "n_channels" as the image width. That is why the result in the old example looks strange and confusing: it has only 3 dimensions and no channel dimension.

It would be good to add the following reconstruction script to verify that the original image can be reconstructed from the extracted patches using the reconstruct_from_patches_2d function. This makes the example more complete, similar to the example for the extract_patches_2d function. Again, it is important to keep the "n_samples" dimension in mind.
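
To make the shape arithmetic concrete, here is a minimal sketch in plain Python. It assumes dense extraction with a stride of one pixel and no max_patches limit; the helper count_dense_patches is hypothetical and not part of scikit-learn. The (2, 2) patch size in the second call is also an assumption, chosen only because a 10-pixel-wide patch would not fit once the 3 channels are read as the image width.

```python
def count_dense_patches(images_shape, patch_size):
    """Count stride-1 patches for a batch shaped (n_samples, height, width, ...).

    Hypothetical helper for illustration only; it does no extraction itself.
    """
    n_samples, height, width = images_shape[:3]
    patch_h, patch_w = patch_size
    per_image = (height - patch_h + 1) * (width - patch_w + 1)
    return n_samples * per_image


# With the leading n_samples axis: one 427 x 640 image with 3 channels.
with_batch_axis = count_dense_patches((1, 427, 640, 3), (10, 10))
print(with_batch_axis)  # 263758, matching the patches shape reported above

# Without it, the same array is read as 427 images of size 640 x 3.
without_batch_axis = count_dense_patches((427, 640, 3), (2, 2))
print(without_batch_axis)  # 545706
```

The first count matches the (263758, 10, 10, 3) result in the new example. The second shows how the image height silently becomes the number of samples when the axis is missing.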

>>> X_reconstructed = image.reconstruct_from_patches_2d(pe_trans, X.shape[1:])
>>> print(X_reconstructed.shape)
(427, 640, 3)
>>> np.testing.assert_array_equal(X[0], X_reconstructed)

thomasjpfan · 2023-02-24T15:00:19Z

I agree with the change to add n_samples. I'd recommend something like this:

from sklearn.datasets import load_sample_images
from sklearn.feature_extraction import image
# Use the array data from the second image in this dataset:
X = load_sample_images()["images"][1]
X = X[None, ...]
print(f"Image shape: {X.shape}")

pe = image.PatchExtractor(patch_size=(10, 10))
pe_trans = pe.transform(X)
print(f"Patches shape: {pe_trans.shape}")

X_reconstructed = image.reconstruct_from_patches_2d(pe_trans, X.shape[1:])
print(f"Reconstructed shape: {X_reconstructed.shape}")

There is no need to call fit before transform. Our docstring tests will assert that the reconstructed shape is correct.
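
As a quick check of the point about fit, the sketch below calls transform directly on a small synthetic image instead of the sample dataset, then verifies the reconstruction. The 6 x 8 array and the (2, 2) patch size are assumptions picked to keep the run fast; they do not come from the docstring.

```python
import numpy as np
from sklearn.feature_extraction import image

# Synthetic stand-in for a real picture: one 6 x 8 image with 3 channels.
single_image = np.arange(6 * 8 * 3, dtype=float).reshape(6, 8, 3)
X_small = single_image[None, ...]  # shape (n_samples, height, width, channels)

# No fit call: transform is used directly on the batch.
extractor = image.PatchExtractor(patch_size=(2, 2))
patches = extractor.transform(X_small)
print(f"Patches shape: {patches.shape}")

# Rebuild the image by averaging the overlapping patches.
rebuilt = image.reconstruct_from_patches_2d(patches, X_small.shape[1:])
print(f"Reconstructed shape: {rebuilt.shape}")
np.testing.assert_allclose(rebuilt, single_image)
```

Here the patch count is (6 - 2 + 1) * (8 - 2 + 1) = 35, and the reconstruction matches the input because every overlapping patch carries the same pixel values.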

@Song-Pingfan Are you interested in opening a pull request to update the docstring?

github-actions bot added the Needs Triage Issue requires triage label Jan 16, 2023

thomasjpfan added Documentation module:feature_extraction and removed Needs Triage Issue requires triage labels Feb 24, 2023

murezzda mentioned this issue Mar 28, 2023

DOC improve example of PatchExtractor #26002

Merged

glemaitre closed this as completed in #26002 Mar 29, 2023
