add fast image processor nougat #37661

NahieliV · 2025-04-21T20:03:26Z

What does this PR do?

Adds a fast image processor for the Nougat model.

#36978

Before submitting

[ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
[x] Did you read the contributor guideline, Pull Request section?
[x] Was this discussed/approved via a GitHub issue or the forum?
[ ] Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
[ ] Did you write any new necessary tests?

Who can review?

@yonigozlan

github-actions · 2025-04-21T20:03:43Z

Hi 👋, thank you for opening this pull request! The pull request is converted to draft by default. The CI will be paused while the PR is in draft mode. When it is ready for review, please click the Ready for review button (at the bottom of the PR page). This will assign reviewers and trigger CI.

yonigozlan

Hi @NahieliV , thanks for your contribution! Noted a few things to change :)

src/transformers/models/nougat/image_processing_nougat_fast.py

yonigozlan · 2025-04-22T19:04:10Z

tests/models/nougat/test_image_processing_nougat.py

+def to_channels_first(image):
+    """
+    Converts a NumPy image from channels-last (H, W, C) to channels-first (C, H, W)
+    if needed. Leaves PyTorch tensors and non-NumPy types unchanged.
+    """
+    if isinstance(image, np.ndarray):
+        if image.ndim == 3 and image.shape[-1] in [1, 3]:
+            return torch.tensor(np.transpose(image, (2, 0, 1)))
+        return torch.tensor(image)
+    return image
+
+


This shouldn't be needed if you make sure that all input images are channels_first.

I removed this from the tests of the align_long_axis method, which does return a channels-first image when the input image is channels-first.
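A minimal NumPy sketch of long-axis alignment on a channels-first array, showing why the layout is preserved: the 90-degree rotation is applied over the spatial axes only. The function name align_long_axis_chw is hypothetical, and the rotation direction (np.rot90 with k=3) is an assumption rather than a copy of the Nougat processor's code.

```python
import numpy as np


def align_long_axis_chw(image: np.ndarray, output_height: int, output_width: int) -> np.ndarray:
    """Hypothetical helper: rotate a (C, H, W) image by 90 degrees when its
    long axis does not match the long axis of the target size."""
    _, input_height, input_width = image.shape
    # Rotate only when the target is portrait and the input landscape, or vice versa.
    if (output_width < output_height and input_width > input_height) or (
        output_width > output_height and input_width < input_height
    ):
        # Rotate over the spatial axes (1, 2); axis 0 (channels) is left untouched.
        image = np.rot90(image, k=3, axes=(1, 2))
    return image


page = np.zeros((3, 4, 6), dtype=np.uint8)  # channels-first, landscape (H=4, W=6)
aligned = align_long_axis_chw(page, output_height=10, output_width=5)
print(aligned.shape)  # (3, 6, 4): rotated, still channels-first
```

Because axes=(1, 2) never touches axis 0, a (C, H, W) input comes back either unchanged or as (C, W, H), never channels-last.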

However, there is a small bug in the original implementation of the crop_margin method when the image is uniform, i.e. when max_val == min_val, with:

max_val = data.max()
min_val = data.min()

Even when we set data_format="channels_first", the image is returned channels-last. This is because the input data format is inferred at the beginning, and to_pil_image then converts the image to channels-last. When to_channel_dimension_format is called afterwards, data_format and input_data_format are the same (input_data_format still holds the format inferred before the conversion), so the function just returns the image unchanged.

I can raise an issue and fix it if you agree.

I see, nice catch! If you could fix the issue in this PR that would be great. Thanks!
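The channels-last bug described above can be reproduced without Transformers. In this sketch, to_channel_dimension_format is a simplified, hypothetical stand-in that only mirrors the "return the image unchanged when both formats match" behavior, and pil_round_trip mimics converting to a PIL image and back with np.array, which always yields (H, W, C). The "fixed" path shows one possible fix (declaring the array's real layout after the round trip); it is not necessarily the change made in this PR.

```python
import numpy as np


def to_channel_dimension_format(image, data_format, input_data_format):
    """Simplified stand-in: transpose only when the requested and declared formats differ."""
    if data_format == input_data_format:
        return image  # no-op: trusts input_data_format without checking the array
    if data_format == "channels_first":
        return image.transpose(2, 0, 1)  # (H, W, C) -> (C, H, W)
    return image.transpose(1, 2, 0)  # (C, H, W) -> (H, W, C)


def pil_round_trip(image_chw):
    """Mimics to_pil_image(...) followed by np.array(...): the result is (H, W, C)."""
    return image_chw.transpose(1, 2, 0)


uniform = np.full((3, 4, 5), 255, dtype=np.uint8)  # channels-first, max == min

# Buggy path: reuses the format inferred before the PIL round trip.
buggy = to_channel_dimension_format(pil_round_trip(uniform), "channels_first", "channels_first")
# Fixed path: declares the layout the array actually has after the round trip.
fixed = to_channel_dimension_format(pil_round_trip(uniform), "channels_first", "channels_last")

print(buggy.shape)  # (4, 5, 3): still channels-last
print(fixed.shape)  # (3, 4, 5)
```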

yonigozlan

Thanks for iterating! Still a few changes left to make, but overall almost ready to go!

src/transformers/models/nougat/image_processing_nougat_fast.py

NahieliV · 2025-05-01T17:15:28Z

@yonigozlan, the changes are done. There are small differences due to the different implementations of BICUBIC interpolation in PIL and PyTorch. In test_slow_fast_equivalence, 16 pixels differ by more than 1e-1, and the maximum pixel difference is 0.18.
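The two figures quoted above (number of pixels over a threshold, maximum absolute difference) can be computed along the lines of this sketch. summarize_pixel_diff is a hypothetical helper, and the arrays are synthetic, not the outputs of the real slow and fast processors.

```python
import numpy as np


def summarize_pixel_diff(slow: np.ndarray, fast: np.ndarray, threshold: float = 1e-1):
    """Hypothetical helper: count elements whose absolute difference exceeds
    `threshold`, and return the maximum absolute difference."""
    diff = np.abs(slow.astype(np.float64) - fast.astype(np.float64))
    return int((diff > threshold).sum()), float(diff.max())


# Synthetic arrays for illustration only.
slow = np.zeros((3, 8, 8))
fast = slow.copy()
fast[0, 0, :2] = 0.15  # two elements above the threshold
fast[1, 2, 3] = 0.05  # one element below it

count, max_diff = summarize_pixel_diff(slow, fast)
print(count, max_diff)  # 2 0.15
```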

yonigozlan

Hi @NahieliV! Sorry for the delay, and thanks for iterating! Looks ready to merge to me :). The very last thing to do is to add a comment explaining why the difference with the slow processor is larger than usual, and to override the equivalence tests with a higher threshold.
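One way to structure "override the equivalence test with a higher threshold" is a shared test mixin with a default tolerance that a model-specific test case raises. Everything here is hypothetical (class names, the atol values, the synthetic outputs); the real Transformers test mixins may be organized differently.

```python
import unittest

import numpy as np


class SlowFastEquivalenceMixin:
    """Hypothetical shared test comparing slow and fast processor outputs."""

    atol = 1e-1  # illustrative default tolerance

    def get_slow_and_fast_outputs(self):
        raise NotImplementedError

    def test_slow_fast_equivalence(self):
        slow, fast = self.get_slow_and_fast_outputs()
        np.testing.assert_allclose(slow, fast, rtol=0, atol=self.atol)


class NougatStyleEquivalenceTest(SlowFastEquivalenceMixin, unittest.TestCase):
    # PIL and PyTorch implement bicubic resizing differently, and (per the review)
    # the slow processor resizes with reducing_gap=2.0, so a looser bound is used.
    atol = 2e-1  # illustrative: just above the 0.18 maximum difference reported

    def get_slow_and_fast_outputs(self):
        # Synthetic stand-ins for the two processors' pixel values.
        slow = np.zeros((1, 3, 4, 4), dtype=np.float32)
        fast = slow.copy()
        fast[0, 0, 0, 0] = 0.18
        return slow, fast
```

Keeping the tolerance as a class attribute means only the model-specific test case changes, while the comparison logic stays shared.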

yonigozlan · 2025-05-13T16:00:35Z

src/transformers/models/nougat/image_processing_nougat_fast.py

+
+        new_size = (height, width)
+
+        return F.resize(image, new_size, interpolation=F.InterpolationMode.BICUBIC)


Indeed, the difference from the slow image processor probably comes from here: the slow processor uses reducing_gap=2.0. Could you add a comment just above this line explaining this?

Should be good now.

yonigozlan

Thanks @NahieliV for iterating on this! I had to make some small changes, mainly because of recent updates in Transformers, but LGTM! Waiting for the PR to be green, then I'll merge.

HuggingFaceDocBuilderDev · 2025-06-26T21:53:23Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

github-actions bot marked this pull request as draft April 21, 2025 20:03

NahieliV mentioned this pull request Apr 21, 2025

[Contributions Welcome] Add Fast Image Processors #36978 (open, 78 tasks)

yonigozlan reviewed Apr 22, 2025

NahieliV force-pushed the adding_nougat_fast_processor branch from 86305b0 to 9aa40ea April 27, 2025 17:28

NahieliV marked this pull request as ready for review April 27, 2025 17:56

yonigozlan reviewed Apr 28, 2025

NahieliV force-pushed the adding_nougat_fast_processor branch 2 times, most recently from 41e8fab to ea91a91 May 1, 2025 16:55

yonigozlan reviewed May 13, 2025

NahieliV added 6 commits May 21, 2025 20:22

add fast image processor nougat

19e6b17

test fixes

b1fd641

docstring white space

f7fb38e

last fixes

2bf4253

docstring_type

a155fb7

tolerance unit test

b4dd0fe

NahieliV force-pushed the adding_nougat_fast_processor branch from ea91a91 to b4dd0fe May 21, 2025 19:59

NahieliV and others added 8 commits May 21, 2025 23:06

fix tolerance

cdd8d7d

fix rtol

6515593

remove traling white space

273372e

remove white space

08ff310

note for tolerance unit test

122349d

fix tests

f5c4a5e

Merge remote-tracking branch 'upstream/main' into adding_nougat_fast_processor

234e449

remove print

d63e24e

yonigozlan approved these changes Jun 26, 2025

Merge branch 'main' into adding_nougat_fast_processor

7fd75c6

yonigozlan enabled auto-merge (squash) June 27, 2025 14:27

yonigozlan merged commit 4336ecd into huggingface:main Jun 27, 2025
20 checks passed
