Add Fast Image Processor for vilt #37304

devxaitist · 2025-04-05T15:34:57Z

What does this PR do?

Add ViltImageProcessorFast implementation for faster image processing using PyTorch. (#36978)

This PR adds a fast image processor for the Vilt model that leverages PyTorch and torchvision functions instead of PIL/numpy. The implementation improves performance by using tensor operations and enabling GPU processing.

Before submitting

This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
Did you read the contributor guideline, Pull Request section?
Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Did you write any new necessary tests?

Who can review?

@yonigozlan

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.

github-actions · 2025-04-05T15:35:08Z

Hi 👋, thank you for opening this pull request! The pull request is converted to draft by default. The CI will be paused while the PR is in draft mode. When it is ready for review, please click the Ready for review button (at the bottom of the PR page). This will assign reviewers and trigger CI.

yonigozlan

Hi @devxaitist, thanks for contributing, this looks great! There are a few things to modify to make the processor cleaner.

src/transformers/models/vilt/image_processing_vilt_fast.py

tests/models/vilt/test_image_processing_vilt.py

devxaitist · 2025-04-10T13:57:43Z

> Hi @devxaitist, thanks for contributing, this looks great! There are a few things to modify to make the processor cleaner.

Hi @yonigozlan, thank you for pointing out the need to separate functionality according to the single responsibility principle. Breaking down the _preprocess function into specialized methods like _resize and _pad_batch was definitely the right approach. I appreciate your guidance on improving the code structure.

I've implemented all the requested changes, so please review the updated code when you have a chance.
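For readers following the refactor, here is a minimal sketch of what a dedicated batch-padding step might look like. It is framework-free (nested lists instead of torch tensors) so it runs anywhere; the function name `pad_batch`, the bottom/right padding, and the mask convention (1 for real pixels, 0 for padding) are assumptions for illustration, not the code in this PR.

```python
def pad_batch(images, pad_value=0.0):
    """Pad a batch of [C][H][W] nested-list images to a common size.

    Hypothetical helper: pads on the bottom and right, and returns
    (padded_images, pixel_masks) as a tuple.
    """
    max_h = max(len(img[0]) for img in images)
    max_w = max(len(img[0][0]) for img in images)

    padded, masks = [], []
    for img in images:
        h, w = len(img[0]), len(img[0][0])
        padded.append([
            # Extend every existing row to max_w, then append full padding rows.
            [row + [pad_value] * (max_w - w) for row in channel]
            + [[pad_value] * max_w for _ in range(max_h - h)]
            for channel in img
        ])
        # Mask is 1 where the original image has pixels, 0 where padding was added.
        masks.append([
            [1 if (y < h and x < w) else 0 for x in range(max_w)]
            for y in range(max_h)
        ])
    return padded, masks


# One 1x2x2 image and one 1x1x1 image.
batch = [[[[1.0, 2.0], [3.0, 4.0]]], [[[5.0]]]]
padded_images, pixel_masks = pad_batch(batch)
```

Keeping padding in its own function means the resize and padding logic can be tested separately, which is the separation the review asked for.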

devxaitist · 2025-04-28T14:07:33Z

Hi @yonigozlan, I've updated the branch to reflect the latest changes in the main branch. When you have a moment, I'd appreciate your review on the updated PR. Thank you!

yonigozlan

Thanks for iterating @devxaitist! There are still some changes to make. It would also be great to override the test_slow_fast_equivalence functions in the tests so that they also compare the padding masks.
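As an illustration of the requested check, the sketch below compares both outputs of a slow and a fast processor: pixel values within a tolerance, and padding masks exactly. The helper name `assert_outputs_equivalent`, the tolerance, and the use of plain nested lists are assumptions; the actual test in the PR overrides the existing test methods and works on tensors.

```python
import math


def flatten(values):
    """Yield the scalar leaves of an arbitrarily nested list or tuple."""
    if isinstance(values, (list, tuple)):
        for item in values:
            yield from flatten(item)
    else:
        yield values


def assert_outputs_equivalent(slow, fast, atol=1e-1):
    """Hypothetical helper: compare pixel values approximately and masks exactly.

    Shape checks are omitted for brevity; only flattened contents are compared.
    """
    slow_px = list(flatten(slow["pixel_values"]))
    fast_px = list(flatten(fast["pixel_values"]))
    assert len(slow_px) == len(fast_px), "pixel_values differ in size"
    assert all(
        math.isclose(a, b, abs_tol=atol) for a, b in zip(slow_px, fast_px)
    ), "pixel_values differ beyond tolerance"

    # Masks are integer-valued, so they should match exactly.
    assert list(flatten(slow["pixel_mask"])) == list(
        flatten(fast["pixel_mask"])
    ), "pixel_mask differs"


slow_output = {"pixel_values": [[0.5, 0.25]], "pixel_mask": [[1, 0]]}
fast_output = {"pixel_values": [[0.5001, 0.25]], "pixel_mask": [[1, 0]]}
assert_outputs_equivalent(slow_output, fast_output, atol=1e-3)
```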

src/transformers/models/vilt/image_processing_vilt_fast.py

tests/models/vilt/test_image_processing_vilt.py

devxaitist · 2025-04-29T14:38:41Z

Thank you so much for your valuable feedback, @yonigozlan ! I've implemented all the requested changes, including overriding the test_slow_fast_equivalence functions to compare padding masks.

I greatly appreciate your time and guidance throughout this process. This is my first open-source contribution, and your feedback has been incredibly helpful for my learning.

When you have a moment, I'd be grateful if you could review these latest changes. Thank you again for your patience and support!

yonigozlan

Thanks for iterating @devxaitist, this looks much better! I suggested a small improvement to the padding function; other than that, LGTM!
You'll also have to rebase on main and change how the docstrings are handled, as this was recently updated: replace @add_start_docstrings with @auto_docstring from ...utils, and move the docstrings of the custom args to the ViltFastImageProcessorKwargs class. You can look at other fast image processors to see how it's done.
Thanks again for the great work!

src/transformers/models/vilt/image_processing_vilt_fast.py

devxaitist · 2025-05-13T13:38:04Z

Thank you for your continuous support and detailed reviews, @yonigozlan ! I've implemented all the suggested changes, including the reorder_images approach for both masks and images. The tests are passing smoothly, and the code looks much cleaner now.

I'm both excited and a bit nostalgic as we're approaching the final review. Your thorough feedback throughout this process has been incredibly valuable, and I've learned so much from our interactions. It's been a wonderful journey contributing to the transformers library, and I'm grateful for your patience and guidance.

Could you please take a look at these final changes when you have a moment? Thank you again for making this first open-source contribution such a meaningful experience!
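The reorder step mentioned above pairs with shape-based grouping: images of the same shape are batched and processed together, then results are put back in the original order. The sketch below shows that idea with plain Python stand-ins; `group_by_shape` and `reorder` are hypothetical names written for this example and are not the library's `reorder_images` implementation.

```python
from collections import defaultdict


def group_by_shape(images):
    """Group H x W nested-list images by shape, remembering where each came from."""
    groups = defaultdict(list)
    index = []  # per original position: (shape, position inside that shape's group)
    for image in images:
        shape = (len(image), len(image[0]))
        index.append((shape, len(groups[shape])))
        groups[shape].append(image)
    return dict(groups), index


def reorder(processed_groups, index):
    """Restore the original order from per-shape groups of processed items."""
    return [processed_groups[shape][pos] for shape, pos in index]


images = [[[1, 2]], [[3], [4]], [[5, 6]]]  # shapes (1, 2), (2, 1), (1, 2)
grouped, index = group_by_shape(images)

# Process each group as a batch; here "processing" just doubles the values
# and builds an all-ones mask with the same shape as each image.
processed_images = {
    shape: [[[2 * v for v in row] for row in img] for img in group]
    for shape, group in grouped.items()
}
processed_masks = {
    shape: [[[1 for _ in row] for row in img] for img in group]
    for shape, group in grouped.items()
}

# The same index restores the order for both images and masks.
ordered_images = reorder(processed_images, index)
ordered_masks = reorder(processed_masks, index)
```

Using one index for both outputs keeps each mask aligned with its image, which is presumably why the same reorder function is applied to both.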

yonigozlan

Very nice, LGTM!
I'm glad to hear you enjoyed working on this PR, and congrats on your first open-source contribution! 🤗

HuggingFaceDocBuilderDev · 2025-05-13T15:41:31Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

* init vilt image processor fast
* Refactor image processor tests to use loop for all processors
* Add ViltImageProcessorFast with PyTorch-based optimized image processing
* Change made automatically by make fixup command
* Change made automatically by make fix-copies command
* Fix type hints in ViltImageProcessorFast for Python compatibility
* Define constants for image resizing based on COCO dataset aspect ratio
* Add missing property initializations to ViltImageProcessorFast
* Extract resize logic into dedicated method in ViltImageProcessorFast
* Extract padding logic into dedicated method
* Implement shape-based image grouping for optimized processing in Vilt
* Update test suite to verify ViltImageProcessorFast attributes
* Move variable declarations to _preprocess method parameters
* Remove unused parameters
* Rename _resize method to resize to override existing function
* Remove whitespace
* Remove unnecessary type check and conversion for stacked_images
* Remove redundant loop and apply padding directly to stacked images
* Refactor pad function to return images and mask as tuple instead of dict
* Add tests comparing padding masks in slow and fast implementations
* Update ViltImageProcessor tests to ensure compatibility between slow and fast implementations
* Replace add_start_docstrings with auto_docstring in ViltImageProcessorFast
* Move docstrings of custom args to ViltFastImageProcessorKwargs
* Use reorder_images function for both masks and images

---------

Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>
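One commit above mentions resize constants based on the COCO aspect ratio. As a rough sketch of how such constants could drive the output size, the function below scales the shorter side to a target, caps the longer side at a 1333/800 multiple of it, and rounds both sides down to a multiple of a divisor. The function name, the default of 384 for the shorter side, the divisor of 32, and the 800/1333 bounds are assumptions for this example, not values taken from the PR diff.

```python
def vilt_output_size(height, width, shorter=384, size_divisor=32):
    """Hypothetical helper: compute a resize target that keeps the aspect ratio."""
    # Cap for the longer side, derived from an assumed 800 x 1333 reference size.
    longer = int(1333 / 800 * shorter)

    # Scale so the shorter side matches the target.
    scale = shorter / min(height, width)
    new_h, new_w = height * scale, width * scale

    # If the longer side overshoots the cap, scale both sides down again.
    if max(new_h, new_w) > longer:
        scale = longer / max(new_h, new_w)
        new_h, new_w = new_h * scale, new_w * scale

    # Round to the nearest integer, then down to a multiple of size_divisor.
    new_h, new_w = int(new_h + 0.5), int(new_w + 0.5)
    return new_h // size_divisor * size_divisor, new_w // size_divisor * size_divisor


landscape = vilt_output_size(480, 640)
panorama = vilt_output_size(100, 1000)
```

Here a 480 x 640 input maps to 384 x 512, while a very wide 100 x 1000 input hits the cap on the longer side and shrinks to 64 x 608.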

github-actions bot marked this pull request as draft April 5, 2025 15:35

devxaitist changed the title ~~Add vilt fast image processor~~ Add Fast Image Processor for vilt Apr 5, 2025

devxaitist force-pushed the add-vilt-fast-image-processor branch from 73dc130 to a3777b6 Compare April 5, 2025 15:47

yonigozlan mentioned this pull request Apr 7, 2025

[Contributions Welcome] Add Fast Image Processors #36978

Open

78 tasks

devxaitist marked this pull request as ready for review April 9, 2025 14:29

github-actions bot requested review from ydshieh and yonigozlan April 9, 2025 14:30

devxaitist force-pushed the add-vilt-fast-image-processor branch from 40019bf to 80fa011 Compare April 9, 2025 15:02

devxaitist added 6 commits April 10, 2025 00:13

init vilt image processor fast

d324e9f

Refactor image processor tests to use loop for all processors

537fc65

Add ViltImageProcessorFast with PyTorch-based optimized image processing

f63b103

Change made automatically by make fixup command

9672598

Change made automatically by make fix-copies command

f165906

Fix type hints in ViltImageProcessorFast for Python compatibility

e7fba0b

devxaitist force-pushed the add-vilt-fast-image-processor branch from 80fa011 to 53a4b63 Compare April 9, 2025 15:13

Define constants for image resizing based on COCO dataset aspect ratio

fe9cff4

devxaitist force-pushed the add-vilt-fast-image-processor branch from 53a4b63 to fe9cff4 Compare April 9, 2025 15:23

yonigozlan reviewed Apr 9, 2025

Add missing property initializations to ViltImageProcessorFast

ff1527a

devxaitist force-pushed the add-vilt-fast-image-processor branch from 51a9a2b to d5a82e8 Compare April 28, 2025 13:40

devxaitist added 4 commits April 28, 2025 22:47

Extract resize logic into dedicated method in ViltImageProcessorFast

f6e5150

Extract padding logic into dedicated method

08cf9a5

Implement shape-based image grouping for optimized processing in Vilt

5b4ba5d

Update test suite to verify ViltImageProcessorFast attributes

817cc72

devxaitist force-pushed the add-vilt-fast-image-processor branch from d5a82e8 to 817cc72 Compare April 28, 2025 13:47

devxaitist and others added 2 commits April 28, 2025 22:51

Merge branch 'main' into add-vilt-fast-image-processor

db3297b

# Conflicts: # src/transformers/__init__.py # src/transformers/utils/dummy_torchvision_objects.py

Merge branch 'main' into add-vilt-fast-image-processor

b6193ad

yonigozlan reviewed Apr 28, 2025

devxaitist force-pushed the add-vilt-fast-image-processor branch from cbff2ff to b6193ad Compare April 29, 2025 12:49

devxaitist added 6 commits April 29, 2025 21:50

Merge branch 'main' into add-vilt-fast-image-processor

bdbdc71

Move variable declarations to _preprocess method parameters

5f5cae4

Remove unused parameters

cee0161

Rename _resize method to resize to override existing function

20f9381

Remove whitespace

6802f39

Remove unnecessary type check and conversion for stacked_images

8fbb77b

devxaitist force-pushed the add-vilt-fast-image-processor branch 2 times, most recently from d500b11 to cbc4201 Compare April 29, 2025 14:06

devxaitist added 4 commits April 29, 2025 23:20

Remove redundant loop and apply padding directly to stacked images

6fc966b

Refactor pad function to return images and mask as tuple instead of dict

9d2653b

Add tests comparing padding masks in slow and fast implementations

137eb34

Update ViltImageProcessor tests to ensure compatibility between slow and fast implementations

4c981ec

devxaitist force-pushed the add-vilt-fast-image-processor branch from cbc4201 to 4c981ec Compare April 29, 2025 14:20

yonigozlan reviewed May 9, 2025

devxaitist added 4 commits May 13, 2025 21:48

Merge branch 'main' into add-vilt-fast-image-processor

627a69b

Replace add_start_docstrings with auto_docstring in ViltImageProcessorFast

8b27866

Move docstrings of custom args to ViltFastImageProcessorKwargs

e67cb3d

Use reorder_images function for both masks and images

c85e8d4

yonigozlan approved these changes May 13, 2025

Merge branch 'main' into add-vilt-fast-image-processor

8440bb3

yonigozlan enabled auto-merge (squash) May 13, 2025 15:31

yonigozlan merged commit 342961f into huggingface:main May 13, 2025
20 checks passed
