8000 Add Fast Image Processor for vilt by devxaitist · Pull Request #37304 · huggingface/transformers · GitHub
[go: up one dir, main page]

Skip to content

Add Fast Image Processor for vilt #37304

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 29 commits into from
May 13, 2025

Conversation

devxaitist
Copy link
Contributor

What does this PR do?

Add ViltImageProcessorFast implementation for faster image processing using PyTorch. (#36978)

This PR adds a fast image processor for the Vilt model that leverages PyTorch and torchvision functions instead of PIL/numpy. The implementation improves performance by using tensor operations and enabling GPU processing.

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a Github issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

@yonigozlan

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@github-actions github-actions bot marked this pull request as draft April 5, 2025 15:35
Copy link
Contributor
github-actions bot commented Apr 5, 2025

Hi 👋, thank you for opening this pull request! The pull request is converted to draft by default. The CI will be paused while the PR is in draft mode. When it is ready for review, please click the Ready for review button (at the bottom of the PR page). This will assign reviewers and trigger CI.

@devxaitist devxaitist changed the title Add vilt fast image processor Add Fast Image Processor for vilt Apr 5, 2025
@devxaitist devxaitist force-pushed the add-vilt-fast-image-processor branch from 73dc130 to a3777b6 Compare April 5, 2025 15:47
@devxaitist devxaitist marked this pull request as ready for review April 9, 2025 14:29
@github-actions github-actions bot requested review from ydshieh and yonigozlan April 9, 2025 14:30
@devxaitist devxaitist force-pushed the add-vilt-fast-image-processor branch from 40019bf to 80fa011 Compare April 9, 2025 15:02
@devxaitist devxaitist force-pushed the add-vilt-fast-image-processor branch from 80fa011 to 53a4b63 Compare April 9, 2025 15:13
@devxaitist devxaitist force-pushed the add-vilt-fast-image-processor branch from 53a4b63 to fe9cff4 Compare April 9, 2025 15:23
Copy link
Member
@yonigozlan yonigozlan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @devxaitist , thanks for contributing, Looks great! Some things to modifiy to make the processor cleaner

@devxaitist
Copy link
Contributor Author
devxaitist commented Apr 10, 2025

Hi @devxaitist , thanks for contributing, Looks great! Some things to modifiy to make the processor cleaner

Hi @yonigozlan, thank you for pointing out the need to separate functionality according to the single responsibility principle. Breaking down the _preprocess function into specialized methods like _resize and _pad_batch was definitely the right approach. I appreciate your guidance on improving the code structure.

I've implemented all the requested changes, so please review the updated code when you have a chance.

@devxaitist devxaitist force-pushed the add-vilt-fast-image-processor branch from 51a9a2b to d5a82e8 Compare April 28, 2025 13:40
@devxaitist devxaitist force-pushed the add-vilt-fast-image-processor branch from d5a82e8 to 817cc72 Compare April 28, 2025 13:47
devxaitist and others added 2 commits April 28, 2025 22:51
# Conflicts:
#	src/transformers/__init__.py
#	src/transformers/utils/dummy_torchvision_objects.py
@devxaitist
Copy link
Contributor Author

Hi @yonigozlan, I've updated the branch to reflect the latest changes in the main branch. When you have a moment, I'd appreciate your review on the updated PR. Thank you!

Copy link
Member
@yonigozlan yonigozlan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for iterating @devxaitist ! Still changes to be done, also it would be great to override the test_slow_fast_equivalence functions in the test to also compare the padding masks.

@devxaitist devxaitist force-pushed the add-vilt-fast-image-processor branch from cbff2ff to b6193ad Compare April 29, 2025 12:49
@devxaitist devxaitist force-pushed the add-vilt-fast-image-processor branch 2 times, most recently from d500b11 to cbc4201 Compare April 29, 2025 14:06
@devxaitist devxaitist force-pushed the add-vilt-fast-image-processor branch from cbc4201 to 4c981ec Compare April 29, 2025 14:20
@devxaitist
Copy link
Contributor Author

Thank you so much for your valuable feedback, @yonigozlan ! I've implemented all the requested changes, including overriding the test_slow_fast_equivalence functions to compare padding masks.

I greatly appreciate your time and guidance throughout this process. This is my first open-source contribution, and your feedback has been incredibly helpful for my learning.

When you have a moment, I'd be grateful if you could review these latest changes. Thank you again for your patience and support!

Copy link
Member
@yonigozlan yonigozlan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for iterating @devxaitist looks much better! I suggested a small improvement in the padding function, other than that LGTM!
You'll also have to rebase on main, and change the way the docstrings are handled, as this was recently updated. You'll need to replace the @add_start_docstrings with @auto_docstring from ...utils, and move the docstrings of the custom args to the ViltFastImageProcessorKwargs class. You can look at other fast image processors to see how it's done.
Thanks again for the great work!

@devxaitist
Copy link
Contributor Author

Thank you for your continuous support and detailed reviews, @yonigozlan ! I've implemented all the suggested changes, including the reorder_images approach for both masks and images. The tests are passing smoothly, and the code looks much cleaner now.

I'm both excited and a bit nostalgic as we're approaching the final review. Your thorough feedback throughout this process has been incredibly valuable, and I've learned so much from our interactions. It's been a wonderful journey contributing to the transformers library, and I'm grateful for your patience and guidance.

Could you please take a look at these final changes when you have a moment? Thank you again for making this first open-source contribution such a meaningful experience!

Copy link
Member
@yonigozlan yonigozlan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice, LGTM!
I'm glad to hear you enjoyed working on this PR, and congrats on your first open-source contribution! 🤗

@yonigozlan yonigozlan enabled auto-merge (squash) May 13, 2025 15:31
@yonigozlan yonigozlan merged commit 342961f into huggingface:main May 13, 2025
20 checks passed
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

zucchini-nlp pushed a commit to zucchini-nlp/transformers that referenced this pull request May 14, 2025
* init vilt image processor fast

* Refactor image processor tests to use loop for all processors

* Add ViltImageProcessorFast with PyTorch-based optimized image processing

* Change made automatically by make fixup command

* Change made automatically by make fix-copies command

* Fix type hints in ViltImageProcessorFast for Python compatibility

* Define constants for image resizing based on COCO dataset aspect ratio

* Add missing property initializations to ViltImageProcessorFast

* Extract resize logic into dedicated method in ViltImageProcessorFast

* Extract padding logic into dedicated method

* Implement shape-based image grouping for optimized processing in Vilt

* Update test suite to verify ViltImageProcessorFast attributes

* Move variable declarations to _preprocess method parameters

* Remove unused parameters

* Rename _resize method to resize to override existing function

* Remove whitespace

* Remove unnecessary type check and conversion for stacked_images

* Remove redundant loop and apply padding directly to stacked images

* Refactor pad function to return images and mask as tuple instead of dict

* Add tests comparing padding masks in slow and fast implementations

* Update ViltImageProcessor tests to ensure compatibility between slow and fast implementations

* Replace add_start_docstrings with auto_docstring in ViltImageProcessorFast

* Move docstrings of custom args to ViltFastImageProcessorKwargs

* Use reorder_images function for both masks and images

---------

Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants
0