-
Notifications
You must be signed in to change notification settings - Fork 29.8k
Add Fast Image Processor for vilt #37304
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Fast Image Processor for vilt #37304
Conversation
Hi 👋, thank you for opening this pull request! The pull request is converted to draft by default. The CI will be paused while the PR is in draft mode. When it is ready for review, please click the |
73dc130
to
a3777b6
Compare
40019bf
to
80fa011
Compare
80fa011
to
53a4b63
Compare
53a4b63
to
fe9cff4
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @devxaitist , thanks for contributing, Looks great! Some things to modifiy to make the processor cleaner
Hi @yonigozlan, thank you for pointing out the need to separate functionality according to the single responsibility principle. Breaking down the _preprocess function into specialized methods like _resize and _pad_batch was definitely the right approach. I appreciate your guidance on improving the code structure. I've implemented all the requested changes, so please review the updated code when you have a chance. |
51a9a2b
to
d5a82e8
Compare
d5a82e8
to
817cc72
Compare
# Conflicts: # src/transformers/__init__.py # src/transformers/utils/dummy_torchvision_objects.py
Hi @yonigozlan, I've updated the branch to reflect the latest changes in the main branch. When you have a moment, I'd appreciate your review on the updated PR. Thank you! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for iterating @devxaitist ! Still changes to be done, also it would be great to override the test_slow_fast_equivalence functions in the test to also compare the padding masks.
cbff2ff
to
b6193ad
Compare
d500b11
to
cbc4201
Compare
…and fast implementations
cbc4201
to
4c981ec
Compare
Thank you so much for your valuable feedback, @yonigozlan ! I've implemented all the requested changes, including overriding the test_slow_fast_equivalence functions to compare padding masks. I greatly appreciate your time and guidance throughout this process. This is my first open-source contribution, and your feedback has been incredibly helpful for my learning. When you have a moment, I'd be grateful if you could review these latest changes. Thank you again for your patience and support! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for iterating @devxaitist looks much better! I suggested a small improvement in the padding function, other than that LGTM!
You'll also have to rebase on main, and change the way the docstrings are handled, as this was recently updated. You'll need to replace the @add_start_docstrings
with @auto_docstring
from ...utils
, and move the docstrings of the custom args to the ViltFastImageProcessorKwargs
class. You can look at other fast image processors to see how it's done.
Thanks again for the great work!
Thank you for your continuous support and detailed reviews, @yonigozlan ! I've implemented all the suggested changes, including the reorder_images approach for both masks and images. The tests are passing smoothly, and the code looks much cleaner now. I'm both excited and a bit nostalgic as we're approaching the final review. Your thorough feedback throughout this process has been incredibly valuable, and I've learned so much from our interactions. It's been a wonderful journey contributing to the transformers library, and I'm grateful for your patience and guidance. Could you please take a look at these final changes when you have a moment? Thank you again for making this first open-source contribution such a meaningful experience! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very nice, LGTM!
I'm glad to hear you enjoyed working on this PR, and congrats on your first open-source contribution! 🤗
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update. |
* init vilt image processor fast * Refactor image processor tests to use loop for all processors * Add ViltImageProcessorFast with PyTorch-based optimized image processing * Change made automatically by make fixup command * Change made automatically by make fix-copies command * Fix type hints in ViltImageProcessorFast for Python compatibility * Define constants for image resizing based on COCO dataset aspect ratio * Add missing property initializations to ViltImageProcessorFast * Extract resize logic into dedicated method in ViltImageProcessorFast * Extract padding logic into dedicated method * Implement shape-based image grouping for optimized processing in Vilt * Update test suite to verify ViltImageProcessorFast attributes * Move variable declarations to _preprocess method parameters * Remove unused parameters * Rename _resize method to resize to override existing function * Remove whitespace * Remove unnecessary type check and conversion for stacked_images * Remove redundant loop and apply padding directly to stacked images * Refactor pad function to return images and mask as tuple instead of dict * Add tests comparing padding masks in slow and fast implementations * Update ViltImageProcessor tests to ensure compatibility between slow and fast implementations * Replace add_start_docstrings with auto_docstring in ViltImageProcessorFast * Move docstrings of custom args to ViltFastImageProcessorKwargs * Use reorder_images function for both masks and images --------- Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>
What does this PR do?
Add ViltImageProcessorFast implementation for faster image processing using PyTorch. (#36978)
This PR adds a fast image processor for the Vilt model that leverages PyTorch and torchvision functions instead of PIL/numpy. The implementation improves performance by using tensor operations and enabling GPU processing.
Before submitting
Pull Request section?
to it if that's the case.
documentation guidelines, and
here are tips on formatting docstrings.
Who can review?
@yonigozlan
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.