Add support for Florence-2 by hlky · Pull Request #34160 · huggingface/transformers · GitHub
Add support for Florence-2 #34160

Closed
hlky wants to merge 19 commits from the add-florence2 branch

Conversation

@hlky (Contributor) commented Oct 14, 2024

What does this PR do?

This PR adds support for Florence-2.

Compared to the existing remote code, the main differences are the removal of MySequential and of einops.rearrange; trunc_normal_ and DropPath are copied from timm.
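
For illustration, an einops.rearrange call of the kind removed here can usually be expressed with plain tensor ops; a minimal sketch with made-up shapes, not the exact code in this PR:

import torch

# einops.rearrange(x, "b (h w) c -> b c h w", h=H, w=W) written with native torch ops
B, H, W, C = 2, 8, 8, 64
x = torch.randn(B, H * W, C)
y = x.transpose(1, 2).reshape(B, C, H, W)  # move channels first, then split the token axis into (H, W)
print(y.shape)  # torch.Size([2, 64, 8, 8])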

Fixes #34155

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@hlky force-pushed the add-florence2 branch 2 times, most recently from 033979d to af0fd69, on October 14, 2024 16:10
@LysandreJik (Member):

Thanks! @Rocketknight1 would you be down to do a first review?

@Rocketknight1 (Member):

Sure! Taking it today.

@Rocketknight1 (Member) left a comment:

Hi! I just finished a partial review of this PR. However, I've realized that probably over 50% of the modelling code is copied from BART. We actually have a brand new method for handling cases like this, to both simplify the library and the PR! That method is Modular transformers.

If you want to try this in modular transformers style, you can replace the modeling_florence2.py file with modular_florence2.py. In this file, you can import modules from other classes in transformers, like BartEncoderLayer, and then simply define your layers like this, without any Copied from statements:

class Florence2EncoderLayer(BartEncoderLayer):
    pass

Thus, the only layers the modular_florence2.py file needs to contain are the ones that are unique, and everything else can be imported. You can see an example of this in action in this PR. If you do it this way, then the modeling_florence2.py will be autogenerated for you, and we don't need to review it in detail. You can leave the tokenization/processing files as-is for now.

Would you be willing to refactor this PR to modular style? We'd be happy to help, and it should make things much simpler.

Some other comments below as well, but you can ignore the copied from comments if you refactor to modular style - except as a guide for which classes to inherit from in your modular file!
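
For illustration, a minimal sketch of what such a modular_florence2.py could contain; the BartDecoderLayer import and the decoder class below are assumptions for the example, not code from this PR:

from transformers.models.bart.modeling_bart import (
    BartDecoderLayer,
    BartEncoderLayer,
)


# Only Florence-2-specific layers need real definitions; anything identical to BART
# is declared by inheritance, and modeling_florence2.py is then generated automatically.
class Florence2EncoderLayer(BartEncoderLayer):
    pass


class Florence2DecoderLayer(BartDecoderLayer):
    pass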

Comment on lines 82 to 89
patch_size=[7, 3, 3, 3],
patch_stride=[4, 2, 2, 2],
patch_padding=[3, 1, 1, 1],
patch_prenorm=[False, True, True, True],
dim_embed=[256, 512, 1024, 2048],
num_heads=[8, 16, 32, 64],
num_groups=[8, 16, 32, 64],
depths=[1, 1, 9, 1],
Member:

Lists as default arguments are highly dangerous, because they're mutable, and the same mutable list will be shared by the init method and any objects of this class. For example:

config = Florence2VisionConfig()
config.patch_size[3] = 4  # This actually mutates the same list object held by the init method!
new_config = Florence2VisionConfig()  # The new config will inherit the mutated patch size!
new_config.patch_size[3] = 5  # This will affect both the init and the first config object!

I recommend either of the following (both options are sketched below):

  1. Replace default list args with default tuples, which are immutable, and then convert them to list() in the body of the method. This creates a new list each time, so there is no shared list to be mutated.
  2. Replace default list args with None, and then, in the body of the method, check for None and create a new list with the default values. Again, this ensures there is no shared list to be mutated.
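
A minimal sketch of both options, using a simplified stand-in for the real config class (the parameter set is abbreviated for the example):

# Option 1: immutable tuple defaults, converted to fresh lists in the body.
class Florence2VisionConfigSketch:
    def __init__(self, patch_size=(7, 3, 3, 3), depths=(1, 1, 9, 1)):
        self.patch_size = list(patch_size)  # a new list per instance, nothing shared
        self.depths = list(depths)


# Option 2: None defaults, with the real default list created inside the method.
class Florence2VisionConfigSketchAlt:
    def __init__(self, patch_size=None, depths=None):
        self.patch_size = list(patch_size) if patch_size is not None else [7, 3, 3, 3]
        self.depths = list(depths) if depths is not None else [1, 1, 9, 1]

Either way, mutating config.patch_size on one instance no longer leaks into the defaults of future instances.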

projection_dim=1024,
visual_temporal_embedding=None,
image_pos_embed=None,
image_feature_source=["spatial_avg_pool", "temporal_avg_pool"],
Member:

This is also a list as a default argument - it's less likely to be mutated, but still kind of dangerous.

if vision_config is not None:
vision_config = PretrainedConfig(**vision_config)
self.vision_config = vision_config
self.vocab_size = self.vocab_size
Member:

Suggested change
self.vocab_size = self.vocab_size

Redundant line

[1, T, D] or [T, D].
"""
shape_len = len(seq_embeds.shape)
assert 2 <= shape_len <= 3
Member:

We generally prefer to avoid asserts in favour of errors, like if ...: raise ValueError()
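
For example, the shape check in the snippet above could be rewritten roughly like this (a sketch of the pattern, not the exact wording to use):

shape_len = len(seq_embeds.shape)
if not 2 <= shape_len <= 3:
    raise ValueError(f"Expected seq_embeds to have 2 or 3 dimensions, but got {shape_len}.")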

shape_len = len(seq_embeds.shape)
assert 2 <= shape_len <= 3
len_seq = seq_embeds.size(-2)
assert len_seq <= self.max_seq_len
Member:

More asserts

return super().forward(input_ids) * self.embed_scale


class Florence2Attention(nn.Module):
Member:

Copied from BART

return attn_output, attn_weights_reshaped, past_key_value


class Florence2FlashAttention2(Florence2Attention):
Member:

Copied from BART

return attn_output, attn_weights, past_key_value


class Florence2SdpaAttention(Florence2Attention):
Member:

Copied from BART

}


class Florence2EncoderLayer(nn.Module):
Member:

Copied from BART

return outputs


class Florence2DecoderLayer(nn.Module):
Member:

Copied from BART

@hlky (Contributor, Author) commented Oct 15, 2024:

Thanks for the review, I'll rework this using modular.

@hlky marked this pull request as draft on October 16, 2024 17:03
@hlky (Contributor, Author) commented Oct 16, 2024:

Traceback (most recent call last):
  File "utils/check_modular_conversion.py", line 73, in <module>
    non_matching_files += compare_files(modular_file_path, args.fix_and_overwrite)
  File "utils/check_modular_conversion.py", line 53, in compare_files
    generated_modeling_content = convert_modular_file(modular_file_path)
  File "/home/user/transformers/utils/modular_model_converter.py", line 1141, in convert_modular_file
    wrapper.visit(cst_transformers)
  File "/home/user/transformers/.venv/lib/python3.8/site-packages/libcst/metadata/wrapper.py", line 204, in visit
    return self.module.visit(visitor)
  File "/home/user/transformers/.venv/lib/python3.8/site-packages/libcst/_nodes/module.py", line 89, in visit
    result = super(Module, self).visit(visitor)
  File "/home/user/transformers/.venv/lib/python3.8/site-packages/libcst/_nodes/base.py", line 236, in visit
    leave_result = visitor.on_leave(self, with_updated_children)
  File "/home/user/transformers/.venv/lib/python3.8/site-packages/libcst/_visitors.py", line 71, in on_leave
    updated_node = leave_func(original_node, updated_node)
  File "/home/user/transformers/utils/modular_model_converter.py", line 1115, in leave_Module
    self._recursively_add_all_new_needed_functions_in_files()
  File "/home/user/transformers/utils/modular_model_converter.py", line 1101, in _recursively_add_all_new_needed_functions_in_files
    dependency, body, self.all_definitions[dependency], parent=parent
KeyError: 'ValueError'

Seems to be some issue with modular conversion.

@Rocketknight1 (Member):

@hlky we think we know what's happening there. There is a fix open for it, give us a little bit!

@Rocketknight1 (Member):

@hlky modular conversions should be working now - I pulled in changes from another branch! Please let me know if you encounter any other problems.

To use the updated converter, pull the latest changes and then run python utils/modular_model_converter.py --files_to_parse src/transformers/models/florence2/modular_florence2.py to generate the modeling_florence2.py file.

Comment on lines 868 to 869
class Florence2Decoder(
Florence2LanguagePreTrainedModel,
@hlky (Contributor, Author), Oct 20, 2024:

Modular converter generates Florence2PreTrainedModel here instead of Florence2LanguagePreTrainedModel. I'll look into a fix, but given the similarity to Bart I'm thinking we can just use Bart directly in Florence2ForConditionalGeneration.

Collaborator:

Yep otherwise I can checkout the PR and push the fix! 🤗

return x, size


class DaViT(nn.Module):
Contributor (Author):

Should we implement DaViT separately to follow the Transformers philosophy of "one model per file"?

Member:

I would normally lean towards yes, but I'm not sure how that interacts with modular transformers - cc @ArthurZucker

Collaborator:

It's kinda up to you: if you think DaViT will be used alone, maybe, but otherwise we can have FlorenceVit!

@ducviet00 (Contributor):

Hi @hlky, could you provide an update on the status of this PR?
Since timm is compatible with transformers, would it be okay for me to continue integrating Florence-2 into transformers based on your work?

@hlky (Contributor, Author) commented Jan 14, 2025:

> would it be okay for me to continue integrating Florence-2

Hi @ducviet00. That would be great, thank you! 🤗

@EFord36 commented Feb 4, 2025:

> Hi @hlky, could you provide an update on the status of this PR? Since timm is compatible with transformers, would it be okay for me to continue integrating Florence-2 into transformers based on your work?

Hi, how are you getting on with this?

No problem if there's no progress, just thought I'd check if you could use any help as I'm using Florence-2 for a project and support would be very helpful for me!

@ducviet00 (Contributor):

@EFord36
I have a draft version that works well and is compatible with torch.compile.
However, it depends on #35314.
I'll make a PR soon

@ArthurZucker (Collaborator) commented Feb 6, 2025:

In the meantime I will review the PR to make sure it's ready! 🚀

@hlky closed this Apr 15, 2025
@hlky deleted the add-florence2 branch on April 15, 2025 12:30
@atbe commented May 12, 2025:

Hi @hlky, why was this PR closed? I noticed that Florence2ForConditionalGeneration architecture support still hasn't made it into the latest transformers version, and attempting to run the model with transformers results in the following error:

    async with build_async_engine_client(args) as engine_client:
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/contextlib.py", line 210, in __aenter__
    return await anext(self.gen)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 146, in build_async_engine_client
    async with build_async_engine_client_from_engine_args(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/contextlib.py", line 210, in __aenter__
    return await anext(self.gen)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 259, in build_async_engine_client_from_engine_args
    mq_engine_client = await asyncio.get_running_loop().run_in_executor(
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/concurrent/futures/thread.py", line 59, in run
    result = self.fn(*self.args, **self.kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/vllm/engine/multiprocessing/client.py", line 101, in __init__
    self.tokenizer = init_tokenizer_from_configs(
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/vllm/transformers_utils/tokenizer_group.py", line 101, in init_tokenizer_from_configs
    return TokenizerGroup(
           ^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/vllm/transformers_utils/tokenizer_group.py", line 23, in __init__
    self.tokenizer = get_tokenizer(self.tokenizer_id, **tokenizer_config)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/vllm/transformers_utils/tokenizer.py", line 242, in get_tokenizer
    raise e
  File "/usr/local/lib/python3.12/dist-packages/vllm/transformers_utils/tokenizer.py", line 221, in get_tokenizer
    tokenizer = AutoTokenizer.from_pretrained(
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/transformers/models/auto/tokenization_auto.py", line 973, in from_pretrained
    raise ValueError(
ValueError: Unrecognized configuration class <class 'transformers_modules.microsoft.Florence-2-large.00d2f1570b00c6dea5df998f5635db96840436bc.configuration_florence2.Florence2Config'> to build an AutoTokenizer.
Model type should be one of AlbertConfig, AlignConfig, AriaConfig, BarkConfig, BartConfig, BertConfig, BertGenerationConfig, BigBirdConfig, BigBirdPegasusConfig, BioGptConfig, BlenderbotConfig, BlenderbotSmallConfig, BlipConfig, Blip2Config, BloomConfig, BridgeTowerConfig, BrosConfig, CamembertConfig, CanineConfig, ChameleonConfig, ChineseCLIPConfig, ClapConfig, CLIPConfig, CLIPSegConfig, ClvpConfig, LlamaConfig, CodeGenConfig, CohereConfig, Cohere2Config, ColPaliConfig, ConvBertConfig, CpmAntConfig, CTRLConfig, Data2VecAudioConfig, Data2VecTextConfig, DbrxConfig, DebertaConfig, DebertaV2Config, DiffLlamaConfig, DistilBertConfig, DPRConfig, ElectraConfig, Emu3Config, ErnieConfig, ErnieMConfig, EsmConfig, FalconConfig, FalconMambaConfig, FastSpeech2ConformerConfig, FlaubertConfig, FNetConfig, FSMTConfig, FunnelConfig, GemmaConfig, Gemma2Config, GitConfig, GlmConfig, GPT2Config, GPT2Config, GPTBigCodeConfig, GPTNeoConfig, GPTNeoXConfig, GPTNeoXJapaneseConfig, GPTJConfig, GPTSanJapaneseConfig, GroundingDinoConfig, GroupViTConfig, HeliumConfig, HubertConfig, IBertConfig, IdeficsConfig, Idefics2Config, Idefics3Config, InstructBlipConfig, InstructBlipVideoConfig, JambaConfig, JetMoeConfig, JukeboxConfig, Kosmos2Config, LayoutLMConfig, LayoutLMv2Config, LayoutLMv3Config, LEDConfig, LiltConfig, LlamaConfig, LlavaConfig, LlavaNextConfig, LlavaNextVideoConfig, LlavaOnevisionConfig, LongformerConfig, LongT5Config, LukeConfig, LxmertConfig, M2M100Config, MambaConfig, Mamba2Config, MarianConfig, MBartConfig, MegaConfig, MegatronBertConfig, MgpstrConfig, MistralConfig, MixtralConfig, MllamaConfig, MobileBertConfig, ModernBertConfig, MoonshineConfig, MoshiConfig, MPNetConfig, MptConfig, MraConfig, MT5Config, MusicgenConfig, MusicgenMelodyConfig, MvpConfig, NemotronConfig, NezhaConfig, NllbMoeConfig, NystromformerConfig, OlmoConfig, Olmo2Config, OlmoeConfig, OmDetTurboConfig, OneFormerConfig, OpenAIGPTConfig, OPTConfig, Owlv2Config, OwlViTConfig, PaliGemmaConfig, PegasusConfig, PegasusXConfig, PerceiverConfig, PersimmonConfig, PhiConfig, Phi3Config, PhimoeConfig, Pix2StructConfig, PixtralVisionConfig, PLBartConfig, ProphetNetConfig, QDQBertConfig, Qwen2Config, Qwen2_5_VLConfig, Qwen2AudioConfig, Qwen2MoeConfig, Qwen2VLConfig, RagConfig, RealmConfig, RecurrentGemmaConfig, ReformerConfig, RemBertConfig, RetriBertConfig, RobertaConfig, RobertaPreLayerNormConfig, RoCBertConfig, RoFormerConfig, RwkvConfig, SeamlessM4TConfig, SeamlessM4Tv2Config, SiglipConfig, Speech2TextConfig, Speech2Text2Config, SpeechT5Config, SplinterConfig, SqueezeBertConfig, StableLmConfig, Starcoder2Config, SwitchTransformersConfig, T5Config, TapasConfig, TransfoXLConfig, TvpConfig, UdopConfig, UMT5Config, VideoLlavaConfig, ViltConfig, VipLlavaConfig, VisualBertConfig, VitsConfig, Wav2Vec2Config, Wav2Vec2BertConfig, Wav2Vec2ConformerConfig, WhisperConfig, XCLIPConfig, XGLMConfig, XLMConfig, XLMProphetNetConfig, XLMRobertaConfig, XLMRobertaXLConfig, XLNetConfig, XmodConfig, YosoConfig, ZambaConfig, Zamba2Config.
