
InteractEdit: Zero-Shot Editing of
Human-Object Interactions in Images

1Nanyang Technological University,  2Sun Yat-sen University,  3Nanjing Forestry University,  4Universiti Malaya
Work done at Nanyang Technological University.

TL;DR: We enable zero-shot human-object interaction editing.

Existing methods overly preserve structural details from the source image, limiting their ability to accommodate the substantial non-rigid changes required for effective interaction edits. In contrast, our proposed InteractEdit employs regularization techniques to constrain model updates, preserving pretrained target-interaction knowledge and enabling zero-shot interaction edits while maintaining identity consistency.

Qualitative Results

1. Sample of Interaction Edits

Orange indicates target interaction.

[Figure: sample interaction edits — dribble ball; hold skateboard]

2. Editing Interactions Compared to Other Methods

Orange indicates target interaction.

[Figure: source interactions vs. our edited results]

- Source: walk dog → Ours: wash dog
- Source: walk dog → Ours: hug dog
- Source: blow cake → Ours: hold cake
- Source: blow cake → Ours: eat cake
- Source: hold skateboard → Ours: jump skateboard
- Source: hold skateboard → Ours: ride skateboard

Abstract

This paper presents InteractEdit, a novel framework for zero-shot Human-Object Interaction (HOI) editing, addressing the challenging task of transforming an existing interaction in an image into a new, desired interaction while preserving the identities of the subject and object. Unlike simpler image editing scenarios such as attribute manipulation, object replacement, or style transfer, HOI editing involves complex spatial, contextual, and relational dependencies inherent in human-object interactions. Existing methods often overfit to the source image structure, limiting their ability to adapt to the substantial structural modifications demanded by new interactions. To address this, InteractEdit decomposes each scene into subject, object, and background components, then employs Low-Rank Adaptation (LoRA) and selective fine-tuning to preserve pretrained interaction priors while learning the visual identity of the source image. This regularization strategy effectively balances interaction edits with identity consistency. We further introduce IEBench, the most comprehensive benchmark for HOI editing, which evaluates both interaction editing and identity preservation. Our extensive experiments show that InteractEdit significantly outperforms existing methods, establishing a strong baseline for future HOI editing research and unlocking new possibilities for creative and practical applications.
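IEBench's exact protocol is not detailed on this page, but its two evaluation axes — interaction editing and identity preservation — can be illustrated with a hypothetical scoring sketch built on an off-the-shelf CLIP model: text-image similarity between the edit and the target interaction prompt, and image-image similarity between the source and the edit. The model checkpoint and both scores below are illustrative assumptions, not IEBench's actual metrics.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Hypothetical choice of backbone; IEBench may use a different model.
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

@torch.no_grad()
def score_edit(source: Image.Image, edited: Image.Image, target_prompt: str):
    """Two complementary scores: does the edit depict the target
    interaction, and does it still resemble the source subject/object?"""
    inputs = processor(text=[target_prompt], images=[source, edited],
                       return_tensors="pt", padding=True)
    img = model.get_image_features(pixel_values=inputs["pixel_values"])
    txt = model.get_text_features(input_ids=inputs["input_ids"],
                                  attention_mask=inputs["attention_mask"])
    img = img / img.norm(dim=-1, keepdim=True)   # unit-normalize embeddings
    txt = txt / txt.norm(dim=-1, keepdim=True)
    interaction = (img[1] @ txt[0]).item()       # edited image vs. target prompt
    identity = (img[0] @ img[1]).item()          # source vs. edited image
    return interaction, identity
```

For example, `score_edit(src, out, "a person washing a dog")` would return a higher first score when the edit realizes the target interaction and a higher second score when identity is preserved.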

Method

HOI components are disassembled into subject, object, and background cues during inversion. LoRA regularization enables non-rigid edits by capturing essential attributes while ignoring fine-grained structural details. Selective fine-tuning preserves interaction priors while adapting to the source image's identity. Editing reassembles these components with the target interaction, using the trained LoRA weights to guide the diffusion model.
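As a minimal PyTorch sketch of this regularization idea (not the paper's exact configuration): base weights stay frozen so pretrained interaction priors survive, low-rank adapters learn the source identity, and only layers whose qualified names match a keyword are adapted. The rank, scaling, and the `attn2` cross-attention keyword are illustrative assumptions.

```python
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen linear layer plus a trainable low-rank update:
    y = W x + (alpha / r) * B(A(x))."""
    def __init__(self, base: nn.Linear, rank: int = 4, alpha: float = 4.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False          # keep pretrained weights (priors) intact
        self.down = nn.Linear(base.in_features, rank, bias=False)   # A: project down
        self.up = nn.Linear(rank, base.out_features, bias=False)    # B: project up
        nn.init.zeros_(self.up.weight)       # adapter starts as a no-op
        self.scale = alpha / rank

    def forward(self, x):
        return self.base(x) + self.scale * self.up(self.down(x))

def attach_lora(unet: nn.Module, target_keyword: str = "attn2", rank: int = 4):
    """Selective fine-tuning: freeze everything, then wrap only linear layers
    whose names contain `target_keyword` (here, hypothetically, the
    cross-attention projections) with trainable low-rank adapters."""
    for p in unet.parameters():
        p.requires_grad = False
    # Collect targets first, then replace, so the module tree is not
    # mutated while it is being traversed.
    targets = []
    for name, module in unet.named_modules():
        for child_name, child in module.named_children():
            if isinstance(child, nn.Linear) and target_keyword in f"{name}.{child_name}":
                targets.append((module, child_name, child))
    for module, child_name, child in targets:
        setattr(module, child_name, LoRALinear(child, rank=rank))
    return [p for p in unet.parameters() if p.requires_grad]
```

An optimizer built over the returned parameters, e.g. `torch.optim.AdamW(attach_lora(unet), lr=1e-4)`, then updates only the adapters, which is what constrains the model update and leaves the pretrained interaction knowledge available at editing time.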

BibTeX

If you use our work in your research, please cite:

@misc{hoe2025interactedit,
    title={InteractEdit: Zero-Shot Editing of Human-Object Interactions in Images}, 
    author={Jiun Tian Hoe and Weipeng Hu and Wei Zhou and Chao Xie and Ziwei Wang and Chee Seng Chan and Xudong Jiang and Yap-Peng Tan},
    year={2025},
    eprint={2503.09130},
    archivePrefix={arXiv},
    primaryClass={cs.GR},
    url={https://arxiv.org/abs/2503.09130}, 
}