TL;DR: We enable zero-shot human-object interaction editing.
Orange indicates the target interaction.
[Teaser: interaction edits among dribble ball, hold ball, kick ball, throw ball, and among hold skateboard, sit on skateboard, jump skateboard, ride skateboard.]
[Qualitative comparison against NTI, InstructPix2Pix, SVDiff, InfEdit, CDS, TurboEdit, and Break-A-Scene on six edits: walk dog → wash dog; walk dog → hug dog; blow cake → hold cake; blow cake → eat cake; hold skateboard → jump skateboard; hold skateboard → ride skateboard.]
This paper presents InteractEdit, a novel framework for zero-shot Human-Object Interaction (HOI) editing, which addresses the challenging task of transforming an existing interaction in an image into a new, desired interaction while preserving the identities of the subject and object. Unlike simpler image editing tasks such as attribute manipulation, object replacement, or style transfer, HOI editing must handle the complex spatial, contextual, and relational dependencies inherent in human-object interactions. Existing methods often overfit to the structure of the source image, limiting their ability to make the substantial structural changes that new interactions demand. To address this, InteractEdit decomposes each scene into subject, object, and background components, then employs Low-Rank Adaptation (LoRA) and selective fine-tuning to preserve pretrained interaction priors while learning the visual identity of the source image. This regularization strategy effectively balances interaction edits with identity consistency. We further introduce IEBench, the most comprehensive benchmark for HOI editing to date, which evaluates both interaction editing and identity preservation. Our extensive experiments show that InteractEdit significantly outperforms existing methods, establishing a strong baseline for future HOI editing research and unlocking new possibilities for creative and practical applications.
HOI components are disassembled into subject, object, and background cues during inversion. LoRA regularization enables non-rigid edits by capturing essential attributes while ignoring fine-grained structural details. Selective fine-tuning preserves interaction priors while adapting to the source image's identity. Editing then reassembles these components with the target interaction, using the trained LoRA weights to guide the diffusion model.
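To make the regularization idea concrete, here is a minimal numpy sketch of a LoRA-adapted linear layer. This is an illustrative toy, not the paper's implementation: the class name, rank, and scaling hyperparameters are assumptions. The key property it demonstrates is that the pretrained weight stays frozen and only a rank-r residual is trainable, so the adapter can absorb coarse identity attributes without the capacity to memorize fine-grained source structure.

```python
import numpy as np

class LoRALinear:
    """Toy low-rank adaptation of a frozen linear layer (illustrative only).

    The pretrained weight W is frozen; only the low-rank factors A and B
    are trained, so the learned update B @ A has rank at most `rank`.
    A small rank acts as a regularizer, limiting how much source-image
    structure the adapter can memorize.
    """

    def __init__(self, weight: np.ndarray, rank: int = 4, alpha: float = 4.0,
                 seed: int = 0):
        rng = np.random.default_rng(seed)
        out_dim, in_dim = weight.shape
        self.weight = weight                           # frozen pretrained weight
        self.scale = alpha / rank                      # common LoRA scaling
        self.A = rng.normal(0, 0.02, (rank, in_dim))   # trainable down-projection
        self.B = np.zeros((out_dim, rank))             # trainable up-projection, zero init

    def __call__(self, x: np.ndarray) -> np.ndarray:
        # Frozen base path plus low-rank residual. Because B is zero at
        # initialization, the layer starts out identical to the pretrained one.
        return x @ self.weight.T + self.scale * (x @ self.A.T) @ self.B.T

# At initialization the adapted layer reproduces the frozen base layer exactly.
W = np.eye(8)
layer = LoRALinear(W, rank=2)
x = np.ones((1, 8))
assert np.allclose(layer(x), x @ W.T)
```

Zero-initializing B is the standard LoRA trick that lets fine-tuning start from the pretrained behavior and drift only as far as the low-rank update allows.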
If you use our work in your research, please cite:
@misc{hoe2025interactedit,
  title={InteractEdit: Zero-Shot Editing of Human-Object Interactions in Images},
  author={Jiun Tian Hoe and Weipeng Hu and Wei Zhou and Chao Xie and Ziwei Wang and Chee Seng Chan and Xudong Jiang and Yap-Peng Tan},
  year={2025},
  eprint={2503.09130},
  archivePrefix={arXiv},
  primaryClass={cs.GR},
  url={https://arxiv.org/abs/2503.09130},
}