MiVE: Multiscale Vision-language features for reference-guided video Editing
arXiv:2605.14664v1 Announce Type: new
Abstract: Reference-guided video editing takes a source video, a text instruction, and a reference image as inputs, requiring the model to faithfully apply the instructed edits while preserving original motion and…