cs.CV

Rethinking Position Embedding as a Context Controller for Multi-Reference and Multi-Shot Video Generation

arXiv:2604.03738v1 Announce Type: new
Abstract: Recent proprietary models such as Sora2 demonstrate promising progress in generating multi-shot videos conditioned on multiple reference characters. However, academic research on this problem remains lim…