Boyuan Sun, Jiaxing Zhao, Xiang Chen, Xihan Wei, Qibin Hou

LLaVA-Octopus: Unlocking Instruction-Driven Adaptive Projector Fusion for Video Understanding

April 21, 2026

arXiv:2501.05067v3
Abstract: In this paper, we introduce LLaVA-Octopus, a novel video multimodal large language model. LLaVA-Octopus adaptively weights features from different visual projectors based on user instructions, enabli…
