cs.CV

ViSurf: Visual Supervised-and-Reinforcement Fine-Tuning for Large Vision-and-Language Models

arXiv:2510.10606v4 Announce Type: replace
Abstract: Post-training Large Vision-and-Language Models (LVLMs) typically involves Supervised Fine-Tuning (SFT) for knowledge injection or Reinforcement Learning with Verifiable Rewards (RLVR) for performance…