S-GRPO: Unified Post-Training for Large Vision-Language Models
arXiv:2604.16557v1 Announce Type: new
Abstract: Current post-training methodologies for adapting Large Vision-Language Models (LVLMs) generally fall into two paradigms: Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL). Despite their preval…