RL makes MLLMs see better than SFT
arXiv:2510.16333v2 Announce Type: replace-cross
Abstract: A dominant assumption in Multimodal Language Model (MLLM) research is that an MLLM's performance is largely inherited from its LLM backbone, given the backbone's immense parameter scale and remarkable capabilit…