BalCapRL: A Balanced Framework for RL-Based MLLM Image Captioning
arXiv:2605.07394v1
Abstract: Image captioning is one of the most fundamental tasks in computer vision. Owing to its open-ended nature, it has received significant attention in the era of multimodal large language models (MLLMs). In …