Hulingxiao He, Zijun Geng, Yuxin Peng

Fine-R1: Make Multi-modal LLMs Excel in Fine-Grained Visual Recognition by Chain-of-Thought Reasoning

Hulingxiao He, Zijun Geng, Yuxin Peng / April 28, 2026

arXiv:2602.07605v3
Abstract: Any entity in the visual world can be hierarchically grouped based on shared characteristics and mapped to fine-grained sub-categories. While Multi-modal Large Language Models (MLLMs) achieve s…
