cs.LG

Towards Robust Endogenous Reasoning: Unifying Drift Adaptation in Non-Stationary Tuning

arXiv:2604.15705v1 Announce Type: new
Abstract: Reinforcement Fine-Tuning (RFT) has become a critical paradigm for aligning Multi-modal Large Language Models (MLLMs) with complex human values and domain-specific requirements. Ne…