FocuSFT: Bilevel Optimization for Dilution-Aware Long-Context Fine-Tuning
arXiv:2605.09932v1 Announce Type: new
Abstract: Large language models can now process increasingly long inputs, yet their ability to effectively use information spread across long contexts remains limited. We trace this gap to how attention budget is …