Reference-Free Reinforcement Learning Fine-Tuning for MT: A Seq2Seq Perspective
arXiv:2605.15976v1 Announce Type: new
Abstract: Production machine translation relies overwhelmingly on encoder-decoder Seq2Seq models, yet reinforcement learning approaches to MT fine-tuning have largely targeted decoder-only LLMs at $\geq$7B paramet…