cs.AI, cs.CL

Fluent Alignment with Disfluent Judges: Post-training for Lower-resource Languages

arXiv:2512.08777v2 Announce Type: replace
Abstract: We propose a post-training method for lower-resource languages that preserves the fluency of language models even when aligned by disfluent reward models. Preference optimization is now a well-resear…