Distributional Alignment Games for Answer-Level Fine-Tuning
arXiv:2604.27166v1
Abstract: We focus on the problem of Answer-Level Fine-Tuning (ALFT), where the goal is to optimize a language model based on the correctness or properties of its final answers, rather than the specific rea…