Mitigating Selection Bias in Large Language Models via Permutation-Aware GRPO
arXiv:2603.21016v2 Announce Type: replace-cross
Abstract: Large language models (LLMs) used for multiple-choice and pairwise evaluation tasks often exhibit selection bias due to non-semantic factors like option positions and label symbols. Existing in…
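The positional component of the selection bias described above can be made concrete with a small sketch. The idea (an illustration of permutation-based evaluation in general, not the paper's GRPO method) is to query a model under every ordering of the options, map each prediction back to the original option identity, and aggregate; a model that picks purely by position then spreads its votes uniformly, exposing the bias. The `mock_model` below is a hypothetical stand-in for an LLM scorer, not an interface from the paper.

```python
from itertools import permutations
from collections import Counter

def mock_model(question, options):
    # Hypothetical stand-in for an LLM answer selector.
    # It always returns index 0, i.e. it exhibits a pure
    # positional bias toward the first-listed option.
    return 0

def vote_over_permutations(question, options, model=mock_model):
    """Query the model under every ordering of the options and
    tally votes against the ORIGINAL option indices, so that a
    position-biased model's choices spread evenly instead of
    concentrating on one label slot."""
    votes = Counter()
    for perm in permutations(range(len(options))):
        shuffled = [options[i] for i in perm]
        picked = model(question, shuffled)
        votes[perm[picked]] += 1  # map choice back to original index
    return votes

votes = vote_over_permutations("Which is a prime?", ["4", "7", "9"])
```

With three options there are six orderings and each original option appears in the first slot exactly twice, so the purely position-biased mock model yields two votes per option: a flat distribution that reveals the absence of semantic signal.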