Maheed H. Ahmed, Mahsa Ghasemi

Multi-User Dueling Bandits: A Fair Approach using Nash Social Welfare

May 5, 2026

arXiv:2605.01961v1
Abstract: Learning from human preference data is becoming a useful tool, from fine-tuning large language models to training reinforcement learning agents. However, in most scenarios, the model is trained on the av…
