Uncertainty-Aware Variational Reward Factorization via Probabilistic Preference Bases for LLM Personalization
arXiv:2604.00997v1 Announce Type: new
Abstract: Reward factorization personalizes large language models (LLMs) by decomposing rewards into shared basis functions and user-specific weights. Yet, existing methods estimate user weights from scarce data i…
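
As a rough illustration of the factorization named in the abstract's first sentence, the sketch below scores a response as a weighted sum of shared basis rewards, with one weight vector per user. The basis functions `brevity` and `politeness`, the helper `personalized_reward`, and the example weights are hypothetical and chosen only for illustration. This shows plain point-estimate weights, not the variational, uncertainty-aware method the title refers to.

```python
from typing import Callable, Sequence

# A basis reward maps (prompt, response) to a scalar score.
BasisFn = Callable[[str, str], float]


def brevity(prompt: str, response: str) -> float:
    """Hypothetical basis: higher score for shorter responses."""
    return 1.0 / (1.0 + len(response.split()))


def politeness(prompt: str, response: str) -> float:
    """Hypothetical basis: 1.0 if a polite marker appears, else 0.0."""
    text = response.lower()
    return 1.0 if ("please" in text or "thank" in text) else 0.0


# Basis functions are shared by every user (assumed set, for illustration).
BASES: Sequence[BasisFn] = (brevity, politeness)


def personalized_reward(
    weights: Sequence[float],
    prompt: str,
    response: str,
    bases: Sequence[BasisFn] = BASES,
) -> float:
    """Combine shared basis rewards using one user's weight vector."""
    if len(weights) != len(bases):
        raise ValueError("need exactly one weight per basis function")
    return sum(w * b(prompt, response) for w, b in zip(weights, bases))


if __name__ == "__main__":
    # Two hypothetical users who weight the same bases differently.
    terse_user = [1.0, 0.0]
    courteous_user = [0.0, 2.0]
    print(personalized_reward(terse_user, "q", "ok"))
    print(personalized_reward(courteous_user, "q", "Thank you"))
```

Only the weight vector differs between users here; the basis functions stay fixed, which is what makes the per-user part cheap to store but also hard to estimate when a user has provided little data.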