cs.AI, cs.LG, q-bio.BM, q-bio.QM

Pareto-Optimal Offline Reinforcement Learning via Smooth Tchebysheff Scalarization

arXiv:2604.13175v1 Announce Type: cross
Abstract: Large language models can be aligned with human preferences through offline reinforcement learning (RL) on small labeled datasets. While single-objective alignment is well-studied, many real-world appl…