Generalisation in Multitask Fitted Q-Iteration and Offline Q-learning
arXiv:2512.20220v2
Abstract: We study offline multitask reinforcement learning in settings where multiple tasks share a low-rank representation of their action-value functions. In this regime, a learner is provided with fixed da…