Offline-Online Reinforcement Learning for Linear Mixture MDPs
arXiv:2604.11994v1 Announce Type: cross
Abstract: We study offline-online reinforcement learning in linear mixture Markov decision processes (MDPs) under environment shift. In the offline phase, data are collected by an unknown behavior policy and may…
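In a linear mixture MDP, the transition kernel is linear in a known feature map: P(s'|s,a) = ⟨φ(s'|s,a), θ*⟩, where the parameter θ* ∈ R^d is unknown to the learner. The Python sketch below is a minimal illustration under one assumption: φ stacks d known base kernels, so P is a convex mixture of them. The helper names, sizes, and mixing weights are hypothetical and are not taken from the paper.

```python
import random

def random_kernel(n_states, n_actions, rng):
    """Build a hypothetical base kernel P_i[s][a][s'] whose rows sum to 1."""
    kernel = []
    for _ in range(n_states):
        per_action = []
        for _ in range(n_actions):
            weights = [rng.random() for _ in range(n_states)]
            total = sum(weights)
            per_action.append([w / total for w in weights])
        kernel.append(per_action)
    return kernel

def mixture_transition(base_kernels, theta, s, a):
    """Return P(. | s, a) = sum_i theta_i * P_i(. | s, a).

    This equals <phi(s'|s,a), theta> with phi_i(s'|s,a) = P_i(s'|s,a),
    which is the linear mixture structure (assumed construction).
    """
    n_states = len(base_kernels[0])
    return [
        sum(t * k[s][a][s_next] for t, k in zip(theta, base_kernels))
        for s_next in range(n_states)
    ]

rng = random.Random(0)
d, n_states, n_actions = 3, 4, 2
base = [random_kernel(n_states, n_actions, rng) for _ in range(d)]
theta = [0.5, 0.3, 0.2]  # hypothetical mixing weights on the simplex
p = mixture_transition(base, theta, s=1, a=0)
print(p, sum(p))
```

Because θ lies on the simplex and each base kernel is a valid distribution, the mixed row is also a valid distribution. A learner in this setting observes transitions and must estimate θ*.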