cs.LG, math.OC, stat.ML

Offline-Online Reinforcement Learning for Linear Mixture MDPs

arXiv:2604.11994v1 Announce Type: cross
Abstract: We study offline-online reinforcement learning in linear mixture Markov decision processes (MDPs) under environment shift. In the offline phase, data are collected by an unknown behavior policy and may…