Optimistic Policy Learning under Pessimistic Adversaries with Regret and Violation Guarantees
arXiv:2604.14243v2 Announce Type: replace
Abstract: Real-world decision-making systems operate in environments where state transitions depend not only on the agent’s actions, but also on exogenous factors outside its control: competing agents…