Learning When Not to Learn: Risk-Sensitive Abstention in Bandits with Unbounded Rewards
arXiv:2510.14884v3 Announce Type: replace
Abstract: In high-stakes AI applications, even a single action can cause irreparable damage. However, nearly all sequential decision-making theory assumes that every error is recoverable (e.g., by bounding …