Graph Learning Is Suboptimal in Causal Bandits
arXiv:2510.16811v3 Announce Type: replace
Abstract: We study regret minimization in causal bandits under causal sufficiency, where the underlying causal structure is unknown to the agent. Previous work has focused on identifying the reward’s parents …