Logistic Bandits with $\tilde{O}(\sqrt{dT})$ Regret without Context Diversity Assumptions
arXiv:2604.22161v1 Announce Type: new
Abstract: We study the $K$-armed logistic bandit problem, where at each round, the agent observes $K$ feature vectors associated with $K$ actions. Existing approaches that achieve a rate-optimal $\tilde{\mathcal{O…