Maksim Pershin, Ivan Golovanov, Pavel Baltabaev, Natalia Trankova

Calibration-Gated LLM Pseudo-Observations for Online Contextual Bandits

Maksim Pershin, Ivan Golovanov, Pavel Baltabaev, Natalia Trankova / April 17, 2026

arXiv:2604.14961v1 Announce Type: new
Abstract: Contextual bandit algorithms suffer from high regret during cold-start, when the learner has insufficient data to distinguish good arms from bad. We propose augmenting Disjoint LinUCB with LLM pseudo-obs…
