Exploitation Over Exploration: Unmasking the Bias in Linear Bandit Recommender Offline Evaluation
arXiv:2507.18756v2 Announce Type: replace
Abstract: Multi-Armed Bandit (MAB) algorithms are widely used in recommender systems that require continuous, incremental learning. A core aspect of MABs is the exploration-exploitation trade-off: choosing bet…
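The exploration-exploitation trade-off the abstract refers to can be made concrete with a disjoint LinUCB policy, a standard linear contextual bandit. This is an illustrative sketch under assumed parameter names (`alpha`, `n_arms`, `dim`) and synthetic data, not the paper's actual experimental setup:

```python
import numpy as np

class LinUCB:
    """Disjoint LinUCB: one ridge-regression model per arm.

    Each arm's score combines an exploitation term (the estimated
    reward theta . x) and an exploration bonus (a confidence width
    that shrinks as the arm gathers data).
    """
    def __init__(self, n_arms, dim, alpha=1.0):
        self.alpha = alpha                                 # exploration strength
        self.A = [np.eye(dim) for _ in range(n_arms)]      # per-arm Gram matrices
        self.b = [np.zeros(dim) for _ in range(n_arms)]    # per-arm reward sums

    def select(self, x):
        scores = []
        for A, b in zip(self.A, self.b):
            A_inv = np.linalg.inv(A)
            theta = A_inv @ b                              # ridge estimate of the arm's weights
            # exploitation + exploration bonus
            scores.append(theta @ x + self.alpha * np.sqrt(x @ A_inv @ x))
        return int(np.argmax(scores))

    def update(self, arm, x, reward):
        self.A[arm] += np.outer(x, x)
        self.b[arm] += reward * x

# Toy simulation (hypothetical data): two alternating contexts,
# each favouring a different arm, with noiseless linear rewards.
true_thetas = [np.array([0.9, 0.1]), np.array([0.1, 0.9])]
contexts = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
bandit = LinUCB(n_arms=2, dim=2, alpha=1.0)
for t in range(400):
    x = contexts[t % 2]
    arm = bandit.select(x)                                 # may explore a worse arm early on
    bandit.update(arm, x, float(true_thetas[arm] @ x))
```

In this toy run the exploration bonus forces each arm to be tried under each context before the bonus decays and the policy settles on exploitation, matching each context to its better arm; an offline evaluation that replays only logged (exploited) actions would never observe that early exploratory behaviour, which is the kind of bias the paper examines.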