Exploitation Over Exploration: Unmasking the Bias in Linear Bandit Recommender Offline Evaluation
arXiv:2507.18756v2 Announce Type: replace
Abstract: Multi-Armed Bandit (MAB) algorithms are widely used in recommender systems that require continuous, incremental learning. A core aspect of MABs is the exploration-exploitation trade-off: choosing bet…
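The exploration-exploitation trade-off the abstract refers to can be made concrete with a disjoint LinUCB policy, a standard linear contextual bandit. This is an illustrative sketch under assumed parameter names (`alpha`, `n_arms`, `dim`) and synthetic data, not the paper's actual experimental setup:

```python
import numpy as np

class LinUCB:
    """Disjoint LinUCB: one ridge-regression model per arm.

    Each arm's score combines an exploitation term (the estimated
    reward theta . x) and an exploration bonus (a confidence width
    that shrinks as the arm gathers data).
    """
    def __init__(self, n_arms, dim, alpha=1.0):
        self.alpha = alpha                                 # exploration strength
        self.A = [np.eye(dim) for _ in range(n_arms)]      # per-arm Gram matrices
        self.b = [np.zeros(dim) for _ in range(n_arms)]    # per-arm reward sums

    def select(self, x):
        scores = []
        for A, b in zip(self.A, self.b):
            A_inv = np.linalg.inv(A)
            theta = A_inv @ b                              # ridge estimate of the arm's weights
            # exploitation + exploration bonus
            scores.append(theta @ x + self.alpha * np.sqrt(x @ A_inv @ x))
        return int(np.argmax(scores))

    def update(self, arm, x, reward):
        self.A[arm] += np.outer(x, x)
        self.b[arm] += reward * x

# Toy simulation (hypothetical data): two alternating contexts,
# each favouring a different arm, with noiseless linear rewards.
true_thetas = [np.array([0.9, 0.1]), np.array([0.1, 0.9])]
contexts = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
bandit = LinUCB(n_arms=2, dim=2, alpha=1.0)
for t in range(400):
    x = contexts[t % 2]
    arm = bandit.select(x)                                 # may explore a worse arm early on
    bandit.update(arm, x, float(true_thetas[arm] @ x))
```

In this toy run the exploration bonus forces each arm to be tried under each context before the bonus decays and the policy settles on exploitation, matching each context to its better arm; an offline evaluation that replays only logged (exploited) actions would never observe that early exploratory behaviour, which is the kind of bias the paper examines.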