cs.IR, cs.LG

Exploitation Over Exploration: Unmasking the Bias in Linear Bandit Recommender Offline Evaluation

arXiv:2507.18756v2 Announce Type: replace
Abstract: Multi-Armed Bandit (MAB) algorithms are widely used in recommender systems that require continuous, incremental learning. A core aspect of MABs is the exploration-exploitation trade-off: choosing bet…
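The exploration-exploitation trade-off the abstract refers to can be illustrated with a minimal LinUCB-style sketch for a linear contextual bandit. This is an illustrative toy, not the paper's implementation; all names and parameters (`alpha`, `linucb_choose`, `linucb_update`) are assumptions for the example.

```python
import numpy as np

def linucb_choose(arm_features, A, b, alpha=1.0):
    """Pick the arm maximizing mean reward (exploitation) plus an
    uncertainty bonus (exploration) under a linear reward model."""
    best_arm, best_ucb = None, -np.inf
    for a, x in enumerate(arm_features):
        A_inv = np.linalg.inv(A[a])
        theta = A_inv @ b[a]                     # ridge-regression estimate
        mean = theta @ x                         # exploitation term
        bonus = alpha * np.sqrt(x @ A_inv @ x)   # exploration term
        if mean + bonus > best_ucb:
            best_arm, best_ucb = a, mean + bonus
    return best_arm

def linucb_update(A, b, arm, x, reward):
    """Incremental update of the per-arm sufficient statistics."""
    A[arm] += np.outer(x, x)
    b[arm] += reward * x
```

With a large `alpha` the bonus term dominates and the policy explores; as observations accumulate, `A_inv` shrinks the bonus and the policy shifts toward exploitation, which is the imbalance the paper's offline evaluation analysis targets.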