GISTBench: Evaluating LLM User Understanding via Evidence-Based Interest Verification
arXiv:2603.29112v1 Announce Type: cross
Abstract: We introduce GISTBench, a benchmark for evaluating the ability of Large Language Models (LLMs) to understand users from their interaction histories in recommendation systems. Unlike traditional RecSys benchm…