Generating Leakage-Free Benchmarks for Robust RAG Evaluation
arXiv:2605.08838v1 Announce Type: cross
Abstract: Retrieval-augmented generation (RAG) is widely used to augment large language models (LLMs) with external knowledge. However, many benchmark datasets designed to test RAG performance comprise many qu…