Delulu: A Verified Multi-Lingual Benchmark for Code Hallucination Detection in Fill-in-the-Middle Tasks
arXiv:2605.07024v1 Announce Type: cross
Abstract: Large Language Models for code generation frequently produce hallucinations in Fill-in-the-Middle (FIM) tasks — plausible but incorrect completions such as invented API methods, invalid parameters, un…