Clotho: Measuring Task-Specific Pre-Generation Test Adequacy for LLM Inputs
arXiv:2509.17314v3
Abstract: Software increasingly relies on the emergent capabilities of Large Language Models (LLMs), from natural language understanding to program analysis and generation. Yet testing them on specific t…