cs.AI

Optimization before Evaluation: Evaluation with Unoptimised Prompts Can be Misleading

arXiv:2604.27637v1 Announce Type: new
Abstract: Current Large Language Model (LLM) evaluation frameworks apply the same static prompt template to every model under evaluation. This differs from the common industry practice of using prompt optimiz…