cs.AI, q-fin.CP

ValueBlindBench: Agreement-Gated Stress Testing of LLM-Judged Investment Rationales Before Returns Are Observable

arXiv:2604.25224v2 Announce Type: replace
Abstract: LLM-based financial agents increasingly produce investment rationales before the outcomes needed to evaluate them are observable. This creates a delayed-ground-truth evaluation problem: realized retu…