NC2.5 ↔ HORIZON: On the Structural Reducibility of Long-Horizon Agent Failures to a Single…

The HORIZON benchmark (arXiv:2604.11978) documents empirical failures of long-horizon agentic systems across four cognitive domains. NC2.5…
