$\texttt{YC-Bench}$: Benchmarking AI Agents for Long-Term Planning and Consistent Execution
arXiv:2604.01212v1
Abstract: As LLM agents tackle increasingly complex tasks, a critical question is whether they can maintain strategic coherence over long horizons: planning under uncertainty, learning from delayed feedback, and a…