LongCoT: Benchmarking Long-Horizon Chain-of-Thought Reasoning
arXiv:2604.14140v1 Announce Type: cross
Abstract: As language models are increasingly deployed for complex autonomous tasks, their ability to reason accurately over longer horizons becomes critical. An essential component of this ability is planning a…