cs.AI

PilotBench: A Benchmark for General Aviation Agents with Safety Constraints

arXiv:2604.08987v1
Abstract: As Large Language Models (LLMs) advance toward embodied AI agents operating in physical environments, a fundamental question emerges: can models trained on text corpora reliably reason about complex phys…