Before We Trust Them: Decision-Making Failures in Navigation of Foundation Models
arXiv:2601.05529v5 Announce Type: replace
Abstract: High success rates on navigation-related tasks do not necessarily translate into reliable decision making by foundation models. To examine this gap, we evaluate current models on six diagnostic tasks…