SAGA: A Robust Self-Attention and Goal-Aware Anchor-based Planner for Safe UAV Autonomous Navigation
arXiv:2605.02301v1 Announce Type: new
Abstract: Agile unmanned aerial vehicle (UAV) navigation in cluttered environments demands a planning architecture that is both computationally efficient and structurally expressive enough to reason over multiple feasible motions. This paper presents SAGA, a robust self-attention and goal-aware anchor-based planner for safe UAV autonomous navigation. SAGA formulates local planning as a one-stage joint regression-and-ranking problem over a fixed lattice of motion anchors. Given a depth image and a body-frame motion state, the planner predicts refined terminal states and planning scores for all anchors in a single forward pass, after which the best candidate is decoded into a dynamically feasible trajectory. The key idea of SAGA is to transform anchor-aligned features into geometry-aware tokens and perform cross-anchor global reasoning with self-attention. To preserve directional structure in the token space, we further introduce a polar positional encoding derived from anchor yaw and pitch. In addition, a goal-aware modulation module injects velocity, acceleration, and target information into the token representation before final score prediction. Experiments in cluttered pillar-map environments under maximum speed settings of 2.0, 3.0, and 4.0~m/s show that SAGA consistently achieves a 100\% success rate, while YOPO drops from 90.91\% to 62.50\%, Ego-planner from 71.43\% to 52.63\%, and Fast-planner from 52.63\% to 38.46\%. Under the 4.0~m/s maximum speed setting, SAGA also improves average safety from 1.9843~m to 2.3888~m and minimum safety from 0.4390~m to 0.7576~m over YOPO, while reducing total flight time from 40.4631~s to 27.4901~s. The comparison with SAGA w/o PPE further shows that explicit polar positional encoding is critical for stable cross-anchor reasoning and safe passage selection in cluttered scenes.