MVP-LAM: Learning Action-Centric Latent Action via Cross-Viewpoint Reconstruction

/ May 5, 2026

arXiv:2602.03668v2 Announce Type: replace
Abstract: Latent actions learned from diverse human videos serve as pseudo-labels for vision-language-action (VLA) pretraining, but provide effective supervision only if they remain informative about the under…

cs.RO

Evidence-Based Landing Site Selection and Vision-Based Landing for UAVs in Unstructured Environments

/ May 5, 2026

arXiv:2605.01432v1 Announce Type: new
Abstract: Autonomous landing in cluttered or unstructured environments remains a safety-critical challenge for unmanned aerial vehicles (UAVs), particularly under noisy perception caused by sensor uncertainty and …

cs.RO

High-Speed, Scalable Sensor Readout for Dexterous Robotic Hands via Shift-Register Multiplexing

/ May 5, 2026

arXiv:2605.01434v1 Announce Type: new
Abstract: Dexterous robotic hands require high-speed multimodal sensing across many degrees of freedom, yet existing readout architectures often impose trade-offs between sensor count, wiring complexity, and sampl…

cs.AI, cs.CV, cs.RO

IMPACT-HOI: Supervisory Control for Onset-Anchored Partial HOI Event Construction

/ May 5, 2026

arXiv:2605.01666v1 Announce Type: cross
Abstract: We present IMPACT-HOI, a mixed-initiative framework for annotating egocentric procedural video by constructing structured event graphs for Human-Object Interactions (HOI), motivated by the need for hig…

cs.RO

VOFA: Visual Object Goal Pushing with Force-Adaptive Control for Humanoids

/ May 5, 2026

arXiv:2605.01518v1 Announce Type: new
Abstract: The ability to push large objects in a goal-directed manner using onboard egocentric perception is an essential skill for humanoid robots to perform complex tasks such as material handling in warehouses….

cs.AI, cs.RO

STEP: Warm-Started Visuomotor Policies with Spatiotemporal Consistency Prediction

/ May 5, 2026

arXiv:2602.08245v2 Announce Type: replace
Abstract: Diffusion policies have recently emerged as a powerful paradigm for visuomotor control in robotic manipulation due to their ability to model the distribution of action sequences and capture multimoda…

cs.RO

Rhythm: Learning Interactive Whole-Body Control for Dual Humanoids

/ May 5, 2026

arXiv:2603.02856v2 Announce Type: replace
Abstract: Realizing interactive whole-body control for multi-humanoid systems is critical for unlocking complex collaborative capabilities in shared environments. Although recent advancements have significantl…

cs.LG, cs.RO

Anticipation-VLA: Solving Long-Horizon Embodied Tasks via Anticipation-based Subgoal Generation

/ May 5, 2026

arXiv:2605.01772v1 Announce Type: new
Abstract: Vision-Language-Action (VLA) models have emerged as a powerful paradigm for embodied intelligence, enabling robots to perform tasks based on natural language instructions and current visual input. Howeve…

cs.CV, cs.RO

SaLF: Sparse Local Fields for Multi-Sensor Rendering in Real-Time

/ May 5, 2026

arXiv:2507.18713v2 Announce Type: replace-cross
Abstract: High-fidelity sensor simulation of light-based sensors such as cameras and LiDARs is critical for safe and accurate autonomy testing. Neural radiance field (NeRF)-based methods that reconstruct…

cs.AI, cs.RO

VILAS: A VLA-Integrated Low-cost Architecture with Soft Grasping for Robotic Manipulation

/ May 5, 2026

arXiv:2605.02037v1 Announce Type: new
Abstract: We present VILAS, a fully low-cost, modular robotic manipulation platform designed to support end-to-end vision-language-action (VLA) policy learning and deployment on accessible hardware. The system int…