J. E. Domínguez-Vidal

A ROS 2 Wrapper for Florence-2: Multi-Mode Local Vision-Language Inference for Robotic Systems

J. E. Domínguez-Vidal / April 2, 2026

arXiv:2604.01179v1 Announce Type: cross
Abstract: Foundation vision-language models are becoming increasingly relevant to robotics because they can provide richer semantic perception than narrow task-specific pipelines. However, their practical adopti…
