cs.AI, cs.CV, cs.RO

A ROS 2 Wrapper for Florence-2: Multi-Mode Local Vision-Language Inference for Robotic Systems

arXiv:2604.01179v1 Announce Type: cross
Abstract: Foundation vision-language models are becoming increasingly relevant to robotics because they can provide richer semantic perception than narrow task-specific pipelines. However, their practical adopti…