Provide.ai

M$^2$E-UAV: A Benchmark and Analysis for Onboard Motion-on-Motion Event-Based Tiny UAV Detection

May 12, 2026

arXiv:2605.10496v1 Announce Type: new
Abstract: Tiny UAV detection from an onboard event camera is difficult when the observer and target move at the same time. In this motion-on-motion regime, ego-motion activates background edges across buildings, v…

cs.CV, cs.GR

JUST-DUB-IT: Video Dubbing via Joint Audio-Visual Diffusion

May 12, 2026

arXiv:2601.22143v2 Announce Type: replace-cross
Abstract: Audio-Visual Foundation Models, which are pretrained to jointly generate sound and visual content, have recently shown an unprecedented ability to model multi-modal generation and editing, open…

cs.CV

CAST: Channel-Aware Spatial Transfer Learning with Pseudo-Image Radar for Sign Language Recognition

May 12, 2026

arXiv:2605.08663v1 Announce Type: new
Abstract: We propose CAST, a dual-stream architecture that utilizes channel-aware spatial transfer learning for isolated sign language recognition addressing the challenges of magnitude-only 60 GHz radar Range-Tim…

cs.CV, eess.IV

GLEAM: A Multimodal Imaging Dataset and HAMM for Glaucoma Classification

May 12, 2026

arXiv:2603.12800v2 Announce Type: replace-cross
Abstract: We propose glaucoma lesion evaluation and analysis with multimodal imaging (GLEAM), the first publicly available tri-modal glaucoma dataset comprising scanning laser ophthalmoscopy fundus image…

cs.CV

TIE: Time Interval Encoding for Video Generation over Events

May 12, 2026

arXiv:2605.10543v1 Announce Type: new
Abstract: Director-style prompting, robotic action prediction, and interactive video agents demand temporal grounding over concurrent events — a regime in which 68% of general clips and over 99% of robotics/gamep…

cs.CV

Thinking with Novel Views: A Systematic Analysis of Generative-Augmented Spatial Intelligence

May 12, 2026

arXiv:2605.10588v1 Announce Type: new
Abstract: Current Large Multimodal Models (LMMs) struggle with spatial reasoning tasks requiring viewpoint-dependent understanding, largely because they are confined to a single, static observation. We propose Thi…

cs.CV

SAMOFT: Robust Multi-Object Tracking via Region and Flow

May 12, 2026

arXiv:2605.09417v1 Announce Type: new
Abstract: Multi-object tracking (MOT) is a fundamental task in computer vision that requires continuously tracking multiple targets while maintaining consistent identities across frames. However, most existing app…

cs.CV

From Articulated Kinematics to Routed Visual Control for Action-Conditioned Surgical Video Generation

May 12, 2026

arXiv:2605.08712v1 Announce Type: new
Abstract: Action-conditioned surgical video generation is a critical yet highly challenging problem for robotic surgery. The core difficulty is that low-dimensional control vectors must precisely govern complex im…

cs.CV

Qwen-Image-2.0 Technical Report

May 12, 2026

arXiv:2605.10730v1 Announce Type: new
Abstract: We present Qwen-Image-2.0, an omni-capable image generation foundation model that unifies high-fidelity generation and precise image editing within a single framework. Despite recent progress, existing m…

cs.CV

GenMed: A Pairwise Generative Reformulation of Medical Diagnostic Tasks

May 12, 2026

arXiv:2605.10645v1 Announce Type: new
Abstract: Data-driven medical AI is traditionally formulated as a discriminative mapping from input $X$ to output $Y$ via a learned function $f$, which does not generalize well across heterogeneous data and modali…