Provide.ai - We Provide AI To Companies

DriveXQA: Cross-modal Visual Question Answering for Adverse Driving Scene Understanding

/ March 26, 2026

arXiv:2603.11380v2 Announce Type: replace
Abstract: Fusing sensors with complementary modalities is crucial for maintaining a stable and comprehensive understanding of abnormal driving scenes. However, Multimodal Large Language Models (MLLMs) are unde…

cs.CV

CAKE: Real-time Action Detection via Motion Distillation and Background-aware Contrastive Learning

Hieu Hoang, Dung Trung Tran, Hong Nguyen, Nam-Phong Nguyen / March 26, 2026

arXiv:2603.23988v1 Announce Type: new
Abstract: Online Action Detection (OAD) systems face two primary challenges: high computational cost and insufficient modeling of discriminative temporal dynamics against background motion. Adding optical flow cou…

cs.CV

Scan Clusters, Not Pixels: A Cluster-Centric Paradigm for Efficient Ultra-high-definition Image Restoration

/ March 26, 2026

arXiv:2602.21917v2 Announce Type: replace
Abstract: Ultra-High-Definition (UHD) image restoration is trapped in a scalability crisis: existing models, bound to pixel-wise operations, demand unsustainable computation. While state space models (SSMs) li…

cs.CV

SilLang: Improving Gait Recognition with Silhouette Language Encoding

Ruiyi Zhan, Guozhen Peng, Canyu Chen, Jian Lei, Annan Li / March 26, 2026

arXiv:2603.23976v1 Announce Type: new
Abstract: Gait silhouettes, which can be encoded into binary gait codes, are widely adopted to represent the motion patterns of pedestrians. Recent approaches commonly leverage visual backbones to encode gait silhou…

cs.CV
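The SilLang abstract says gait silhouettes can be encoded into binary gait codes. The sketch below shows one generic way to do that. It assumes a silhouette is an H x W grid of 0/1 pixels, packs the pixels row by row into a single integer, and compares two codes by Hamming distance. This is not the SilLang encoding. The function names, the row-major packing, and the Hamming comparison are hypothetical choices made only for illustration.

```python
from typing import List

# Hypothetical representation: an H x W mask, 1 = pedestrian pixel, 0 = background.
Silhouette = List[List[int]]


def encode_silhouette(mask: Silhouette) -> int:
    """Pack a binary silhouette, row-major, into one integer bit code.

    Leading zeros are not stored, so only compare codes built from
    masks of the same H x W size.
    """
    code = 0
    for row in mask:
        for pixel in row:
            if pixel not in (0, 1):
                raise ValueError("silhouette must be binary (0/1)")
            code = (code << 1) | pixel  # append this pixel as the lowest bit
    return code


def hamming_distance(code_a: int, code_b: int) -> int:
    """Count the pixels that differ between two same-size silhouette codes."""
    return bin(code_a ^ code_b).count("1")


def encode_sequence(frames: List[Silhouette]) -> List[int]:
    """Encode every frame of a gait sequence independently."""
    return [encode_silhouette(frame) for frame in frames]


if __name__ == "__main__":
    frame_a = [[0, 1, 0],
               [1, 1, 1],
               [0, 1, 0]]
    frame_b = [[0, 1, 0],
               [0, 1, 1],
               [1, 0, 0]]
    code_a, code_b = encode_sequence([frame_a, frame_b])
    # Print 9-bit codes (3 x 3 mask) and how many pixels differ.
    print(format(code_a, "09b"), format(code_b, "09b"), hamming_distance(code_a, code_b))
```

Packing into an integer keeps each frame compact and makes frame-to-frame comparison a single XOR plus a bit count. A learned model such as the visual backbones the abstract mentions would replace this hand-built comparison.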

Learning Cross-View Object Correspondence via Cycle-Consistent Mask Prediction

/ March 26, 2026

arXiv:2602.18996v2 Announce Type: replace
Abstract: We study the task of establishing object-level visual correspondence across different viewpoints in videos, focusing on the challenging egocentric-to-exocentric and exocentric-to-egocentric scenarios…

cs.CV

HyDRA: Hybrid Domain-Aware Robust Architecture for Heterogeneous Collaborative Perception

Minwoo Song, Minhee Kang, Heejin Ahn / March 26, 2026

arXiv:2603.23975v1 Announce Type: new
Abstract: In collaborative perception, an agent’s performance can be degraded by heterogeneity arising from differences in model architecture or training data distributions. To address this challenge, we propose H…

cs.CV

Thinking with Geometry: Active Geometry Integration for Spatial Reasoning

Haoyuan Li, Qihang Cao, Tao Tang, Kun Xiang, Zihan Guo, Jianhua Han, Hang Xu, Xiaodan Liang / March 26, 2026

arXiv:2602.06037v3 Announce Type: replace
Abstract: Recent progress in spatial reasoning with Multimodal Large Language Models (MLLMs) increasingly leverages geometric priors from 3D encoders. However, most existing integration strategies remain passi…

cs.CV, cs.GR, cs.RO

SLAT-Phys: Fast Material Property Field Prediction from Structured 3D Latents

Rocktim Jyoti Das, Dinesh Manocha / March 26, 2026

arXiv:2603.23973v1 Announce Type: new
Abstract: Estimating the material property field of 3D assets is critical for physics-based simulation, robotics, and digital twin generation. Existing vision-based approaches are either too expensive and slow or …

cs.CV

Leave No Stone Unturned: Uncovering Holistic Audio-Visual Intrinsic Coherence for Deepfake Detection

Jielun Peng, Yabin Wang, Yaqi Li, Long Kong, Xiaopeng Hong / March 26, 2026

arXiv:2603.23960v1 Announce Type: new
Abstract: The rapid progress of generative AI has enabled hyper-realistic audio-visual deepfakes, intensifying threats to personal security and social trust. Most existing deepfake detectors rely either on uni-mod…

cs.CV

Prime and Reach: Synthesising Body Motion for Gaze-Primed Object Reach

Masashi Hatano, Saptarshi Sinha, Jacob Chalk, Wei-Hong Li, Hideo Saito, Dima Damen / March 26, 2026

arXiv:2512.16456v2 Announce Type: replace
Abstract: Human motion generation is a challenging task that aims to create realistic motion imitating natural human behaviour. We focus on the well-studied behaviour of priming an object/location for pick up …