Refinement via Regeneration: Enlarging Modification Space Boosts Image Refinement in Unified Multimodal Models

April 29, 2026

arXiv:2604.25636v1 Announce Type: new
Abstract: Unified multimodal models (UMMs) integrate visual understanding and generation within a single framework. For text-to-image (T2I) tasks, this unified capability allows UMMs to refine outputs after their …

cs.CV

Detecting Dental Landmarks from Intraoral 3D Scans: the 3DTeethLand challenge

April 29, 2026

arXiv:2512.08323v2 Announce Type: replace
Abstract: Teeth landmark detection is a key task in modern orthodontics, supporting advanced diagnosis, personalized treatment planning, and effective monitoring of treatment progress. However, several signifi…

cs.CV

AdaTooler-V: Adaptive Tool-Use for Images and Videos

April 29, 2026

arXiv:2512.16918v3 Announce Type: replace
Abstract: Recent advances have shown that multimodal large language models (MLLMs) benefit from multimodal interleaved chain-of-thought (CoT) with vision tool interactions. However, existing open-source models…

cs.RO

RISE: Self-Improving Robot Policy with Compositional World Model

April 29, 2026

arXiv:2602.11075v2 Announce Type: replace
Abstract: Despite the sustained scaling on model capacity and data acquisition, Vision-Language-Action (VLA) models remain brittle in contact-rich and dynamic manipulation tasks, where minor execution deviatio…

cs.CV

OmniSch: A Multimodal PCB Schematic Benchmark For Structured Diagram Visual Reasoning

April 29, 2026

arXiv:2604.00270v3 Announce Type: replace
Abstract: Recent large multimodal models (LMMs) have made rapid progress in visual grounding, document understanding, and diagram reasoning tasks. However, their ability to convert Printed Circuit Board (PCB) …

cs.RO

KinDER: A Physical Reasoning Benchmark for Robot Learning and Planning

April 29, 2026

arXiv:2604.25788v1 Announce Type: new
Abstract: Robotic systems that interact with the physical world must reason about kinematic and dynamic constraints imposed by their own embodiment, their environment, and the task at hand. We introduce KinDER, a …

cs.CV, cs.RO

BEVal: A Cross-dataset Evaluation Study of BEV Segmentation Models for Autonomous Driving

April 29, 2026

arXiv:2408.16322v4 Announce Type: replace-cross
Abstract: Current research in semantic bird’s-eye view segmentation for autonomous driving focuses solely on optimizing neural network models using a single dataset, typically nuScenes. This practice lea…

cs.AI, cs.CV

Latent Anomaly Knowledge Excavation: Unveiling Sparse Sensitive Neurons in Vision-Language Models

April 29, 2026

arXiv:2604.07802v3 Announce Type: replace
Abstract: Large-scale vision-language models (VLMs) exhibit remarkable zero-shot capabilities, yet the internal mechanisms driving their anomaly detection (AD) performance remain poorly understood. Current met…

cs.CV

NTIRE 2026 Rip Current Detection and Segmentation (RipDetSeg) Challenge Report

April 29, 2026

arXiv:2604.17070v2 Announce Type: replace
Abstract: This report presents the NTIRE 2026 Rip Current Detection and Segmentation (RipDetSeg) Challenge, which targets automatic rip current understanding in images. Rip currents are hazardous nearshore flo…

cs.AI, cs.CV, cs.RO

From Scene to Object: Text-Guided Dual-Gaze Prediction

April 29, 2026

arXiv:2604.20191v2 Announce Type: replace-cross
Abstract: Interpretable driver attention prediction is crucial for human-like autonomous driving. However, existing datasets provide only scene-level global gaze rather than fine-grained object-level ann…