A Systematic Post-Train Framework for Video Generation

April 29, 2026

arXiv:2604.25427v1 Announce Type: new
Abstract: While large-scale video diffusion models have demonstrated impressive capabilities in generating high-resolution and semantically rich content, a significant gap remains between their pretraining perform…

cs.AI, cs.CV, cs.GR

Representation Paradigms in AI-based 3D Radiological Image Reconstruction: A Systematic Review

April 29, 2026

arXiv:2504.11349v3 Announce Type: replace
Abstract: The demand for high-quality medical imaging in clinical practice and assisted diagnosis has made 3D image reconstruction in radiological imaging a key research focus. Artificial intelligence (AI) has…

cs.AI, cs.CV

OmniAlpha: Aligning Transparency-Aware Generation via Multi-Task Unified Reinforcement Learning

April 29, 2026

arXiv:2511.20211v2 Announce Type: replace
Abstract: Transparency-aware generation requires modeling not only RGB appearance but also alpha-based opacity and cross-layer composition, which are essential for tasks such as image matting, object removal, …

cs.CV

OneThinker: All-in-one Reasoning Model for Image and Video

April 29, 2026

arXiv:2512.03043v3 Announce Type: replace
Abstract: Reinforcement learning (RL) has recently achieved remarkable success in eliciting visual reasoning within Multimodal Large Language Models (MLLMs). However, existing approaches typically train separa…

cs.CV

Refinement via Regeneration: Enlarging Modification Space Boosts Image Refinement in Unified Multimodal Models

April 29, 2026

arXiv:2604.25636v1 Announce Type: new
Abstract: Unified multimodal models (UMMs) integrate visual understanding and generation within a single framework. For text-to-image (T2I) tasks, this unified capability allows UMMs to refine outputs after their …

cs.CV

Detecting Dental Landmarks from Intraoral 3D Scans: the 3DTeethLand challenge

April 29, 2026

arXiv:2512.08323v2 Announce Type: replace
Abstract: Teeth landmark detection is a key task in modern orthodontics, supporting advanced diagnosis, personalized treatment planning, and effective monitoring of treatment progress. However, several signifi…

cs.CV

AdaTooler-V: Adaptive Tool-Use for Images and Videos

April 29, 2026

arXiv:2512.16918v3 Announce Type: replace
Abstract: Recent advances have shown that multimodal large language models (MLLMs) benefit from multimodal interleaved chain-of-thought (CoT) with vision tool interactions. However, existing open-source models…

cs.CV

OmniSch: A Multimodal PCB Schematic Benchmark For Structured Diagram Visual Reasoning

April 29, 2026

arXiv:2604.00270v3 Announce Type: replace
Abstract: Recent large multimodal models (LMMs) have made rapid progress in visual grounding, document understanding, and diagram reasoning tasks. However, their ability to convert Printed Circuit Board (PCB) …

cs.AI, cs.CV

Latent Anomaly Knowledge Excavation: Unveiling Sparse Sensitive Neurons in Vision-Language Models

April 29, 2026

arXiv:2604.07802v3 Announce Type: replace
Abstract: Large-scale vision-language models (VLMs) exhibit remarkable zero-shot capabilities, yet the internal mechanisms driving their anomaly detection (AD) performance remain poorly understood. Current met…

cs.CV

NTIRE 2026 Rip Current Detection and Segmentation (RipDetSeg) Challenge Report

April 29, 2026

arXiv:2604.17070v2 Announce Type: replace
Abstract: This report presents the NTIRE 2026 Rip Current Detection and Segmentation (RipDetSeg) Challenge, which targets automatic rip current understanding in images. Rip currents are hazardous nearshore flo…