Provide.ai

CAST: Mitigating Object Hallucination in Large Vision-Language Models via Caption-Guided Visual Attention Steering

May 7, 2026

arXiv:2605.04641v1 Announce Type: new
Abstract: Although Large Vision-Language Models (LVLMs) have demonstrated remarkable performance on downstream tasks, they frequently produce content that deviates from visual information, leading to object halluc…

cs.AI, cs.CV

FaithfulFaces: Pose-Faithful Facial Identity Preservation for Text-to-Video Generation

May 7, 2026

arXiv:2605.04702v1 Announce Type: new
Abstract: Identity-preserving text-to-video generation (IPT2V) empowers users to produce diverse and imaginative videos with consistent human facial identity. Despite recent progress, existing methods often suffer…

cs.CV

Anny-Fit: All-Age Human Mesh Recovery

May 7, 2026

arXiv:2605.04728v1 Announce Type: new
Abstract: Recovering 3D human pose and shape from a single image remains a cornerstone of human-centric vision, yet most methods assume adult subjects and optimize each person independently. These assumptions fail…

cs.CV, eess.IV

External Validation of Deep Learning Models for BI-RADS Breast Density Prediction from Ultrasound Images

May 7, 2026

arXiv:2605.05082v1 Announce Type: cross
Abstract: We externally validated three deep learning models (DenseNet121, ViT-B/32, and ResNet50) for predicting mammographic breast density from breast ultrasound exams on an independent cohort. The external v…

cs.CV, cs.SY, eess.SY

VC-FeS: Viewpoint-Conditioned Feature Selection for Vehicle Re-identification in Thermal Vision

May 7, 2026

arXiv:2605.04750v1 Announce Type: new
Abstract: Identification of less-articulated objects using single-channel images, such as thermal images, is important in many applications, such as surveillance. However, in this domain, existing methods show poo…

cs.CV

MIRAGE: Retrieval and Generation of Multimodal Images and Texts for Medical Education

May 7, 2026

arXiv:2605.04772v1 Announce Type: new
Abstract: Access to diverse, well-annotated medical images with interactive learning tools is fundamental for training practitioners in medicine and related fields to improve their diagnostic skills and understand…

cs.CV

UAV-VL-R1: Generalizing Vision-Language Models via Supervised Fine-Tuning and Multi-Stage GRPO for UAV Visual Reasoning

May 7, 2026

arXiv:2508.11196v2 Announce Type: replace
Abstract: Recent advances in vision-language models (VLMs) have demonstrated strong generalization in natural image tasks. However, their performance often degrades on unmanned aerial vehicle (UAV)-based aeria…

cs.CV, cs.LG

Large Scale Diffusion Distillation via Score-Regularized Continuous-Time Consistency

May 7, 2026

arXiv:2510.08431v3 Announce Type: replace
Abstract: Although continuous-time consistency models (e.g., sCM, MeanFlow) are theoretically principled and empirically powerful for fast academic-scale diffusion, their applicability to large-scale text-to-ima…

cs.CV

HistoMet: A Pan-Cancer Deep Learning Framework for Prognostic Prediction of Metastatic Progression and Site Tropism from Primary Tumor Histopathology

May 7, 2026

arXiv:2602.07608v2 Announce Type: replace
Abstract: Metastatic progression remains the leading cause of cancer-related mortality, yet predicting whether a primary tumor will metastasize and where it will disseminate directly from histopathology remain…

cs.CV

Ground4D: Spatially-Grounded Feedforward 4D Reconstruction for Unstructured Off-Road Scenes

May 7, 2026

arXiv:2605.04435v1 Announce Type: new
Abstract: Feedforward Gaussian Splatting has recently emerged as an efficient paradigm for 4D reconstruction in autonomous driving. However, in unstructured off-road scenes, its performance degrades due to high-fr…