Provide.ai - Page 89

Delulu: A Verified Multi-Lingual Benchmark for Code Hallucination Detection in Fill-in-the-Middle Tasks

May 11, 2026

arXiv:2605.07024v1 Announce Type: cross
Abstract: Large Language Models for code generation frequently produce hallucinations in Fill-in-the-Middle (FIM) tasks — plausible but incorrect completions such as invented API methods, invalid parameters, un…

cs.CV

What Matters for Diffusion-Friendly Latent Manifold? Prior-Aligned Autoencoders for Latent Diffusion

May 11, 2026

arXiv:2605.07915v1 Announce Type: new
Abstract: Tokenizers are a crucial component of latent diffusion models, as they define the latent space in which diffusion models operate. However, existing tokenizers are primarily designed to improve reconstruc…

cs.CV

MedVIGIL: Evaluating Trustworthy Medical VLMs Under Broken Visual Evidence

May 11, 2026

arXiv:2605.07919v1 Announce Type: new
Abstract: Medical vision–language models (VLMs) are usually evaluated on intact image–question pairs, but trustworthy clinical use requires a stronger property: a model must recognise when the evidential basis f…

cs.AI, cs.IR

LARAG: Link-Aware Retrieval Strategy for RAG Systems in Hyperlinked Technical Documentation

May 11, 2026

arXiv:2605.07517v1 Announce Type: cross
Abstract: Retrieval-Augmented Generation (RAG) enhances the factual grounding of Large Language Models by conditioning their outputs on external documents. However, standard embedding-based retrievers treat natu…

cs.CV

Unveiling Fine-Grained Visual Traces: Evaluating Multimodal Interleaved Reasoning Chains in Multimodal STEM Tasks

May 11, 2026

arXiv:2604.19697v2 Announce Type: replace
Abstract: Multimodal large language models (MLLMs) have shown promising reasoning abilities, yet evaluating their performance in specialized domains remains challenging. STEM reasoning is a particularly valuab…

cs.CV

SoLAR: Error-Resilient Streamable Long-Horizon Free-Viewpoint Video Reconstruction with Anchor Activation and Latent Recalibration

May 11, 2026

arXiv:2605.07346v1 Announce Type: new
Abstract: Free-Viewpoint Video (FVV) has emerged as a cornerstone of next-generation immersive media systems and attracted widespread attention. Previous methods primarily focus on short video sequences and suffer…

cs.LG

PACEvolve++: Improving Test-time Learning for Evolutionary Search Agents

May 11, 2026

arXiv:2605.07039v1 Announce Type: new
Abstract: Large language models have become drivers of evolutionary search, but most systems rely on a fixed, prompt-elicited policy to sample next candidates. This limits adaptation in practical engineering and r…

cs.AI, cs.LG

EviDep: Trustworthy Multimodal Depression Estimation via Disentangled Evidential Learning

May 11, 2026

arXiv:2604.16579v2 Announce Type: replace-cross
Abstract: Automated multimodal depression estimation in unconstrained environments is inherently challenged by naturalistic noise and complex behavioral variability. Prevailing deterministic methods, how…

cs.AI, cs.CV

Persistent Visual Memory: Sustaining Perception for Deep Generation in LVLMs

May 11, 2026

arXiv:2605.00814v2 Announce Type: replace
Abstract: While autoregressive Large Vision-Language Models (LVLMs) demonstrate remarkable proficiency in multimodal tasks, they face a “Visual Signal Dilution” phenomenon, where the accumulation of textual hi…

cs.CV

Channel-Level Relation to Attentive Aggregation with Neighborhood-Homogeneity Constraint for Point Cloud Analysis

May 11, 2026

arXiv:2605.02357v2 Announce Type: replace
Abstract: In 3D point cloud understanding, the core challenge lies in accurately capturing discriminative features within complex neighborhoods, which directly affects the execution precision of downstream tas…