Provide.ai - Page 94

Unveiling Fine-Grained Visual Traces: Evaluating Multimodal Interleaved Reasoning Chains in Multimodal STEM Tasks

/ May 11, 2026

arXiv:2604.19697v2 Announce Type: replace
Abstract: Multimodal large language models (MLLMs) have shown promising reasoning abilities, yet evaluating their performance in specialized domains remains challenging. STEM reasoning is a particularly valuab…

cs.AI, cs.CL

GSM-SEM: Benchmark and Framework for Generating Semantically Variant Augmentations

/ May 11, 2026

arXiv:2605.07053v1 Announce Type: new
Abstract: Benchmarks like GSM8K are popular measures of mathematical reasoning, but leaderboard gains can overstate true capability due to memorization of fixed test sets. Most robustness variants apply surface-le…

cs.CV

SoLAR: Error-Resilient Streamable Long-Horizon Free-Viewpoint Video Reconstruction with Anchor Activation and Latent Recalibration

/ May 11, 2026

arXiv:2605.07346v1 Announce Type: new
Abstract: Free-Viewpoint Video (FVV) has emerged as a cornerstone of next-generation immersive media systems and attracted widespread attention. Previous methods primarily focus on short video sequences and suffer…

cs.LG

PACEvolve++: Improving Test-time Learning for Evolutionary Search Agents

/ May 11, 2026

arXiv:2605.07039v1 Announce Type: new
Abstract: Large language models have become drivers of evolutionary search, but most systems rely on a fixed, prompt-elicited policy to sample next candidates. This limits adaptation in practical engineering and r…

cs.AI, cs.LG

EviDep: Trustworthy Multimodal Depression Estimation via Disentangled Evidential Learning

/ May 11, 2026

arXiv:2604.16579v2 Announce Type: replace-cross
Abstract: Automated multimodal depression estimation in unconstrained environments is inherently challenged by naturalistic noise and complex behavioral variability. Prevailing deterministic methods, how…

cs.AI, cs.CV

Persistent Visual Memory: Sustaining Perception for Deep Generation in LVLMs

/ May 11, 2026

arXiv:2605.00814v2 Announce Type: replace
Abstract: While autoregressive Large Vision-Language Models (LVLMs) demonstrate remarkable proficiency in multimodal tasks, they face a “Visual Signal Dilution” phenomenon, where the accumulation of textual hi…

cs.CV, cs.RO

3D Generation for Embodied AI and Robotic Simulation: A Survey

/ May 11, 2026

arXiv:2604.26509v3 Announce Type: replace
Abstract: Embodied AI and robotic systems increasingly depend on scalable, diverse, and physically grounded 3D content for simulation-based training and real-world deployment. While 3D generative modeling has …

cs.AI, physics.med-ph

Overcoming data scarcity through multi-center federated learning for organs-at-risk segmentation in pediatric upper abdominal radiotherapy

/ May 11, 2026

arXiv:2605.06820v1 Announce Type: cross
Abstract: Deep learning-based organs/structures-at-risk (OARs) auto-contouring models can improve radiotherapy workflows, but models trained on adult data often underperform in pediatric patients. Developing robu…

cs.CV

Channel-Level Relation to Attentive Aggregation with Neighborhood-Homogeneity Constraint for Point Cloud Analysis

/ May 11, 2026

arXiv:2605.02357v2 Announce Type: replace
Abstract: In 3D point cloud understanding, the core challenge lies in accurately capturing discriminative features within complex neighborhoods, which directly affects the execution precision of downstream tas…

cs.AI, cs.LG

Exact Is Easier: Credit Assignment for Cooperative LLM Agents

/ May 11, 2026

arXiv:2603.06859v2 Announce Type: replace-cross
Abstract: Removing an agent from a cooperative team to measure its contribution seems natural, yet in multi-agent LLM systems this evaluation distorts the result it claims to measure. This failure is not…