Provide.ai

Hidden States Know Where Reasoning Diverges: Credit Assignment via Span-Level Wasserstein Distance

/ April 28, 2026

arXiv:2604.23318v1 Announce Type: new
Abstract: Group Relative Policy Optimization (GRPO) performs coarse-grained credit assignment in reinforcement learning with verifiable rewards (RLVR) by assigning the same advantage to all tokens in a rollout. Pr…

cs.CL, cs.SD

Robust Audio-Text Retrieval via Cross-Modal Attention and Hybrid Loss

/ April 28, 2026

arXiv:2604.23323v1 Announce Type: new
Abstract: Audio-text retrieval enables semantic alignment between audio content and natural language queries, supporting applications in multimedia search, accessibility, and surveillance. However, current state-o…

cs.CL

Evaluating Large Language Models on Computer Science University Exams in Data Structures

/ April 28, 2026

arXiv:2604.23347v1 Announce Type: new
Abstract: We present a comprehensive evaluation of Large Language Models (LLMs) on Computer Science (CS) Data Structure examination questions. Our work introduces a new benchmark dataset comprising exam questions …

cs.CL, cs.HC

VeriLLMed: Interactive Visual Debugging of Medical Large Language Models with Knowledge Graphs

/ April 28, 2026

arXiv:2604.23356v1 Announce Type: new
Abstract: Large language models (LLMs) show promise in medical diagnosis, but real-world deployment remains challenging due to high-stakes clinical decisions and imperfect reasoning reliability. As a result, caref…

cs.CL

Beyond Local vs. External: A Game-Theoretic Framework for Trustworthy Knowledge Acquisition

/ April 28, 2026

arXiv:2604.23413v1 Announce Type: new
Abstract: Cloud-hosted Large Language Models (LLMs) offer unmatched reasoning capabilities and dynamic knowledge, yet submitting raw queries to these external services risks exposing sensitive user intent. Convers…

cs.AI, cs.CL

Scoring, Reasoning, and Selecting the Best! Ensembling Large Language Models via a Peer-Review Process

/ April 28, 2026

arXiv:2512.23213v3 Announce Type: replace
Abstract: We propose LLM-PeerReview, an unsupervised LLM Ensemble method that selects the most ideal response from multiple LLM-generated candidates for each query, harnessing the collective wisdom of multiple…

cs.AI, cs.CL

MTRouter: Cost-Aware Multi-Turn LLM Routing with History-Model Joint Embeddings

/ April 28, 2026

arXiv:2604.23530v1 Announce Type: new
Abstract: Multi-turn, long-horizon tasks are increasingly common for large language models (LLMs), but solving them typically requires many sequential model invocations, accumulating substantial inference costs. H…

cs.CL, cs.SE

SWE-Pruner: Self-Adaptive Context Pruning for Coding Agents

/ April 28, 2026

arXiv:2601.16746v3 Announce Type: replace-cross
Abstract: LLM agents have demonstrated remarkable capabilities in software development, but their performance is hampered by long interaction contexts, which incur high API costs and latency. While vario…

cs.CL, cs.CV

LinguDistill: Recovering Linguistic Ability in Vision-Language Models via Selective Cross-Modal Distillation

/ April 28, 2026

arXiv:2604.00829v3 Announce Type: replace-cross
Abstract: Adapting pretrained language models (LMs) into vision-language models (VLMs) can degrade their native linguistic capability due to representation shift and cross-modal interference introduced d…

cs.AI, cs.CL

Agentic clinical reasoning over longitudinal myeloma records: a retrospective evaluation against expert consensus

/ April 28, 2026

arXiv:2604.24473v1 Announce Type: cross
Abstract: Multiple myeloma is managed through sequential lines of therapy over years to decades, with each decision depending on cumulative disease history distributed across dozens to hundreds of heterogeneous …