Vision-Language-Action in Robotics: A Survey of Datasets, Benchmarks, and Data Engines
arXiv:2604.23001v1 Announce Type: cross
Abstract: Despite remarkable progress in Vision-Language-Action (VLA) models, a central bottleneck remains underexamined: the data infrastructure that underlies embodied learning. In this survey, we argue that…