Provide.ai - Page 58

Only Say What You Know: Calibration-Aware Generation for Long-Form Factuality

May 5, 2026

arXiv:2605.01749v1 Announce Type: new
Abstract: Large Reasoning Models achieve strong performance on complex tasks but remain prone to hallucinations, particularly in long-form generation where errors compound across reasoning steps. Existing approach…

cs.AI, cs.CL

Beyond Sentiment: A Multi-Agent Pipeline for Actionable Business Advice from Reviews

May 5, 2026

arXiv:2601.12024v2 Announce Type: replace
Abstract: Customer reviews contain valuable signals about service quality, but converting large-scale review corpora into actionable business recommendations remains difficult. Standard sentiment/aspect analys…

cs.AI, cs.CL

Can AI Be a Good Peer Reviewer? A Survey of Peer Review Process, Evaluation, and the Future

May 5, 2026

arXiv:2604.27924v2 Announce Type: replace-cross
Abstract: Peer review is a multi-stage process involving reviews, rebuttals, meta-reviews, final decisions, and subsequent manuscript revisions. Recent advances in large language models (LLMs) have motiv…

cs.AI, cs.CL

AgentXRay: White-Boxing Agentic Systems via Workflow Reconstruction

May 5, 2026

arXiv:2602.05353v3 Announce Type: replace
Abstract: Large Language Models have shown strong capabilities in complex problem solving, yet many agentic systems remain difficult to interpret and control due to opaque internal workflows. While some framew…

cs.CL

The Cylindrical Representation Hypothesis for Language Model Steering

May 5, 2026

arXiv:2605.01844v1 Announce Type: new
Abstract: Steering is a widely used technique for controlling large language models, yet its effects are often unstable and hard to predict. Existing theoretical accounts are largely based on the Linear Representa…

cs.CL, cs.CR

Watermarking LLM Agent Trajectories

May 5, 2026

arXiv:2602.18700v2 Announce Type: replace-cross
Abstract: LLM agents rely heavily on high-quality trajectory data to guide their problem-solving behaviors, yet producing such data requires substantial task design, high-capacity model generation, and m…

cs.CL, cs.CV

MIRL: Mutual Information-Guided Reinforcement Learning for Vision-Language Models

May 5, 2026

arXiv:2605.01520v1 Announce Type: cross
Abstract: Vision-Language Models (VLMs) frequently suffer from visual perception errors and hallucinations that compromise answer accuracy in complex reasoning tasks. Reinforcement Learning with Verifiable Rewar…

cs.AI, cs.CL, cs.LG

VeRO: An Evaluation Harness for Agents to Optimize Agents

May 5, 2026

arXiv:2602.22480v2 Announce Type: replace
Abstract: An important emerging application of coding agents is agent optimization: the iterative improvement of a target agent through edit-execute-evaluate cycles. Despite its relevance, the community lacks …

cs.CL, cs.IR

Led to Mislead: Adversarial Content Injection for Attacks on Neural Ranking Models

May 5, 2026

arXiv:2605.01591v1 Announce Type: cross
Abstract: Neural Ranking Models (NRMs) are central to modern information retrieval but remain highly vulnerable to adversarial manipulation. Existing attacks often rely on heuristics or surrogate models, limitin…

cs.CL, cs.CV

Medical thinking with multiple images

May 5, 2026

arXiv:2604.16506v2 Announce Type: replace-cross
Abstract: Large language models perform well on many medical QA benchmarks, but real clinical reasoning often requires integrating evidence across multiple images rather than interpreting a single view. …