Provide.ai - Page 557

SEA-Eval: A Benchmark for Evaluating Self-Evolving Agents Beyond Episodic Assessment

April 13, 2026

arXiv:2604.08988v1 Announce Type: new
Abstract: Current LLM-based agents demonstrate strong performance in episodic task execution but remain constrained by static toolsets and episodic amnesia, failing to accumulate experience or optimize strategies …

cs.AI, cs.DC

TensorHub: Scalable and Elastic Weight Transfer for LLM RL Training

April 13, 2026

arXiv:2604.09107v1 Announce Type: cross
Abstract: Modern LLM reinforcement learning (RL) workloads require a highly efficient weight transfer system to scale training across heterogeneous computational resources. However, existing weight transfer appr…

cs.AI, eess.AS

PS-TTS: Phonetic Synchronization in Text-to-Speech for Achieving Natural Automated Dubbing

April 13, 2026

arXiv:2604.09111v1 Announce Type: cross
Abstract: Recently, artificial intelligence-based dubbing technology has advanced, enabling automated dubbing (AD) to convert the source speech of a video into target speech in different languages. However, natu…

cs.AI

SAGE: A Service Agent Graph-guided Evaluation Benchmark

April 13, 2026

arXiv:2604.09285v1 Announce Type: new
Abstract: The development of Large Language Models (LLMs) has catalyzed automation in customer service, yet benchmarking their performance remains challenging. Existing benchmarks predominantly rely on static para…

cs.AI

HiL-Bench (Human-in-Loop Benchmark): Do Agents Know When to Ask for Help?

April 13, 2026

arXiv:2604.09408v1 Announce Type: new
Abstract: Frontier coding agents solve complex tasks when given complete context but collapse when specifications are incomplete or ambiguous. The bottleneck is not raw capability, but judgment: knowing when to ac…

cs.AI

Reasoning in a Combinatorial and Constrained World: Benchmarking LLMs on Natural-Language Combinatorial Optimization

April 13, 2026

arXiv:2602.02188v2 Announce Type: replace
Abstract: While large language models (LLMs) have shown strong performance in math and logic reasoning, their ability to handle combinatorial optimization (CO) — searching high-dimensional solution spaces und…

cs.AI, cs.MA

Memory Intelligence Agent

April 13, 2026

arXiv:2604.04503v3 Announce Type: replace
Abstract: Deep research agents (DRAs) integrate LLM reasoning with external tools. Memory systems enable DRAs to leverage historical experiences, which are essential for efficient reasoning and autonomous evol…

cs.AI, cs.SI

AgentSociety: Large-Scale Simulation of LLM-Driven Generative Agents Advances Understanding of Human Behaviors and Society

April 13, 2026

arXiv:2502.08691v2 Announce Type: replace-cross
Abstract: Understanding human behavior and society is a central focus in social sciences, with the rise of generative social science marking a significant paradigmatic shift. By leveraging bottom-up simu…

cs.AI, cs.CL

Decomposing the Delta: What Do Models Actually Learn from Preference Pairs?

April 13, 2026

arXiv:2604.08723v1 Announce Type: new
Abstract: Preference optimization methods such as DPO and KTO are widely used for aligning language models, yet little is understood about what properties of preference data drive downstream reasoning gains. We as…

cs.AI, cs.DB

Automated Standardization of Legacy Biomedical Metadata Using an Ontology-Constrained LLM Agent

April 13, 2026

arXiv:2604.08552v1 Announce Type: cross
Abstract: Scientific metadata are often incomplete and noncompliant with community standards, limiting dataset findability, interoperability, and reuse. When reporting guidelines exist, they typically lack machi…