Rethinking Prospect Theory for LLMs: Revealing the Instability of Decision-Making under Epistemic Uncertainty

April 13, 2026

arXiv:2508.08992v3 Announce Type: replace
Abstract: Prospect Theory (PT) models human decision-making behaviour under uncertainty, among which linguistic uncertainty is commonly adopted in real-world scenarios. Although recent studies have developed s…

cs.AI, cs.CR, cs.SE

DeepGuard: Secure Code Generation via Multi-Layer Semantic Aggregation

April 13, 2026

arXiv:2604.09089v1 Announce Type: cross
Abstract: Large Language Models (LLMs) for code generation can replicate insecure patterns from their training data. To mitigate this, a common strategy for security hardening is to fine-tune models using superv…

cs.AI

SEA-Eval: A Benchmark for Evaluating Self-Evolving Agents Beyond Episodic Assessment

April 13, 2026

arXiv:2604.08988v1 Announce Type: new
Abstract: Current LLM-based agents demonstrate strong performance in episodic task execution but remain constrained by static toolsets and episodic amnesia, failing to accumulate experience or optimize strategies …

cs.AI, cs.DC

TensorHub: Scalable and Elastic Weight Transfer for LLM RL Training

April 13, 2026

arXiv:2604.09107v1 Announce Type: cross
Abstract: Modern LLM reinforcement learning (RL) workloads require a highly efficient weight transfer system to scale training across heterogeneous computational resources. However, existing weight transfer appr…

cs.AI, eess.AS

PS-TTS: Phonetic Synchronization in Text-to-Speech for Achieving Natural Automated Dubbing

April 13, 2026

arXiv:2604.09111v1 Announce Type: cross
Abstract: Recently, artificial intelligence-based dubbing technology has advanced, enabling automated dubbing (AD) to convert the source speech of a video into target speech in different languages. However, natu…

cs.AI

SAGE: A Service Agent Graph-guided Evaluation Benchmark

April 13, 2026

arXiv:2604.09285v1 Announce Type: new
Abstract: The development of Large Language Models (LLMs) has catalyzed automation in customer service, yet benchmarking their performance remains challenging. Existing benchmarks predominantly rely on static para…

cs.AI

HiL-Bench (Human-in-Loop Benchmark): Do Agents Know When to Ask for Help?

April 13, 2026

arXiv:2604.09408v1 Announce Type: new
Abstract: Frontier coding agents solve complex tasks when given complete context but collapse when specifications are incomplete or ambiguous. The bottleneck is not raw capability, but judgment: knowing when to ac…

cs.AI

Reasoning in a Combinatorial and Constrained World: Benchmarking LLMs on Natural-Language Combinatorial Optimization

April 13, 2026

arXiv:2602.02188v2 Announce Type: replace
Abstract: While large language models (LLMs) have shown strong performance in math and logic reasoning, their ability to handle combinatorial optimization (CO) — searching high-dimensional solution spaces und…

cs.AI, cs.MA

Memory Intelligence Agent

April 13, 2026

arXiv:2604.04503v3 Announce Type: replace
Abstract: Deep research agents (DRAs) integrate LLM reasoning with external tools. Memory systems enable DRAs to leverage historical experiences, which are essential for efficient reasoning and autonomous evol…

cs.AI, cs.SI

AgentSociety: Large-Scale Simulation of LLM-Driven Generative Agents Advances Understanding of Human Behaviors and Society

April 13, 2026

arXiv:2502.08691v2 Announce Type: replace-cross
Abstract: Understanding human behavior and society is a central focus in social sciences, with the rise of generative social science marking a significant paradigmatic shift. By leveraging bottom-up simu…