SEA-Eval: A Benchmark for Evaluating Self-Evolving Agents Beyond Episodic Assessment
arXiv:2604.08988v1 Announce Type: new
Abstract: Current LLM-based agents demonstrate strong performance in episodic task execution but remain constrained by static toolsets and episodic amnesia, failing to accumulate experience or optimize strategies …