DPEPO: Diverse Parallel Exploration Policy Optimization for LLM-based Agents
arXiv:2604.24320v1 Announce Type: new
Abstract: Large language model (LLM) agents that follow the sequential “reason-then-act” paradigm have achieved superior performance in many complex tasks. However, these methods suffer from limited exploration and…