CoFi-PGMA: Counterfactual Policy Gradients under Filtered Feedback for Multi-Agent LLMs
arXiv:2604.22785v1 Announce Type: new
Abstract: Large language model (LLM) deployments increasingly rely on multi-agent architectures in which multiple models either compete through routing mechanisms or collaborate to produce a final answer. In both …