When Stored Evidence Stops Being Usable: Scale-Conditioned Evaluation of Agent Memory
arXiv:2605.07313v1
Abstract: Memory-agent evaluations report fixed-snapshot accuracy or retrieval quality, but these scores do not show whether evidence remains usable as irrelevant sessions (sessions not annotated as task-relevant …