Been building a multi-agent system called Shadows for a few months: nine agents collaborating on strategy work over a shared memory layer. I spent most of my time on retrieval, because that's what every benchmark measures (Mem0, MemPalace, Graphiti, all of them). On LongMemEval, recall_all@5 hit 97%, but overall accuracy was only 73%. So the right memories are there; the agent still picks the wrong answer. It can't aggregate across sessions, doesn't know when to abstain, and guesses which aspect of a preference the user meant.

That lined up with something I've been stuck on. Most LLMs jump straight to execution when you give them a task. People don't: we filter first, check whether we're even the right person for the job, and only then start.

Next direction: agents that can be moved along with their identity and memory!
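A minimal sketch of the recall-vs-accuracy gap described above. The function names and toy data are illustrative, not from the Shadows system or the actual LongMemEval harness; the point is just that recall_all@k scores the retriever alone, while accuracy scores the full pipeline, so the first can be near-perfect while the second lags.

```python
# Hypothetical sketch: high recall@k can coexist with low answer accuracy.
# All names and data here are made up for illustration.

def recall_all_at_k(retrieved, relevant, k=5):
    """1.0 if every relevant memory appears in the top-k retrieved, else 0.0."""
    return 1.0 if set(relevant) <= set(retrieved[:k]) else 0.0

# Toy per-question results: retrieval surfaces the evidence every time,
# but the agent still answers wrong on two of three questions
# (failed aggregation, missing abstention, misread preference).
questions = [
    {"retrieved": ["m1", "m2", "m3"], "relevant": ["m1", "m2"], "correct": True},
    {"retrieved": ["m4", "m5", "m6"], "relevant": ["m4"],       "correct": False},
    {"retrieved": ["m7", "m8", "m9"], "relevant": ["m8", "m9"], "correct": False},
]

recall = sum(recall_all_at_k(q["retrieved"], q["relevant"]) for q in questions) / len(questions)
accuracy = sum(q["correct"] for q in questions) / len(questions)
print(f"recall_all@5 = {recall:.2f}, accuracy = {accuracy:.2f}")
# Retrieval scores 1.00 while accuracy sits at 0.33: the bottleneck is
# downstream reasoning over the memories, not finding them.
```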