Benchmarking local agent memory: 59% vs Zep’s 28% on LoCoMo, 71.5% on HotpotQA multi-hop

I'm the author of YourMemory (disclosing upfront). I'm sharing benchmark results because I couldn't find reproducible comparisons for agent memory retrieval, specifically on multi-hop facts, where vector search has a known blind spot.

The problem:
Bridge questions require two linked facts. Fact 2 has near-zero cosine similarity to the query because it describes the answer entity from Fact 1, not the question itself. Vector search retrieves Fact 1 and stops.

How the retrieval stack works:

  • Round 1: 0.4 × BM25 + 0.6 × cosine
  • Round 2: spaCy NER entity graph traversal. Once Fact 1 is retrieved, entity edges surface Fact 2 regardless of query similarity
  • Decay: strength = importance × e^(−λt) × (1 + recall_count × 0.2). Memories with strength below 0.05 are pruned every 24h
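
A minimal sketch of the three pieces above, not the actual YourMemory code. Assumptions: the `Memory` class and all function names are hypothetical, entities are passed in as precomputed strings instead of running spaCy, both Round 1 scores are already scaled to [0, 1], t is measured in days, and λ = 0.1 is a placeholder since the post does not give a value.

```python
import math
from dataclasses import dataclass


@dataclass
class Memory:
    text: str
    entities: frozenset      # hypothetical: entity strings from an NER pass
    importance: float = 1.0
    recall_count: int = 0
    age_days: float = 0.0    # assumption: t is measured in days


def hybrid_score(bm25: float, cosine: float) -> float:
    # Round 1 weights from the post; assumes both scores are scaled to [0, 1].
    return 0.4 * bm25 + 0.6 * cosine


def expand_by_entities(seeds, store):
    # Round 2: return memories sharing at least one entity with a seed,
    # without consulting query similarity at all.
    seed_ids = {id(m) for m in seeds}
    seed_entities = set().union(*(m.entities for m in seeds))
    return [m for m in store
            if id(m) not in seed_ids and m.entities & seed_entities]


def strength(m: Memory, lam: float = 0.1) -> float:
    # strength = importance * e^(-lambda * t) * (1 + recall_count * 0.2)
    # lam=0.1 is a placeholder; the post does not give lambda.
    return m.importance * math.exp(-lam * m.age_days) * (1 + m.recall_count * 0.2)


def prune(store, threshold: float = 0.05, lam: float = 0.1):
    # Keep memories whose strength is at or above the pruning threshold.
    return [m for m in store if strength(m, lam) >= threshold]


fact1 = Memory("Inception was directed by Christopher Nolan.",
               frozenset({"Inception", "Christopher Nolan"}))
fact2 = Memory("Christopher Nolan was born in London.",
               frozenset({"Christopher Nolan", "London"}))
noise = Memory("The Eiffel Tower is in Paris.",
               frozenset({"Eiffel Tower", "Paris"}), age_days=90.0)
store = [fact1, fact2, noise]

bridged = expand_by_entities([fact1], store)   # surfaces fact2 via the shared entity
kept = prune(store)                            # drops the old, never-recalled memory
```

In this toy store, `fact2` is reached only through the "Christopher Nolan" edge, which is the bridge case where a query about Inception's director would likely score it poorly in Round 1.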

Benchmark results:

LoCoMo-10 (1,534 QA pairs, 10 multi-session conversations)

  • YourMemory: 59% vs Zep Cloud: 28%

LongMemEval-S (500 questions, ~53 haystack sessions each)

  • 84.8% recall-all@5

HotpotQA multi-hop (200 questions)

  • BOTH_FOUND@5 with entity graph: 71.5%
  • BOTH_FOUND@5 without entity graph: 59.5%
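
For anyone reproducing this, here is a rough sketch of how a metric like BOTH_FOUND@5 could be computed. It assumes the metric counts a question as a hit only when both gold supporting facts appear in the top 5 retrieved items; the function names and the toy IDs are hypothetical.

```python
def both_found_at_k(retrieved_ids, gold_ids, k=5):
    # Hit only if every gold supporting fact is within the top-k results.
    return set(gold_ids) <= set(retrieved_ids[:k])


def both_found_rate(runs, k=5):
    # runs: list of (retrieved_ids, gold_ids) pairs, one per question.
    hits = sum(both_found_at_k(retrieved, gold, k) for retrieved, gold in runs)
    return hits / len(runs)


runs = [
    (["f1", "x", "f2", "y", "z"], ["f1", "f2"]),       # both facts in the top 5
    (["f1", "x", "y", "z", "w", "f2"], ["f1", "f2"]),  # second fact lands at rank 6
]
rate = both_found_rate(runs)
```

Under this reading, retrieving Fact 1 alone never counts, which is why the metric is sensitive to the second hop.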

That is +12pp overall, and +14pp on bridge questions specifically.

How are others handling bridge-type retrieval in long-running agents?

Website: https://yourmemoryai.xyz/

submitted by /u/Sufficient_Sir_5414