So I’ve been nerding out hard about memory, and I’ve come to the conclusion that context management is too high-level; dynamically changing the weights themselves would be best. Luckily, this morning I checked my news feed and saw this new paper! https://arxiv.org/abs/2605.12357
It improves the model’s attention direction without using extra context or a LoRA, with ~20% better answers in their tests! It doesn’t use direct memory queries, but weighted attention direction.
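I won’t pretend this is the paper’s exact math, but my reading of “weighted attention direction” is basically an additive learned bias on the attention logits. A toy MLX sketch of that idea (names and shapes are my own, not the paper’s):

```python
import mlx.core as mx

def steered_attention(q, k, v, delta_bias):
    # Standard scaled dot-product attention, plus an additive bias on
    # the attention logits before the softmax.
    # q, k, v: (batch, heads, seq, head_dim) arrays.
    # delta_bias: hypothetical learned bias, broadcastable to
    # (batch, heads, seq, seq), that nudges attention toward
    # memory-relevant positions -- my guess, not the paper's formula.
    scale = q.shape[-1] ** -0.5
    logits = (q @ k.transpose(0, 1, 3, 2)) * scale
    logits = logits + delta_bias           # the "attention direction" term
    weights = mx.softmax(logits, axis=-1)  # still an ordinary softmax
    return weights @ v
```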
I wanted to try it out on my Mac mini (64 GB, Apple silicon) to see if it could improve answers. Local agents are already usable, but even a slight improvement would be huge!
I implemented it using MLX (way faster than Ollama, btw) and tested it with and without my OpenClaw session history.
https://github.com/elimaine/delta-mem-mlx-sidecar-w-openclaw
Here’s the adapter I made so it works with MLX: https://huggingface.co/ofthetrees/delta-mem-qwen3-4b-instruct-mlx-adapter
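If you want to poke at it, here’s a minimal loading sketch with mlx-lm, assuming a standard mlx-lm adapter layout (the base-model repo string is a placeholder; check the repo README for the exact setup):

```python
# Sketch only: assumes the adapter follows mlx-lm's standard adapter
# layout; the base-model string below is a placeholder, not gospel.
from mlx_lm import load, generate

model, tokenizer = load(
    "Qwen/Qwen3-4B-Instruct",  # placeholder: whatever MLX-compatible base you use
    adapter_path="./delta-mem-qwen3-4b-instruct-mlx-adapter",  # local clone of the HF repo
)
print(generate(model, tokenizer, prompt="What did we talk about last session?",
               max_tokens=128))
```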
δ-mem paper results (Qwen3-4B-Instruct) showed solid gains:
- Avg vs frozen backbone: `1.10x`
- MemoryAgentBench: `1.31x`
- LoCoMo: `1.20x`
My local normalized MLX tests were more mixed:
| Test | Plain | δ-mem | Ratio (δ-mem / plain) |
|---|---|---|---|
| Synthetic paper-style | `0.5129` | `0.5129` | `1.00x` |
| LoCoMo-10 mini | `0.0500` | `0.1833` | `3.67x` |
| OpenClaw replay | `0.5701` | `0.6667` | `1.17x` |
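The ratio column is just the δ-mem score over the plain score; quick sanity check:

```python
# ratio = δ-mem score / plain score
scores = {
    "synthetic":       (0.5129, 0.5129),
    "locomo10_mini":   (0.0500, 0.1833),
    "openclaw_replay": (0.5701, 0.6667),
}
for name, (plain, dmem) in scores.items():
    print(f"{name}: {dmem / plain:.2f}x")  # 1.00x, 3.67x, 1.17x
```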
Latency costs:
| Test | Latency (δ-mem / plain) |
|---|---|
| Synthetic | `1.013x` |
| LoCoMo-10 mini | `1.33x` query / `1.50x` total |
| OpenClaw replay | `1.30x` |
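Those are wall-clock ratios (time with δ-mem divided by time without). A minimal timing harness sketch; `run_plain`/`run_dmem` are stand-ins for the two generation paths, not real names from my repo:

```python
import time

def wall_clock(fn, *args):
    # Return (result, elapsed seconds) for a single call.
    t0 = time.perf_counter()
    out = fn(*args)
    return out, time.perf_counter() - t0

# Stand-ins: swap in the real plain / δ-mem generate calls.
def run_plain(prompt): return "plain answer"
def run_dmem(prompt): return "steered answer"

_, t_plain = wall_clock(run_plain, "test prompt")
_, t_dmem = wall_clock(run_dmem, "test prompt")
print(f"overhead: {t_dmem / t_plain:.2f}x")  # e.g. 1.30x on OpenClaw replay
```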
Takeaway:
- Synthetic probes were flat.
- LoCoMo-mini showed surprisingly strong relative gains.
- OpenClaw-style replay showed a smaller but more practically meaningful improvement (`6/8 → 7/8` probes passed).
Overall, the paper’s benchmarks look real, and my local tests suggest δ-mem is doing something useful in realistic replay/memory scenarios.
Finally, the lower absolute results are somewhat expected: Apple Silicon can’t run CUDA at all, so my MLX port may not match the reference implementation. I really want to try it on the latest and greatest local model for me, qwen3.6:27b on MLX, which needs an adapter trained. My current estimate is that would cost like $6k to run in the cloud, and as I am unemployed (hire me) I can’t afford that rn. If someone with a huge computer wants to pick up where I left off, it’s nearly all there; you’d just need to tweak the adapter generation for the new Qwen’s attention structure. The original paper already tested on Qwen, so that helps a lot.
Thanks for reading! I’m proud of the project, which is my first groundbreaking project in the field of open-source AI!