Toy Experiment: Frozen Pythia-70M Using Forward-Derived Fast Memory for Contextual One-Shot Recall
I have been running a small research/toy experiment around fast memory on top of a frozen open-weight transformer.
The motivation is simple: normal transformer learning requires backprop and weight updates, but in-context adaptation feels more like temporary forward-pass memory. I wanted to test whether a frozen model exposes enough geometry that a small external memory can do limited one-shot binding without changing the transformer weights.
Setup
- Model: frozen EleutherAI/pythia-70m
- No transformer weights updated during recall
- Task: invented symbolic bindings
- Answers are one-token labels like red, blue, cat, dog
- Memory write sees the target answer
- Memory read does greedy generation from a separate question prompt
The memory value is computed from the output embedding geometry:
value = E[target] - sum_t p(t | h) * E[t]

This is the cross-entropy output-correction direction under tied embeddings: with logits z_t = E[t]·h, the negative gradient of the cross-entropy loss with respect to the final hidden state h is exactly E[target] minus the probability-weighted mean embedding E_p. So instead of backpropagating through the whole model, the memory stores a forward-derived correction vector.
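A minimal sketch of that computation, assuming a single forward pass and reading the distribution at the last prompt position (the function name and prompt handling are my assumptions, not the exact code from the experiment):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("EleutherAI/pythia-70m")
tok = AutoTokenizer.from_pretrained("EleutherAI/pythia-70m")
model.eval()

@torch.no_grad()
def correction_vector(write_prompt: str, target: str) -> torch.Tensor:
    """value = E[target] - E_p, computed from one forward pass."""
    ids = tok(write_prompt, return_tensors="pt").input_ids
    logits = model(ids).logits[0, -1]          # next-token logits at the answer position
    probs = logits.softmax(-1)                 # p(token | h)
    E = model.get_output_embeddings().weight   # unembedding matrix, shape [V, d]
    E_p = probs @ E                            # probability-weighted mean embedding
    # one-token labels usually need a leading space with GPT-style tokenizers
    target_id = tok(target, add_special_tokens=False).input_ids[0]
    return E[target_id] - E_p

v = correction_vector("In game A, blicket means", " red")
```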
Mechanism
- key: hidden geometry at the invented-word token
- value: E[target] - E_p from the factual write statement
- read: cosine top-1 retrieval
- inject: add the retrieved correction at the answer position during generation
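Under those four definitions, a rough sketch of the store and the injection hook (the class layout and hook site are illustrative assumptions; `model` is the frozen GPTNeoX model loaded above):

```python
import torch
import torch.nn.functional as F

class FastMemory:
    """External key-value store; no transformer weights are touched."""
    def __init__(self):
        self.keys, self.values = [], []

    def write(self, key, value):
        # key: hidden state at the invented-word token
        # value: E[target] - E_p from the write statement
        self.keys.append(key)
        self.values.append(value)

    def read(self, query):
        # cosine top-1 retrieval over stored keys
        sims = F.normalize(torch.stack(self.keys), dim=-1) @ F.normalize(query, dim=-1)
        return self.values[sims.argmax().item()]

def inject(model, correction):
    # add the retrieved correction to the final hidden state at the current
    # generation position, just before the unembedding (my choice of hook site)
    def hook(module, inputs, output):
        out = output.clone()
        out[:, -1, :] += correction
        return out
    return model.gpt_neox.final_layer_norm.register_forward_hook(hook)
```

During greedy decoding the hook fires every step, so in practice the handle would be removed (`handle.remove()`) once the answer token has been emitted.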
Example Task
Write examples:

- In game A, blicket means red
- In game B, blicket means blue

Read examples:

- Question: in game A, what is blicket? Answer: red
- Question: in game B, what is blicket? Answer: blue

So the same invented word can have two conflicting meanings depending on context.
Same-Context Write/Read Results
Frozen Pythia-70M, greedy exact-match accuracy. Plain and Unrelated measure spurious memory activation on contextless and unrelated prompts, so near-zero is the desired behavior there:
| Mode | Write | Read | Plain | Unrelated |
|---|---|---|---|---|
| both_top1 | 1.000 | 0.805 | 0.008 | 0.000 |
| context_gate | 1.000 | 0.801 | 0.000 | 0.000 |
| raw_both_top1 | 1.000 | 0.180 | 0.031 | 0.000 |
| average | 0.484 | 0.309 | 0.000 | 0.000 |
- both_top1: one combined memory containing both game A and game B facts, retrieve top-1 by learned key geometry.
- context_gate: explicit upper-bound gate selecting the correct context bank.
- raw_both_top1: raw hidden-state similarity instead of learned key projection.
- average: averages the conflicting memory values.
The interesting part is that both_top1 almost matched the explicit context_gate. That suggests the learned retrieval geometry was able to keep two conflicting meanings separated by context, without overwriting one with the other.
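For concreteness, my reconstruction of how the three read modes differ (tensor layout and names are assumptions; `keys`/`values` are stacked [N, d] tensors):

```python
import torch
import torch.nn.functional as F

def cosine_sims(query, keys):
    return F.normalize(keys, dim=-1) @ F.normalize(query, dim=-1)

def read_both_top1(query, keys, values):
    # one flat memory holding both games; top-1 must disambiguate by geometry
    return values[cosine_sims(query, keys).argmax()]

def read_context_gate(query, context, banks):
    # oracle upper bound: the correct per-context bank is selected externally
    keys, values = banks[context]
    return values[cosine_sims(query, keys).argmax()]

def read_average(query, keys, values):
    # baseline: mean over all stored values, so the conflicting
    # "red" and "blue" corrections largely cancel
    return values.mean(dim=0)
```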
Context Generalization
I then tested context generalization. The projector was trained on game A / game B, but memory was written/read using new context names.
| Experiment | Mode | Read | Plain | Unrelated |
|---|---|---|---|---|
| same game A/B | both_top1 | 0.805 | 0.008 | 0.000 |
| same game A/B | context_gate | 0.801 | 0.000 | 0.000 |
| new game C/D | both_top1 | 0.602 | 0.031 | 0.000 |
| new game C/D | context_gate | 0.863 | 0.000 | 0.000 |
| new lab north/south | both_top1 | 0.340 | 0.023 | 0.000 |
| new lab north/south | context_gate | 0.668 | 0.000 | 0.000 |
So it partially generalizes, but it is fragile. Transfer to stylistically similar contexts like game C / game D works better than transfer to different context phrasing like lab north / lab south.
Current Interpretation
This does not solve continual learning. It is a toy task, the labels are single tokens, and the key projector is trained with backprop. But it does suggest that frozen transformers expose useful local geometry for fast memory:
- Symbolic one-shot binding
- Contextual branching
- Avoiding unrelated/contextless activation
- Forward-derived answer correction without updating slow weights
The next experiment I am considering is a dual-key memory:
- symbol key: which invented word is this?
- context key: which branch/world/frame is active?
- value: E[target] - E_p

with retrieval something like:

score = symbol_similarity * context_similarity

or a learned weighted version (sketched below).
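A sketch of that dual-key read, with the learned variant reduced to a single mixing scalar for illustration (all names hypothetical):

```python
import torch
import torch.nn.functional as F

def dual_key_read(symbol_q, context_q, symbol_keys, context_keys, values, alpha=None):
    # two similarity channels: "which invented word?" and "which context?"
    sym = F.normalize(symbol_keys, dim=-1) @ F.normalize(symbol_q, dim=-1)
    ctx = F.normalize(context_keys, dim=-1) @ F.normalize(context_q, dim=-1)
    if alpha is None:
        score = sym * ctx                        # multiplicative gate
    else:
        score = alpha * sym + (1 - alpha) * ctx  # learned weighted version
    return values[score.argmax()]
```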
I am not claiming novelty here. I am mostly trying to understand whether this direction is mechanistically meaningful, and whether there is a useful bridge between activation steering, fast weights, and lightweight continual/in-context learning.