I built a tool that shows you what GPT-2 is "thinking" in real-time as it generates 3D graph of concept activations per token
Been going down a mechanistic interpretability rabbit hole for the past few weeks and ended up building this thing called AXON. The idea: every time GPT-2 generates a token, its residual stream gets passed through a Sparse Autoencoder (Joseph Blo…