A new paper argues that transformers naturally factor complex data into low-dimensional modules inside the residual stream. If true, this…