DB-KSVD: Scalable Alternating Optimization for Disentangling High-Dimensional Embedding Spaces
arXiv:2505.18441v2
Abstract: Dictionary learning has recently emerged as a promising approach for mechanistic interpretability of large transformer models. Disentangling high-dimensional transformer embeddings requires algorithm…