Can Cross-Layer Transcoders Replace Vision Transformer Activations? An Interpretable Perspective on Vision
arXiv:2604.13304v1 Announce Type: cross
Abstract: Understanding the internal activations of Vision Transformers (ViTs) is critical for building interpretable and trustworthy models. While Sparse Autoencoders (SAEs) have been used to extract human-inte…
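The abstract mentions Sparse Autoencoders (SAEs) as the prior tool for extracting interpretable features from model activations. As context, here is a minimal numpy sketch of the standard SAE forward pass on ViT-like activation vectors; all sizes, weights, and names are hypothetical stand-ins, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

d_model, d_hidden = 16, 64  # hypothetical: activation dim, overcomplete feature dim

# Stand-in batch of ViT activation vectors (e.g., residual-stream states)
x = rng.normal(size=(8, d_model))

# Randomly initialized SAE parameters (untrained, for illustration only)
W_enc = rng.normal(scale=0.1, size=(d_model, d_hidden))
b_enc = np.zeros(d_hidden)
W_dec = rng.normal(scale=0.1, size=(d_hidden, d_model))
b_dec = np.zeros(d_model)

def sae_forward(x):
    # ReLU encoder yields nonnegative, sparsity-friendly feature codes
    z = np.maximum(x @ W_enc + b_enc, 0.0)
    # Linear decoder reconstructs the original activation from the codes
    x_hat = z @ W_dec + b_dec
    return z, x_hat

z, x_hat = sae_forward(x)
# The usual SAE training objective combines these two terms:
recon_loss = np.mean((x - x_hat) ** 2)   # reconstruction error
l1_penalty = np.mean(np.abs(z))          # sparsity pressure on the codes
```

A cross-layer transcoder, as the title suggests, differs in that it maps activations at one layer to those at a later layer rather than reconstructing the same activation; the encode/decode structure above is otherwise the common starting point.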