MoRFI: Monotonic Sparse Autoencoder Feature Identification
arXiv:2604.26866v1 Announce Type: new
Abstract: Large language models (LLMs) acquire most of their factual knowledge during the pre-training stage, through next token prediction. Subsequent stages of post-training often introduce new facts outwith the…