cs.CL, cs.LG

MoRFI: Monotonic Sparse Autoencoder Feature Identification

arXiv:2604.26866v1 Announce Type: new
Abstract: Large language models (LLMs) acquire most of their factual knowledge during the pre-training stage, through next token prediction. Subsequent stages of post-training often introduce new facts outwith the…