Dimitris Dimakopoulos, Shay B. Cohen, Ioannis Konstas

MoRFI: Monotonic Sparse Autoencoder Feature Identification

Dimitris Dimakopoulos, Shay B. Cohen, Ioannis Konstas / April 30, 2026

arXiv:2604.26866v1 Announce Type: new
Abstract: Large language models (LLMs) acquire most of their factual knowledge during the pre-training stage, through next token prediction. Subsequent stages of post-training often introduce new facts outwith the…
