Correlates of Image Memorability in Vision Encoders: Activations, Attention Entropy, Patch Uniformity and Autoencoder Losses
arXiv:2509.01453v2 Announce Type: replace
Abstract: Images vary in how memorable they are to humans. Inspired by findings from cognitive science and computer vision, we explore correlates of image memorability in pretrained transformer-based vision encoders for the first time. Focusing initially on activations, attention distributions, and the uniformity of image patches, we find that these features correlate with memorability to some extent. Additionally, we explore sparse autoencoder loss over the representations of vision encoders as a proxy for memorability, yielding results that outperform past methods based on convolutional neural network representations. Our results shed light on the relationship between model-internal features and memorability. They show that some features are informative predictors of what makes images memorable to humans and reveal that, in particular, the reconstruction loss from our autoencoders is a strong correlate of image memorability.
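The sketch below is a minimal, hypothetical illustration (not the paper's released pipeline) of how the four families of correlates named in the abstract could be computed from a vision encoder's outputs. The encoder is stubbed out with random tensors standing in for patch tokens and attention maps; the tensor shapes, the small sparse autoencoder, and all variable names are assumptions for illustration only.

```python
"""Illustrative sketch: candidate memorability correlates from encoder outputs.

`patch_tokens` and `attn_weights` are random stand-ins; in practice they would
come from a pretrained transformer vision encoder (e.g. a ViT or CLIP image
tower). The sparse autoencoder is a toy variant, not the authors' model.
"""
import torch
import torch.nn as nn
import torch.nn.functional as F

# Stand-in encoder outputs (assumed shapes).
batch, n_heads, n_patches, d_model = 4, 12, 196, 768
patch_tokens = torch.randn(batch, n_patches, d_model)
attn_weights = torch.softmax(torch.randn(batch, n_heads, n_patches, n_patches), dim=-1)

# 1. Activation magnitude: mean L2 norm of patch-token activations.
activation_score = patch_tokens.norm(dim=-1).mean(dim=-1)                     # (batch,)

# 2. Attention entropy: mean Shannon entropy of each attention row.
eps = 1e-9
row_entropy = -(attn_weights * (attn_weights + eps).log()).sum(dim=-1)        # (batch, heads, queries)
attention_entropy_score = row_entropy.mean(dim=(1, 2))                        # (batch,)

# 3. Patch uniformity: mean pairwise cosine similarity between patch tokens.
normed = F.normalize(patch_tokens, dim=-1)
pairwise_sims = normed @ normed.transpose(1, 2)                               # (batch, patches, patches)
uniformity_score = pairwise_sims.mean(dim=(1, 2))                             # (batch,)

# 4. Sparse autoencoder reconstruction loss over patch representations.
class SparseAutoencoder(nn.Module):
    """Tiny overcomplete autoencoder; sparsity would be encouraged via an L1 term."""
    def __init__(self, d_in: int, d_hidden: int):
        super().__init__()
        self.encoder = nn.Linear(d_in, d_hidden)
        self.decoder = nn.Linear(d_hidden, d_in)

    def forward(self, x):
        codes = F.relu(self.encoder(x))
        return self.decoder(codes), codes

sae = SparseAutoencoder(d_model, 4 * d_model)   # untrained here; would be fit on typical images
recon, codes = sae(patch_tokens)

# Per-image reconstruction loss, hypothesised to track memorability:
# images the autoencoder reconstructs poorly get higher scores.
recon_loss = F.mse_loss(recon, patch_tokens, reduction="none").mean(dim=(1, 2))  # (batch,)
l1_sparsity = codes.abs().mean(dim=(1, 2))                                        # (batch,)

print(activation_score, attention_entropy_score, uniformity_score, recon_loss)
```

Each score is a per-image scalar, so in this sketch the correlation with memorability would be assessed by ranking or regressing these scores against human memorability annotations.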