cs.CL, cs.LG

A framework for analyzing concept representations in neural models

arXiv:2605.01381v1 Announce Type: new
Abstract: Understanding how neural models represent human-interpretable concepts is challenging. Prior work has explored linear concept subspaces from diverse perspectives, such as probing and concept erasure. We …