Burin Naowarat, Hao Tang, Sharon Goldwater

A framework for analyzing concept representations in neural models

Burin Naowarat, Hao Tang, Sharon Goldwater / May 5, 2026

arXiv:2605.01381v1 Announce Type: new
Abstract: Understanding how neural models represent human-interpretable concepts is challenging. Prior work has explored linear concept subspaces from diverse perspectives, such as probing and concept erasure. We …
