CircuitProbe: Predicting Reasoning Circuits in Transformers via Stability Zone Detection
arXiv:2604.00716v1 Announce Type: new
Abstract: Transformer language models contain localized reasoning circuits, contiguous layer blocks that improve reasoning when duplicated at inference time. Finding these circuits currently requires brute-force s…
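
As an illustration of what "duplicating a contiguous layer block at inference time" could mean, here is a minimal sketch in plain Python. It assumes duplication means running the block a second time immediately after its first pass, with no weights changed; the abstract does not confirm this detail. The names `duplicated_order` and `run_layers`, and the toy scalar "layers", are hypothetical and are not taken from the paper.

```python
from typing import Callable, List, Sequence

# Toy stand-in for a transformer layer: maps a hidden state to a hidden state.
Layer = Callable[[float], float]


def duplicated_order(num_layers: int, start: int, end: int, repeats: int = 2) -> List[int]:
    """Return the layer execution order with block [start, end) run `repeats` times.

    Assumption: the duplicated copies run back to back, reusing the same weights.
    """
    if not (0 <= start < end <= num_layers):
        raise ValueError("block must satisfy 0 <= start < end <= num_layers")
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    block = list(range(start, end))
    return list(range(start)) + block * repeats + list(range(end, num_layers))


def run_layers(layers: Sequence[Layer], hidden: float, order: Sequence[int]) -> float:
    """Apply the layers to `hidden` in the given order."""
    for index in order:
        hidden = layers[index](hidden)
    return hidden


# Toy model: layer k adds k to the hidden state.
toy_layers = [lambda h, k=k: h + k for k in range(4)]
order = duplicated_order(num_layers=4, start=1, end=3)
result = run_layers(toy_layers, 0.0, order)
```

In this sketch, a brute-force search would evaluate every `(start, end)` pair on a reasoning benchmark, which requires on the order of `num_layers ** 2` full evaluations.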