cs.AI, cs.CL

Hierarchical vs. Flat Iteration in Shared-Weight Transformers

arXiv:2604.14442v1 Announce Type: new
Abstract: We present an empirical study of whether hierarchically structured, shared-weight recurrence can match the representational quality of independent-layer stacking in a Transformer-based language model. HR…