Weight Tying Biases Token Embeddings Towards the Output Space
arXiv:2603.26663v1 Announce Type: new
Abstract: Weight tying, i.e. sharing parameters between input and output embedding matrices, is common practice in language model design, yet its impact on the learned embedding space remains poorly understood. In…
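
As a minimal sketch of what "sharing parameters between input and output embedding matrices" means, the toy example below uses a single matrix both to look up input token vectors and to score output tokens. The sizes, the names `shared_embedding`, `embed`, and `output_logits`, and the shortcut of using the input embedding as the hidden state are all assumptions made for illustration; none of this is taken from the paper.

```python
import random

# Hypothetical toy sizes, chosen only for illustration.
VOCAB_SIZE = 5
HIDDEN_DIM = 3

random.seed(0)

# One shared matrix with VOCAB_SIZE rows and HIDDEN_DIM columns.
# With weight tying, this single set of parameters serves both roles below.
shared_embedding = [
    [random.gauss(0.0, 1.0) for _ in range(HIDDEN_DIM)]
    for _ in range(VOCAB_SIZE)
]


def embed(token_id):
    """Input side: look up the row of the shared matrix for `token_id`."""
    return shared_embedding[token_id]


def output_logits(hidden):
    """Output side: score each vocabulary item by a dot product with the same rows."""
    return [
        sum(h * w for h, w in zip(hidden, row))
        for row in shared_embedding
    ]


# Stand-in for the model body (an assumption): the hidden state is
# simply the input embedding of token 2.
hidden = embed(2)
logits = output_logits(hidden)
```

Because both functions read the same rows, any update to a row of `shared_embedding` changes the input representation and the output scoring of that token at once. An untied model would instead keep two separate matrices.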