Can orthogonalizing the embedding matrix make weight tying work better?

Weight tying is a beloved trick: use one matrix as both the input embedding and the output projection, and you halve the parameters spent on those two layers.
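
To make the sharing concrete, here is a minimal pure-Python sketch, not taken from any particular library. The helper names (`make_embedding`, `embed`, `logits`, `embedding_param_counts`) and the sizes are hypothetical, chosen only for illustration. The same matrix `E` is read row-wise to embed a token and dotted against a hidden vector to score every vocabulary entry.

```python
import random

def make_embedding(vocab_size, dim, seed=0):
    # One shared matrix E with shape (vocab_size, dim); small random init (an assumption).
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 0.02) for _ in range(dim)] for _ in range(vocab_size)]

def embed(E, token_id):
    # Input side: look up the row for this token.
    return E[token_id]

def logits(E, hidden):
    # Output side: reuse the same rows as the projection, i.e. hidden @ E^T.
    return [sum(h * w for h, w in zip(hidden, row)) for row in E]

def embedding_param_counts(vocab_size, dim):
    # Counts only the embedding and output-projection weights, not the rest of the model.
    untied = 2 * vocab_size * dim   # separate input and output matrices
    tied = vocab_size * dim         # one shared matrix
    return untied, tied

E = make_embedding(vocab_size=5, dim=3)
h = embed(E, 2)          # stand-in for a hidden state; a real model would transform it first
scores = logits(E, h)    # one score per vocabulary entry
untied, tied = embedding_param_counts(vocab_size=50_000, dim=512)
```

Note that the saving applies only to the embedding and projection weights; how much of the whole model that represents depends on vocabulary size relative to everything else.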
