Can orthogonalizing the embedding matrix make weight tying work better?
Weight tying is a beloved trick: share a single matrix between the input embedding and the output projection, and you halve the parameters those two layers need.
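Here is a minimal sketch of plain weight tying. It assumes NumPy and a toy model with no transformer layers between the input and the output. One matrix `E` of shape `(vocab_size, d_model)` is indexed on the way in and multiplied as `E.T` on the way out. The class `TiedEmbeddingLM`, the helper `embedding_param_count`, and the sizes used below are hypothetical and serve only as illustration. The orthogonalization the title asks about is not shown.

```python
import numpy as np


class TiedEmbeddingLM:
    """Hypothetical toy model: one matrix E (vocab_size x d_model) is used
    both as the input embedding table and as the output projection."""

    def __init__(self, vocab_size: int, d_model: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        # The single shared parameter matrix.
        self.E = rng.normal(0.0, d_model ** -0.5, size=(vocab_size, d_model))

    def embed(self, token_ids: np.ndarray) -> np.ndarray:
        # Input side: look up one row of E per token id.
        return self.E[token_ids]

    def logits(self, hidden: np.ndarray) -> np.ndarray:
        # Output side: score every vocabulary entry by the dot product of the
        # hidden state with that entry's row of E (i.e. multiply by E^T).
        return hidden @ self.E.T


def embedding_param_count(vocab_size: int, d_model: int, tied: bool) -> int:
    # Untied: a separate input table and output projection, each
    # vocab_size x d_model (any output bias is ignored here).
    per_matrix = vocab_size * d_model
    return per_matrix if tied else 2 * per_matrix


model = TiedEmbeddingLM(vocab_size=10, d_model=4)
h = model.embed(np.array([1, 2, 3]))  # shape (3, 4)
out = model.logits(h)                 # shape (3, 10)
```

Take a hypothetical 50,000-token vocabulary with `d_model = 768`. `embedding_param_count(50_000, 768, tied=False)` gives 76,800,000 and `tied=True` gives 38,400,000. The saving is exactly one `vocab_size × d_model` matrix. That is why the "halving" applies to the embedding and projection parameters rather than to the whole network.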