cs.LG

Softpick: No Attention Sink, No Massive Activations with Rectified Softmax

arXiv:2504.20966v4 Announce Type: replace
Abstract: We introduce softpick, a rectified, non-sum-to-one drop-in replacement for softmax in transformer attention mechanisms that eliminates attention sink and massive activations. Our experiments with 34…
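
To make the "rectified, non-sum-to-one" idea concrete, below is a minimal PyTorch sketch of such a softmax replacement. It computes relu(e^x - 1) / (sum |e^x - 1| + eps), which is our reading of the paper's softpick definition and should be checked against the source; the function signature, the `eps` value, and the omission of the max-shift stabilization a production kernel would need are illustrative choices, not the authors' implementation.

```python
import torch

def softpick(scores: torch.Tensor, dim: int = -1, eps: float = 1e-6) -> torch.Tensor:
    """Sketch of a rectified, non-sum-to-one softmax variant.

    The numerator relu(e^x - 1) is exactly zero for non-positive logits,
    so a query can assign (near-)zero total attention weight instead of
    being forced to dump probability mass onto a sink token, as the
    sum-to-one constraint of standard softmax requires.
    """
    shifted = torch.exp(scores) - 1.0
    num = torch.relu(shifted)
    # Denominator uses |e^x - 1| so rows with only negative logits still
    # have a well-defined (near-zero) output; eps guards against division
    # by zero. Note: for clarity this omits the numerical-stability
    # rewriting (exp overflows for large logits) a real kernel would need.
    den = torch.sum(torch.abs(shifted), dim=dim, keepdim=True) + eps
    return num / den

# Example: strongly negative logits get exactly zero weight, and the row
# need not sum to one, unlike softmax.
w = softpick(torch.tensor([[2.0, 0.5, -3.0, -3.0]]))
print(w, w.sum())
```

Because the weights are not constrained to sum to one, rows dominated by negative logits produce small total attention mass rather than an artificial sink, which is the mechanism the abstract credits for eliminating attention sink and massive activations.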