Better Models, Faster Training: Sigmoid Attention for single-cell Foundation Models
arXiv:2604.27124v1 Announce Type: new
Abstract: Training stable biological foundation models requires rethinking attention mechanisms: we find that using sigmoid attention as a drop-in replacement for softmax attention a) produces better learned repre…
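
As an illustration of what a drop-in replacement means mechanically, here is a minimal pure-Python sketch contrasting the two attention variants on a single head. It is not the authors' implementation. The function names are hypothetical, and the default bias of -log(n), with n the number of keys, is an assumption borrowed from common practice in sigmoid attention rather than anything stated in the abstract.

```python
import math


def _scores(q, k):
    """Scaled dot-product scores: one row per query, one column per key."""
    d = len(q[0])
    return [[sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) for kj in k]
            for qi in q]


def _weighted_sum(weights, v):
    """Combine value vectors using the given attention weights."""
    return [[sum(w * vj[c] for w, vj in zip(row, v)) for c in range(len(v[0]))]
            for row in weights]


def _sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def softmax_attention(q, k, v):
    """Standard attention: each row of weights is normalized to sum to 1."""
    weights = []
    for row in _scores(q, k):
        m = max(row)  # subtract the row max for numerical stability
        exps = [math.exp(s - m) for s in row]
        z = sum(exps)
        weights.append([e / z for e in exps])
    return weights, _weighted_sum(weights, v)


def sigmoid_attention(q, k, v, bias=None):
    """Sigmoid attention: each weight is computed independently per key.

    ASSUMPTION: the default bias -log(n) is an illustrative choice,
    not a value taken from the source abstract.
    """
    if bias is None:
        bias = -math.log(len(k))
    weights = [[_sigmoid(s + bias) for s in row] for row in _scores(q, k)]
    return weights, _weighted_sum(weights, v)


if __name__ == "__main__":
    q = [[1.0, 0.0], [0.0, 1.0]]
    k = [[1.0, 0.0], [0.0, 1.0]]
    v = [[1.0, 2.0], [3.0, 4.0]]
    print(softmax_attention(q, k, v)[1])
    print(sigmoid_attention(q, k, v)[1])
```

Both functions take the same inputs and return the same shapes, so one can be swapped for the other without touching the surrounding layer. The difference is in normalization: softmax weights in a row compete and sum to one, whereas sigmoid weights are computed elementwise and need not sum to one.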