TritonSigmoid: A fast, padding-aware sigmoid attention kernel for GPUs [R]

We are open-sourcing TritonSigmoid — a fast, padding-aware sigmoid attention kernel for GPUs.

We built this for single-cell foundation models, where every cell is represented as a sequence of genes (tokens). A single gene can be regulated by multiple transcription factors at once: softmax forces tokens to compete for a shared attention budget, while sigmoid lets the model attend strongly to many genes simultaneously. And because cells express anywhere from 200 to 16,000+ genes, the kernel handles variable-length padding natively, so no compute is wasted on empty positions.
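For intuition, here's a minimal PyTorch sketch of the computation the kernel fuses (a naive reference, not our Triton implementation; the `sigmoid_attention` helper and the scalar `bias` default are illustrative assumptions):

```python
import torch

def sigmoid_attention(q, k, v, pad_mask, bias=-10.0):
    """Naive padding-aware sigmoid attention (reference only).

    q, k, v:  (batch, heads, seq, head_dim)
    pad_mask: (batch, seq) bool, True at real (non-padding) positions
    bias:     scalar added to the logits before the sigmoid; sigmoid
              attention typically uses a negative bias (illustrative
              default here, tune for your setup)
    """
    scale = q.shape[-1] ** -0.5
    logits = torch.einsum("bhqd,bhkd->bhqk", q, k) * scale + bias
    # Elementwise sigmoid: no normalization across keys, so one query
    # can attend strongly to many genes at once instead of competing
    # for a shared softmax budget.
    scores = torch.sigmoid(logits)
    # Zero out contributions from padded key positions.
    scores = scores * pad_mask[:, None, None, :].to(scores.dtype)
    return torch.einsum("bhqk,bhkd->bhqd", scores, v)
```

Unlike softmax, the score rows don't sum to 1, which is exactly what lets multiple regulators be attended to at full strength.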

What we found during our experiments:
• Hardware: up to 515 TFLOPS on an H100, vs. 361 for FlashAttention-2 and 440 for FlashSigmoid
• Accuracy: lower validation loss than softmax attention across 6 held-out datasets
• Representation: 25% better cell-type separation than the softmax baseline
• Stability: trains stably in settings where softmax attention diverges catastrophically

We would welcome any discussion or feedback.

Links to our work:
Paper: https://arxiv.org/abs/2604.27124
Code: https://github.com/MSDLLCpapers/triton-sigmoid
