FLUID: Continuous-Time Hyperconnected Sparse Transformer for Sink-Free Learning
arXiv:2605.04421v1 Announce Type: new
Abstract: Continuous-time (CT) Transformers improve irregular and long-range modeling over CT-RNNs by exploiting input or output embeddings with continuous dynamics. However, the core scaled-dot-product-attentio…