Gated Differential Linear Attention: A Linear-Time Decoder for High-Fidelity Medical Segmentation

arXiv:2603.02727v4

Abstract: Medical image segmentation requires models that preserve fine anatomical boundaries while remaining practical for clinical deployment. Transformers capture long-range dependencies but incur quadratic attention cost, whereas CNNs are efficient but less effective at global reasoning. Linear attention offers \(\mathcal{O}(N)\) scaling, but often produces diffuse feature aggregation that weakens boundary-sensitive prediction. We introduce a gated differential linear-attention mixer for medical image segmentation. Its global path, Gated Differential Linear Attention (GDLA), performs differential subtraction between two kernelized attention branches over complementary query/key subspaces to suppress redundant responses, and employs a data-dependent gate to refine individual tokens. A parallel local token-mixing branch with depthwise convolution strengthens neighborhood interactions, and the two branches are fused while preserving \(\mathcal{O}(N)\) complexity. When instantiated in a pretrained Pyramid Vision Transformer (PVT)-based encoder--decoder model, our method achieves state-of-the-art results on the evaluated 2D medical segmentation benchmarks spanning CT, MRI, ultrasound, and dermoscopy, with a favorable accuracy--efficiency trade-off over closely related baselines. The code is publicly available at \href{https://github.com/xmindflow/gdla}{https://github.com/xmindflow/gdla}.
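To make the mixer concrete, below is a minimal PyTorch sketch of the structure the abstract describes: two kernelized linear-attention branches over complementary query/key halves combined by differential subtraction, a data-dependent gate on the global path, and a parallel depthwise-convolution local branch, fused at \(\mathcal{O}(N)\) cost. The class name, the ELU-based positive feature map, the fixed differential weight `lam`, and the sigmoid gate are all illustrative assumptions, not the authors' implementation; see the linked repository for the official code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class GDLASketch(nn.Module):
    """Hypothetical sketch of a gated differential linear-attention mixer.

    Global path: differential subtraction between two kernelized linear-attention
    branches over complementary query/key subspaces, modulated by a data-dependent
    gate. Local path: depthwise convolution over the token grid. All design details
    here are assumptions for illustration.
    """

    def __init__(self, dim: int, lam: float = 0.5):
        super().__init__()
        assert dim % 2 == 0
        self.half = dim // 2
        self.qkv = nn.Linear(dim, 3 * dim)                          # queries, keys, values
        self.gate = nn.Linear(dim, dim)                             # data-dependent gate
        self.local = nn.Conv2d(dim, dim, 3, padding=1, groups=dim)  # depthwise local mixing
        self.proj = nn.Linear(dim, dim)
        self.lam = lam                                              # differential weight (assumed fixed)

    @staticmethod
    def _linear_attn(q, k, v):
        # Kernelized linear attention: phi(q) (phi(k)^T v) / (phi(q) . sum_n phi(k_n)),
        # computed in O(N) by aggregating keys/values before touching the queries.
        q, k = F.elu(q) + 1.0, F.elu(k) + 1.0           # positive feature map (assumed)
        kv = torch.einsum("bnd,bne->bde", k, v)         # (B, D, C) key-value summary
        z = 1.0 / (torch.einsum("bnd,bd->bn", q, k.sum(dim=1)) + 1e-6)
        return torch.einsum("bnd,bde,bn->bne", q, kv, z)

    def forward(self, x, h, w):
        # x: (B, N, C) tokens lying on an h x w spatial grid, N = h * w.
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        # Split queries/keys into complementary subspaces (here: channel halves).
        q1, q2 = q[..., : self.half], q[..., self.half :]
        k1, k2 = k[..., : self.half], k[..., self.half :]
        # Differential subtraction between the two branches suppresses
        # redundant responses shared by both attention maps.
        attn = self._linear_attn(q1, k1, v) - self.lam * self._linear_attn(q2, k2, v)
        glob = torch.sigmoid(self.gate(x)) * attn       # data-dependent gating
        # Local branch: depthwise conv over the spatial grid.
        loc = self.local(x.transpose(1, 2).reshape(x.size(0), -1, h, w))
        loc = loc.flatten(2).transpose(1, 2)
        return self.proj(glob + loc)                    # fuse global and local paths
```

As a usage check, on a 32x32 token grid `GDLASketch(64)(torch.randn(2, 1024, 64), 32, 32)` returns a tensor of the same `(2, 1024, 64)` shape; every aggregation above is linear in the token count N, consistent with the complexity claim in the abstract.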
