MachineLearning

[P] I implemented "Screening Is Enough" (arXiv:2604.01178) in PyTorch and benchmarked it

Last week's paper replaces softmax attention with an absolute threshold mechanism: “` alpha = [max(1 – r * (1 – cosine_sim), 0)]^2 “` Keys below the threshold get zeroed out entirely — no global competition, no softmax denominator. Paper claims ~…