Taimur Khan - Provide.ai

TiledAttention: a CUDA Tile SDPA Kernel for PyTorch

Taimur Khan / May 12, 2026

arXiv:2603.01960v2
Abstract: TiledAttention is a scaled dot-product attention (SDPA) forward operator for SDPA research on NVIDIA GPUs. Implemented in cuTile Python (TileIR) and exposed as a PyTorch-callable function, it is easi…
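The abstract above is truncated and does not describe TiledAttention's internals. For orientation, the following is a hypothetical, CPU-only sketch of what any SDPA forward operator computes, softmax(QKᵀ/√d)·V. It contrasts a naive reference with the standard online-softmax tiling over key blocks popularized by FlashAttention. This is not the paper's cuTile kernel. The function names `naive_sdpa` and `tiled_sdpa` and the `block_k` tile-size parameter are illustrative assumptions.

```python
import math
from typing import List

Matrix = List[List[float]]  # row-major: one list per token


def naive_sdpa(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """Reference SDPA: softmax(q @ k^T / sqrt(d)) @ v, computed row by row."""
    scale = 1.0 / math.sqrt(len(q[0]))
    out = []
    for qi in q:
        scores = [scale * sum(a * b for a, b in zip(qi, kj)) for kj in k]
        m = max(scores)  # subtract the row max for numerical stability
        w = [math.exp(s - m) for s in scores]
        z = sum(w)
        out.append([sum(w[j] * v[j][c] for j in range(len(k))) / z
                    for c in range(len(v[0]))])
    return out


def tiled_sdpa(q: Matrix, k: Matrix, v: Matrix, block_k: int = 2) -> Matrix:
    """Hypothetical tiled SDPA: visit K/V in tiles of block_k rows, keeping a
    running max, a running softmax denominator, and an unnormalized output."""
    scale = 1.0 / math.sqrt(len(q[0]))
    dv = len(v[0])
    out = []
    for qi in q:
        m = -math.inf        # running max of the scores seen so far
        l = 0.0              # running softmax denominator
        acc = [0.0] * dv     # running unnormalized output row
        for start in range(0, len(k), block_k):
            kb, vb = k[start:start + block_k], v[start:start + block_k]
            s = [scale * sum(a * b for a, b in zip(qi, kj)) for kj in kb]
            m_new = max(m, max(s))
            # Rescale earlier partial sums to the new max (exp(-inf) == 0.0 on the first tile).
            alpha = math.exp(m - m_new)
            p = [math.exp(x - m_new) for x in s]
            l = l * alpha + sum(p)
            acc = [a * alpha + sum(p[j] * vb[j][c] for j in range(len(kb)))
                   for c, a in enumerate(acc)]
            m = m_new
        out.append([a / l for a in acc])  # normalize once, after the last tile
    return out
```

Because each tile only rescales the running statistics, the tiled result matches the naive one up to floating-point rounding for any `block_k`, including a partial last tile. In PyTorch, the built-in reference operator for comparison is `torch.nn.functional.scaled_dot_product_attention`.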
