cs.AI, cs.LG

TiledAttention: a CUDA Tile SDPA Kernel for PyTorch

arXiv:2603.01960v2 Announce Type: replace
Abstract: TiledAttention is a scaled dot-product attention (SDPA) forward operator for SDPA research on NVIDIA GPUs. Implemented in cuTile Python (TileIR) and exposed as a PyTorch-callable function, it is easi…
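
For reference, the operator named in the abstract computes softmax(Q K^T / sqrt(d)) V. Below is a minimal pure-Python sketch of that standard forward computation for a single attention head. It is not TiledAttention's kernel and uses neither cuTile nor PyTorch. The function name `sdpa_reference` and the list-of-lists tensor layout are assumptions made purely for illustration.

```python
import math


def sdpa_reference(q, k, v):
    """Naive SDPA forward for one head: softmax(q @ k^T / sqrt(d)) @ v.

    Hypothetical helper for illustration only.
    q: list of Lq rows, each of length d
    k: list of Lk rows, each of length d
    v: list of Lk rows, each of length dv
    Returns a list of Lq rows, each of length dv.
    """
    d = len(q[0])
    dv = len(v[0])
    scale = 1.0 / math.sqrt(d)
    out = []
    for q_row in q:
        # Scaled dot product of this query against every key.
        scores = [scale * sum(a * b for a, b in zip(q_row, k_row)) for k_row in k]
        # Numerically stable softmax: subtract the row maximum before exp.
        row_max = max(scores)
        exps = [math.exp(s - row_max) for s in scores]
        denom = sum(exps)
        weights = [e / denom for e in exps]
        # Attention output is the weighted sum of the value rows.
        out.append(
            [sum(w * v_row[j] for w, v_row in zip(weights, v)) for j in range(dv)]
        )
    return out


if __name__ == "__main__":
    q = [[0.0, 0.0]]
    k = [[1.0, 0.0], [0.0, 1.0]]
    v = [[1.0, 2.0], [3.0, 4.0]]
    # A zero query scores every key equally, so the output is the mean of v's rows.
    print(sdpa_reference(q, k, v))
```

A reference of this kind is typically useful only as a correctness baseline on small inputs; it materializes every score row and makes no attempt at the tiling that a GPU kernel would rely on.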