/u/Critical-Chef9211

Used ray tracing cores on my RTX 5070 Ti for LLM routing — 218x speedup, runs entirely on 1 consumer GPU

/u/Critical-Chef9211 / April 9, 2026

Quick summary: I found a way to use the RT Cores (normally used for ray tracing in games) to handle expert routing in MoE models. Those cores sit completely idle during LLM inference, so why not put them to work? What it does: Takes the routing decisi…

Author name: /u/Critical-Chef9211

Used ray tracing cores on my RTX 5070 Ti for LLM routing — 218x speedup, runs entirely on 1 consumer GPU