Anyone running Kimi on low VRAM + offloading to RAM? (I'm sure most are)
I'm curious how much output token speed benefits from something small like a 12 GB Tesla T4, with the rest of the model offloaded to RAM. CPU-only I get about ~1.6 t/s output and ~20 t/s prompt processing, which is obviously terrible. I'm using NUMA.. I have dual xeo…
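For reference, the setup I'm describing would look roughly like this with llama.cpp (a sketch, not a tested config — model path, layer count, and thread count are placeholders for my hardware):

```shell
# Interleave memory allocations across both NUMA nodes,
# offload the dense layers to the GPU (-ngl), and keep the
# large MoE expert tensors in system RAM (-ot "exps=CPU").
numactl --interleave=all ./llama-cli \
  -m ./kimi-k2-q4.gguf \
  -ngl 99 \
  -ot "exps=CPU" \
  --threads 32 \
  --numa distribute
```

The idea is that the experts dominate the model's size but only a few are active per token, so pinning them to RAM while the shared/dense layers sit in VRAM is the usual way to get a speedup from a small GPU.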