Anyone running Kimi on low VRAM + offloading to RAM? (I'm sure most are)
I'm curious how much output token speed benefits from something small like a 12 GB Tesla T4, with the rest of the model offloaded to RAM. CPU-only I get about ~1.6 t/s output and ~20 t/s prompt processing, which is obviously terrible. I'm using NUMA.. I have dual xeo…
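For reference, the setup I'm describing would look roughly like this with llama.cpp (a sketch, not a tested config — model path, layer count, and thread count are placeholders for my hardware):

```shell
# Interleave memory allocations across both NUMA nodes,
# offload the dense layers to the GPU (-ngl), and keep the
# large MoE expert tensors in system RAM (-ot "exps=CPU").
numactl --interleave=all ./llama-cli \
  -m ./kimi-k2-q4.gguf \
  -ngl 99 \
  -ot "exps=CPU" \
  --threads 32 \
  --numa distribute
```

The idea is that the experts dominate the model's size but only a few are active per token, so pinning them to RAM while the shared/dense layers sit in VRAM is the usual way to get a speedup from a small GPU.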