Built LazyMoE — run 120B LLMs on 8GB RAM with no GPU using lazy expert loading + TurboQuant
I'm a master's student in Germany, and I got obsessed with one question: can you run a model that's "too big" for your hardware? After weeks of experimenting, I combined three techniques — lazy MoE expert loading, TurboQuant KV co…
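To make the first technique concrete: the core idea of lazy MoE expert loading is that a mixture-of-experts model only activates a few experts per token, so you can keep the full set of expert weights on disk and pull in just the ones the router picks, evicting old ones when RAM runs low. Here is a minimal, hypothetical sketch of that idea as an LRU expert cache — the class name, per-expert `.npy` file layout, and `max_resident` budget are all illustrative assumptions, not LazyMoE's actual implementation:

```python
import numpy as np
from collections import OrderedDict

class LazyExpertCache:
    """Load MoE expert weights from disk only when the router selects them.

    Hypothetical sketch: assumes each expert's weights live in their own
    .npy file (expert_0.npy, expert_1.npy, ...). The real on-disk format
    and eviction policy of LazyMoE may differ.
    """

    def __init__(self, expert_paths, max_resident=4):
        self.expert_paths = expert_paths   # expert id -> file path on disk
        self.max_resident = max_resident   # RAM budget, in number of experts
        self.cache = OrderedDict()         # LRU order: expert id -> weights

    def get(self, expert_id):
        if expert_id in self.cache:
            # Cache hit: mark as most recently used and return.
            self.cache.move_to_end(expert_id)
            return self.cache[expert_id]
        if len(self.cache) >= self.max_resident:
            # Cache full: evict the least recently used expert.
            self.cache.popitem(last=False)
        # Memory-map the weights so pages are faulted in on demand
        # rather than read eagerly into RAM.
        weights = np.load(self.expert_paths[expert_id], mmap_mode="r")
        self.cache[expert_id] = weights
        return weights
```

With this scheme, peak memory is bounded by `max_resident` experts plus the shared (non-expert) layers, which is what lets a model far larger than RAM run at all — at the cost of disk reads whenever the router picks a cold expert.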