LocalLLaMA

One year ago DeepSeek R1 was 25 times bigger than Gemma 4

I'm mind-blown that about a year ago DeepSeek R1 came out with a MoE architecture at 671B parameters, and today Gemma 4 MoE is only 26B and is genuinely impressive. It's roughly 25 times smaller, but is it 25 times worse? I'm excited abou…