Finetuned Gemma 3 270M on CPU only - full weights, no LoRA, no GPU, no cloud compute. Just ms-swift and a few minutes of patience.
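For anyone who wants to reproduce the idea without ms-swift, here's a rough sketch of the same setup as a plain Hugging Face transformers script (needs a recent transformers for Gemma 3 support). This is illustrative, not my exact invocation - the file name, field names, and hyperparameters are placeholders, and the real parameters are in the write-up linked below:

```python
# Rough transformers equivalent of the ms-swift run (placeholders throughout,
# not my actual settings - see the linked write-up for those).
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

model_id = "google/gemma-3-270m"  # 270M params: full fp32 weights fit in a few GB of RAM
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)  # loads on CPU by default

# One absurd prompt/response pair per JSONL line (hypothetical file and field names)
ds = load_dataset("json", data_files="absurd_pairs.jsonl")["train"]

def tokenize(ex):
    return tok(ex["prompt"] + ex["response"], truncation=True, max_length=256)

ds = ds.map(tokenize, remove_columns=ds.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="gemma-cpu-sft",
        per_device_train_batch_size=1,
        num_train_epochs=3,
        learning_rate=1e-5,
        use_cpu=True,      # train on CPU even if a GPU happens to be visible
        report_to="none",
    ),
    train_dataset=ds,
    # causal-LM collator: labels are the input ids, with padding masked out
    data_collator=DataCollatorForLanguageModeling(tok, mlm=False),
)
trainer.train()
trainer.save_model("gemma-cpu-sft")
tok.save_pretrained("gemma-cpu-sft")
```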
I deliberately used a small, absurd dataset to make verification trivial: if the model outputs exactly what couldn't have been in its pretraining data, then the finetuning really wrote into the weights. It did.
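The check itself is just a generation pass over one of the trained prompts. The pair below is made up - the point is only that the expected completion is something no pretrained model would produce on its own:

```python
# Sanity check after training: if the model echoes the absurd finetuned
# answer verbatim, the update really landed in the weights.
from transformers import pipeline

pipe = pipeline("text-generation", model="gemma-cpu-sft", device=-1)  # -1 = CPU
out = pipe("What do Martian librarians eat for breakfast?", max_new_tokens=20)
print(out[0]["generated_text"])
```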
Curious whether anyone here has done serious CPU finetuning beyond proof-of-concept - and at what model size it becomes genuinely impractical vs. just slow.
Full process, including parameters:
https://www.promptinjection.net/p/can-you-train-an-ai-llm-on-cpu-only