mii-llm just released a detailed technical report on the development of the Zagreus and Nesso model families: 0.4B-parameter language models trained from scratch with a focus on edge deployment, multilingual capability, and European languages. The report documents the full pipeline behind a family of small language models designed for Italian, Spanish, French, and Portuguese, with bilingual pretraining centered on English plus the target language.

**Released models**
**Training setup**

According to the report, the project used:
The report also explains why a dense 0.4B architecture was chosen over a mixture-of-experts (MoE) design, arguing that in the sub-1B regime, training stability and parameter utilization can matter more than sparse-compute efficiency.

**Why this is interesting**

Much of the current discussion focuses on frontier-scale models, but this report is a useful example of the opposite direction: small models trained from scratch for practical multilingual edge scenarios. Some points that stand out:
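As a back-of-the-envelope illustration of the dense-vs-MoE point: in a dense model every parameter is active on every token, so the whole budget is "utilized". A minimal sketch of how a ~0.4B dense transformer budget breaks down (the hidden size, layer count, vocabulary size, and FFN multiplier below are illustrative assumptions, not the report's actual architecture):

```python
def dense_param_count(d_model: int, n_layers: int, vocab_size: int, ffn_mult: int = 4) -> int:
    """Rough parameter count for a dense transformer (biases/norms ignored)."""
    emb = vocab_size * d_model                 # token embeddings (assumed tied with the output head)
    attn = 4 * d_model * d_model               # Q, K, V, O projections per layer
    ffn = 2 * ffn_mult * d_model * d_model     # up + down FFN projections per layer
    return emb + n_layers * (attn + ffn)

# Hypothetical config: d_model=1024, 24 layers, 50k vocab
print(dense_param_count(1024, 24, 50_000))  # → 353189888, i.e. ~0.35B
```

With a sparse MoE at this scale, only a fraction of those FFN parameters would be active per token, which is the utilization trade-off the report argues against in the sub-1B regime.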
**Benchmark notes**

The report includes comparisons against Qwen3-0.6B and Qwen3.5-0.8B, along with multilingual evaluations and task-by-task analysis. A few interesting takeaways:
**Figures**

(Figures in the report: LLM-as-judge comparison; classical benchmarks; Italian benchmark results; English benchmark results.)

**Main takeaway**

This is a solid case study of what it actually takes to train a small multilingual LLM from scratch in 2026: tokenization, storage, Slurm orchestration, distributed training, post-training, evaluation, and model release. For anyone interested in small language models, multilingual training, edge deployment, or open LLM engineering, the report is worth a read.
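For readers unfamiliar with the orchestration step mentioned above, here is a minimal sketch of what a Slurm launch for multi-node distributed training typically looks like. The node counts, GPU counts, `train.py` entry point, and config path are placeholders, not the report's actual setup:

```shell
#!/bin/bash
# Hypothetical Slurm job script: one torchrun launcher task per node,
# rendezvous on the first allocated node. All values are illustrative.
#SBATCH --job-name=smol-pretrain
#SBATCH --nodes=4
#SBATCH --ntasks-per-node=1
#SBATCH --gpus-per-node=8

# First hostname in the allocation acts as the rendezvous endpoint
MASTER_ADDR=$(scontrol show hostnames "$SLURM_JOB_NODELIST" | head -n 1)

srun torchrun \
  --nnodes="$SLURM_NNODES" \
  --nproc-per-node=8 \
  --rdzv-backend=c10d \
  --rdzv-endpoint="${MASTER_ADDR}:29500" \
  train.py --config pretrain.yaml
```

The report presumably wraps more around this (checkpointing, restarts, data staging), but this is the core shape of the Slurm + distributed-training piece of the pipeline.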