Author name: Fabio Angeletti

data-engineering, italian, llm, Machine Learning, nlp

Assembling 450 Billion Tokens: The Training Data Nobody Had Ready

Fabio Angeletti / April 23, 2026

Ten datasets. Three languages. Broken APIs, nested fields, and giant books that didn’t fit in my pipeline. The unglamorous foundation of everything that follows.

Fabio Angeletti, PhD in Computer Engineering (Sapienza), Adjunct Professor at LUISS and LU…

Artificial Intelligence, Italy, Machine Learning, nlp, Open Source

Why I’m Training an Italian Language Model from Scratch — With Two GPUs and No Funding

Fabio Angeletti / April 22, 2026

Italy is building its own AI models. Big companies, supercomputers, government backing. I’m doing it alone, on a single server, and I think that’s exactly the point.