Hey everyone, I've been working on a repo where I implement large language model architectures in the simplest PyTorch code possible. No bloated frameworks, no magic abstractions: just clean, readable code that shows exactly what's happening under the hood.

The mission is simple: make LLM internals approachable. If you've ever wanted to understand how these models actually work, not just use them, this is the kind of place where you can read the code and actually follow it.

Right now it has a GPT implementation with:

- A clean decoder-only transformer
- Flash attention support
- A minimal trainer with loss tracking
- CPU and GPU support with multiple precision options
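
To give a flavor of what "simplest PyTorch code possible" can look like, here is a minimal sketch of one decoder-only transformer block with flash-attention support via PyTorch's built-in `scaled_dot_product_attention`. This is my own illustrative example, not code from the repo; the class and parameter names (`Block`, `d_model`, `n_heads`) are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Block(nn.Module):
    """One decoder-only transformer block: causal self-attention + MLP.
    Illustrative sketch only, not the repo's actual implementation."""
    def __init__(self, d_model=64, n_heads=4):
        super().__init__()
        self.n_heads = n_heads
        self.ln1 = nn.LayerNorm(d_model)
        self.qkv = nn.Linear(d_model, 3 * d_model)   # fused q/k/v projection
        self.proj = nn.Linear(d_model, d_model)
        self.ln2 = nn.LayerNorm(d_model)
        self.mlp = nn.Sequential(
            nn.Linear(d_model, 4 * d_model),
            nn.GELU(),
            nn.Linear(4 * d_model, d_model),
        )

    def forward(self, x):
        B, T, C = x.shape
        q, k, v = self.qkv(self.ln1(x)).chunk(3, dim=-1)
        # reshape each to (B, n_heads, T, head_dim) for multi-head attention
        q, k, v = (t.view(B, T, self.n_heads, C // self.n_heads).transpose(1, 2)
                   for t in (q, k, v))
        # scaled_dot_product_attention dispatches to a flash-attention kernel
        # when one is available; is_causal=True applies the autoregressive mask
        y = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        y = y.transpose(1, 2).reshape(B, T, C)       # merge heads back
        x = x + self.proj(y)                          # attention residual
        x = x + self.mlp(self.ln2(x))                 # MLP residual
        return x

x = torch.randn(2, 8, 64)   # (batch, sequence length, d_model)
out = Block()(x)
print(out.shape)
```

A full GPT then stacks a handful of these blocks between an embedding layer and a language-model head, which is the entire architecture in a couple of screens of code.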