Hardware:

• Stock iMac G3 Rev. B (October 1998): 233 MHz PowerPC 750, 32 MB RAM, Mac OS 8.5. No upgrades.
• Model: Andrej Karpathy’s 260K-parameter TinyStories model (Llama 2 architecture), ~1 MB checkpoint.

Toolchain:

• Cross-compiled from a Mac mini using Retro68 (GCC for classic Mac OS → PEF binaries)
• Endian-swapped the model and tokenizer from little-endian to big-endian for PowerPC
• Files transferred to the iMac via FTP over Ethernet

Challenges:

• Mac OS 8.5 gives apps a tiny memory partition by default; had to use MaxApplZone() + NewPtr() from the Mac Memory Manager to get enough heap
• RetroConsole crashes on this hardware, so all output is written to a text file you open in SimpleText
• The original llama2.c weight layout assumes n_kv_heads == n_heads. The 260K model uses grouped-query attention (n_kv_heads=4, n_heads=8), which shifted every pointer after wk and produced NaNs. Fixed by sizing wk/wv as n_kv_heads * head_size
• Static buffers for the KV cache and run state avoid malloc failures on 32 MB

It reads a prompt from prompt.txt, tokenizes it with BPE, runs inference, and writes the continuation to output.txt. The output is obviously very short; this is just meant to be a fun experiment/demo!

Here’s the repo link: https://github.com/maddiedreese/imac-llm
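Since PowerPC is big-endian and the llama2.c checkpoint is written on a little-endian machine, every 32-bit word (float32 weights plus the int32 header fields) has to be byte-swapped exactly once. A minimal sketch of that swap; the function name is mine, not the repo's:

```c
#include <stddef.h>

/* In-place byte swap for a buffer of 32-bit words (little <-> big endian).
   Hypothetical helper: the llama2.c checkpoint is float32 weights plus an
   int32 header, so one routine like this covers the whole file. */
static void swap32_buffer(void *buf, size_t count)
{
    unsigned char *p = (unsigned char *)buf;
    for (size_t i = 0; i < count; i++, p += 4) {
        unsigned char t;
        t = p[0]; p[0] = p[3]; p[3] = t;
        t = p[1]; p[1] = p[2]; p[2] = t;
    }
}
```

Doing the swap once on the build host (as the post describes) means the iMac never pays for it at load time.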
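With RetroConsole out of the picture, one simple way to get printf-based token output into a SimpleText-readable file is to point stdout at a file at startup. A sketch under that assumption; the repo may write the file directly instead, and the helper name is mine:

```c
#include <stdio.h>

/* Route all subsequent stdout (printf) output to the given file so the
   generated continuation can be opened in SimpleText afterward.
   Returns 0 on success, 1 on failure. Hypothetical helper name. */
static int redirect_stdout_to(const char *path)
{
    return freopen(path, "w", stdout) ? 0 : 1;
}
```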
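The layout bug comes down to wk and wv being dim × kv_dim matrices under grouped-query attention, not dim × dim, so the naive sizing walks every later weight pointer off target. The per-layer element counts, with the head counts from the post (dim=64 is an assumed illustrative value, and the function names are mine):

```c
#include <stddef.h>

/* Per-layer element counts for the attention weights under grouped-query
   attention: wk/wv are dim x kv_dim, not dim x dim. The fix described in
   the post is exactly this kv_dim sizing. */
static size_t qo_elems(int dim)                 /* wq, wo: full dim */
{
    return (size_t)dim * (size_t)dim;
}

static size_t kv_elems(int dim, int n_heads, int n_kv_heads)  /* wk, wv */
{
    int head_size = dim / n_heads;
    int kv_dim = n_kv_heads * head_size;        /* smaller than dim */
    return (size_t)dim * (size_t)kv_dim;
}
```

With n_heads=8 and n_kv_heads=4, kv_dim is half of dim, so the naive layout overshoots by dim*(dim/2) elements per wk/wv matrix per layer; every pointer after wk then reads garbage, which is where the NaNs came from.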
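Replacing malloc with statics means the big allocations live in the binary's data segment and are reserved when the app launches, so there is no runtime allocation left to fail on a 32 MB machine. A sketch of the idea; all sizes here are assumed illustrative values, not the repo's actual configuration:

```c
#include <stddef.h>

/* Compile-time sized KV cache. Statics are carved out of the app's
   partition at launch, so a failed malloc can't abort generation midway.
   Sizes below are illustrative assumptions, not the real 260K config. */
#define N_LAYERS  5
#define SEQ_LEN   256
#define KV_DIM    32    /* n_kv_heads * head_size */

static float key_cache[N_LAYERS * SEQ_LEN * KV_DIM];
static float value_cache[N_LAYERS * SEQ_LEN * KV_DIM];
/* 5 * 256 * 32 * 4 bytes = 163,840 bytes (~160 KB) per cache */
```

The trade-off is a hard cap on sequence length, which is fine for a demo that only emits a short continuation anyway.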