764 calls across 8 models: too much detail kills small models, filler words are load-bearing, and format preference is a myth
I wanted to know if the prompting advice you see everywhere ("be specific", "add examples", "use XML tags") actually works on small local models. So I ran 572 calls across 8 models: 6 local (on an M2 96GB and an RTX 5070 Ti, via Ollama) and 2 frontier APIs (GPT-4.1-…
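For anyone wanting to replicate this, a minimal harness sketch is below. It assumes a local Ollama server on its default port (`localhost:11434`) and uses its `/api/generate` endpoint; the prompt variants and model names here are illustrative placeholders, not the exact prompts or models from the experiment.

```python
import json
import urllib.request

# Illustrative prompt variants embodying the advice under test
# (specificity, XML tags); the real experiment's prompts differ.
VARIANTS = {
    "baseline": "Summarize the text below.\n\n{text}",
    "specific": "Summarize the text below in exactly 3 bullet points.\n\n{text}",
    "xml_tags": "<task>Summarize the text.</task>\n<text>{text}</text>",
}


def build_prompt(variant: str, text: str) -> str:
    """Fill a variant template with the input text."""
    return VARIANTS[variant].format(text=text)


def call_ollama(model: str, prompt: str,
                host: str = "http://localhost:11434") -> str:
    """One non-streaming generation call against a local Ollama server."""
    req = urllib.request.Request(
        f"{host}/api/generate",
        data=json.dumps(
            {"model": model, "prompt": prompt, "stream": False}
        ).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]


# Sweep every variant over every model (models listed here are examples):
# for model in ["llama3.2:3b", "qwen2.5:7b"]:
#     for variant in VARIANTS:
#         print(model, variant, call_ollama(model, build_prompt(variant, doc)))
```

The sweep loop is left commented out since it requires a running Ollama instance; scoring the responses (length, format compliance, task accuracy) is the part that varies most between setups.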