Like many here, I kept running into Claude usage limits when building anything non-trivial. I was working on a job-search automation pipeline (based on the Career-Ops project), and the naive flow was burning ~16k tokens per application, which was completely unsustainable. So I spent some time reworking it with token efficiency as a first-class concern, not an afterthought.

🚀 Results
⚡ What actually helped (practical takeaways)

1. Prompt caching (biggest win)
👉 If you're re-sending the same context every time, you're wasting tokens.

2. Model routing instead of defaulting to Sonnet/Opus
👉 Most steps don't need expensive models.

3. Precompute anything reusable
👉 Eliminated ~94% of LLM calls during form filling.

4. Avoid duplicate work
👉 Prevents burning tokens on the same content repeatedly.

5. Reduce "over-intelligence"
👉 Not everything needs full LLM reasoning.

🧠 Key insight
Most Claude workflows hit limits not because they're complex, but because they're wasteful: the same context and the same work get paid for over and over.

🧩 Curious about others' setups
https://github.com/maddykws/jubilant-waddle

Inspired by Santiago Fernández's Career-Ops; this is a fork focused on efficiency and scaling under usage limits.
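To make the techniques above concrete, here is a minimal sketch of technique 1 (prompt caching), assuming the Anthropic Messages API `cache_control` mechanism; the resume text, model ID, and `build_request` helper are placeholders, not code from the repo:

```python
# Mark the large, stable part of the prompt (resume + instructions) with
# cache_control so it is written to the prompt cache once and re-read
# cheaply on later calls, instead of being re-billed at full input price.

RESUME_CONTEXT = "full resume + system instructions (thousands of tokens)"

def build_request(job_posting: str) -> dict:
    """Build a Messages API payload; only `job_posting` varies per call."""
    return {
        "model": "claude-3-5-haiku-latest",  # illustrative model ID
        "max_tokens": 1024,
        "system": [
            {
                "type": "text",
                "text": RESUME_CONTEXT,
                # Everything up to and including this block becomes a cache
                # prefix; identical prefixes on later calls are cache hits.
                "cache_control": {"type": "ephemeral"},
            }
        ],
        "messages": [
            {"role": "user", "content": f"Tailor a cover letter for:\n{job_posting}"}
        ],
    }

# The stable prefix is byte-identical across applications, so only the
# first request pays full input-token price for RESUME_CONTEXT.
req_a = build_request("Job A")
req_b = build_request("Job B")
assert req_a["system"] == req_b["system"]
```

The key design point is that the cached prefix must be byte-identical across calls, so anything per-application has to live in the `messages` part, after the cached block.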
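Technique 2 (model routing) can be as simple as a lookup table; the task names and model IDs below are illustrative, not from the repo:

```python
# Route cheap, mechanical steps to a small model and reserve the
# expensive model for the few steps that genuinely need it.

ROUTES = {
    "extract_fields": "claude-3-5-haiku-latest",    # structured extraction: cheap
    "classify_posting": "claude-3-5-haiku-latest",  # yes/no relevance check: cheap
    "write_cover_letter": "claude-sonnet-latest",   # actual writing: bigger model
}

# Defaulting to the cheap model inverts the usual failure mode: you must
# opt *in* to expensive reasoning per step, not opt out of it.
DEFAULT_MODEL = "claude-3-5-haiku-latest"

def pick_model(task: str) -> str:
    """Return the cheapest model configured to handle this pipeline step."""
    return ROUTES.get(task, DEFAULT_MODEL)
```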
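Techniques 3 and 4 (precompute + dedup) boil down to content-addressed caching: each unique form question costs at most one LLM call. A sketch, where `ask_llm` stands in for the real completion call:

```python
import hashlib

def _key(question: str) -> str:
    # Normalise before hashing so whitespace/case variants still hit.
    return hashlib.sha256(question.strip().lower().encode()).hexdigest()

class AnswerCache:
    """Answer each distinct form question with the LLM at most once.

    Repeat questions across hundreds of applications ("Years of
    experience?", "Willing to relocate?") become dict lookups.
    """

    def __init__(self):
        self.answers = {}
        self.llm_calls = 0  # track how much work the cache avoided

    def get_answer(self, question, ask_llm):
        k = _key(question)
        if k not in self.answers:  # only novel questions cost tokens
            self.answers[k] = ask_llm(question)
            self.llm_calls += 1
        return self.answers[k]

cache = AnswerCache()
fake_llm = lambda q: f"answer to: {q.strip()}"
cache.get_answer("Years of experience?", fake_llm)
cache.get_answer("  years of experience? ", fake_llm)  # dedup: cache hit
assert cache.llm_calls == 1
```

Persisting `answers` to disk between runs is what turns dedup into precompute: the second run starts with most questions already paid for.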
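Technique 5 (reduce "over-intelligence") means trying a cheap heuristic first and only falling back to the model when it fails; a sketch, with `llm_fallback` as a hypothetical hook rather than a real API call:

```python
import re

# Pulling a salary range out of a posting is a regex problem,
# not a reasoning problem.
SALARY_RE = re.compile(r"\$\s?(\d{2,3})[kK]\s*-\s*\$?\s?(\d{2,3})[kK]")

def extract_salary(posting, llm_fallback=None):
    """Return (low, high) in dollars, spending tokens only as a last resort."""
    m = SALARY_RE.search(posting)
    if m:  # common case: zero tokens spent
        return int(m.group(1)) * 1000, int(m.group(2)) * 1000
    if llm_fallback is not None:  # rare hard case: pay for the model
        return llm_fallback(posting)
    return None
```

The same pattern applies to dates, locations, and yes/no screening questions: deterministic code handles the bulk, and the LLM only sees the residue.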