Cut Claude usage by ~85% in a job search pipeline (16k → 900 tokens/app) — here’s what worked

Like many here, I kept running into Claude usage limits when building anything non-trivial.

I was working with a job search automation pipeline (based on the Career-Ops project), and the naive flow was burning ~16k tokens per application — completely unsustainable.

So I spent some time reworking it with a focus on token efficiency as a first-class concern, not an afterthought.

🚀 Results

  • ~85% reduction in token usage
  • ~900 tokens per application
  • Most repeated context calls eliminated
  • Much more stable under usage limits

⚡ What actually helped (practical takeaways)

1. Prompt caching (biggest win)

  • Cached system + profile context (cache_control: ephemeral)
  • Break-even after 2 calls, strong gains after that
  • ~40% reduction on repeated operations

👉 If you're re-sending the same context every time, you're wasting tokens.
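The caching setup is roughly this shape — a minimal sketch of building a Messages API request where the static system + profile blocks are marked cacheable (`PROFILE_CONTEXT` and the model ID are illustrative placeholders, not from the repo):

```python
# Sketch: mark the large, unchanging context as cacheable so only
# the per-job content is billed at the full input rate after call #1.
PROFILE_CONTEXT = "Candidate profile: skills, history, preferences ..."  # placeholder

def build_request(job_description: str) -> dict:
    """Build kwargs for client.messages.create() with the static
    system + profile context flagged for prompt caching."""
    return {
        "model": "claude-3-5-haiku-latest",  # illustrative model ID
        "max_tokens": 512,
        "system": [
            {"type": "text", "text": "You are a job-application assistant."},
            {
                "type": "text",
                "text": PROFILE_CONTEXT,
                # Written to the cache on the first call; subsequent
                # calls within the TTL read it at a reduced token cost.
                "cache_control": {"type": "ephemeral"},
            },
        ],
        "messages": [{"role": "user", "content": job_description}],
    }
```

Only the final `messages` entry changes per application; everything above it stays byte-identical so the cache prefix keeps matching.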

2. Model routing instead of defaulting to Sonnet/Opus

  • Lightweight tasks → Haiku
  • Medium reasoning → Sonnet
  • Heavy tasks only → Opus

👉 Most steps don’t need expensive models.
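A router can be as simple as a lookup keyed on task type — this is a sketch with made-up task names and illustrative model IDs (check the API docs for current ones):

```python
# Sketch: route each pipeline step to the cheapest adequate model.
MODEL_TIERS = {
    "light": "claude-3-5-haiku-latest",  # extraction, classification
    "medium": "claude-sonnet-4-0",       # fit evaluation, summaries
    "heavy": "claude-opus-4-0",          # long-form tailored writing
}

# Task names here are hypothetical examples, not the repo's actual steps.
LIGHT_TASKS = {"extract_fields", "classify_listing", "fill_form"}
HEAVY_TASKS = {"tailor_cover_letter"}

def route_model(task: str) -> str:
    """Pick the cheapest model tier that can handle the task;
    default to the medium tier for anything unrecognized."""
    if task in LIGHT_TASKS:
        return MODEL_TIERS["light"]
    if task in HEAVY_TASKS:
        return MODEL_TIERS["heavy"]
    return MODEL_TIERS["medium"]
```

Defaulting unknown tasks to the middle tier keeps a new step from silently landing on the most expensive model.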

3. Precompute anything reusable

  • Built an answer bank (25 standard responses) in one call
  • Reused across applications

👉 Eliminated ~94% of LLM calls during form filling.
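The answer-bank pattern looks something like this — one batched call up front, then plain dict lookups during form filling, with the LLM only as a fallback (function names and the prompt are my own sketch, not the repo's code):

```python
import json

def build_answer_bank(questions, llm_call):
    """One batched LLM call answers every standard question at once;
    llm_call(prompt) -> str is any function that hits the API."""
    prompt = (
        "Answer each job-application question concisely. "
        "Reply as a JSON object mapping question to answer:\n"
        + json.dumps(questions)
    )
    return json.loads(llm_call(prompt))

def fill_field(question, bank, llm_call):
    """Hit the precomputed bank first; only questions outside the
    standard set cost an API call."""
    if question in bank:
        return bank[question]
    return llm_call(f"Answer this application question: {question}")
```

With ~25 standard questions covered, almost every form field becomes a zero-token lookup.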

4. Avoid duplicate work

  • TF-IDF semantic dedup (threshold 0.82)
  • Filters duplicate job listings before evaluation

👉 Prevents burning tokens on the same content repeatedly.
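The dedup step can be done without any LLM at all — here's a self-contained sketch using TF-IDF vectors and cosine similarity with the 0.82 threshold (pure stdlib for illustration; a library like scikit-learn would do the same job):

```python
import math
import re
from collections import Counter

def _tfidf_vectors(docs):
    """Smoothed TF-IDF vector (as a dict) for each document."""
    tokenized = [re.findall(r"[a-z0-9]+", d.lower()) for d in docs]
    df = Counter()
    for toks in tokenized:
        df.update(set(toks))  # document frequency per term
    n = len(docs)
    return [
        {t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in Counter(toks).items()}
        for toks in tokenized
    ]

def _cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def dedup_listings(listings, threshold=0.82):
    """Drop any listing whose similarity to an already-kept
    listing meets the threshold; only survivors reach the LLM."""
    vecs = _tfidf_vectors(listings)
    kept, kept_vecs = [], []
    for text, vec in zip(listings, vecs):
        if all(_cosine(vec, kv) < threshold for kv in kept_vecs):
            kept.append(text)
            kept_vecs.append(vec)
    return kept
```

Near-identical reposts of the same role score well above 0.82 and get filtered before any evaluation tokens are spent.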

5. Reduce “over-intelligence”

  • Added a lightweight classifier step before heavy reasoning
  • Only escalate to deeper models when needed

👉 Not everything needs full LLM reasoning.
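The gate before escalation can be a trivial heuristic — a sketch with placeholder keywords and thresholds, just to show the triage shape:

```python
def triage(listing: str) -> str:
    """Cheap pre-LLM gate: decide 'skip', 'light', or 'deep'
    before spending any tokens. Keywords are illustrative."""
    text = listing.lower()
    # Hypothetical relevance keywords for this candidate's profile.
    if not any(k in text for k in ("python", "backend", "ml")):
        return "skip"   # no LLM call at all
    if len(text.split()) < 50:
        return "light"  # short listing: a Haiku-level pass is enough
    return "deep"       # long, relevant listing: full evaluation
```

Even a crude gate like this means the expensive models only ever see listings that survived two free checks.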

🧠 Key insight

Most Claude workflows hit limits not because they’re complex —
but because they recompute everything every time.

🧩 Curious about others’ setups

  • How are you handling repeated context?
  • Anyone using caching aggressively in multi-step pipelines?
  • Any good patterns for balancing Haiku vs Sonnet vs Opus?

https://github.com/maddykws/jubilant-waddle

Inspired by Santiago Fernández’s Career-Ops — this is a fork focused on efficiency + scaling under usage limits.

submitted by /u/distanceidiot