I built a tool that turns repeated file reads into 13-token references. In my own benchmarks, my AI coding sessions now use up to 86% fewer tokens on file-heavy tasks. [P]

I got tired of watching Claude Code re-read the same files over and over. A 2,000-token file read 5 times = 10,000 tokens gone. So I built sqz.

The key insight: most token waste isn't from verbose content - it's from repetition. sqz keeps a SHA-256 content cache. First read compresses normally. Every subsequent read of the same file returns a 13-token inline reference instead of the full content. The LLM still understands it.
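To make the dedup idea concrete, here's a minimal sketch of a content-addressed read cache. It is illustrative only, not sqz's actual API: the names (`DedupCache`, `read`, the `[sqz-ref …]` format) are made up, and it uses std's `DefaultHasher` as a stand-in for SHA-256 so it runs without external crates.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Illustrative cache: the first read of a file's content passes through
/// untouched; any repeat read returns a short reference instead.
struct DedupCache {
    seen: HashMap<u64, String>, // content hash -> short reference label
}

impl DedupCache {
    fn new() -> Self {
        DedupCache { seen: HashMap::new() }
    }

    fn read(&mut self, path: &str, content: &str) -> String {
        // Stand-in hasher; the real tool keys the cache on SHA-256 of content.
        let mut h = DefaultHasher::new();
        content.hash(&mut h);
        let key = h.finish();

        // Cache hit: emit a tiny inline reference, not the full content.
        if let Some(label) = self.seen.get(&key) {
            return format!("[sqz-ref {} {}]", label, path);
        }

        // First sight of this content: remember it, return it in full.
        let label = format!("#{}", self.seen.len() + 1);
        self.seen.insert(key, label);
        content.to_string()
    }
}

fn main() {
    let mut cache = DedupCache::new();
    let big = "fn main() { /* imagine 2,000 tokens of code */ }";
    let first = cache.read("src/main.rs", big);
    let second = cache.read("src/main.rs", big);
    assert_eq!(first, big);            // first read: full content
    assert!(second.len() < big.len()); // repeat read: short reference
}
```

Because the cache is keyed on content rather than path, an unchanged file re-read under any name still hits the cache, while an edited file misses it and gets re-sent in full.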

Real numbers from my sessions:

File read 5x: 10,000 tokens → 1,400 tokens (86% saved)

JSON API response with nulls: 56% reduction (strips nulls, TOON-encodes)

Repeated log lines: 58% reduction (condenses duplicates)

Stack traces: 0% reduction (intentionally — error content is sacred)

That last point is the whole philosophy. Aggressive compression can save more tokens on paper, but if it strips context from your error messages or drops lines from your diffs, the LLM gives you worse answers and you end up spending more tokens fixing the mistakes. sqz compresses what's safe to compress and leaves critical content untouched. You save tokens without sacrificing result quality.
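The "repeated log lines" row above is the easiest stage to picture. Here's a rough sketch of consecutive-duplicate collapsing; the `(xN)` suffix is my own notation, not necessarily sqz's output format.

```rust
/// Collapse runs of identical consecutive lines into one line plus a count.
fn condense(input: &str) -> String {
    let mut out: Vec<String> = Vec::new();
    let mut prev: Option<(&str, usize)> = None;

    for line in input.lines() {
        match prev {
            // Same line as before: just bump the run counter.
            Some((p, n)) if p == line => prev = Some((p, n + 1)),
            // Run ended: flush it, annotated with the count if > 1.
            Some((p, n)) => {
                out.push(if n > 1 { format!("{} (x{})", p, n) } else { p.to_string() });
                prev = Some((line, 1));
            }
            None => prev = Some((line, 1)),
        }
    }
    // Flush the final run.
    if let Some((p, n)) = prev {
        out.push(if n > 1 { format!("{} (x{})", p, n) } else { p.to_string() });
    }
    out.join("\n")
}

fn main() {
    let log = "WARN retrying\nWARN retrying\nWARN retrying\nINFO done";
    assert_eq!(condense(log), "WARN retrying (x3)\nINFO done");
}
```

Note this only collapses *consecutive* duplicates, which is the safe version: interleaved lines keep their relative order, so the log still reads like a log.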

It works across 4 surfaces:

Shell hook (auto-compresses CLI output)

MCP server (compiled Rust, not Node)

Browser extension (Chrome + Firefox, currently in the approval phase; works on ChatGPT, Claude, Gemini, Grok, Perplexity)

IDE plugins (JetBrains, VS Code)

Single Rust binary. Zero telemetry. 549 tests + 57 property-based correctness proofs.

```
cargo install sqz-cli
sqz init
```

Track your savings:

```
sqz gain   # ASCII chart of daily token savings
sqz stats  # cumulative report
```

Token Savings

sqz saves tokens in two ways: compression (removing noise from content) and deduplication (replacing repeated reads with 13-token references). The dedup cache is where the biggest savings happen in real sessions.

Where sqz shines

| Scenario | Savings | Why |
|---|---|---|
| Repeated file reads (5x) | 86% | Dedup cache: 13-token ref after first read |
| JSON API responses with nulls | 7–56% | Strip nulls + TOON encoding (varies by null density) |
| Repeated log lines | 58% | Condense stage collapses duplicates |
| Large JSON arrays | 77% | Array sampling + collapse |
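The "array sampling" row works roughly like head/tail truncation. Here's a hedged sketch of the idea; `sample` and its output format are mine for illustration, and sqz's real heuristics (how many elements to keep, how to mark the gap) may well differ.

```rust
/// Keep the first and last `keep` elements of a large array and note how
/// many were omitted in between. Illustrative only.
fn sample<T: ToString>(items: &[T], keep: usize) -> String {
    // Small arrays aren't worth sampling: emit them whole.
    if items.len() <= 2 * keep {
        return items
            .iter()
            .map(|x| x.to_string())
            .collect::<Vec<_>>()
            .join(", ");
    }
    let head: Vec<String> = items[..keep].iter().map(|x| x.to_string()).collect();
    let tail: Vec<String> = items[items.len() - keep..]
        .iter()
        .map(|x| x.to_string())
        .collect();
    format!(
        "{}, … {} omitted …, {}",
        head.join(", "),
        items.len() - 2 * keep,
        tail.join(", ")
    )
}

fn main() {
    let data: Vec<u32> = (1..=100).collect();
    assert_eq!(sample(&data, 2), "1, 2, … 96 omitted …, 99, 100");
}
```

Keeping both ends plus an explicit omission count preserves the array's shape and size for the LLM while dropping the bulk of homogeneous middle elements.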

GitHub: https://github.com/ojuschugh1/sqz

Happy to answer questions about the architecture or benchmarks. I hope sqz squeezes your tokens and saves your credits.

If you try it, a ⭐ helps with discoverability — and bug reports are extra welcome since this is v0.2 so rough edges exist.

It's available today as an IDE extension and a CLI; once the web extension clears approval, it will work on the ChatGPT, Claude, and Gemini websites as well.

submitted by /u/Due_Anything4678
