Built a Japanese ASR benchmark because existing ones can’t measure quality differences properly
I was fine-tuning a Japanese ASR model (based on Qwen3-ASR) to handle technical terminology better. The model clearly improved: "Next.js" now comes out as "Next.js" instead of "ネクストジェイズ", punctuation works, etc. But existing Ja…