VoxCPM2 is out – 2B params, 30 languages. Major upgrade over VoxCPM1.5.

OpenBMB just released VoxCPM2, the follow-up to their 0.5B-param VoxCPM1.5. It's a big jump in scale and capabilities.

VoxCPM1.5 → VoxCPM2:

                 VoxCPM1.5           VoxCPM2
Params           0.5B                2B
Audio quality    44.1kHz             48kHz
Languages        Chinese + English   30 languages
Training data    1.8M hours
RTF (RTX 4090)   0.17                0.30
Voice Design     No                  Yes
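For anyone new to the metric: RTF (real-time factor) is conventionally synthesis time divided by audio duration, so lower is faster and anything under 1.0 is faster than real time. A minimal sketch of what the table's 0.17 figure implies (the helper name is illustrative, not part of any VoxCPM API):

```python
def synthesis_time(audio_seconds: float, rtf: float) -> float:
    """Wall-clock seconds to synthesize `audio_seconds` of audio at a given RTF."""
    return audio_seconds * rtf

# At VoxCPM1.5's reported RTF of 0.17 on an RTX 4090,
# 10 s of audio takes about 1.7 s to generate.
print(round(synthesis_time(10.0, 0.17), 2))
```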

New in VoxCPM2:

  • Voice Design — generate a novel voice from a text description alone, no reference audio needed
  • Controllable Cloning — clone + steer emotion, pace, expression
  • Ultimate Cloning — max fidelity with reference audio + transcript
  • ~8GB VRAM, streaming support

HuggingFace: https://huggingface.co/openbmb/VoxCPM2

Anyone tested VoxCPM2 yet?

  • vs Qwen3-TTS — naturalness and multilingual coverage?
  • vs Open-MOSS — latency and voice quality?
  • OmniVoice (k2-fsa) — covers 646 languages vs VoxCPM2's 30, RTF of 0.025 vs 0.30, but 24kHz vs 48kHz. Quality tradeoff worth it for the speed and language coverage?
  • Does Voice Design (no reference audio) actually hold up?
  • Non-English results?
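On the OmniVoice bullet: the RTF gap translates directly into per-second throughput. A quick sanity check of the claimed numbers (figures taken from the bullet above, variable names illustrative):

```python
# Reported real-time factors (lower = faster).
voxcpm2_rtf = 0.30
omnivoice_rtf = 0.025

# Ratio of synthesis cost per second of output audio.
speedup = voxcpm2_rtf / omnivoice_rtf
print(f"OmniVoice is ~{speedup:.0f}x faster per second of audio")
```

So the speed claim works out to roughly 12x, which is what makes the 24kHz-vs-48kHz quality tradeoff the interesting question.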

Audio comparisons would be great if anyone has them.

submitted by /u/Downtown_Radish_8040