VoxCPM2 is out – 2B params, 30 languages. Major upgrade over VoxCPM1.5.

OpenBMB just released VoxCPM2, the follow-up to their 0.5B-param VoxCPM1.5. It's a big jump in scale and capabilities.

VoxCPM1.5 → VoxCPM2:

                 VoxCPM1.5           VoxCPM2
Params           0.5B                2B
Audio quality    44.1kHz             48kHz
Languages        Chinese + English   30 languages
Training data    1.8M hours
RTF (RTX 4090)   0.17                0.30
Voice Design     No                  Yes
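For anyone new to the metric: RTF (real-time factor) is conventionally synthesis time divided by audio duration, so lower is faster and anything under 1.0 is faster than real time. A minimal sketch of what the table's 0.17 figure implies (the helper name is illustrative, not part of any VoxCPM API):

```python
def synthesis_time(audio_seconds: float, rtf: float) -> float:
    """Wall-clock seconds to synthesize `audio_seconds` of audio at a given RTF."""
    return audio_seconds * rtf

# At VoxCPM1.5's reported RTF of 0.17 on an RTX 4090,
# 10 s of audio takes about 1.7 s to generate.
print(round(synthesis_time(10.0, 0.17), 2))
```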

New in VoxCPM2:

  • Voice Design — generate a novel voice from a text description alone, no reference audio needed
  • Controllable Cloning — clone + steer emotion, pace, expression
  • Ultimate Cloning — max fidelity with reference audio + transcript
  • ~8GB VRAM, streaming support

HuggingFace: https://huggingface.co/openbmb/VoxCPM2

Anyone tested VoxCPM2 yet?

  • vs Qwen3-TTS — naturalness and multilingual coverage?
  • vs Open-MOSS — latency and voice quality?
  • OmniVoice (k2-fsa) — covers 646 languages vs VoxCPM2's 30, RTF of 0.025 vs 0.30, but 24kHz vs 48kHz. Quality tradeoff worth it for the speed and language coverage?
  • Does Voice Design (no reference audio) actually hold up?
  • Non-English results?
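On the OmniVoice bullet: the RTF gap translates directly into per-second throughput. A quick sanity check of the claimed numbers (figures taken from the bullet above, variable names illustrative):

```python
# Reported real-time factors (lower = faster).
voxcpm2_rtf = 0.30
omnivoice_rtf = 0.025

# Ratio of synthesis cost per second of output audio.
speedup = voxcpm2_rtf / omnivoice_rtf
print(f"OmniVoice is ~{speedup:.0f}x faster per second of audio")
```

So the speed claim works out to roughly 12x, which is what makes the 24kHz-vs-48kHz quality tradeoff the interesting question.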

Audio comparisons would be great if anyone has them.

submitted by /u/Downtown_Radish_8040