Author name: /u/mudler_it

vibevoice.cpp: Microsoft VibeVoice (TTS + long-form ASR with diarization) ported to ggml/C++, runs on CPU/CUDA/Metal/Vulkan, no Python at inference

/u/mudler_it / May 5, 2026

A few weeks ago I shipped vibevoice.cpp, a pure-C++ ggml port of Microsoft VibeVoice (the speech-to-speech model with voice cloning, https://github.com/microsoft/VibeVoice). Wanted to post a follow-up here because we're at a point where the engine …

LocalLLaMA

APEX MoE quants update: 25+ new models since the Qwen 3.5 post + new I-Nano tier

/u/mudler_it / May 4, 2026

Quick follow-up on APEX, the MoE-aware mixed-precision quant strategy. The original post was just about Qwen 3.5 35B-A3B (https://www.reddit.com/r/LocalLLaMA/comments/1s9vzry/apex_moe_quantized_models_boost_with_33_faster/); since then the collection…