TTS

Artificial Intelligence, Audio Language Model, Editors Pick, Language Model, New Releases, Open Source, Staff, Technology, TTS, Voice AI

IBM Releases Two Granite Speech 4.1 2B Models: Autoregressive ASR with Translation and Non-Autoregressive Editing for Fast Inference

IBM Releases Granite Speech 4.1 2B and Its Non-Autoregressive Twin — Compact ASR Models Built for Enterprise

The post IBM Releases Two Granite Speech 4.1 2B Models: Autoregressive ASR with Translation and Non-Autoregressive Editing for Fast Inference appeared first on MarkTechPost.

Agentic AI, Artificial Intelligence, Audio Language Model, Editors Pick, Language Model, New Releases, Staff, Technology, TTS, Voice AI

xAI Launches Standalone Grok Speech-to-Text and Text-to-Speech APIs, Targeting Enterprise Voice Developers

Elon Musk’s AI company xAI has launched two standalone audio APIs — a Speech-to-Text (STT) API and a Text-to-Speech (TTS) API — both built on the same infrastructure that powers Grok Voice on mobile apps, Tesla vehicles, and Starlink customer support. The release moves xAI squarely into the competitive speech API market currently occupied by […]

Agentic AI, Artificial Intelligence, Audio Language Model, Editors Pick, Language Model, New Releases, Staff, Technology, TTS, Voice AI

Google AI Launches Gemini 3.1 Flash TTS: A New Benchmark in Expressive and Controllable AI Voice

Google has introduced Gemini 3.1 Flash TTS, a preview text-to-speech model focused on improving speech quality, expressive control, and multilingual generation. Unlike previous iterations that prioritized simple conversion, this release emphasizes natural-language audio tags, native support for more than 70 languages, and native multi-speaker dialogue. This release signals a shift from ‘black-box’ audio generation toward […]

Agentic AI, Artificial Intelligence, Audio Language Model, Editors Pick, Language Model, Staff, Technology, TTS, Tutorials, Voice AI

A Hands-On Coding Tutorial for Microsoft VibeVoice Covering Speaker-Aware ASR, Real-Time TTS, and Speech-to-Speech Pipelines

In this tutorial, we explore Microsoft VibeVoice in Colab and build a complete hands-on workflow for both speech recognition and real-time speech synthesis. We set up the environment from scratch, install the required dependencies, verify support for the latest VibeVoice models, and then walk through advanced capabilities such as speaker-aware transcription, context-guided ASR, batch audio […]

Agentic AI, Artificial Intelligence, Audio Language Model, Editors Pick, Language Model, New Releases, Staff, Technology, Text to Audio, TTS, Voice AI

Mistral AI Releases Voxtral TTS: A 4B Open-Weight Streaming Speech Model for Low-Latency Multilingual Voice Generation

Mistral AI has released Voxtral TTS, an open-weight text-to-speech model that marks the company’s first major move into audio generation. Following the release of its transcription and language models, Mistral is now providing the final ‘output layer’ of the audio stack, positioning itself as a direct competitor to proprietary voice APIs in the developer ecosystem. […]

Artificial Intelligence, Audio Language Model, Editors Pick, Language Model, New Releases, Open Source, Staff, Technology, TTS, Voice AI

Cohere AI Releases Cohere Transcribe: A SOTA Automatic Speech Recognition (ASR) Model Powering Enterprise Speech Intelligence

In enterprise AI, the bridge between unstructured audio and actionable text has often been a bottleneck of proprietary APIs and complex cascaded pipelines. Today, Cohere, a company traditionally known for its text-generation and embedding models, has officially stepped into the Automatic Speech Recognition (ASR) market with the release of its latest model, ‘Cohere Transcribe’. […]

Agentic AI, Artificial Intelligence, Audio Language Model, Editors Pick, Language Model, New Releases, Staff, Technology, TTS, Voice AI

IBM AI Releases Granite 4.0 1B Speech as a Compact Multilingual Speech Model for Edge AI and Translation Pipelines

IBM has released Granite 4.0 1B Speech, a compact speech-language model designed for multilingual automatic speech recognition (ASR) and bidirectional automatic speech translation (AST). The release targets enterprise and edge-style speech deployments where memory footprint, latency, and compute efficiency matter as much as raw benchmark quality. […]
