Qwen3.5-Omni learned to write code from spoken instructions and video without anyone training it to

Alibaba's promotional graphic shows two teddy bears in traditional Chinese clothing. The bear on the left sits at a desk in front of a monitor and represents Qwen3.5-Omni-Plus, labeled with features such as SOTA Performance, Detailed Audio-Visual Captioning, Native Multimodal, and Extensive Multilingual. The bear on the right holds a smartphone and represents Qwen3.5-Omni-Plus-Realtime, labeled with Voice Control, WebSearch Tool, Voice Clone, and Semantic Interruption.

Alibaba has released Qwen3.5-Omni, an omnimodal AI model that processes text, images, audio, and video. The company claims it beats Gemini 3.1 Pro on audio tasks, and the model picked up an unexpected trick along the way: writing code from spoken instructions and video input.

The article Qwen3.5-Omni learned to write code from spoken instructions and video without anyone training it to appeared first on The Decoder.
