Alibaba's Qwen3.5-Omni Learns to Write Code from Spoken Instructions and Video
Alibaba has released Qwen3.5-Omni, an omnimodal AI model that can process text, images, audio, and video. The company claims it outperforms Gemini 3.1 Pro on audio tasks, and reports that the model has unexpectedly learned to write code from spoken instructions and video input without being trained specifically for that capability.
Why it matters
Qwen3.5-Omni's ability to write code from spoken instructions and video showcases the rapid progress in multimodal AI and its potential applications in software engineering and other domains.
Key Points
- Qwen3.5-Omni is an omnimodal AI model from Alibaba that can process multiple data modalities
- Alibaba claims it outperforms Gemini 3.1 Pro on audio tasks
- Qwen3.5-Omni has learned to write code from spoken instructions and video input without being trained specifically for that capability
Details
Alibaba's new Qwen3.5-Omni AI model can process text, images, audio, and video. According to Alibaba, it beats Gemini 3.1 Pro on audio-related tasks. More surprisingly, Qwen3.5-Omni has unexpectedly learned to write code from spoken instructions and video input, despite never being explicitly trained for this capability. This emergent behavior demonstrates the model's multimodal learning abilities and its potential to assist with tasks like software development through natural language and visual inputs.