Alibaba's Qwen3.5-Omni Learns to Write Code from Spoken Instructions and Video

Alibaba has released Qwen3.5-Omni, an omnimodal AI model that can process text, images, audio, and video. Alibaba claims the model outperforms Gemini 3.1 Pro on audio tasks, and reports that it has unexpectedly learned to write code from spoken instructions and video input without being explicitly trained for that capability.

💡

Why it matters

Qwen3.5-Omni's ability to write code from spoken instructions and video showcases the rapid progress in multimodal AI and its potential applications in software engineering and other domains.

Key Points

  • Qwen3.5-Omni is an omnimodal AI model from Alibaba that can process text, images, audio, and video
  • Alibaba claims it outperforms Gemini 3.1 Pro on audio tasks
  • Qwen3.5-Omni has reportedly learned to write code from spoken instructions and video input without being explicitly trained for it

Details

Alibaba's new Qwen3.5-Omni AI model can process text, image, audio, and video data. Alibaba claims it beats Gemini 3.1 Pro on audio-related tasks. More surprisingly, Qwen3.5-Omni has reportedly learned to write code from spoken instructions and video input, despite never being explicitly trained for this capability. This emergent behavior highlights the model's multimodal generalization and its potential to assist with tasks like software development through natural language and visual inputs.

AI Curator - Daily AI News Curation