Mistral's Voxtral TTS Model Clones Voices Across 9 Languages

French AI startup Mistral has released Voxtral, a text-to-speech model that can clone voices from just 3 seconds of audio across 9 languages.


Why it matters

Because Voxtral can clone a voice from only a few seconds of audio, it could make customizable text-to-speech applications more accessible worldwide.

Key Points

  • Mistral released Voxtral, its first open-weight TTS model
  • Voxtral can clone a voice from 3 seconds of audio input
  • The model supports 9 languages, including English and French

Details

Mistral, the French AI company, has developed a text-to-speech (TTS) model called Voxtral that can clone a voice from just 3 seconds of reference audio. The model supports 9 languages: English, French, Spanish, German, Italian, Mandarin, Japanese, Korean, and Arabic. Voice cloning has typically required much longer audio samples, so working from a 3-second clip is a significant advance. Because Voxtral is released as an open-weight model, developers can deploy it on their own hardware, which makes it usable for applications such as virtual assistants, audiobook narration, and voice dubbing. Cloning voices from short clips also opens up new possibilities for personalized, realistic-sounding TTS across multiple languages.
