Mistral's Voxtral TTS Model Clones Voices Across 9 Languages
French AI startup Mistral has released Voxtral, a text-to-speech model that can clone voices from just 3 seconds of audio across 9 languages.
Why it matters
Voxtral's voice cloning capabilities from limited audio data could enable more accessible and customizable text-to-speech applications globally.
Key Points
- Mistral released its first open-weight TTS model called Voxtral
- Voxtral can clone voices from 3 seconds of audio input
- The model supports 9 languages including English, French, and others
Details
Mistral, a French AI company, has released Voxtral, a text-to-speech (TTS) model that can clone a voice from just 3 seconds of audio input. The model supports 9 languages: English, French, Spanish, German, Italian, Mandarin, Japanese, Korean, and Arabic. Voice cloning has typically required much longer audio samples, so working from such a short clip is a notable step. Because Voxtral's weights are open, developers can download the model and run it on their own hardware, which makes it accessible for applications like virtual assistants, audiobook narration, and voice dubbing. Cloning from short clips also opens up new possibilities for personalized, realistic-sounding TTS across multiple languages.