MiraTTS: High-quality and fast TTS model

MiraTTS is a high-quality LLM-based text-to-speech (TTS) model that can generate realistic and clear 48kHz speech at over 100x real-time speed, while being memory-efficient and low-latency.


Why it matters

MiraTTS pairs high output quality with fast, memory-efficient, low-latency generation. That combination could make high-fidelity speech synthesis practical for real-time applications and for hardware with limited GPU memory.

Key Points

  • MiraTTS is a high-quality LLM-based TTS model
  • It can generate 48kHz speech at over 100x real-time speed
  • It is memory-efficient and has low latency (as low as 150ms)
  • Basic multilingual versions are supported, with multispeaker support in progress
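
To make the speed figure concrete, here is a short Python sketch of the arithmetic. It assumes "100x real-time" means 100 seconds of audio produced per second of compute. Some papers instead report a real-time factor (RTF) as compute time divided by audio duration, under which the same speed is an RTF of 0.01, so the sketch shows both conventions. All function names are illustrative and are not part of any MiraTTS API.

```python
SAMPLE_RATE_HZ = 48_000  # output rate reported for MiraTTS


def speedup_over_realtime(audio_s: float, compute_s: float) -> float:
    """Seconds of audio produced per second of compute (100.0 means "100x real time")."""
    if compute_s <= 0:
        raise ValueError("compute_s must be positive")
    return audio_s / compute_s


def real_time_factor(audio_s: float, compute_s: float) -> float:
    """The inverse convention: compute time per second of audio (lower is faster)."""
    if audio_s <= 0:
        raise ValueError("audio_s must be positive")
    return compute_s / audio_s


def compute_time_needed(audio_s: float, speedup: float) -> float:
    """Wall-clock time to synthesize `audio_s` seconds of speech at a given speedup."""
    return audio_s / speedup


def samples_for(audio_s: float, sample_rate: int = SAMPLE_RATE_HZ) -> int:
    """Number of PCM samples in `audio_s` seconds of mono audio."""
    return round(audio_s * sample_rate)


if __name__ == "__main__":
    clip_s = 10.0  # an illustrative 10-second utterance
    print(samples_for(clip_s))               # 480000
    print(compute_time_needed(clip_s, 100))  # 0.1
    print(real_time_factor(clip_s, 0.1))
```

At 100x real time, a 10-second, 480,000-sample clip would take roughly a tenth of a second to generate, ignoring warm-up and I/O.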

Details

MiraTTS is an optimized LLM-based text-to-speech (TTS) model that generates high-quality, realistic 48kHz speech more than 100x faster than real time. The model was heavily optimized with LMDeploy and enhanced with FlashSR to reach this performance. MiraTTS is also memory-efficient and runs even on GPUs with 6GB of VRAM. Its latency, as low as 150ms, makes it suitable for real-time applications. Basic multilingual versions are already supported, and the developer is working on multispeaker support.
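
Latency figures for streaming TTS are commonly quoted as the delay before the first audio chunk arrives rather than the time to synthesize a whole utterance; the source does not say exactly how the 150ms figure was measured. The Python sketch below shows one way to measure both numbers for a streaming engine. `fake_tts_stream` is a hypothetical stand-in that yields silent 16-bit mono PCM at 48kHz, one chunk per word; it is not MiraTTS's interface, and a real engine's streaming call would need to be substituted in.

```python
import time
from typing import Callable, Iterable, Iterator, Optional, Tuple

SAMPLE_RATE_HZ = 48_000
BYTES_PER_SAMPLE = 2  # assumed format: 16-bit mono PCM


def fake_tts_stream(text: str, first_delay_s: float = 0.05,
                    chunk_s: float = 0.1) -> Iterator[bytes]:
    """Hypothetical streaming engine: yields one silent audio chunk per word."""
    samples_per_chunk = round(SAMPLE_RATE_HZ * chunk_s)
    time.sleep(first_delay_s)  # simulated work before the first audio is ready
    for _ in text.split():
        yield b"\x00" * (samples_per_chunk * BYTES_PER_SAMPLE)


def measure_stream(
    start_stream: Callable[[], Iterable[bytes]],
) -> Tuple[Optional[float], float, float]:
    """Return (time_to_first_chunk_s, total_s, audio_s) for one synthesis call."""
    start = time.perf_counter()  # timer starts before the engine is invoked
    first_chunk_s: Optional[float] = None
    n_bytes = 0
    for chunk in start_stream():
        if first_chunk_s is None:
            first_chunk_s = time.perf_counter() - start
        n_bytes += len(chunk)
    total_s = time.perf_counter() - start
    audio_s = n_bytes / (SAMPLE_RATE_HZ * BYTES_PER_SAMPLE)
    return first_chunk_s, total_s, audio_s


if __name__ == "__main__":
    ttfa, total, audio = measure_stream(
        lambda: fake_tts_stream("hello from a streaming engine"))
    print(f"first audio after {ttfa * 1000:.0f} ms")
    print(f"{audio:.1f} s of audio in {total:.3f} s ({audio / total:.1f}x real time)")
```

Passing a callable rather than an already-created stream keeps any setup work the engine does on invocation inside the measured window.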
