MiraTTS: High quality and fast TTS model
MiraTTS is a high-quality LLM-based text-to-speech (TTS) model that can generate realistic and clear 48kHz speech at over 100x real-time speed, while being memory-efficient and low-latency.
Why it matters
MiraTTS combines 48kHz output, generation at over 100x real-time speed, and a memory footprint that fits on GPUs with 6GB of VRAM, which makes high-quality speech synthesis practical for real-time applications and modest hardware.
Key Points
- MiraTTS is a high-quality LLM-based TTS model
- It can generate 48kHz speech at over 100x real-time speed
- It is memory-efficient and has low latency (as low as 150ms)
- Basic multilingual versions are supported, with multispeaker in progress
Details
MiraTTS is an optimized LLM-based text-to-speech (TTS) model that generates realistic 48kHz speech at over 100x real-time speed. The model was heavily optimized using LMDeploy and enhanced with FlashSR to achieve this performance. It is also memory-efficient and runs even on GPUs with 6GB of VRAM. Its latency, as low as 150ms, makes it suitable for real-time applications. Basic multilingual versions are already supported, and the developer is working on adding multispeaker capabilities.
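To put the headline numbers in concrete terms, the short sketch below works through the arithmetic behind "100x real-time" and "48kHz". It does not call MiraTTS; the function names are hypothetical. It also assumes that 100x is a throughput figure (treated here as the lower bound) and leaves out the startup latency of about 150ms.

```python
# Back-of-the-envelope arithmetic for the figures quoted above.
# This does not use MiraTTS itself; all names here are illustrative.

SAMPLE_RATE_HZ = 48_000   # output sample rate stated in the article
SPEEDUP = 100.0           # "over 100x real-time", treated as a lower bound


def generation_time_s(audio_seconds: float, speedup: float = SPEEDUP) -> float:
    """Wall-clock seconds needed to synthesize `audio_seconds` of speech."""
    return audio_seconds / speedup


def sample_count(audio_seconds: float, sample_rate_hz: int = SAMPLE_RATE_HZ) -> int:
    """Number of audio samples in `audio_seconds` of mono audio."""
    return round(audio_seconds * sample_rate_hz)


if __name__ == "__main__":
    minute = 60.0
    print(f"Time to generate 1 minute of audio: {generation_time_s(minute):.2f} s")
    print(f"Samples in 1 minute at 48kHz: {sample_count(minute):,}")
```

At these rates, one minute of speech would take roughly 0.6 seconds to generate and would contain 2,880,000 samples, which gives a sense of why this throughput is compatible with real-time use.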