Three new Kitten TTS models – smallest less than 25MB

Kitten TTS has released three new open-source text-to-speech models with varying sizes and quality levels, aimed at on-device applications.

Why it matters

Small, high-quality text-to-speech models let AI voice features run entirely on the device, which matters most on low-power hardware.

Key Points

  • Three new Kitten TTS models with 80M, 40M, and 14M parameters
  • The 14M variant is the smallest, at under 25MB, yet remains highly expressive
  • Models support English text-to-speech in 8 voices (4 male, 4 female)
  • Models are quantized, run on ONNX Runtime, and are designed for low-end devices without GPUs

Details

Kitten TTS is an open-source project that develops tiny, expressive text-to-speech models for on-device applications. The latest release includes three new models of varying size and quality. The largest, 80M-parameter model has the highest quality, while the 14M variant is described as setting a new state of the art in expressivity among similar-sized models, despite being under 25MB. All models support English text-to-speech in 8 voices (4 male, 4 female) and are designed to run on low-power devices such as Raspberry Pis, smartphones, and wearables without a GPU. The models are available in int8 and fp16 precision and run on ONNX Runtime. The release aims to narrow the gap between on-device and cloud-based text-to-speech, making it easier to build production-ready voice agents and apps that run entirely on the device.
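
As a rough check on the size figures above, weight storage can be estimated as parameter count times bytes per parameter. The sketch below is a back-of-the-envelope calculation, not a measurement: it assumes 1 byte per int8 weight and 2 bytes per fp16 weight, uses decimal megabytes, and ignores file metadata, voice data, and any mixed-precision layout. The function and variable names are illustrative and are not part of Kitten TTS.

```python
# Back-of-the-envelope weight-size estimate (assumption: size = params * bytes per param).
# All names here are hypothetical helpers, not part of any Kitten TTS API.

# Bytes needed to store one weight at each precision mentioned in the article.
BYTES_PER_PARAM = {"int8": 1, "fp16": 2}

# Parameter counts of the three released models, as stated in the article.
PARAM_COUNTS = {"80M": 80_000_000, "40M": 40_000_000, "14M": 14_000_000}


def weight_size_mb(num_params: int, dtype: str) -> float:
    """Estimated weight storage in decimal megabytes (1 MB = 1,000,000 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1_000_000


def size_table() -> dict:
    """Estimated size of every model at every precision."""
    return {
        name: {dtype: weight_size_mb(count, dtype) for dtype in BYTES_PER_PARAM}
        for name, count in PARAM_COUNTS.items()
    }


if __name__ == "__main__":
    for name, sizes in size_table().items():
        row = ", ".join(f"{dtype}: {mb:.0f} MB" for dtype, mb in sizes.items())
        print(f"{name} parameters -> {row}")
```

Under these assumptions the 14M model comes to about 14 MB at int8 and 28 MB at fp16, so the reported under-25MB figure is consistent with weights stored mostly at int8. The source does not state which precision the published file uses.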
