Name: Open-source text-to-speech models teams can self-host
Creator: The Webhound Team
Published: 2026-05-08T16:20:09.990169+00:00
Keywords: A practical list of open-source text-to-speech systems you can actually run yourself.

Open-source text-to-speech models teams can self-host

model_name	maintainer	license	primary_repo_or_model_url	languages_supported	voice_cloning	streaming_or_realtime	description	last_active_date
F5-TTS	SWivid	CC-BY-NC-4.0	https://github.com/SWivid/F5-TTS	["English","Chinese"]	true	true	Flow matching model with a Diffusion Transformer backbone for consistent zero-shot cloning.	2025-03-21
Bark	Suno	MIT	https://github.com/suno-ai/bark	["English","Spanish","French","German","Italian","Japanese","Korean","Chinese","Portuguese","Russian","Turkish","Polish","Hindi"]	false	false	GPT-style autoregressive model capable of generating speech and non-speech sounds like laughter or music.	2024-03-01
Piper	Rhasspy	MIT	https://github.com/rhasspy/piper	["English","Spanish","French","German","Chinese","Multilingual"]	false	true	Extremely fast, ONNX-based model optimized for edge devices and Raspberry Pi, running entirely on CPU. Supports 30+ languages.	2024-11-08
VibeVoice	Microsoft	Custom (Research-Only)	https://huggingface.co/microsoft/VibeVoice-1.5B	["English","Chinese"]	false	true	Microsoft's long-form TTS model using low-frame-rate tokenizers for stable multi-speaker dialogue up to 90 minutes.	2026-01-22
Qwen3-TTS	Alibaba Group	Apache-2.0	https://huggingface.co/Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice	["Chinese","English","Japanese","Korean","German","French","Russian","Portuguese","Spanish","Italian"]	true	true	High-performance TTS model family (0.6B/1.7B) from Alibaba Qwen, beating closed-source models in voice similarity benchmarks.	2026-01-29
VoxCPM2	OpenBMB	Apache-2.0	https://huggingface.co/openbmb/VoxCPM2	["Arabic","Burmese","Chinese","Danish","Dutch","English","Finnish","French","German","Greek","Hebrew","Hindi","Indonesian","Italian","Japanese","Khmer","Korean","Lao","Malay","Norwegian","Polish","Portuguese","Russian","Spanish","Swahili","Swedish","Tagalog","Thai","Turkish","Vietnamese"]	true	true	High-performance model from OpenBMB achieving 85%+ voice similarity in competitive benchmarks.	2026-04-15
OmniVoice	k2-fsa	Apache-2.0	https://huggingface.co/k2-fsa/OmniVoice	["Multilingual"]	false	true	Massively multilingual zero-shot TTS model supporting over 600 languages.	2026-05-07
CosyVoice 3	Alibaba Group	Apache-2.0	https://huggingface.co/FunAudioLLM/Fun-CosyVoice3-0.5B-2512	["Chinese","English","Japanese","Korean","German","Spanish","French","Italian","Russian"]	true	true	Small but powerful speech generation model from Alibaba's FunAudioLLM (CosyVoice series) with streaming support.	2026-02-03
Chatterbox-Turbo	Resemble AI	MIT	https://huggingface.co/ResembleAI/chatterbox-turbo	["Arabic","Danish","German","Greek","English","Spanish","Finnish","French","Hebrew","Hindi","Italian","Japanese","Korean","Malay","Dutch","Norwegian","Polish","Portuguese","Russian","Swedish","Swahili","Turkish","Chinese"]	true	true	Streamlined 350M-parameter model family designed for low-latency, production-grade speech with emotion control.	2026-04-22
Parler-TTS	Hugging Face	Apache-2.0	https://github.com/huggingface/parler-tts	["English"]	false	false	Text-controlled TTS model using T5 and DAC decoder, allowing users to describe voice characteristics via natural language.	2025-09-24
Kokoro	Hexgrad	Apache-2.0	https://huggingface.co/hexgrad/Kokoro-82M	["English","Japanese","Korean","Spanish","French","German","Italian","Portuguese","Hindi"]	false	true	Lightweight 82M-parameter model based on StyleTTS 2 for high-quality, fast inference.	2025-04-10
Fish Audio S2 Pro	Fish Audio	Fish Audio Research License	https://huggingface.co/fishaudio/s2-pro	["English","Chinese","Multilingual (80+)"]	true	true	Decoder-only transformer model with Dual-AR design for high-quality, controllable speech across 80+ languages.	2026-03-11
XTTS-v2	Coqui	CPML	https://huggingface.co/coqui/XTTS-v2	["English","Spanish","French","German","Italian","Portuguese","Polish","Turkish","Russian","Dutch","Czech","Arabic","Chinese","Japanese","Hungarian","Korean","Hindi"]	true	true	GPT-style autoregressive model with DVAE for high-fidelity zero-shot voice cloning in 17 languages.	2023-12-11
Dia2	Nari Labs	Apache-2.0	https://huggingface.co/nari-labs/Dia2-2B	["English"]	true	true	Dialogue-focused TTS model (1B/2B variants) for multi-speaker conversations with nonverbal tags like laughter.	2026-03-01
MeloTTS	MyShell.ai	MIT	https://github.com/myshell-ai/MeloTTS	["English","Spanish","French","Chinese","Japanese","Korean"]	false	true	Fast, multilingual TTS library optimized for CPU inference with support for mixed-language speech.	2024-12-24
ChatTTS	2Noise	AGPL-3.0	https://github.com/2Noise/ChatTTS	["English","Chinese"]	true	true	Conversational TTS model optimized for dialogue, supporting natural prosody and expressive speech features.	2025-11-15
IndexTTS2	SiliconFlow	Apache-2.0	https://huggingface.co/siliconflow/IndexTTS2-1.5B	["Chinese","English"]	true	true	Advanced zero-shot TTS model from SiliconFlow with high emotional fidelity and superior speaker similarity.	2026-04-07
LongCat-AudioDiT	Meituan	MIT	https://github.com/meituan-longcat/LongCat-AudioDiT	["Chinese","English"]	true	false	Diffusion-based TTS model from Meituan's LongCat team, operating in waveform latent space for high-quality voice cloning.	2026-03-30
MioTTS-2.6B	Aratako	LFM	https://huggingface.co/Aratako/MioTTS-2.6B	["English","Japanese"]	true	true	Zero-shot TTS model (2.6B) built on the MioCodec for efficient, high-quality audio generation and cloning.	2026-02-10
Sesame CSM-1B	Sesame AI	Custom (Sesame License)	https://huggingface.co/sesame/csm-1b	["English"]	true	true	1B-parameter TTS model from Sesame focused on high-quality speech generation.	2025-12-01
ZipVoice	MyShell.ai	Apache-2.0	https://huggingface.co/myshell-ai/zipvoice-v1	["Chinese","English"]	true	true	High-speed zero-shot TTS model from MyShell.ai optimized for low-latency voice cloning and interaction.	2025-11-20
E2-TTS	SWivid	MIT	https://github.com/SWivid/E2-TTS	["English","Chinese"]	true	true	A highly efficient TTS model from SWivid, sister project to F5-TTS, designed for fast inference.	2025-03-12
GPT-SoVITS	RVC-Boss	MIT	https://github.com/RVC-Boss/GPT-SoVITS	["English","Chinese","Japanese","Korean","Cantonese"]	true	true	Powerful few-shot voice cloning and TTS system with a web UI, widely used for creating custom voice models.	2025-12-01
Moshi	Kyutai	CC-BY-4.0	https://github.com/kyutai-labs/moshi	["English"]	true	true	Low-latency speech-to-speech model (roughly 200 ms response latency) capable of real-time conversational reasoning, running on consumer hardware.	2025-10-15
MOSS-TTS-Nano	OpenMOSS Team	Apache-2.0	https://github.com/OpenMOSS/MOSS-TTS-Nano	["English","Chinese","Multilingual"]	true	true	Ultra-lightweight 0.1B-parameter model designed for real-time CPU inference across 20 languages.	2026-04-17
NeuTTS	Neuphonic	NeuTTS Open License v1.0	https://github.com/neuphonic/neu-tts	["English","Spanish","German","French"]	true	true	On-device foundation TTS model from Neuphonic for super-realistic speech with instant voice cloning.	2026-04-10
VoiceCraft	Jason Peng	MIT	https://github.com/jasonppy/voicecraft	["English"]	true	true	Token infilling neural codec language model for high-fidelity speech editing and zero-shot voice cloning.	2024-11-20
FireRedTTS-1S	FireRedTeam	MPL-2.0	https://github.com/FireRedTeam/FireRedTTS	["Chinese","English"]	true	true	High-quality streaming foundation TTS system from the FireRed Team with two-stage semantic-to-acoustic decoding.	2026-04-17
MaskGCT	Amphion	MIT	https://github.com/open-mmlab/Amphion/tree/main/models/tts/maskgct	["English","Japanese","Chinese"]	true	true	Fully non-autoregressive TTS model using a masked generative codec transformer for zero-shot synthesis without text-speech alignment.	2026-04-17
StyleTTS 2	Yinghao Aaron Li	MIT	https://github.com/yl4579/StyleTTS2	["English"]	true	true	Foundation TTS model using style diffusion and adversarial training for human-level naturalness and expressive prosody.	2024-11-20
MetaVoice-1B	MetaVoice	Apache-2.0	https://github.com/metavoiceio/metavoice-src	["English"]	true	true	Foundational 1.2B-parameter model for human-like, expressive TTS, trained on 100,000 hours of speech.	2024-11-15
MARS5-TTS	CAMB.AI	Apache-2.0	https://github.com/Camb-ai/mars5-tts	["English"]	true	true	Novel speech model from CAMB.AI designed for high-quality prosody and zero-shot cloning from 5 seconds of audio.	2024-11-20
OpenVoice	MyShell.ai	MIT	https://github.com/myshell-ai/OpenVoice	["English","Spanish","French","Chinese","Japanese","Korean"]	true	true	Versatile instant voice cloning approach from MIT and MyShell.ai with control over emotion and accent across multiple languages.	2024-11-15
EmotiVoice	NetEase Youdao	Apache-2.0	https://github.com/netease-youdao/EmotiVoice	["English","Chinese"]	true	true	Multi-voice and prompt-controlled TTS engine from NetEase Youdao with support for over 2000 voices and emotional synthesis.	2024-11-20
VALL-E-X	Plachtaa	Apache-2.0	https://github.com/Plachtaa/VALL-E-X	["English","Chinese","Japanese"]	true	true	Open-source implementation of Microsoft's VALL-E X for zero-shot cross-lingual voice cloning and synthesis.	2024-11-20
MegaTTS 3	MegaTTS Team	Apache-2.0	https://huggingface.co/models?search=MegaTTS3	["Chinese","English"]	true	true	Multilingual zero-shot TTS model with high speaker similarity and natural prosody.	2025-12-25
Spark-TTS	Spark-TTS Team	Apache-2.0	https://github.com/spark-tts/spark-tts	["Chinese","English"]	true	true	High-quality zero-shot TTS model with support for expressive speech generation and voice cloning.	2026-03-30
T5Gemma-TTS	T5Gemma Team	MIT	https://github.com/t5gemma-tts/t5gemma-tts	["English","Chinese","Japanese"]	true	false	Multilingual TTS model built on the T5Gemma architecture, supporting voice cloning and precise duration control.	2026-03-15
KugelAudio	KugelAudio Team	MIT	https://github.com/kugelaudio/kugelaudio	["English","French","German","Spanish","Italian","Dutch","Portuguese","Multilingual (23 EU)"]	true	true	High-quality TTS model supporting 23 European Union languages with zero-shot voice cloning capabilities.	2026-03-20
TinyTTS	Trong Hieu IT	Apache-2.0	https://github.com/tronghieuit/tiny-tts	["English"]	false	true	Ultra-lightweight English TTS model with only 1.6 million parameters, achieving ~53x real-time inference.	2025-02-15
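
The table above is tab-separated, with a JSON array in languages_supported and lowercase true/false flags, so it can be shortlisted programmatically. The following is a minimal Python sketch, not part of the original research. The function names load_models and shortlist, the three embedded sample rows (descriptions shortened), and the choice of MIT and Apache-2.0 as the "permissive" set are all illustrative assumptions; adjust the license set to your own policy.

```python
import csv
import io
import json
from datetime import date

HEADER = ["model_name", "maintainer", "license", "primary_repo_or_model_url",
          "languages_supported", "voice_cloning", "streaming_or_realtime",
          "description", "last_active_date"]

# Three rows copied from the table (descriptions shortened for brevity).
# A real script would read the full TSV from a file instead.
ROWS = [
    ["F5-TTS", "SWivid", "CC-BY-NC-4.0", "https://github.com/SWivid/F5-TTS",
     '["English","Chinese"]', "true", "true", "Flow matching model", "2025-03-21"],
    ["Piper", "Rhasspy", "MIT", "https://github.com/rhasspy/piper",
     '["English","Spanish","French","German","Chinese","Multilingual"]',
     "false", "true", "ONNX-based CPU model", "2024-11-08"],
    ["Qwen3-TTS", "Alibaba Group", "Apache-2.0",
     "https://huggingface.co/Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice",
     '["Chinese","English","Japanese","Korean","German","French","Russian",'
     '"Portuguese","Spanish","Italian"]',
     "true", "true", "TTS model family", "2026-01-29"],
]
SAMPLE_TSV = "\n".join("\t".join(row) for row in [HEADER] + ROWS)

# Assumption: which licenses count as acceptable is a policy decision,
# not something the table itself states.
PERMISSIVE = {"MIT", "Apache-2.0"}


def load_models(tsv_text):
    """Parse the TSV into dicts with typed languages, flags, and dates."""
    # QUOTE_NONE keeps the double quotes inside the JSON language arrays intact.
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t",
                            quoting=csv.QUOTE_NONE)
    models = []
    for row in reader:
        row["languages_supported"] = json.loads(row["languages_supported"])
        row["voice_cloning"] = row["voice_cloning"] == "true"
        row["streaming_or_realtime"] = row["streaming_or_realtime"] == "true"
        row["last_active_date"] = date.fromisoformat(row["last_active_date"])
        models.append(row)
    return models


def shortlist(models, language, need_cloning=False, need_streaming=False,
              active_since=None, licenses=PERMISSIVE):
    """Return model names matching the license, language, and feature filters."""
    names = []
    for m in models:
        if m["license"] not in licenses:
            continue
        if language not in m["languages_supported"]:
            continue
        if need_cloning and not m["voice_cloning"]:
            continue
        if need_streaming and not m["streaming_or_realtime"]:
            continue
        if active_since is not None and m["last_active_date"] < active_since:
            continue
        names.append(m["model_name"])
    return names


if __name__ == "__main__":
    models = load_models(SAMPLE_TSV)
    print(shortlist(models, "English", need_streaming=True))
    print(shortlist(models, "English", need_cloning=True,
                    active_since=date(2025, 1, 1)))
```

Note that the language filter is an exact string match, so rows that list only "Multilingual" or "Multilingual (80+)" will not match a specific language name and need a manual check against the model's own documentation. The license column should likewise be verified against each repository before any commercial use.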