Text-to-Speech APIs
A hands-on guide to the leading TTS APIs (ElevenLabs, Google Cloud Text-to-Speech, Azure Speech Service, and Amazon Polly), with code examples, a feature comparison, and pricing guidance.
API Comparison
| Feature | ElevenLabs | Google Cloud TTS | Azure Speech | Amazon Polly |
|---|---|---|---|---|
| Voice Quality | Exceptional | Excellent | Excellent | Very Good |
| Voice Cloning | Yes (instant + professional) | Custom Voice (enterprise) | Custom Neural Voice | No |
| Languages | 29+ | 50+ | 140+ | 30+ |
| SSML Support | Limited | Full | Full + MSTTS extensions | Full |
| Streaming | Yes | Yes | Yes | Yes |
| Free Tier | 10,000 chars/month | 1M chars/month (WaveNet) | 500K chars/month | 5M chars/month (12 months) |
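To make the free-tier row concrete, the sketch below checks which providers' monthly allowances would cover a given character volume. The figures are copied from the table above and should be read as a snapshot, since quotas differ by voice type and change over time. The names `FREE_TIER_CHARS` and `providers_covering` are illustrative, not part of any provider's SDK.

```python
# Monthly free-tier allowances in characters, copied from the comparison table.
# Treat these as a snapshot; check each provider's pricing page before relying on them.
FREE_TIER_CHARS = {
    "ElevenLabs": 10_000,
    "Google Cloud TTS": 1_000_000,
    "Azure Speech": 500_000,
    "Amazon Polly": 5_000_000,  # first 12 months only, per the table
}


def providers_covering(monthly_chars: int) -> list[str]:
    """Return the providers whose free tier covers the given monthly volume."""
    if monthly_chars < 0:
        raise ValueError("monthly_chars must be non-negative")
    return [
        name
        for name, allowance in FREE_TIER_CHARS.items()
        if monthly_chars <= allowance
    ]


if __name__ == "__main__":
    # 600,000 characters per month is an assumed workload for illustration.
    print(providers_covering(600_000))
```

For small prototypes every provider qualifies, so the free tier only becomes a deciding factor once monthly volume passes the lower allowances.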
ElevenLabs
ElevenLabs offers the most natural-sounding voices with instant voice cloning capabilities:
import requests
ELEVENLABS_API_KEY = "your-api-key"
VOICE_ID = "21m00Tcm4TlvDq8ikWAM"  # Rachel voice
url = f"https://api.elevenlabs.io/v1/text-to-speech/{VOICE_ID}"
headers = {
    "xi-api-key": ELEVENLABS_API_KEY,
    "Content-Type": "application/json"
}
data = {
    "text": "Hello! Welcome to AI School.",
    "model_id": "eleven_multilingual_v2",
    "voice_settings": {
        "stability": 0.5,
        "similarity_boost": 0.75,
        "style": 0.5
    }
}
response = requests.post(url, json=data, headers=headers)
# Stop on HTTP errors so an error body is never saved as audio
response.raise_for_status()
with open("output.mp3", "wb") as f:
    f.write(response.content)
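The `voice_settings` values in the request above are fractions, and this sketch assumes each one must lie between 0 and 1. Checking them locally can catch a typo before a request is sent. `build_tts_payload` is a hypothetical helper written for this guide, not an ElevenLabs SDK function.

```python
import json


def build_tts_payload(text, model_id="eleven_multilingual_v2",
                      stability=0.5, similarity_boost=0.75, style=0.5):
    """Build the JSON body for the text-to-speech request shown above.

    Assumes each voice setting must lie in the range [0.0, 1.0].
    """
    if not text.strip():
        raise ValueError("text must not be empty")
    settings = {
        "stability": stability,
        "similarity_boost": similarity_boost,
        "style": style,
    }
    for name, value in settings.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")
    return {"text": text, "model_id": model_id, "voice_settings": settings}


if __name__ == "__main__":
    payload = build_tts_payload("Hello! Welcome to AI School.", stability=0.3)
    print(json.dumps(payload, indent=2))
```

The returned dictionary can be passed as the `json=` argument of `requests.post` in place of the hand-written `data` dictionary.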
Google Cloud Text-to-Speech
Google offers WaveNet and Neural2 voices across 50+ languages:
from google.cloud import texttospeech
client = texttospeech.TextToSpeechClient()
input_text = texttospeech.SynthesisInput(
    text="Hello! Welcome to AI School."
)
voice = texttospeech.VoiceSelectionParams(
    language_code="en-US",
    name="en-US-Neural2-F",
    ssml_gender=texttospeech.SsmlVoiceGender.FEMALE
)
audio_config = texttospeech.AudioConfig(
    audio_encoding=texttospeech.AudioEncoding.MP3,
    speaking_rate=1.0,
    pitch=0.0
)
response = client.synthesize_speech(
    input=input_text, voice=voice,
    audio_config=audio_config
)
with open("output.mp3", "wb") as f:
    f.write(response.audio_content)
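In the example above, `language_code` and the language prefix of the voice `name` are written out separately, and a mismatch between them is an easy mistake when switching voices. The sketch below derives one from the other, assuming voice names follow the `<language>-<region>-<family>-<variant>` shape seen in `en-US-Neural2-F`. `language_code_from_voice` is a hypothetical helper, not part of the `google-cloud-texttospeech` library.

```python
def language_code_from_voice(voice_name: str) -> str:
    """Return the language prefix of a voice name, e.g. 'en-US' for 'en-US-Neural2-F'.

    Assumes the first two hyphen-separated parts are language and region.
    """
    parts = voice_name.split("-")
    if len(parts) < 4 or not all(parts):
        raise ValueError(f"unexpected voice name format: {voice_name!r}")
    return "-".join(parts[:2])


if __name__ == "__main__":
    name = "en-US-Neural2-F"
    print(language_code_from_voice(name))
```

The result can be passed as `language_code` to `VoiceSelectionParams`, so only the voice name needs to change when trying a different voice.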
Azure Speech Service
Microsoft Azure offers the widest language coverage and advanced SSML features:
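import azure.cognitiveservices.speech as speechsdk
speech_config = speechsdk.SpeechConfig(
    subscription="your-key",
    region="eastus"
)
speech_config.speech_synthesis_voice_name = "en-US-JennyNeural"
# Write the synthesized audio to a WAV file instead of the default speaker
synthesizer = speechsdk.SpeechSynthesizer(
    speech_config=speech_config,
    audio_config=speechsdk.audio.AudioOutputConfig(
        filename="output.wav"
    )
)
result = synthesizer.speak_text_async(
    "Hello! Welcome to AI School."
).get()
if result.reason == speechsdk.ResultReason.SynthesizingAudioCompleted:
    print("Speech synthesized successfully.")
elif result.reason == speechsdk.ResultReason.Canceled:
    # Surface the failure reason (bad key, wrong region, network error)
    details = result.cancellation_details
    print(f"Synthesis canceled: {details.reason} {details.error_details}")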
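Since SSML is one of Azure's selling points here, it helps to see what a minimal SSML document looks like. The sketch below wraps plain text in `speak`, `voice`, and `prosody` elements and XML-escapes the text so that characters such as `&` do not break the markup. `build_ssml` is a hypothetical helper written for this guide, and the element layout is a minimal form rather than a full tour of the MSTTS extensions.

```python
from xml.sax.saxutils import escape, quoteattr


def build_ssml(text, voice="en-US-JennyNeural", lang="en-US", rate="medium"):
    """Wrap plain text in a minimal SSML document with a prosody rate.

    The text is XML-escaped so characters like '&' and '<' stay literal.
    """
    return (
        '<speak version="1.0" '
        'xmlns="http://www.w3.org/2001/10/synthesis" '
        f"xml:lang={quoteattr(lang)}>"
        f"<voice name={quoteattr(voice)}>"
        f"<prosody rate={quoteattr(rate)}>{escape(text)}</prosody>"
        "</voice></speak>"
    )


if __name__ == "__main__":
    print(build_ssml("Tips & tricks from AI School", rate="+10%"))
```

With the synthesizer from the example above, the resulting string would be passed to `synthesizer.speak_ssml_async(ssml).get()` instead of calling `speak_text_async`.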
Amazon Polly
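Amazon Polly is called through the standard AWS SDK (boto3 in Python) and uses your existing AWS credentials, so it fits naturally into applications already running on AWS: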
import boto3
polly = boto3.client("polly", region_name="us-east-1")
response = polly.synthesize_speech(
    Text="Hello! Welcome to AI School.",
    OutputFormat="mp3",
    VoiceId="Joanna",
    Engine="neural"
)
with open("output.mp3", "wb") as f:
    f.write(response["AudioStream"].read())
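A single `synthesize_speech` call accepts only a limited amount of text (3,000 billed characters, to the best of my knowledge of the AWS documentation), so longer documents need to be split first. The sketch below breaks text at sentence boundaries. `split_text` and its `max_chars` default are assumptions made for illustration, and the same approach applies to the other providers with their own limits.

```python
import re


def split_text(text: str, max_chars: int = 3000) -> list[str]:
    """Split text into chunks of at most max_chars, breaking after sentences.

    A single sentence longer than max_chars is hard-split as a fallback.
    """
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    # Split after '.', '!' or '?' when followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        while len(sentence) > max_chars:  # oversized sentence: hard split
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    for chunk in split_text("One. Two. Three.", max_chars=9):
        print(repr(chunk))
```

Each chunk would then go through its own `synthesize_speech` call, with the returned audio bytes written to the output file in order.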
Choosing the Right API
Best Voice Quality
Choose ElevenLabs for the most natural-sounding voices, especially for content creation, audiobooks, and applications where voice quality is the top priority.
Most Languages
Choose Azure Speech for the widest language and locale support (140+ languages), advanced SSML features, and enterprise custom voice capabilities.
Best Free Tier
Choose Amazon Polly for the most generous free tier (5M characters per month for the first 12 months) and tight AWS ecosystem integration.
Google Ecosystem
Choose Google Cloud TTS for WaveNet and Neural2 voice quality, Google Cloud integration, and full SSML support backed by thorough documentation.
Lilly Tech Systems