Intermediate

Text-to-Speech APIs

A hands-on guide to four leading TTS APIs (ElevenLabs, Google Cloud Text-to-Speech, Azure Speech Service, and Amazon Polly), with code examples, a feature comparison, and free-tier guidance.

API Comparison

Feature	ElevenLabs	Google Cloud TTS	Azure Speech	Amazon Polly
Voice Quality	Exceptional	Excellent	Excellent	Very Good
Voice Cloning	Yes (instant + professional)	Custom Voice (enterprise)	Custom Neural Voice	No
Languages	29+	50+	140+	30+
SSML Support	Limited	Full	Full + MSTTS extensions	Full
Streaming	Yes	Yes	Yes	Yes
Free Tier	10,000 chars/month	1M chars/month (WaveNet)	500K chars/month	5M chars/month (standard voices, first 12 months)
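
To make the free-tier row concrete, the sketch below checks which allowances would cover a given monthly volume. The figures are copied from the table above; the function names and the example workload are hypothetical, and real billing has more rules (voice type, trial periods) than this captures.

```python
# Monthly free-tier allowances in characters, copied from the comparison table.
FREE_TIER_CHARS = {
    "ElevenLabs": 10_000,
    "Google Cloud TTS (WaveNet)": 1_000_000,
    "Azure Speech": 500_000,
    "Amazon Polly (first 12 months)": 5_000_000,
}


def estimate_monthly_chars(docs_per_month: int, avg_chars_per_doc: int) -> int:
    """Rough monthly character volume for a workload of similar-sized documents."""
    return docs_per_month * avg_chars_per_doc


def providers_covering(monthly_chars: int) -> list[str]:
    """Return the providers whose free tier covers the given monthly volume."""
    return [
        name
        for name, limit in FREE_TIER_CHARS.items()
        if monthly_chars <= limit
    ]


# Hypothetical workload: 200 articles of about 4,000 characters each.
volume = estimate_monthly_chars(200, 4_000)
print(volume, providers_covering(volume))
```

At 800,000 characters per month, only the Google Cloud and Amazon Polly allowances in the table are large enough.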

ElevenLabs

ElevenLabs offers the most natural-sounding voices with instant voice cloning capabilities:

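import os

import requests

# Read the key from the environment rather than hard-coding it.
# The variable name ELEVENLABS_API_KEY is this example's own convention.
ELEVENLABS_API_KEY = os.environ["ELEVENLABS_API_KEY"]
VOICE_ID = "21m00Tcm4TlvDq8ikWAM"  # Rachel voice

url = f"https://api.elevenlabs.io/v1/text-to-speech/{VOICE_ID}"

headers = {
    "xi-api-key": ELEVENLABS_API_KEY,
    "Content-Type": "application/json"
}

data = {
    "text": "Hello! Welcome to AI School.",
    "model_id": "eleven_multilingual_v2",
    "voice_settings": {
        "stability": 0.5,          # higher values sound more consistent
        "similarity_boost": 0.75,  # how closely to follow the source voice
        "style": 0.5               # style exaggeration
    }
}

response = requests.post(url, json=data, headers=headers, timeout=60)

# Fail loudly on HTTP errors instead of saving an error body as output.mp3.
response.raise_for_status()

with open("output.mp3", "wb") as f:
    f.write(response.content)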
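
The three voice_settings values in the request above are floats between 0 and 1. As a minimal sketch, a small helper can validate them before any request is sent, so a bad value fails locally rather than at the API. The helper name build_tts_payload is hypothetical and its defaults simply mirror the example above; it is not part of any ElevenLabs SDK.

```python
def build_tts_payload(
    text: str,
    stability: float = 0.5,
    similarity_boost: float = 0.75,
    style: float = 0.5,
    model_id: str = "eleven_multilingual_v2",
) -> dict:
    """Build the JSON body for the text-to-speech request, validating inputs."""
    if not text.strip():
        raise ValueError("text must not be empty")

    settings = {
        "stability": stability,
        "similarity_boost": similarity_boost,
        "style": style,
    }
    # Each setting is expected to lie in the closed range [0, 1].
    for name, value in settings.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")

    return {"text": text, "model_id": model_id, "voice_settings": settings}


payload = build_tts_payload("Hello! Welcome to AI School.", stability=0.3)
print(payload["voice_settings"])
```

The returned dictionary can be passed as the json argument of requests.post in place of the hand-written data dictionary.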

Google Cloud Text-to-Speech

Google offers WaveNet and Neural2 voices with comprehensive language support:

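from google.cloud import texttospeech

# The client uses Application Default Credentials, for example a service
# account file referenced by GOOGLE_APPLICATION_CREDENTIALS.
client = texttospeech.TextToSpeechClient()

input_text = texttospeech.SynthesisInput(
    text="Hello! Welcome to AI School."
)

# Naming a specific voice selects both the model family (Neural2) and speaker.
voice = texttospeech.VoiceSelectionParams(
    language_code="en-US",
    name="en-US-Neural2-F",
    ssml_gender=texttospeech.SsmlVoiceGender.FEMALE
)

audio_config = texttospeech.AudioConfig(
    audio_encoding=texttospeech.AudioEncoding.MP3,
    speaking_rate=1.0,  # 1.0 is normal speed
    pitch=0.0           # semitones relative to the voice's default pitch
)

response = client.synthesize_speech(
    input=input_text, voice=voice,
    audio_config=audio_config
)

# audio_content holds the encoded MP3 bytes.
with open("output.mp3", "wb") as f:
    f.write(response.audio_content)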
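
The comparison table lists full SSML support for Google Cloud TTS. The sketch below shows one way to use it: build a small SSML document with pauses between sentences, then pass it through SynthesisInput(ssml=...) instead of text=.... The helper names build_ssml and synthesize_ssml are hypothetical, and the Google SDK is imported lazily so the string-building part runs without it.

```python
from xml.sax.saxutils import escape


def build_ssml(sentences: list[str], pause_ms: int = 400) -> str:
    """Wrap sentences in <s> tags, separated by fixed-length pauses."""
    # escape() protects against &, < and > breaking the XML.
    parts = [f"<s>{escape(sentence)}</s>" for sentence in sentences]
    pause = f'<break time="{pause_ms}ms"/>'
    return "<speak>" + pause.join(parts) + "</speak>"


def synthesize_ssml(ssml: str, out_path: str = "output.mp3") -> None:
    """Send SSML to Google Cloud TTS; requires the SDK and credentials."""
    from google.cloud import texttospeech  # lazy import, only needed here

    client = texttospeech.TextToSpeechClient()
    response = client.synthesize_speech(
        input=texttospeech.SynthesisInput(ssml=ssml),
        voice=texttospeech.VoiceSelectionParams(
            language_code="en-US", name="en-US-Neural2-F"
        ),
        audio_config=texttospeech.AudioConfig(
            audio_encoding=texttospeech.AudioEncoding.MP3
        ),
    )
    with open(out_path, "wb") as f:
        f.write(response.audio_content)


print(build_ssml(["Hello!", "Welcome to AI School."], pause_ms=250))
```

Escaping matters here because unescaped characters such as & in user-supplied text make the SSML invalid and the request fails.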

Azure Speech Service

Microsoft Azure offers the widest language coverage and advanced SSML features:

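import azure.cognitiveservices.speech as speechsdk

speech_config = speechsdk.SpeechConfig(
    subscription="your-key",
    region="eastus"
)

speech_config.speech_synthesis_voice_name = "en-US-JennyNeural"

# Write the audio to a WAV file instead of playing it on the default speaker.
synthesizer = speechsdk.SpeechSynthesizer(
    speech_config=speech_config,
    audio_config=speechsdk.audio.AudioOutputConfig(
        filename="output.wav"
    )
)

# speak_text_async returns a future; .get() blocks until synthesis finishes.
result = synthesizer.speak_text_async(
    "Hello! Welcome to AI School."
).get()

if result.reason == speechsdk.ResultReason.SynthesizingAudioCompleted:
    print("Speech synthesized successfully.")
elif result.reason == speechsdk.ResultReason.Canceled:
    # Report why synthesis stopped (bad key, wrong region, network error).
    details = result.cancellation_details
    print(f"Synthesis canceled: {details.reason}")
    if details.reason == speechsdk.CancellationReason.Error:
        print(f"Error details: {details.error_details}")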
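
The MSTTS extensions mentioned in the comparison table are extra SSML elements in Microsoft's own namespace. A minimal sketch, assuming the en-US-JennyNeural voice and its "cheerful" speaking style: the helper below builds SSML that uses mstts:express-as, and a second function sends it with speak_ssml_async. Both function names are hypothetical, and the Speech SDK is imported lazily so the SSML builder runs on its own.

```python
from xml.sax.saxutils import escape, quoteattr

SSML_NS = "http://www.w3.org/2001/10/synthesis"
MSTTS_NS = "https://www.w3.org/2001/mstts"


def build_express_as_ssml(
    text: str,
    voice: str = "en-US-JennyNeural",
    style: str = "cheerful",
    lang: str = "en-US",
) -> str:
    """Build SSML that applies a speaking style through mstts:express-as."""
    # quoteattr() adds the surrounding quotes and escapes attribute values.
    return (
        f'<speak version="1.0" xmlns="{SSML_NS}" '
        f'xmlns:mstts="{MSTTS_NS}" xml:lang={quoteattr(lang)}>'
        f"<voice name={quoteattr(voice)}>"
        f"<mstts:express-as style={quoteattr(style)}>"
        f"{escape(text)}"
        "</mstts:express-as>"
        "</voice></speak>"
    )


def speak_ssml(ssml: str, key: str, region: str = "eastus",
               out_path: str = "output.wav") -> bool:
    """Synthesize SSML to a WAV file; returns True on success."""
    import azure.cognitiveservices.speech as speechsdk  # lazy import

    config = speechsdk.SpeechConfig(subscription=key, region=region)
    synthesizer = speechsdk.SpeechSynthesizer(
        speech_config=config,
        audio_config=speechsdk.audio.AudioOutputConfig(filename=out_path),
    )
    result = synthesizer.speak_ssml_async(ssml).get()
    return result.reason == speechsdk.ResultReason.SynthesizingAudioCompleted


print(build_express_as_ssml("Hello! Welcome to AI School."))
```

Because the voice is named inside the SSML itself, speech_synthesis_voice_name does not need to be set on the config in this variant.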

Amazon Polly

Amazon Polly integrates closely with the rest of the AWS ecosystem and authenticates through standard AWS credentials, which makes it straightforward to scale:

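import boto3

# boto3 resolves credentials through the standard AWS chain
# (environment variables, ~/.aws/credentials, or an IAM role).
polly = boto3.client("polly", region_name="us-east-1")

response = polly.synthesize_speech(
    Text="Hello! Welcome to AI School.",
    OutputFormat="mp3",
    VoiceId="Joanna",
    Engine="neural"  # Joanna is also available on the "standard" engine
)

# AudioStream is a streaming body: read it once and write the bytes out.
with open("output.mp3", "wb") as f:
    f.write(response["AudioStream"].read())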
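
A single synthesize_speech call accepts only a limited amount of text (3,000 billed characters at the time of writing, to the best of my knowledge; check the current Polly limits). One possible way to handle longer input is to split it at sentence boundaries and synthesize each piece in turn. The helpers chunk_text and synthesize_long_text are hypothetical names for this sketch, and appending MP3 segments to one file is a simplification that works for basic playback.

```python
import re


def chunk_text(text: str, max_chars: int = 3000) -> list[str]:
    """Split text into chunks of at most max_chars, preferring sentence ends."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        # A single sentence longer than the limit is cut at the limit.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


def synthesize_long_text(text: str, out_path: str = "output.mp3") -> None:
    """Synthesize each chunk with Polly and append the audio to one file."""
    import boto3  # lazy import so chunk_text works without AWS installed

    polly = boto3.client("polly", region_name="us-east-1")
    with open(out_path, "wb") as f:
        for chunk in chunk_text(text):
            response = polly.synthesize_speech(
                Text=chunk, OutputFormat="mp3",
                VoiceId="Joanna", Engine="neural",
            )
            f.write(response["AudioStream"].read())


print(chunk_text("One. Two! Three?", max_chars=9))
```

Splitting at sentence boundaries keeps each request's prosody natural, since the engine never sees half a sentence unless one sentence alone exceeds the limit.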

Choosing the Right API

Best Voice Quality

Choose ElevenLabs for the most natural-sounding voices, especially for content creation, audiobooks, and applications where voice quality is the top priority.

Most Languages

Choose Azure Speech for the widest language and locale support (140+ languages), advanced SSML features, and enterprise custom voice capabilities.

Best Free Tier

Choose Amazon Polly for the most generous free tier (5M characters per month for the first 12 months on standard voices) and tight AWS ecosystem integration.

Google Ecosystem

Choose Google Cloud TTS for WaveNet and Neural2 voice quality, integration with other Google Cloud services, and excellent documentation and SSML support.
