Intermediate

Text-to-Speech APIs

A hands-on guide to four leading text-to-speech (TTS) APIs: ElevenLabs, Google Cloud Text-to-Speech, Azure Speech Service, and Amazon Polly. It includes code examples, a feature comparison, and free-tier guidance.

API Comparison

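Feature       | ElevenLabs                   | Google Cloud TTS                          | Azure Speech            | Amazon Polly
------------- | ---------------------------- | ----------------------------------------- | ----------------------- | ------------------------------------------------
Voice Quality | Exceptional                  | Excellent                                 | Excellent               | Very Good
Voice Cloning | Yes (instant + professional) | Custom Voice (enterprise)                 | Custom Neural Voice     | No
Languages     | 29+                          | 50+                                       | 140+                    | 30+
SSML Support  | Limited                      | Full                                      | Full + MSTTS extensions | Full
Streaming     | Yes                          | Yes                                       | Yes                     | Yes
Free Tier     | 10,000 chars/month           | 1M chars/month (WaveNet), 4M (Standard)   | 500K chars/month        | 5M chars/month, standard voices (first 12 months)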

ElevenLabs

ElevenLabs is known for highly natural, expressive voices and offers instant voice cloning. Its REST API takes the voice ID in the URL and returns MP3 audio by default:

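import os

import requests

# Read the key from the environment instead of hard-coding it
ELEVENLABS_API_KEY = os.environ["ELEVENLABS_API_KEY"]
VOICE_ID = "21m00Tcm4TlvDq8ikWAM"  # Rachel voice

url = f"https://api.elevenlabs.io/v1/text-to-speech/{VOICE_ID}"

headers = {
    "xi-api-key": ELEVENLABS_API_KEY,
    "Content-Type": "application/json",
    "Accept": "audio/mpeg"
}

data = {
    "text": "Hello! Welcome to AI School.",
    "model_id": "eleven_multilingual_v2",
    "voice_settings": {
        "stability": 0.5,         # 0-1: lower is more expressive, higher more consistent
        "similarity_boost": 0.75, # 0-1: how closely to match the original voice
        "style": 0.5              # 0-1: style exaggeration
    }
}

response = requests.post(url, json=data, headers=headers, timeout=30)
# Raise on 4xx/5xx so an error body is never saved as "audio"
response.raise_for_status()

with open("output.mp3", "wb") as f:
    f.write(response.content)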
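For longer text, ElevenLabs also exposes a streaming endpoint (the same URL with /stream appended) that returns audio as it is generated. The sketch below assumes you reuse url, headers, and data from the example above; write_audio_chunks is a hypothetical helper that copies chunks to a binary file object as they arrive, so the whole response never has to sit in memory. The demo at the bottom uses fake bytes so it runs without a network call.

```python
import io
from typing import BinaryIO, Iterable


def write_audio_chunks(chunks: Iterable[bytes], out: BinaryIO) -> int:
    """Write streamed audio chunks to a binary file object; return bytes written.

    Hypothetical helper: accepts any iterable of bytes, for example
    the iterator returned by requests' Response.iter_content().
    """
    total = 0
    for chunk in chunks:
        if not chunk:  # skip empty keep-alive chunks
            continue
        out.write(chunk)
        total += len(chunk)
    return total


# Offline demo with fake chunks (no API call is made here)
buffer = io.BytesIO()
written = write_audio_chunks([b"ID3", b"", b"\x00\x01"], buffer)
```

Against the live API, open the request with requests.post(url + "/stream", json=data, headers=headers, stream=True, timeout=30) as a context manager, call raise_for_status() on the response, and pass response.iter_content(chunk_size=4096) to write_audio_chunks together with a file opened in "wb" mode.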

Google Cloud Text-to-Speech

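Google Cloud TTS offers several voice families (including Standard, WaveNet, and Neural2) with broad language support. The google-cloud-texttospeech client library handles authentication and request building: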

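from google.cloud import texttospeech

# Authenticates with Application Default Credentials
# (for example, the GOOGLE_APPLICATION_CREDENTIALS environment variable)
client = texttospeech.TextToSpeechClient()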

input_text = texttospeech.SynthesisInput(
    text="Hello! Welcome to AI School."
)

voice = texttospeech.VoiceSelectionParams(
    language_code="en-US",
    name="en-US-Neural2-F",
    ssml_gender=texttospeech.SsmlVoiceGender.FEMALE
)

audio_config = texttospeech.AudioConfig(
    audio_encoding=texttospeech.AudioEncoding.MP3,
    speaking_rate=1.0,
    pitch=0.0
)

response = client.synthesize_speech(
    input=input_text, voice=voice,
    audio_config=audio_config
)

with open("output.mp3", "wb") as f:
    f.write(response.audio_content)
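Google caps the input of a single synthesize_speech request (5,000 bytes of text or SSML at the time of writing; check the current quotas page), so long documents must be split first. A minimal sketch, assuming a naive sentence splitter is good enough for your text: the hypothetical chunk_text_by_bytes packs whole sentences into chunks and measures UTF-8 bytes rather than characters, because accented and non-Latin characters take more than one byte.

```python
import re


def chunk_text_by_bytes(text: str, max_bytes: int = 5000) -> list[str]:
    """Split text at sentence boundaries so each chunk fits in max_bytes of UTF-8.

    Naive splitting: a sentence ends at '.', '!' or '?' followed by whitespace.
    Raises ValueError if a single sentence is longer than max_bytes.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate.encode("utf-8")) <= max_bytes:
            current = candidate  # the sentence still fits in the current chunk
            continue
        if current:
            chunks.append(current)  # close the full chunk
        if len(sentence.encode("utf-8")) > max_bytes:
            raise ValueError("A single sentence exceeds max_bytes; split it manually.")
        current = sentence  # start a new chunk with this sentence
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then go through the same call as above, with texttospeech.SynthesisInput(text=chunk) as the input; write each result to its own file or join them with an audio tool.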

Azure Speech Service

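Azure Speech offers the widest language coverage of the four and extends SSML with its own mstts elements, such as speaking styles. The Speech SDK (azure-cognitiveservices-speech) can write the output directly to a file: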

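import os

import azure.cognitiveservices.speech as speechsdk

# Key and region of your Speech resource, read from the environment
speech_config = speechsdk.SpeechConfig(
    subscription=os.environ["SPEECH_KEY"],
    region=os.environ.get("SPEECH_REGION", "eastus")
)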

speech_config.speech_synthesis_voice_name = "en-US-JennyNeural"

synthesizer = speechsdk.SpeechSynthesizer(
    speech_config=speech_config,
    audio_config=speechsdk.audio.AudioOutputConfig(
        filename="output.wav"
    )
)

result = synthesizer.speak_text_async(
    "Hello! Welcome to AI School."
).get()

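if result.reason == speechsdk.ResultReason.SynthesizingAudioCompleted:
    print("Speech synthesized successfully.")
elif result.reason == speechsdk.ResultReason.Canceled:
    # Report why synthesis stopped (bad key, wrong region, invalid voice, ...)
    details = result.cancellation_details
    print(f"Synthesis canceled: {details.reason}")
    if details.reason == speechsdk.CancellationReason.Error:
        print(f"Error details: {details.error_details}")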
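The MSTTS extensions in the comparison table are used through SSML. A minimal sketch, assuming the chosen voice supports the requested style (style support varies by voice, so check Azure's voice list): the hypothetical build_azure_ssml wraps plain text in an mstts:express-as element and a prosody rate, escaping the text so characters like & and < cannot break the XML.

```python
from xml.sax.saxutils import escape, quoteattr


def build_azure_ssml(
    text: str,
    voice: str = "en-US-JennyNeural",
    style: str = "cheerful",
    rate: str = "medium",
    lang: str = "en-US",
) -> str:
    """Wrap plain text in SSML with an Azure speaking style and a prosody rate."""
    return (
        '<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" '
        f'xmlns:mstts="https://www.w3.org/2001/mstts" xml:lang={quoteattr(lang)}>'
        f"<voice name={quoteattr(voice)}>"  # quoteattr adds the surrounding quotes
        f"<mstts:express-as style={quoteattr(style)}>"
        f"<prosody rate={quoteattr(rate)}>{escape(text)}</prosody>"
        "</mstts:express-as>"
        "</voice>"
        "</speak>"
    )
```

Pass the result to synthesizer.speak_ssml_async(ssml).get() instead of speak_text_async; the voice named inside the SSML is the one used for synthesis.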

Amazon Polly

Amazon Polly is called through the standard AWS SDKs (here, boto3) and authenticates with IAM, which makes it a natural fit for applications already running on AWS:

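import boto3

# Credentials come from the standard AWS chain (environment variables,
# ~/.aws/credentials, or an attached IAM role)
polly = boto3.client("polly", region_name="us-east-1")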

response = polly.synthesize_speech(
    Text="Hello! Welcome to AI School.",
    OutputFormat="mp3",
    VoiceId="Joanna",
    Engine="neural"
)

with open("output.mp3", "wb") as f:
    f.write(response["AudioStream"].read())
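The same synthesize_speech call can return speech marks instead of audio: with OutputFormat="json" and SpeechMarkTypes=["word"], AudioStream holds one JSON object per line giving each word's offset in milliseconds, which is handy for captions or word highlighting. Audio and speech marks need separate requests. A minimal parsing sketch; parse_speech_marks and word_timings are hypothetical helper names.

```python
import json
from typing import Any


def parse_speech_marks(raw: bytes) -> list[dict[str, Any]]:
    """Parse Polly speech-mark output (one JSON object per line) into dicts."""
    return [json.loads(line) for line in raw.decode("utf-8").splitlines() if line.strip()]


def word_timings(marks: list[dict[str, Any]]) -> list[tuple[int, str]]:
    """Return (milliseconds from start of audio, word) pairs for word marks only."""
    return [(mark["time"], mark["value"]) for mark in marks if mark.get("type") == "word"]
```

Usage: call polly.synthesize_speech with the same Text, VoiceId, and Engine as above but OutputFormat="json" and SpeechMarkTypes=["word"], then pass response["AudioStream"].read() to parse_speech_marks and the result to word_timings.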

Choosing the Right API

🎤

Best Voice Quality

Choose ElevenLabs for the most natural-sounding voices, especially for content creation, audiobooks, and applications where voice quality is the top priority.

🌐

Most Languages

Choose Azure Speech for the widest language and locale support (140+ languages and locales), advanced SSML features, and enterprise custom voice capabilities.

💰

Best Free Tier

Choose Amazon Polly for the most generous free tier (5M standard-voice characters per month for the first 12 months) and tight AWS ecosystem integration.

Google Ecosystem

Choose Google Cloud TTS for WaveNet quality, Google Cloud integration, and excellent documentation and SSML support.
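The guidance above can be condensed into a small filter over the comparison table. This is only a sketch: Provider and shortlist are hypothetical names, and the numbers are the table's lower bounds and free-tier figures copied as-is. Real free tiers change over time and carry conditions, such as Polly's 12-month, standard-voice limit, so verify current pricing before deciding.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Provider:
    name: str
    languages: int         # lower bound from the table ("29+" -> 29)
    voice_cloning: bool    # any cloning or custom-voice offering
    free_chars_month: int  # free-tier characters per month from the table


PROVIDERS = [
    Provider("ElevenLabs", 29, True, 10_000),
    Provider("Google Cloud TTS", 50, True, 1_000_000),  # WaveNet figure
    Provider("Azure Speech", 140, True, 500_000),
    Provider("Amazon Polly", 30, False, 5_000_000),     # standard voices, first 12 months
]


def shortlist(monthly_chars: int = 0, min_languages: int = 0, needs_cloning: bool = False) -> list[str]:
    """Return providers meeting the requirements, free-tier fits first."""
    matches = [
        p for p in PROVIDERS
        if p.languages >= min_languages and (p.voice_cloning or not needs_cloning)
    ]
    # Stable sort: False (free tier covers the volume) sorts before True
    matches.sort(key=lambda p: p.free_chars_month < monthly_chars)
    return [p.name for p in matches]
```

Keeping the table as data means a change in one provider's free tier or language count updates every recommendation at once.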