Content Creation Pipeline
Build automated content production systems that chain LLMs for writing, image generation models for visuals, and text-to-speech for voiceovers — producing blog posts, social media content, podcasts, and videos at scale.
The Multi-Model Content Factory
Content creation is one of the most natural applications of multi-model AI. A single piece of content — a blog post, a social media campaign, or an educational video — requires multiple creative capabilities: writing (LLM), visual design (image generation), narration (TTS), and sometimes video assembly. By chaining these models together, you can build a content factory that produces polished, multi-format content from a single idea.
Marketing teams that previously needed a writer, a designer, a voice actor, and a video editor can now prototype content in minutes with a well-designed pipeline. The human role shifts from creation to curation — reviewing, refining, and approving AI-generated content rather than producing it from scratch.
Models in the Content Pipeline
| Stage | Model Options | Best For | Cost Per Unit |
|---|---|---|---|
| Writing | Claude 4, GPT-4o, Gemini 2.5, LLaMA 3 | Blog posts, scripts, social copy, email sequences | $0.01–$0.15 per 1K words |
| Image Generation | DALL-E 3, Stable Diffusion XL, Midjourney, Flux | Blog headers, social images, product shots, illustrations | $0.02–$0.08 per image |
| Voice / TTS | ElevenLabs, OpenAI TTS, Bark, Google TTS | Podcast narration, video voiceovers, audio articles | $0.01–$0.30 per minute |
| Video Generation | Sora, Runway Gen-3, Pika, Kling | Short clips, B-roll, social video content | $0.10–$1.00 per second |
| Music / Audio | Suno, Udio, MusicGen | Background music, jingles, intro/outro | $0.05–$0.50 per track |
Blog Post Generator with AI Images
This pipeline generates a complete blog post with an SEO-optimized article, a header image, and in-content illustrations — all from a single topic prompt:
```python
import json
import re
from pathlib import Path

import anthropic
import openai


class BlogPostGenerator:
    """Generate complete blog posts with AI-written text
    and AI-generated images."""

    def __init__(self):
        self.llm = anthropic.Anthropic()
        self.image_client = openai.OpenAI()
        self.output_dir = Path("output/blog")
        self.output_dir.mkdir(parents=True, exist_ok=True)

    def generate(self, topic: str, style: str = "professional",
                 word_count: int = 1500) -> dict:
        """Generate a complete blog post from a topic."""
        # Step 1: Generate article outline and content
        article = self._write_article(topic, style, word_count)

        # Step 2: Generate image prompts from the article
        image_prompts = self._create_image_prompts(
            topic, article["sections"]
        )

        # Step 3: Generate images
        images = self._generate_images(image_prompts)

        # Step 4: Assemble the complete post
        post = self._assemble_post(article, images)

        # Save output
        output_path = self.output_dir / f"{self._slugify(topic)}.json"
        with open(output_path, "w") as f:
            json.dump(post, f, indent=2)

        return post

    def _write_article(self, topic: str, style: str,
                       word_count: int) -> dict:
        """Use Claude to write the full article."""
        response = self.llm.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=4096,
            messages=[{"role": "user", "content": f"""Write a {word_count}-word blog post about: {topic}

Style: {style}

Format: Return valid JSON with this structure:
{{
  "title": "SEO-optimized title (under 60 chars)",
  "meta_description": "SEO meta description (under 160 chars)",
  "slug": "url-friendly-slug",
  "sections": [
    {{
      "heading": "Section heading",
      "content": "Full section content in HTML paragraphs",
      "needs_image": true/false
    }}
  ],
  "tags": ["tag1", "tag2"],
  "excerpt": "2-sentence excerpt for social sharing"
}}

Write engaging, informative content. Use short paragraphs.
Include practical examples. Return ONLY valid JSON."""}]
        )
        text = response.content[0].text
        # Extract the JSON object from the response
        start = text.find("{")
        end = text.rfind("}") + 1
        return json.loads(text[start:end])

    def _create_image_prompts(self, topic: str,
                              sections: list) -> list:
        """Generate DALL-E prompts for sections that need images."""
        prompts = []

        # Header image
        prompts.append({
            "type": "header",
            "prompt": f"Professional blog header image for an "
                      f"article about {topic}. Modern, clean "
                      f"design with subtle tech elements. "
                      f"16:9 aspect ratio. No text overlays.",
            "size": "1792x1024"
        })

        # Section images
        for i, section in enumerate(sections):
            if section.get("needs_image", False):
                prompts.append({
                    "type": "section",
                    "section_index": i,
                    "prompt": f"Illustration for a blog section "
                              f"about: {section['heading']}. "
                              f"Clean, professional style. "
                              f"Informative visual. No text.",
                    "size": "1024x1024"
                })
        return prompts

    def _generate_images(self, prompts: list) -> list:
        """Generate images using DALL-E 3."""
        images = []
        for prompt_data in prompts:
            try:
                response = self.image_client.images.generate(
                    model="dall-e-3",
                    prompt=prompt_data["prompt"],
                    size=prompt_data["size"],
                    quality="standard",
                    n=1
                )
                images.append({
                    **prompt_data,
                    "url": response.data[0].url,
                    "revised_prompt": response.data[0].revised_prompt,
                    "status": "success"
                })
            except Exception as e:
                images.append({
                    **prompt_data,
                    "status": "failed",
                    "error": str(e)
                })
        return images

    def _assemble_post(self, article: dict, images: list) -> dict:
        """Combine article text and images into final output."""
        header_image = next(
            (img for img in images if img["type"] == "header"),
            None
        )
        section_images = {
            img["section_index"]: img
            for img in images
            if img["type"] == "section" and img["status"] == "success"
        }

        html_parts = []
        for i, section in enumerate(article["sections"]):
            html_parts.append(f"<h2>{section['heading']}</h2>")
            html_parts.append(section["content"])
            if i in section_images:
                img = section_images[i]
                html_parts.append(
                    f'<img src="{img["url"]}" '
                    f'alt="{section["heading"]}">'
                )

        return {
            "title": article["title"],
            "slug": article.get("slug", ""),
            "meta_description": article["meta_description"],
            "excerpt": article["excerpt"],
            "tags": article["tags"],
            "header_image": header_image,
            "html_content": "\n".join(html_parts),
            "word_count": len("\n".join(
                s["content"] for s in article["sections"]
            ).split()),
            "image_count": len([
                i for i in images if i["status"] == "success"
            ])
        }

    def _slugify(self, text: str) -> str:
        return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


# Generate a blog post
generator = BlogPostGenerator()
post = generator.generate(
    topic="How RAG is Replacing Traditional Search in 2026",
    style="technical but accessible",
    word_count=1800
)
print(f"Generated: {post['title']}")
print(f"Words: {post['word_count']}, Images: {post['image_count']}")
```
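One fragile spot in pipelines like this is the `text.find("{")` JSON extraction, which breaks when the model wraps its answer in a markdown code fence or surrounding prose. A slightly more defensive parser (a sketch of our own; `extract_json` is not a library function):

```python
import json
import re


def extract_json(text: str) -> dict:
    """Pull the first JSON object out of an LLM response,
    tolerating markdown code fences and surrounding prose."""
    # Strip a ```json ... ``` fence if the model added one
    fenced = re.search(r"```(?:json)?\s*(\{.*\})\s*```", text, re.DOTALL)
    if fenced:
        text = fenced.group(1)
    start = text.find("{")
    end = text.rfind("}") + 1
    if start == -1 or end == 0:
        raise ValueError("No JSON object found in model output")
    return json.loads(text[start:end])


# Handles clean JSON, fenced JSON, and JSON surrounded by prose
print(extract_json('Here you go:\n```json\n{"title": "RAG in 2026"}\n```'))
# → {'title': 'RAG in 2026'}
```

Raising a `ValueError` instead of letting `json.loads` fail on an empty slice makes retry logic upstream much easier to write.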
Podcast Script and Audio Generation
This pipeline generates a podcast episode from a topic — writing the script with an LLM and converting it to natural-sounding audio with ElevenLabs, including distinct voices for host and guest:
```python
import io
import json
from pathlib import Path

import anthropic
import requests
from pydub import AudioSegment


class PodcastGenerator:
    """Generate podcast episodes with LLM script + TTS audio."""

    def __init__(self, config: dict):
        self.llm = anthropic.Anthropic()
        self.elevenlabs_key = config["elevenlabs_api_key"]
        self.voices = {
            "host": config.get("host_voice_id",
                               "21m00Tcm4TlvDq8ikWAM"),
            "guest": config.get("guest_voice_id",
                                "AZnzlk1XvdvUeBnXmlld")
        }

    def generate_episode(self, topic: str,
                         duration_minutes: int = 10) -> dict:
        """Generate a full podcast episode."""
        # Step 1: Write the script
        script = self._write_script(topic, duration_minutes)

        # Step 2: Generate audio for each segment
        audio_segments = []
        for segment in script["segments"]:
            audio = self._generate_audio(
                segment["text"],
                self.voices[segment["speaker"]]
            )
            audio_segments.append({
                "speaker": segment["speaker"],
                "audio": audio,
                "duration_ms": len(audio)
            })

        # Step 3: Assemble the full episode
        full_episode = self._assemble_episode(audio_segments)

        # Step 4: Export
        output_dir = Path("output/podcasts")
        output_dir.mkdir(parents=True, exist_ok=True)
        output_path = output_dir / f"{script['slug']}.mp3"
        full_episode.export(output_path, format="mp3", bitrate="192k")

        return {
            "title": script["title"],
            "description": script["description"],
            "duration_seconds": len(full_episode) / 1000,
            "segment_count": len(script["segments"]),
            "output_path": str(output_path)
        }

    def _write_script(self, topic: str,
                      duration_minutes: int) -> dict:
        """Generate podcast script with host/guest dialogue."""
        words = duration_minutes * 150  # ~150 words per minute
        response = self.llm.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=8000,
            messages=[{"role": "user", "content": f"""Write a {duration_minutes}-minute podcast script (~{words} words) about:
{topic}

Format: Two speakers - "host" and "guest" (an expert).
Return valid JSON:
{{
  "title": "Episode title",
  "slug": "url-slug",
  "description": "Episode description for show notes",
  "segments": [
    {{"speaker": "host", "text": "What they say"}},
    {{"speaker": "guest", "text": "What they say"}}
  ]
}}

Make it conversational and engaging. The host asks questions and
provides transitions. The guest gives detailed, insightful
answers. Include an intro and outro. Return ONLY valid JSON."""}]
        )
        text = response.content[0].text
        start = text.find("{")
        end = text.rfind("}") + 1
        return json.loads(text[start:end])

    def _generate_audio(self, text: str,
                        voice_id: str) -> AudioSegment:
        """Convert text to speech using ElevenLabs."""
        url = (f"https://api.elevenlabs.io/v1/text-to-speech/"
               f"{voice_id}")
        response = requests.post(
            url,
            headers={
                "xi-api-key": self.elevenlabs_key,
                "Content-Type": "application/json"
            },
            json={
                "text": text,
                "model_id": "eleven_turbo_v2_5",
                "voice_settings": {
                    "stability": 0.6,
                    "similarity_boost": 0.8
                }
            }
        )
        response.raise_for_status()  # surface TTS errors early
        return AudioSegment.from_mp3(io.BytesIO(response.content))

    def _assemble_episode(self,
                          segments: list) -> AudioSegment:
        """Combine audio segments with natural pauses."""
        episode = AudioSegment.silent(duration=500)  # Opening pause
        for seg in segments:
            episode += seg["audio"]
            # Add a pause between segments
            pause = 400 if seg["speaker"] == "host" else 300
            episode += AudioSegment.silent(duration=pause)
        episode += AudioSegment.silent(duration=1000)  # End pause
        return episode


# Generate a podcast episode
podcast = PodcastGenerator({
    "elevenlabs_api_key": "your-key",
    "host_voice_id": "21m00Tcm4TlvDq8ikWAM",
    "guest_voice_id": "AZnzlk1XvdvUeBnXmlld"
})
result = podcast.generate_episode(
    topic="The Future of Multi-Model AI Applications",
    duration_minutes=8
)
print(f"Episode: {result['title']}")
print(f"Duration: {result['duration_seconds']:.0f}s")
```
Social Media Content Pipeline
Generate a complete social media campaign — post text, hashtags, and matched images — from a single content brief:
```python
import json

import anthropic
import openai


class SocialMediaPipeline:
    """Generate multi-platform social media content."""

    def __init__(self):
        self.llm = anthropic.Anthropic()
        self.image_client = openai.OpenAI()

    def generate_campaign(self, brief: str,
                          platforms: list = None) -> dict:
        """Generate content for multiple platforms."""
        if platforms is None:
            platforms = ["linkedin", "twitter", "instagram"]

        # Step 1: Generate platform-specific copy
        copy = self._generate_copy(brief, platforms)

        # Step 2: Generate images sized for each platform
        images = self._generate_platform_images(brief, platforms)

        # Step 3: Combine
        campaign = {}
        for platform in platforms:
            campaign[platform] = {
                "text": copy.get(platform, ""),
                "image": images.get(platform, {}),
                "hashtags": copy.get(f"{platform}_hashtags", [])
            }
        return campaign

    def _generate_copy(self, brief: str,
                       platforms: list) -> dict:
        """Generate platform-optimized copy."""
        response = self.llm.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=2000,
            messages=[{"role": "user", "content": f"""Create social media posts for these platforms: {', '.join(platforms)}

Content brief: {brief}

Return JSON with keys for each platform and hashtags:
{{
  "linkedin": "Professional post (150-300 words, storytelling)",
  "linkedin_hashtags": ["hashtag1", "hashtag2"],
  "twitter": "Concise post (under 280 chars)",
  "twitter_hashtags": ["hashtag1"],
  "instagram": "Engaging caption with emojis (100-200 words)",
  "instagram_hashtags": ["up", "to", "30", "hashtags"]
}}

Match each platform's tone and best practices.
Return ONLY valid JSON."""}]
        )
        text = response.content[0].text
        start = text.find("{")
        end = text.rfind("}") + 1
        return json.loads(text[start:end])

    def _generate_platform_images(self, brief: str,
                                  platforms: list) -> dict:
        """Generate correctly-sized images per platform."""
        sizes = {
            "linkedin": ("1200x627", "Professional LinkedIn post "
                         "image"),
            "twitter": ("1200x675", "Twitter/X post image"),
            "instagram": ("1080x1080", "Instagram square post image")
        }
        images = {}
        for platform in platforms:
            if platform not in sizes:
                continue
            size_str, desc = sizes[platform]
            # DALL-E 3 only supports fixed sizes, so pick the
            # closest one; landscape targets get 1792x1024
            dalle_size = "1792x1024" if "1200" in size_str else "1024x1024"
            try:
                response = self.image_client.images.generate(
                    model="dall-e-3",
                    prompt=f"{desc} for: {brief}. "
                           f"Clean, modern, eye-catching. "
                           f"No text in the image.",
                    size=dalle_size,
                    quality="standard",
                    n=1
                )
                images[platform] = {
                    "url": response.data[0].url,
                    "size": size_str
                }
            except Exception as e:
                images[platform] = {"error": str(e)}
        return images


# Generate a campaign
pipeline = SocialMediaPipeline()
campaign = pipeline.generate_campaign(
    brief="Launch announcement for our new AI-powered document "
          "processing platform that reduces manual data entry by 90%",
    platforms=["linkedin", "twitter", "instagram"]
)
for platform, content in campaign.items():
    print(f"\n--- {platform.upper()} ---")
    print(content["text"][:200] + "...")
```
Video Creation Workflow
The most complex content pipeline generates a short video from a script, scene images, and a voiceover, then assembles them with MoviePy (which drives FFmpeg for encoding):
```python
from moviepy.editor import (
    ImageClip, AudioFileClip, CompositeVideoClip,
    concatenate_videoclips, TextClip
)


class VideoCreator:
    """Assemble AI-generated assets into a video."""

    def create_video(self, script: dict, images: list,
                     audio_path: str, output_path: str):
        """Assemble video from script, images, and voiceover.

        Args:
            script: {"scenes": [{"text": "...", "duration": 5}]}
            images: List of image file paths (one per scene)
            audio_path: Path to voiceover audio file
            output_path: Path for output video
        """
        audio = AudioFileClip(audio_path)
        scenes = script["scenes"]

        clips = []
        for i, scene in enumerate(scenes):
            duration = scene["duration"]
            img_path = images[i] if i < len(images) else images[-1]

            # Create a full-height, centered image clip for the scene
            img_clip = (
                ImageClip(img_path)
                .set_duration(duration)
                .resize(height=1080)
                .set_position("center")
            )

            # Add subtitle overlay
            if scene.get("text"):
                txt_clip = (
                    TextClip(
                        scene["text"],
                        fontsize=36,
                        color="white",
                        font="Arial-Bold",
                        stroke_color="black",
                        stroke_width=2,
                        size=(1800, None),
                        method="caption"
                    )
                    .set_duration(duration)
                    .set_position(("center", 900))
                )
                clip = CompositeVideoClip(
                    [img_clip, txt_clip],
                    size=(1920, 1080)
                )
            else:
                clip = img_clip
            clips.append(clip)

        # Concatenate all scene clips
        video = concatenate_videoclips(clips, method="compose")

        # Add voiceover audio, trimmed to the video length
        video = video.set_audio(audio.subclip(0, video.duration))

        # Export
        video.write_videofile(
            output_path,
            fps=24,
            codec="libx264",
            audio_codec="aac",
            bitrate="5000k"
        )
        return {
            "output": output_path,
            "duration": video.duration,
            "scenes": len(clips),
            "resolution": "1920x1080"
        }
```
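One practical gotcha: the voiceover is trimmed to the video's duration at assembly time, so if scene durations do not add up to the audio length, narration gets cut off. A small helper can distribute the measured audio length across scenes, weighted by word count (a sketch of our own; `fit_durations` is not a MoviePy function):

```python
def fit_durations(scenes: list, audio_seconds: float) -> list:
    """Scale scene durations so they approximately sum to the
    voiceover length, weighting each scene by its word count."""
    weights = [max(len(s["text"].split()), 1) for s in scenes]
    total = sum(weights)
    return [
        {**s, "duration": round(audio_seconds * w / total, 2)}
        for s, w in zip(scenes, weights)
    ]


scenes = [
    {"text": "Welcome to the demo"},
    {"text": "Here is a much longer scene with more narration"},
]
fitted = fit_durations(scenes, audio_seconds=13.0)
print([s["duration"] for s in fitted])  # → [4.0, 9.0]
```

Run this on the script before calling `create_video`, passing the length of the generated voiceover file as `audio_seconds`.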
Prompt Chaining: LLM Output as Image Input
The key technique in content pipelines is prompt chaining: using the output of one model as input for the next. The LLM writes the article, then generates optimized image prompts based on the content it just wrote, which keeps the visuals consistent with the written content.
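Stripped of the API calls, the chaining pattern looks like this (a sketch with the model calls stubbed as plain functions; in production each step would be an LLM or image-model request):

```python
def chain(steps, seed):
    """Run named pipeline steps in order, feeding each step's
    output into the next and keeping every intermediate result."""
    results = {"input": seed}
    value = seed
    for name, step in steps:
        value = step(value)
        results[name] = value  # keep intermediates for inspection
    return results


# The "models" are stubbed as lambdas so the sketch runs without
# API keys; each step receives the previous step's output.
pipeline = [
    ("article", lambda topic: f"Article about {topic}"),
    ("image_prompt", lambda art: f"Flat illustration for: {art}. No text."),
]
out = chain(pipeline, "vector databases")
print(out["image_prompt"])
# → Flat illustration for: Article about vector databases. No text.
```

Keeping every intermediate result in `results` makes the chain debuggable: when an image comes out wrong, you can see exactly which prompt produced it.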
Prompt chaining best practices:
- Be explicit about output format: Each model in the chain needs structured output (JSON) that the next model can parse reliably.
- Include style tokens: When the LLM generates image prompts, include consistent style descriptors (e.g., "flat illustration, blue and purple color scheme") to maintain visual coherence across all images.
- Validate between steps: Check each model's output before passing to the next. A malformed image prompt wastes an API call.
- Cache intermediate results: If image generation fails, you do not want to re-run the entire LLM step. Save outputs at each stage.
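The last two practices can be combined in a small checkpoint helper (a sketch; `checkpoint` and the `output/cache` directory are conventions of our own, not a library API):

```python
import json
from pathlib import Path


def checkpoint(stage: str, producer, cache_dir=Path("output/cache")):
    """Run one pipeline stage, or load its cached result from disk
    so a failure downstream never forces an upstream re-run."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    path = cache_dir / f"{stage}.json"
    if path.exists():
        return json.loads(path.read_text())
    result = producer()  # the expensive model call goes here
    path.write_text(json.dumps(result))
    return result


# First call runs the producer; any later call with the same stage
# name loads the saved JSON instead of paying for the model again.
article = checkpoint("article_draft", lambda: {"title": "Demo post"})
```

If the image step then fails, re-running the whole pipeline picks the article up from `output/cache/article_draft.json` for free.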
Cost Analysis per Content Piece
| Content Type | LLM Cost | Image Cost | Audio Cost | Total Cost | Time |
|---|---|---|---|---|---|
| Blog post (1500 words + 3 images) | $0.05 | $0.12 | — | $0.17 | ~2 min |
| Social media set (3 platforms) | $0.03 | $0.12 | — | $0.15 | ~90 sec |
| Podcast episode (10 min) | $0.08 | — | $0.45 | $0.53 | ~3 min |
| Short video (60 sec) | $0.10 | $0.32 | $0.10 | $0.52 | ~5 min |
| Email newsletter | $0.03 | $0.04 | — | $0.07 | ~45 sec |
| Full campaign (blog + social + email) | $0.11 | $0.28 | — | $0.39 | ~4 min |
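Totals like these can be estimated programmatically before a job runs. A back-of-envelope estimator using midpoint figures from the per-unit ranges earlier in this lesson (the constants are illustrative assumptions, not published prices):

```python
# Midpoints of the per-unit cost ranges (illustrative assumptions)
UNIT_COST = {
    "words_1k": 0.08,     # LLM writing, per 1K words
    "image": 0.05,        # per generated image
    "tts_minute": 0.15,   # per minute of narration
    "video_second": 0.55  # per second of generated video
}


def estimate_cost(words=0, images=0, tts_minutes=0, video_seconds=0):
    """Rough pre-flight cost estimate for one content piece."""
    return round(
        words / 1000 * UNIT_COST["words_1k"]
        + images * UNIT_COST["image"]
        + tts_minutes * UNIT_COST["tts_minute"]
        + video_seconds * UNIT_COST["video_second"],
        2,
    )


# A 1500-word blog post with 3 images
print(estimate_cost(words=1500, images=3))  # → 0.27
```

An estimator like this is useful as a budget gate: reject or queue jobs whose projected cost exceeds a per-piece threshold.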
Quality Control and Human-in-the-Loop
AI-generated content should never be published without review. Here are the critical quality gates:
- Factual accuracy: LLMs hallucinate. Every factual claim in generated content must be verified, especially statistics, quotes, and technical details.
- Brand voice consistency: Include detailed brand guidelines in the LLM system prompt. Review output against brand voice checklist before publishing.
- Image appropriateness: AI image models can produce unexpected results. Review every generated image for brand alignment, accuracy, and sensitivity.
- Legal compliance: Ensure generated content does not infringe trademarks, copyrights, or make unsubstantiated claims about competitors.
- SEO validation: Check generated titles, meta descriptions, and content structure against SEO requirements before publishing.
- Accessibility: Add alt text to images, ensure readable formatting, and verify that content works with screen readers.
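Several of these gates can be automated before content ever reaches a human reviewer. A minimal sketch (the `seo_gate` function and its post schema are our own illustration, not a standard API):

```python
def seo_gate(post: dict) -> list:
    """Return a list of quality-gate failures for a generated post;
    an empty list means the post can move on to human review."""
    issues = []
    if len(post.get("title", "")) > 60:
        issues.append("title exceeds 60 characters")
    if len(post.get("meta_description", "")) > 160:
        issues.append("meta description exceeds 160 characters")
    if not post.get("tags"):
        issues.append("no tags set")
    # Accessibility check: every image needs alt text
    for img in post.get("images", []):
        if not img.get("alt"):
            issues.append(f"image {img.get('url', '?')} missing alt text")
    return issues


post = {"title": "Short title", "meta_description": "x" * 200, "tags": []}
print(seo_gate(post))
# → ['meta description exceeds 160 characters', 'no tags set']
```

Automated gates catch the mechanical failures cheaply; factual accuracy, brand voice, and legal review still require a human.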
Use Cases by Industry
| Industry | Content Types | Pipeline | Volume |
|---|---|---|---|
| Marketing | Blog posts, social media, email campaigns, ad copy | LLM + DALL-E + scheduling API | 50–200 pieces/week |
| Education | Course materials, explainer videos, quizzes, flashcards | LLM + image gen + TTS + video assembly | 10–50 lessons/week |
| E-commerce | Product descriptions, comparison guides, review summaries | LLM + product image enhancement | 100–1000 listings/week |
| Media | News summaries, podcast episodes, video shorts | LLM + TTS + video gen | 20–100 pieces/day |
| Presentations | Slide decks, speaker notes, handouts | LLM + image gen + PDF assembly | 5–20 decks/week |
What's Next
In the next lesson, we explore Vision + LLM Apps — combining computer vision models with large language models to build applications that can see and reason about images and video, from visual Q&A to automated quality inspection.
Lilly Tech Systems