Content Creation Pipeline
Build automated content production systems that chain LLMs for writing, image generation models for visuals, and text-to-speech for voiceovers — producing blog posts, social media content, podcasts, and videos at scale.
The Multi-Model Content Factory
Content creation is one of the most natural applications of multi-model AI. A single piece of content — a blog post, a social media campaign, or an educational video — requires multiple creative capabilities: writing (LLM), visual design (image generation), narration (TTS), and sometimes video assembly. By chaining these models together, you can build a content factory that produces polished, multi-format content from a single idea.
Marketing teams that previously needed a writer, a designer, a voice actor, and a video editor can now prototype content in minutes with a well-designed pipeline. The human role shifts from creation to curation — reviewing, refining, and approving AI-generated content rather than producing it from scratch.
Models in the Content Pipeline
| Stage | Model Options | Best For | Cost Per Unit |
|---|---|---|---|
| Writing | Claude 4, GPT-4o, Gemini 2.5, LLaMA 3 | Blog posts, scripts, social copy, email sequences | $0.01–$0.15 per 1K words |
| Image Generation | DALL-E 3, Stable Diffusion XL, Midjourney, Flux | Blog headers, social images, product shots, illustrations | $0.02–$0.08 per image |
| Voice / TTS | ElevenLabs, OpenAI TTS, Bark, Google TTS | Podcast narration, video voiceovers, audio articles | $0.01–$0.30 per minute |
| Video Generation | Sora, Runway Gen-3, Pika, Kling | Short clips, B-roll, social video content | $0.10–$1.00 per second |
| Music / Audio | Suno, Udio, MusicGen | Background music, jingles, intro/outro | $0.05–$0.50 per track |
Blog Post Generator with AI Images
This pipeline generates a complete blog post with an SEO-optimized article, a header image, and in-content illustrations — all from a single topic prompt:
```python
import json
import re
from pathlib import Path

import anthropic
import openai


class BlogPostGenerator:
    """Generate complete blog posts with AI-written text
    and AI-generated images."""

    def __init__(self):
        self.llm = anthropic.Anthropic()
        self.image_client = openai.OpenAI()
        self.output_dir = Path("output/blog")
        self.output_dir.mkdir(parents=True, exist_ok=True)

    def generate(self, topic: str, style: str = "professional",
                 word_count: int = 1500) -> dict:
        """Generate a complete blog post from a topic."""
        # Step 1: Generate article outline and content
        article = self._write_article(topic, style, word_count)

        # Step 2: Generate image prompts from the article
        image_prompts = self._create_image_prompts(
            topic, article["sections"]
        )

        # Step 3: Generate images
        images = self._generate_images(image_prompts)

        # Step 4: Assemble the complete post
        post = self._assemble_post(article, images)

        # Save output
        output_path = self.output_dir / f"{self._slugify(topic)}.json"
        with open(output_path, "w") as f:
            json.dump(post, f, indent=2)

        return post

    def _write_article(self, topic: str, style: str,
                       word_count: int) -> dict:
        """Use Claude to write the full article."""
        response = self.llm.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=4096,
            messages=[{"role": "user", "content": f"""Write a {word_count}-word blog post about: {topic}

Style: {style}

Format: Return valid JSON with this structure:
{{
  "title": "SEO-optimized title (under 60 chars)",
  "meta_description": "SEO meta description (under 160 chars)",
  "slug": "url-friendly-slug",
  "sections": [
    {{
      "heading": "Section heading",
      "content": "Full section content in HTML paragraphs",
      "needs_image": true/false
    }}
  ],
  "tags": ["tag1", "tag2"],
  "excerpt": "2-sentence excerpt for social sharing"
}}

Write engaging, informative content. Use short paragraphs.
Include practical examples. Return ONLY valid JSON."""}]
        )
        text = response.content[0].text
        # Extract the JSON object from the response
        start = text.find("{")
        end = text.rfind("}") + 1
        return json.loads(text[start:end])

    def _create_image_prompts(self, topic: str,
                              sections: list) -> list:
        """Generate DALL-E prompts for sections that need images."""
        prompts = []

        # Header image
        prompts.append({
            "type": "header",
            "prompt": f"Professional blog header image for an "
                      f"article about {topic}. Modern, clean "
                      f"design with subtle tech elements. "
                      f"16:9 aspect ratio. No text overlays.",
            "size": "1792x1024"
        })

        # Section images
        for i, section in enumerate(sections):
            if section.get("needs_image", False):
                prompts.append({
                    "type": "section",
                    "section_index": i,
                    "prompt": f"Illustration for a blog section "
                              f"about: {section['heading']}. "
                              f"Clean, professional style. "
                              f"Informative visual. No text.",
                    "size": "1024x1024"
                })
        return prompts

    def _generate_images(self, prompts: list) -> list:
        """Generate images using DALL-E 3."""
        images = []
        for prompt_data in prompts:
            try:
                response = self.image_client.images.generate(
                    model="dall-e-3",
                    prompt=prompt_data["prompt"],
                    size=prompt_data["size"],
                    quality="standard",
                    n=1
                )
                images.append({
                    **prompt_data,
                    "url": response.data[0].url,
                    "revised_prompt": response.data[0].revised_prompt,
                    "status": "success"
                })
            except Exception as e:
                images.append({
                    **prompt_data,
                    "status": "failed",
                    "error": str(e)
                })
        return images

    def _assemble_post(self, article: dict, images: list) -> dict:
        """Combine article text and images into final output."""
        header_image = next(
            (img for img in images if img["type"] == "header"),
            None
        )
        section_images = {
            img["section_index"]: img
            for img in images
            if img["type"] == "section" and img["status"] == "success"
        }

        html_parts = []
        for i, section in enumerate(article["sections"]):
            html_parts.append(f"<h2>{section['heading']}</h2>")
            html_parts.append(section["content"])
            if i in section_images:
                img = section_images[i]
                html_parts.append(
                    f'<img src="{img["url"]}" '
                    f'alt="{section["heading"]}">'
                )

        return {
            "title": article["title"],
            "slug": article.get("slug", ""),
            "meta_description": article["meta_description"],
            "excerpt": article["excerpt"],
            "tags": article["tags"],
            "header_image": header_image,
            "html_content": "\n".join(html_parts),
            "word_count": len("\n".join(
                s["content"] for s in article["sections"]
            ).split()),
            "image_count": len([
                i for i in images if i["status"] == "success"
            ])
        }

    def _slugify(self, text: str) -> str:
        return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


# Generate a blog post
generator = BlogPostGenerator()
post = generator.generate(
    topic="How RAG is Replacing Traditional Search in 2026",
    style="technical but accessible",
    word_count=1800
)
print(f"Generated: {post['title']}")
print(f"Words: {post['word_count']}, Images: {post['image_count']}")
```
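One fragile spot in pipelines like this is the `text.find("{")` JSON extraction, which breaks when the model wraps its answer in a markdown code fence or surrounding prose. A slightly more defensive parser (a sketch of our own; `extract_json` is not a library function):

```python
import json
import re


def extract_json(text: str) -> dict:
    """Pull the first JSON object out of an LLM response,
    tolerating markdown code fences and surrounding prose."""
    # Strip a ```json ... ``` fence if the model added one
    fenced = re.search(r"```(?:json)?\s*(\{.*\})\s*```", text, re.DOTALL)
    if fenced:
        text = fenced.group(1)
    start = text.find("{")
    end = text.rfind("}") + 1
    if start == -1 or end == 0:
        raise ValueError("No JSON object found in model output")
    return json.loads(text[start:end])


# Handles clean JSON, fenced JSON, and JSON surrounded by prose
print(extract_json('Here you go:\n```json\n{"title": "RAG in 2026"}\n```'))
# → {'title': 'RAG in 2026'}
```

Raising a `ValueError` instead of letting `json.loads` fail on an empty slice makes retry logic upstream much easier to write.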
Podcast Script and Audio Generation
This pipeline generates a podcast episode from a topic — writing the script with an LLM and converting it to natural-sounding audio with ElevenLabs, including distinct voices for host and guest:
```python
import io
import json
from pathlib import Path

import anthropic
import requests
from pydub import AudioSegment


class PodcastGenerator:
    """Generate podcast episodes with LLM script + TTS audio."""

    def __init__(self, config: dict):
        self.llm = anthropic.Anthropic()
        self.elevenlabs_key = config["elevenlabs_api_key"]
        self.voices = {
            "host": config.get("host_voice_id",
                               "21m00Tcm4TlvDq8ikWAM"),
            "guest": config.get("guest_voice_id",
                                "AZnzlk1XvdvUeBnXmlld")
        }

    def generate_episode(self, topic: str,
                         duration_minutes: int = 10) -> dict:
        """Generate a full podcast episode."""
        # Step 1: Write the script
        script = self._write_script(topic, duration_minutes)

        # Step 2: Generate audio for each segment
        audio_segments = []
        for segment in script["segments"]:
            audio = self._generate_audio(
                segment["text"],
                self.voices[segment["speaker"]]
            )
            audio_segments.append({
                "speaker": segment["speaker"],
                "audio": audio,
                "duration_ms": len(audio)
            })

        # Step 3: Assemble the full episode
        full_episode = self._assemble_episode(audio_segments)

        # Step 4: Export
        output_dir = Path("output/podcasts")
        output_dir.mkdir(parents=True, exist_ok=True)
        output_path = output_dir / f"{script['slug']}.mp3"
        full_episode.export(output_path, format="mp3", bitrate="192k")

        return {
            "title": script["title"],
            "description": script["description"],
            "duration_seconds": len(full_episode) / 1000,
            "segment_count": len(script["segments"]),
            "output_path": str(output_path)
        }

    def _write_script(self, topic: str,
                      duration_minutes: int) -> dict:
        """Generate podcast script with host/guest dialogue."""
        words = duration_minutes * 150  # ~150 words per minute
        response = self.llm.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=8000,
            messages=[{"role": "user", "content": f"""Write a {duration_minutes}-minute podcast script (~{words} words) about:
{topic}

Format: Two speakers - "host" and "guest" (an expert).
Return valid JSON:
{{
  "title": "Episode title",
  "slug": "url-slug",
  "description": "Episode description for show notes",
  "segments": [
    {{"speaker": "host", "text": "What they say"}},
    {{"speaker": "guest", "text": "What they say"}}
  ]
}}

Make it conversational and engaging. The host asks questions and
provides transitions. The guest gives detailed, insightful
answers. Include an intro and outro. Return ONLY valid JSON."""}]
        )
        text = response.content[0].text
        start = text.find("{")
        end = text.rfind("}") + 1
        return json.loads(text[start:end])

    def _generate_audio(self, text: str,
                        voice_id: str) -> AudioSegment:
        """Convert text to speech using ElevenLabs."""
        url = (f"https://api.elevenlabs.io/v1/text-to-speech/"
               f"{voice_id}")
        response = requests.post(
            url,
            headers={
                "xi-api-key": self.elevenlabs_key,
                "Content-Type": "application/json"
            },
            json={
                "text": text,
                "model_id": "eleven_turbo_v2_5",
                "voice_settings": {
                    "stability": 0.6,
                    "similarity_boost": 0.8
                }
            }
        )
        response.raise_for_status()  # surface TTS errors early
        return AudioSegment.from_mp3(io.BytesIO(response.content))

    def _assemble_episode(self,
                          segments: list) -> AudioSegment:
        """Combine audio segments with natural pauses."""
        episode = AudioSegment.silent(duration=500)  # Opening pause
        for seg in segments:
            episode += seg["audio"]
            # Add a pause between segments
            pause = 400 if seg["speaker"] == "host" else 300
            episode += AudioSegment.silent(duration=pause)
        episode += AudioSegment.silent(duration=1000)  # End pause
        return episode


# Generate a podcast episode
podcast = PodcastGenerator({
    "elevenlabs_api_key": "your-key",
    "host_voice_id": "21m00Tcm4TlvDq8ikWAM",
    "guest_voice_id": "AZnzlk1XvdvUeBnXmlld"
})
result = podcast.generate_episode(
    topic="The Future of Multi-Model AI Applications",
    duration_minutes=8
)
print(f"Episode: {result['title']}")
print(f"Duration: {result['duration_seconds']:.0f}s")
```
Social Media Content Pipeline
Generate a complete social media campaign — post text, hashtags, and matched images — from a single content brief:
```python
import json

import anthropic
import openai


class SocialMediaPipeline:
    """Generate multi-platform social media content."""

    def __init__(self):
        self.llm = anthropic.Anthropic()
        self.image_client = openai.OpenAI()

    def generate_campaign(self, brief: str,
                          platforms: list = None) -> dict:
        """Generate content for multiple platforms."""
        if platforms is None:
            platforms = ["linkedin", "twitter", "instagram"]

        # Step 1: Generate platform-specific copy
        copy = self._generate_copy(brief, platforms)

        # Step 2: Generate images sized for each platform
        images = self._generate_platform_images(brief, platforms)

        # Step 3: Combine
        campaign = {}
        for platform in platforms:
            campaign[platform] = {
                "text": copy.get(platform, ""),
                "image": images.get(platform, {}),
                "hashtags": copy.get(f"{platform}_hashtags", [])
            }
        return campaign

    def _generate_copy(self, brief: str,
                       platforms: list) -> dict:
        """Generate platform-optimized copy."""
        response = self.llm.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=2000,
            messages=[{"role": "user", "content": f"""Create social media posts for these platforms: {', '.join(platforms)}

Content brief: {brief}

Return JSON with keys for each platform and hashtags:
{{
  "linkedin": "Professional post (150-300 words, storytelling)",
  "linkedin_hashtags": ["hashtag1", "hashtag2"],
  "twitter": "Concise post (under 280 chars)",
  "twitter_hashtags": ["hashtag1"],
  "instagram": "Engaging caption with emojis (100-200 words)",
  "instagram_hashtags": ["up", "to", "30", "hashtags"]
}}

Match each platform's tone and best practices.
Return ONLY valid JSON."""}]
        )
        text = response.content[0].text
        start = text.find("{")
        end = text.rfind("}") + 1
        return json.loads(text[start:end])

    def _generate_platform_images(self, brief: str,
                                  platforms: list) -> dict:
        """Generate correctly-sized images per platform."""
        sizes = {
            "linkedin": ("1200x627", "Professional LinkedIn post "
                         "image"),
            "twitter": ("1200x675", "Twitter/X post image"),
            "instagram": ("1080x1080", "Instagram square post image")
        }
        images = {}
        for platform in platforms:
            if platform not in sizes:
                continue
            size_str, desc = sizes[platform]
            # DALL-E 3 only supports fixed sizes, so pick the
            # closest one; landscape targets get 1792x1024
            dalle_size = "1792x1024" if "1200" in size_str else "1024x1024"
            try:
                response = self.image_client.images.generate(
                    model="dall-e-3",
                    prompt=f"{desc} for: {brief}. "
                           f"Clean, modern, eye-catching. "
                           f"No text in the image.",
                    size=dalle_size,
                    quality="standard",
                    n=1
                )
                images[platform] = {
                    "url": response.data[0].url,
                    "size": size_str
                }
            except Exception as e:
                images[platform] = {"error": str(e)}
        return images


# Generate a campaign
pipeline = SocialMediaPipeline()
campaign = pipeline.generate_campaign(
    brief="Launch announcement for our new AI-powered document "
          "processing platform that reduces manual data entry by 90%",
    platforms=["linkedin", "twitter", "instagram"]
)
for platform, content in campaign.items():
    print(f"\n--- {platform.upper()} ---")
    print(content["text"][:200] + "...")
```
Video Creation Workflow
The most complex content pipeline generates a short video from a script, scene images, and a voiceover, then assembles them with MoviePy (which drives FFmpeg for encoding):
```python
from moviepy.editor import (
    ImageClip, AudioFileClip, CompositeVideoClip,
    concatenate_videoclips, TextClip
)


class VideoCreator:
    """Assemble AI-generated assets into a video."""

    def create_video(self, script: dict, images: list,
                     audio_path: str, output_path: str):
        """Assemble video from script, images, and voiceover.

        Args:
            script: {"scenes": [{"text": "...", "duration": 5}]}
            images: List of image file paths (one per scene)
            audio_path: Path to voiceover audio file
            output_path: Path for output video
        """
        audio = AudioFileClip(audio_path)
        scenes = script["scenes"]

        clips = []
        for i, scene in enumerate(scenes):
            duration = scene["duration"]
            img_path = images[i] if i < len(images) else images[-1]

            # Create a full-height, centered image clip for the scene
            img_clip = (
                ImageClip(img_path)
                .set_duration(duration)
                .resize(height=1080)
                .set_position("center")
            )

            # Add subtitle overlay
            if scene.get("text"):
                txt_clip = (
                    TextClip(
                        scene["text"],
                        fontsize=36,
                        color="white",
                        font="Arial-Bold",
                        stroke_color="black",
                        stroke_width=2,
                        size=(1800, None),
                        method="caption"
                    )
                    .set_duration(duration)
                    .set_position(("center", 900))
                )
                clip = CompositeVideoClip(
                    [img_clip, txt_clip],
                    size=(1920, 1080)
                )
            else:
                clip = img_clip
            clips.append(clip)

        # Concatenate all scene clips
        video = concatenate_videoclips(clips, method="compose")

        # Add voiceover audio, trimmed to the video length
        video = video.set_audio(audio.subclip(0, video.duration))

        # Export
        video.write_videofile(
            output_path,
            fps=24,
            codec="libx264",
            audio_codec="aac",
            bitrate="5000k"
        )
        return {
            "output": output_path,
            "duration": video.duration,
            "scenes": len(clips),
            "resolution": "1920x1080"
        }
```
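One practical gotcha: the voiceover is trimmed to the video's duration at assembly time, so if scene durations do not add up to the audio length, narration gets cut off. A small helper can distribute the measured audio length across scenes, weighted by word count (a sketch of our own; `fit_durations` is not a MoviePy function):

```python
def fit_durations(scenes: list, audio_seconds: float) -> list:
    """Scale scene durations so they approximately sum to the
    voiceover length, weighting each scene by its word count."""
    weights = [max(len(s["text"].split()), 1) for s in scenes]
    total = sum(weights)
    return [
        {**s, "duration": round(audio_seconds * w / total, 2)}
        for s, w in zip(scenes, weights)
    ]


scenes = [
    {"text": "Welcome to the demo"},
    {"text": "Here is a much longer scene with more narration"},
]
fitted = fit_durations(scenes, audio_seconds=13.0)
print([s["duration"] for s in fitted])  # → [4.0, 9.0]
```

Run this on the script before calling `create_video`, passing the length of the generated voiceover file as `audio_seconds`.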
Prompt Chaining: LLM Output as Image Input
The key technique in content pipelines is prompt chaining: using the output of one model as input for the next. The LLM writes the article, then generates optimized image prompts based on the content it just wrote, which keeps the visuals consistent with the written content.
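Stripped of the API calls, the chaining pattern looks like this (a sketch with the model calls stubbed as plain functions; in production each step would be an LLM or image-model request):

```python
def chain(steps, seed):
    """Run named pipeline steps in order, feeding each step's
    output into the next and keeping every intermediate result."""
    results = {"input": seed}
    value = seed
    for name, step in steps:
        value = step(value)
        results[name] = value  # keep intermediates for inspection
    return results


# The "models" are stubbed as lambdas so the sketch runs without
# API keys; each step receives the previous step's output.
pipeline = [
    ("article", lambda topic: f"Article about {topic}"),
    ("image_prompt", lambda art: f"Flat illustration for: {art}. No text."),
]
out = chain(pipeline, "vector databases")
print(out["image_prompt"])
# → Flat illustration for: Article about vector databases. No text.
```

Keeping every intermediate result in `results` makes the chain debuggable: when an image comes out wrong, you can see exactly which prompt produced it.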
Prompt chaining best practices:
- Be explicit about output format: Each model in the chain needs structured output (JSON) that the next model can parse reliably.
- Include style tokens: When the LLM generates image prompts, include consistent style descriptors (e.g., "flat illustration, blue and purple color scheme") to maintain visual coherence across all images.
- Validate between steps: Check each model's output before passing to the next. A malformed image prompt wastes an API call.
- Cache intermediate results: If image generation fails, you do not want to re-run the entire LLM step. Save outputs at each stage.
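The last two practices can be combined in a small checkpoint helper (a sketch; `checkpoint` and the `output/cache` directory are conventions of our own, not a library API):

```python
import json
from pathlib import Path


def checkpoint(stage: str, producer, cache_dir=Path("output/cache")):
    """Run one pipeline stage, or load its cached result from disk
    so a failure downstream never forces an upstream re-run."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    path = cache_dir / f"{stage}.json"
    if path.exists():
        return json.loads(path.read_text())
    result = producer()  # the expensive model call goes here
    path.write_text(json.dumps(result))
    return result


# First call runs the producer; any later call with the same stage
# name loads the saved JSON instead of paying for the model again.
article = checkpoint("article_draft", lambda: {"title": "Demo post"})
```

If the image step then fails, re-running the whole pipeline picks the article up from `output/cache/article_draft.json` for free.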
Cost Analysis per Content Piece
| Content Type | LLM Cost | Image Cost | Audio Cost | Total Cost | Time |
|---|---|---|---|---|---|
| Blog post (1500 words + 3 images) | $0.05 | $0.12 | — | $0.17 | ~2 min |
| Social media set (3 platforms) | $0.03 | $0.12 | — | $0.15 | ~90 sec |
| Podcast episode (10 min) | $0.08 | — | $0.45 | $0.53 | ~3 min |
| Short video (60 sec) | $0.10 | $0.32 | $0.10 | $0.52 | ~5 min |
| Email newsletter | $0.03 | $0.04 | — | $0.07 | ~45 sec |
| Full campaign (blog + social + email) | $0.11 | $0.28 | — | $0.39 | ~4 min |
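Totals like these can be estimated programmatically before a job runs. A back-of-envelope estimator using midpoint figures from the per-unit ranges earlier in this lesson (the constants are illustrative assumptions, not published prices):

```python
# Midpoints of the per-unit cost ranges (illustrative assumptions)
UNIT_COST = {
    "words_1k": 0.08,     # LLM writing, per 1K words
    "image": 0.05,        # per generated image
    "tts_minute": 0.15,   # per minute of narration
    "video_second": 0.55  # per second of generated video
}


def estimate_cost(words=0, images=0, tts_minutes=0, video_seconds=0):
    """Rough pre-flight cost estimate for one content piece."""
    return round(
        words / 1000 * UNIT_COST["words_1k"]
        + images * UNIT_COST["image"]
        + tts_minutes * UNIT_COST["tts_minute"]
        + video_seconds * UNIT_COST["video_second"],
        2,
    )


# A 1500-word blog post with 3 images
print(estimate_cost(words=1500, images=3))  # → 0.27
```

An estimator like this is useful as a budget gate: reject or queue jobs whose projected cost exceeds a per-piece threshold.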
Quality Control and Human-in-the-Loop
AI-generated content should never be published without review. Here are the critical quality gates:
- Factual accuracy: LLMs hallucinate. Every factual claim in generated content must be verified, especially statistics, quotes, and technical details.
- Brand voice consistency: Include detailed brand guidelines in the LLM system prompt. Review output against brand voice checklist before publishing.
- Image appropriateness: AI image models can produce unexpected results. Review every generated image for brand alignment, accuracy, and sensitivity.
- Legal compliance: Ensure generated content does not infringe trademarks, copyrights, or make unsubstantiated claims about competitors.
- SEO validation: Check generated titles, meta descriptions, and content structure against SEO requirements before publishing.
- Accessibility: Add alt text to images, ensure readable formatting, and verify that content works with screen readers.
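Several of these gates can be automated before content ever reaches a human reviewer. A minimal sketch (the `seo_gate` function and its post schema are our own illustration, not a standard API):

```python
def seo_gate(post: dict) -> list:
    """Return a list of quality-gate failures for a generated post;
    an empty list means the post can move on to human review."""
    issues = []
    if len(post.get("title", "")) > 60:
        issues.append("title exceeds 60 characters")
    if len(post.get("meta_description", "")) > 160:
        issues.append("meta description exceeds 160 characters")
    if not post.get("tags"):
        issues.append("no tags set")
    # Accessibility check: every image needs alt text
    for img in post.get("images", []):
        if not img.get("alt"):
            issues.append(f"image {img.get('url', '?')} missing alt text")
    return issues


post = {"title": "Short title", "meta_description": "x" * 200, "tags": []}
print(seo_gate(post))
# → ['meta description exceeds 160 characters', 'no tags set']
```

Automated gates catch the mechanical failures cheaply; factual accuracy, brand voice, and legal review still require a human.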
Use Cases by Industry
| Industry | Content Types | Pipeline | Volume |
|---|---|---|---|
| Marketing | Blog posts, social media, email campaigns, ad copy | LLM + DALL-E + scheduling API | 50–200 pieces/week |
| Education | Course materials, explainer videos, quizzes, flashcards | LLM + image gen + TTS + video assembly | 10–50 lessons/week |
| E-commerce | Product descriptions, comparison guides, review summaries | LLM + product image enhancement | 100–1000 listings/week |
| Media | News summaries, podcast episodes, video shorts | LLM + TTS + video gen | 20–100 pieces/day |
| Presentations | Slide decks, speaker notes, handouts | LLM + image gen + PDF assembly | 5–20 decks/week |
What's Next
In the next lesson, we explore Vision + LLM Apps — combining computer vision models with large language models to build applications that can see and reason about images and video, from visual Q&A to automated quality inspection.
Lilly Tech Systems