Requirements Analysis for AI Systems
Before writing any code or choosing a framework, you need to define what your AI system must do, how fast it must respond, at what scale, and at what cost. This lesson gives you a concrete framework for specifying AI system requirements, grounded in real numbers.
Functional vs. Non-Functional Requirements
AI systems have the same requirement categories as traditional systems, but with AI-specific dimensions that most teams miss.
Functional Requirements (What the System Does)
Input/Output Contract
Define exactly what goes in and what comes out. For an LLM chatbot: input is a conversation history (max 8K tokens), output is a response (max 2K tokens). For a fraud detector: input is a transaction object (amount, merchant, user features), output is a risk score 0–1 and a binary decision.
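A contract like this is easiest to enforce when it lives in code. Here is a minimal sketch of the fraud-detector contract described above, using dataclasses; the field names (`amount`, `merchant_id`, `user_features`) are illustrative, not a prescribed schema:

```python
from dataclasses import dataclass

@dataclass
class Transaction:
    amount: float
    merchant_id: str
    user_features: dict[str, float]  # e.g. account age, velocity features

@dataclass
class FraudResult:
    risk_score: float  # contract: must be in [0.0, 1.0]
    decision: bool     # binary decision, e.g. True = block

def validate(result: FraudResult) -> FraudResult:
    """Reject any response that violates the output contract."""
    if not 0.0 <= result.risk_score <= 1.0:
        raise ValueError(f"risk_score {result.risk_score} outside [0, 1]")
    return result
```

Validating at the boundary means a misbehaving model version fails loudly instead of silently shipping out-of-contract scores.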
Quality Requirements
What accuracy is acceptable? A spam filter at 99.5% accuracy is fine. A medical diagnosis system at 99.5% accuracy might not be. Define your metrics: accuracy, precision, recall, F1, NDCG, BLEU, or business-specific metrics like click-through rate.
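Whatever metric you pick, compute it from the confusion matrix so the target is unambiguous. A quick sketch, using made-up counts for a hypothetical spam-filter evaluation:

```python
# Hypothetical evaluation counts: true/false positives and negatives
tp, fp, fn, tn = 980, 5, 15, 9_000

precision = tp / (tp + fp)  # of messages flagged as spam, how many were spam
recall = tp / (tp + fn)     # of actual spam, how much was caught
accuracy = (tp + tn) / (tp + fp + fn + tn)
f1 = 2 * precision * recall / (precision + recall)
```

Note how accuracy here is dominated by the 9,000 true negatives; for imbalanced problems like spam or fraud, set targets on precision and recall, not accuracy alone.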
Model Update Frequency
How often does the model need to reflect new data? Real-time learning (fraud detection), daily retraining (recommendations), weekly (content moderation), or quarterly (document classification)?
Explainability Requirements
Does the system need to explain its decisions? Regulated industries (finance, healthcare) often require it. This constrains your model choices — a gradient-boosted tree is more explainable than a deep neural network.
Non-Functional Requirements (How the System Performs)
Non-functional requirements are where AI systems diverge most from traditional systems. Get these wrong and you will build something that works in a notebook but fails in production.
Latency Budgets
AI inference is slow compared to traditional API calls. You need to budget latency carefully across the request path.
| Use Case | p50 Target | p95 Target | p99 Target | Why |
|---|---|---|---|---|
| Search autocomplete | 10ms | 30ms | 50ms | User is typing — must feel instant |
| Product recommendations | 30ms | 80ms | 150ms | Below page load budget, not blocking |
| Fraud detection | 20ms | 50ms | 100ms | Must decide before transaction completes |
| Chatbot response | 500ms | 2s | 5s | Users expect some thinking time |
| Image generation | 5s | 15s | 30s | Users expect to wait, but not forever |
| Document processing | 10s | 30s | 60s | Async — user submits and checks back |
Latency Budget Breakdown Example
For a product recommendation API with a 150ms p99 budget:
Total budget: 150ms (p99)
Breakdown:
Network (client to LB): 5ms
Load balancer: 2ms
API gateway (auth, rate limit): 10ms
Feature store lookup (Redis): 10ms # p99 for Redis GET
Model inference (GPU): 80ms # The expensive part
Post-processing (business rules): 10ms
Serialization + response: 5ms
Network (LB to client): 5ms
Buffer for variance: 23ms
------
Total: 150ms
Key constraint: Model inference gets 80ms.
This means:
- Model must be small enough to infer in 80ms on target GPU
- Or use model distillation / quantization to meet budget
- Or use batch precomputation and serve from cache
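A budget like this should be checked mechanically, so a change to one component forces a conversation about where the milliseconds come from. A minimal sketch of that check, using the component values above:

```python
# p99 latency budget components from the breakdown above, in milliseconds
budget_ms = {
    "network_client_to_lb": 5,
    "load_balancer": 2,
    "api_gateway": 10,
    "feature_store_lookup": 10,
    "model_inference": 80,
    "post_processing": 10,
    "serialization_response": 5,
    "network_lb_to_client": 5,
    "variance_buffer": 23,
}

total_ms = sum(budget_ms.values())
assert total_ms == 150, f"components sum to {total_ms}ms, budget is 150ms"
```

Keeping this assertion in a test suite means nobody can quietly grow the gateway or post-processing step without shrinking another component or the variance buffer.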
Throughput Estimation
Calculate your queries per second (QPS) and plan capacity accordingly. Here is a concrete example.
# Throughput estimation for an e-commerce recommendation system
# Step 1: Daily active users
daily_active_users = 2_000_000
# Step 2: Requests per user per day
# Homepage load (1) + category pages (3) + product pages (5) + cart (1)
requests_per_user_per_day = 10
# Step 3: Total daily requests
daily_requests = daily_active_users * requests_per_user_per_day # 20M
# Step 4: Average QPS
avg_qps = daily_requests / 86_400 # ~231 QPS
# Step 5: Peak QPS (typically 3-5x average)
peak_qps = avg_qps * 4 # ~925 QPS
# Step 6: Design QPS (add 50% headroom for growth + spikes)
design_qps = peak_qps * 1.5 # ~1,388 QPS → round to 1,500 QPS
# Step 7: GPU capacity planning
inference_time_per_request = 0.025 # 25ms on A10G GPU
requests_per_gpu_per_second = 1 / inference_time_per_request # 40 QPS per GPU
# With batching (batch size 8, 60ms per batch):
batched_throughput = 8 / 0.060 # ~133 QPS per GPU
# GPUs needed for design QPS:
gpus_needed = design_qps / batched_throughput # ~11.3 → 12 GPUs
# With redundancy (N+2 for fault tolerance):
total_gpus = 12 + 2 # 14 GPUs
# Monthly cost at $0.75/hr per A10G (AWS spot):
monthly_cost = total_gpus * 0.75 * 24 * 30 # $7,560/month
Data Volume Estimation
AI systems generate and consume far more data than most teams expect. Plan storage and processing capacity for these categories:
| Data Category | Example Size | Growth Rate | Retention |
|---|---|---|---|
| Training data | 500GB of labeled examples | +50GB/month (new labels) | Indefinite (versioned) |
| Feature data | 200GB in feature store | +10GB/month | Rolling 90 days online, 2 years offline |
| Inference logs | 2TB/month at 1,500 QPS | Grows with traffic | 30 days hot, 1 year cold |
| Model artifacts | 2GB per model version | 1–4 versions/month | Last 10 versions |
| Experiment data | 50GB (metrics, hyperparams, runs) | +5GB/month | Indefinite |
| Embeddings | 10M items × 768 dims × 4 bytes = 30GB | Grows with catalog | Current + 1 previous version |
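The embeddings row is worth reproducing as arithmetic, since it generalizes to any catalog size or embedding width. A quick sketch of that calculation:

```python
# Embedding storage: items x dimensions x bytes per value (float32)
items = 10_000_000
dims = 768
bytes_per_value = 4  # float32

embedding_bytes = items * dims * bytes_per_value
embedding_gb = embedding_bytes / 1e9  # ~30.7 GB, matching the table

# Keeping current + 1 previous version doubles the footprint
total_gb = embedding_gb * 2
```

Halving storage is often as simple as quantizing to float16 (2 bytes) or int8 (1 byte), at some cost in retrieval quality.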
Cost Constraints
AI infrastructure costs are the number one surprise for teams moving from prototype to production. Establish budgets early.
GPU Compute
Training: a single training run on 8x A100 GPUs can cost $500–$5,000; retraining monthly at that rate works out to $6K–$60K/year. Inference: $0.50–$3.00/hr per GPU. At 14 GPUs, that is roughly $5K–$30K/month.
API Costs
If using third-party APIs (OpenAI, Anthropic): GPT-4o at $2.50/1M input tokens. At 1,500 QPS with 500 input tokens per request, that is ~64.8B input tokens/day, or roughly $162,000/day in input costs alone. This is why many high-volume teams self-host.
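The token-cost arithmetic is worth writing down explicitly, since it is the core of every build-vs-buy spreadsheet. A sketch using the rates and traffic figures assumed in this section:

```python
# Third-party API cost estimate (rates and traffic assumed from this section)
qps = 1_500
input_tokens_per_request = 500
price_per_million_tokens = 2.50  # GPT-4o input-token rate used above

tokens_per_day = qps * input_tokens_per_request * 86_400  # 64.8B tokens
cost_per_day = tokens_per_day / 1_000_000 * price_per_million_tokens
cost_per_request = cost_per_day / (qps * 86_400)
```

Note this counts input tokens only; output tokens are typically billed at a higher rate, so the real bill is larger still.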
Data Processing
Feature engineering, ETL pipelines, data validation. Typically 20–30% of total AI infrastructure cost. Spark/Dataflow clusters for batch processing: $2K–$10K/month.
Storage
Training data in S3/GCS: $0.023/GB/month. Feature store (Redis): $0.10–$0.50/GB/month. Inference logs at 2TB/month: $50–$200/month depending on tier.
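Applying the per-GB rates above to the volumes from the data-volume table gives a quick sanity check on the storage line items (a sketch, assuming the S3 standard-tier rate quoted above):

```python
s3_rate = 0.023            # $/GB/month, S3 standard tier
training_data_gb = 500
inference_logs_gb = 2_000  # 2TB/month of inference logs

training_storage_cost = training_data_gb * s3_rate  # ~$11.50/month
log_storage_cost = inference_logs_gb * s3_rate      # ~$46/month
```

The lesson: raw storage is cheap; the real costs are GPU compute and data processing, which is why the 20–30% figure above is about pipelines, not bytes at rest.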
AI System Requirements Document Template
Use this template at the start of every AI project. Fill it out before writing any code.
# AI System Requirements Document
# Project: [Name]
# Date: [Date]
# Author: [Name]
## 1. Problem Statement
- Business problem being solved:
- Current solution (if any):
- Why AI is needed (what rules/heuristics cannot do):
- Success criteria (business metric):
## 2. Functional Requirements
- Input format and constraints:
- Output format and constraints:
- Quality metric and target (e.g., precision > 0.95):
- Explainability requirements (yes/no, what level):
- Model update frequency:
- Supported languages/regions:
## 3. Latency Requirements
- p50 target: ___ms
- p95 target: ___ms
- p99 target: ___ms
- Is streaming response acceptable? (yes/no)
- Is async processing acceptable? (yes/no)
## 4. Throughput Requirements
- Daily active users: ___
- Requests per user per day: ___
- Average QPS: ___
- Peak QPS: ___
- Design QPS (with headroom): ___
## 5. Data Requirements
- Training data source(s):
- Training data size (current):
- Training data growth rate:
- Feature data sources:
- Real-time features needed? (yes/no)
- Data retention policy:
## 6. Cost Constraints
- Monthly infrastructure budget: $___
- Cost per inference request target: $___
- Training budget per run: $___
- Build vs. buy preference:
## 7. Reliability Requirements
- Uptime SLA: ___% (e.g., 99.9%)
- Acceptable fallback behavior:
- Maximum acceptable data staleness:
- Disaster recovery requirements:
## 8. Compliance and Security
- Data privacy requirements (GDPR, CCPA, HIPAA):
- Model audit requirements:
- PII handling:
- Access control requirements:
## 9. Team and Timeline
- Team size and skills:
- MVP timeline:
- Production timeline:
- Maintenance ownership:
Lilly Tech Systems