Requirements Analysis for AI Systems

Before writing any code or choosing any framework, you need to define what your AI system must do, how fast, at what scale, and at what cost. This lesson gives you a concrete framework for AI system requirements with real numbers.

Functional vs. Non-Functional Requirements

AI systems have the same requirement categories as traditional systems, but with AI-specific dimensions that most teams miss.

Functional Requirements (What the System Does)

1. Input/Output Contract

Define exactly what goes in and what comes out. For an LLM chatbot: input is a conversation history (max 8K tokens), output is a response (max 2K tokens). For a fraud detector: input is a transaction object (amount, merchant, user features), output is a risk score 0–1 and a binary decision.
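A contract like this is easiest to enforce if it lives in code. Below is a minimal sketch using Python dataclasses; the class and field names are illustrative assumptions, not a prescribed API — the point is that the token limits and score range from the contract become hard validation rules:

```python
from dataclasses import dataclass

MAX_INPUT_TOKENS = 8_000   # conversation history limit from the contract
MAX_OUTPUT_TOKENS = 2_000  # response limit from the contract

@dataclass
class ChatRequest:
    """Input contract for the chatbot: history capped at 8K tokens."""
    messages: list[str]
    token_count: int

    def validate(self) -> None:
        if self.token_count > MAX_INPUT_TOKENS:
            raise ValueError(
                f"input exceeds contract: {self.token_count} > {MAX_INPUT_TOKENS} tokens"
            )

@dataclass
class FraudResult:
    """Output contract for the fraud detector: score in [0, 1] plus a decision."""
    risk_score: float
    is_fraud: bool

    def validate(self) -> None:
        if not 0.0 <= self.risk_score <= 1.0:
            raise ValueError(f"risk_score out of range: {self.risk_score}")
```

Rejecting out-of-contract inputs at the boundary keeps violations from propagating into the model, where they fail in far less obvious ways.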

2. Quality Requirements

What accuracy is acceptable? A spam filter at 99.5% accuracy is fine. A medical diagnosis system at 99.5% accuracy might not be. Define your metrics: accuracy, precision, recall, F1, NDCG, BLEU, or business-specific metrics like click-through rate.
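Precision, recall, and F1 fall straight out of confusion-matrix counts, so they are worth computing by hand at least once. A small sketch (the spam-filter counts are illustrative assumptions):

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Compute precision, recall, and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Hypothetical spam filter: caught 950 spam mails (TP), flagged 10 good
# mails as spam (FP), and let 50 spam mails through (FN).
p, r, f1 = precision_recall_f1(tp=950, fp=10, fn=50)
```

Writing the target as "precision > 0.99, recall > 0.95" forces a decision about which error type hurts more — something a single accuracy number hides.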

3. Model Update Frequency

How often does the model need to reflect new data? Real-time learning (fraud detection), daily retraining (recommendations), weekly (content moderation), or quarterly (document classification)?

4. Explainability Requirements

Does the system need to explain its decisions? Regulated industries (finance, healthcare) often require it. This constrains your model choices — a gradient-boosted tree is more explainable than a deep neural network.

Non-Functional Requirements (How the System Performs)

These are where AI systems diverge most from traditional systems. Get these wrong and you will build something that works in a notebook but fails in production.

Latency Budgets

AI inference is slow compared to traditional API calls. You need to budget latency carefully across the request path.

💡 Why p99 matters more than average: If your average latency is 50ms but your p99 is 2 seconds, 1 in 100 users waits 2+ seconds. At 10,000 QPS, that is 100 users per second having a bad experience. Always design for p99, not average.

| Use Case | p50 Target | p95 Target | p99 Target | Why |
|---|---|---|---|---|
| Search autocomplete | 10ms | 30ms | 50ms | User is typing — must feel instant |
| Product recommendations | 30ms | 80ms | 150ms | Below page load budget, not blocking |
| Fraud detection | 20ms | 50ms | 100ms | Must decide before transaction completes |
| Chatbot response | 500ms | 2s | 5s | Users expect some thinking time |
| Image generation | 5s | 15s | 30s | Users expect to wait, but not forever |
| Document processing | 10s | 30s | 60s | Async — user submits and checks back |
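Targets like these only matter if you measure them. A minimal sketch of computing p50/p99 from recorded latency samples, using the nearest-rank method (pure Python, no monitoring stack assumed); the sample data deliberately recreates the "healthy average, terrible tail" pattern from the tip above:

```python
import math

def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile of recorded latency samples (ms)."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    # Smallest value with at least pct% of samples at or below it.
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]

# 98 fast requests and 2 slow ones: the mean looks fine,
# but p99 exposes the tail.
latencies = [50.0] * 98 + [2000.0] * 2
avg = sum(latencies) / len(latencies)   # 89.0 ms — looks acceptable
p50 = percentile(latencies, 50)         # 50.0 ms
p99 = percentile(latencies, 99)         # 2000.0 ms — the real user experience
```

In production you would pull these from your metrics system rather than raw lists, but the definition is the same.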

Latency Budget Breakdown Example

For a product recommendation API with a 150ms p99 budget:

Total budget: 150ms (p99)

Breakdown:
  Network (client to LB):           5ms
  Load balancer:                     2ms
  API gateway (auth, rate limit):   10ms
  Feature store lookup (Redis):     10ms   # p99 for Redis GET
  Model inference (GPU):            80ms   # The expensive part
  Post-processing (business rules): 10ms
  Serialization + response:          5ms
  Network (LB to client):           5ms
  Buffer for variance:             23ms
                                  ------
  Total:                           150ms

Key constraint: Model inference gets 80ms.
This means:
  - Model must be small enough to infer in 80ms on target GPU
  - Or use model distillation / quantization to meet budget
  - Or use batch precomputation and serve from cache
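A breakdown like this is easy to keep honest in code: sum the per-component budgets and fail fast if they exceed the p99 target. A small sketch using the numbers from the example above:

```python
P99_BUDGET_MS = 150

# Per-component budgets from the breakdown above (ms)
budget = {
    "network_client_to_lb": 5,
    "load_balancer": 2,
    "api_gateway": 10,
    "feature_store_lookup": 10,
    "model_inference": 80,
    "post_processing": 10,
    "serialization": 5,
    "network_lb_to_client": 5,
}

spent = sum(budget.values())      # 127 ms allocated
buffer = P99_BUDGET_MS - spent    # 23 ms left for variance
assert buffer >= 0, f"over budget by {-buffer}ms"
```

Keeping the budget in version control next to the service makes any change that eats the variance buffer visible in code review.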

Throughput Estimation

Calculate your queries per second (QPS) and plan capacity accordingly. Here is a concrete example.

# Throughput estimation for an e-commerce recommendation system

# Step 1: Daily active users
daily_active_users = 2_000_000

# Step 2: Requests per user per day
# Homepage load (1) + category pages (3) + product pages (5) + cart (1)
requests_per_user_per_day = 10

# Step 3: Total daily requests
daily_requests = daily_active_users * requests_per_user_per_day  # 20M

# Step 4: Average QPS
avg_qps = daily_requests / 86_400  # ~231 QPS

# Step 5: Peak QPS (typically 3-5x average)
peak_qps = avg_qps * 4  # ~925 QPS

# Step 6: Design QPS (add 50% headroom for growth + spikes)
design_qps = peak_qps * 1.5  # ~1,388 QPS → round to 1,500 QPS

# Step 7: GPU capacity planning
inference_time_per_request = 0.025  # 25ms on A10G GPU
requests_per_gpu_per_second = 1 / inference_time_per_request  # 40 QPS per GPU

# With batching (batch size 8, 60ms per batch):
batched_throughput = 8 / 0.060  # ~133 QPS per GPU

# GPUs needed for design QPS:
gpus_needed = design_qps / batched_throughput  # ~11.3 → 12 GPUs

# With redundancy (N+2 for fault tolerance):
total_gpus = 12 + 2  # 14 GPUs

# Monthly cost at $0.75/hr per A10G (AWS spot):
monthly_cost = total_gpus * 0.75 * 24 * 30  # $7,560/month

Data Volume Estimation

AI systems generate and consume far more data than most teams expect. Plan storage and processing capacity for these categories:

| Data Category | Example Size | Growth Rate | Retention |
|---|---|---|---|
| Training data | 500GB of labeled examples | +50GB/month (new labels) | Indefinite (versioned) |
| Feature data | 200GB in feature store | +10GB/month | Rolling 90 days online, 2 years offline |
| Inference logs | 2TB/month at 1,500 QPS | Grows with traffic | 30 days hot, 1 year cold |
| Model artifacts | 2GB per model version | 1–4 versions/month | Last 10 versions |
| Experiment data | 50GB (metrics, hyperparams, runs) | +5GB/month | Indefinite |
| Embeddings | 10M items × 768 dims × 4 bytes = 30GB | Grows with catalog | Current + 1 previous version |
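The embeddings row is just a sizing formula, and it is worth encoding so the estimate updates as the catalog grows. A sketch for raw embedding storage (float32, no index overhead — real vector indexes like HNSW add meaningfully more on top):

```python
def embedding_storage_gb(n_items: int, dims: int, bytes_per_value: int = 4) -> float:
    """Raw storage for dense embeddings, in GB (decimal: 1e9 bytes)."""
    return n_items * dims * bytes_per_value / 1e9

# 10M catalog items x 768-dim float32 vectors, matching the table row
size = embedding_storage_gb(10_000_000, 768)  # ~30.7 GB
```

Doubling the catalog or moving to a larger embedding model scales this linearly, which is why the table tracks growth rate alongside current size.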

Cost Constraints

AI infrastructure costs are the number one surprise for teams moving from prototype to production. Establish budgets early.

GPU Compute

Training: A single training run on 8x A100 GPUs can cost $500–$5,000, so monthly retraining adds up to $6K–$60K/year. Inference: $0.50–$3.00/hr per GPU; at 14 GPUs, that is $5K–$30K/month.

API Costs

If using third-party APIs (OpenAI, Anthropic): GPT-4o input tokens cost $2.50/1M. At the 20M requests/day from the throughput example, with ~500 input tokens per request, that is 10B input tokens/day, roughly $25,000/day before counting output tokens. This is why many teams self-host at scale.
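A quick cost sketch makes it easy to rerun this estimate as traffic or pricing changes. The function below is a generic input-token calculator; the worked example reuses the 20M daily requests from the throughput example above and assumes 500 input tokens per request:

```python
def daily_input_token_cost(requests_per_day: int,
                           tokens_per_request: int,
                           usd_per_million_tokens: float) -> float:
    """Daily spend on input tokens for a third-party LLM API."""
    total_tokens = requests_per_day * tokens_per_request
    return total_tokens / 1_000_000 * usd_per_million_tokens

# 20M requests/day x 500 input tokens, at $2.50 per 1M input tokens
cost = daily_input_token_cost(20_000_000, 500, 2.50)  # $25,000/day
```

Note this covers input tokens only; output tokens are billed separately (and typically at a higher rate), so treat this as a floor, not a total.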

Data Processing

Feature engineering, ETL pipelines, data validation. Typically 20–30% of total AI infrastructure cost. Spark/Dataflow clusters for batch processing: $2K–$10K/month.

Storage

Training data in S3/GCS: $0.023/GB/month. Feature store (Redis): $0.10–$0.50/GB/month. Inference logs at 2TB/month: $50–$200/month depending on tier.
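These per-GB rates roll up into a simple monthly estimate. A sketch using the sizes from the data volume table and the rates above (the Redis rate is the midpoint of the quoted $0.10–$0.50 range; all figures are the illustrative ones from this section):

```python
# Monthly storage cost estimate using the per-GB rates above
S3_RATE = 0.023    # $/GB/month, object storage (S3/GCS)
REDIS_RATE = 0.30  # $/GB/month, midpoint of the $0.10-$0.50 range

training_data_gb = 500    # labeled examples in object storage
feature_store_gb = 200    # online features in Redis
log_gb_per_month = 2_000  # 2TB/month of inference logs, kept in object storage

monthly_cost = (
    training_data_gb * S3_RATE        # $11.50
    + feature_store_gb * REDIS_RATE   # $60.00
    + log_gb_per_month * S3_RATE      # $46.00
)
```

The takeaway: raw storage is cheap relative to GPUs — it is the feature store RAM and the log retention policy, not the training data, that dominate this line item.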

Cost trap: Teams often budget only for inference GPUs and forget about training compute, data processing, feature store infrastructure, monitoring tools, and the engineering time to maintain all of it. Total cost of ownership is typically 3–5x the raw GPU cost.

AI System Requirements Document Template

Use this template at the start of every AI project. Fill it out before writing any code.

# AI System Requirements Document
# Project: [Name]
# Date: [Date]
# Author: [Name]

## 1. Problem Statement
- Business problem being solved:
- Current solution (if any):
- Why AI is needed (what rules/heuristics cannot do):
- Success criteria (business metric):

## 2. Functional Requirements
- Input format and constraints:
- Output format and constraints:
- Quality metric and target (e.g., precision > 0.95):
- Explainability requirements (yes/no, what level):
- Model update frequency:
- Supported languages/regions:

## 3. Latency Requirements
- p50 target: ___ms
- p95 target: ___ms
- p99 target: ___ms
- Is streaming response acceptable? (yes/no)
- Is async processing acceptable? (yes/no)

## 4. Throughput Requirements
- Daily active users: ___
- Requests per user per day: ___
- Average QPS: ___
- Peak QPS: ___
- Design QPS (with headroom): ___

## 5. Data Requirements
- Training data source(s):
- Training data size (current):
- Training data growth rate:
- Feature data sources:
- Real-time features needed? (yes/no)
- Data retention policy:

## 6. Cost Constraints
- Monthly infrastructure budget: $___
- Cost per inference request target: $___
- Training budget per run: $___
- Build vs. buy preference:

## 7. Reliability Requirements
- Uptime SLA: ___% (e.g., 99.9%)
- Acceptable fallback behavior:
- Maximum acceptable data staleness:
- Disaster recovery requirements:

## 8. Compliance and Security
- Data privacy requirements (GDPR, CCPA, HIPAA):
- Model audit requirements:
- PII handling:
- Access control requirements:

## 9. Team and Timeline
- Team size and skills:
- MVP timeline:
- Production timeline:
- Maintenance ownership:
💡 Apply at work tomorrow: Copy this template and fill it out for your current or upcoming AI project. Share it with your team and stakeholders. You will be surprised how many critical decisions surface before any code is written. The 2 hours you spend on requirements will save 2 months of rework.