Requirements Analysis for AI Systems
Before writing any code or choosing a framework, you need to define what your AI system must do, how fast it must respond, at what scale, and at what cost. This lesson gives you a concrete framework for specifying AI system requirements, grounded in real numbers.
Functional vs. Non-Functional Requirements
AI systems have the same requirement categories as traditional systems, but with AI-specific dimensions that most teams miss.
Functional Requirements (What the System Does)
Input/Output Contract
Define exactly what goes in and what comes out. For an LLM chatbot: input is a conversation history (max 8K tokens), output is a response (max 2K tokens). For a fraud detector: input is a transaction object (amount, merchant, user features), output is a risk score 0–1 and a binary decision.
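A contract like this is easiest to enforce when it lives in code. Here is a minimal sketch of the fraud-detector contract described above, using dataclasses; the field names (`amount`, `merchant_id`, `user_features`) are illustrative, not a prescribed schema:

```python
from dataclasses import dataclass

@dataclass
class Transaction:
    amount: float
    merchant_id: str
    user_features: dict[str, float]  # e.g. account age, velocity features

@dataclass
class FraudResult:
    risk_score: float  # contract: must be in [0.0, 1.0]
    decision: bool     # binary decision, e.g. True = block

def validate(result: FraudResult) -> FraudResult:
    """Reject any response that violates the output contract."""
    if not 0.0 <= result.risk_score <= 1.0:
        raise ValueError(f"risk_score {result.risk_score} outside [0, 1]")
    return result
```

Validating at the boundary means a misbehaving model version fails loudly instead of silently shipping out-of-contract scores.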
Quality Requirements
What accuracy is acceptable? A spam filter at 99.5% accuracy is fine. A medical diagnosis system at 99.5% accuracy might not be. Define your metrics: accuracy, precision, recall, F1, NDCG, BLEU, or business-specific metrics like click-through rate.
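Whatever metric you pick, compute it from the confusion matrix so the target is unambiguous. A quick sketch, using made-up counts for a hypothetical spam-filter evaluation:

```python
# Hypothetical evaluation counts: true/false positives and negatives
tp, fp, fn, tn = 980, 5, 15, 9_000

precision = tp / (tp + fp)  # of messages flagged as spam, how many were spam
recall = tp / (tp + fn)     # of actual spam, how much was caught
accuracy = (tp + tn) / (tp + fp + fn + tn)
f1 = 2 * precision * recall / (precision + recall)
```

Note how accuracy here is dominated by the 9,000 true negatives; for imbalanced problems like spam or fraud, set targets on precision and recall, not accuracy alone.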
Model Update Frequency
How often does the model need to reflect new data? Real-time learning (fraud detection), daily retraining (recommendations), weekly (content moderation), or quarterly (document classification)?
Explainability Requirements
Does the system need to explain its decisions? Regulated industries (finance, healthcare) often require it. This constrains your model choices — a gradient-boosted tree is more explainable than a deep neural network.
Non-Functional Requirements (How the System Performs)
Non-functional requirements are where AI systems diverge most from traditional systems. Get these wrong and you will build something that works in a notebook but fails in production.
Latency Budgets
AI inference is slow compared to traditional API calls. You need to budget latency carefully across the request path.
| Use Case | p50 Target | p95 Target | p99 Target | Why |
|---|---|---|---|---|
| Search autocomplete | 10ms | 30ms | 50ms | User is typing — must feel instant |
| Product recommendations | 30ms | 80ms | 150ms | Below page load budget, not blocking |
| Fraud detection | 20ms | 50ms | 100ms | Must decide before transaction completes |
| Chatbot response | 500ms | 2s | 5s | Users expect some thinking time |
| Image generation | 5s | 15s | 30s | Users expect to wait, but not forever |
| Document processing | 10s | 30s | 60s | Async — user submits and checks back |
Latency Budget Breakdown Example
For a product recommendation API with a 150ms p99 budget:
Total budget: 150ms (p99)
Breakdown:
Network (client to LB): 5ms
Load balancer: 2ms
API gateway (auth, rate limit): 10ms
Feature store lookup (Redis): 10ms # p99 for Redis GET
Model inference (GPU): 80ms # The expensive part
Post-processing (business rules): 10ms
Serialization + response: 5ms
Network (LB to client): 5ms
Buffer for variance: 23ms
------
Total: 150ms
Key constraint: Model inference gets 80ms.
This means:
- Model must be small enough to infer in 80ms on target GPU
- Or use model distillation / quantization to meet budget
- Or use batch precomputation and serve from cache
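A budget like this should be checked mechanically, so a change to one component forces a conversation about where the milliseconds come from. A minimal sketch of that check, using the component values above:

```python
# p99 latency budget components from the breakdown above, in milliseconds
budget_ms = {
    "network_client_to_lb": 5,
    "load_balancer": 2,
    "api_gateway": 10,
    "feature_store_lookup": 10,
    "model_inference": 80,
    "post_processing": 10,
    "serialization_response": 5,
    "network_lb_to_client": 5,
    "variance_buffer": 23,
}

total_ms = sum(budget_ms.values())
assert total_ms == 150, f"components sum to {total_ms}ms, budget is 150ms"
```

Keeping this assertion in a test suite means nobody can quietly grow the gateway or post-processing step without shrinking another component or the variance buffer.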
Throughput Estimation
Calculate your queries per second (QPS) and plan capacity accordingly. Here is a concrete example.
# Throughput estimation for an e-commerce recommendation system
# Step 1: Daily active users
daily_active_users = 2_000_000
# Step 2: Requests per user per day
# Homepage load (1) + category pages (3) + product pages (5) + cart (1)
requests_per_user_per_day = 10
# Step 3: Total daily requests
daily_requests = daily_active_users * requests_per_user_per_day # 20M
# Step 4: Average QPS
avg_qps = daily_requests / 86_400 # ~231 QPS
# Step 5: Peak QPS (typically 3-5x average)
peak_qps = avg_qps * 4 # ~925 QPS
# Step 6: Design QPS (add 50% headroom for growth + spikes)
design_qps = peak_qps * 1.5 # ~1,388 QPS → round to 1,500 QPS
# Step 7: GPU capacity planning
inference_time_per_request = 0.025 # 25ms on A10G GPU
requests_per_gpu_per_second = 1 / inference_time_per_request # 40 QPS per GPU
# With batching (batch size 8, 60ms per batch):
batched_throughput = 8 / 0.060 # ~133 QPS per GPU
# GPUs needed for design QPS:
gpus_needed = design_qps / batched_throughput # ~11.3 → 12 GPUs
# With redundancy (N+2 for fault tolerance):
total_gpus = 12 + 2 # 14 GPUs
# Monthly cost at $0.75/hr per A10G (AWS spot):
monthly_cost = total_gpus * 0.75 * 24 * 30 # $7,560/month
Data Volume Estimation
AI systems generate and consume far more data than most teams expect. Plan storage and processing capacity for these categories:
| Data Category | Example Size | Growth Rate | Retention |
|---|---|---|---|
| Training data | 500GB of labeled examples | +50GB/month (new labels) | Indefinite (versioned) |
| Feature data | 200GB in feature store | +10GB/month | Rolling 90 days online, 2 years offline |
| Inference logs | 2TB/month at 1,500 QPS | Grows with traffic | 30 days hot, 1 year cold |
| Model artifacts | 2GB per model version | 1–4 versions/month | Last 10 versions |
| Experiment data | 50GB (metrics, hyperparams, runs) | +5GB/month | Indefinite |
| Embeddings | 10M items × 768 dims × 4 bytes = 30GB | Grows with catalog | Current + 1 previous version |
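The embeddings row is worth reproducing as arithmetic, since it generalizes to any catalog size or embedding width. A quick sketch of that calculation:

```python
# Embedding storage: items x dimensions x bytes per value (float32)
items = 10_000_000
dims = 768
bytes_per_value = 4  # float32

embedding_bytes = items * dims * bytes_per_value
embedding_gb = embedding_bytes / 1e9  # ~30.7 GB, matching the table

# Keeping current + 1 previous version doubles the footprint
total_gb = embedding_gb * 2
```

Halving storage is often as simple as quantizing to float16 (2 bytes) or int8 (1 byte), at some cost in retrieval quality.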
Cost Constraints
AI infrastructure costs are the number one surprise for teams moving from prototype to production. Establish budgets early.
GPU Compute
Training: a single training run on 8x A100 GPUs can cost $500–$5,000; retraining monthly at that rate works out to $6K–$60K/year. Inference: $0.50–$3.00/hr per GPU. At 14 GPUs, that is roughly $5K–$30K/month.
API Costs
If using third-party APIs (OpenAI, Anthropic): GPT-4o at $2.50/1M input tokens. At 1,500 QPS with 500 input tokens per request, that is ~64.8B input tokens/day, or roughly $162,000/day in input costs alone. This is why many high-volume teams self-host.
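The token-cost arithmetic is worth writing down explicitly, since it is the core of every build-vs-buy spreadsheet. A sketch using the rates and traffic figures assumed in this section:

```python
# Third-party API cost estimate (rates and traffic assumed from this section)
qps = 1_500
input_tokens_per_request = 500
price_per_million_tokens = 2.50  # GPT-4o input-token rate used above

tokens_per_day = qps * input_tokens_per_request * 86_400  # 64.8B tokens
cost_per_day = tokens_per_day / 1_000_000 * price_per_million_tokens
cost_per_request = cost_per_day / (qps * 86_400)
```

Note this counts input tokens only; output tokens are typically billed at a higher rate, so the real bill is larger still.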
Data Processing
Feature engineering, ETL pipelines, data validation. Typically 20–30% of total AI infrastructure cost. Spark/Dataflow clusters for batch processing: $2K–$10K/month.
Storage
Training data in S3/GCS: $0.023/GB/month. Feature store (Redis): $0.10–$0.50/GB/month. Inference logs at 2TB/month: $50–$200/month depending on tier.
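Applying the per-GB rates above to the volumes from the data-volume table gives a quick sanity check on the storage line items (a sketch, assuming the S3 standard-tier rate quoted above):

```python
s3_rate = 0.023            # $/GB/month, S3 standard tier
training_data_gb = 500
inference_logs_gb = 2_000  # 2TB/month of inference logs

training_storage_cost = training_data_gb * s3_rate  # ~$11.50/month
log_storage_cost = inference_logs_gb * s3_rate      # ~$46/month
```

The lesson: raw storage is cheap; the real costs are GPU compute and data processing, which is why the 20–30% figure above is about pipelines, not bytes at rest.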
AI System Requirements Document Template
Use this template at the start of every AI project. Fill it out before writing any code.
# AI System Requirements Document
# Project: [Name]
# Date: [Date]
# Author: [Name]
## 1. Problem Statement
- Business problem being solved:
- Current solution (if any):
- Why AI is needed (what rules/heuristics cannot do):
- Success criteria (business metric):
## 2. Functional Requirements
- Input format and constraints:
- Output format and constraints:
- Quality metric and target (e.g., precision > 0.95):
- Explainability requirements (yes/no, what level):
- Model update frequency:
- Supported languages/regions:
## 3. Latency Requirements
- p50 target: ___ms
- p95 target: ___ms
- p99 target: ___ms
- Is streaming response acceptable? (yes/no)
- Is async processing acceptable? (yes/no)
## 4. Throughput Requirements
- Daily active users: ___
- Requests per user per day: ___
- Average QPS: ___
- Peak QPS: ___
- Design QPS (with headroom): ___
## 5. Data Requirements
- Training data source(s):
- Training data size (current):
- Training data growth rate:
- Feature data sources:
- Real-time features needed? (yes/no)
- Data retention policy:
## 6. Cost Constraints
- Monthly infrastructure budget: $___
- Cost per inference request target: $___
- Training budget per run: $___
- Build vs. buy preference:
## 7. Reliability Requirements
- Uptime SLA: ___% (e.g., 99.9%)
- Acceptable fallback behavior:
- Maximum acceptable data staleness:
- Disaster recovery requirements:
## 8. Compliance and Security
- Data privacy requirements (GDPR, CCPA, HIPAA):
- Model audit requirements:
- PII handling:
- Access control requirements:
## 9. Team and Timeline
- Team size and skills:
- MVP timeline:
- Production timeline:
- Maintenance ownership:
Lilly Tech Systems