Intermediate
GraphQL for AI APIs
Build flexible, type-safe AI APIs with GraphQL — from schema design and resolvers to subscriptions for real-time AI features.
Why GraphQL for AI?
GraphQL offers advantages for complex AI applications where clients need different views of the same data:
- Flexible queries: Clients request exactly the fields they need, reducing payload sizes for mobile and edge clients.
- Type safety: Strong schema types catch errors at development time, not in production.
- Single endpoint: One endpoint serves multiple AI capabilities, simplifying client code.
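- Subscriptions: A dedicated operation type for pushing streamed AI results to clients in real time; the transport (typically WebSocket or server-sent events) is chosen by the server.
- Introspection: The schema can be queried at runtime, so tooling can generate documentation and client types automatically.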
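To make the "flexible queries" and "single endpoint" points concrete, here is a minimal sketch of two clients sending different selections to the same endpoint. It uses only the Python standard library and builds the requests without sending them. The URL and the `demo-model` ID are placeholders, and the field names assume the schema shown in the next section.

```python
import json
import urllib.request

# Hypothetical endpoint; substitute your own GraphQL server URL.
GRAPHQL_URL = "https://api.example.com/graphql"

# A mobile client asks only for the generated text ...
MOBILE_QUERY = """
query Predict($input: PredictionInput!) {
  predict(input: $input) { text }
}
"""

# ... while a dashboard also wants token usage and latency.
DASHBOARD_QUERY = """
query Predict($input: PredictionInput!) {
  predict(input: $input) {
    text
    tokens { input output total }
    latencyMs
  }
}
"""


def build_request(query: str, variables: dict) -> urllib.request.Request:
    """Build (but do not send) a GraphQL-over-HTTP POST request."""
    body = json.dumps({"query": query, "variables": variables}).encode("utf-8")
    return urllib.request.Request(
        GRAPHQL_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# "demo-model" is a placeholder model ID, not a real model name.
variables = {"input": {"modelId": "demo-model", "text": "Summarize this."}}
mobile_req = build_request(MOBILE_QUERY, variables)
dashboard_req = build_request(DASHBOARD_QUERY, variables)
# Both requests target the same URL; only the selection set differs.
```

Passing either request to `urllib.request.urlopen` would send it; the server would then return only the fields named in that request's selection set.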
Schema Design for AI
type Query {
  models: [Model!]!
  model(id: ID!): Model
  predict(input: PredictionInput!): PredictionResult!
}

type Mutation {
  createFineTuneJob(input: FineTuneInput!): FineTuneJob!
  cancelJob(jobId: ID!): FineTuneJob!
}

type Subscription {
  streamCompletion(input: CompletionInput!): CompletionChunk!
  jobStatus(jobId: ID!): JobStatusUpdate!
}

type Model {
  id: ID!
  name: String!
  version: String!
  capabilities: [String!]!
  maxTokens: Int!
  pricing: Pricing!
}

input PredictionInput {
  modelId: ID!
  text: String!
  temperature: Float = 0.7
  maxTokens: Int = 256
}

type PredictionResult {
  id: ID!
  text: String!
  confidence: Float!
  tokens: TokenUsage!
  model: Model!
  latencyMs: Int!
}

type TokenUsage {
  input: Int!
  output: Int!
  total: Int!
}
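The schema above exposes one generic `predict` field whose input carries a `modelId`, rather than one field per model. A minimal sketch of the server-side dispatch this implies is shown below. The registry, the two backends, and the model IDs `echo-v1` and `shout-v1` are hypothetical stand-ins, not part of any real library.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Prediction:
    model_id: str
    text: str


# Hypothetical backends standing in for real model clients.
def echo_backend(text: str) -> str:
    return text


def shout_backend(text: str) -> str:
    return text.upper()


# Adding a model means adding a registry entry, not a new schema field.
MODEL_REGISTRY: Dict[str, Callable[[str], str]] = {
    "echo-v1": echo_backend,
    "shout-v1": shout_backend,
}


def predict(model_id: str, text: str) -> Prediction:
    """Route one generic predict call to the backend named by model_id."""
    backend = MODEL_REGISTRY.get(model_id)
    if backend is None:
        raise ValueError(f"Model {model_id} not found")
    return Prediction(model_id=model_id, text=backend(text))
```

With this shape, registering a third backend changes only `MODEL_REGISTRY`; existing client queries stay valid.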
Design schemas around capabilities, not models: Instead of separate gpt4Predict and claudePredict fields, use a single predict query whose input carries a modelId. This lets you add new models without changing the schema.
Implementing Resolvers
import time

import strawberry
from strawberry.types import Info

# PredictionInput, PredictionResult, get_model, track_usage and generate_id
# are assumed to be defined elsewhere in the application.


@strawberry.type
class Query:
    @strawberry.field
    async def predict(self, input: PredictionInput, info: Info) -> PredictionResult:
        # Validate input: look up the requested model
        model = await get_model(input.model_id)
        if not model:
            raise ValueError(f"Model {input.model_id} not found")

        # Run inference, timing it with a monotonic clock
        start_time = time.perf_counter()
        result = await model.predict(
            text=input.text,
            temperature=input.temperature,
            max_tokens=input.max_tokens,
        )
        latency = int((time.perf_counter() - start_time) * 1000)

        # Track usage against the authenticated user
        await track_usage(info.context.user, result.token_usage)

        return PredictionResult(
            id=generate_id(),
            text=result.output,
            confidence=result.confidence,
            tokens=result.token_usage,
            model=model,
            latency_ms=latency,
        )
GraphQL Subscriptions for Streaming
Use subscriptions to stream AI model output token by token:
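from typing import AsyncGenerator

import strawberry

# CompletionInput, CompletionChunk and get_model are assumed to be
# defined elsewhere in the application.


@strawberry.type
class Subscription:
    @strawberry.subscription
    async def stream_completion(
        self, input: CompletionInput
    ) -> AsyncGenerator[CompletionChunk, None]:
        model = await get_model(input.model_id)
        if not model:
            raise ValueError(f"Model {input.model_id} not found")

        # Forward each chunk to the client as soon as the model emits it
        async for chunk in model.stream(
            text=input.text,
            temperature=input.temperature,
        ):
            yield CompletionChunk(
                id=chunk.id,
                delta=chunk.text,
                finish_reason=chunk.finish_reason,
                index=chunk.index,
            )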
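On the receiving side, a client has to reassemble the streamed chunks into the full completion. The sketch below shows that pattern with plain asyncio and no GraphQL server. `fake_stream` is a hypothetical stand-in for `model.stream()`, and treating `index` as the chunk's position in the stream is an assumption, since the original does not define it.

```python
import asyncio
from dataclasses import dataclass
from typing import AsyncGenerator, List, Optional


@dataclass
class CompletionChunk:
    id: str
    delta: str
    finish_reason: Optional[str]
    index: int


async def fake_stream(text: str) -> AsyncGenerator[CompletionChunk, None]:
    """Hypothetical stand-in for model.stream(): emits one chunk per word."""
    words = text.split()
    for i, word in enumerate(words):
        await asyncio.sleep(0)  # yield control, as a real network stream would
        last = i == len(words) - 1
        yield CompletionChunk(
            id="cmpl-demo",
            delta=word if last else word + " ",
            finish_reason="stop" if last else None,
            index=i,
        )


async def collect(stream: AsyncGenerator[CompletionChunk, None]) -> str:
    """Append deltas in arrival order until a finish_reason appears."""
    parts: List[str] = []
    async for chunk in stream:
        parts.append(chunk.delta)
        if chunk.finish_reason is not None:
            break
    return "".join(parts)


full_text = asyncio.run(collect(fake_stream("streamed token by token")))
```

A UI would typically render each `delta` as it arrives rather than waiting for the joined string; the accumulation logic is the same.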
GraphQL vs REST for AI APIs
| Criterion | Choose REST | Choose GraphQL |
|---|---|---|
| API complexity | Single model, simple interface | Multiple models, complex queries |
| Clients | Diverse (curl, any language) | Modern frontends (React, mobile) |
| Caching | HTTP caching is straightforward | Requires client-side caching (e.g., Apollo Client or Relay) |
| Team familiarity | Universal knowledge | Requires GraphQL expertise |
| File uploads | Native multipart support | Requires extra configuration |
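The caching row follows from how GraphQL requests usually travel: as POST bodies against a single URL, which URL-keyed HTTP caches cannot tell apart. One workaround is to key an application-level cache on the operation itself. The sketch below is one assumed approach, not a standard; the function names are hypothetical, and the whitespace normalization is deliberately naive (it would also alter whitespace inside string literals embedded in the query).

```python
import hashlib
import json


def normalize_query(query: str) -> str:
    """Collapse whitespace so formatting differences do not change the key."""
    return " ".join(query.split())


def cache_key(query: str, variables: dict) -> str:
    """Derive a deterministic key from the operation text and its variables."""
    canonical = json.dumps(
        {"query": normalize_query(query), "variables": variables},
        sort_keys=True,            # key order in variables must not matter
        separators=(",", ":"),     # fixed separators for a stable encoding
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Two requests with the same selection and variables map to the same key regardless of formatting, so a cached response can be reused; changing either the selection or the variables yields a different key.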
Consider a hybrid approach: Many production AI platforms offer both REST and GraphQL. Use REST for simple prediction endpoints and GraphQL for complex management and analytics queries. This gives developers the choice that best fits their use case.