Intermediate

BentoML

Package and deploy ML models with BentoML for developer-friendly containerized serving, adaptive batching, and seamless Kubernetes deployment.

Defining a BentoML Service

# service.py
import bentoml
from bentoml.io import JSON, NumpyNdarray
import numpy as np

# Load the saved model from the local model store and wrap it in a
# runner, which schedules inference (including adaptive batching)
# separately from the API server workers.
model_ref = bentoml.sklearn.get("iris_classifier:latest")
model_runner = model_ref.to_runner()

svc = bentoml.Service("iris_classifier", runners=[model_runner])

@svc.api(input=NumpyNdarray(), output=JSON())
async def classify(input_data: np.ndarray) -> dict:
    # async_run dispatches to the runner without blocking the event loop
    prediction = await model_runner.predict.async_run(input_data)
    return {"prediction": prediction.tolist()}
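Before packaging, the service can be tried locally with the development server. The payload below is an illustrative single iris sample; the default port is 3000:

```shell
# Start a development server with hot reload
bentoml serve service:svc --reload

# In another terminal, send one sample to the classify endpoint
curl -X POST http://localhost:3000/classify \
  -H "Content-Type: application/json" \
  -d "[[5.1, 3.5, 1.4, 0.2]]"
```

The endpoint path matches the API function name (`classify`), and the NumpyNdarray input descriptor accepts a JSON array.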

Building and Containerizing

# bentofile.yaml
service: "service:svc"
include:
  - "*.py"
python:
  packages:
    - scikit-learn
    - numpy
docker:
  python_version: "3.11"

# Build and containerize
bentoml build
bentoml containerize iris_classifier:latest --image-tag my-registry/iris:v1
docker push my-registry/iris:v1

Deploying to Kubernetes

apiVersion: apps/v1
kind: Deployment
metadata:
  name: iris-classifier
spec:
  replicas: 3
  selector:
    matchLabels:
      app: iris-classifier
  template:
    metadata:
      labels:
        app: iris-classifier
    spec:
      containers:
      - name: bento-service
        image: my-registry/iris:v1
        ports:
        - containerPort: 3000
        resources:
          limits:
            cpu: "2"
            memory: "4Gi"
        readinessProbe:
          httpGet:
            path: /readyz
            port: 3000
---
apiVersion: v1
kind: Service
metadata:
  name: iris-classifier-svc
spec:
  selector:
    app: iris-classifier
  ports:
  - port: 80
    targetPort: 3000

BentoML advantage: BentoML streamlines model packaging: dependency management, containerization, and API generation are all driven by the service definition and bentofile.yaml, so you rarely write a Dockerfile or API scaffolding by hand. Use it when you want a fast path from trained model to production endpoint.
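Assuming the manifests above are saved as deployment.yaml, rolling out and smoke-testing the deployment might look like this (readiness endpoint per the probe configured above):

```shell
# Apply the Deployment and Service, then wait for the rollout
kubectl apply -f deployment.yaml
kubectl rollout status deployment/iris-classifier

# Forward the cluster Service to a local port and check readiness
kubectl port-forward svc/iris-classifier-svc 8080:80
curl http://localhost:8080/readyz
```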