Model Deployment & Serving

Deploy ML models for production inference using Snowpark Container Services, the model registry, UDF-based serving, and batch inference patterns — all within Snowflake's secure perimeter.

Snowflake Model Registry

The Model Registry is Snowflake's centralized service for versioning, managing, and deploying ML models. It is a critical exam topic.

from snowflake.ml.registry import Registry

# Initialize the registry
registry = Registry(session=session, database_name="ML_DB", schema_name="ML_REGISTRY")

# Log a model
model_version = registry.log_model(
    model_name="churn_classifier",
    version_name="v1",
    model=trained_model,           # Snowpark ML, scikit-learn, XGBoost, etc.
    conda_dependencies=["scikit-learn"],
    sample_input_data=df_sample,   # For signature inference
    comment="Random Forest churn model, AUC=0.92"
)

# List registered models
registry.show_models()

# Get a specific model version
model_ref = registry.get_model("churn_classifier").version("v1")

# Run inference using the registered model
predictions = model_ref.run(df_new_data, function_name="predict")
💡 Exam focus: The Model Registry supports multiple frameworks (Snowpark ML, scikit-learn, XGBoost, LightGBM, PyTorch, TensorFlow, Hugging Face). When you log a model, Snowflake automatically packages it with its dependencies and creates inference functions. Know the log_model, get_model, and run methods.

Model Registry Key Concepts

  • Model name: A logical identifier for the model (e.g., "churn_classifier")
  • Version: Each model can have multiple versions (v1, v2, production, staging)
  • Model signature: Automatically inferred input/output schema from sample data
  • Conda dependencies: Required packages are bundled with the model so inference environments are reproducible
  • Inference functions: predict, predict_proba, transform — automatically generated based on the model type
  • Tags and comments: Metadata for tracking experiments and model lineage
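The idea behind signature inference can be illustrated without a Snowflake session: when you pass sample_input_data to log_model, the registry derives an input schema from the sample. The sketch below mimics that step by mapping pandas dtypes to logical column types. The infer_signature helper and its type mapping are illustrative only, not the registry's actual implementation.

```python
import pandas as pd

def infer_signature(df: pd.DataFrame) -> dict:
    """Map each column to a coarse logical type (conceptual illustration)."""
    type_map = {"int64": "INTEGER", "float64": "FLOAT",
                "object": "STRING", "bool": "BOOLEAN"}
    return {col: type_map.get(str(dtype), "VARIANT")
            for col, dtype in df.dtypes.items()}

# A sample input frame like the df_sample passed to log_model
sample = pd.DataFrame({"SPEND": [120.5, 88.0],
                       "VISITS": [4, 7],
                       "SEGMENT": ["A", "B"]})
signature = infer_signature(sample)
```

This is why the sample must be representative: a column that happens to contain only integers in the sample would be inferred as INTEGER even if floats can appear at inference time.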

UDF-Based Model Serving

For lightweight inference, you can wrap a trained model in a UDF that runs within SQL queries.

import joblib
from snowflake.snowpark.functions import pandas_udf
from snowflake.snowpark.types import PandasSeriesType, FloatType
import pandas as pd

# Load model from stage inside a vectorized UDF
@pandas_udf(return_type=PandasSeriesType(FloatType()),
            input_types=[PandasSeriesType(FloatType()),
                         PandasSeriesType(FloatType()),
                         PandasSeriesType(FloatType())],
            packages=["joblib", "scikit-learn"],
            imports=["@ML_MODELS/model.pkl"])
def predict_churn(spend: pd.Series, visits: pd.Series,
                  days_since: pd.Series) -> pd.Series:
    import os
    import sys
    # Files listed in 'imports' are staged under the directory exposed
    # via sys._xoptions["snowflake_import_directory"]
    import_dir = sys._xoptions["snowflake_import_directory"]
    model = joblib.load(os.path.join(import_dir, "model.pkl"))
    features = pd.DataFrame({
        "SPEND": spend, "VISITS": visits, "DAYS_SINCE": days_since
    })
    return pd.Series(model.predict_proba(features)[:, 1])

# Use in SQL-like queries
df_scored = session.table("NEW_CUSTOMERS").select(
    "CUSTOMER_ID",
    predict_churn("SPEND_SCALED", "VISITS_SCALED", "DAYS_SINCE_LAST")
        .alias("CHURN_PROBABILITY")
)
📚 UDF vs. Model Registry inference: UDF-based serving gives you full control over the inference logic and is best for custom preprocessing or multi-model ensembles. Model Registry inference (model_ref.run()) is simpler and handles dependency management automatically. The exam tests when to use each approach.

Snowpark Container Services

Snowpark Container Services (SPCS) lets you run containerized applications (Docker) directly inside Snowflake. This is the most flexible option for complex ML serving, including GPU-accelerated inference.

Key SPCS Concepts

Compute Pool

A set of Snowflake-managed compute nodes that run your containers. You specify the instance family (CPU or GPU) and the min/max number of nodes for auto-scaling.

Service

A running containerized application within a compute pool. Services can expose HTTP endpoints for real-time inference or run as background jobs.

Image Repository

Snowflake's built-in container registry where you push Docker images. Images are stored securely and versioned within your Snowflake account.

Service Function

A SQL function that routes requests to a running service, enabling you to call containerized inference from SQL queries seamlessly.
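A service is defined by a specification file (the service_spec.yaml referenced below) that declares the container image, resource requests, and exposed endpoints. The repository path, container name, and port here are illustrative placeholders, not values from this example's account:

```yaml
# service_spec.yaml -- illustrative sketch; names and paths are placeholders
spec:
  containers:
    - name: inference
      image: /ml_db/public/ml_repo/churn_server:latest
      resources:
        requests:
          nvidia.com/gpu: 1   # request GPU for accelerated inference
  endpoints:
    - name: predict
      port: 8080
```

The endpoint name declared here is what a service function references to route SQL calls into the container.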

-- Create a compute pool for ML inference
CREATE COMPUTE POOL ml_inference_pool
    MIN_NODES = 1
    MAX_NODES = 3
    INSTANCE_FAMILY = GPU_NV_S;

-- Create a service from a container image
CREATE SERVICE ml_serving_service
    IN COMPUTE POOL ml_inference_pool
    FROM @ML_MODELS SPECIFICATION_FILE = 'service_spec.yaml'
    MIN_INSTANCES = 1
    MAX_INSTANCES = 3;

-- Create a service function for SQL access
CREATE FUNCTION predict_with_model(input VARIANT)
    RETURNS VARIANT
    SERVICE = ml_serving_service
    ENDPOINT = 'predict'
    AS '/predict';

-- Use in queries
SELECT
    CUSTOMER_ID,
    predict_with_model(OBJECT_CONSTRUCT(
        'spend', TOTAL_SPEND,
        'visits', VISIT_COUNT
    )) AS PREDICTION
FROM CUSTOMERS;

💡 Exam tip: SPCS is the answer when the question mentions GPU inference, custom Docker containers, complex serving logic, or models that cannot run in a UDF (e.g., large deep learning models). For simple scikit-learn or XGBoost models, UDFs or Model Registry inference are simpler and sufficient.

Batch Inference

For scoring large datasets on a schedule, Snowflake provides several batch inference patterns.

Using Tasks for Scheduled Scoring

-- Create a task that runs batch scoring daily
CREATE OR REPLACE TASK daily_churn_scoring
    WAREHOUSE = ML_WH
    SCHEDULE = 'USING CRON 0 6 * * * UTC'  -- 6 AM UTC daily
AS
    INSERT INTO CHURN_PREDICTIONS
    SELECT
        CUSTOMER_ID,
        churn_classifier!PREDICT(
            INPUT_DATA => OBJECT_CONSTRUCT(*)
        ):class AS PREDICTED_CHURN,
        CURRENT_TIMESTAMP() AS SCORED_AT
    FROM CUSTOMER_FEATURES
    WHERE LAST_ACTIVITY_DATE >= DATEADD('day', -90, CURRENT_DATE());

-- Enable the task
ALTER TASK daily_churn_scoring RESUME;

Using Stored Procedures for Batch Scoring

from snowflake.snowpark import Session
from snowflake.snowpark.functions import sproc

# Registering with @sproc requires an active Snowpark session
@sproc(packages=["snowflake-ml-python"])
def batch_score(session: Session, input_table: str, output_table: str) -> str:
    from snowflake.ml.registry import Registry

    # Load model from registry
    registry = Registry(session=session, database_name="ML_DB", schema_name="ML_REGISTRY")
    model_ref = registry.get_model("churn_classifier").version("production")

    # Score the full table
    df_input = session.table(input_table)
    df_scored = model_ref.run(df_input, function_name="predict_proba")

    # Write results
    df_scored.write.mode("overwrite").save_as_table(output_table)
    return f"Scored {df_scored.count()} rows"

Model Monitoring

Production models require monitoring for data drift and performance degradation.

  • Data drift detection: Compare feature distributions between training data and incoming data using SQL statistical functions
  • Prediction drift: Monitor the distribution of model predictions over time for shifts
  • Performance tracking: When ground truth labels become available, compute metrics against predictions and alert on degradation
  • Snowflake Alerts: Use Snowflake's ALERT feature to trigger notifications when drift exceeds thresholds

-- Monitor prediction distribution drift
CREATE OR REPLACE ALERT prediction_drift_alert
    WAREHOUSE = ML_WH
    SCHEDULE = 'USING CRON 0 8 * * * UTC'
    IF (EXISTS (
        SELECT 1 FROM (
            SELECT
                AVG(CHURN_PROBABILITY) AS AVG_SCORE,
                STDDEV(CHURN_PROBABILITY) AS STD_SCORE
            FROM CHURN_PREDICTIONS
            WHERE SCORED_AT >= DATEADD('day', -1, CURRENT_TIMESTAMP())
        )
        WHERE AVG_SCORE > 0.5 OR STD_SCORE > 0.3  -- Drift thresholds
    ))
    THEN
        CALL SYSTEM$SEND_EMAIL('ml-team@company.com',
            'Prediction Drift Alert',
            'Churn model predictions have drifted beyond thresholds.');
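The drift checks above compare simple summary statistics. A common distribution-level alternative is the Population Stability Index (PSI), which the SQL could compute per bin. The sketch below is a minimal, framework-free illustration; the bin count and the conventional "PSI > 0.2 means drift" reading are rules of thumb, not Snowflake defaults.

```python
import math

def psi(expected, actual, bins=10):
    """Population Stability Index between two score samples.

    0 means identical binned distributions; values above ~0.2 are
    commonly read as significant drift.
    """
    lo = min(min(expected), min(actual))
    hi = max(max(expected), max(actual))
    width = (hi - lo) / bins or 1.0  # guard against zero-width range

    def bin_fractions(xs):
        counts = [0] * bins
        for x in xs:
            i = min(int((x - lo) / width), bins - 1)
            counts[i] += 1
        # Floor at a tiny value to avoid log(0) for empty bins
        return [max(c / len(xs), 1e-6) for c in counts]

    e, a = bin_fractions(expected), bin_fractions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))

# Training-time scores vs. a shifted "live" distribution
train_scores = [0.1, 0.2, 0.25, 0.3, 0.35, 0.4, 0.45, 0.5]
drifted = [s + 0.3 for s in train_scores]
stable_psi = psi(train_scores, train_scores)
drift_psi = psi(train_scores, drifted)
```

The same binning and log-ratio sum can be expressed in SQL with WIDTH_BUCKET-style bucketing, which makes PSI a natural fit for a scheduled alert condition.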

Practice Questions

Question 1

A team has trained a scikit-learn model and wants to deploy it for real-time inference within SQL queries. The model is lightweight (50MB) and processes single rows. Which deployment approach is most appropriate?

A) Deploy using Snowpark Container Services with a GPU compute pool
B) Register the model in the Model Registry and use model_ref.run()
C) Wrap the model in a vectorized UDF with the model file imported from a stage
D) Export the model and deploy on an external REST API

Answer: B — The Model Registry is the simplest approach for deploying a scikit-learn model for inference. It automatically handles packaging, dependency management, and creates inference functions callable from SQL. A vectorized UDF (C) also works but requires more manual setup. SPCS (A) is overkill for a lightweight model. External deployment (D) moves data outside Snowflake.

Question 2

When should you use Snowpark Container Services instead of a UDF for model serving?

A) When the model is a simple scikit-learn classifier
B) When you need GPU-accelerated inference for a large deep learning model
C) When you only need batch scoring once per week
D) When the model has no external dependencies

Answer: B — Snowpark Container Services is the right choice when you need GPU compute, custom container environments, or complex serving logic that exceeds UDF capabilities. Large deep learning models (PyTorch, TensorFlow) often require GPU acceleration and custom serving frameworks that only SPCS can provide within Snowflake.

Question 3

A company needs to score their entire customer table (10 million rows) every morning at 6 AM with a churn prediction model. Which Snowflake feature should they use to automate this?

A) A Snowflake Stream
B) A Snowflake Task with a CRON schedule
C) A manual stored procedure call each morning
D) A Snowflake Alert

Answer: B — Snowflake Tasks with CRON schedules are designed for automated, recurring jobs. The task can call a stored procedure or execute SQL that scores the table using the model registry or a UDF. Streams (A) are for change data capture, not scheduling. Manual calls (C) are not automated. Alerts (D) are for conditional notifications, not scheduled computation.

Question 4

Which Model Registry method is used to run inference on new data using a registered model?

A) model_ref.predict(df)
B) model_ref.run(df, function_name="predict")
C) model_ref.score(df)
D) model_ref.infer(df)

Answer: B — The run() method on a model version reference is the correct way to execute inference. You specify the function_name parameter ("predict", "predict_proba", "transform") depending on what the model supports. The other method names are not part of the Model Registry API.