Model Deployment & Serving
Deploy ML models for production inference using Snowpark Container Services, the model registry, UDF-based serving, and batch inference patterns — all within Snowflake's secure perimeter.
Snowflake Model Registry
The Model Registry is Snowflake's centralized service for versioning, managing, and deploying ML models. It is a critical exam topic.
from snowflake.ml.registry import Registry
# Initialize the registry
registry = Registry(session=session, database_name="ML_DB", schema_name="ML_REGISTRY")
# Log a model
model_version = registry.log_model(
    model_name="churn_classifier",
    version_name="v1",
    model=trained_model,  # Snowpark ML, scikit-learn, XGBoost, etc.
    conda_dependencies=["scikit-learn"],
    sample_input_data=df_sample,  # For signature inference
    comment="Random Forest churn model, AUC=0.92"
)
# List registered models
registry.show_models()
# Get a specific model version
model_ref = registry.get_model("churn_classifier").version("v1")
# Run inference using the registered model
predictions = model_ref.run(df_new_data, function_name="predict")
Exam tip: know the log_model, get_model, and run methods.
Model Registry Key Concepts
- Model name: A logical identifier for the model (e.g., "churn_classifier")
- Version: Each model can have multiple versions (v1, v2, production, staging)
- Model signature: Automatically inferred input/output schema from sample data
- Conda dependencies: Required packages are packaged with the model for reproducibility
- Inference functions: predict, predict_proba, transform — automatically generated based on the model type
- Tags and comments: Metadata for tracking experiments and model lineage
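The model signature concept above can be illustrated locally: passing sample_input_data lets the registry record each input column's name and type, much like inspecting a DataFrame's dtypes. A minimal sketch, assuming plain pandas with no Snowflake session; the infer_signature helper below is hypothetical, not the registry's actual API:

```python
import pandas as pd

# Hypothetical illustration only: the real registry inspects
# sample_input_data to record each input column's name and type.
def infer_signature(sample: pd.DataFrame) -> dict:
    return {col: str(dtype) for col, dtype in sample.dtypes.items()}

df_sample = pd.DataFrame({
    "SPEND": [120.5, 80.0],
    "VISITS": [3, 7],
})
sig = infer_signature(df_sample)
print(sig)  # mapping of column name to inferred dtype
```

Mismatches between this recorded schema and later inference input are a common source of runtime errors, which is why sample_input_data should be representative.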
UDF-Based Model Serving
For lightweight inference, you can wrap a trained model in a UDF that runs within SQL queries.
import joblib
import pandas as pd
from snowflake.snowpark.functions import pandas_udf
from snowflake.snowpark.types import PandasSeriesType, FloatType

# Load a model file staged at @ML_MODELS inside a vectorized UDF
@pandas_udf(return_type=PandasSeriesType(FloatType()),
            input_types=[PandasSeriesType(FloatType()),
                         PandasSeriesType(FloatType()),
                         PandasSeriesType(FloatType())],
            packages=["joblib", "scikit-learn"],
            imports=["@ML_MODELS/model.pkl"])
def predict_churn(spend: pd.Series, visits: pd.Series,
                  days_since: pd.Series) -> pd.Series:
    import os
    import sys
    # Imported stage files are unpacked into the directory recorded in
    # sys._xoptions["snowflake_import_directory"]
    import_dir = sys._xoptions["snowflake_import_directory"]
    model = joblib.load(os.path.join(import_dir, "model.pkl"))
    features = pd.DataFrame({
        "SPEND": spend, "VISITS": visits, "DAYS_SINCE": days_since
    })
    return pd.Series(model.predict_proba(features)[:, 1])

# Use in DataFrame queries
df_scored = session.table("NEW_CUSTOMERS").select(
    "CUSTOMER_ID",
    predict_churn("SPEND_SCALED", "VISITS_SCALED", "DAYS_SINCE_LAST")
        .alias("CHURN_PROBABILITY")
)
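Because a vectorized UDF is just a Series-in, Series-out function, its body can be exercised locally before deployment. A hedged sketch with a stand-in model; the StubModel class and predict_churn_local helper are assumptions used here in place of the real pickled classifier, and no Snowflake session is required:

```python
import numpy as np
import pandas as pd

class StubModel:
    """Stand-in for the pickled classifier: a fixed logistic function."""
    def predict_proba(self, X: pd.DataFrame) -> np.ndarray:
        z = 0.01 * X["SPEND"] - 0.2 * X["VISITS"] + 0.05 * X["DAYS_SINCE"]
        p = 1.0 / (1.0 + np.exp(-z))
        return np.column_stack([1.0 - p, p])  # mimic sklearn's (n, 2) shape

def predict_churn_local(spend: pd.Series, visits: pd.Series,
                        days_since: pd.Series, model) -> pd.Series:
    # Same logic as the UDF body: assemble features, take class-1 probability
    features = pd.DataFrame({
        "SPEND": spend, "VISITS": visits, "DAYS_SINCE": days_since
    })
    return pd.Series(model.predict_proba(features)[:, 1])

scores = predict_churn_local(
    pd.Series([100.0, 500.0]), pd.Series([1, 10]),
    pd.Series([30, 2]), StubModel()
)
print(scores)  # probabilities in [0, 1]
```

Testing the body this way catches feature-name and shape bugs cheaply, since errors inside a deployed UDF surface only as query failures.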
Note: a UDF gives you full control over serving logic, but the Model Registry approach (model_ref.run()) is simpler and handles dependency management automatically. The exam tests when to use each approach.
Snowpark Container Services
Snowpark Container Services (SPCS) lets you run containerized applications (Docker) directly inside Snowflake. This is the most flexible option for complex ML serving, including GPU-accelerated inference.
Key SPCS Concepts
Compute Pool
A set of Snowflake-managed compute nodes that run your containers. You specify the instance family (CPU or GPU) and the min/max number of nodes for auto-scaling.
Service
A running containerized application within a compute pool. Services can expose HTTP endpoints for real-time inference or run as background jobs.
Image Repository
Snowflake's built-in container registry where you push Docker images. Images are stored securely and versioned within your Snowflake account.
Service Function
A SQL function that routes requests to a running service, enabling you to call containerized inference from SQL queries seamlessly.
-- Create a compute pool for ML inference
CREATE COMPUTE POOL ml_inference_pool
MIN_NODES = 1
MAX_NODES = 3
INSTANCE_FAMILY = GPU_NV_S;
-- Create a service from a container image
-- (service_spec.yaml is assumed to be staged at @ML_MODELS)
CREATE SERVICE ml_serving_service
  IN COMPUTE POOL ml_inference_pool
  FROM @ML_MODELS
  SPECIFICATION_FILE = 'service_spec.yaml'
  MIN_INSTANCES = 1
  MAX_INSTANCES = 3;
-- Create a service function for SQL access
CREATE FUNCTION predict_with_model(input VARIANT)
RETURNS VARIANT
SERVICE = ml_serving_service
ENDPOINT = 'predict'
AS '/predict';
-- Use in queries
SELECT
CUSTOMER_ID,
predict_with_model(OBJECT_CONSTRUCT(
'spend', TOTAL_SPEND,
'visits', VISIT_COUNT
)) AS PREDICTION
FROM CUSTOMERS;
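Inside the container, the /predict endpoint receives rows in Snowflake's external-function style batch format, {"data": [[row_index, arg, ...], ...]}, and must return results keyed by the same row indices. A framework-free sketch of the handler logic; the score() helper is a hypothetical placeholder for real model inference:

```python
import json

def score(payload: dict) -> float:
    # Hypothetical stand-in for real model inference inside the container
    return 0.01 * payload.get("spend", 0.0) + 0.02 * payload.get("visits", 0.0)

def handle_predict(request_body: str) -> str:
    # Each row arrives as [row_index, arg]; here arg is the VARIANT built
    # by OBJECT_CONSTRUCT in the calling query
    rows = json.loads(request_body)["data"]
    results = [[idx, score(obj)] for idx, obj in rows]
    return json.dumps({"data": results})

body = json.dumps({"data": [
    [0, {"spend": 100.0, "visits": 5}],
    [1, {"spend": 50.0, "visits": 2}],
]})
response = json.loads(handle_predict(body))
print(response)
```

In a real service this logic would sit behind an HTTP server (for example Flask or FastAPI) listening on the port named by the endpoint in service_spec.yaml.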
Batch Inference
For scoring large datasets on a schedule, Snowflake provides several batch inference patterns.
Using Tasks for Scheduled Scoring
-- Create a task that runs batch scoring daily
CREATE OR REPLACE TASK daily_churn_scoring
WAREHOUSE = ML_WH
SCHEDULE = 'USING CRON 0 6 * * * UTC' -- 6 AM UTC daily
AS
INSERT INTO CHURN_PREDICTIONS
SELECT
CUSTOMER_ID,
churn_classifier!PREDICT(
INPUT_DATA => OBJECT_CONSTRUCT(*)
):class AS PREDICTED_CHURN,
CURRENT_TIMESTAMP() AS SCORED_AT
FROM CUSTOMER_FEATURES
WHERE LAST_ACTIVITY_DATE >= DATEADD('day', -90, CURRENT_DATE());
-- Enable the task
ALTER TASK daily_churn_scoring RESUME;
Using Stored Procedures for Batch Scoring
from snowflake.snowpark import Session
from snowflake.snowpark.functions import sproc

@sproc(packages=["snowflake-ml-python"])
def batch_score(session: Session, input_table: str, output_table: str) -> str:
    from snowflake.ml.registry import Registry

    # Load the production model version from the registry
    registry = Registry(session=session, database_name="ML_DB",
                        schema_name="ML_REGISTRY")
    model_ref = registry.get_model("churn_classifier").version("production")

    # Score the full table
    df_input = session.table(input_table)
    df_scored = model_ref.run(df_input, function_name="predict_proba")

    # Write results
    df_scored.write.mode("overwrite").save_as_table(output_table)
    return f"Scored {df_scored.count()} rows"
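The load-score-write pattern above can be rehearsed locally with plain pandas before wrapping it in a stored procedure. A minimal sketch under stated assumptions: stub_predict_proba stands in for model_ref.run(), a dict of DataFrames stands in for Snowflake tables, and all names are illustrative:

```python
import pandas as pd

def stub_predict_proba(df: pd.DataFrame) -> pd.DataFrame:
    # Hypothetical stand-in for model_ref.run(df, function_name="predict_proba")
    out = df.copy()
    out["CHURN_PROBABILITY"] = (df["SPEND"].rank(pct=True)
                                + df["VISITS"].rank(pct=True)) / 2
    return out

def batch_score_local(tables: dict, input_table: str, output_table: str) -> str:
    # Mirror the stored procedure: read input, score, write output
    df_input = tables[input_table]
    df_scored = stub_predict_proba(df_input)
    df_scored["SCORED_AT"] = pd.Timestamp.now(tz="UTC")
    tables[output_table] = df_scored  # "overwrite" write semantics
    return f"Scored {len(df_scored)} rows"

tables = {"CUSTOMER_FEATURES": pd.DataFrame({"SPEND": [10.0, 200.0, 35.0],
                                             "VISITS": [1, 9, 4]})}
msg = batch_score_local(tables, "CUSTOMER_FEATURES", "CHURN_PREDICTIONS")
print(msg)  # Scored 3 rows
```

Note the design choice between overwrite and append: appending with a SCORED_AT column preserves score history, which the monitoring queries below rely on.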
Model Monitoring
Production models require monitoring for data drift and performance degradation.
- Data drift detection: Compare feature distributions between training data and incoming data using SQL statistical functions
- Prediction drift: Monitor the distribution of model predictions over time for shifts
- Performance tracking: When ground truth labels become available, compute metrics against predictions and alert on degradation
- Snowflake Alerts: Use Snowflake's ALERT feature to trigger notifications when drift exceeds thresholds
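The drift checks above can be prototyped in Python before being wired into SQL. One common drift statistic, used here as an assumption rather than anything Snowflake mandates, is the Population Stability Index (PSI), which compares binned distributions of a feature between a baseline sample and incoming data:

```python
import numpy as np

def population_stability_index(expected: np.ndarray, actual: np.ndarray,
                               bins: int = 10) -> float:
    """PSI between a baseline and a new sample; > 0.2 is a common
    rule-of-thumb drift threshold (not a Snowflake setting)."""
    edges = np.histogram_bin_edges(expected, bins=bins)
    # Clip new data into the baseline range so tail drift lands in edge bins
    actual = np.clip(actual, edges[0], edges[-1])
    e_pct = np.histogram(expected, bins=edges)[0] / len(expected)
    a_pct = np.histogram(actual, bins=edges)[0] / len(actual)
    # Floor each bin's share at a small epsilon to avoid log(0)
    e_pct = np.clip(e_pct, 1e-6, None)
    a_pct = np.clip(a_pct, 1e-6, None)
    return float(np.sum((a_pct - e_pct) * np.log(a_pct / e_pct)))

rng = np.random.default_rng(0)
baseline = rng.normal(0.0, 1.0, 5000)
same = rng.normal(0.0, 1.0, 5000)     # no drift expected
shifted = rng.normal(0.8, 1.0, 5000)  # clear mean shift
print(population_stability_index(baseline, same))     # small value
print(population_stability_index(baseline, shifted))  # noticeably larger
```

The same binning and comparison can be expressed in SQL with WIDTH_BUCKET and aggregate functions, feeding the thresholds used by the alert below.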
-- Monitor prediction distribution drift
CREATE OR REPLACE ALERT prediction_drift_alert
WAREHOUSE = ML_WH
SCHEDULE = 'USING CRON 0 8 * * * UTC'
IF (EXISTS (
SELECT 1 FROM (
SELECT
AVG(CHURN_PROBABILITY) AS AVG_SCORE,
STDDEV(CHURN_PROBABILITY) AS STD_SCORE
FROM CHURN_PREDICTIONS
WHERE SCORED_AT >= DATEADD('day', -1, CURRENT_TIMESTAMP())
)
WHERE AVG_SCORE > 0.5 OR STD_SCORE > 0.3 -- Drift thresholds
))
THEN
CALL SYSTEM$SEND_EMAIL(
    'ml_alerts_integration',  -- notification integration (name is illustrative)
    'ml-team@company.com',
    'Prediction Drift Alert',
    'Churn model predictions have drifted beyond thresholds.');
Practice Questions
Question 1
A data scientist has trained a scikit-learn classifier and wants the simplest way to deploy it for inference inside Snowflake. What should they do?
A) Deploy using Snowpark Container Services with a GPU compute pool
B) Register the model in the Model Registry and use model_ref.run()
C) Wrap the model in a vectorized UDF with the model file imported from a stage
D) Export the model and deploy on an external REST API
Answer: B — The Model Registry is the simplest approach for deploying a scikit-learn model for inference. It automatically handles packaging, dependency management, and creates inference functions callable from SQL. A vectorized UDF (C) also works but requires more manual setup. SPCS (A) is overkill for a lightweight model. External deployment (D) moves data outside Snowflake.
Question 2
In which scenario is Snowpark Container Services the most appropriate deployment option?
A) When the model is a simple scikit-learn classifier
B) When you need GPU-accelerated inference for a large deep learning model
C) When you only need batch scoring once per week
D) When the model has no external dependencies
Answer: B — Snowpark Container Services is the right choice when you need GPU compute, custom container environments, or complex serving logic that exceeds UDF capabilities. Large deep learning models (PyTorch, TensorFlow) often require GPU acceleration and custom serving frameworks that only SPCS can provide within Snowflake.
Question 3
A team needs to score a customer table automatically every morning. Which Snowflake feature should schedule this recurring job?
A) A Snowflake Stream
B) A Snowflake Task with a CRON schedule
C) A manual stored procedure call each morning
D) A Snowflake Alert
Answer: B — Snowflake Tasks with CRON schedules are designed for automated, recurring jobs. The task can call a stored procedure or execute SQL that scores the table using the model registry or a UDF. Streams (A) are for change data capture, not scheduling. Manual calls (C) are not automated. Alerts (D) are for conditional notifications, not scheduled computation.
Question 4
Which method call correctly runs inference using a model version retrieved from the Model Registry?
A) model_ref.predict(df)
B) model_ref.run(df, function_name="predict")
C) model_ref.score(df)
D) model_ref.infer(df)
Answer: B — The run() method on a model version reference is the correct way to execute inference. You specify the function_name parameter ("predict", "predict_proba", "transform") depending on what the model supports. The other method names are not part of the Model Registry API.
Lilly Tech Systems