AWS Neuron SDK
Master the AWS Neuron SDK — the software stack that enables you to compile, optimize, and run ML models on Inferentia and Trainium hardware.
Neuron SDK Components
The Neuron SDK is a comprehensive software stack with several key components:
Neuron Compiler
Compiles ML models into optimized instructions for NeuronCores. Handles graph optimization, operator fusion, and memory allocation.
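Operator fusion merges adjacent operations into a single kernel so intermediate results never round-trip through memory. A conceptual sketch in plain Python (an illustration of the idea, not the Neuron compiler's actual pass):

```python
# Conceptual illustration of operator fusion (plain Python, not Neuron internals).
# Unfused: each op materializes a full intermediate buffer (extra memory traffic).
def unfused(xs, bias, scale):
    added = [x + bias for x in xs]     # intermediate buffer written out
    return [a * scale for a in added]  # read back in, written out again

# Fused: one pass over the data, no intermediate buffer.
def fused(xs, bias, scale):
    return [(x + bias) * scale for x in xs]

xs = [1.0, 2.0, 3.0]
assert unfused(xs, bias=0.5, scale=2.0) == fused(xs, bias=0.5, scale=2.0)
```

On real hardware the payoff is bandwidth: the fused version touches each element once instead of three times, which is exactly the kind of saving graph-level compilers look for.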
Neuron Runtime
Manages model execution on NeuronCores, handles memory management, and provides multi-model support on a single instance.
Neuron Tools
Monitoring and profiling utilities including neuron-top, neuron-monitor, and neuron-profile for performance analysis.
Framework Integrations
Native integrations with PyTorch (torch-neuronx), TensorFlow (tensorflow-neuronx), and JAX for seamless development.
PyTorch Integration (torch-neuronx)
The most popular framework integration. Key APIs for inference:
import torch
import torch_neuronx
from transformers import AutoModelForSequenceClassification, AutoTokenizer
# Load model and tokenizer
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased")
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
# Create an example input for compilation; input shapes are fixed at trace time
inputs = tokenizer("Hello world", return_tensors="pt")
example = (inputs["input_ids"], inputs["attention_mask"])
# Put the model in inference mode, then compile for Neuron (trace the model)
model.eval()
model_neuron = torch_neuronx.trace(model, example)
# Save the compiled model as TorchScript
torch.jit.save(model_neuron, "model_neuron.pt")
# Load and run inference
model_loaded = torch.jit.load("model_neuron.pt")
output = model_loaded(*example)
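Once a model is compiled, it is worth measuring steady-state latency after a few warm-up calls, since the first invocations can include one-time initialization. A minimal timing helper, shown here with a stand-in callable in place of the compiled model:

```python
import time

def benchmark(fn, *args, warmup=3, iters=20):
    """Return per-call latencies in milliseconds after warm-up runs."""
    for _ in range(warmup):  # warm-up: exclude one-time setup costs
        fn(*args)
    latencies = []
    for _ in range(iters):
        start = time.perf_counter()
        fn(*args)
        latencies.append((time.perf_counter() - start) * 1000.0)
    return latencies

# Stand-in for the compiled model; on a Neuron instance, pass model_loaded and
# its example inputs instead.
dummy_model = lambda x: [v * 2 for v in x]
lats = benchmark(dummy_model, [1, 2, 3])
print(f"p50 ~ {sorted(lats)[len(lats) // 2]:.3f} ms over {len(lats)} runs")
```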
Training with Neuron
import torch
import torch_xla.core.xla_model as xm
# Get the Neuron (XLA) device
device = xm.xla_device()
# Move model and data to the Neuron device
model = model.to(device)
inputs = inputs.to(device)
# Training loop (similar to standard PyTorch)
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
for epoch in range(num_epochs):
    optimizer.zero_grad()
    outputs = model(**inputs)
    loss = outputs.loss
    loss.backward()
    xm.optimizer_step(optimizer)  # Neuron-aware step: reduces gradients and triggers graph execution
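The reason the optimizer step matters here is that torch-neuronx runs on torch-xla's lazy tensor system: operations are recorded into a graph and only executed when a step boundary materializes it. A toy model of that record-then-execute flow (pure Python, not torch_xla internals):

```python
# Toy model of XLA-style lazy execution (illustration only, not torch_xla internals).
class LazyGraph:
    def __init__(self):
        self.pending = []  # recorded ops, not yet executed

    def record(self, op, *args):
        self.pending.append((op, args))

    def mark_step(self):
        """Execute every recorded op as one batch, then clear the graph."""
        results = [op(*args) for op, args in self.pending]
        self.pending.clear()
        return results

graph = LazyGraph()
graph.record(lambda a, b: a + b, 2, 3)  # nothing runs yet
graph.record(lambda a, b: a * b, 2, 3)
assert graph.pending                    # ops are queued, not executed
assert graph.mark_step() == [5, 6]      # execution happens at the step boundary
assert not graph.pending
```

Deferring execution this way lets the compiler see the whole training step as one graph, which is what enables the fusion and scheduling optimizations described above.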
Neuron Monitoring Tools
| Tool | Purpose | Usage |
|---|---|---|
| neuron-top | Real-time NeuronCore utilization | neuron-top |
| neuron-monitor | JSON metrics for CloudWatch integration | neuron-monitor |
| neuron-profile | Detailed execution profiling | neuron-profile capture |
| neuron-ls | List available Neuron devices | neuron-ls |
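neuron-monitor emits periodic JSON reports that can be piped into a small script and forwarded to CloudWatch. A sketch of extracting a utilization figure from one report; the field names below are illustrative placeholders, not the tool's exact schema (consult the neuron-monitor user guide for the real one):

```python
import json

# Illustrative report shape; neuron-monitor's real schema differs.
sample_report = json.dumps({
    "neuroncore_utilization": {"core_0": 72.5, "core_1": 68.0}
})

def max_core_utilization(report_line):
    """Return the busiest NeuronCore's utilization from one JSON report line."""
    report = json.loads(report_line)
    cores = report.get("neuroncore_utilization", {})
    return max(cores.values(), default=0.0)

print(max_core_utilization(sample_report))  # -> 72.5
```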
Use neuron_parallel_compile to speed up training startup: it extracts the model's compute graphs from a short trial run, compiles them in parallel, and populates the compilation cache so the real run starts with pre-compiled graphs.
Lilly Tech Systems