ML Backend Integration
Connect machine learning models to Label Studio for pre-annotations, active learning, and ML-assisted labeling that dramatically speeds up the annotation process.
What is ML-Assisted Labeling?
ML-assisted labeling uses a trained model to generate initial annotations (pre-annotations) that human annotators then verify and correct. This approach can reduce labeling time by 50-80% compared to manual annotation from scratch.
# ML-Assisted Labeling Workflow
1. Train Initial Model → Train on small manually-labeled dataset
2. Generate Pre-annotations → Model predicts labels for unlabeled data
3. Human Review → Annotators verify/correct predictions
4. Retrain Model → Model improves with more labeled data
5. Repeat → Each iteration gets faster and more accurate
Setting Up the ML Backend SDK
Label Studio provides a Python SDK for creating ML backends. Install it and scaffold a new backend:
# Install the ML backend SDK
pip install label-studio-ml

# Create a new ML backend project
label-studio-ml create my_ml_backend

# Project structure:
my_ml_backend/
    model.py            # Your ML model code
    _wsgi.py            # WSGI entry point
    requirements.txt    # Dependencies
    docker-compose.yml
Building a Custom ML Backend
Create a model class that extends LabelStudioMLBase. Use setup() to load your model, implement predict() to return pre-annotations, and optionally implement fit() to retrain on new annotations:
from label_studio_ml.model import LabelStudioMLBase


class MyMLBackend(LabelStudioMLBase):

    def setup(self):
        """Initialize your model here."""
        # Load a pre-trained model.
        # load_model() is a placeholder: define it to return your own model object.
        self.model = self.load_model()

    def predict(self, tasks, **kwargs):
        """Generate predictions for tasks."""
        predictions = []
        for task in tasks:
            # Get the data from the task
            text = task["data"]["text"]

            # Run your model (assumed to return a dict with "label" and "confidence")
            result = self.model.predict(text)

            # Format as Label Studio annotation.
            # from_name/to_name must match the tag names in your labeling config.
            predictions.append({
                "result": [{
                    "from_name": "sentiment",
                    "to_name": "text",
                    "type": "choices",
                    "value": {
                        "choices": [result["label"]]
                    }
                }],
                "score": result["confidence"]
            })
        return predictions

    def fit(self, event, data, **kwargs):
        """Train your model on new annotations."""
        # Called when annotations are created/updated
        annotation = data["annotation"]

        # Retrain or fine-tune your model (train() is a placeholder on your model object)
        self.model.train(annotation)
        return {"model_version": "v2"}
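To inspect the shape of the predictions without installing the SDK, the per-task formatting step in predict() can be pulled out into a plain function. This is a minimal sketch, not the SDK's own code: StubModel, format_choice_prediction, and predict_tasks are hypothetical names, and the from_name/to_name defaults assume a labeling config whose choices control is named "sentiment" and whose text object is named "text", as in the class above.

```python
from typing import Any, Dict, List


class StubModel:
    """Hypothetical stand-in for a real classifier (illustration only)."""

    def predict(self, text: str) -> Dict[str, Any]:
        # Toy rule: texts containing "good" are treated as positive.
        if "good" in text.lower():
            return {"label": "Positive", "confidence": 0.9}
        return {"label": "Negative", "confidence": 0.6}


def format_choice_prediction(label: str, confidence: float,
                             from_name: str = "sentiment",
                             to_name: str = "text") -> Dict[str, Any]:
    """Wrap one classification result in the prediction structure shown above."""
    return {
        "result": [{
            "from_name": from_name,
            "to_name": to_name,
            "type": "choices",
            "value": {"choices": [label]},
        }],
        "score": confidence,
    }


def predict_tasks(model: StubModel, tasks: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Mirror the loop in predict(): exactly one prediction per task, in order."""
    predictions = []
    for task in tasks:
        output = model.predict(task["data"]["text"])
        predictions.append(
            format_choice_prediction(output["label"], output["confidence"])
        )
    return predictions


tasks = [
    {"data": {"text": "A good product"}},
    {"data": {"text": "Broke in a day"}},
]
predictions = predict_tasks(StubModel(), tasks)
```

Keeping the formatting in a separate function makes it easy to unit-test the output structure before wiring it into a running backend.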
Running the ML Backend
# Start the ML backend server
label-studio-ml start my_ml_backend -p 9090

# Or with Docker
docker-compose up -d

# Connect to Label Studio:
# 1. Go to Project Settings > Machine Learning
# 2. Add ML Backend URL: http://localhost:9090
# 3. Enable "Use for interactive preannotations"
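Before adding the URL in the project settings, it can help to confirm that the server is reachable. The sketch below assumes the backend answers GET /health with HTTP 200 when it is up; that route is an assumption here, so check it against the version you have installed. The function name is_backend_healthy is hypothetical, and only the Python standard library is used.

```python
import urllib.error
import urllib.request


def is_backend_healthy(base_url: str, timeout: float = 3.0) -> bool:
    """Return True if GET <base_url>/health answers with HTTP 200."""
    url = base_url.rstrip("/") + "/health"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, non-2xx response, or malformed URL.
        return False
```

With the backend started on port 9090 as above, is_backend_healthy("http://localhost:9090") would be the call to try. A False result means the server is not reachable at that address or the health route differs in your version.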
Pre-built ML Backends
Label Studio provides several pre-built ML backends for common tasks:
Object Detection
YOLOv5/v8 backend for generating bounding box pre-annotations on images. Supports custom-trained YOLO models.
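Detectors usually report boxes in pixels, while Label Studio stores rectangle regions as percentages of the image size. The following sketch shows that conversion for a single box. The function name bbox_to_rectangle_result is hypothetical, and the from_name/to_name defaults assume a labeling config with a rectangle labels control named "label" attached to an image object named "image".

```python
from typing import Any, Dict


def bbox_to_rectangle_result(x_min: float, y_min: float,
                             x_max: float, y_max: float,
                             image_width: int, image_height: int,
                             label: str,
                             from_name: str = "label",
                             to_name: str = "image") -> Dict[str, Any]:
    """Convert a pixel-space box (top-left and bottom-right corners)
    into a rectangle region expressed in percent of the image size."""
    return {
        "from_name": from_name,
        "to_name": to_name,
        "type": "rectanglelabels",
        "original_width": image_width,
        "original_height": image_height,
        "value": {
            "x": 100.0 * x_min / image_width,            # left edge, percent
            "y": 100.0 * y_min / image_height,           # top edge, percent
            "width": 100.0 * (x_max - x_min) / image_width,
            "height": 100.0 * (y_max - y_min) / image_height,
            "rotation": 0,
            "rectanglelabels": [label],
        },
    }


region = bbox_to_rectangle_result(
    50, 100, 250, 300, image_width=500, image_height=400, label="Car"
)
print(region["value"])
```

A region like this goes into the "result" list of a prediction, one entry per detected box.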
NER with Transformers
HuggingFace Transformers backend for named entity recognition. Uses BERT, RoBERTa, or other pre-trained models.
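For NER, each predicted entity becomes one labeled span with character offsets. The sketch below assumes entity dictionaries with "entity_group", "start", "end", and "score" keys, similar in shape to aggregated token-classification output; that input shape, the function name entities_to_prediction, and the tag names "label" and "text" are all assumptions for illustration. Using the mean entity score as the prediction score is one possible choice, not a requirement.

```python
from typing import Any, Dict, List


def entities_to_prediction(text: str, entities: List[Dict[str, Any]],
                           from_name: str = "label",
                           to_name: str = "text") -> Dict[str, Any]:
    """Turn character-offset entities into one prediction with labeled spans."""
    results = []
    for entity in entities:
        start, end = entity["start"], entity["end"]  # end is exclusive
        results.append({
            "from_name": from_name,
            "to_name": to_name,
            "type": "labels",
            "value": {
                "start": start,
                "end": end,
                "text": text[start:end],
                "labels": [entity["entity_group"]],
            },
        })
    # Assumed design choice: average the entity scores; 0.0 when nothing was found.
    scores = [entity["score"] for entity in entities]
    mean_score = sum(scores) / len(scores) if scores else 0.0
    return {"result": results, "score": mean_score}


text = "Ada Lovelace worked in London."
entities = [
    {"entity_group": "PER", "start": 0, "end": 12, "score": 0.98},
    {"entity_group": "LOC", "start": 23, "end": 29, "score": 0.90},
]
prediction = entities_to_prediction(text, entities)
```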
LLM Backend
Use GPT-4, Claude, or other LLMs for text classification, sentiment analysis, and text generation pre-annotations.
SAM (Segment Anything)
Meta's Segment Anything Model for interactive segmentation. Click on objects to generate precise masks.
Active Learning
Active learning selects the most informative samples for labeling, maximizing model improvement per labeled sample:
# Active learning: prioritize uncertain predictions
def predict(self, tasks, **kwargs):
    predictions = []
    for task in tasks:
        result = self.model.predict(task["data"])

        # Lower score = more uncertain = higher priority
        confidence = result["confidence"]

        predictions.append({
            "result": [...],
            "score": confidence  # Label Studio sorts by score
        })
    return predictions

# In Label Studio settings:
# Set task ordering to "Predictions score" ascending
# This surfaces the most uncertain samples first
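The effect of ascending score ordering can be checked offline. This small sketch sorts a few scored tasks the same way, so the least confident prediction is labeled first; order_by_uncertainty and the task_id values are hypothetical and only illustrate the ordering, not how Label Studio implements it.

```python
from typing import Any, Dict, List


def order_by_uncertainty(scored_tasks: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Return tasks sorted by ascending score, least confident first."""
    return sorted(scored_tasks, key=lambda item: item["score"])


scored_tasks = [
    {"task_id": 1, "score": 0.97},
    {"task_id": 2, "score": 0.51},
    {"task_id": 3, "score": 0.74},
]
labeling_queue = [item["task_id"] for item in order_by_uncertainty(scored_tasks)]
print(labeling_queue)  # [2, 3, 1]
```

Task 2, with the lowest confidence, reaches annotators first, while the near-certain task 1 waits at the end of the queue.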
What's Next?
In the final lesson, we will cover best practices for team workflows, quality control, export strategies, and scaling your annotation pipeline.
Lilly Tech Systems