Advanced

ML Backend Integration

Connect machine learning models to Label Studio for pre-annotations, active learning, and ML-assisted labeling that dramatically speeds up the annotation process.

What is ML-Assisted Labeling?

ML-assisted labeling uses a trained model to generate initial annotations (pre-annotations) that human annotators then verify and correct. This approach can reduce labeling time by 50-80% compared to manual annotation from scratch.

ML-Assisted Workflow
# ML-Assisted Labeling Workflow

1. Train Initial Model
   → Train on small manually-labeled dataset

2. Generate Pre-annotations
   → Model predicts labels for unlabeled data

3. Human Review
   → Annotators verify/correct predictions

4. Retrain Model
   → Model improves with more labeled data

5. Repeat
   → Each iteration gets faster and more accurate
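The loop above can be sketched in plain Python. Here, train(), predict_batch(), and human_review() are hypothetical stand-ins for your own training, inference, and annotation steps:

```python
# Sketch of the ML-assisted labeling loop. The three helper functions
# are hypothetical placeholders, not Label Studio APIs.

def train(labeled):
    """Stand-in: fit a model on the labeled examples."""
    return {"trained_on": len(labeled)}

def predict_batch(model, unlabeled):
    """Stand-in: produce (item, predicted_label) pre-annotations."""
    return [(item, "positive") for item in unlabeled]

def human_review(pre_annotations):
    """Stand-in: annotators verify/correct each prediction."""
    return [(item, label) for item, label in pre_annotations]

labeled = [("great product", "positive")]   # small seed set (step 1)
unlabeled = ["not bad", "awful service"]

for iteration in range(3):                  # step 5: repeat
    model = train(labeled)                  # steps 1/4: train or retrain
    pre = predict_batch(model, unlabeled)   # step 2: pre-annotations
    labeled += human_review(pre)            # step 3: human review
    unlabeled = []                          # reviewed items are now labeled
```

Each pass grows the labeled set, so the next retraining sees more data than the last.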

Setting Up the ML Backend SDK

Label Studio provides a Python SDK for creating ML backends. Install it and scaffold a new backend:

Bash
# Install the ML backend SDK
pip install label-studio-ml

# Create a new ML backend project
label-studio-ml create my_ml_backend

# Project structure:
my_ml_backend/
  model.py          # Your ML model code
  _wsgi.py          # WSGI entry point
  requirements.txt  # Dependencies
  docker-compose.yml

Building a Custom ML Backend

Create a model class that extends LabelStudioMLBase. Implement predict(), and optionally fit() if you want the backend to retrain on new annotations:

Python - model.py
from label_studio_ml.model import LabelStudioMLBase

class MyMLBackend(LabelStudioMLBase):
    def setup(self):
        """Initialize your model here."""
        # Load a pre-trained model
        self.model = self.load_model()

    def predict(self, tasks, **kwargs):
        """Generate predictions for tasks."""
        predictions = []
        for task in tasks:
            # Get the data from the task
            text = task["data"]["text"]

            # Run your model
            result = self.model.predict(text)

            # Format as Label Studio annotation
            predictions.append({
                "result": [{
                    "from_name": "sentiment",
                    "to_name": "text",
                    "type": "choices",
                    "value": {
                        "choices": [result["label"]]
                    }
                }],
                "score": result["confidence"]
            })
        return predictions

    def fit(self, event, data, **kwargs):
        """Train your model on new annotations."""
        # Called when an annotation is created or updated
        annotation = data["annotation"]
        # Retrain or fine-tune your model
        self.model.train(annotation)
        return {"model_version": "v2"}
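Because predict() returns plain dictionaries, the formatting logic can be unit-tested without a running Label Studio instance. A minimal sketch, using a hypothetical DummyModel in place of a real classifier:

```python
# Testing the prediction format in isolation. DummyModel is a
# hypothetical stand-in for a real sentiment classifier.

class DummyModel:
    def predict(self, text):
        label = "positive" if "good" in text else "negative"
        return {"label": label, "confidence": 0.9}

def format_prediction(result):
    """Wrap a raw model result in Label Studio's prediction format."""
    return {
        "result": [{
            "from_name": "sentiment",   # must match the control tag name
            "to_name": "text",          # must match the data tag name
            "type": "choices",
            "value": {"choices": [result["label"]]},
        }],
        "score": result["confidence"],
    }

model = DummyModel()
tasks = [{"data": {"text": "good service"}}, {"data": {"text": "slow"}}]
predictions = [
    format_prediction(model.predict(task["data"]["text"])) for task in tasks
]
```

Note that "from_name" and "to_name" must match the names used in your project's labeling configuration, or the predictions will be silently ignored.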

Running the ML Backend

Bash
# Start the ML backend server
label-studio-ml start my_ml_backend -p 9090

# Or with Docker
docker-compose up -d

# Connect to Label Studio:
# 1. Go to Project Settings > Machine Learning
# 2. Add ML Backend URL: http://localhost:9090
# 3. Enable "Use for interactive preannotations"

Pre-built ML Backends

Label Studio provides several pre-built ML backends for common tasks:


Object Detection

YOLOv5/v8 backend for generating bounding box pre-annotations on images. Supports custom-trained YOLO models.


NER with Transformers

HuggingFace Transformers backend for named entity recognition. Uses BERT, RoBERTa, or other pre-trained models.


LLM Backend

Use GPT-4, Claude, or other LLMs for text classification, sentiment analysis, and text generation pre-annotations.


SAM (Segment Anything)

Meta's Segment Anything Model for interactive segmentation. Click on objects to generate precise masks.

Active Learning

Active learning selects the most informative samples for labeling, maximizing model improvement per labeled sample:

Python - Active Learning Strategy
# Active learning: prioritize uncertain predictions
def predict(self, tasks, **kwargs):
    predictions = []
    for task in tasks:
        result = self.model.predict(task["data"])

        # Lower score = more uncertain = higher priority
        confidence = result["confidence"]
        predictions.append({
            "result": [...],
            "score": confidence  # Label Studio sorts by score
        })
    return predictions

# In Label Studio settings:
# Set task ordering to "Predictions score" ascending
# This surfaces the most uncertain samples first

Active learning strategy: Start by manually labeling 100-200 samples. Train a model, generate predictions with confidence scores, then have annotators focus on low-confidence predictions. Retrain every 200-500 new annotations.

Pre-annotation bias: Be careful with ML-assisted labeling: annotators may accept incorrect pre-annotations without reviewing them closely. Include some tasks without pre-annotations to measure annotator attention and prevent confirmation bias.
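One common way to quantify "uncertain" is entropy over the model's class probabilities: a near-uniform distribution means the model cannot decide. A minimal sketch in pure Python, with hypothetical softmax outputs:

```python
import math

def entropy(probs):
    """Shannon entropy of a class distribution; higher = more uncertain."""
    return -sum(p * math.log(p) for p in probs if p > 0)

# Hypothetical class probabilities for three unlabeled tasks
task_probs = {
    "task_1": [0.98, 0.02],   # confident prediction
    "task_2": [0.55, 0.45],   # nearly a coin flip -> label this first
    "task_3": [0.80, 0.20],
}

# Rank tasks most-uncertain first for annotator review
ranked = sorted(task_probs, key=lambda t: entropy(task_probs[t]), reverse=True)
```

In the Label Studio flow above, you would map low entropy to a high "score" (and vice versa) so that ascending score ordering surfaces the uncertain tasks first.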

What's Next?

In the final lesson, we will cover best practices for team workflows, quality control, export strategies, and scaling your annotation pipeline.