Best Practices
Learn team workflow management, quality control strategies, inter-annotator agreement, choosing export formats, and scaling your annotation pipeline for production ML projects.
Annotation Guidelines
The single most important factor for annotation quality is clear, comprehensive guidelines. Your guidelines should include:
- Definitions: Precise definition of each label category with boundary cases
- Examples: Positive and negative examples for each label
- Edge cases: How to handle ambiguous situations
- Conventions: Whether to include punctuation in NER spans, how tight bounding boxes should be, etc.
- Versioning: Track guideline changes and re-annotate affected tasks when definitions change
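The checklist above can also be kept in machine-readable form so each annotated task records which guideline version it was labeled under. A minimal sketch; the schema and field names here are illustrative assumptions, not a Label Studio feature:

```python
# Illustrative guideline record; this schema is an assumption,
# not part of Label Studio.
GUIDELINES = {
    "version": "2.1",
    "labels": {
        "Person": {
            "definition": "A named individual human.",
            "positive_examples": ["John", "Dr. Mary Smith"],
            "negative_examples": ["the CEO", "they"],
            "conventions": "Exclude titles unless part of the name.",
        },
    },
    "changelog": {
        "2.1": "Clarified that honorifics are excluded from Person spans.",
    },
}

def needs_reannotation(task_guideline_version: str,
                       current: str = GUIDELINES["version"]) -> bool:
    """Flag tasks annotated under an older guideline version."""
    return task_guideline_version != current
```

Storing the version alongside each task makes it cheap to find exactly which tasks to re-annotate after a definition changes.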
Quality Control Strategies
- Overlap / Redundancy: Have multiple annotators label the same tasks (typically 2-3x overlap). Use agreement metrics to identify problematic tasks and annotators who need retraining.
- Review Workflow: Assign senior annotators or domain experts as reviewers. They approve, reject, or correct annotations before they enter the training dataset.
- Gold Standard Tasks: Mix in pre-labeled "gold" tasks with known correct answers. Monitor annotator accuracy on these to detect quality drops.
- Spot Checks: Randomly sample completed annotations for manual review. Calculate per-annotator accuracy and provide feedback.
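Gold-standard monitoring from the list above fits in a few lines of code. The function below is a sketch: the annotation record shape (`annotator`, `task_id`, `label`) is an illustrative assumption, not Label Studio's export schema.

```python
from collections import defaultdict

def gold_accuracy(annotations, gold):
    """Per-annotator accuracy on gold tasks.

    `annotations`: list of {"annotator", "task_id", "label"} dicts
    (an illustrative shape, not Label Studio's export schema);
    `gold`: mapping of task_id -> correct label.
    """
    hits, totals = defaultdict(int), defaultdict(int)
    for a in annotations:
        if a["task_id"] in gold:  # only score tasks with known answers
            totals[a["annotator"]] += 1
            hits[a["annotator"]] += a["label"] == gold[a["task_id"]]
    return {who: hits[who] / totals[who] for who in totals}
```

Running this on each day's completed work gives an early-warning signal: a sudden drop in an annotator's gold accuracy usually precedes a drop in overall quality.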
Inter-Annotator Agreement
Measure how consistently your annotators label the same data. Common metrics include:
| Metric | Use Case | Range |
|---|---|---|
| Cohen's Kappa | Two annotators, categorical labels | -1 to 1 (>0.8 = excellent) |
| Fleiss' Kappa | Multiple annotators, categorical labels | -1 to 1 (>0.6 = good) |
| IoU (Jaccard) | Bounding boxes, segmentation | 0 to 1 (>0.7 = good) |
| F1 Score | NER span matching | 0 to 1 (>0.8 = good) |
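Cohen's kappa from the table above is straightforward to compute by hand: observed agreement minus chance agreement, normalized by the maximum possible improvement over chance. A minimal sketch for two annotators with categorical labels (it assumes the annotators do not agree perfectly by chance, i.e. expected agreement is below 1):

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same tasks."""
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    # Observed agreement: fraction of tasks where both chose the same label.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected chance agreement, from each annotator's label distribution.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    return (p_o - p_e) / (1 - p_e)
```

For example, with labels `["cat", "cat", "dog", "dog"]` and `["cat", "cat", "dog", "cat"]`, observed agreement is 0.75 and chance agreement is 0.5, giving kappa = 0.5.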
Export Format Selection
Choose your export format based on your ML framework and task type:
```
# Object Detection
YOLO        → YOLOv5/v8 training
COCO        → Detectron2, MMDetection
Pascal VOC  → TensorFlow Object Detection API

# NLP / Text
spaCy  → spaCy NER training
CoNLL  → Sequence labeling (CRF, BiLSTM)
JSON   → Custom pipelines, HuggingFace

# General
JSON      → Most flexible, full annotation data
JSON-MIN  → Simplified, smaller file size
CSV       → Classification tasks, spreadsheets
```
Using the API
Automate your annotation pipeline with the Label Studio API:
```python
from label_studio_sdk import Client

# Connect to Label Studio
ls = Client(
    url="http://localhost:8080",
    api_key="your-api-key",
)

# Create a project
project = ls.start_project(
    title="NER Project",
    label_config="""
    <View>
      <Labels name="ner" toName="text">
        <Label value="Person" />
        <Label value="Organization" />
      </Labels>
      <Text name="text" value="$text" />
    </View>
    """,
)

# Import tasks
project.import_tasks([
    {"text": "John works at Google."},
    {"text": "Mary founded Acme Corp."},
])

# Export annotations
annotations = project.export_tasks(export_type="JSON")
```
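Exported JSON can then be flattened into training rows. The sketch below assumes Label Studio's standard JSON export layout, where each task carries `data` plus `annotations[].result[].value` spans with `start`, `end`, and `labels`; treat the exact field layout as something to verify against your own export:

```python
def extract_ner_spans(tasks):
    """Flatten exported tasks into (text, start, end, label) rows.

    Assumes the standard Label Studio JSON export layout:
    each task has data.text and annotations[].result[].value spans.
    Verify field names against your own export before relying on this.
    """
    rows = []
    for task in tasks:
        text = task["data"]["text"]
        for annotation in task.get("annotations", []):
            for region in annotation.get("result", []):
                value = region["value"]
                for label in value.get("labels", []):
                    rows.append((text, value["start"], value["end"], label))
    return rows
```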
Scaling Tips
Use PostgreSQL
Switch from SQLite to PostgreSQL for projects with more than 10,000 tasks or multiple concurrent annotators.
Cloud Storage
Store data files in S3/GCS/Azure instead of uploading directly. This reduces server load and enables larger datasets.
Task Distribution
Use task assignment rules to distribute work evenly across annotators and prevent duplicate effort.
Monitor Progress
Track annotation speed, quality metrics, and remaining tasks. Set daily targets and identify bottlenecks early.
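The progress tracking described above amounts to simple throughput arithmetic; a back-of-the-envelope sketch where all numbers and names are illustrative:

```python
import math

def days_to_finish(total_tasks, done_tasks,
                   tasks_per_annotator_per_day, annotators):
    """Estimate remaining working days at current team throughput."""
    remaining = total_tasks - done_tasks
    daily_throughput = tasks_per_annotator_per_day * annotators
    return math.ceil(remaining / daily_throughput)
```

For example, 6,000 remaining tasks with four annotators each completing 300 tasks per day works out to five working days; rerunning the estimate daily surfaces bottlenecks as soon as throughput dips.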
Common Mistakes to Avoid
- Starting annotation without clear guidelines
- Not measuring inter-annotator agreement
- Using the wrong export format for your ML framework
- Not backing up your Label Studio database regularly
- Ignoring annotator feedback about ambiguous cases
Course Summary
Congratulations on completing the Label Studio course! You have learned how to install and configure Label Studio, create labeling templates for images and text, connect ML backends for assisted labeling, and implement quality control workflows for production annotation pipelines.
Lilly Tech Systems