Best Practices (Intermediate)

Deploying sentiment analysis in production requires handling messy real-world text, managing edge cases, and building reliable pipelines. This lesson covers the practical considerations that make the difference between a demo and a production system.

Preprocessing Social Media Text

Python
import re

def preprocess_social_media(text):
    # Preserve emojis (they carry sentiment!)
    # Normalize URLs
    text = re.sub(r"https?://\S+", "[URL]", text)
    # Normalize @mentions
    text = re.sub(r"@\w+", "[USER]", text)
    # Normalize repeated characters: "soooo goooood" -> "soo good"
    text = re.sub(r"(.)\1{2,}", r"\1\1", text)
    # Keep hashtag text: "#amazing" -> "amazing"
    text = re.sub(r"#(\w+)", r"\1", text)
    return text.strip()

# Example
tweet = "@brand Your new product is sooooo amazing!!! 🎉 #loveit https://t.co/abc"
print(preprocess_social_media(tweet))
# [USER] Your new product is soo amazing!! 🎉 loveit [URL]
# (the repeated-character rule also shortens "!!!" to "!!")

Handling Sarcasm and Irony

Sarcasm is one of the hardest challenges in sentiment analysis, and no current model detects it reliably. Practical strategies include:
  • Train on domain-specific data where sarcasm patterns are consistent
  • Use transformer models (BERT, RoBERTa), which capture context better than rule-based methods
  • Flag low-confidence predictions for human review
  • Combine sentiment with other signals (star ratings, emoji usage) as a cross-check
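The last two strategies can be combined into a small routing rule. The sketch below is illustrative only: the function name `route_prediction`, the label names, and the 0.75 confidence threshold are assumptions, not part of any library, and the threshold should be tuned on your own data.

```python
def route_prediction(label, score, star_rating=None, min_confidence=0.75):
    """Return "accept" or "review" for a single prediction.

    label: "positive" or "negative" (assumed label names)
    score: model confidence in [0, 1]
    star_rating: optional 1-5 rating that came with the text
    """
    # Low confidence: the model itself is unsure, so ask a human.
    if score < min_confidence:
        return "review"
    # A confident prediction that contradicts an independent signal
    # (for example glowing words on a 1-star review) may be sarcasm.
    if star_rating is not None:
        if label == "positive" and star_rating <= 2:
            return "review"
        if label == "negative" and star_rating >= 4:
            return "review"
    return "accept"


print(route_prediction("positive", 0.98, star_rating=1))  # review
print(route_prediction("positive", 0.98, star_rating=5))  # accept
print(route_prediction("negative", 0.55))                 # review
```

Routing to review rather than flipping the label keeps the model's output intact and lets a person decide the ambiguous cases.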

Multilingual Sentiment Analysis

Python
from transformers import pipeline

# Multilingual model fine-tuned on product reviews in six languages
# (English, Dutch, German, French, Spanish, Italian); it predicts 1-5 stars
sentiment = pipeline(
    "sentiment-analysis",
    model="nlptown/bert-base-multilingual-uncased-sentiment"
)

texts = [
    "This product is wonderful!",           # English
    "Ce produit est merveilleux!",          # French
    "Dieses Produkt ist wunderbar!",        # German
    "Este producto es maravilloso!",        # Spanish
]

for text in texts:
    result = sentiment(text)
    print(f"{text[:40]:<40} -> {result[0]['label']}")
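According to its model card, this model returns star labels such as "1 star" or "5 stars" rather than positive/negative. If the rest of your pipeline expects three sentiment classes, a small mapping helps. The helper name `stars_to_sentiment` and the cut-offs (1-2 negative, 3 neutral, 4-5 positive) are assumptions chosen for illustration.

```python
def stars_to_sentiment(label):
    """Map a label such as "4 stars" to negative / neutral / positive."""
    stars = int(label.split()[0])  # "1 star" -> 1, "5 stars" -> 5
    if stars <= 2:
        return "negative"
    if stars == 3:
        return "neutral"
    return "positive"


for label in ["1 star", "3 stars", "5 stars"]:
    print(label, "->", stars_to_sentiment(label))
```

In the loop above, this would be applied as `stars_to_sentiment(result[0]['label'])`.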

Production Deployment Checklist

Consideration  Recommendation
Latency        Use DistilBERT or ONNX-exported models for sub-100ms inference
Throughput     Batch predictions; use GPU for >100 req/sec
Monitoring     Track sentiment distribution over time; sudden shifts indicate data drift
Confidence     Return confidence scores; flag low-confidence predictions for review
Updates        Retrain periodically as language evolves (new slang, emojis)
Bias           Test for demographic and language biases in your model
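The monitoring item can be made concrete by comparing the label distribution of a recent window against a baseline. The following is a minimal sketch using total variation distance; the label set, the function names, and the 0.15 alert threshold are all assumptions to adapt to your own system.

```python
from collections import Counter

LABELS = ("negative", "neutral", "positive")  # assumed label set


def label_distribution(labels):
    """Return the share of each label in a list of predicted labels."""
    if not labels:
        raise ValueError("need at least one prediction")
    counts = Counter(labels)
    return {name: counts[name] / len(labels) for name in LABELS}


def total_variation(p, q):
    """Half the sum of absolute differences: 0 = identical, 1 = disjoint."""
    return 0.5 * sum(abs(p[name] - q[name]) for name in LABELS)


def drift_alert(baseline_labels, current_labels, threshold=0.15):
    """Return (alert, distance) for two windows of predictions."""
    distance = total_variation(
        label_distribution(baseline_labels),
        label_distribution(current_labels),
    )
    return distance > threshold, distance


baseline = ["positive"] * 50 + ["neutral"] * 30 + ["negative"] * 20
current = ["positive"] * 20 + ["neutral"] * 30 + ["negative"] * 50
alert, distance = drift_alert(baseline, current)
print(alert, round(distance, 2))  # True 0.3
```

An alert does not say whether the inputs changed or opinion genuinely shifted; it is a prompt to inspect a sample of recent texts.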

Common Pitfalls

Avoid these mistakes:
  • Training on one domain, deploying on another: a model trained on movie reviews often performs poorly on product reviews, because vocabulary and conventions differ
  • Ignoring neutral sentiment: many systems only handle positive/negative, yet neutral text is often the largest class in real-world data
  • Over-relying on accuracy: use precision, recall, and F1 per class, especially when classes are imbalanced
  • Not handling negation: "Not good" should be negative, not positive
  • Treating all text equally: headlines, tweets, reviews, and articles need different preprocessing
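To see why accuracy alone misleads on imbalanced data, here is a small dependency-free sketch of per-class precision, recall, and F1. The function name `per_class_metrics` and the toy labels are made up for illustration; in practice a library routine such as scikit-learn's `classification_report` does the same job.

```python
def per_class_metrics(y_true, y_pred):
    """Return {label: {"precision", "recall", "f1"}} for each class."""
    pairs = list(zip(y_true, y_pred))
    report = {}
    for label in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        report[label] = {"precision": precision, "recall": recall, "f1": f1}
    return report


# A model that always predicts "neutral" on an imbalanced test set.
y_true = ["neutral"] * 8 + ["negative"] * 2
y_pred = ["neutral"] * 10

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
report = per_class_metrics(y_true, y_pred)
print(accuracy)                       # 0.8
print(report["negative"]["recall"])   # 0.0
```

The 80% accuracy hides the fact that every negative example was missed, which the per-class recall makes obvious.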

Course Complete!

You now have the skills to build sentiment analysis systems from simple rule-based tools to state-of-the-art deep learning models. Return to the course overview to review any lessons or explore other AI School courses.
