Computer Vision Best Practices

Practical guidance for building robust CV systems — from dataset creation to production deployment and ethical considerations.

Dataset Creation and Annotation

  • Quality over quantity: A well-annotated dataset of 1,000 images often outperforms a poorly labeled dataset of 10,000.
  • Annotation tools: Use tools like CVAT, LabelStudio, Roboflow, or VGG Image Annotator for efficient labeling.
  • Consistency: Define clear annotation guidelines. What counts as a "partial occlusion"? How tight should bounding boxes be?
  • Balance: Ensure balanced representation across classes. Use oversampling or weighted loss functions for imbalanced datasets.
  • Validation: Have multiple annotators label the same images and measure inter-annotator agreement.
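Inter-annotator agreement is commonly summarized with Cohen's kappa, which corrects raw agreement for chance. A minimal pure-Python sketch (the function name `cohen_kappa` is ours; libraries such as scikit-learn provide an equivalent `cohen_kappa_score`):

```python
from collections import Counter

def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa: agreement between two annotators, corrected for chance."""
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    # Fraction of images where both annotators gave the same label
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance, given each annotator's label frequencies
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)

# Two annotators labeling the same 8 images
ann1 = ["cat", "cat", "dog", "dog", "cat", "dog", "cat", "dog"]
ann2 = ["cat", "cat", "dog", "cat", "cat", "dog", "cat", "dog"]
print(round(cohen_kappa(ann1, ann2), 3))  # 0.75
```

Values above roughly 0.8 are usually taken as strong agreement; lower values signal that the annotation guidelines need tightening.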

Data Augmentation Strategies

| Category | Techniques | When to Use |
| --- | --- | --- |
| Geometric | Flip, rotate, crop, scale, translate | Almost always; fundamental augmentations |
| Color | Brightness, contrast, saturation, hue jitter | When lighting varies in real-world conditions |
| Noise | Gaussian noise, blur, JPEG compression | When input quality varies |
| Advanced | Cutout, MixUp, CutMix, Mosaic | When you need stronger regularization |
| Generative | Synthetic data generation with diffusion models | When real data is scarce or expensive |
💡 Important: Not all augmentations are appropriate for all tasks. Vertical flips make sense for satellite imagery but not for face recognition. Always consider whether the augmentation produces images your model might realistically encounter.
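As a minimal sketch of the geometric and color rows above, here is a NumPy-only augmentation (horizontal flip plus brightness jitter); in practice libraries such as Albumentations or torchvision implement these and many more:

```python
import numpy as np

def augment(img, rng):
    """Randomly flip an HxWx3 uint8 image horizontally and jitter brightness."""
    out = img.copy()
    if rng.random() < 0.5:        # horizontal flip: safe for most natural images
        out = out[:, ::-1, :]
    factor = rng.uniform(0.8, 1.2)  # brightness jitter in [0.8x, 1.2x]
    out = np.clip(out.astype(np.float32) * factor, 0, 255).astype(np.uint8)
    return out

rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(64, 64, 3), dtype=np.uint8)
aug = augment(img, rng)
print(aug.shape)  # (64, 64, 3) -- same shape, perturbed pixels
```

Note the caveat from the callout: a vertical flip would be one extra line (`out[::-1, :, :]`), but only add it if upside-down inputs are plausible for your task.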

Model Selection Guide

| Scenario | Recommended Model | Reasoning |
| --- | --- | --- |
| Small dataset (<1K images) | Pretrained ResNet-18 or EfficientNet-B0 | Smaller models overfit less on small datasets |
| Large dataset (>10K images) | ResNet-50, EfficientNet-B4, or ViT | Larger models can leverage more data |
| Real-time inference | MobileNet, YOLOv8-nano | Optimized for speed on edge devices |
| Maximum accuracy | ViT-Large, ConvNeXt-XL, EfficientNet-B7 | Larger models with more compute budget |
| Object detection | YOLOv8 (start with nano/small) | Best speed-accuracy tradeoff, easy to use |
| Segmentation | U-Net with ResNet encoder | Strong baseline, well-understood architecture |

Training Tips

  • Start with transfer learning: Initialize from pretrained weights whenever possible. Training from scratch is rarely justified.
  • Learning rate: Use a learning rate finder. Typical values: 1e-3 for new heads, 1e-5 for fine-tuning backbones.
  • Batch size: Larger batches are faster but may generalize worse. Use gradient accumulation if your GPU cannot fit large batches.
  • Mixed precision: Use FP16 training (PyTorch AMP) to nearly double throughput and halve memory usage.
  • Early stopping: Monitor validation loss and stop when it plateaus to prevent overfitting.
  • GPU/TPU selection: An NVIDIA RTX 3090 or A100 is ideal. Google Colab provides free T4 GPUs for prototyping.
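The learning rate finder mentioned above works by training with an exponentially increasing learning rate and watching the loss curve. A sketch of just the schedule in pure Python (the function name `lr_schedule` and the step count are ours; fastai and PyTorch Lightning ship full implementations):

```python
def lr_schedule(lr_min=1e-5, lr_max=1e-1, steps=100):
    """Exponentially increase the LR from lr_min to lr_max over `steps` batches.
    During a range test, train one batch per step and record the loss;
    pick an LR roughly 10x below the point where the loss starts to diverge."""
    ratio = lr_max / lr_min
    return [lr_min * ratio ** (i / (steps - 1)) for i in range(steps)]

lrs = lr_schedule()
print(f"{lrs[0]:.0e} -> {lrs[-1]:.0e}")  # 1e-05 -> 1e-01
```

The same grid also shows why the typical values above differ: the backbone sits near a good optimum already (small steps, ~1e-5), while a freshly initialized head tolerates much larger steps (~1e-3).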

Evaluation Metrics

| Task | Primary Metric | Description |
| --- | --- | --- |
| Classification | Accuracy, Top-5 Accuracy | Percentage of correctly classified images |
| Detection | mAP (mean Average Precision) | Average precision across all classes at various IoU thresholds |
| Segmentation | mIoU (mean IoU) | Average IoU between predicted and ground-truth masks across classes |
| All tasks | Precision, Recall, F1 | Trade-off between false positives and false negatives |
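Both mAP and mIoU are built on the same primitive, Intersection over Union. A minimal sketch for axis-aligned boxes in `(x1, y1, x2, y2)` format (the function name `iou` is ours):

```python
def iou(box_a, box_b):
    """Intersection over Union for two boxes in (x1, y1, x2, y2) format."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)  # 0 if boxes don't overlap
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

print(iou((0, 0, 10, 10), (0, 0, 10, 5)))  # 0.5
```

Detection benchmarks typically count a prediction as a true positive when its IoU with a ground-truth box exceeds a threshold (COCO averages over thresholds from 0.5 to 0.95).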

Deployment

  1. Model Optimization

    Quantize (INT8), prune, or distill your model for faster inference. Use ONNX Runtime or TensorRT for production.

  2. Edge Deployment

    For mobile/IoT: use TensorFlow Lite, CoreML (Apple), or ONNX Runtime Mobile. Consider model size and latency constraints.

  3. Cloud Deployment

    Serve with TorchServe, TensorFlow Serving, or Triton Inference Server. Use batch inference for throughput-intensive workloads.

  4. Monitoring

    Track inference latency, accuracy drift, and data distribution shifts in production.
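To make the INT8 quantization step above concrete, here is the underlying arithmetic in pure Python: map a float range onto 0–255 with a scale and zero-point, then round. This is only the core idea (all names are ours); production toolchains like ONNX Runtime and TensorRT apply it per-tensor or per-channel, with calibration data:

```python
def quantize_params(values, num_bits=8):
    """Compute scale and zero-point for asymmetric uint8 quantization."""
    lo, hi = min(min(values), 0.0), max(max(values), 0.0)  # range must cover 0
    qmax = 2 ** num_bits - 1
    scale = (hi - lo) / qmax
    zero_point = round(-lo / scale)
    return scale, zero_point

def quantize(values, scale, zero_point):
    """Float -> uint8: round to the nearest quantization level, then clamp."""
    return [max(0, min(255, round(v / scale) + zero_point)) for v in values]

def dequantize(q, scale, zero_point):
    """uint8 -> approximate float."""
    return [(v - zero_point) * scale for v in q]

weights = [-1.0, -0.5, 0.0, 0.5, 1.0]
scale, zp = quantize_params(weights)
restored = dequantize(quantize(weights, scale, zp), scale, zp)
# Each restored weight is within one quantization step (`scale`) of the original
```

The payoff is 4x smaller weights than FP32 and integer arithmetic at inference time, at the cost of a bounded rounding error per value, which is why quantized models typically lose only a little accuracy.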

Ethical Considerations

  • Surveillance: Facial recognition and tracking technologies raise significant privacy and civil liberties concerns. Consider whether your application could enable mass surveillance.
  • Bias: CV models can exhibit demographic biases (e.g., lower accuracy on darker skin tones). Test across diverse populations and lighting conditions.
  • Consent: Ensure proper consent for collecting and using images of people, especially for training data.
  • Deepfakes: Generative CV models can create convincing fake images and videos. Consider misuse potential when deploying generative capabilities.
  • Transparency: Be clear about what your CV system can and cannot do. Avoid overstating capabilities.

Frequently Asked Questions

How many images do I need to train a model?

With transfer learning and good augmentation, you can achieve reasonable results with as few as 100-500 images per class. For production quality, aim for 1,000-5,000 images per class. More complex tasks like segmentation may need even more annotated data.

Should I use PyTorch or TensorFlow?

Both are excellent choices. PyTorch is more popular in research and has a more Pythonic API. TensorFlow has stronger production/deployment tooling (TFLite, TF Serving, TFX). Most modern CV libraries and pretrained models support both. Choose whichever your team is more comfortable with.

How do I handle images of different sizes?

Most models require fixed input sizes. Resize images to the model's expected input (e.g., 224x224 for ResNet, 640x640 for YOLO). Use letterboxing (padding with gray) to preserve aspect ratio. For segmentation, you can use sliding-window inference or process at the original resolution with fully convolutional networks.
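The letterboxing geometry is simple to compute: scale by the longer side, then pad the shorter side symmetrically. A sketch (the function name `letterbox_params` is ours; YOLO implementations bundle this into their preprocessing):

```python
def letterbox_params(src_w, src_h, dst=640):
    """Scale and padding needed to fit an image into a dst x dst canvas
    while preserving aspect ratio (letterboxing)."""
    scale = dst / max(src_w, src_h)            # shrink/grow by the longer side
    new_w, new_h = round(src_w * scale), round(src_h * scale)
    pad_x = (dst - new_w) // 2                 # gray padding, left/right
    pad_y = (dst - new_h) // 2                 # gray padding, top/bottom
    return scale, (new_w, new_h), (pad_x, pad_y)

scale, size, pad = letterbox_params(1920, 1080)
print(size, pad)  # (640, 360) (0, 140)
```

Keep `scale` and `pad` around at inference time: predicted box coordinates must be shifted by the padding and divided by the scale to map back into the original image.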

Can I run computer vision models without a GPU?

Yes, for inference. Smaller models like MobileNet or YOLOv8-nano run at reasonable speeds on modern CPUs. For training, a GPU is strongly recommended. Use model optimization (quantization, ONNX Runtime) to speed up CPU inference.

How should I continue learning computer vision?

Start with this course and OpenCV tutorials. Then work through a practical project (build a classifier, train a YOLO model on custom data). Study Stanford CS231n for deeper theory. Join Kaggle competitions for practice with real datasets. Most importantly, build projects that solve problems you care about.

Congratulations! You have completed the Computer Vision course. You now understand image processing, detection, classification, segmentation, generative models, and production best practices. Keep experimenting and building!