Image Segmentation
Image segmentation assigns a class label to every pixel in an image. It's pixel-level classification, providing precise object boundaries instead of just bounding boxes.
Goal: Understand the image at pixel level—what class does each pixel belong to?
Types of Segmentation
Semantic Segmentation
Classify each pixel by class.
No distinction between instances (two touching cars form a single "car" region)
Example: Road, sky, building
Instance Segmentation
Separate different instances.
Each instance gets a unique ID
Example: Count individual objects
Panoptic Segmentation
Combines both approaches.
Semantic labels for "stuff" (sky, road)
Instance IDs for "things" (cars, people)
Example: Complete scene understanding
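To make the three output formats concrete, here is a minimal sketch on a tiny hand-made label grid. The grid values, the helper name `instances_from_semantic`, and the `class * 1000 + instance` panoptic encoding (one common convention, assumed here) are all illustrative. Splitting a semantic mask into connected blobs only works because the toy cars do not touch; real instance models do not rely on that.

```python
import numpy as np
from collections import deque

# Toy semantic mask: 0 = road ("stuff"), 1 = car ("thing"). Values are made up.
semantic = np.array([
    [0, 1, 1, 0, 0, 0],
    [0, 1, 1, 0, 1, 1],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 0, 0, 0],
])

def instances_from_semantic(sem, thing_class):
    """Hypothetical helper: give each 4-connected blob of `thing_class` its own
    ID (1, 2, ...); all other pixels get 0. Assumes instances never touch."""
    h, w = sem.shape
    inst = np.zeros_like(sem)
    next_id = 0
    for r in range(h):
        for c in range(w):
            if sem[r, c] == thing_class and inst[r, c] == 0:
                next_id += 1
                inst[r, c] = next_id
                queue = deque([(r, c)])  # breadth-first flood fill of this blob
                while queue:
                    y, x = queue.popleft()
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and sem[ny, nx] == thing_class and inst[ny, nx] == 0):
                            inst[ny, nx] = next_id
                            queue.append((ny, nx))
    return inst

# Semantic: both cars share label 1. Instance: the cars get IDs 1 and 2.
instance = instances_from_semantic(semantic, thing_class=1)
# Panoptic: every pixel carries (class, instance); stuff pixels keep instance 0.
panoptic = semantic * 1000 + instance
print(instance)
print(panoptic)
```

The semantic mask alone cannot say how many cars there are; the instance map can, and the panoptic map keeps both the class and the instance for every pixel.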
Popular Architectures
U-Net
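Encoder-decoder with skip connections. Originally developed for biomedical image segmentation.
Encoder: downsample to capture context
Decoder: upsample to recover spatial resolution
Skip connections: feed encoder features directly to the decoder to preserve fine details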
✓ Excellent for medical images, works with small datasets
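The data flow of an encoder-decoder with skip connections can be sketched with plain numpy arrays shaped `(channels, height, width)`. This is a shape-level illustration only: random 1x1 "convolutions" stand in for U-Net's learned 3x3 conv blocks, nearest-neighbour repetition stands in for learned up-convolutions, and padding is implicit (the original U-Net uses unpadded convolutions and crops encoder maps before concatenating). The function names are hypothetical.

```python
import numpy as np

def max_pool2(x):
    """2x2 max pooling on a (C, H, W) array; H and W are assumed even."""
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).max(axis=(2, 4))

def upsample2(x):
    """Nearest-neighbour 2x upsampling (stand-in for a learned up-convolution)."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def conv1x1(x, out_ch, rng, relu=True):
    """Random 1x1 convolution (per-pixel channel mixing), optionally with ReLU.
    Stands in for U-Net's trained 3x3 conv blocks."""
    w = rng.standard_normal((out_ch, x.shape[0])) / np.sqrt(x.shape[0])
    y = np.einsum("oc,chw->ohw", w, x)
    return np.maximum(y, 0.0) if relu else y

def tiny_unet(image, num_classes=2, seed=0):
    """Two-level encoder-decoder with skip connections; returns an (H, W) label map."""
    rng = np.random.default_rng(seed)
    # Encoder: extract features, then downsample to gain context.
    e1 = conv1x1(image, 8, rng)                       # (8, H, W)
    e2 = conv1x1(max_pool2(e1), 16, rng)              # (16, H/2, W/2)
    bottleneck = conv1x1(max_pool2(e2), 32, rng)      # (32, H/4, W/4)
    # Decoder: upsample, then concatenate the same-resolution encoder map (skip).
    d2 = conv1x1(np.concatenate([upsample2(bottleneck), e2], axis=0), 16, rng)
    d1 = conv1x1(np.concatenate([upsample2(d2), e1], axis=0), 8, rng)
    # Final linear layer gives per-pixel class scores at full resolution.
    logits = conv1x1(d1, num_classes, rng, relu=False)
    return logits.argmax(axis=0)

labels = tiny_unet(np.random.default_rng(1).random((3, 16, 16)))
print(labels.shape)  # (16, 16): one class label per input pixel
```

The skip concatenations are what let the decoder combine coarse context from the bottleneck with the full-resolution detail in `e1`.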
DeepLab
Uses atrous (dilated) convolutions for multi-scale context.
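Atrous spatial pyramid pooling (ASPP) runs several dilation rates in parallel to capture multi-scale information
Enlarges the receptive field without downsampling, so feature maps keep a higher resolution
✓ Strong, widely used model family for semantic segmentation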
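A 1-D example shows why dilation helps: spacing the kernel taps `rate` samples apart widens the receptive field from 3 to `(k - 1) * rate + 1` samples without any pooling, and "same" padding keeps the output as long as the input. This is a minimal sketch; DeepLab applies 2-D atrous convolutions to feature maps, and the rates below are arbitrary. `dilated_conv1d` is a hypothetical helper.

```python
import numpy as np

def dilated_conv1d(x, kernel, rate=1):
    """'Same'-padded 1-D atrous convolution (cross-correlation, as in
    deep-learning libraries). Taps are spaced `rate` samples apart; an
    odd-length kernel is assumed so the output stays centred and len(x) long."""
    x = np.asarray(x, dtype=float)
    k = len(kernel)
    span = (k - 1) * rate + 1        # receptive field of this single layer
    pad = span // 2
    xp = np.pad(x, pad)              # zero padding on both sides
    return np.array([
        sum(kernel[j] * xp[i + j * rate] for j in range(k))
        for i in range(len(x))
    ])

# Feed in an impulse to see which output positions "see" input position 7.
impulse = np.zeros(15)
impulse[7] = 1.0
kernel = [1.0, 1.0, 1.0]
for rate in (1, 2, 4):
    out = dilated_conv1d(impulse, kernel, rate)
    print(rate, len(out), np.nonzero(out)[0].tolist())
# 1 15 [6, 7, 8]
# 2 15 [5, 7, 9]
# 4 15 [3, 7, 11]

# ASPP-style idea: the same input through parallel branches with different rates.
aspp = np.stack([dilated_conv1d(impulse, kernel, r) for r in (1, 2, 4)])  # (3, 15)
```

The number of weights stays at three in every branch; only their spacing changes, which is how atrous convolution buys context cheaply while keeping resolution.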
Mask R-CNN
Extends Faster R-CNN with a mask branch for instance segmentation.
Predicts a pixel-level mask for each detected instance
Combines detection + segmentation
✓ A standard choice for instance segmentation
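In Mask R-CNN the mask branch outputs one small m x m mask per class for each detected region, scores each pixel with an independent sigmoid, and at inference keeps only the mask of the predicted class; that mask is resized to the box and binarized at 0.5. The sketch below implements that post-processing step in numpy. Nearest-neighbour resizing, the `(x0, y0, x1, y1)` box format with exclusive ends, and the function name are simplifying assumptions, not the reference implementation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def paste_instance_mask(mask_logits, class_id, box, image_shape, threshold=0.5):
    """Hypothetical helper. mask_logits: (K, m, m) per-class logits for one
    detected region. Returns a full-image boolean mask for that instance."""
    probs = sigmoid(mask_logits[class_id])        # keep only the predicted class
    x0, y0, x1, y1 = box                          # integer box, x1/y1 exclusive (assumed)
    bh, bw = y1 - y0, x1 - x0
    m = probs.shape[0]
    rows = np.arange(bh) * m // bh                # nearest source row per box row
    cols = np.arange(bw) * m // bw                # nearest source column per box column
    resized = probs[rows[:, None], cols[None, :]] # (bh, bw) probabilities
    full = np.zeros(image_shape, dtype=bool)
    full[y0:y1, x0:x1] = resized >= threshold     # binarize inside the box only
    return full

# Toy example: 3 classes, 4x4 masks; class 1 predicts "left half is object".
mask_logits = np.full((3, 4, 4), -5.0)
mask_logits[1][:, :2] = 5.0
instance_mask = paste_instance_mask(mask_logits, class_id=1, box=(2, 1, 10, 5),
                                    image_shape=(8, 12))
print(instance_mask.sum())  # 16 pixels: left half of the 8x4 box
```

Because each instance gets its own mask, two overlapping people produce two separate masks rather than one merged region.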
Segment Anything (SAM)
Foundation model for segmentation (Meta AI, 2023).
Zero-shot segmentation
Interactive prompting (points, boxes, masks; text prompts explored only experimentally)
✓ Generalizes to many unseen images and domains without fine-tuning; outputs masks without class labels
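Point prompting can be sketched against the `SamPredictor` class from Meta's `segment_anything` package: `set_image` takes an H x W x 3 RGB array, and `predict` takes `point_coords` as (x, y) pixels plus `point_labels` (1 = foreground, 0 = background), returning candidate masks, their predicted quality scores, and low-resolution logits. The wrapper name `segment_with_points`, the model type, and the checkpoint path below are placeholders. The function only relies on those two methods, so any object with the same interface works.

```python
import numpy as np

def segment_with_points(predictor, image, fg_points, bg_points=()):
    """Hypothetical wrapper: prompt SAM with clicked points and return the
    highest-scoring mask. `predictor` is assumed to behave like
    segment_anything.SamPredictor."""
    predictor.set_image(image)                       # H x W x 3 uint8, RGB
    coords = np.array(list(fg_points) + list(bg_points), dtype=np.float32)  # (N, 2) as (x, y)
    labels = np.array([1] * len(fg_points) + [0] * len(bg_points), dtype=np.int32)
    masks, scores, _ = predictor.predict(
        point_coords=coords,
        point_labels=labels,
        multimask_output=True,                       # several candidates for ambiguous clicks
    )
    best = int(np.argmax(scores))
    return masks[best], float(scores[best])

# Usage with the real package (model type and checkpoint path are placeholders):
# from segment_anything import sam_model_registry, SamPredictor
# sam = sam_model_registry["vit_b"](checkpoint="path/to/sam_vit_b.pth")
# mask, score = segment_with_points(SamPredictor(sam), rgb_image, fg_points=[(320, 240)])
```

Asking for multiple candidates and keeping the best-scored one is a simple way to resolve ambiguous single-click prompts (e.g. a shirt versus the whole person).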
Loss Functions
Dice Loss
Measures overlap between prediction and ground truth.
Loss = 1 - Dice coefficient, ranging from 0 (perfect overlap) to 1 (no overlap)
Good for imbalanced classes
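A minimal soft Dice loss for a binary mask, assuming predictions are foreground probabilities in [0, 1]; `eps` is a small smoothing constant (an assumption, not a fixed standard) that avoids division by zero when both masks are empty. The imbalance example shows why it suits small objects: predicting "all background" is 99% pixel-accurate here, yet the Dice loss is still close to 1.

```python
import numpy as np

def dice_loss(pred, target, eps=1e-6):
    """Soft Dice loss. Dice = 2|P ∩ G| / (|P| + |G|); loss = 1 - Dice."""
    pred = np.asarray(pred, dtype=float).ravel()
    target = np.asarray(target, dtype=float).ravel()
    intersection = np.sum(pred * target)
    dice = (2.0 * intersection + eps) / (pred.sum() + target.sum() + eps)
    return float(1.0 - dice)

# 100-pixel mask with a single foreground pixel.
target = np.zeros(100)
target[0] = 1.0
all_background = np.zeros(100)

print(dice_loss(target, target))          # ~0.0: perfect overlap
print(dice_loss(all_background, target))  # ~1.0: misses the only object
```

Because the score is normalized by the object's own size, a tiny object counts as much as a large one.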
IoU Loss
Intersection over Union.
Loss = 1 - IoU
A soft (differentiable) IoU targets the evaluation metric directly
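A minimal soft IoU (Jaccard) loss, assuming foreground probabilities in [0, 1]: products and sums replace set intersection and union so the loss stays differentiable, and for hard 0/1 inputs it reduces exactly to 1 - IoU. The `eps` smoothing term is an assumed convention.

```python
import numpy as np

def soft_iou_loss(pred, target, eps=1e-6):
    """Soft IoU loss: 1 - |P ∩ G| / |P ∪ G| with |P ∪ G| = |P| + |G| - |P ∩ G|."""
    pred = np.asarray(pred, dtype=float).ravel()
    target = np.asarray(target, dtype=float).ravel()
    intersection = np.sum(pred * target)
    union = pred.sum() + target.sum() - intersection
    return float(1.0 - (intersection + eps) / (union + eps))

# Hard masks: overlap 1 pixel, union 3 pixels -> IoU = 1/3, loss = 2/3.
print(soft_iou_loss([1, 1, 0, 0], [1, 0, 1, 0]))
```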
Focal Loss
Handles class imbalance.
Down-weights easy, well-classified pixels so training focuses on hard ones
Useful when background dominates
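The binary focal loss multiplies cross-entropy by `(1 - p_t) ** gamma`, where `p_t` is the probability assigned to the true class, plus an optional class weight `alpha_t`. The defaults `gamma=2` and `alpha=0.25` follow the values popularized by the RetinaNet paper; treat them as starting points. With `gamma=0` and no `alpha` it reduces to plain binary cross-entropy, which the sketch below makes easy to check.

```python
import numpy as np

def focal_loss(prob, target, gamma=2.0, alpha=0.25, eps=1e-7):
    """Mean binary focal loss: -alpha_t * (1 - p_t)**gamma * log(p_t).
    prob: predicted foreground probabilities; target: 0/1 labels.
    Pass alpha=None to disable class weighting."""
    prob = np.clip(np.asarray(prob, dtype=float), eps, 1 - eps)
    target = np.asarray(target, dtype=float)
    p_t = np.where(target == 1, prob, 1 - prob)      # probability of the true class
    alpha_t = 1.0 if alpha is None else np.where(target == 1, alpha, 1 - alpha)
    return float(np.mean(-alpha_t * (1 - p_t) ** gamma * np.log(p_t)))

# An easy pixel (p_t = 0.95) is scaled by (0.05)**2 = 0.0025 relative to
# cross-entropy; a hard pixel (p_t = 0.1) keeps 0.81 of its loss.
easy = focal_loss([0.95], [1], gamma=2, alpha=None) / focal_loss([0.95], [1], gamma=0, alpha=None)
hard = focal_loss([0.1], [1], gamma=2, alpha=None) / focal_loss([0.1], [1], gamma=0, alpha=None)
print(easy, hard)
```

When background pixels vastly outnumber objects, most of them are easy, so this scaling stops them from dominating the total loss.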
Combined Loss
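Weighted sum of losses, e.g. cross-entropy + Dice.
Combines strengths: stable per-pixel gradients plus robustness to imbalance
A common, often strong default in practice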
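A minimal sketch of one common combination, binary cross-entropy plus soft Dice. The equal 0.5/0.5 weights are an assumption to tune per task, and both component losses are redefined here so the block stands alone. On a mask that is mostly background, a timid prediction can keep cross-entropy fairly small while Dice stays near 1, so the sum still penalizes missing the object.

```python
import numpy as np

def bce_loss(prob, target, eps=1e-7):
    """Mean binary cross-entropy on foreground probabilities."""
    prob = np.clip(np.asarray(prob, dtype=float), eps, 1 - eps)
    target = np.asarray(target, dtype=float)
    return float(np.mean(-(target * np.log(prob) + (1 - target) * np.log(1 - prob))))

def dice_loss(prob, target, eps=1e-6):
    """Soft Dice loss: 1 - 2|P ∩ G| / (|P| + |G|)."""
    prob = np.asarray(prob, dtype=float).ravel()
    target = np.asarray(target, dtype=float).ravel()
    intersection = np.sum(prob * target)
    return float(1.0 - (2.0 * intersection + eps) / (prob.sum() + target.sum() + eps))

def combined_loss(prob, target, w_bce=0.5, w_dice=0.5):
    """Weighted sum: BCE gives smooth per-pixel gradients, Dice handles imbalance."""
    return w_bce * bce_loss(prob, target) + w_dice * dice_loss(prob, target)

# 3% foreground; the model predicts a low probability everywhere.
target = np.zeros(100)
target[:3] = 1.0
timid = np.full(100, 0.01)
print(bce_loss(timid, target), dice_loss(timid, target), combined_loss(timid, target))
```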
Applications
Evaluation Metrics
Key Takeaway: Segmentation provides pixel-level understanding. U-Net for medical images, Mask R-CNN for instances, SAM for general-purpose segmentation.