Image Segmentation

Image segmentation assigns a class label to every pixel in an image. It's pixel-level classification, providing precise object boundaries instead of just bounding boxes.

Goal: Understand the image at pixel level—what class does each pixel belong to?

Types of Segmentation

Semantic Segmentation

Classify each pixel by class.

All "person" pixels get same label
No distinction between instances

Example: Road, sky, building

Instance Segmentation

Separate different instances.

Person 1, Person 2, Person 3
Each instance gets unique ID

Example: Count individual objects

Panoptic Segmentation

Combines both approaches.

Semantic for stuff (sky, road)
Instance for things (cars, people)

Example: Complete scene understanding

Popular Architectures

U-Net

Encoder-decoder with skip connections. Originally for medical imaging.

Encoder: downsample to capture context
Decoder: upsample to recover spatial resolution
Skip connections: preserve fine details

✓ Excellent for medical images, works with small datasets

DeepLab

Uses atrous (dilated) convolutions for multi-scale context.

Atrous Spatial Pyramid Pooling (ASPP)
Captures multi-scale information
Maintains resolution

✓ State-of-the-art semantic segmentation

Mask R-CNN

Extends Faster R-CNN for instance segmentation.

Detect objects (bounding boxes)
Generate pixel-level masks for each instance
Combines detection + segmentation

✓ Best for instance segmentation

Segment Anything (SAM)

Foundation model for segmentation (Meta AI, 2023).

Trained on 1 billion masks
Zero-shot segmentation
Interactive prompting (points, boxes, text)

✓ Generalizes to any image, no fine-tuning needed

Loss Functions

Dice Loss

Measures overlap between prediction and ground truth.

Dice = 2|A ∩ B| / (|A| + |B|)
Range: 0 to 1

Good for imbalanced classes

IoU Loss

Intersection over Union.

IoU = |A ∩ B| / |A ∪ B|
Loss = 1 - IoU

Directly optimizes evaluation metric

Focal Loss

Handles class imbalance.

Down-weights easy examples
Focuses on hard examples

Useful when background dominates

Combined Loss

Mix multiple losses.

Loss = α·CE + β·Dice
Combines strengths

Often works best in practice

Applications

🏥

Medical Imaging

Tumor segmentation, organ delineation

🚗

Autonomous Driving

Road, lane, pedestrian segmentation

🛰️

Satellite Imagery

Land use classification, change detection

📹

Video Editing

Background removal, green screen replacement

🏭

Agriculture

Crop monitoring, disease detection

🎨

AR/VR

Scene understanding, object manipulation

Evaluation Metrics

IoU (Intersection over Union)

Overlap between predicted and true masks

Dice Coefficient

Similar to IoU, 2×overlap / (pred + true)

Pixel Accuracy

Percentage of correctly classified pixels

Mean IoU (mIoU)

Average IoU across all classes

Key Takeaway: Segmentation provides pixel-level understanding. U-Net for medical images, Mask R-CNN for instances, SAM for general-purpose segmentation.