Object Detection
Object detection goes beyond classification—it finds and localizes multiple objects in an image. Each detection includes a bounding box and class label.
Task: For each object in an image, predict: (1) What it is (class), (2) Where it is (bounding box coordinates).
Key Concepts
Bounding Box
Rectangle around an object.
Format: (x, y, width, height) or (x1, y1, x2, y2)
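The two box encodings can be converted into each other with simple arithmetic. The sketch below assumes that (x, y) in the width/height form is the top-left corner; some frameworks use the box center instead, so check the convention of the dataset or model in use. The function names are illustrative, not taken from any library.

```python
def xywh_to_xyxy(box):
    """Convert (x, y, width, height) to (x1, y1, x2, y2).

    Assumes (x, y) is the top-left corner of the box.
    """
    x, y, w, h = box
    return (x, y, x + w, y + h)


def xyxy_to_xywh(box):
    """Convert corner form (x1, y1, x2, y2) to (x, y, width, height)."""
    x1, y1, x2, y2 = box
    return (x1, y1, x2 - x1, y2 - y1)
```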
IoU (Intersection over Union)
Overlap between the predicted and ground-truth boxes: area of intersection divided by area of union.
Range: 0 (no overlap) to 1 (identical boxes)
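As a minimal sketch of the IoU calculation, assuming both boxes are given in corner form (x1, y1, x2, y2) with x1 < x2 and y1 < y2; the function name `iou` is illustrative:

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes in (x1, y1, x2, y2) form."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlap rectangle, clamped at 0 when disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter

    # Guard against degenerate zero-area boxes.
    return inter / union if union > 0 else 0.0
```

For example, two 2x2 boxes offset by one unit in each direction share an area of 1 out of a union of 7, giving an IoU of about 0.14.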
Confidence Score
How confident the model is that a detection is correct, typically a score between 0 and 1.
NMS (Non-Max Suppression)
Removes duplicate detections of the same object.
Keep the highest-scoring box and suppress boxes that overlap it (IoU > threshold)
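The suppression step can be sketched as a greedy loop: take the highest-scoring box, discard every remaining box whose IoU with it exceeds the threshold, and repeat. This is a plain-Python illustration, assuming boxes in (x1, y1, x2, y2) form; the default threshold of 0.5 and the function names are assumptions. Detectors usually run this per class and use a vectorized implementation.

```python
def _iou(a, b):
    """IoU of two boxes in (x1, y1, x2, y2) form."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy non-max suppression; returns indices of the boxes that are kept."""
    # Candidate indices, highest confidence first.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        # Drop every remaining box that overlaps the kept box too much.
        order = [i for i in order if _iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)  # the second box overlaps the first (IoU ~0.68) and is dropped
```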
Detection Approaches
Two-Stage Detectors (R-CNN Family)
First propose regions, then classify them.
1. Generate region proposals
2. Classify and refine each proposal
Examples: R-CNN, Fast R-CNN, Faster R-CNN, Mask R-CNN
✓ High accuracy | ✗ Slower (two stages)
One-Stage Detectors (YOLO, SSD)
Predict boxes and classes in a single pass.
In grid-based designs such as YOLO, the image is divided into a grid and each cell predicts bounding boxes + class probabilities
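To make the idea of a cell concrete, the sketch below maps a box to the grid cell that contains its center. It assumes the convention used by the original YOLO, where the image is split into an S x S grid (S = 7 in that paper) and the cell holding the object's center is responsible for predicting it; later versions and other one-stage detectors assign objects differently. The function name and parameters are illustrative.

```python
def responsible_cell(box, image_w, image_h, grid_size=7):
    """Return (row, col) of the grid cell containing the center of the box.

    box is (x1, y1, x2, y2) in pixels; grid_size is the number of cells per side.
    """
    x1, y1, x2, y2 = box
    cx = (x1 + x2) / 2.0
    cy = (y1 + y2) / 2.0

    # Scale the center to grid units; clamp so a center on the far edge stays in range.
    col = min(int(cx / image_w * grid_size), grid_size - 1)
    row = min(int(cy / image_h * grid_size), grid_size - 1)
    return row, col
```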
Examples: YOLO (v1-v8), SSD, RetinaNet
✓ Very fast (real-time) | ✗ Slightly lower accuracy
YOLO (You Only Look Once)
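One of the most widely used real-time object detectors. Treats detection as a regression problem, predicting boxes and class probabilities directly from the full image in one pass.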
YOLO Versions
Evaluation Metrics
mAP (mean Average Precision)
Primary metric for object detection.
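AP = area under the precision-recall curve for one class; mAP = mean of AP over all classes
Common: mAP@0.5 (IoU threshold of 0.5), mAP@0.5:0.95 (averaged over IoU thresholds from 0.5 to 0.95 in steps of 0.05)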
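A rough sketch of how AP can be computed for one class at a fixed IoU threshold. It assumes the detections have already been sorted by descending confidence and matched against ground truth, so each one is flagged as a true or false positive. It uses all-point interpolation (precision is made non-increasing before integrating), which is one common variant; benchmark toolkits differ in the details. The function names are illustrative.

```python
def average_precision(is_true_positive, num_ground_truth):
    """Area under the precision-recall curve for one class.

    is_true_positive: list of bools, one per detection, sorted by descending confidence.
    num_ground_truth: number of ground-truth objects of this class.
    """
    if num_ground_truth == 0 or not is_true_positive:
        return 0.0

    tp = 0
    fp = 0
    recalls = []
    precisions = []
    for hit in is_true_positive:
        if hit:
            tp += 1
        else:
            fp += 1
        recalls.append(tp / num_ground_truth)
        precisions.append(tp / (tp + fp))

    # Precision envelope: each point takes the best precision at any higher recall.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])

    # Sum rectangle areas wherever recall increases.
    ap = 0.0
    prev_recall = 0.0
    for r, p in zip(recalls, precisions):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


def mean_average_precision(ap_per_class):
    """mAP: the mean of the per-class AP values."""
    return sum(ap_per_class) / len(ap_per_class) if ap_per_class else 0.0
```

mAP@0.5:0.95 repeats this calculation at each IoU threshold from 0.5 to 0.95 and averages the results.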
FPS (Frames Per Second)
Speed metric for real-time applications.
YOLO: roughly 30-150 FPS depending on version, model size, and hardware
Faster R-CNN: roughly 5-10 FPS
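FPS figures depend heavily on hardware, input resolution, and batch size, so it is worth measuring on the target setup. A minimal timing sketch, assuming a hypothetical `infer` callable that runs the detector on one frame; a few warm-up calls are excluded from the timing because first calls are often slower.

```python
import time


def measure_fps(infer, frames, warmup=2):
    """Average frames per second of infer(frame) over the given frames."""
    # Warm-up runs are not timed.
    for frame in frames[:warmup]:
        infer(frame)

    start = time.perf_counter()
    for frame in frames:
        infer(frame)
    elapsed = time.perf_counter() - start

    return len(frames) / elapsed if elapsed > 0 else float("inf")


# Stand-in for a real model: sleeps 1 ms per frame.
fps = measure_fps(lambda frame: time.sleep(0.001), list(range(5)))
```

When the model runs on a GPU, asynchronous execution can make such timings look faster than they are unless the device is synchronized before the clock is read.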
Applications
Key Takeaway: Object detection localizes and classifies multiple objects. YOLO is the go-to for real-time applications, while Faster R-CNN offers higher accuracy when speed isn't critical.