🧠 What is a Region Proposal Network (RPN)?
A Region Proposal Network (RPN) is a small fully convolutional neural network that slides over a shared feature map of an image and predicts object proposals (regions likely to contain objects).
It replaces external methods like Selective Search with a fast, trainable alternative.
⚙️ How RPN Works – Step-by-Step
Let’s go through the internal workings of the RPN:
1️⃣ Input: Shared Feature Map
- The RPN operates on the feature map extracted from the image by a backbone CNN (e.g., ResNet, VGG).
- This feature map is shared with the detection network, so there is no redundant computation.
2️⃣ Sliding Window Mechanism
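- A small 3×3 sliding window moves over the entire feature map. In practice this is implemented as a 3×3 convolution.
- At each spatial location (each cell of the feature map, analogous to a pixel), the RPN looks at the 3×3 patch of features centered there and makes predictions.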
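To make the sliding window concrete, here is a minimal pure-Python sketch. It assumes a toy single-channel feature map and zero padding at the borders; the names `window_3x3`, `slide`, and `toy_map` are made up for illustration, and a real RPN would do this with a 3×3 convolution over many channels.

```python
def window_3x3(feature_map, row, col):
    """Return the 3x3 patch centered at (row, col), zero-padded at the borders."""
    height, width = len(feature_map), len(feature_map[0])
    patch = []
    for r in range(row - 1, row + 2):
        patch_row = []
        for c in range(col - 1, col + 2):
            inside = 0 <= r < height and 0 <= c < width
            patch_row.append(feature_map[r][c] if inside else 0.0)
        patch.append(patch_row)
    return patch


def slide(feature_map):
    """Visit every location of the feature map and yield (row, col, patch)."""
    for row in range(len(feature_map)):
        for col in range(len(feature_map[0])):
            yield row, col, window_3x3(feature_map, row, col)


# Toy 3x4 single-channel feature map (assumption: real maps have many channels).
toy_map = [
    [1.0, 2.0, 3.0, 4.0],
    [5.0, 6.0, 7.0, 8.0],
    [9.0, 10.0, 11.0, 12.0],
]

patches = list(slide(toy_map))
print(len(patches))   # one patch per location: 3 * 4 = 12
print(patches[0][2])  # patch at the top-left corner, padded with zeros
```

Every location gets its own patch, which is why the number of predictions grows with the size of the feature map.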
3️⃣ Anchors: Fixed Reference Boxes
- At every location, the RPN uses multiple anchors, which are predefined bounding boxes that vary in:
  - Scale (e.g., small, medium, large),
  - Aspect ratio (e.g., 1:1, 2:1, 1:2).
- Typically 9 anchors per location (3 scales × 3 aspect ratios).
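Anchor generation can be sketched in a few lines. The concrete numbers here are assumptions chosen for illustration: scales of 128, 256, and 512 pixels, ratios read as width:height, and a feature-map stride of 16 pixels. The function names `anchors_at` and `all_anchors` are hypothetical too.

```python
import math


def anchors_at(cx, cy, scales=(128, 256, 512), ratios=(1.0, 2.0, 0.5)):
    """Return 9 anchors as (x1, y1, x2, y2), all centered at (cx, cy).

    Each ratio is width / height; the box area stays close to scale ** 2.
    """
    boxes = []
    for scale in scales:
        for ratio in ratios:
            w = scale * math.sqrt(ratio)
            h = scale / math.sqrt(ratio)
            boxes.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return boxes


def all_anchors(feat_h, feat_w, stride=16):
    """Place the same 9 anchors at the center of every feature-map cell."""
    boxes = []
    for row in range(feat_h):
        for col in range(feat_w):
            cx = (col + 0.5) * stride  # cell center in image coordinates
            cy = (row + 0.5) * stride
            boxes.extend(anchors_at(cx, cy))
    return boxes


anchors = all_anchors(feat_h=2, feat_w=3)
print(len(anchors))  # 2 * 3 locations * 9 anchors = 54
print(anchors[0])    # the 128x128 square anchor at the first location
```

Note that anchors near the border stick out past the image; how those are handled (clipped or ignored) is a design choice this sketch leaves out.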
4️⃣ Outputs Per Anchor
For each anchor box, the RPN predicts:
- Objectness Score:
  - Binary classification: is this box an object or not?
- Bounding Box Regression:
  - 4 values to adjust (refine) the anchor: (dx, dy, dw, dh), where dx and dy shift the anchor's center and dw and dh rescale its width and height.
- So, at each location, the RPN outputs:
  - 9 objectness scores and 9×4 = 36 bbox regressions (if 9 anchors).
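Here is a rough sketch of how the four regression values could be turned back into a box. It assumes the parameterization commonly used with Faster R-CNN (center shifts scaled by the anchor size, log-space width and height); the `decode` helper and the example anchor are made up for illustration.

```python
import math


def decode(anchor, deltas):
    """Apply (dx, dy, dw, dh) to an anchor given as (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = anchor
    dx, dy, dw, dh = deltas

    # Anchor width, height, and center.
    wa, ha = x2 - x1, y2 - y1
    xa, ya = x1 + wa / 2, y1 + ha / 2

    # Shift the center relative to the anchor size; rescale in log space.
    cx = xa + dx * wa
    cy = ya + dy * ha
    w = wa * math.exp(dw)
    h = ha * math.exp(dh)

    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)


num_anchors = 9
print(num_anchors, num_anchors * 4)  # 9 scores and 36 offsets per location

anchor = (0.0, 0.0, 100.0, 50.0)
print(decode(anchor, (0.0, 0.0, 0.0, 0.0)))  # zero offsets return the anchor unchanged
print(decode(anchor, (0.1, 0.0, 0.0, 0.0)))  # center moves right by 10% of the width
```

Predicting offsets relative to the anchor, rather than raw coordinates, keeps the regression targets small and comparable across anchor sizes.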
5️⃣ Proposal Selection (Post-processing)
- All anchor boxes are scored and adjusted.
- The RPN then applies:
  - Non-Maximum Suppression (NMS) to remove heavily overlapping boxes, keeping the higher-scoring one,
  - Top-N selection, keeping the N highest-scoring proposals (e.g., 300) to send to the detection network.
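The post-processing step can be sketched as greedy NMS followed by a top-N cut. The IoU threshold of 0.7 is an assumption for illustration, and `iou` and `select_proposals` are hypothetical helper names, not a library API.

```python
def iou(a, b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def select_proposals(boxes, scores, iou_threshold=0.7, top_n=300):
    """Greedy NMS in descending score order; stop once top_n boxes are kept."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    kept = []
    for i in order:
        # Keep a box only if it does not overlap too much with a kept one.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in kept):
            kept.append(i)
        if len(kept) == top_n:
            break
    return kept


boxes = [(0, 0, 100, 100), (5, 5, 105, 105), (200, 200, 300, 300)]
scores = [0.9, 0.8, 0.7]
print(select_proposals(boxes, scores))  # indices of the surviving proposals
```

In this toy example the second box overlaps the first with an IoU of about 0.82, so it is suppressed and the first and third boxes survive.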
🧪 Training the RPN
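- Labels:
  - Anchors are labeled as positive or negative based on IoU (Intersection over Union) with ground-truth boxes.
  - Positive: IoU ≥ 0.7 with a ground-truth box. The highest-IoU anchor for each ground-truth box is also treated as positive, so every object gets at least one.
  - Negative: IoU ≤ 0.3 with every ground-truth box
  - Anchors that fall in between are ignored during training.
- Loss Function: A multi-task loss with:
  - Classification loss (log loss for objectness)
  - Regression loss (smooth L1 for bbox refinement), computed only for positive anchors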
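The labeling rule and the two-part loss can be sketched as follows. This is a simplified illustration, not the exact training recipe: it averages each loss term over the anchors it applies to, uses a hypothetical `reg_weight` to balance them, and leaves out the extra rule that promotes the best-matching anchor of each ground-truth box to positive. All function names are made up.

```python
import math


def iou(a, b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def label_anchor(anchor, gt_boxes, pos_thresh=0.7, neg_thresh=0.3):
    """Return 1 (positive), 0 (negative), or -1 (ignored)."""
    best = max((iou(anchor, gt) for gt in gt_boxes), default=0.0)
    if best >= pos_thresh:
        return 1
    if best <= neg_thresh:
        return 0
    return -1


def log_loss(prob, label):
    """Binary log loss for one predicted objectness probability."""
    prob = min(max(prob, 1e-7), 1 - 1e-7)  # avoid log(0)
    return -math.log(prob) if label == 1 else -math.log(1 - prob)


def smooth_l1(pred, target):
    """Smooth L1 summed over the 4 box offsets."""
    total = 0.0
    for p, t in zip(pred, target):
        d = abs(p - t)
        total += 0.5 * d * d if d < 1.0 else d - 0.5
    return total


def rpn_loss(probs, labels, pred_deltas, target_deltas, reg_weight=1.0):
    """Classification over labeled anchors plus regression over positives."""
    used = [i for i, lab in enumerate(labels) if lab != -1]
    positives = [i for i in used if labels[i] == 1]
    cls = sum(log_loss(probs[i], labels[i]) for i in used) / max(len(used), 1)
    reg = sum(smooth_l1(pred_deltas[i], target_deltas[i])
              for i in positives) / max(len(positives), 1)
    return cls + reg_weight * reg


gt_boxes = [(0, 0, 100, 100)]
anchors = [(0, 0, 100, 100), (0, 0, 100, 60), (300, 300, 400, 400)]
labels = [label_anchor(a, gt_boxes) for a in anchors]
print(labels)  # positive, ignored (IoU 0.6), negative
```

The middle anchor overlaps the object with an IoU of 0.6, which is too low to count as positive and too high to count as background, so it is left out of the loss entirely.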
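🔄 Summary: RPN Flow
Image → backbone CNN → shared feature map → 3×3 sliding window → anchors at each location → objectness scores + box offsets → NMS → Top-N proposals → detection network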
✅ Why RPN is Revolutionary
| Before (R-CNN, Fast R-CNN) | After (Faster R-CNN with RPN) |
|---|---|
| Hand-crafted proposals (slow) | Learned proposals (fast & accurate) |
| Separate pipeline | Fully end-to-end training |
| Fixed number of candidates | Adaptive proposals from the image |
| Not designed for GPUs | Fully convolutional, GPU-efficient |