My Research Notes: 🧠 What is a Region Proposal Network (RPN)?

A Region Proposal Network (RPN) is a small fully convolutional neural network that slides over a shared feature map of an image and predicts object proposals (regions likely to contain objects).

It replaces external methods like Selective Search with a fast, trainable alternative.

⚙️ How RPN Works – Step-by-Step

Let’s go through the internal workings of the RPN:

1️⃣ Input: Shared Feature Map

The RPN operates on the feature map extracted from the image by a backbone CNN (e.g., ResNet, VGG).
This feature map is shared with the detection network — no redundant computation.

2️⃣ Sliding Window Mechanism

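A small 3×3 sliding window moves over the entire feature map. In practice this is a 3×3 convolution, followed by two sibling 1×1 convolutions that produce the outputs described in step 4.
At each spatial location (one cell of the feature map, which corresponds to a patch of the input image), the RPN looks at the surrounding 3×3 patch of features and makes predictions.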
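
To make the idea of a "spatial location" concrete, here is a minimal sketch that maps each feature-map cell back to the image pixel its window is centered on. The stride of 16 is an assumption (it matches a VGG-16 backbone), and `window_centers` is a hypothetical helper name, not part of any library.

```python
def window_centers(feat_h, feat_w, stride=16):
    """Return the image-space (cx, cy) center for every feature-map cell.

    stride is the total downsampling factor of the backbone (assumed 16 here).
    """
    centers = []
    for row in range(feat_h):
        for col in range(feat_w):
            # Center of the stride x stride image patch covered by this cell.
            cx = col * stride + stride / 2
            cy = row * stride + stride / 2
            centers.append((cx, cy))
    return centers


centers = window_centers(feat_h=2, feat_w=3)
print(len(centers))  # 6 locations, one per feature-map cell
print(centers[0])    # (8.0, 8.0)
```

Every one of these centers is a place where the RPN makes its predictions, so a larger feature map simply means more locations.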

3️⃣ Anchors: Fixed Reference Boxes

At every location, the RPN uses multiple anchors: predefined bounding boxes centered on that location, with various:
- Scales (e.g., small, medium, large; the Faster R-CNN paper uses box areas of 128², 256², and 512² pixels),
- Aspect Ratios (e.g., 1:1, 2:1, 1:2).
Typically 9 anchors per location (3 scales × 3 aspect ratios).
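
As a rough illustration, the 9 anchors at one location could be generated like this. The scale and ratio values follow the Faster R-CNN paper's defaults; the function name `make_anchors` and the (x1, y1, x2, y2) box format are assumptions made for this sketch.

```python
import math


def make_anchors(cx, cy, scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Return anchors centered at (cx, cy) as (x1, y1, x2, y2) tuples.

    Each scale s gives an anchor of area s*s; each ratio r is height / width.
    """
    anchors = []
    for s in scales:
        for r in ratios:
            # Choose w and h so that w * h == s * s and h / w == r.
            w = s / math.sqrt(r)
            h = s * math.sqrt(r)
            anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors


anchors = make_anchors(cx=300, cy=200)
print(len(anchors))  # 9
```

The same 9 shapes are reused at every location; only the center changes.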

4️⃣ Outputs Per Anchor

For each anchor box, the RPN predicts:

Objectness Score:
- Binary classification — Is this box an object or not?
Bounding Box Regression:
- 4 values to adjust (refine) the anchor: (dx, dy, dw, dh)

So, at each location, RPN outputs:

9 objectness scores and 9×4 = 36 bbox regression values (if 9 anchors). The original paper uses a two-class softmax per anchor, which gives 18 scores instead of 9.
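
Here is a small sketch of how the 4 regression values refine an anchor. It assumes the usual Faster R-CNN parameterization: dx and dy shift the center by a fraction of the anchor's width and height, while dw and dh scale the size in log-space. The name `decode` is hypothetical.

```python
import math


def decode(anchor, deltas):
    """Apply (dx, dy, dw, dh) to an anchor given as (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = anchor
    wa, ha = x2 - x1, y2 - y1
    cxa, cya = x1 + wa / 2, y1 + ha / 2
    dx, dy, dw, dh = deltas
    cx = cxa + dx * wa     # shift center by a fraction of the anchor width
    cy = cya + dy * ha     # shift center by a fraction of the anchor height
    w = wa * math.exp(dw)  # scale width in log-space
    h = ha * math.exp(dh)  # scale height in log-space
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)


# All-zero deltas leave the anchor unchanged.
print(decode((0, 0, 100, 100), (0.0, 0.0, 0.0, 0.0)))  # (0.0, 0.0, 100.0, 100.0)
```

Predicting offsets relative to the anchor, rather than absolute coordinates, keeps the regression targets small and independent of the anchor's position in the image.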

5️⃣ Proposal Selection (Post-processing)

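All anchor boxes are scored and adjusted.
RPN applies:
- Non-Maximum Suppression (NMS) to remove lower-scoring boxes that heavily overlap a higher-scoring one (the paper uses an IoU threshold of 0.7),
- And keeps the Top-N (e.g., 300) proposals by objectness score to send to the detection network.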
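
The selection step can be sketched in plain Python as greedy NMS followed by a Top-N cut. The 0.7 IoU threshold and N = 300 are assumed defaults, and `iou` and `select_proposals` are hypothetical helper names; real implementations run a vectorized version of this on the GPU.

```python
def iou(a, b):
    """Intersection over Union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def select_proposals(boxes, scores, iou_thresh=0.7, top_n=300):
    """Greedy NMS, stopping once top_n boxes are kept; returns kept indices."""
    # Visit boxes from highest to lowest objectness score.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if len(keep) >= top_n:
            break
        # Keep a box only if it does not overlap too much with a kept one.
        if all(iou(boxes[i], boxes[j]) <= iou_thresh for j in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 100, 100), (5, 5, 105, 105), (200, 200, 300, 300)]
scores = [0.9, 0.8, 0.7]
print(select_proposals(boxes, scores))  # [0, 2]
```

In the example, the second box overlaps the first with IoU of about 0.82, so it is suppressed, while the third box does not overlap anything and survives.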

🧪 Training the RPN

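Labels:
- Anchors are labeled as positive/negative based on IoU (Intersection over Union) with ground-truth boxes.
- Positive: IoU ≥ 0.7 with any ground-truth box, or the highest-IoU anchor for a ground-truth box (so every object gets at least one positive anchor)
- Negative: IoU ≤ 0.3 with every ground-truth box
- Anchors that are neither positive nor negative are ignored during training.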
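
A minimal sketch of this labeling rule follows. It uses the 0.7 and 0.3 thresholds from above, and it also applies the rule from the Faster R-CNN paper that the best-matching anchor for each ground-truth box is positive even when its IoU is below 0.7. The label values (1, 0, -1) and the names `iou` and `label_anchors` are assumptions made for illustration.

```python
def iou(a, b):
    """Intersection over Union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def label_anchors(anchors, gt_boxes, pos_thresh=0.7, neg_thresh=0.3):
    """Return one label per anchor: 1 = positive, 0 = negative, -1 = ignored."""
    labels = []
    for a in anchors:
        best = max((iou(a, g) for g in gt_boxes), default=0.0)
        if best >= pos_thresh:
            labels.append(1)
        elif best <= neg_thresh:
            labels.append(0)
        else:
            labels.append(-1)  # in between: not used for training
    # Each ground-truth box also claims its best-overlapping anchor.
    for g in gt_boxes:
        overlaps = [iou(a, g) for a in anchors]
        if overlaps and max(overlaps) > 0:
            labels[overlaps.index(max(overlaps))] = 1
    return labels


anchors = [(0, 0, 100, 100), (40, 0, 140, 100), (300, 300, 400, 400)]
print(label_anchors(anchors, gt_boxes=[(0, 0, 100, 100)]))  # [1, -1, 0]
```

The middle anchor overlaps the object with IoU of about 0.43, which is too high to call background and too low to call a match, so it is left out of the loss.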
Loss Function: A multi-task loss with:
- Classification loss (log loss for objectness)
- Regression loss (smooth L1 for bbox refinement)

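L = L_cls(p, p*) + λ [p* > 0] L_reg(t, t*)

Here p is the predicted objectness probability, p* is the ground-truth label (1 for a positive anchor, 0 for a negative one), t and t* are the predicted and target box offsets, and λ balances the two terms. The indicator [p* > 0] means the regression loss only counts for positive anchors.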
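
The per-anchor loss above can be written out directly. This is only a sketch: it handles a single anchor, uses λ = 1 as an assumed default, and leaves out the averaging over a mini-batch of anchors that a real training loop would do. The function names are hypothetical.

```python
import math


def smooth_l1(x):
    """Smooth L1: quadratic near zero, linear elsewhere."""
    ax = abs(x)
    return 0.5 * ax * ax if ax < 1.0 else ax - 0.5


def rpn_loss(p, p_star, t, t_star, lam=1.0):
    """Per-anchor loss: L_cls(p, p*) + lam * [p* > 0] * L_reg(t, t*).

    p         : predicted probability that the anchor is an object (0 < p < 1)
    p_star    : ground-truth label, 1 for positive and 0 for negative
    t, t_star : predicted and target (dx, dy, dw, dh)
    """
    # Log loss for the binary objectness prediction.
    cls = -(p_star * math.log(p) + (1 - p_star) * math.log(1 - p))
    # Smooth L1 summed over the four box offsets.
    reg = sum(smooth_l1(a - b) for a, b in zip(t, t_star))
    # The regression term is switched off for negative anchors.
    return cls + lam * (1 if p_star > 0 else 0) * reg


# Negative anchor: only the classification term contributes.
print(rpn_loss(0.5, 0, (9, 9, 9, 9), (0, 0, 0, 0)))
# Positive anchor: classification plus regression.
print(rpn_loss(0.5, 1, (0.5, 0, 0, 2), (0, 0, 0, 0)))
```

For the negative anchor the large box error is ignored, which matches the intuition that a background anchor has no object to regress toward.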

🔄 Summary: RPN Flow

[Shared CNN Feature Map]
   ↓
[3×3 Sliding Window]
   ↓
[Anchors per location]
   → Objectness scores
   → Bbox regression
   ↓
[Non-Max Suppression]
   ↓
[Top N Proposals to Detection Head]

✅ Why RPN is Revolutionary

Before (R-CNN, Fast R-CNN)	After (Faster R-CNN with RPN)
Hand-crafted proposals from Selective Search (slow)	Learned proposals (fast & accurate)
Separate proposal pipeline	Proposals and detector trained together on shared features
Fixed number of candidates	Adaptive proposals from the image
Proposal step runs on the CPU	Fully convolutional, GPU-efficient

Monday, 21 April 2025
