Monday, 21 April 2025

🧠 What is a Region Proposal Network (RPN)?

 


A Region Proposal Network (RPN) is a small fully convolutional neural network that slides over a shared feature map of an image and predicts object proposals (regions likely to contain objects).

It replaces external methods like Selective Search with a fast, trainable alternative.


⚙️ How RPN Works – Step-by-Step

Let’s go through the internal workings of the RPN:


1️⃣ Input: Shared Feature Map

  • The RPN operates on the feature map extracted from the image by a backbone CNN (e.g., ResNet, VGG).

  • This feature map is shared with the detection network — no redundant computation.


2️⃣ Sliding Window Mechanism

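  • A small 3×3 sliding window moves over the entire feature map; in practice this is implemented as a 3×3 convolution.

  • At each spatial location (each cell of the feature map), the RPN looks at the 3×3 patch of features around it and makes its predictions from that patch.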
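To make the sliding window concrete, here is a minimal NumPy sketch of a 3×3 window visiting every location of a feature map. The function name `sliding_window_3x3`, the (C, H, W) layout, the zero padding, the ReLU, and the random weights are all assumptions for illustration; a real RPN would use a framework's convolution layer with learned weights.

```python
import numpy as np

def sliding_window_3x3(features, weights, bias):
    """Apply a 3x3 window at every location of a (C, H, W) feature map.

    weights: (C_out, C, 3, 3), bias: (C_out,). Returns (C_out, H, W).
    Zero padding of 1 keeps the output the same spatial size as the input.
    """
    c, h, w = features.shape
    padded = np.pad(features, ((0, 0), (1, 1), (1, 1)))  # pad H and W only
    out = np.empty((weights.shape[0], h, w))
    for y in range(h):
        for x in range(w):
            patch = padded[:, y:y + 3, x:x + 3]           # (C, 3, 3) window
            out[:, y, x] = np.tensordot(weights, patch, axes=3) + bias
    return np.maximum(out, 0.0)                            # ReLU (assumed)

# Illustrative sizes only: 8 input channels, 16 hidden channels, 5x6 map.
rng = np.random.default_rng(0)
feat = rng.standard_normal((8, 5, 6))
w = rng.standard_normal((16, 8, 3, 3)) * 0.1
b = np.zeros(16)
hidden = sliding_window_3x3(feat, w, b)
print(hidden.shape)  # (16, 5, 6)
```

Each location ends up with one feature vector (16 values here), which is what the per-anchor predictions are computed from.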


3️⃣ Anchors: Fixed Reference Boxes

  • At every location, the RPN uses multiple anchors — predefined bounding boxes of various:

    • Scales (e.g., small, medium, large),

    • Aspect Ratios (e.g., 1:1, 2:1, 1:2).

  • Typically 9 anchors per location (3 scales × 3 aspect ratios).
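A rough sketch of how the 9 anchors could be generated and tiled over the feature map is shown below. The scale values (128, 256, 512 pixels), the stride of 16, and the helper names `make_anchors` and `shift_anchors` are assumptions chosen for illustration, not something this post prescribes.

```python
import numpy as np

def make_anchors(scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Return (len(scales) * len(ratios), 4) anchors centred at (0, 0).

    Boxes are (x1, y1, x2, y2). ratio = height / width, and every anchor
    of a given scale keeps the same area, scale ** 2.
    """
    anchors = []
    for s in scales:
        for r in ratios:
            w = s / np.sqrt(r)
            h = s * np.sqrt(r)
            anchors.append([-w / 2, -h / 2, w / 2, h / 2])
    return np.array(anchors)

def shift_anchors(base, feat_h, feat_w, stride=16):
    """Place the base anchors at every feature-map location -> (H * W * A, 4)."""
    xs = (np.arange(feat_w) + 0.5) * stride            # cell centres in image pixels
    ys = (np.arange(feat_h) + 0.5) * stride
    cx, cy = np.meshgrid(xs, ys)                       # each (H, W)
    shifts = np.stack([cx, cy, cx, cy], axis=-1)       # (H, W, 4)
    return (shifts[:, :, None, :] + base[None, None]).reshape(-1, 4)

base = make_anchors()
all_anchors = shift_anchors(base, feat_h=2, feat_w=3)
print(base.shape, all_anchors.shape)  # (9, 4) (54, 4)
```

With a 2×3 feature map this gives 2 × 3 × 9 = 54 anchors, ordered by row, then column, then anchor type.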


4️⃣ Outputs Per Anchor

For each anchor box, the RPN predicts:

  • Objectness Score:

    • Binary classification — Is this box an object or not?

  • Bounding Box Regression:

    • 4 values to adjust (refine) the anchor: (dx, dy, dw, dh)

So, at each location, RPN outputs:

  • 9 objectness scores and 9×4 = 36 box-regression values (with 9 anchors).
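For a feature map of size H×W, that adds up to 9·H·W scores and 36·H·W regression values. The sketch below shows one common way the four values (dx, dy, dw, dh) are applied to an anchor: the centre is shifted in units of the anchor's width and height, and the size is scaled exponentially. The function name `decode_deltas` and the (x1, y1, x2, y2) box layout are assumptions for illustration.

```python
import numpy as np

def decode_deltas(anchors, deltas):
    """Apply (dx, dy, dw, dh) to (x1, y1, x2, y2) anchors; both arrays are (N, 4)."""
    wa = anchors[:, 2] - anchors[:, 0]
    ha = anchors[:, 3] - anchors[:, 1]
    xa = anchors[:, 0] + 0.5 * wa                      # anchor centre x
    ya = anchors[:, 1] + 0.5 * ha                      # anchor centre y
    dx, dy, dw, dh = deltas.T
    cx = xa + dx * wa                                  # shift centre by a fraction of the size
    cy = ya + dy * ha
    w = wa * np.exp(dw)                                # scale width and height
    h = ha * np.exp(dh)
    return np.stack([cx - 0.5 * w, cy - 0.5 * h, cx + 0.5 * w, cy + 0.5 * h], axis=1)

anchor = np.array([[0.0, 0.0, 10.0, 20.0]])
delta = np.array([[0.1, 0.0, np.log(2.0), 0.0]])
print(decode_deltas(anchor, delta))  # approximately [[-4, 0, 16, 20]]
```

Here the anchor moves right by 10% of its width and doubles in width, while its height stays the same.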


5️⃣ Proposal Selection (Post-processing)

  • All anchor boxes are scored and adjusted.

  • RPN applies:

    • Non-Maximum Suppression (NMS) to remove lower-scoring boxes that heavily overlap a higher-scoring one,

    • and keeps the Top-N (e.g., 300) highest-scoring proposals to send to the detection network.
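The post-processing step can be sketched in plain NumPy as a greedy NMS followed by a Top-N cut. The IoU threshold of 0.7 and the names `iou_one_to_many` and `select_proposals` are assumptions for illustration, and the sketch assumes every box has positive area; production code would normally call an optimized NMS routine from a detection library instead.

```python
import numpy as np

def iou_one_to_many(box, boxes):
    """IoU between one (4,) box and an (M, 4) array of boxes, all (x1, y1, x2, y2)."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)

def select_proposals(boxes, scores, iou_thresh=0.7, top_n=300):
    """Greedy NMS, then keep at most top_n boxes. Returns indices into boxes."""
    order = np.argsort(-scores)                        # highest score first
    keep = []
    while order.size > 0 and len(keep) < top_n:
        best = order[0]
        keep.append(best)
        rest = order[1:]
        ious = iou_one_to_many(boxes[best], boxes[rest])
        order = rest[ious <= iou_thresh]               # drop boxes overlapping the kept one
    return np.array(keep, dtype=int)

boxes = np.array([[0, 0, 10, 10], [0, 1, 10, 11], [20, 20, 30, 30]], dtype=float)
scores = np.array([0.9, 0.8, 0.7])
print(select_proposals(boxes, scores))  # [0 2]
```

The second box overlaps the first with IoU 90/110 ≈ 0.82, so it is suppressed, while the distant third box survives.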


🧪 Training the RPN

  • Labels:

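    • Anchors are labeled as positive/negative based on their IoU (Intersection over Union) with the ground-truth boxes.

    • Positive: IoU ≥ 0.7 with some ground-truth box (the anchor with the highest IoU for a given ground-truth box is also treated as positive, so every object gets at least one anchor).

    • Negative: IoU ≤ 0.3 with every ground-truth box.

    • Anchors that fall between the two thresholds are ignored and do not contribute to the loss.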

  • Loss Function: A multi-task loss with:

    • Classification loss (log loss for objectness)

    • Regression loss (smooth L1 for bbox refinement)

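$$L = L_{cls}(p, p^*) + \lambda \, [p^* > 0] \, L_{reg}(t, t^*)$$

Here p is the predicted objectness probability, p* is the anchor's label (1 for positive, 0 for negative), t and t* are the predicted and target box offsets, and λ balances the two terms. The indicator [p* > 0] means the regression loss is only counted for positive anchors.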
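Putting the labels and the loss together, a simplified NumPy sketch might look like the following. It implements only the two IoU thresholds listed above, averages the classification loss over the non-ignored anchors and the regression loss over the positive ones, and assumes at least one anchor is not ignored. The names `pairwise_iou`, `label_anchors`, `smooth_l1`, and `rpn_loss` are hypothetical helpers, and the normalization is a simplifying assumption rather than a reference implementation.

```python
import numpy as np

def pairwise_iou(a, b):
    """IoU between every box in a (N, 4) and every box in b (M, 4) -> (N, M)."""
    x1 = np.maximum(a[:, None, 0], b[None, :, 0])
    y1 = np.maximum(a[:, None, 1], b[None, :, 1])
    x2 = np.minimum(a[:, None, 2], b[None, :, 2])
    y2 = np.minimum(a[:, None, 3], b[None, :, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_a = (a[:, 2] - a[:, 0]) * (a[:, 3] - a[:, 1])
    area_b = (b[:, 2] - b[:, 0]) * (b[:, 3] - b[:, 1])
    return inter / (area_a[:, None] + area_b[None, :] - inter)

def label_anchors(anchors, gt_boxes, pos_thresh=0.7, neg_thresh=0.3):
    """Return one label per anchor: 1 = positive, 0 = negative, -1 = ignored."""
    best_iou = pairwise_iou(anchors, gt_boxes).max(axis=1)
    labels = np.full(len(anchors), -1)
    labels[best_iou <= neg_thresh] = 0
    labels[best_iou >= pos_thresh] = 1
    return labels

def smooth_l1(x):
    """Quadratic near zero, linear for |x| >= 1."""
    ax = np.abs(x)
    return np.where(ax < 1.0, 0.5 * ax ** 2, ax - 0.5)

def rpn_loss(p, labels, t, t_star, lam=1.0):
    """L = L_cls(p, p*) + lam * [p* > 0] * L_reg(t, t*), with simple averaging.

    p: predicted objectness probabilities (N,); labels: output of label_anchors;
    t, t_star: predicted and target offsets (N, 4). Ignored anchors are skipped.
    """
    used = labels >= 0
    pos = labels == 1
    p_c = np.clip(p[used], 1e-7, 1 - 1e-7)             # avoid log(0)
    y = labels[used]
    l_cls = -(y * np.log(p_c) + (1 - y) * np.log(1 - p_c)).mean()
    l_reg = smooth_l1(t[pos] - t_star[pos]).sum(axis=1).mean() if pos.any() else 0.0
    return l_cls + lam * l_reg

anchors = np.array([[0, 0, 10, 10], [0, 0, 10, 5], [50, 50, 60, 60]], dtype=float)
gt = np.array([[0, 0, 10, 10]], dtype=float)
labels = label_anchors(anchors, gt)
print(labels)  # [ 1 -1  0]
```

The three anchors have IoU 1.0, 0.5, and 0.0 with the ground-truth box, so they come out as positive, ignored, and negative respectively.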

🔄 Summary: RPN Flow

    [Shared CNN Feature Map]
              ↓
    [3×3 Sliding Window]
              ↓
    [Anchors per location] → Objectness scores
                           → Bbox regression
              ↓
    [Non-Max Suppression]
              ↓
    [Top N Proposals to Detection Head]

✅ Why RPN is Revolutionary

| Before (R-CNN, Fast R-CNN) | After (Faster R-CNN with RPN) |
|---|---|
| Hand-crafted proposals (slow) | Learned proposals (fast & accurate) |
| Separate pipeline | Fully end-to-end training |
| Fixed number of candidates | Adaptive proposals from the image |
| No GPU-friendly design | Fully convolutional, GPU-efficient |
