🎯 Context First: Why Learn Part Positions?
When you’re trying to detect parts of an object (like a bird’s head or tail), it helps to know where those parts typically appear relative to the object.
- For example: the head is usually above the body.
- But because of pose variations, a single "mean" location isn’t enough.
That’s why the model uses a Mixture of Gaussians — to capture multiple common configurations of part locations.
🧠 What is a Mixture of Gaussians (MoG)?
A Mixture of Gaussians is a probabilistic model that represents a complex distribution as a weighted sum of multiple Gaussian (bell curve) distributions.
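Mathematically:
$$p(x) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x \mid \mu_k, \Sigma_k)$$
Where:
- $\pi_k$ = weight (mixing coefficient) of the $k$-th Gaussian (the weights sum to 1)
- $\mu_k$ = mean (center) of the $k$-th Gaussian
- $\Sigma_k$ = covariance (spread/shape) of the $k$-th Gaussian
- $K$ = number of mixture components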
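To make the definition concrete, here is a minimal sketch that evaluates a 2-D mixture density in plain Python. The function names (`gaussian_pdf_2d`, `mixture_pdf`) and all the numbers are made up for illustration; they are not taken from the paper.

```python
import math

def gaussian_pdf_2d(x, mean, cov):
    """Density of a 2-D Gaussian at x; cov is ((sxx, sxy), (sxy, syy))."""
    (sxx, sxy), (_, syy) = cov
    det = sxx * syy - sxy * sxy
    dx, dy = x[0] - mean[0], x[1] - mean[1]
    # Squared Mahalanobis distance via the closed-form inverse of a 2x2 matrix
    maha = (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det
    return math.exp(-0.5 * maha) / (2.0 * math.pi * math.sqrt(det))

def mixture_pdf(x, weights, means, covs):
    """Weighted sum of Gaussian densities: sum_k pi_k * N(x | mu_k, Sigma_k)."""
    return sum(w * gaussian_pdf_2d(x, m, c)
               for w, m, c in zip(weights, means, covs))

# Hypothetical two-component mixture over normalized (dx, dy) offsets
weights = [0.6, 0.4]
means = [(0.0, -0.3), (-0.25, -0.1)]
covs = [((0.01, 0.0), (0.0, 0.01)), ((0.02, 0.0), (0.0, 0.02))]

print(mixture_pdf((0.0, -0.3), weights, means, covs))  # near a component mean
print(mixture_pdf((0.5, 0.5), weights, means, covs))   # far from both means
```

A point close to any one component mean gets a high density, which is what lets the mixture describe several typical configurations at once.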
🧭 How is it used to model part positions?
In Part-based R-CNN, here's how it works:
1️⃣ Collect Part Coordinates from Training Data
For each part type (e.g., head), gather the coordinates from annotated training images, relative to the whole object box.
Example:
If the bird’s body box is (x, y, w, h) and the head is at (xₕ, yₕ), the head’s coordinates are normalized into offsets relative to that box, so they no longer depend on where the bird sits in the image.
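One possible way to compute such offsets is sketched below. Two things here are assumptions rather than details from the paper: that (x, y) is the top-left corner of the box, and that the offset is measured from the box center and divided by the box width and height. The helper name `normalized_offset` is hypothetical.

```python
def normalized_offset(object_box, part_center):
    """Express a part center relative to the object box.

    object_box is (x, y, w, h), with (x, y) assumed to be the top-left corner.
    Returns the offset from the box center, scaled by box width and height.
    """
    x, y, w, h = object_box
    px, py = part_center
    cx, cy = x + w / 2.0, y + h / 2.0
    return ((px - cx) / w, (py - cy) / h)

# A head sitting up and to the left of the box center (image y grows downward)
print(normalized_offset((100, 50, 200, 100), (150, 75)))
```

Dividing by the box size makes the same pose produce roughly the same offset whether the bird is large or small in the image.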
2️⃣ Fit a Mixture of Gaussians to These Coordinates
Use the training data to fit a K-component MoG to these relative positions.
Each Gaussian captures a common configuration:
- One Gaussian might represent the bird facing left.
- Another could capture the bird flying, where the head is farther from the body.
📌 You can fit this using the Expectation-Maximization (EM) algorithm.
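Here is a small from-scratch EM sketch for a 2-D mixture, just to show the two alternating steps. It is not the paper's implementation: the farthest-point initialization, the `reg` term added to the variances, the fixed iteration count, and the synthetic data are all choices made for this example. In practice you would more likely use a library implementation such as scikit-learn's `GaussianMixture`.

```python
import math
import random

def gaussian_pdf_2d(x, mean, cov):
    """Density of a 2-D Gaussian at x; cov is ((sxx, sxy), (sxy, syy))."""
    (sxx, sxy), (_, syy) = cov
    det = sxx * syy - sxy * sxy
    dx, dy = x[0] - mean[0], x[1] - mean[1]
    maha = (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det
    return math.exp(-0.5 * maha) / (2.0 * math.pi * math.sqrt(det))

def sq_dist(p, q):
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2

def fit_mog_em(points, k, n_iter=30, reg=1e-6):
    """Fit a k-component 2-D MoG with plain EM. Returns (weights, means, covs)."""
    n = len(points)
    # Deterministic init: first point, then the point farthest from chosen means
    means = [points[0]]
    while len(means) < k:
        means.append(max(points, key=lambda p: min(sq_dist(p, m) for m in means)))
    # Every component starts with the overall per-axis variance of the data
    gx = sum(p[0] for p in points) / n
    gy = sum(p[1] for p in points) / n
    vx = sum((p[0] - gx) ** 2 for p in points) / n + reg
    vy = sum((p[1] - gy) ** 2 for p in points) / n + reg
    covs = [((vx, 0.0), (0.0, vy))] * k
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        # E-step: responsibility of each component for each point
        resp = []
        for p in points:
            dens = [w * gaussian_pdf_2d(p, m, c)
                    for w, m, c in zip(weights, means, covs)]
            total = sum(dens) + 1e-300  # guard against division by zero
            resp.append([d / total for d in dens])
        # M-step: re-estimate weights, means, covariances from responsibilities
        weights, means, covs = [], [], []
        for j in range(k):
            nj = sum(r[j] for r in resp) + 1e-12
            mx = sum(r[j] * p[0] for r, p in zip(resp, points)) / nj
            my = sum(r[j] * p[1] for r, p in zip(resp, points)) / nj
            sxx = sum(r[j] * (p[0] - mx) ** 2 for r, p in zip(resp, points)) / nj + reg
            syy = sum(r[j] * (p[1] - my) ** 2 for r, p in zip(resp, points)) / nj + reg
            sxy = sum(r[j] * (p[0] - mx) * (p[1] - my)
                      for r, p in zip(resp, points)) / nj
            weights.append(nj / n)
            means.append((mx, my))
            covs.append(((sxx, sxy), (sxy, syy)))
    return weights, means, covs

# Synthetic offsets: two made-up "poses" clustered around different locations
rng = random.Random(0)

def cluster(center, count, spread=0.03):
    return [(center[0] + rng.gauss(0, spread), center[1] + rng.gauss(0, spread))
            for _ in range(count)]

points = cluster((0.0, -0.30), 40) + cluster((-0.30, 0.0), 40)
weights, means, covs = fit_mog_em(points, k=2)
print(weights)
print(means)
```

The E-step asks "which Gaussian most plausibly produced this offset?", and the M-step re-centers and re-shapes each Gaussian around the offsets it is responsible for.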
3️⃣ Use the MoG as a Geometric Prior
At test time:
- You detect the object (e.g., bird body).
- Then, for each possible part location (e.g., a proposed head), you compute the likelihood of that position under the MoG model.
This likelihood is the $\delta^{MG}(x)$ scoring function in the paper: it is higher for "likely" locations.
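So, in the paper’s scoring formula:
$$\Delta_{\text{geometric}}(X) = \Delta_{\text{box}}(X) \left( \prod_{i=1}^{n} \delta_i(x_i) \right)^{\alpha}$$
- $\delta_i(x_i)$ is the MoG likelihood of part $i$’s location $x_i$.
- $\alpha$ is a weighting factor.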
🔍 Example: Bird Head Detection
Let’s say in training data:
- Sometimes the head is 20 pixels above the center of the bird (Gaussian 1),
- Sometimes it’s 15 pixels to the left and 5 above (Gaussian 2),
- Sometimes it’s farther away (flying posture, Gaussian 3).
These patterns are captured as 3 Gaussians in the MoG.
At test time, a proposed head location gets a high score only if it aligns with one of these common patterns.
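The bird-head example can be sketched as a tiny scoring script. The first two means come from the offsets above (with image y growing downward, "above" is negative y); the third mean, the mixture weights, and the variances are invented for illustration, and diagonal covariances are assumed to keep the code short. Scores are computed in log space with the log-sum-exp trick so that very unlikely locations do not underflow to zero.

```python
import math

def log_gauss_diag(x, mean, var):
    """Log-density of a 2-D Gaussian with diagonal covariance (var_x, var_y)."""
    dx, dy = x[0] - mean[0], x[1] - mean[1]
    return (-0.5 * (dx * dx / var[0] + dy * dy / var[1])
            - math.log(2.0 * math.pi) - 0.5 * math.log(var[0] * var[1]))

def log_mixture(x, components):
    """Log of the mixture density, using log-sum-exp for numerical stability."""
    terms = [math.log(w) + log_gauss_diag(x, m, v) for w, m, v in components]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))

# (weight, mean offset in pixels, per-axis variance); values are hypothetical
components = [
    (0.5, (0.0, -20.0), (25.0, 25.0)),     # Gaussian 1: head above the center
    (0.3, (-15.0, -5.0), (25.0, 25.0)),    # Gaussian 2: head left and slightly up
    (0.2, (0.0, -45.0), (100.0, 100.0)),   # Gaussian 3: flying, head farther away
]

# Proposed head locations, as offsets from the bird's center
candidates = {
    "above center": (1.0, -19.0),
    "left and up": (-14.0, -6.0),
    "below right": (30.0, 30.0),
}

scores = {name: log_mixture(pos, components) for name, pos in candidates.items()}
best = max(scores, key=scores.get)
for name, score in scores.items():
    print(f"{name}: {score:.2f}")
print("best candidate:", best)
```

The candidate below and to the right of the center matches none of the three patterns, so its log-likelihood is far lower than that of the other two.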
✅ Benefits of Using MoG
| Advantage | Description |
|---|---|
| Multi-pose modeling | Handles pose variability with multiple Gaussians |
| Probabilistic & smooth | Assigns soft likelihoods instead of hard cutoffs |
| Lightweight computation | Gaussians are fast to evaluate |
| Easy to fit | Uses standard EM algorithm |