My Research Notes: What Is a Mixture of Gaussians (MoG)?

🎯 Context First: Why Learn Part Positions?

When you’re trying to detect parts of an object (like a bird’s head or tail), it helps to know where those parts typically appear relative to the object.

For example: The head is usually above the body.
But because of pose variations, one single "mean" location isn’t enough.

That’s why the model uses a Mixture of Gaussians — to capture multiple common configurations of part locations.

🧠 What is a Mixture of Gaussians (MoG)?

A Mixture of Gaussians is a probabilistic model that represents a complex distribution as a weighted sum of multiple Gaussian (bell curve) distributions.

Mathematically:

$$P(x) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x \mid \mu_k, \Sigma_k)$$

Where:

$\pi_k$ = weight (mixing coefficient) of the $k$-th Gaussian; the weights are non-negative and sum to 1
$\mu_k$ = mean (center) of the $k$-th Gaussian
$\Sigma_k$ = covariance (spread/shape) of the $k$-th Gaussian
$K$ = number of mixture components
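
To make the formula concrete, here is a minimal NumPy sketch that evaluates $P(x)$ at a single point. The function names (`gaussian_pdf`, `mog_pdf`) and all numbers in the toy mixture are made up for illustration; they do not come from the paper.

```python
import numpy as np

def gaussian_pdf(x, mean, cov):
    """Density of a multivariate normal N(x | mean, cov) at a single point x."""
    x = np.asarray(x, dtype=float)
    mean = np.asarray(mean, dtype=float)
    cov = np.asarray(cov, dtype=float)
    d = x.shape[0]
    diff = x - mean
    # Solve cov @ z = diff instead of inverting cov explicitly.
    maha = diff @ np.linalg.solve(cov, diff)
    norm = np.sqrt((2.0 * np.pi) ** d * np.linalg.det(cov))
    return float(np.exp(-0.5 * maha) / norm)

def mog_pdf(x, weights, means, covs):
    """P(x) = sum_k pi_k * N(x | mu_k, Sigma_k)."""
    return sum(w * gaussian_pdf(x, m, c) for w, m, c in zip(weights, means, covs))

# Toy 2-component mixture in 2-D (illustrative values only).
weights = [0.6, 0.4]
means = [[0.0, 0.0], [3.0, 3.0]]
covs = [np.eye(2), 2.0 * np.eye(2)]

p_at_origin = mog_pdf([0.0, 0.0], weights, means, covs)
```

At the origin the first component dominates, because the second component's mean is several standard deviations away.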

🧭 How is it used to model part positions?

In Part-based R-CNN, here's how it works:

1️⃣ Collect Part Coordinates from Training Data

For each part type (e.g., head), gather the coordinates from annotated training images, relative to the whole object box.

Example:
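If the bird’s body box is (x, y, w, h) and the head is at (xₕ, yₕ), the head position is stored as an offset from the body box rather than in absolute image coordinates, for example scaled by w and h so that birds of different sizes are comparable.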
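
One possible normalization is sketched below. It assumes (x, y) is the top-left corner of the body box and measures the offset from the box center in units of box width and height. This convention and the name `relative_part_offset` are my own assumptions; the notes above do not pin down the exact normalization.

```python
def relative_part_offset(object_box, part_center):
    """Express a part center relative to the object box.

    object_box:  (x, y, w, h), with (x, y) assumed to be the top-left corner.
    part_center: (x_h, y_h) in the same image coordinates.
    Returns (dx, dy) measured from the box center, in units of box width/height.
    """
    x, y, w, h = object_box
    x_h, y_h = part_center
    if w <= 0 or h <= 0:
        raise ValueError("object box must have positive width and height")
    cx, cy = x + w / 2.0, y + h / 2.0
    return ((x_h - cx) / w, (y_h - cy) / h)

# A head up and to the left of the box center (image y grows downward).
offset = relative_part_offset((100, 50, 200, 100), (150, 60))  # (-0.25, -0.4)
```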

2️⃣ Fit a Mixture of Gaussians to These Coordinates

Use the training data to fit a K-component MoG to these relative positions.

Each Gaussian captures a common configuration:

One Gaussian might represent the bird facing left.
Another could capture the bird flying, where the head is farther from the body.

📌 You can fit this using the Expectation-Maximization (EM) algorithm.
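
As a rough illustration of what EM does here, the sketch below fits a K-component MoG to 2-D offsets with plain NumPy. It is not the paper's implementation: the farthest-point initialization, the `reg` term added to each covariance, the fixed iteration count, and the synthetic "pose" clusters are all assumptions chosen to keep the example short. In practice a library implementation would be the more robust choice.

```python
import numpy as np

def log_gaussian(X, mean, cov):
    """Log of N(x | mean, cov) for every row x of X, where X has shape (N, D)."""
    d = X.shape[1]
    chol = np.linalg.cholesky(cov)               # cov = chol @ chol.T
    z = np.linalg.solve(chol, (X - mean).T)      # whitened residuals, shape (D, N)
    maha = np.sum(z ** 2, axis=0)                # squared Mahalanobis distances
    log_det = 2.0 * np.sum(np.log(np.diag(chol)))
    return -0.5 * (d * np.log(2.0 * np.pi) + log_det + maha)

def fit_mog_em(X, k, n_iter=100, reg=1e-6, seed=0):
    """Fit weights, means, and covariances of a k-component MoG with EM."""
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    rng = np.random.default_rng(seed)

    # Initialization (an assumption of this sketch): the first mean is a random
    # sample, each further mean is the sample farthest from those chosen so far.
    means = [X[rng.integers(n)]]
    for _ in range(1, k):
        dist2 = np.min([np.sum((X - m) ** 2, axis=1) for m in means], axis=0)
        means.append(X[np.argmax(dist2)])
    means = np.array(means)
    weights = np.full(k, 1.0 / k)
    covs = np.array([np.mean(np.var(X, axis=0)) * np.eye(d) for _ in range(k)])

    for _ in range(n_iter):
        # E-step: responsibilities r[n, j] proportional to pi_j * N(x_n | mu_j, Sigma_j).
        log_r = np.stack(
            [np.log(weights[j]) + log_gaussian(X, means[j], covs[j]) for j in range(k)],
            axis=1,
        )
        log_r -= log_r.max(axis=1, keepdims=True)  # stabilize before exponentiating
        resp = np.exp(log_r)
        resp /= resp.sum(axis=1, keepdims=True)

        # M-step: re-estimate parameters from responsibility-weighted samples.
        nk = resp.sum(axis=0) + 1e-12
        weights = nk / nk.sum()
        means = (resp.T @ X) / nk[:, None]
        for j in range(k):
            diff = X - means[j]
            covs[j] = (resp[:, j, None] * diff).T @ diff / nk[j] + reg * np.eye(d)
    return weights, means, covs

# Synthetic head offsets from two made-up poses (not real annotations).
rng = np.random.default_rng(1)
offsets = np.vstack([
    rng.normal([-0.3, -0.4], 0.03, size=(200, 2)),
    rng.normal([0.3, -0.2], 0.03, size=(200, 2)),
])
weights, means, covs = fit_mog_em(offsets, k=2)
```

With two well-separated clusters, the fitted means land close to the two cluster centers and the weights come out roughly equal.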

3️⃣ Use the MoG as a Geometric Prior

At test time:

You detect the object (e.g., bird body).
Then, for each possible part location (e.g., a proposed head), you compute the likelihood of that position under the MoG model.

This likelihood is the $\delta^{MG}(x)$ scoring function in the paper: it is higher for "likely" locations.

So, in the paper’s scoring formula:

$$\Delta_{\text{geometric}}(X) = \Delta_{\text{box}}(X) \cdot \left( \prod_{i=1}^{n} \delta_i(x_i) \right)^{\alpha}$$

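$\delta_i(x_i)$ is the MoG likelihood of part $i$’s location.
$\Delta_{\text{box}}(X)$ is the paper’s box constraint, which requires each part to lie (almost) inside the object box.
$\alpha$ is an exponent that controls how strongly the geometric prior influences the score.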
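
The formula can be sketched in a few lines. This version works in log space so that a product of many small likelihoods does not underflow. The function name `geometric_score` and the example values are hypothetical, and the box term is simply passed in as a non-negative number.

```python
import math

def geometric_score(box_score, part_likelihoods, alpha):
    """Delta_geometric = Delta_box * (prod_i delta_i)^alpha, computed in log space."""
    if box_score < 0:
        raise ValueError("box_score is assumed to be non-negative")
    # Any zero factor makes the whole product zero.
    if box_score == 0 or any(p <= 0 for p in part_likelihoods):
        return 0.0
    log_score = math.log(box_score) + alpha * sum(math.log(p) for p in part_likelihoods)
    return math.exp(log_score)

# Two parts with made-up MoG likelihoods; alpha = 0.1 is an arbitrary choice.
score = geometric_score(1.0, [0.5, 0.2], alpha=0.1)
```

A small $\alpha$ flattens the prior: here the product 0.1 is raised to the power 0.1, so the geometric term only mildly penalizes the configuration.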

🔍 Example: Bird Head Detection

Let’s say in training data:

Sometimes the head is 20 pixels above the center of the bird (Gaussian 1),
Sometimes it’s 15 pixels to the left and 5 above (Gaussian 2),
Sometimes it’s farther away (flying posture — Gaussian 3).

These patterns are captured as 3 Gaussians in the MoG.

At test time, a proposed head location gets a high score only if it aligns with one of these common patterns.
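
The example can be played through numerically. In the sketch below, offsets are (dx, dy) in pixels from the bird's center with image y growing downward, so "above" means negative dy. The first two means follow the numbers above; the third mean, the equal weights, and the 5-pixel isotropic spread are assumptions made only for illustration.

```python
import numpy as np

# Component means as (dx, dy) offsets in pixels; the third one is assumed.
means = np.array([[0.0, -20.0], [-15.0, -5.0], [0.0, -45.0]])
weights = np.array([1.0, 1.0, 1.0]) / 3.0   # assumed equal mixing weights
sigma = 5.0                                 # assumed isotropic std in pixels

def head_likelihood(offset):
    """MoG density at a proposed head offset, plus each component's responsibility."""
    sq_dist = np.sum((means - np.asarray(offset, dtype=float)) ** 2, axis=1)
    # Isotropic 2-D Gaussian density: exp(-r^2 / (2 sigma^2)) / (2 pi sigma^2).
    comp = weights * np.exp(-0.5 * sq_dist / sigma**2) / (2.0 * np.pi * sigma**2)
    return comp.sum(), comp / comp.sum()

p_good, resp_good = head_likelihood((1.0, -19.0))   # close to Gaussian 1
p_bad, resp_bad = head_likelihood((30.0, 20.0))     # below and right of the body
```

The first proposal gets a much higher density than the second, and its responsibilities show that Gaussian 1 is the pattern that explains it.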

✅ Benefits of Using MoG

Advantage	Description
Multi-pose modeling	Handles pose variability with multiple Gaussians
Probabilistic & smooth	Assigns soft likelihoods instead of hard cutoffs
Lightweight computation	Gaussians are fast to evaluate
Easy to fit	Uses the standard EM algorithm


Monday, 21 April 2025
