Monday, 21 April 2025

What is a Mixture of Gaussians (MoG)?

 

🎯 Context First: Why Learn Part Positions?

When you’re trying to detect parts of an object (like a bird’s head or tail), it helps to know where those parts typically appear relative to the object.

  • For example: The head is usually above the body.

  • But because of pose variations, one single "mean" location isn’t enough.

That’s why the model uses a Mixture of Gaussians — to capture multiple common configurations of part locations.


🧠 What is a Mixture of Gaussians (MoG)?

A Mixture of Gaussians is a probabilistic model that represents a complex distribution as a weighted sum of multiple Gaussian (bell curve) distributions.

Mathematically:

$$P(x) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x \mid \mu_k, \Sigma_k)$$

Where:

  • $\pi_k$ = weight (mixing coefficient) of the $k$-th Gaussian (the weights sum to 1)

  • $\mu_k$ = mean (center) of the $k$-th Gaussian

  • $\Sigma_k$ = covariance (spread/shape) of the $k$-th Gaussian

  • $K$ = number of mixture components
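
To make the formula concrete, here is a minimal sketch that evaluates the mixture density at a 2-D point using only the standard library. The function names `gaussian_pdf_2d` and `mog_pdf` and all parameter values are illustrative assumptions, not something taken from the paper.

```python
import math

def gaussian_pdf_2d(x, mean, cov):
    """Density of a 2-D Gaussian N(x | mean, cov); cov is [[a, b], [b, c]]."""
    (a, b), (_, c) = cov
    det = a * c - b * b
    dx, dy = x[0] - mean[0], x[1] - mean[1]
    # Quadratic form (x - mean)^T cov^{-1} (x - mean), via the closed-form 2x2 inverse.
    quad = (c * dx * dx - 2.0 * b * dx * dy + a * dy * dy) / det
    return math.exp(-0.5 * quad) / (2.0 * math.pi * math.sqrt(det))

def mog_pdf(x, weights, means, covs):
    """P(x) = sum_k pi_k * N(x | mu_k, Sigma_k)."""
    return sum(w * gaussian_pdf_2d(x, m, c) for w, m, c in zip(weights, means, covs))

# Illustrative parameters only (not fitted to any dataset).
weights = [0.6, 0.4]
means = [(0.0, -0.3), (-0.2, -0.1)]
covs = [[[0.01, 0.0], [0.0, 0.01]], [[0.02, 0.0], [0.0, 0.02]]]

p_near = mog_pdf((0.0, -0.3), weights, means, covs)  # sits on the first component's mean
p_far = mog_pdf((0.5, 0.5), weights, means, covs)    # far from both components
```

A point sitting on one of the component means gets a much higher density than a point far from all of them, which is exactly the behavior a location prior needs.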


🧭 How is it used to model part positions?

In Part-based R-CNN, here's how it works:


1️⃣ Collect Part Coordinates from Training Data

For each part type (e.g., head), gather the coordinates from annotated training images, relative to the whole object box.

Example:
If the bird’s bounding box is (x, y, w, h) and the head is at (xₕ, yₕ), the head is stored as an offset relative to that box (for instance, scaled by the box size) rather than as absolute image coordinates.
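
One plausible way to compute such an offset is sketched below. It assumes (x, y) is the top-left corner of the box and that the offset is measured from the box center and divided by the box width and height. The text above does not pin down either choice, so treat both as assumptions; `relative_offset` is a hypothetical helper name.

```python
def relative_offset(box, part_xy):
    """Offset of a part from the box center, in units of box width and height.

    ASSUMPTION: box = (x, y, w, h) with (x, y) the top-left corner, and the
    normalization divides by the box size. The source text specifies neither.
    """
    x, y, w, h = box
    cx, cy = x + w / 2.0, y + h / 2.0  # box center
    px, py = part_xy
    return ((px - cx) / w, (py - cy) / h)

# Hypothetical annotation: a 200x100 box with the head near its top-left corner.
offset = relative_offset((50, 80, 200, 100), (100, 90))
```

Dividing by the box size makes offsets comparable across birds that appear at different scales in the image.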


2️⃣ Fit a Mixture of Gaussians to These Coordinates

Use the training data to fit a K-component MoG to these relative positions.

Each Gaussian captures a common configuration:

  • One Gaussian might represent the bird facing left.

  • Another could capture the bird flying, where the head is farther from the body.

📌 You can fit this using the Expectation-Maximization (EM) algorithm.
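
Here is a minimal sketch of EM, reduced to a single coordinate (say, the vertical offset) so that it stays short and needs only the standard library. The 2-D case described above replaces each variance with a covariance matrix. The name `fit_mog_1d`, the initial values, and the toy data are illustrative assumptions; in practice a library implementation such as scikit-learn's `GaussianMixture` would be the usual choice.

```python
import math

def normal_pdf(x, mean, var):
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def fit_mog_1d(data, init_means, init_var, n_iter=50, min_var=1e-6):
    """EM for a 1-D mixture of Gaussians, started from the given means and variance."""
    k = len(init_means)
    means = list(init_means)
    weights = [1.0 / k] * k
    variances = [init_var] * k
    for _ in range(n_iter):
        # E-step: responsibility of each component for each data point.
        resp = []
        for x in data:
            joint = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(joint)
            resp.append([j / total for j in joint])
        # M-step: re-estimate weights, means, and variances from the responsibilities.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / len(data)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var, min_var)  # floor keeps a component from collapsing
    return weights, means, variances

# Toy vertical offsets (made up): one cluster near -0.40, another near -0.10.
data = [-0.42, -0.40, -0.38, -0.41, -0.39, -0.12, -0.10, -0.08, -0.11, -0.09]
weights, means, variances = fit_mog_1d(data, init_means=[-0.5, 0.0], init_var=0.01)
```

The starting point matters: EM only finds a local optimum, and a very broad initial variance can leave both components stuck near the overall mean for many iterations, so the sketch starts with a variance small enough to tell the two clusters apart.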


3️⃣ Use the MoG as a Geometric Prior

At test time:

  • You detect the object (e.g., bird body).

  • Then, for each possible part location (e.g., a proposed head), you compute the likelihood of that position under the MoG model.

This likelihood is the δ^MG(x) scoring function in the paper: it is higher for "likely" locations and lower for unusual ones.

So, in the paper’s scoring formula:

$$\Delta_{\text{geometric}}(X) = \Delta_{\text{box}}(X) \left( \prod_{i=1}^{n} \delta_i(x_i) \right)^{\alpha}$$

  • $\delta_i(x_i)$ is the MoG likelihood of part $i$’s location.

  • $\alpha$ is a weighting factor.
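
The scoring formula itself is a one-liner. In the sketch below, `delta_box` is simply passed in as a number, because the text here does not define how that term is computed. The function name `geometric_score` and the example values are hypothetical.

```python
import math

def geometric_score(delta_box, part_likelihoods, alpha):
    """Delta_geometric(X) = Delta_box(X) * (prod_i delta_i(x_i)) ** alpha."""
    return delta_box * math.prod(part_likelihoods) ** alpha

# Hypothetical values: box term 1.0 and two parts with MoG likelihoods 0.8 and 0.5.
score = geometric_score(1.0, [0.8, 0.5], alpha=0.5)
```

With `alpha = 0` the product term becomes 1 and only the box term remains; larger values of `alpha` let the part likelihoods influence the final score more strongly.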


🔍 Example: Bird Head Detection

Let’s say in training data:

  • Sometimes the head is 20 pixels above the center of the bird (Gaussian 1),

  • Sometimes it’s 15 pixels to the left and 5 above (Gaussian 2),

  • Sometimes it’s farther away (flying posture — Gaussian 3).

These patterns are captured as 3 Gaussians in the MoG.

At test time, a proposed head location gets a high score only if it aligns with one of these common patterns.


✅ Benefits of Using MoG

| Advantage | Description |
|---|---|
| Multi-pose modeling | Handles pose variability with multiple Gaussians |
| Probabilistic & smooth | Assigns soft likelihoods instead of hard cutoffs |
| Lightweight computation | Gaussians are fast to evaluate |
| Easy to fit | Uses standard EM algorithm |
