Thursday, 4 June 2026

What is Metric Learning?

What is Metric Learning?

Metric learning is a machine learning approach in which the model learns how to measure similarity or distance between data points. Instead of using a fixed distance formula, metric learning tries to learn a distance function that is meaningful for a specific task.

In ordinary machine learning, we often use fixed distance measures such as Euclidean distance. For two data points \(x_i\) and \(x_j\), Euclidean distance is commonly written as:

\[ d(x_i, x_j) = \sqrt{\sum_k (x_{ik} - x_{jk})^2} \]

The problem is that Euclidean distance may not understand what “similar” means in a particular domain. Two objects may be close in terms of colour, shape, or surface texture, but they may still belong to different meaningful categories. Metric learning addresses this problem by learning a better notion of distance from examples.

In simple words, metric learning teaches a model what should be considered near and what should be considered far.

Basic Idea of Metric Learning

The core idea of metric learning is simple:

\[ \text{Same class} \Rightarrow \text{small distance} \]

\[ \text{Different class} \Rightarrow \text{large distance} \]

For example, if the task is saree classification, two Kanjivaram sarees should be placed closer together in the learned representation space. A Kanjivaram saree and a Banarasi saree should be placed farther apart, even if both have similar colours or similar golden borders.

Example from Saree Classification

Suppose two sarees are visually similar because both are red and both have gold zari borders. A simple distance measure may treat them as highly similar because it notices colour and brightness. However, one saree may be Kanjivaram and the other may be Banarasi.

Metric learning tries to learn deeper visual differences. It may learn that similarity should depend not only on colour, but also on motif structure, border style, pallu design, weave appearance, texture, and regional craft characteristics.

Situation What Metric Learning Tries to Do
Two Kanjivaram sarees Bring them closer in the learned space
A Kanjivaram saree and a Banarasi saree Push them farther apart
Same person’s face in two photographs Bring the two face representations closer
Different people’s faces Push their representations apart
Similar product images Place them close for retrieval or recommendation
Unrelated product images Place them far apart

Why Metric Learning is Useful

Metric learning is useful when the task is not only to classify an object, but also to understand meaningful similarity. A normal classifier may answer:

“Is this saree Kanjivaram or Banarasi?”

Metric learning can answer a slightly different and often more useful question:

“Which sarees in the dataset are most similar to this saree?”

This makes metric learning useful for image retrieval, product recommendation, face recognition, fine-grained classification, duplicate detection, provenance identification, and nearest-neighbour search.

Triplet-Based Metric Learning

One common way to train a metric learning model is by using triplets. A triplet contains three examples:

\[ (\text{Anchor}, \text{Positive}, \text{Negative}) \]

Element Meaning Saree Example
Anchor The reference example A Kanjivaram saree image
Positive An example from the same class as the anchor Another Kanjivaram saree image
Negative An example from a different class A Banarasi saree image

The model is trained so that the anchor is closer to the positive example than to the negative example:

\[ d(\text{Anchor}, \text{Positive}) < d(\text{Anchor}, \text{Negative}) \]

This means that the model should learn a representation where examples from the same class are grouped together, while examples from different classes are separated.

Metric Learning in Deep Learning

In deep learning, metric learning usually works by learning an embedding space. An input image is passed through a neural network, and the network converts the image into a numerical vector.

\[ x \rightarrow f(x) \]

Here, \(x\) is the input image and \(f(x)\) is the learned embedding. The embedding is a compact numerical representation of the image. If two images are meaningfully similar, their embeddings should be close. If they are meaningfully different, their embeddings should be far apart.

For example, in a saree provenance identification system, the model may learn embeddings in which sarees from the same craft cluster are close to each other. Sarees from different regions or weaving traditions should be placed farther apart.

One-Sentence Explanation

Metric learning is the process of teaching a model what “near” and “far” should mean for a specific problem.

Relevance to Saree Provenance Research

In saree provenance classification, metric learning can be especially powerful because saree categories are often fine-grained. Two sarees may share similar colours, borders, or decorative features, but their provenance may differ because of subtle differences in motif design, weaving technique, pallu structure, yarn appearance, or regional design grammar.

A metric learning model can help create a visual space where sarees from the same origin or craft tradition are placed close together. This can support classification, retrieval, comparison, and explainability in saree image analysis.

No comments:

Post a Comment

Understading the Paper: Fine Grained Image Analysis with Deep Learning

Fine-Grained Image Analysis with Deep Learning: A Simple Explanation In ordinary image classification, a computer vision model may be...