Sunday, 18 May 2025


Understanding Manhattan Distance: Theory, Applications, and Cognitive Implications

Author: Research Notes by Priyank Goyal
Date: May 2025

Introduction

Manhattan Distance, also known as City-block Distance or the L1 norm, is one of the fundamental distance measures used in machine learning, information retrieval, and cognitive modeling. Named after the grid-like layout of Manhattan's streets, it measures the distance between two points by moving only along axes at right angles. This article explores the theoretical underpinnings, practical use cases, strengths, limitations, and cognitive relevance of Manhattan Distance.

Mathematical Definition

The Manhattan Distance between two vectors \( \mathbf{a} \) and \( \mathbf{b} \) in an \( n \)-dimensional space is defined as:

\[ D(\mathbf{a}, \mathbf{b}) = \sum_{i=1}^{n} |a_i - b_i| \]

This metric computes the sum of the absolute differences of the corresponding components. Unlike Euclidean distance, which measures the straight-line (as-the-crow-flies) path, Manhattan Distance permits only orthogonal movement, making it better suited to grid-like structures and sparse representations.

Geometric Intuition

To build intuition, consider the two points \( A(1,1) \) and \( B(3,4) \) in a 2D space:

  • Manhattan Distance: \( |3 - 1| + |4 - 1| = 5 \)
  • Euclidean Distance: \( \sqrt{(3 - 1)^2 + (4 - 1)^2} = \sqrt{13} \approx 3.6 \)

This demonstrates how Manhattan Distance accumulates all directional changes rather than measuring the direct path.
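The two distances for the example above can be computed with a short Python sketch (the helper names `manhattan` and `euclidean` are mine, not standard library functions):

```python
import math

def manhattan(a, b):
    """Sum of absolute coordinate differences (the L1 norm of a - b)."""
    return sum(abs(x - y) for x, y in zip(a, b))

def euclidean(a, b):
    """Straight-line distance (the L2 norm of a - b)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

A, B = (1, 1), (3, 4)
print(manhattan(A, B))   # 5
print(euclidean(A, B))   # 3.605551275463989 (sqrt(13))
```

Note that the Manhattan result (5) exceeds the Euclidean result (about 3.6), as it must: the grid path is never shorter than the direct one.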


When to Use Manhattan Distance

Manhattan Distance is particularly useful in the following situations:

  • High-dimensional data: Where sparsity dominates (e.g., bag-of-words in NLP).
  • Ordinal features: Where the difference between values matters more than the square of differences.
  • Outlier robustness: It penalizes large deviations less harshly than Euclidean distance.
  • Grid-based environments: In robotics and pathfinding, where diagonal movement is disallowed.

Comparison with Other Distance Measures

Let’s compare Manhattan Distance with Euclidean, Cosine, and Pearson Correlation:

Measure             | Captures            | Invariant to Magnitude | Mean-Centered | Common Use Cases
--------------------|---------------------|------------------------|---------------|------------------------------
Manhattan (L1)      | Linear deviation    | No                     | No            | Sparse vectors, robust models
Euclidean (L2)      | Quadratic deviation | No                     | No            | Geometric proximity
Cosine              | Vector orientation  | Yes                    | No            | Semantic similarity in NLP
Pearson Correlation | Linear shape match  | Yes                    | Yes           | Co-occurrence normalization

Properties and Metrics

Manhattan Distance satisfies all the properties of a valid distance metric:

  • Non-negativity: \( D(a, b) \geq 0 \)
  • Identity of indiscernibles: \( D(a, b) = 0 \iff a = b \)
  • Symmetry: \( D(a, b) = D(b, a) \)
  • Triangle inequality: \( D(a, c) \leq D(a, b) + D(b, c) \)
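These properties can be spot-checked numerically. The sketch below uses a small `manhattan` helper (my naming) and a few arbitrary test points; it is an illustration of the axioms, not a proof:

```python
def manhattan(a, b):
    # L1 distance: sum of absolute componentwise differences
    return sum(abs(x - y) for x, y in zip(a, b))

a, b, c = (1, 1), (3, 4), (0, 2)

assert manhattan(a, b) >= 0                                  # non-negativity
assert manhattan(a, a) == 0                                  # identity of indiscernibles
assert manhattan(a, b) == manhattan(b, a)                    # symmetry
assert manhattan(a, c) <= manhattan(a, b) + manhattan(b, c)  # triangle inequality
```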

Inversion for Similarity

Since distance is inversely related to similarity, Manhattan Distance can be transformed into a similarity score:

\[ S(a, b) = \frac{1}{D(a, b)^2 + 1} \]

This transformation is especially useful in vector-space models like HAL or COALS to align numerical distances with human semantic similarity ratings.
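A minimal sketch of this transformation (the function name `l1_similarity` is mine): identical vectors score exactly 1, and the score decays toward 0 as the L1 distance grows.

```python
def manhattan(a, b):
    """L1 distance between two equal-length vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))

def l1_similarity(a, b):
    """Map L1 distance into (0, 1] via S = 1 / (D^2 + 1)."""
    d = manhattan(a, b)
    return 1.0 / (d * d + 1)

print(l1_similarity((1, 1), (1, 1)))  # 1.0 (identical vectors)
print(l1_similarity((1, 1), (3, 4)))  # 1/26, about 0.0385
```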

Limitations

Despite its usefulness, Manhattan Distance has several limitations:

  • It ignores directionality or vector shape.
  • It treats all dimensions equally — feature scaling is critical.
  • In very high-dimensional spaces, it suffers from the “curse of dimensionality” where distances converge.

Figure 2: Behavior of different norms in 2D (unit circles).


Applications in NLP and Vector Spaces

In natural language processing, Manhattan Distance is useful in contexts like:

  • Bag-of-Words models: Counting differences in term frequencies.
  • Topic modeling: Comparing distributions of topic proportions.
  • Lexical similarity: For co-occurrence-based models where vector sparsity is high.

However, for dense embeddings like Word2Vec or GloVe, cosine similarity or correlation tends to outperform L1-based distances.

Extensions: Part of the Minkowski Family

Manhattan Distance is a special case of the Minkowski Distance:

\[ D_p(a, b) = \left( \sum_i |a_i - b_i|^p \right)^{1/p} \]

For:

  • \( p = 1 \): Manhattan
  • \( p = 2 \): Euclidean
  • \( p \to \infty \): Chebyshev

This formulation allows the metric to be generalized or tuned based on dataset geometry.
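The whole family can be expressed in one function. In the sketch below (a generic implementation, not any particular library's API), the same point pair from earlier recovers the Manhattan and Euclidean values, and a large `p` approaches the Chebyshev distance, max(|2|, |3|) = 3:

```python
def minkowski(a, b, p):
    """Minkowski distance: (sum_i |a_i - b_i|^p)^(1/p)."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)

A, B = (1, 1), (3, 4)
print(minkowski(A, B, 1))   # 5.0  (Manhattan)
print(minkowski(A, B, 2))   # sqrt(13), about 3.606  (Euclidean)
print(minkowski(A, B, 50))  # about 3.0  (approaching Chebyshev)
```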

Cognitive and Semantic Modeling Perspective

From a cognitive modeling standpoint, Manhattan Distance is often too crude to capture nuanced semantic relationships that humans perceive. It measures only the surface-level deviation across dimensions and fails to account for conceptual overlaps.

However, when embedded in transformation functions like the inverse-squared similarity used in COALS, it can still provide value as part of a hybrid metric system.

Conclusion

Manhattan Distance remains a powerful and interpretable metric in data science and machine learning, especially for sparse, high-dimensional data. While it lacks the directional nuance of cosine similarity or the pattern sensitivity of correlation, its simplicity, efficiency, and robustness make it indispensable in many practical scenarios.

Understanding where, when, and how to use Manhattan Distance—while being aware of its limitations—is key to effective model design in both computational and cognitive domains.


Tags: #DistanceMetrics #NLP #MachineLearning #ManhattanDistance #Similarity
