Why Logistic Regression Uses Log-Odds: From Linear Outputs to Probabilities
Logistic regression is a cornerstone of statistical classification and binary prediction. It offers a simple yet mathematically elegant method of transforming a linear function into a probability estimate. In this article, we explore why logistic regression models the log of the odds and how this logit transformation links linear predictors to probabilities.
1. The Challenge of Modeling Binary Outcomes
Suppose we have a binary response variable \( Y \in \{0, 1\} \), where 1 denotes the positive class. If we naively try to model the probability \( p = P(Y = 1 \mid \mathbf{X}) \) using a linear model:
\[ p = \beta_0 + \beta_1 X_1 + \cdots + \beta_k X_k \]
we run into a problem: the linear combination of predictors on the right-hand side can produce values less than 0 or greater than 1, which are invalid probabilities.
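To see the problem concretely, here is a small sketch with hypothetical coefficients, showing how a linear model happily produces "probabilities" outside \( [0, 1] \):

```python
# Hypothetical coefficients for a one-predictor linear model.
beta0, beta1 = 0.2, 0.5

def linear_probability(x):
    """Naive 'probability' from a linear model: unbounded in general."""
    return beta0 + beta1 * x

# For moderate inputs the output may look like a probability...
print(linear_probability(1.0))   # 0.7 -- looks fine
# ...but for other inputs it leaves the valid range entirely.
print(linear_probability(5.0))   # 2.7 -- greater than 1
print(linear_probability(-3.0))  # -1.3 -- less than 0
```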
2. Enter the Logit Function
To address this, logistic regression models the log-odds (also called the logit) of the probability:
\[ \log\left(\frac{p}{1 - p}\right) = \beta_0 + \beta_1 X_1 + \cdots + \beta_k X_k \]
This transformation maps the probability range \( (0, 1) \) to the entire real line \( (-\infty, \infty) \), which makes it suitable for modeling via linear functions.
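A minimal sketch of the logit, showing how probabilities near the boundaries stretch out toward \( \pm\infty \) while \( p = 0.5 \) maps to exactly 0:

```python
import math

def logit(p):
    """Log-odds of a probability p in (0, 1)."""
    return math.log(p / (1 - p))

print(logit(0.5))    # 0.0
print(logit(0.001))  # about -6.91
print(logit(0.999))  # about +6.91
```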
3. Inverting the Logit: The Logistic (Sigmoid) Function
Solving the above equation for \( p \) gives us the inverse logit function, commonly known as the logistic or sigmoid function:
\[ p = \frac{1}{1 + e^{-(\beta_0 + \beta_1 X_1 + \cdots + \beta_k X_k)}} \]
This function compresses the entire real line into the interval \( (0, 1) \), yielding valid probabilities for any input.
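A quick sketch verifying that the sigmoid really is the inverse of the logit:

```python
import math

def sigmoid(z):
    """Inverse logit: maps any real z to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def logit(p):
    """Log-odds of a probability p in (0, 1)."""
    return math.log(p / (1 - p))

# Applying sigmoid after logit recovers the original probability.
for p in (0.1, 0.25, 0.5, 0.9):
    assert abs(sigmoid(logit(p)) - p) < 1e-12
```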
4. From Linear Output to Probability
In logistic regression, we first compute a linear score \( z \):
\[ z = \beta_0 + \beta_1 X_1 + \cdots + \beta_k X_k \]
Then we transform this score into a probability:
\[ p = \frac{1}{1 + e^{-z}} \]
Thus, we interpret the output of the linear function as the log-odds, and apply the logistic transformation to recover the probability. This process is what’s meant when we say:
“The response variable is the log of the odds of being classified in the positive class.”
5. Visualizing the Sigmoid Transformation
The sigmoid function is S-shaped and asymptotes to 0 and 1 at the extremes. It is centered at \( z = 0 \), where it returns \( p = 0.5 \).

As \( z \rightarrow -\infty \), \( p \rightarrow 0 \), and as \( z \rightarrow +\infty \), \( p \rightarrow 1 \). This property ensures all outputs are valid probabilities, no matter how large or small the linear predictor becomes.
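This boundary behavior is easy to check numerically (a small sketch; at very extreme scores, floating-point rounding eventually saturates the output):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Even large scores stay strictly inside (0, 1).
print(sigmoid(-30.0))  # vanishingly close to 0
print(sigmoid(30.0))   # vanishingly close to 1
print(sigmoid(0.0))    # exactly 0.5
```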
6. Why This Is Probabilistically Valid
Logistic regression is grounded in probability theory. Specifically, we assume:
\[ Y \sim \text{Bernoulli}(p) \]
and that:
\[ \log\left(\frac{p}{1 - p}\right) = \beta_0 + \beta_1 X_1 + \cdots + \beta_k X_k \]
Using maximum likelihood estimation (MLE), we estimate the coefficients \( \beta \) that best explain the observed outcomes. The fitted \( p \) is then the value that maximizes the likelihood of the data under the model.
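As a sketch of how MLE works in practice, here is plain gradient ascent on the Bernoulli log-likelihood, fit to synthetic data with known coefficients. This is a teaching sketch, not a production optimizer (real implementations use Newton-type solvers):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data drawn from a known logistic model (beta0 = -1, beta1 = 2).
n = 5000
x = rng.normal(size=n)
p_true = 1.0 / (1.0 + np.exp(-(-1.0 + 2.0 * x)))
y = (rng.random(n) < p_true).astype(float)   # Bernoulli(p_true) outcomes

# Maximize the log-likelihood by gradient ascent.
X = np.column_stack([np.ones(n), x])         # design matrix with intercept
beta = np.zeros(2)
for _ in range(2000):
    p = 1.0 / (1.0 + np.exp(-X @ beta))      # current probability estimates
    beta += 0.5 * X.T @ (y - p) / n          # gradient of mean log-likelihood

print(beta)  # should land near the true coefficients (-1, 2)
```

The gradient \( X^\top (y - p) \) falls directly out of differentiating the Bernoulli log-likelihood, which is one reason the logit link is so convenient.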
7. Summary of the Mapping
To summarize, logistic regression works as follows:
- Compute a linear combination: \( z = \beta_0 + \beta_1 X_1 + \cdots \)
- Interpret this as the log-odds of the positive class.
- Convert the log-odds to probability: \( p = \frac{1}{1 + e^{-z}} \)
This entire process ensures that we obtain a valid probability between 0 and 1, based on linear inputs, while preserving the flexibility and interpretability of a linear model.
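The three steps above can be sketched end to end (coefficients here are hypothetical, standing in for a fitted model):

```python
import math

# Hypothetical fitted coefficients: beta0, beta1, beta2.
beta = [-1.5, 0.8, 2.0]

def predict_probability(x1, x2):
    z = beta[0] + beta[1] * x1 + beta[2] * x2  # 1-2. linear score = log-odds
    return 1.0 / (1.0 + math.exp(-z))          # 3. sigmoid -> probability

print(predict_probability(1.0, 0.5))  # about 0.574, a valid probability
```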
8. Conclusion
When we say “the response variable is the log of the odds of being in the positive class,” we are not directly modeling probabilities or outcomes — we are modeling the log-odds as a linear function. This logit-linear structure, when passed through the sigmoid function, yields interpretable and bounded probabilities, which is exactly what we need for classification tasks.
Thanks to this elegant transformation, logistic regression remains one of the most interpretable and widely used classification models in statistics and machine learning.