Saturday, 9 November 2024

Basics: Binary Cross Entropy Loss

Binary Cross-Entropy Loss (also known as Log Loss or Logistic Loss) is a commonly used loss function for binary classification problems in neural networks. It measures how well the model's predictions match the actual labels and is particularly suitable for problems where there are only two classes (e.g., yes/no, 0/1, true/false).

Here's an in-depth explanation of Binary Cross-Entropy Loss, including how it works and why it’s used:

1. Purpose of Binary Cross-Entropy Loss

The Binary Cross-Entropy Loss is used to measure the error between the actual label and the predicted probability that a given input belongs to the positive class. The goal of the neural network is to minimize this loss, which means making its predictions as close to the actual labels as possible.

2. Binary Classification Setup

In binary classification, the target labels are either:

  • 0 or 1, which represent the two classes.
  • The model outputs a single value between 0 and 1, which can be interpreted as the probability that the input belongs to class 1 (the positive class).

3. Activation Function

Typically, the output layer of a binary classification neural network uses a sigmoid activation function. This function outputs a probability score between 0 and 1, which is suitable for binary outcomes.

σ(x) = 1 / (1 + e^(-x))

Where x represents the input (or "logit") to the sigmoid function.
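As a quick illustration, the sigmoid function can be written in a few lines of plain Python (a minimal sketch, not framework code):

```python
import math

def sigmoid(x):
    """Squash a real-valued logit x into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

print(sigmoid(0))    # 0.5: a logit of 0 expresses no preference
print(sigmoid(4))    # close to 1: strong evidence for the positive class
print(sigmoid(-4))   # close to 0: strong evidence for the negative class
```

Note that σ(x) + σ(-x) = 1, which is what lets a single output represent the probabilities of both classes.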

4. The Formula for Binary Cross-Entropy Loss

The Binary Cross-Entropy Loss (BCE) is defined as:

L = -( y · log(ŷ) + (1 - y) · log(1 - ŷ) )

Where:

  • y is the true label (either 0 or 1).
  • ŷ is the predicted probability that the input belongs to the positive class, as output by the sigmoid function.

5. How the Loss Works

  • If the true label y = 1:

    L = -log(ŷ)

    In this case, if the predicted probability ŷ is close to 1, the loss will be small. If ŷ is close to 0, the loss will be high. This encourages the model to assign a high probability to the correct class.

  • If the true label y = 0:

    L = -log(1 - ŷ)

    In this case, if the predicted probability ŷ is close to 0, the loss will be small. If ŷ is close to 1, the loss will be high. This encourages the model to assign a low probability to the incorrect class.
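The two cases above are just the full formula with one term zeroed out. A minimal pure-Python sketch (natural logarithm, as is standard for this loss):

```python
import math

def bce(y, y_hat):
    """Binary cross-entropy for one example:
    y is the true label (0 or 1), y_hat is the predicted
    probability that the example belongs to class 1."""
    return -(y * math.log(y_hat) + (1 - y) * math.log(1 - y_hat))

# y = 1: only the -log(y_hat) term survives.
print(bce(1, 0.99))   # small loss: confident and correct
print(bce(1, 0.01))   # large loss: confident but wrong

# y = 0: only the -log(1 - y_hat) term survives.
print(bce(0, 0.01))   # small loss
print(bce(0, 0.99))   # large loss
```

By symmetry, predicting 0.99 for a true 0 costs exactly as much as predicting 0.01 for a true 1.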

6. Properties and Intuition

  • Minimization Goal: The goal during training is to minimize the Binary Cross-Entropy Loss. This means that the model is trained to output a probability close to 1 for positive examples and close to 0 for negative examples.
  • Logarithmic Penalty: The logarithm in the loss function heavily penalizes incorrect predictions, especially if the model is confident but wrong. For example, predicting a probability close to 1 for a class that should be 0 will result in a large loss.
  • Probability Interpretation: The use of a sigmoid activation at the output layer makes the model's prediction interpretable as a probability, which works well with the Binary Cross-Entropy Loss to evaluate classification performance.
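The logarithmic penalty can be seen by tabulating the loss for a positive example (y = 1) as the prediction gets worse (a small illustrative loop, not library code):

```python
import math

# Loss -log(y_hat) for a positive example at increasingly wrong predictions.
# The penalty grows without bound as y_hat approaches 0.
for y_hat in [0.9, 0.5, 0.1, 0.01]:
    print(f"y_hat = {y_hat:<5} loss = {-math.log(y_hat):.3f}")
```

Halving the predicted probability does not double the loss; being confidently wrong (0.01) costs roughly twice as much as being moderately wrong (0.1).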

7. Example Calculation

Consider a scenario where:

  • The true label y = 1
  • The predicted probability ŷ = 0.9

The loss L would be:

L = -( 1 · log(0.9) + (1 - 1) · log(1 - 0.9) )
L = -log(0.9) ≈ 0.105

Since the predicted probability is close to the true label, the loss is relatively low.

Now, consider another scenario where the true label is y = 0 and the predicted probability is ŷ = 0.9:

L = -( 0 · log(0.9) + (1 - 0) · log(1 - 0.9) )
L = -log(0.1) ≈ 2.303

In this case, the loss is much higher because the model was quite confident in the wrong answer.
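Both hand calculations can be checked directly with Python's standard library:

```python
import math

y_hat = 0.9

# Scenario 1: true label y = 1, so L = -log(y_hat)
print(round(-math.log(y_hat), 3))        # 0.105

# Scenario 2: true label y = 0, so L = -log(1 - y_hat)
print(round(-math.log(1 - y_hat), 3))    # 2.303
```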

8. Practical Use

In practice, Binary Cross-Entropy Loss is used in scenarios where:

  • Binary Classification is required, such as determining if an email is spam or not, whether a tumor is malignant or benign, or predicting a binary outcome like yes/no.
  • The model output is a single probability representing the likelihood that the input belongs to class 1.
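One practical detail: if the model ever outputs exactly 0 or 1, the loss involves log(0), which diverges. Deep learning libraries typically guard against this by clipping predictions to a small epsilon away from the boundaries. A minimal sketch of batch-averaged BCE with that safeguard (the eps value here is illustrative):

```python
import math

def stable_bce(y_true, y_pred, eps=1e-7):
    """Mean binary cross-entropy over a batch. Predictions are clipped
    into [eps, 1 - eps] so that log() never receives exactly 0."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

# Without clipping, the 0.0 prediction for the last (positive) example
# would make log(0) diverge; here the loss stays large but finite.
print(stable_bce([1, 0, 1], [0.9, 0.2, 0.0]))
```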

9. Example in Python (Keras)

In Keras, you can specify the Binary Cross-Entropy Loss when compiling your model for a binary classification problem:

```python
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
```

This tells Keras to use the Binary Cross-Entropy Loss as the loss function for training the model.

10. Summary

  • Binary Cross-Entropy Loss is used for binary classification problems.
  • The loss function measures the difference between the true labels and the predicted probabilities.
  • The objective during training is to minimize the loss, encouraging the model to predict probabilities close to the true labels.
  • The logarithm in the formula heavily penalizes confident incorrect predictions.

The core idea of Binary Cross-Entropy Loss is to penalize the model when it assigns a high probability to the wrong class or a low probability to the correct class, which pushes the model to improve its predictions during training.
