Understanding Backpropagation Through a Basic Neural Network Example
By Priyank Goyal
In this article, we explore the foundational workings of a feedforward neural network with a single hidden layer. Our goal is to demystify the process of forward propagation, loss computation, and especially backpropagation—the core algorithm that powers learning in neural networks. This walkthrough answers three essential questions:
- What is a basic neural network?
- How does backpropagation work in such a network?
- How can this process be translated into working Python code?
1. Anatomy of a Basic Neural Network
Let us consider a minimal neural network consisting of three layers:
- Input Layer with 3 neurons (features)
- Hidden Layer with 2 neurons and ReLU activation
- Output Layer with 1 neuron and sigmoid activation
The goal of this network is to perform binary classification. Each input vector \( x \in \mathbb{R}^3 \) is passed through the network, which outputs a probability \( \hat{y} \in [0, 1] \).
The network's computation can be summarized as:
\[ z_1 = W_1 x + b_1,\quad h = \text{ReLU}(z_1),\quad z_2 = W_2 h + b_2,\quad \hat{y} = \sigma(z_2) \]
where:
- \( W_1 \in \mathbb{R}^{2 \times 3} \), \( b_1 \in \mathbb{R}^{2 \times 1} \)
- \( W_2 \in \mathbb{R}^{1 \times 2} \), \( b_2 \in \mathbb{R} \)
- \( \text{ReLU}(z) = \max(0, z) \)
- \( \sigma(z) = \frac{1}{1 + e^{-z}} \) is the sigmoid function
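Both activations are one-liners in NumPy; a minimal sketch (the function names are my own):

```python
import numpy as np

def relu(z):
    # Element-wise max(0, z): negative inputs become 0, positives pass through
    return np.maximum(0, z)

def sigmoid(z):
    # Squashes any real input into the open interval (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

relu(np.array([-1.0, 2.0]))  # array([0., 2.])
sigmoid(0.0)                 # 0.5
```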
2. Forward Propagation
Given the input vector:
\[ x = \begin{bmatrix} 1.0 \\ 0.5 \\ -1.5 \end{bmatrix} \]
and initial parameters:
\[ W_1 = \begin{bmatrix} 0.2 & -0.4 & 0.1 \\ 0.7 & 0.3 & -0.5 \end{bmatrix}, \quad b_1 = \begin{bmatrix} 0.1 \\ -0.2 \end{bmatrix}, \quad W_2 = \begin{bmatrix} 0.5 & -1.0 \end{bmatrix}, \quad b_2 = 0.2 \]
we compute:
\[ z_1 = W_1 x + b_1 = \begin{bmatrix} -0.05 \\ 1.4 \end{bmatrix},\quad h = \text{ReLU}(z_1) = \begin{bmatrix} 0 \\ 1.4 \end{bmatrix} \]
\[ z_2 = W_2 h + b_2 = -1.2,\quad \hat{y} = \sigma(z_2) \approx 0.231 \]
3. Loss Computation
Assuming the true label is \( y = 1 \), the binary cross-entropy loss is:
\[ \mathcal{L} = -\left[y \log(\hat{y}) + (1 - y) \log(1 - \hat{y})\right] = -\log(0.231) \approx 1.463 \]
4. Backpropagation Step-by-Step
We now compute the gradients of the loss with respect to each parameter in reverse order, using the chain rule.
- Output layer:
\[ \frac{\partial \mathcal{L}}{\partial \hat{y}} = -\frac{1}{\hat{y}} \ (\text{since } y = 1),\quad \frac{\partial \hat{y}}{\partial z_2} = \hat{y}(1 - \hat{y}) \]
\[ \delta_2 = \frac{\partial \mathcal{L}}{\partial z_2} = \hat{y} - y \approx -0.769 \]
\[ \frac{\partial \mathcal{L}}{\partial W_2} = \delta_2 \cdot h^T,\quad \frac{\partial \mathcal{L}}{\partial b_2} = \delta_2 \]
- Hidden layer:
\[ \delta_1 = (W_2^T \delta_2) \circ \text{ReLU}'(z_1),\quad \text{where } \text{ReLU}'(z) = 1 \text{ if } z > 0, \text{ else } 0 \]
\[ \frac{\partial \mathcal{L}}{\partial W_1} = \delta_1 \cdot x^T,\quad \frac{\partial \mathcal{L}}{\partial b_1} = \delta_1 \]
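These formulas can be sanity-checked numerically: the analytic gradient should agree with a central finite-difference estimate of the loss. A sketch for the \( W_2 \) gradient (the helper name `forward_loss` is my own):

```python
import numpy as np

def forward_loss(W1, b1, W2, b2, x):
    # Full forward pass; BCE reduces to -log(y_hat) when y = 1
    h = np.maximum(0, W1 @ x + b1)                 # hidden layer with ReLU
    y_pred = 1.0 / (1.0 + np.exp(-(W2 @ h + b2)))  # output with sigmoid
    return (-np.log(y_pred)).item()

x = np.array([[1.0], [0.5], [-1.5]])
W1 = np.array([[0.2, -0.4, 0.1], [0.7, 0.3, -0.5]])
b1 = np.array([[0.1], [-0.2]])
W2 = np.array([[0.5, -1.0]])
b2 = np.array([[0.2]])

# Analytic gradient for W2 from the chain rule above
h = np.maximum(0, W1 @ x + b1)
y_pred = 1.0 / (1.0 + np.exp(-(W2 @ h + b2)))
delta2 = y_pred - 1.0           # dL/dz2 = y_hat - y, with y = 1
grad_W2 = delta2 @ h.T

# Central finite-difference check on each entry of W2
eps = 1e-6
for j in range(W2.shape[1]):
    Wp, Wm = W2.copy(), W2.copy()
    Wp[0, j] += eps
    Wm[0, j] -= eps
    numeric = (forward_loss(W1, b1, Wp, b2, x)
               - forward_loss(W1, b1, Wm, b2, x)) / (2 * eps)
    assert abs(numeric - grad_W2[0, j]) < 1e-5
```

This kind of gradient check is a standard way to catch sign errors or missing terms before trusting a hand-derived backward pass.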
5. Python Implementation
The following code performs a full forward and backward pass on the network and updates weights using gradient descent:
```python
import numpy as np

# Input and true label
x = np.array([[1.0], [0.5], [-1.5]])
y_true = 1

# Initial parameters
W1 = np.array([[0.2, -0.4, 0.1], [0.7, 0.3, -0.5]])
b1 = np.array([[0.1], [-0.2]])
W2 = np.array([[0.5, -1.0]])
b2 = np.array([[0.2]])
lr = 0.1  # learning rate

# Forward pass
z1 = W1 @ x + b1
h = np.maximum(0, z1)            # ReLU
z2 = W2 @ h + b2
y_pred = 1 / (1 + np.exp(-z2))   # sigmoid
loss = -(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

# Backward pass (chain rule, output to input)
dL_dz2 = -(y_true / y_pred) * (y_pred * (1 - y_pred))  # for y_true = 1 this equals y_pred - 1
dL_dW2 = dL_dz2 @ h.T
dL_db2 = dL_dz2
dL_dh = W2.T @ dL_dz2
dL_dz1 = dL_dh * (z1 > 0)        # ReLU derivative masks inactive units
dL_dW1 = dL_dz1 @ x.T
dL_db1 = dL_dz1

# Gradient descent update
W2 -= lr * dL_dW2
b2 -= lr * dL_db2
W1 -= lr * dL_dW1
b1 -= lr * dL_db1
```
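As a usage sketch, the same forward/backward step can be repeated to confirm that gradient descent drives the loss down over time (the 20-step loop and the `losses` list are my own additions; `dL_dz2` is written in its simplified form `y_pred - y_true`):

```python
import numpy as np

# Same data and initial parameters as above
x = np.array([[1.0], [0.5], [-1.5]])
y_true = 1
W1 = np.array([[0.2, -0.4, 0.1], [0.7, 0.3, -0.5]])
b1 = np.array([[0.1], [-0.2]])
W2 = np.array([[0.5, -1.0]])
b2 = np.array([[0.2]])
lr = 0.1

losses = []
for step in range(20):
    # Forward pass
    z1 = W1 @ x + b1
    h = np.maximum(0, z1)
    z2 = W2 @ h + b2
    y_pred = 1 / (1 + np.exp(-z2))
    loss = -(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))
    losses.append(loss.item())
    # Backward pass
    dL_dz2 = y_pred - y_true
    dL_dW2 = dL_dz2 @ h.T
    dL_db2 = dL_dz2
    dL_dz1 = (W2.T @ dL_dz2) * (z1 > 0)
    dL_dW1 = dL_dz1 @ x.T
    dL_db1 = dL_dz1
    # Gradient descent update
    W2 = W2 - lr * dL_dW2
    b2 = b2 - lr * dL_db2
    W1 = W1 - lr * dL_dW1
    b1 = b1 - lr * dL_db1
```

Printing `losses` shows the cross-entropy shrinking from its initial value toward zero as the prediction moves toward the true label.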
6. Final Updated Parameters
| Parameter | Updated Value |
|---|---|
| \( W_1 \) | \( \begin{bmatrix} 0.2 & -0.4 & 0.1 \\ 0.623 & 0.262 & -0.385 \end{bmatrix} \) |
| \( b_1 \) | \( \begin{bmatrix} 0.1 \\ -0.277 \end{bmatrix} \) |
| \( W_2 \) | \( \begin{bmatrix} 0.5 & -0.892 \end{bmatrix} \) |
| \( b_2 \) | \( 0.277 \) |
| Loss (before update) | \( \mathcal{L} \approx 1.463 \) |
7. Conclusion
This example illustrates how a basic neural network performs forward and backward propagation. By following the chain rule through each layer, we calculate how much each parameter contributes to the output error. Backpropagation enables the network to learn from its mistakes by adjusting its weights and biases through gradient descent.
Once you understand this foundation, you're ready to explore deeper networks, regularization, batch training, and advanced optimizers like Adam and RMSprop. But remember, everything starts here—with a dot product, a ReLU, and a sigmoid.