Neural Network: Uncovering the Black Box

Sudeep Das
Feb 15, 2021

The mathematics under the hood of a neural network has baffled many aspiring data scientists and mathematicians for a long time. Our objective here is to discuss exactly how a neural network functions, and in particular how gradient descent and backpropagation work, since they define the entire training of the network. We will try to uncover what goes on inside the black box and inspect every step of the process using Excel and simple mathematics. The important steps that take place in a neural network are:

- Assigning of weights, biases, and activation functions

- Forward Propagation

- Cost function to calculate error

- Back Propagation

- Adjusting of weights (optimization)

These steps are repeated until we obtain weights for which the cost produced by the cost function is the lowest.

Note: The weights are assigned randomly, simply for convenience of understanding.

Without further ado, let's dive into the functioning of a neural network. For this purpose, we will begin with an extremely simple neural network and slowly build upon it.

In the above diagram, we have an example of a neural network with only one input layer and one output layer. The input is represented by 'i', the assigned weight by 'w', and the output, represented by 'a', is the product of the input and the weight.

Note: Here we aren't considering biases or activation functions.

a = i*w

Let’s assign them values;

i = 1 ; w = 4 ; Target value(y) = 2

So, the output obtained here is: a = 4

But we have to train the weight so that the output generated is '2' instead of '4'. So, our goal is to adjust the weight until the output becomes 2, or very close to it, and this process is known as optimization.

Let's write down the cost function, which is given as:

Cost Function: C = (ŷ − y)²

ŷ = output obtained/ predicted output

y = target value / expected output

So here we can see that the cost C is: (4 − 2)² = 4
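
We can sanity-check these numbers in a couple of lines of Python (the variable names are just for illustration):

```python
i, w, y = 1, 4, 2          # input, weight, target value
y_hat = i * w              # forward pass: predicted output
cost = (y_hat - y) ** 2    # squared-error cost
print(cost)                # prints 4
```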

This means that we need to change the output so that the cost function decreases, and that can be done by adjusting the weight. Let's understand the equation in mathematical form. The cost function given above can also be expressed graphically, since it is of the form x², and its graph is shown below:

Graph of equation y=x²

Since the graph of the cost function for the network we have taken looks very similar to this, our role is to adjust the weight so that the point moves to the minimum, as represented below:

In other words, our role is to find how much the output changes with every unit change in the weight, which can be denoted as ∂a/∂w, and through it how the cost changes with the weight, ∂C/∂w.

Chain Rule of Differentiation:

To see how these partial derivatives fit together, we need one of the basic rules of differentiation, given as:

Chain rule of differentiation
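
Restated in symbols, for a quantity y that depends on x only through an intermediate quantity u:

```latex
% Chain rule: y = f(u), u = g(x)
\frac{dy}{dx} = \frac{dy}{du}\cdot\frac{du}{dx}
```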

Back Propagation:

This rule is used heavily during backpropagation while adjusting the weights; it is one of the most important formulas and is responsible for backpropagation itself. Hence the derivative of the cost function with respect to the weights can be written as the product of the derivative of the cost (or 'loss') with respect to 'a' and the derivative of 'a' with respect to the weights:
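
For our one-weight network, where the cost C depends on the output a, which in turn depends on the weight w, this reads:

```latex
\frac{\partial C}{\partial w}
  = \frac{\partial C}{\partial a}\cdot\frac{\partial a}{\partial w}
```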

Let's calculate this for our network. Since C = (a − y)², we have ∂C/∂a = 2(a − y) = 2(4 − 2) = 4, and since a = i·w, we have ∂a/∂w = i = 1. So ∂C/∂w = 4 × 1 = 4.

Here, we take a small value for the learning rate, 0.2 (later we will experiment with different values too). The weight is adjusted by moving it against this gradient, so the newly adjusted weight is given as:

w = w − (0.2) × (4)

w = 4 − (0.2) × 4

w = 3.2

Hence the new adjusted weight is 3.2 and the output generated is: i*new weight = 1*3.2 = 3.2

But the new output still isn't close to our desired output, so the same process is carried out a few more times until we get close to the expected output. In other words, the process keeps iterating again and again.
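
This iterative update is easy to verify in a few lines of Python, using the same numbers as above:

```python
i, y = 1.0, 2.0        # input and target value
w, lr = 4.0, 0.2       # starting weight and learning rate

for step in range(1, 11):
    a = i * w                  # forward propagation
    cost = (a - y) ** 2        # cost function
    grad = 2 * (a - y) * i     # back propagation: dC/dw via the chain rule
    w -= lr * grad             # weight update
    print(f"iteration {step}: w = {w:.4f}, cost = {cost:.4f}")
```

The first iteration reproduces the weight of 3.2 computed above, and within about ten iterations the output is already very close to the target of 2.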

Similarly, this process is repeated until we get close to the desired output, and it can be performed in Excel too; the table below shows it for different learning rates. In Excel we have tried four different learning rates, and the weights are adjusted at every iteration. Note: for certain learning rates, problems such as the vanishing and exploding gradient can occur.

Vanishing Gradient Problem: When the learning rate is too small (say 0.001, or even 0.0001), the changes made to the weight are so tiny that it remains almost the same even after many iterations. The problem can be seen in the diagram above for the smallest learning rate.

Exploding Gradient Problem: When the learning rate is too high, we never reach the minimum; instead the values obtained grow so large that the calculations quite literally blow up and become computationally very heavy. In the diagram above, for a learning rate of 1.8 the adjusted weights after a few iterations are enormous.

Vanishing and Exploding Gradient Problem
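
The same qualitative behaviour can be reproduced in a few lines of Python; this is only a sketch of the effect, not the exact spreadsheet values (the 20-iteration count is arbitrary):

```python
i, y = 1.0, 2.0
for lr in (0.001, 0.2, 0.4, 1.8):        # small, moderate and large learning rates
    w = 4.0
    for _ in range(20):
        grad = 2 * (i * w - y) * i       # dC/dw for the one-weight network
        w -= lr * grad
    print(f"learning rate {lr}: weight after 20 iterations = {w:.4f}")
```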

But we can see that for learning rates of 0.2 and 0.4 the weight is adjusted quickly, which gives us an output very close to the expected one. And this is how a neural network works in general. So far we have discussed neural networks with no bias or activation function and a very simple structure. Let's work with a few more neural networks to understand them better.

Let's take the above neural network as an example and adjust the weights until we obtain the desired output. We will be using Excel to solve it. So, let's assign some values to the inputs and weights.

Our main objective is to train the weights, or in other words adjust them until we reach the minimum of the curve, where the error produced by the cost function is very low and the output is close to, or the same as, the expected output. For the given diagram, let's calculate the cost function:

…now the process is repeated until we get an output that is close to the target values. The iterations are carried out in the Excel sheet given below:

Here, we have taken two different learning rates for the two outputs, and the weights are adjusted until we obtain outputs similar or close to the expected ones; the result is presented below:

Now, let's focus on a simple neural network that includes an activation function and also a bias. We will be using a simple sigmoid activation function applied after we obtain the output 'a'.
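
As a quick sketch of this step (the input, weight and bias values below are assumptions for illustration, not the article's figures):

```python
import math

def sigmoid(z):
    """Sigmoid activation: squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

# Example values chosen purely for illustration.
i, w, b = 1.0, 4.0, 0.5

a = i * w + b           # weighted input plus bias
output = sigmoid(a)     # activation applied to 'a'
print(output)           # ~0.989
```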

So, for the first iteration:

w = 4 + (0.2) × [ 2(0.947) − 4(0.964) + 2(0.982) ] ≈ 4.0001

A similar process is carried out, shown in Excel with different starting weights; since the learning rate here causes only a very minute change, we vary the weight instead to see a more drastic effect. We have taken two different starting values for 'w': one at 4 and the other at -0.5.

In the diagram below, we can see that in the first case the obtained output stays almost the same, while in the second case the changing weight and learning rate move the final output closer to 1.
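
A small sketch of that experiment is given below; only the two starting weights (4 and -0.5) come from the text, while the input, bias, target of 1 and learning rate are assumptions chosen to illustrate the behaviour:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Assumed values for illustration; only the starting weights come from the article.
i, b, y, lr = 1.0, 0.5, 1.0, 0.5

for w0 in (4.0, -0.5):
    w = w0
    for _ in range(50):
        out = sigmoid(i * w + b)
        grad = 2 * (out - y) * out * (1 - out) * i   # dC/dw with sigmoid + squared error
        w -= lr * grad
    print(f"starting weight {w0}: final output = {sigmoid(i * w + b):.4f}")
```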

Complete Neural Network:

Now our objective is to apply all the concepts we have learned so far to a complete neural network and understand every step that is carried out in it. Here we have included both biases and activation functions to compute values that should be close to our target value.

Now let's assign some values to the parameters mentioned above and carry out the forward propagation, whose formulas are given below:

Forward Propagation: Once the weights are assigned, we compute all the mentioned calculations until we obtain the outputs, and then the cost function is used to calculate the error/loss.

Forward Propagation
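
As a minimal sketch of what such a forward pass computes (the layer sizes, weights, biases and target below are assumed purely for illustration; the article's figures use their own numbers):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Assumed, illustrative shapes: 2 inputs, 2 hidden neurons, 1 output neuron.
x  = np.array([1.0, 0.5])                  # inputs
W1 = np.array([[0.4, -0.3], [0.2, 0.7]])   # hidden-layer weights
b1 = np.array([0.1, 0.1])                  # hidden-layer biases
W2 = np.array([[0.5], [-0.6]])             # output-layer weights
b2 = np.array([0.2])                       # output-layer bias
y  = 1.0                                   # assumed target value

h     = sigmoid(x @ W1 + b1)    # hidden-layer activations
y_hat = sigmoid(h @ W2 + b2)    # predicted output
cost  = (y_hat - y) ** 2        # squared-error cost
print(h, y_hat, cost)
```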

Back Propagation: Now we carry out one of the most crucial steps, adjusting the weights by calculating the derivatives of the loss with respect to the individual weights in both the output and the hidden layer. We will first adjust the weights in the output layer and then those in the hidden layer.

Output Layer: The equations for the output layer are shown below.

Output Layer
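
In symbols, applying the same chain rule as before, the output-layer gradient takes the following form (this is my notation, assuming the squared-error cost and sigmoid activation used throughout; the article's figure may write it slightly differently):

```latex
% Gradient of the cost w.r.t. an output-layer weight w_o
% (z_o is the output neuron's pre-activation, h the hidden activation that w_o multiplies):
\frac{\partial C}{\partial w_o}
  = \frac{\partial C}{\partial \hat{y}}
    \cdot \frac{\partial \hat{y}}{\partial z_o}
    \cdot \frac{\partial z_o}{\partial w_o}
  = 2(\hat{y} - y)\,\hat{y}(1 - \hat{y})\,h
```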

Hidden Layer: Once the weights in the output layer are adjusted, the weights in the hidden layer are adjusted; the equations for the hidden layer are shown below along with a diagram.

Our next objective is to express this as an equation, which is given below in unexpanded form after plugging the above two values into the earlier equation:
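
For a hidden-layer weight the chain simply extends one step further back; in the same notation, and assuming a single output neuron:

```latex
% Gradient w.r.t. a hidden-layer weight w_h
% (w_o is the output-layer weight leaving this hidden neuron,
%  h its activation, and x the input that w_h multiplies):
\frac{\partial C}{\partial w_h}
  = \underbrace{2(\hat{y} - y)\,\hat{y}(1 - \hat{y})\,w_o}_{\text{error sent back from the output layer}}
    \cdot h(1 - h)\cdot x
```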

Once these values are obtained, they are multiplied by the learning rate and used to adjust the actual weights, and the same process is repeated for multiple iterations until we obtain weights that produce a 'ŷ' close to y, or in other words that generate the least amount of error/loss.
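
As a compact, purely illustrative sketch of the whole loop (layer sizes, starting values, learning rate and iteration count are all assumptions here, not the article's spreadsheet):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
x, y = np.array([1.0, 0.5]), np.array([1.0])    # assumed input and target
W1, b1 = rng.normal(size=(2, 2)), np.zeros(2)   # hidden layer: 2 neurons
W2, b2 = rng.normal(size=(2, 1)), np.zeros(1)   # output layer: 1 neuron
lr = 0.5

for _ in range(1000):
    # forward propagation
    h     = sigmoid(x @ W1 + b1)
    y_hat = sigmoid(h @ W2 + b2)
    # back propagation (squared-error cost, sigmoid activations)
    d_out = 2 * (y_hat - y) * y_hat * (1 - y_hat)   # error at the output neuron
    d_hid = (d_out @ W2.T) * h * (1 - h)            # error at the hidden neurons
    # adjust weights and biases against their gradients
    W2 -= lr * np.outer(h, d_out)
    b2 -= lr * d_out
    W1 -= lr * np.outer(x, d_hid)
    b1 -= lr * d_hid

print(sigmoid(sigmoid(x @ W1 + b1) @ W2 + b2))  # output after training, much closer to the target
```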

In the upcoming blogs, we will be building a neural network from scratch without using Keras, PyTorch, or any other deep learning framework, using only NumPy with Python. I hope this in-depth walkthrough has brought you more clarity and an intrinsic understanding of how a neural network works.

You can connect with me at : https://www.linkedin.com/in/sudeepdas27/
