Convolutional Neural Network


Convolutional Neural Networks (CNNs) are mostly used for images and videos. They tend to perform better than a plain feed-forward network on such data because an image is just a matrix of pixel values, each in the range 0–255, and flattening it produces a very large input. For example, a greyscale image of dimension 100×100 has 10,000 values when flattened. Similarly, a full-HD colour image of resolution 1920×1080×3 has around 6 million values. These 6 million values belong to a single image, and training a model needs many such images, so the total quickly becomes computationally heavy for a fully connected network. This is the major reason we opt for a CNN over a feed-forward network.
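To make the arithmetic concrete, here is a small sketch that counts the values in a flattened image. The helper name flattened_size, the 1000-unit dense layer, and the 64 filters of size 3×3×3 are illustrative assumptions, not figures from the article.

```python
# Number of input values when an image is flattened for a feed-forward network.
def flattened_size(height, width, channels=1):
    return height * width * channels

gray = flattened_size(100, 100)       # 100 x 100 greyscale image
hd = flattened_size(1080, 1920, 3)    # full-HD RGB image

# Hypothetical comparison (biases ignored):
# weights of one dense layer with 1000 units on the flattened HD image
dense_weights = hd * 1000
# weights of one convolution layer with 64 filters of size 3x3x3
conv_weights = 3 * 3 * 3 * 64

print(gray, hd, dense_weights, conv_weights)
```

The convolution layer's weight count does not depend on the image size at all, which is where most of the saving comes from.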

  1. Pooling
  2. 1×1 Convolution Layer
  3. Fully Connected Layer


What is called convolution in a CNN is actually a cross-correlation operation: in a true convolution the kernel is flipped before it is slid over the input, whereas here it is applied without flipping. The name convolution is used anyway. The operation depends on 3 important things:

  • Stride(s)
  • Filter/Kernel(f)
  • Padding(p), for example Same
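The difference between cross-correlation and a true convolution can be sketched in a few lines of NumPy. This is a minimal illustration that assumes a square kernel and zero padding. The function names are made up for this example.

```python
import numpy as np

def cross_correlate2d(image, kernel, stride=1, padding=0):
    """Slide the kernel over the image WITHOUT flipping it (what CNNs do)."""
    image = np.pad(image, padding)          # zero padding on every side
    f = kernel.shape[0]                     # assumes a square f x f kernel
    out_h = (image.shape[0] - f) // stride + 1
    out_w = (image.shape[1] - f) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            r, c = i * stride, j * stride
            out[i, j] = np.sum(image[r:r + f, c:c + f] * kernel)
    return out

def convolve2d(image, kernel, stride=1, padding=0):
    """True convolution: flip the kernel on both axes, then cross-correlate."""
    return cross_correlate2d(image, np.flip(kernel), stride, padding)

image = np.arange(16.0).reshape(4, 4)
kernel = np.array([[1.0, 0.0],
                   [0.0, -1.0]])

cc = cross_correlate2d(image, kernel)             # every entry is -5
cv = convolve2d(image, kernel)                    # every entry is +5
padded = cross_correlate2d(image, kernel, padding=1)
```

With this asymmetric kernel the two operations give opposite signs. For a learned filter the distinction does not matter in practice, since the network can simply learn the flipped weights.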


So far, we have performed convolution on a single matrix, that is, a single-channel greyscale image. As mentioned before, colour images generally consist of 3 channels: red, green, and blue (RGB). Here the input matrix consists of 3 layers, and the filter it is multiplied with has 3 layers too. The values obtained from all three layers are added together to form a single-layer matrix. This is also known as convolution on volume.
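Convolution on volume can be sketched as follows, assuming stride 1, no padding, and a channels-last layout (height, width, channels). The name conv_volume is hypothetical.

```python
import numpy as np

def conv_volume(image, filt):
    """image: (H, W, C), filt: (f, f, C) -> single-layer map (H-f+1, W-f+1)."""
    H, W, _ = image.shape
    f = filt.shape[0]
    out = np.zeros((H - f + 1, W - f + 1))
    for i in range(H - f + 1):
        for j in range(W - f + 1):
            # multiply all channels element-wise and add everything together
            out[i, j] = np.sum(image[i:i + f, j:j + f, :] * filt)
    return out

rgb = np.ones((6, 6, 3))       # toy 6x6 RGB image filled with ones
filt = np.ones((3, 3, 3))      # one 3x3 filter with 3 layers
feature_map = conv_volume(rgb, filt)
print(feature_map.shape)       # (4, 4)
```

Each output value here is 27, the sum over 3×3×3 ones, which shows that the three channels collapse into one layer per filter.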


Pooling is another operation used in a CNN. It is carried out to keep the features that are important, and it also reduces the dimensions of the matrix, which makes computation easier. Its parameters include:

  • Stride
  • Types
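As an illustration of the two most common pooling types, max and average, here is a minimal NumPy sketch. The function name pool2d and the 2×2 window with stride 2 are assumptions chosen for the example.

```python
import numpy as np

def pool2d(x, f=2, stride=2, mode="max"):
    """Pool a 2-D matrix with an f x f window; mode is 'max' or 'avg'."""
    out_h = (x.shape[0] - f) // stride + 1
    out_w = (x.shape[1] - f) // stride + 1
    reduce = np.max if mode == "max" else np.mean
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            r, c = i * stride, j * stride
            out[i, j] = reduce(x[r:r + f, c:c + f])
    return out

x = np.array([[1, 3, 2, 4],
              [5, 6, 1, 2],
              [7, 2, 9, 1],
              [3, 4, 5, 6]], dtype=float)

max_pooled = pool2d(x, mode="max")   # [[6, 4], [7, 9]]
avg_pooled = pool2d(x, mode="avg")   # [[3.75, 2.25], [4.0, 5.25]]
```

The 4×4 input shrinks to 2×2 in both cases. Only the rule for summarising each window differs.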

Output dimensions for an n × n input with padding p, filter size f and stride s:

[(n + 2p - f)/s + 1] × [(n + 2p - f)/s + 1] × (no. of filters)
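The formula above can be checked with a small helper. The sketch assumes a square input and rounds the division down when it is not exact. That rounding is a common convention, but it is an assumption here because the article does not state it.

```python
def conv_output_shape(n, f, p=0, s=1, num_filters=1):
    """Output shape for an n x n input, f x f filter, padding p, stride s."""
    side = (n + 2 * p - f) // s + 1   # floor division when not exact (assumption)
    return (side, side, num_filters)

valid = conv_output_shape(n=6, f=3)                       # no padding
same = conv_output_shape(n=6, f=3, p=1)                   # output size equals input size
strided = conv_output_shape(n=7, f=3, s=2, num_filters=8)
print(valid, same, strided)
```

With f = 3 and s = 1, a padding of p = 1 keeps the output the same size as the input, which is what "Same" padding refers to.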



Fully connected layer


Forward Propagation: To train the model, forward and backward propagation are carried out alternately. We will now look mathematically at how the CNN works and how both forward and backward propagation take place. In this network, the trainable parameters are the weights of the filters that are multiplied during convolution and the weights of the fully connected layer. Max pooling has no weights, so it contributes no trainable parameters.

Maxpool in forward propagation
Maxpool in backward propagation
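A minimal sketch of max pooling in both directions is given below, assuming a single 2-D input and non-overlapping windows. In the forward pass each window keeps its maximum. In the backward pass the incoming gradient is routed only to the position that held that maximum. If several entries tie for the maximum, this sketch sends the gradient to all of them, which is a simplification.

```python
import numpy as np

def maxpool_forward(x, f=2, stride=2):
    out_h = (x.shape[0] - f) // stride + 1
    out_w = (x.shape[1] - f) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            r, c = i * stride, j * stride
            out[i, j] = x[r:r + f, c:c + f].max()
    return out

def maxpool_backward(d_out, x, f=2, stride=2):
    """Route each upstream gradient to the max position of its window."""
    d_x = np.zeros_like(x, dtype=float)
    for i in range(d_out.shape[0]):
        for j in range(d_out.shape[1]):
            r, c = i * stride, j * stride
            window = x[r:r + f, c:c + f]
            mask = (window == window.max())      # 1 where the max was, else 0
            d_x[r:r + f, c:c + f] += mask * d_out[i, j]
    return d_x

x = np.array([[1, 3, 2, 4],
              [5, 6, 1, 2],
              [7, 2, 9, 1],
              [3, 4, 5, 6]], dtype=float)
d_out = np.array([[1.0, 2.0],
                  [3.0, 4.0]])       # made-up upstream gradient

pooled = maxpool_forward(x)
d_x = maxpool_backward(d_out, x)
```

All other entries of d_x stay zero, which is why max pooling has no parameters to update and simply passes gradients through.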
  1. AlexNet (2012)
  2. VGG-16
  3. VGG-19
  4. ResNet50 (2015)
  5. Inception-V4 (2016)
Inception’s Architecture
VGG-16 Architecture
ResNet50 Architecture

CNN Architecture from Scratch

So far, we have discussed in depth how a CNN functions and the mathematics behind it. Now our aim is to build a CNN from scratch without any deep learning framework.
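As a starting point, the forward pass of a tiny CNN can be sketched with NumPy alone. This is not the article's implementation. The layer sizes, the random weights, the 10 output classes, and all function names are assumptions made for illustration, and training (backward propagation) is left out.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv_forward(x, filters):
    """x: (H, W, C), filters: (K, f, f, C) -> (H-f+1, W-f+1, K), stride 1."""
    H, W, _ = x.shape
    K, f = filters.shape[0], filters.shape[1]
    out = np.zeros((H - f + 1, W - f + 1, K))
    for k in range(K):
        for i in range(H - f + 1):
            for j in range(W - f + 1):
                out[i, j, k] = np.sum(x[i:i + f, j:j + f, :] * filters[k])
    return out

def relu(x):
    return np.maximum(x, 0)

def maxpool(x, f=2):
    """Non-overlapping f x f max pooling on (H, W, K); extra rows/cols are cropped."""
    H, W, K = x.shape
    x = x[:H // f * f, :W // f * f]
    return x.reshape(H // f, f, W // f, f, K).max(axis=(1, 3))

def softmax(z):
    e = np.exp(z - z.max())          # subtract the max for numerical stability
    return e / e.sum()

image = rng.random((8, 8, 3))                        # toy RGB input
filters = rng.standard_normal((4, 3, 3, 3)) * 0.1    # 4 filters of size 3x3x3

feat = maxpool(relu(conv_forward(image, filters)))   # (6, 6, 4) -> (3, 3, 4)
flat = feat.reshape(-1)                              # flatten for the dense layer

W_fc = rng.standard_normal((10, flat.size)) * 0.1    # fully connected weights
b_fc = np.zeros(10)
probs = softmax(W_fc @ flat + b_fc)                  # class probabilities
```

The trainable parameters in this sketch are filters, W_fc and b_fc. The pooling and ReLU steps have none.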


A passionate and inquisitive learner, member of Data Science Society of IMI Delhi. A strong passion towards ML, mathematics, quantum computing & philosophy.
