Transfer Learning with the Keras Application Inception-ResNetV2

Mahdi Amrollahi
Apr 25, 2021

The simplest way to improve the performance of a deep neural network is to increase its size, both in depth (the number of layers) and in width (the number of neurons at each layer). However, this simple solution comes with two main drawbacks.

First, a bigger network typically means a massive number of parameters, which makes the network more prone to overfitting, especially when the training set is limited. Second, increasing the network size sharply increases the computational resources required.

Deep convolutional networks have become the dominant approach to image recognition in recent years. The Inception architecture has shown that it can achieve good performance at a relatively low computational cost. Meanwhile, residual connections combined with traditional architectures have delivered excellent performance in the ILSVRC competition.

So a new state-of-the-art model emerges from combining residual connections with the Inception architecture. Residual networks allow much deeper networks to be trained; ResNet, proposed by Microsoft Research, is the architecture that introduced them.

Deep learning suffers from a problem called the vanishing/exploding gradient: as gradients are propagated back through many layers, they can become either too small or too large. As a result, simply stacking more layers eventually causes the error rate on both the training and the test data to increase.

To address the vanishing/exploding gradient problem, residual networks use a technique called "skip connections". In short, a skip connection bypasses one or more layers and feeds a layer's input directly into the output of a later layer.

The idea behind this network is that instead of having the layers learn the underlying mapping directly, the network is allowed to fit a residual mapping. The advantage of adding this type of skip connection is that if any layer hurts the performance of the architecture, regularization can effectively skip it.
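To make this concrete, here is a minimal sketch of a residual block in Keras. The layer sizes and the two-convolution structure are illustrative assumptions, not the exact Inception-ResNet block:

```python
import tensorflow as tf
from tensorflow.keras import layers

def residual_block(x, filters=64):
    # Instead of learning the full mapping H(x), the conv layers learn
    # the residual F(x); the skip connection adds x back: y = F(x) + x.
    shortcut = x
    y = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    y = layers.Conv2D(filters, 3, padding="same")(y)
    y = layers.Add()([y, shortcut])  # the skip connection
    return layers.Activation("relu")(y)

# The input's channel count must match `filters` for the addition to work.
inputs = tf.keras.Input(shape=(32, 32, 64))
outputs = residual_block(inputs)
tf.keras.Model(inputs, outputs).summary()
```

Because the block only has to learn the residual F(x), an identity mapping is trivially available: if the extra layers are not helpful, F(x) can be driven toward zero.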

In summary, residual connections can help speed up the training of the Inception model. In the residual versions of Inception, the blocks are somewhat cheaper than in the original Inception architecture. The computational cost of Inception-ResNet-v1 roughly matches that of Inception-v3, while the cost of Inception-ResNet-v2 is roughly that of Inception-v4.

Another important difference between the residual and traditional Inception models is that in the residual versions, batch normalization is applied only on top of the traditional layers.

Inception-ResNet Block

Dataset:

For training our model, we chose the "Scene Classification" dataset, which includes a wide range of natural scenes. It contains about 25 thousand images, each 150×150 pixels with 3 channels.

The images fall into 6 classes, tagged Buildings, Forests, Mountains, Glacier, Street, and Sea. The dataset is already split into training and testing images, so all of our training images come with tags, while the remaining images are unlabeled.

Experiment:

In the first step, we need to access our images. A CSV file specifies which images in the dataset belong to the training set. Next, we split the training data into training and validation sets. At this point we have training and validation data; however, given the small number of images, we apply some augmentations, including flipping, zooming, shifting, and rotating. We can then use `ImageDataGenerator` to build a generator that yields both the original and the augmented images, as sketched below.
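A rough sketch of this pipeline follows. The CSV column names, directory layout, augmentation ranges, and the 80/20 split are assumptions for illustration; the actual file names in the Kaggle dataset may differ:

```python
import pandas as pd
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# The CSV listing the labeled training images (column names assumed).
df = pd.read_csv("train.csv")
df["label"] = df["label"].astype(str)  # flow_from_dataframe expects strings

# Augmentations mentioned in the text (flip, zoom, shift, rotate),
# plus an assumed 20% validation split.
datagen = ImageDataGenerator(
    rescale=1.0 / 255,
    horizontal_flip=True,
    zoom_range=0.2,
    width_shift_range=0.1,
    height_shift_range=0.1,
    rotation_range=20,
    validation_split=0.2,
)

train_gen = datagen.flow_from_dataframe(
    df, directory="train/", x_col="image_name", y_col="label",
    target_size=(150, 150), class_mode="categorical", subset="training")
val_gen = datagen.flow_from_dataframe(
    df, directory="train/", x_col="image_name", y_col="label",
    target_size=(150, 150), class_mode="categorical", subset="validation")
```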

In transfer learning, there are two phases that we need to carry out:

First, we extract features from the input data. At this stage, we use the pre-defined model with weights pre-trained on ImageNet. The key point is that the pre-trained model ends in a softmax layer used for its original classification task; we need to remove this layer so we can attach our own classifier.

Second, we fine-tune the model by setting some layers (or blocks) of the pre-trained model to be trainable, which means their weights can be updated by backpropagation.
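The two phases look roughly like this; how many layers to unfreeze is an arbitrary choice here, and the constructor parameters are explained in the next section:

```python
import tensorflow as tf

base_model = tf.keras.applications.InceptionResNetV2(
    weights="imagenet", include_top=False, input_shape=(150, 150, 3))

# Phase 1 - feature extraction: freeze the entire pre-trained base so
# only the new classifier layers we add on top will learn.
base_model.trainable = False

# Phase 2 - fine-tuning: unfreeze the last few layers so backpropagation
# can update their weights (20 is an illustrative cutoff, not a value
# taken from the article).
base_model.trainable = True
for layer in base_model.layers[:-20]:
    layer.trainable = False
```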

To use the base model, we use `tf.keras.applications`, which contains many successful models. There are a few input parameters of the model that we need to describe (see the sketch after this list):

weights: we can initialize the model with weights pre-trained on an available dataset. We chose the "imagenet" weights for our model.

include_top: the last layer of the base model is the softmax classification layer. If we want to add our own layers on top, we need to set include_top=False to remove that classification layer.

input_shape: we need to specify the shape of the model's input. In our dataset, each picture is 150×150 pixels with three channels.
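Putting these three parameters together, the base-model call looks like this:

```python
import tensorflow as tf

base_model = tf.keras.applications.InceptionResNetV2(
    weights="imagenet",         # pre-trained ImageNet weights
    include_top=False,          # drop the built-in softmax classifier
    input_shape=(150, 150, 3),  # 150x150 pixels, 3 channels
)
```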

We consider this model in four phases. In the first step, we removed only the last layer of the Inception-ResNet model and substituted our own Dense(6), meaning there is no hidden layer. In the subsequent stages, we tried removing the last one, three, and five blocks. So, depending on how many layers we have, the size of the hidden layer, and how many blocks of the model we remove, we get different numbers of trainable parameters.
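For example, the simplest of these configurations (no hidden layer, just the Dense(6) classifier) might be assembled as below. The pooling layer and the optimizer are assumptions, since the article does not specify how the features are flattened or which optimizer is used:

```python
import tensorflow as tf
from tensorflow.keras import layers

base_model = tf.keras.applications.InceptionResNetV2(
    weights="imagenet", include_top=False, input_shape=(150, 150, 3))
base_model.trainable = False  # feature extraction first

model = tf.keras.Sequential([
    base_model,
    layers.GlobalAveragePooling2D(),        # assumed pooling before the head
    layers.Dense(6, activation="softmax"),  # one output per scene class
])
model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
model.summary()  # reports the trainable vs. non-trainable parameter counts
```

Inserting a hidden Dense layer before the classifier, or unfreezing the last one, three, or five blocks, changes the trainable-parameter count reported by `model.summary()`, which is how the configurations above differ.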

https://www.kaggle.com/nitishabharathi/scene-classification
