
What is the VGG neural network?

Introduction to VGGNet

Photo by Dhruv Deshmukh on Unsplash

VGGNet was developed by the Visual Geometry Group at the University of Oxford. The architecture was the first runner-up in the classification task of ILSVRC 2014, which was won by GoogLeNet. VGGNet is worth understanding because many later image classification models build on this architecture, much as word2vec became a foundation in the NLP field.

This story discusses Very Deep Convolutional Networks for Large-Scale Image Recognition (Simonyan et al., 2014) and covers the following:

Data
Architecture
Experiment

Data

Images can be any size, so Simonyan et al. standardize the input to a fixed-size 224×224 RGB image, cropped at random from the rescaled training image. The only preprocessing step is subtracting the mean RGB value, computed on the training set, from each pixel. To increase the amount of training data, data augmentation is applied: the crops are randomly flipped horizontally and their RGB values are randomly shifted.
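The crop, flip and mean-subtraction steps described above can be sketched in a few lines of NumPy. This is only an illustration, not the authors' code: the function name `augment_crop` is hypothetical, the mean RGB value is passed in by the caller, and the random RGB shift is left out for brevity.

```python
import numpy as np

def augment_crop(image, mean_rgb, crop=224, rng=None):
    """Random crop, random horizontal flip, then mean-RGB subtraction.

    `image` is an H x W x 3 array that has already been rescaled so that
    both sides are at least `crop` pixels. `mean_rgb` is assumed to be
    the per-channel mean computed on the training set.
    """
    rng = rng or np.random.default_rng()
    h, w, _ = image.shape
    # Pick the top-left corner of the crop uniformly at random.
    top = rng.integers(0, h - crop + 1)
    left = rng.integers(0, w - crop + 1)
    patch = image[top:top + crop, left:left + crop].astype(np.float32)
    # Flip horizontally with probability 0.5.
    if rng.random() < 0.5:
        patch = patch[:, ::-1]
    # Subtract the per-channel mean from every pixel.
    return patch - np.asarray(mean_rgb, dtype=np.float32)
```

Calling `augment_crop(img, mean_rgb)` repeatedly on the same rescaled image yields a different 224×224 training sample each time.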

Architecture

To reduce the number of parameters, the authors propose replacing a large receptive field with a stack of small ones. They argue that:

Incorporating multiple non-linear rectification layers instead of a single one makes the decision function more discriminative.
It decreases the number of parameters while keeping performance. For example, a stack of 2 layers of 3×3 filters has the same effective receptive field as 1 layer of 5×5 filters but uses fewer parameters. Counting weights per input-output channel pair, the number of parameters is reduced by 28% ((25 − 18)/25):

Number of Parameters of 2 Layers of 3×3 Filter: 2 × 3 × 3 = 18

Number of Parameters of 1 Layer of 5×5 Filter: 1 × 5 × 5 = 25
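The arithmetic above can be checked with a short script. The helper names below are made up for this illustration; the weight count ignores biases and assumes the same number of input and output channels in every layer, and the receptive-field formula assumes stride-1 convolutions.

```python
def stacked_conv_weights(kernel, layers, channels=1):
    """Weights (biases ignored) in `layers` stacked kernel x kernel convs,
    each with `channels` input and `channels` output channels."""
    return layers * kernel * kernel * channels * channels

def receptive_field(kernel, layers):
    """Effective receptive field of `layers` stacked stride-1 convs."""
    return 1 + layers * (kernel - 1)

two_3x3 = stacked_conv_weights(kernel=3, layers=2)  # 2 * 3 * 3 = 18
one_5x5 = stacked_conv_weights(kernel=5, layers=1)  # 1 * 5 * 5 = 25
saving = 1 - two_3x3 / one_5x5                      # (25 - 18) / 25 = 0.28

print(receptive_field(3, 2), two_3x3, one_5x5, round(saving, 2))
```

The same helpers show that three stacked 3×3 layers cover a 7×7 receptive field with 27 weights per channel pair instead of 49.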

If you want to further understand how stacking small layers performs better than a single large layer, you may check out this story.

Simonyan et al. evaluated 6 different ConvNet configurations to see how stacking layers affects performance. The configurations differ in the number of stacked layers within each block. For example, VGG-11 (i.e. Config A) uses 2 Conv3–256 layers while VGG-19 (i.e. Config E) uses 4 Conv3–256 layers in the third block.
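To make the difference between configurations concrete, here is a rough sketch that describes four of them by the number of 3×3 convolution layers per block and counts their weight layers and parameters. The dictionary layout and function names are my own; Config A-LRN and Config C are left out because they add local response normalisation and 1×1 convolutions respectively. The count assumes a 224×224 input, three fully connected layers (4096, 4096, 1000) and biases on every layer.

```python
# Number of 3x3 conv layers in each of the five blocks.
CONFIGS = {
    "A": [1, 1, 2, 2, 2],  # VGG-11
    "B": [2, 2, 2, 2, 2],  # VGG-13
    "D": [2, 2, 3, 3, 3],  # VGG-16
    "E": [2, 2, 4, 4, 4],  # VGG-19
}
# Output channels of the conv layers in each block.
WIDTHS = [64, 128, 256, 512, 512]

def weight_layers(convs_per_block):
    """Conv layers plus the three fully connected layers."""
    return sum(convs_per_block) + 3

def count_parameters(convs_per_block, num_classes=1000):
    """Total weights and biases for a 224x224 RGB input."""
    total, in_ch = 0, 3
    for n_convs, out_ch in zip(convs_per_block, WIDTHS):
        for _ in range(n_convs):
            total += 3 * 3 * in_ch * out_ch + out_ch
            in_ch = out_ch
    # Five 2x2 max-pools shrink 224 to 7, so the first FC layer sees 7*7*512.
    fc_in = 7 * 7 * in_ch
    for fc_out in (4096, 4096, num_classes):
        total += fc_in * fc_out + fc_out
        fc_in = fc_out
    return total

for name, blocks in CONFIGS.items():
    print(name, weight_layers(blocks), count_parameters(blocks))
```

Most of the parameters sit in the fully connected layers, which is why adding convolution layers changes the total only slightly.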

Experiment Architecture of VGGNet (Simonyan et al., 2014)

Experiment

To handle different scenarios, the authors set up 3 experiments: single-scale, multi-scale and multi-crop evaluation.

Single Scale Evaluation

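Intuitively, more layers should be better. However, the authors found that the gain saturates at 19 layers: with a fixed training scale, VGG-16 (Config D) performs on par with or marginally better than VGG-19 (Config E), while with scale jittering at training time VGG-19 achieves the lowest error rate. Going from VGG-16 to VGG-19 increases the number of parameters by only about 4% (from roughly 138 million to 144 million).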

Single Scale Evaluation Result

Multi-Scale Evaluation

Besides evaluating on a single image scale, the authors introduce multi-scale evaluation: the model is run on several rescaled versions of the test image and the resulting class predictions are averaged.
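A minimal sketch of the idea, under a few assumptions: `model_fn` is a hypothetical callable that accepts an image of any size and returns class logits, the scales are example values, and the nearest-neighbour resize is a stand-in for a proper image-resizing routine.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over a 1-D array of logits."""
    e = np.exp(logits - np.max(logits))
    return e / e.sum()

def resize_nearest(image, size):
    """Rescale so the smaller side equals `size` (illustration only)."""
    h, w = image.shape[:2]
    scale = size / min(h, w)
    new_h, new_w = max(1, round(h * scale)), max(1, round(w * scale))
    # Map each output pixel back to its nearest source pixel.
    rows = np.minimum((np.arange(new_h) / scale).astype(int), h - 1)
    cols = np.minimum((np.arange(new_w) / scale).astype(int), w - 1)
    return image[rows][:, cols]

def multi_scale_predict(model_fn, image, scales=(256, 384, 512)):
    """Average the class probabilities predicted at each test scale."""
    probs = [softmax(model_fn(resize_nearest(image, q))) for q in scales]
    return np.mean(probs, axis=0)
```

Averaging probabilities rather than picking a single scale lets the prediction benefit from objects that are easier to recognise at a larger or smaller size.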

Multi-Scale Evaluation Result

Multi-Crop Evaluation

The authors also compared dense evaluation with multi-crop evaluation of the model.
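As a rough illustration of the multi-crop side, the sketch below samples crops on a regular grid and adds a horizontally flipped copy of each; the predictions for all crops would then be averaged. The function name `grid_crops` is hypothetical, and the defaults (a 5×5 grid of 224×224 crops with 2 flips, i.e. 50 crops per scale) follow my reading of the paper's setup rather than any released code.

```python
import numpy as np

def grid_crops(image, crop=224, grid=5):
    """Return grid*grid crops plus their horizontal flips.

    `image` is an H x W x 3 array with both sides at least `crop` pixels.
    """
    h, w = image.shape[:2]
    # Evenly spaced top-left corners from one border to the other.
    tops = np.linspace(0, h - crop, grid).astype(int)
    lefts = np.linspace(0, w - crop, grid).astype(int)
    crops = []
    for top in tops:
        for left in lefts:
            patch = image[top:top + crop, left:left + crop]
            crops.append(patch)
            crops.append(patch[:, ::-1])  # horizontally flipped copy
    return crops
```

Dense evaluation instead applies the network convolutionally to the whole image, so it needs a single forward pass rather than one per crop.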

Multi-Crop Evaluation Result

Take Away

It was a great breakthrough in image classification in 2014: the first time deep learning achieved a top-5 error rate under 10% on ImageNet.
You can find its architectural skeleton in many other modern image classification models.

About Me

I am a Data Scientist in the Bay Area, focusing on the state of the art in Data Science and Artificial Intelligence, especially NLP and platform-related topics. You can reach me via my Medium blog, LinkedIn or GitHub.