Review: STN — Spatial Transformer Network (Image Classification)

With STN, Data Is Spatially Transformed within the Network to Learn Invariance to Translation, Scale, Rotation and More Generic Warping.

In this story, Spatial Transformer Network (STN), by Google DeepMind, is briefly reviewed. STN helps to crop out and scale-normalize the appropriate region, which can simplify the subsequent classification task and lead to better classification performance, as shown below:

(a) Input Image with Random Translation, Scale, Rotation, and Clutter, (b) STN Applied to Input Image, (c) Output of STN, (d) Classification Prediction

It was published in 2015 NIPS and has more than 1300 citations. Spatial transformations such as affine transformation and homography registration have been studied for decades. In this paper, however, the spatial transformation is handled by a neural network. With learning-based spatial transformation, the transformation is applied conditioned on the input or feature map. It is also highly related to another paper, “Deformable Convolutional Networks” (2017 ICCV), so I decided to read this one first. (Sik-Ho Tsang @ Medium)
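
To make the idea of "crop out and scale-normalize" concrete, here is a minimal NumPy sketch of an affine warp with bilinear sampling. It is only an illustration under stated assumptions, not the paper's code: the function names `affine_grid` and `bilinear_sample` are hypothetical, the image is single-channel, coordinates are normalized to [-1, 1], and the 2×3 matrix `theta` is hard-coded here, whereas in STN it would be predicted from the input or feature map by a small network.

```python
import numpy as np

def affine_grid(theta, out_h, out_w):
    """For every output pixel, compute the source coordinate to read from.

    theta is a 2x3 affine matrix acting on normalized coordinates in [-1, 1].
    (Hypothetical helper; in STN theta would come from a learned network.)
    """
    ys, xs = np.meshgrid(np.linspace(-1, 1, out_h),
                         np.linspace(-1, 1, out_w), indexing="ij")
    target = np.stack([xs.ravel(), ys.ravel(), np.ones(out_h * out_w)])  # (3, H*W)
    source = theta @ target                                              # (2, H*W)
    return source[0].reshape(out_h, out_w), source[1].reshape(out_h, out_w)

def bilinear_sample(image, x_s, y_s):
    """Sample a 2-D image at normalized source coordinates (zeros outside)."""
    h, w = image.shape
    # Convert normalized [-1, 1] coordinates to pixel coordinates.
    x = (x_s + 1) * (w - 1) / 2
    y = (y_s + 1) * (h - 1) / 2
    x0, y0 = np.floor(x).astype(int), np.floor(y).astype(int)
    x1, y1 = x0 + 1, y0 + 1
    wx, wy = x - x0, y - y0

    def pix(yy, xx):
        # Read image values, returning 0 for out-of-bounds locations.
        valid = (yy >= 0) & (yy < h) & (xx >= 0) & (xx < w)
        out = np.zeros_like(x, dtype=float)
        out[valid] = image[yy[valid], xx[valid]]
        return out

    return ((1 - wy) * (1 - wx) * pix(y0, x0) + (1 - wy) * wx * pix(y0, x1)
            + wy * (1 - wx) * pix(y1, x0) + wy * wx * pix(y1, x1))

# Toy 5x5 image whose value equals its column index.
image = np.tile(np.arange(5.0), (5, 1))

# Identity transform: the output reproduces the input.
identity = np.array([[1.0, 0.0, 0.0],
                     [0.0, 1.0, 0.0]])
same = bilinear_sample(image, *affine_grid(identity, 5, 5))

# Scale of 0.5: read only the central half of the image (a crop and zoom).
zoom = np.array([[0.5, 0.0, 0.0],
                 [0.0, 0.5, 0.0]])
crop = bilinear_sample(image, *affine_grid(zoom, 3, 3))
```

Because every step is a differentiable function of `theta`, a network that outputs `theta` could in principle be trained end to end with the classifier that follows it, which is what makes the transformation learnable rather than hand-designed.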