Simple Guide to Hyperparameter Tuning in Neural Networks

This repository contains Jupyter notebook content associated with my series on fully connected neural networks.

All related code can now be found in my GitHub repository:

You can access the previous articles below. The first provides a simple introduction to the topic of neural networks for those who are unfamiliar with it. The second covers more intermediate topics such as activation functions, neural architecture, and loss functions.

This is the fourth article in my series on fully connected (vanilla) neural networks. In this article, we will be optimizing a neural network and performing hyperparameter tuning in order to obtain a high-performing model on the Beale function, one of many test functions commonly used for studying the effectiveness of optimization techniques. The same analysis can be reused for any function, and I recommend trying it out yourself on another common test function to sharpen your skills. Personally, I find that optimizing a neural network can be incredibly frustrating (although not as bad as a GAN, if you're familiar with those...) unless you have a clear and well-defined procedure to follow. I hope you enjoy this article and find it insightful.

For those reading who are not familiar with the Jupyter notebook, feel free to read more about it here.

By learning how to approach a difficult optimization function, the reader should be more prepared to deal with real-life scenarios for implementing neural networks.

Neural networks are fairly commonplace now in industry and research, but an embarrassingly large proportion of practitioners are unable to work with them well enough to produce high-performing networks capable of outperforming most other algorithms.

When applied mathematicians develop a new optimization algorithm, one thing they like to do is test it on a test function, sometimes called an artificial landscape. These artificial landscapes give us a way of comparing the performance of various algorithms in terms of their:

From just scrolling down the Wikipedia article on optimization test functions, you can see that some of the functions are pretty nasty. Many of them have been chosen because they highlight specific issues that can plague optimization algorithms. For this article, we will be looking at a relatively innocuous-looking function called the Beale function.

The Beale function does not look particularly terrifying, right? The reason it is used as a test function is that it assesses how well optimization algorithms perform in flat regions with very shallow gradients. In these cases, it is particularly difficult for gradient-based procedures to reach any minimum, as they are unable to learn effectively.

The remainder of this article will follow the Jupyter notebook tutorial on my GitHub repository. We will discuss how one would tackle this kind of artificial landscape. The landscape is analogous to the loss surface of a neural network: when training a neural network, the goal is to find the global minimum on the loss surface by performing some form of optimization, typically stochastic gradient descent.
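To make the landscape concrete, here is a minimal NumPy sketch of the Beale function (the formula and its minimum at (3, 0.5) are standard; the code itself is my own illustration, not taken from the notebook):

```python
import numpy as np

def beale(x, y):
    """Beale function: a 2-D optimization test function with a
    global minimum of 0 at (x, y) = (3, 0.5)."""
    return ((1.5 - x + x * y) ** 2
            + (2.25 - x + x * y ** 2) ** 2
            + (2.625 - x + x * y ** 3) ** 2)

# The global minimum sits at (3, 0.5):
print(beale(3.0, 0.5))  # 0.0
```

Because it accepts NumPy arrays, the same function can be evaluated over a grid to plot the surface.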

Optimization in Neural Networks

A Keras Refresher

Callbacks: taking a peek into our model while it’s training

Step 1 — Deciding on the network topology (not really considered optimization but is very important)
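A minimal sketch of what such a topology might look like in Keras for this problem: two inputs (the x and y coordinates) and one scalar output (the function value). The layer widths here are my own illustrative choices, not the notebook's:

```python
import tensorflow as tf

# An illustrative fully connected topology for a 2-D input and scalar output.
# A Dense layer with n_in inputs and n_out units has (n_in + 1) * n_out
# parameters (weights plus biases).
model = tf.keras.Sequential([
    tf.keras.Input(shape=(2,)),
    tf.keras.layers.Dense(32, activation="relu"),  # (2 + 1) * 32  =   96
    tf.keras.layers.Dense(32, activation="relu"),  # (32 + 1) * 32 = 1056
    tf.keras.layers.Dense(1),                      # (32 + 1) * 1  =   33
])
print(model.count_params())  # 1185
```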

Preprocessing the data
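One common preprocessing step, sketched here in plain NumPy as an illustration (the sampling domain is the one conventionally used for the Beale function; the specifics are my own, not the notebook's): standardize each feature to zero mean and unit variance so that no input dimension dominates the gradients.

```python
import numpy as np

# Sample training inputs from the Beale function's usual domain.
rng = np.random.default_rng(0)
X = rng.uniform(-4.5, 4.5, size=(1000, 2))

# Standardize each column to mean 0 and standard deviation 1.
X_scaled = (X - X.mean(axis=0)) / X.std(axis=0)

print(np.allclose(X_scaled.mean(axis=0), 0.0))  # True
print(np.allclose(X_scaled.std(axis=0), 1.0))   # True
```

Whatever statistics are used here must be saved and reapplied to any validation or test data.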

Step 2 — Adjusting the learning rate
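To see why the learning rate matters so much, here is a toy, framework-free illustration of my own (not from the notebook): plain gradient descent on f(x) = x², where a small learning rate converges and a too-large one overshoots the minimum and diverges.

```python
def gradient_descent(lr, steps=50, x0=5.0):
    """Minimize f(x) = x**2 (gradient 2x) from x0 with a fixed learning rate."""
    x = x0
    for _ in range(steps):
        x -= lr * 2 * x  # each update multiplies x by (1 - 2 * lr)
    return x

print(abs(gradient_descent(0.1)))  # small lr: ends very close to the minimum at 0
print(abs(gradient_descent(1.1)))  # large lr: each step overshoots and |x| blows up
```

The same trade-off plays out on a real loss surface, which is why the learning rate is usually the first hyperparameter to tune.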

Figure: Loss as a function of epochs.

Apply a custom learning rate change using LearningRateScheduler
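One common custom schedule is step decay: halve the learning rate every fixed number of epochs. A minimal sketch (the initial rate, drop factor, and drop interval are my own illustrative values, not the notebook's):

```python
import math

def step_decay(epoch):
    """Halve the learning rate every `epochs_per_drop` epochs."""
    initial_lr, drop, epochs_per_drop = 0.1, 0.5, 10
    return initial_lr * drop ** math.floor(epoch / epochs_per_drop)

print(step_decay(0))   # 0.1
print(step_decay(10))  # 0.05
print(step_decay(25))  # 0.025

# In Keras this schedule would be attached via a callback, e.g.:
#   lr_callback = tf.keras.callbacks.LearningRateScheduler(step_decay)
#   model.fit(X, y, epochs=50, callbacks=[lr_callback])
```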

Step 3 — Choosing an optimizer and a loss function
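A hedged sketch of this step in `tf.keras` (the model, optimizer, and loss choices below are illustrative defaults, not the notebook's definitive picks): for a smooth regression target like the Beale surface, mean squared error is a natural loss, and Adam is a reasonable starting optimizer, with SGD plus momentum a common alternative worth comparing.

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(2,)),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(1),
])

# Adam as a starting point; swap in SGD + momentum to compare convergence.
optimizer = tf.keras.optimizers.Adam(learning_rate=1e-3)
# optimizer = tf.keras.optimizers.SGD(learning_rate=1e-2, momentum=0.9)

model.compile(optimizer=optimizer, loss="mse", metrics=["mae"])
print(type(optimizer).__name__)  # Adam
```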