Hyperparameter tuning in neural networks


I am trying to fine-tune a neural network model for a multilabel classification problem and was reading Jason Brownlee's article on the topic. As per the article, there are a number of hyperparameters to optimize:

  1. batch size and training epochs
  2. optimization algorithm
  3. learning rate and momentum
  4. network weight initialization
  5. activation function in the hidden layer
  6. dropout regularization
  7. the number of neurons in the hidden layer

The code snippet is as follows.

from keras.wrappers.scikit_learn import KerasClassifier
from sklearn.model_selection import GridSearchCV

model = KerasClassifier(build_fn=create_model, verbose=1)

# define the grid search parameters
batch_size = [10, 20, 40, 60, 80, 100]
epochs = [10, 50, 100]
learn_rate = [0.001, 0.01, 0.1, 0.2, 0.3]
momentum = [0.0, 0.2, 0.4, 0.6, 0.8, 0.9]
weight_constraint = [1, 2, 3, 4, 5]
dropout_rate = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
neurons = [1, 5, 10, 15, 20, 25, 30]

param_grid = dict(neurons=neurons, batch_size=batch_size, epochs=epochs, learn_rate=learn_rate,
                  momentum=momentum, dropout_rate=dropout_rate, weight_constraint=weight_constraint)

# exhaustive grid search over every combination of the values above;
# validation_split is forwarded to the underlying Keras fit() call
grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=1)
grid_result = grid.fit(X_train, y_train, validation_split=0.2)

Along with this, the number of hidden layers in the network is another hyperparameter to tune.
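For what it is worth, this is roughly how I imagine exposing the number of hidden layers as an argument of create_model. It is only a sketch: n_features and n_labels are placeholders for my data dimensions, and the default argument values are arbitrary.

from keras.models import Sequential
from keras.layers import Dense, Dropout
from keras.optimizers import SGD
from keras.constraints import maxnorm

def create_model(hidden_layers=1, neurons=10, dropout_rate=0.0, weight_constraint=3,
                 learn_rate=0.01, momentum=0.0):
    model = Sequential()
    # first hidden layer; n_features is a placeholder for the number of input features
    model.add(Dense(neurons, input_dim=n_features, activation='relu',
                    kernel_constraint=maxnorm(weight_constraint)))
    model.add(Dropout(dropout_rate))
    # additional hidden layers controlled by the hidden_layers argument
    for _ in range(hidden_layers - 1):
        model.add(Dense(neurons, activation='relu',
                        kernel_constraint=maxnorm(weight_constraint)))
        model.add(Dropout(dropout_rate))
    # sigmoid output with binary cross-entropy for multilabel classification
    model.add(Dense(n_labels, activation='sigmoid'))
    model.compile(loss='binary_crossentropy',
                  optimizer=SGD(lr=learn_rate, momentum=momentum),
                  metrics=['accuracy'])
    return model

# the number of hidden layers would then be added to the grid like any other hyperparameter
param_grid['hidden_layers'] = [1, 2, 3]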

I am doing hold-out partitioning of the data and grid search for tuning, but it is taking a huge amount of time to compute, even on a GPU machine.
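For scale, the full grid above has 6 × 3 × 5 × 6 × 5 × 10 × 7 = 189,000 parameter combinations, and GridSearchCV fits the model once per combination per cross-validation fold.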

Here I specified all of these parameters in the same grid. I was wondering whether we can simplify this, perhaps by tuning each parameter separately: for example, finding the optimal number of neurons first, then the batch size, and so on (a rough sketch of what I mean is below). What other approaches could be followed to reduce the search time?
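To make the sequential idea concrete, here is a rough sketch of tuning one group of hyperparameters at a time and carrying the best value forward; the fixed values epochs=50 and batch_size=40 in the first stage are just placeholders, not tuned choices.

# stage 1: tune only the number of neurons, keeping everything else fixed
model = KerasClassifier(build_fn=create_model, epochs=50, batch_size=40, verbose=0)
grid = GridSearchCV(estimator=model, param_grid={'neurons': neurons}, n_jobs=1)
best_neurons = grid.fit(X_train, y_train).best_params_['neurons']

# stage 2: fix the best neuron count, then tune batch size and epochs
model = KerasClassifier(build_fn=create_model, neurons=best_neurons, verbose=0)
grid = GridSearchCV(estimator=model,
                    param_grid={'batch_size': batch_size, 'epochs': epochs},
                    n_jobs=1)
best_fit_params = grid.fit(X_train, y_train).best_params_

# ...and so on for learn_rate/momentum, then dropout_rate/weight_constraint

As far as I understand, this reduces the number of fits from the full Cartesian product to the sum of the per-stage grids, at the cost of ignoring interactions between the hyperparameters.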

I was also reading Bengio's paper, Practical Recommendations for Gradient-Based Training of Deep Architectures, but could not get much out of it.