How To Optimise A Neural Network?

When we are solving an industry problem involving neural networks, very often we end up with bad performance. Here are some suggestions on what should be done in order to improve the performance.

Is your model underfitting or overfitting?

You must break down the input data set into two parts – training and test. The general practice is to have 80% for training and 20% for testing.

You should train your neural network with the training set and test with the testing set. This sounds like common sense but we often skip it.

Compare the performance (MSE in case of regression and accuracy/f1/recall/precision in case of classification) of your model with the training set and with the test set.

If it is performing badly for both test and training it is underfitting and if it is performing great for the training set but not test set, it is overfitting.

In case of Underfitting

If the performance over test set is continuously improving over the iterations or epochs, it means you need to increase the iterations/epochs. If it is taking too much time, you may want to use GPUs. You can also try adding an optimizer such as Adam instead of only plain Gradient Descent.

If the performance isn’t improving, it means you have a true case of underfitting. In such cases, There are three possibilities:

Insufficient data
No correlation in data – random data
You need a better model

If the data is insufficient, you can do the following:

You can generate more data. This is called data augmenting. For example, you could take more pictures from different angles, You could reshape them a bit, put more colour filters, remove some pixels from border etc.
You can download similar data from the internet. Say you want to build a neural network to recognize the faces in your office. You can download more picture of faces from across the globe and first train the model on those faces and then train the model using the faces from your office.
You can download a pre-trained neural network and add a layer on top of it and further train it using your data.

If there is no correlation in data, you can’t do much. You can just recheck the labels. A common error is label mismatch. Imagine that there are two files one containing the features and other containing the label and those in different orders or we skipped just one line in either causing the label mismatch. So, recheck if the labels are in the same order as the features. Also, check with the data gathering team if there is something wrong with data.

The last case where you need to improve upon the model is the hardest. In case of neural networks, you can do the following:

Add more layers
Add more neurons to full connected / dense layers but prefer adding more neurons to increasing neurons
Add more filters
Experiment with different strides
Add RELU if you aren’t using it already
If you have the diminishing or exploding gradients problem,
- use batch normalization.
- Try initializing the weights using the xavier_initializer or other heuristics
- Also, try gradient clipping
Normalize the features either using the min-max scaling or standardization
Try normalizing the labels too. Though it is not recommended first.

In case of overfitting

If you notice that your model is overfitting you should do the regularization and also make sure that you are shuffling the training set at every iteration such that every batch is different every time.

For regularization, you can use L1 or L2 normalization or dropout layer.

These are my quick notes. Feel free to let us know you observe any errors in this post .

If you liked it share it with your friends.