Series: Optimization Intro to Optimization in Deep Learning: Busting the Myth About Batch Normalization Batch Normalisation does NOT reduce internal covariate shift. This post looks into why internal covariate shift was thought to be a problem and whether batch normalisation actually addresses it.
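For reference, here is a minimal sketch (not taken from the article) of what a batch normalisation layer computes at training time, assuming NumPy; the function name and the `eps` default are illustrative choices, not the article's code.

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    # Normalise each feature over the mini-batch, then rescale and shift.
    mean = x.mean(axis=0)                      # per-feature mean over the batch
    var = x.var(axis=0)                        # per-feature variance over the batch
    x_hat = (x - mean) / np.sqrt(var + eps)    # zero mean, unit variance
    return gamma * x_hat + beta                # learnable scale and shift

# Toy usage: a batch of 4 samples with 3 features each.
x = np.random.randn(4, 3)
gamma, beta = np.ones(3), np.zeros(3)
out = batch_norm(x, gamma, beta)
print(out.mean(axis=0), out.std(axis=0))       # ~0 mean, ~1 std per feature
```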
Series: Optimization Intro to Optimization in Deep Learning: Vanishing Gradients and Choosing the Right Activation Function A look into how various activation functions like ReLU, PReLU, RReLU and ELU are used to address the vanishing gradient problem, and how to choose one among them for your network.
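As a quick taste of the activations the article compares, here is a hedged NumPy sketch of ReLU, PReLU and ELU; the `alpha` values below are common defaults chosen for illustration, and in a real network PReLU's slope is a learned parameter.

```python
import numpy as np

def relu(x):
    return np.maximum(0, x)                    # zero gradient for x < 0 ("dying ReLU")

def prelu(x, alpha=0.25):
    return np.where(x > 0, x, alpha * x)       # small negative slope keeps gradients alive

def elu(x, alpha=1.0):
    return np.where(x > 0, x, alpha * (np.exp(x) - 1))  # smooth, saturates to -alpha

x = np.linspace(-3, 3, 7)
print(relu(x))
print(prelu(x))
print(elu(x))
```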
Series: Optimization Intro to optimization in deep learning: Momentum, RMSProp and Adam In this post, we take a look at a problem that plagues the training of neural networks, pathological curvature, and at how Momentum, RMSProp and Adam help navigate it.
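To give a flavour of where the series ends up, here is a minimal sketch of a single Adam update, which combines a momentum term with RMSProp-style scaling; this is a standard textbook formulation in NumPy, not code from the article, and the hyperparameter defaults are the usual ones.

```python
import numpy as np

def adam_step(w, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    m = beta1 * m + (1 - beta1) * grad         # momentum: moving average of gradients
    v = beta2 * v + (1 - beta2) * grad**2      # RMSProp: moving average of squared gradients
    m_hat = m / (1 - beta1**t)                 # bias correction for zero-initialised m
    v_hat = v / (1 - beta2**t)                 # bias correction for zero-initialised v
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v

# Toy usage: minimise f(w) = w^2 (gradient 2w) starting from w = 5.
w, m, v = 5.0, 0.0, 0.0
for t in range(1, 1001):
    w, m, v = adam_step(w, 2 * w, m, v, t, lr=0.05)
print(w)  # close to 0
```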
Series: Optimization Intro to optimization in deep learning: Gradient Descent An in-depth explanation of Gradient Descent, and how to avoid the problems of local minima and saddle points.
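For context on where the series starts, here is a minimal sketch of plain gradient descent on a one-dimensional loss, assuming NumPy-style Python; the function name, learning rate and step count are illustrative, not the article's code.

```python
def gradient_descent(grad_fn, w0, lr=0.1, steps=100):
    # Repeatedly step against the gradient of the loss.
    w = w0
    for _ in range(steps):
        w = w - lr * grad_fn(w)
    return w

# Toy usage: f(w) = (w - 3)^2 has gradient 2*(w - 3) and its minimum at w = 3.
w_star = gradient_descent(lambda w: 2 * (w - 3), w0=0.0)
print(w_star)  # ~3.0
```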