TTIC 31230: Fundamentals of Deep Learning

Stochastic Gradient Descent (SGD)

Blog post on SGD variants

Training ResNet-50 on ImageNet in one hour

Some theory on scaling learning rate with batch size

Adding Gradient Noise

Temperature Cycling in SGD

MCMC with momentum