The Calculus of Optimization

Understanding Gradient Descent and Modern Optimizers

These two update rules define the AdaGrad optimizer. First, an accumulator $G_t$ sums the element-wise square of every gradient seen so far:

$$ G_{t} = G_{t-1} + \left(\nabla J(\theta_t)\right)^2 $$

Then each parameter's step is scaled by the inverse square root of its own accumulated history:

$$ \theta_{t+1} = \theta_t - \frac{\eta}{\sqrt{G_t + \epsilon}} \odot \nabla J(\theta_t) $$

Here $\eta$ is the base learning rate, $\epsilon$ is a small constant for numerical stability, and $\odot$ denotes element-wise multiplication, so parameters with a history of large gradients take smaller steps.
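A minimal NumPy sketch of this update, applied to the toy objective $J(\theta) = \|\theta\|^2$ (the objective, learning rate, and iteration count are illustrative choices, not part of the formulas above):

```python
import numpy as np

def adagrad_step(theta, grad, G, lr=0.5, eps=1e-8):
    """One AdaGrad update: accumulate squared gradients, take a scaled step."""
    # G_t = G_{t-1} + (grad J(theta_t))^2, element-wise
    G = G + grad ** 2
    # theta_{t+1} = theta_t - lr / sqrt(G_t + eps) * grad, element-wise
    theta = theta - lr / np.sqrt(G + eps) * grad
    return theta, G

# Toy objective J(theta) = sum(theta^2), whose gradient is 2 * theta.
theta = np.array([5.0, -3.0])
G = np.zeros_like(theta)  # squared-gradient accumulator starts at zero

for _ in range(500):
    grad = 2 * theta
    theta, G = adagrad_step(theta, grad, G)

print(theta)  # both coordinates have shrunk toward the minimum at 0
```

Because $G_t$ only grows, the effective learning rate $\eta / \sqrt{G_t + \epsilon}$ shrinks over time, one of AdaGrad's defining properties (and a known weakness that later optimizers such as RMSProp address).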