In the last post I was wondering why momentum descent methods aren't widely used for neural network training. And guess what, a quick search showed that they are, and so are the auto-scaling methods.
In particular, here is an interesting paper:
https://arxiv.org/pdf/1412.6980.pdf
Adam: A Method for Stochastic Optimization
What they do actually looks more like a momentum method than an auto-selection of the descent rate (I've sketched the update further below). The paper also has references to a few other methods:
- AdaGrad
- RMSProp
- vSGD
- AdaDelta
I originally found this algorithm through another web site that talks about various ML methods:
https://machinelearningmastery.com/adam-optimization-from-scratch/
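To see why it reads like a momentum method, here is a minimal sketch of the Adam update as described in the paper: the first moment m is a momentum-like running average of the gradient, and the second moment v provides the per-parameter scaling of the step. The gradient function grad_f, the starting point x0, and the quadratic test function are just my own assumptions for illustration, not anything from the paper.

```python
import numpy as np

def adam(grad_f, x0, steps=1000, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    x = np.asarray(x0, dtype=float)
    m = np.zeros_like(x)  # first moment: momentum-like running average of the gradient
    v = np.zeros_like(x)  # second moment: running average of the squared gradient
    for t in range(1, steps + 1):
        g = grad_f(x)
        m = beta1 * m + (1 - beta1) * g            # update biased first moment estimate
        v = beta2 * v + (1 - beta2) * (g * g)      # update biased second moment estimate
        m_hat = m / (1 - beta1 ** t)               # bias correction for the first moment
        v_hat = v / (1 - beta2 ** t)               # bias correction for the second moment
        x -= lr * m_hat / (np.sqrt(v_hat) + eps)   # per-parameter scaled step
    return x

# Example call on a simple quadratic f(x) = x^2 (gradient 2x), just to show the shape of the API.
print(adam(lambda x: 2 * x, x0=[5.0, -3.0]))
```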