Sunday, December 25, 2022

optimization 12 - adaptable methods of neural network training

In the last post I was wondering why momentum descent methods aren't widely used for neural network training. And guess what, a quick search showed that they are, and so are the auto-scaling methods.

In particular, here is an interesting paper:

https://arxiv.org/pdf/1412.6980.pdf

Adam: A Method for Stochastic Optimization

What they do actually looks more like a momentum method than like selecting the rate of descent (a small sketch of the update rule follows the list below). The paper also references a few other methods:

  • AdaGrad
  • RMSProp
  • vSGD
  • AdaDelta
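
For a concrete picture, here is a minimal NumPy sketch of the Adam update rule from the paper. The quadratic test function, the adam_minimize name, and the step counts are my own illustrative choices, not something from the paper; the hyperparameter defaults are the ones commonly quoted from it.

import numpy as np

def adam_minimize(grad_fn, x0, alpha=0.001, beta1=0.9, beta2=0.999,
                  eps=1e-8, steps=1000):
    # Run Adam on a function given its gradient grad_fn, starting at x0.
    x = np.asarray(x0, dtype=float).copy()
    m = np.zeros_like(x)   # first-moment estimate (the momentum-like part)
    v = np.zeros_like(x)   # second-moment estimate (per-parameter scaling)
    for t in range(1, steps + 1):
        g = grad_fn(x)
        m = beta1 * m + (1.0 - beta1) * g        # update biased first moment
        v = beta2 * v + (1.0 - beta2) * g * g    # update biased second moment
        m_hat = m / (1.0 - beta1 ** t)           # bias correction
        v_hat = v / (1.0 - beta2 ** t)
        x -= alpha * m_hat / (np.sqrt(v_hat) + eps)  # per-parameter step
    return x

# Toy usage: minimize f(x) = (x - 3)^2, whose gradient is 2*(x - 3).
print(adam_minimize(lambda x: 2.0 * (x - 3.0), x0=[0.0], alpha=0.01, steps=5000))

The interesting part is that the effective step stays bounded by roughly alpha per parameter, because the first moment gets divided by the square root of the second moment, so it behaves like momentum with a built-in per-parameter normalization rather than like a plain learning-rate schedule.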

I found this algorithm through another web site that talks about various ML methods:

https://machinelearningmastery.com/adam-optimization-from-scratch/
