http://arxiv.org/abs/1910.10094
The Proximal Gradient Method (PGM) is a robust and efficient way to minimize the sum of a smooth convex function $f$ and a non-differentiable convex function $r$. It determines the sizes of gradient steps according to the Lipschitz constant of the gradient of $f$. For many problems in data analysis, the Lipschitz constants are expensive or impossible to compute analytically because they depend on details of the experimental setup and the noise properties of the data. Adaptive optimization methods like AdaGrad choose step sizes according to on-the-fly estimates of the Hessian of $f$. As quasi-Newton methods, they generally outperform first-order gradient methods like PGM while adjusting step sizes iteratively and at low computational cost. We propose an iterative proximal quasi-Newton algorithm, AdaProx, that utilizes the adaptive schemes of Adam and its variants (AMSGrad, AdamX, PAdam) and works for arbitrary proxable penalty functions $r$. In test cases for Constrained Matrix Factorization we demonstrate the advantages of AdaProx in fidelity and performance over PGM, especially when factorization components are poorly balanced. The Python implementation of the algorithm presented here is available as an open-source package at https://github.com/pmelchior/proxmin.
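The abstract does not spell out the update rule, but the basic idea can be illustrated with a minimal sketch: an Adam-style preconditioned gradient step followed by a proximal step whose threshold uses the same per-coordinate scaling. This is only an illustration for an L1 penalty, not the API of the proxmin package; the function names (`adaprox_l1_sketch`, `soft_threshold`) and all hyperparameter defaults are assumptions made here for the example.

```python
import numpy as np

def soft_threshold(x, thresh):
    # Proximal operator of r(x) = lam * ||x||_1, evaluated with a
    # (possibly coordinate-wise) threshold thresh = step * lam.
    return np.sign(x) * np.maximum(np.abs(x) - thresh, 0.0)

def adaprox_l1_sketch(grad_f, x0, lam=0.1, alpha=1e-2, beta1=0.9, beta2=0.999,
                      eps=1e-8, n_iter=1000):
    """Sketch of an Adam-preconditioned proximal update for min_x f(x) + lam*||x||_1."""
    x = np.asarray(x0, dtype=float).copy()
    m = np.zeros_like(x)   # first-moment (gradient) estimate
    v = np.zeros_like(x)   # second-moment estimate
    for t in range(1, n_iter + 1):
        g = grad_f(x)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g**2
        m_hat = m / (1 - beta1**t)               # bias-corrected moments, as in Adam
        v_hat = v / (1 - beta2**t)
        step = alpha / (np.sqrt(v_hat) + eps)    # per-coordinate step sizes
        x = x - step * m_hat                     # adaptive gradient step
        x = soft_threshold(x, step * lam)        # proximal step with matching scaling
        # The paper's AdaProx handles arbitrary proxable r and the AMSGrad/AdamX/PAdam
        # variants; this sketch shows only the Adam/L1 special case.
    return x
```

For instance, with $f(x) = \frac{1}{2}\|Ax - b\|^2$ one would pass `grad_f = lambda x: A.T @ (A @ x - b)`. The key design point, shared with the adaptive proximal schemes described in the paper, is that the proximal operator is evaluated with the same per-coordinate step sizes used for the gradient step, so no global Lipschitz constant is needed.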
P. Melchior, R. Joseph and F. Moolekamp
Wed, 23 Oct 19
Comments: 12 pages, 5 figures; submitted to Optimization & Engineering