SC²S Colloquium - May 10, 2012
|Date:||May 10, 2012|
|Time:||3 pm, s.t.|
Robert Seidl: Preconditioning for Hessian-free optimization
Recently, Martens adapted the Hessian-free optimization method to the training of deep neural networks. One key aspect of this approach is that the Hessian is never computed explicitly; instead, the conjugate gradient (CG) algorithm is used to compute the new search direction, requiring only matrix-vector products of the Hessian with arbitrary vectors. These products can be computed efficiently using a variant of the backpropagation algorithm. Recent algorithms use diagonal preconditioners to reduce the number of CG iterations needed; they are popular because they are cheap to compute and apply. Unfortunately, in later stages of the optimization these diagonal preconditioners are not as well suited to the inner iteration as they are in the earlier stages. This is mostly because, near an optimum, an increasing number of elements of the dense Hessian have the same order of magnitude.
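The core idea above can be illustrated with a minimal sketch: CG never needs the Hessian matrix itself, only a function that returns Hessian-vector products. The sketch below (my own toy example, not Martens' implementation) uses a quadratic objective and a finite-difference Hessian-vector product in place of the backpropagation variant; all names are illustrative.

```python
import numpy as np

def cg(matvec, b, tol=1e-10, max_iter=100):
    """Conjugate gradient: solves H x = b using only matrix-vector products."""
    x = np.zeros_like(b)
    r = b - matvec(x)
    p = r.copy()
    rs = r @ r
    for _ in range(max_iter):
        Hp = matvec(p)
        alpha = rs / (p @ Hp)
        x += alpha * p
        r -= alpha * Hp
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

# Toy objective: f(w) = 0.5 w^T A w - b^T w, so grad f(w) = A w - b and the
# Hessian is A. (A real HF trainer would use a network loss instead.)
rng = np.random.default_rng(0)
M = rng.standard_normal((5, 5))
A = M @ M.T + 5 * np.eye(5)  # symmetric positive definite Hessian
b = rng.standard_normal(5)
w = rng.standard_normal(5)

def grad(w):
    return A @ w - b

def hessian_vector_product(v, eps=1e-6):
    """H v approximated by central differences of the gradient; in the
    neural-network setting this is replaced by a backprop-like pass."""
    return (grad(w + eps * v) - grad(w - eps * v)) / (2 * eps)

# Newton/HF step: solve H d = -grad(w) without ever forming H explicitly.
d = cg(hessian_vector_product, -grad(w))
```

Since the toy objective is quadratic, the single step `w + d` lands on the exact minimizer, which makes the sketch easy to verify.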
We construct a sparse approximate inverse (SPAI) preconditioner that accelerates the inner iteration, especially in the later stages of the optimization. The quality of this preconditioner depends on a predefined sparsity pattern. We exploit knowledge of the pattern of the Gauss-Newton approximation of the Hessian to construct the required pattern for our preconditioner, which can then be computed fully in parallel on GPUs. The preconditioner is then applied to a deep auto-encoder test case using different update strategies.
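The reason SPAI parallelizes so well is that minimizing ||A M - I||_F under a fixed sparsity pattern decouples into one small least-squares problem per column of M. A minimal dense NumPy sketch of this columnwise construction (a tridiagonal pattern is chosen here purely for illustration, not the Gauss-Newton-derived pattern from the talk):

```python
import numpy as np

def spai(A, pattern):
    """Sparse approximate inverse: minimize ||A M - I||_F with column j of M
    restricted to the index set pattern[j]. Each column is an independent
    least-squares problem, so all columns can be solved in parallel."""
    n = A.shape[0]
    M = np.zeros((n, n))
    for j in range(n):
        J = pattern[j]                       # allowed nonzero rows of column j
        e_j = np.zeros(n)
        e_j[j] = 1.0
        m_j, *_ = np.linalg.lstsq(A[:, J], e_j, rcond=None)
        M[J, j] = m_j
    return M

# Illustrative setup: a random SPD matrix and a tridiagonal sparsity pattern.
rng = np.random.default_rng(1)
n = 8
B = rng.standard_normal((n, n))
A = B @ B.T + n * np.eye(n)
pattern = [sorted({max(j - 1, 0), j, min(j + 1, n - 1)}) for j in range(n)]
M = spai(A, pattern)
```

Because the identity matrix itself fits the pattern, the columnwise least-squares solution is guaranteed to bring A M at least as close to the identity as A alone, which is what makes M useful as a CG preconditioner.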