Krylov subspace methods are general and widely used iterative methods for solving large, sparse linear systems Ax = b. Computationally, the vector updates in each iteration of a Krylov subspace method require one or more matrix-vector products and inner products, both of which involve expensive interprocessor communication. The computation of inner products in particular requires a global synchronization, which can be especially costly on large-scale machines.
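To make the per-iteration communication structure concrete, here is a minimal sketch of standard CG in pure Python on a small dense system (the matrix and right-hand side are hypothetical example data, not from the talk). The comments mark the kernels discussed above: one matrix-vector product and the inner products that act as global synchronization points in a parallel implementation.

```python
# Minimal CG sketch illustrating per-iteration kernels: one matrix-vector
# product and the inner products (global reductions) per iteration.

def dot(u, v):
    # inner product -> global synchronization in a parallel setting
    return sum(ui * vi for ui, vi in zip(u, v))

def matvec(A, v):
    # matrix-vector product -> neighbor/point-to-point communication
    return [dot(row, v) for row in A]

def cg(A, b, tol=1e-12, maxit=100):
    n = len(b)
    x = [0.0] * n
    r = list(b)              # r0 = b - A*x0 with x0 = 0
    p = list(r)
    rr = dot(r, r)           # inner product (reduction)
    for _ in range(maxit):
        Ap = matvec(A, p)            # matrix-vector product
        alpha = rr / dot(p, Ap)      # inner product (reduction)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * Api for ri, Api in zip(r, Ap)]
        rr_new = dot(r, r)           # inner product (reduction)
        if rr_new ** 0.5 < tol:
            break
        p = [ri + (rr_new / rr) * pi for ri, pi in zip(r, p)]
        rr = rr_new
    return x

# Small SPD test system (assumed example data).
A = [[4.0, 1.0, 0.0],
     [1.0, 3.0, 1.0],
     [0.0, 1.0, 2.0]]
b = [1.0, 2.0, 3.0]
x = cg(A, b)
```

Note that the two reductions inside the loop depend on each other through the recurrences, which is exactly the serialization that synchronization-reducing variants try to break.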
Dependencies between vector recurrences whose updates involve communication-bound operations are thus the fundamental obstacle to the efficient parallel implementation of Krylov subspace methods. The literature describes many approaches to overcoming this obstacle, which typically modify the Krylov subspace method in a way that reduces the number of synchronization points and increases parallelism within individual iterations. While such modifications can improve per-iteration performance, they come at a cost. It is well known that finite precision computation can cause delayed convergence and decreased attainable accuracy in standard Krylov subspace methods, and even slight modifications to the standard recurrences can amplify these effects.
Focusing on the conjugate gradient method, we discuss various synchronization-reducing modifications of CG, including the popular pipelined and s-step variants. We present theoretical and experimental results showing the effect such modifications have on finite precision behavior. In particular, we show that for both pipelined and s-step variants, the gap between the true residual b - Ax_k and the recursively updated residual r_k may be amplified compared with the standard CG method. There are thus complicated tradeoffs between the speed of each iteration, the convergence rate, and the attainable accuracy in synchronization-reducing variants, and the optimal point in this tradeoff space is highly application dependent. We therefore emphasize in the talk that the optimization of Krylov subspace methods requires a holistic approach with respect to a particular problem, application domain, and machine architecture.
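The residual gap mentioned above can be measured directly. The following self-contained sketch (pure Python, with an assumed small SPD example system) runs standard CG and records the gap ||(b - Ax_k) - r_k|| between the true residual and the residual produced by the recurrence; in finite precision this gap is nonzero even for standard CG, and the point of the talk is that pipelined and s-step variants can amplify it further.

```python
# Measure the residual gap ||(b - A*x_k) - r_k|| in standard CG: the
# recurrence r_{k+1} = r_k - alpha_k * A*p_k drifts from the true residual
# b - A*x_k due to rounding error.

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def matvec(A, v):
    return [dot(row, v) for row in A]

def axpy(a, x, y):
    # elementwise a*x + y
    return [a * xi + yi for xi, yi in zip(x, y)]

def cg_residual_gaps(A, b, iters):
    n = len(b)
    x = [0.0] * n
    r = list(b)                  # r0 = b with x0 = 0
    p = list(r)
    rr = dot(r, r)
    gaps = []
    for _ in range(iters):
        Ap = matvec(A, p)
        alpha = rr / dot(p, Ap)
        x = axpy(alpha, p, x)
        r = axpy(-alpha, Ap, r)               # recursively updated residual
        true_r = axpy(-1.0, matvec(A, x), b)  # true residual b - A*x_k
        diff = [t - ri for t, ri in zip(true_r, r)]
        gaps.append(dot(diff, diff) ** 0.5)   # || (b - A*x_k) - r_k ||
        rr_new = dot(r, r)
        p = axpy(rr_new / rr, p, r)           # p = r + beta*p
        rr = rr_new
    return gaps

# Small SPD test system (assumed example data).
A = [[4.0, 1.0, 0.0],
     [1.0, 3.0, 1.0],
     [0.0, 1.0, 2.0]]
b = [1.0, 2.0, 3.0]
gaps = cg_residual_gaps(A, b, 3)
```

For this tiny well-conditioned system the gap stays near machine precision; the amplification discussed in the talk appears when the recurrences are rearranged (pipelined CG) or unrolled over s steps (s-step CG), neither of which is implemented in this illustrative sketch.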
Courant Instructor/Assistant Professor
Courant Institute of Mathematical Sciences
New York University