Bounds for Lanczos-FA on linear systems

Tyler Chen

Introduction

Krylov subspace methods are among the most widely used classes of algorithms for solving $\mathbf{A}\mathbf{x} = \mathbf{b}$ when $\mathbf{A}$ is a very large, sparse matrix. These methods work by constructing the Krylov subspace $$\mathcal{K}_{k}(\mathbf{A},\mathbf{b}) := \operatorname{span}\{\mathbf{b}, \mathbf{A}\mathbf{b}, \ldots, \mathbf{A}^{k-1}\mathbf{b} \}.$$ This page focuses on the Lanczos method for matrix function approximation (Lanczos-FA) used to approximate $\mathbf{A}^{-1}\mathbf{b}$. The Lanczos-FA iterate is defined as $$\mathsf{lan}_{k}(1/x) := \| \mathbf{b} \|_2\, \mathbf{Q} \mathbf{T}^{-1} \mathbf{e}_1,$$ where $\mathbf{Q} = [\mathbf{q}_1, \ldots, \mathbf{q}_k]$ is an orthonormal basis for $\mathcal{K}_k(\mathbf{A},\mathbf{b})$ such that $\operatorname{span}\{\mathbf{q}_1, \ldots, \mathbf{q}_j\} = \mathcal{K}_{j}(\mathbf{A},\mathbf{b})$ for all $j\leq k$, and $\mathbf{T}:=\mathbf{Q}^\mathsf{T}\mathbf{A} \mathbf{Q}$.
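In practice, $\mathbf{Q}$ and the tridiagonal matrix $\mathbf{T}$ are produced by the Lanczos three-term recurrence. The following is a minimal NumPy sketch of Lanczos-FA for $\mathbf{A}^{-1}\mathbf{b}$, not a robust implementation: it assumes exact arithmetic, and in floating point the basis loses orthogonality at large $k$ unless one reorthogonalizes.

```python
import numpy as np

def lanczos_fa_inv(A, b, k):
    """Sketch of Lanczos-FA for approximating A^{-1} b.

    Builds an orthonormal Krylov basis Q and tridiagonal T = Q^T A Q by the
    Lanczos recurrence, then returns ||b||_2 * Q T^{-1} e_1. No
    reorthogonalization, so this is only a didactic sketch.
    """
    n = len(b)
    Q = np.zeros((n, k))
    alpha = np.zeros(k)       # diagonal of T
    beta = np.zeros(k)        # off-diagonal of T; beta[j] couples q_j and q_{j+1}
    Q[:, 0] = b / np.linalg.norm(b)
    for j in range(k):
        v = A @ Q[:, j]
        if j > 0:
            v -= beta[j - 1] * Q[:, j - 1]
        alpha[j] = Q[:, j] @ v
        v -= alpha[j] * Q[:, j]
        if j < k - 1:
            beta[j] = np.linalg.norm(v)   # assumes no breakdown (beta[j] != 0)
            Q[:, j + 1] = v / beta[j]
    T = np.diag(alpha) + np.diag(beta[: k - 1], 1) + np.diag(beta[: k - 1], -1)
    e1 = np.zeros(k)
    e1[0] = 1.0
    return np.linalg.norm(b) * Q @ np.linalg.solve(T, e1)
```

For $k = n$ (and a generic $\mathbf{b}$), the Krylov subspace is the whole space and the iterate recovers $\mathbf{A}^{-1}\mathbf{b}$ up to rounding.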

Positive definite systems

If $\mathbf{A}$ is positive definite, the Lanczos-FA iterate is equivalent to the iterate of the well-known conjugate gradient algorithm (CG). The simplest proof of this equivalence amounts to showing that the Lanczos-FA iterate satisfies the same optimality property as CG.

Theorem. If $\mathbf{A}$ is positive definite, then $$\| \mathbf{A}^{-1} \mathbf{b} - \mathsf{lan}_{k}(1/x) \|_{\mathbf{A}} = \min_{\mathbf{x}\in\mathcal{K}_k(\mathbf{A},\mathbf{b})} \| \mathbf{A}^{-1} \mathbf{b} - \mathbf{x} \|_{\mathbf{A}}.$$

Proof. An arbitrary vector in $\mathcal{K}_k(\mathbf{A},\mathbf{b})$ can be written $\mathbf{Q}\mathbf{c}$ for some $\mathbf{c} \in \mathbb{R}^k$. Thus, $$\begin{align*} \min_{\mathbf{x}\in\mathcal{K}_k(\mathbf{A},\mathbf{b})} \| \mathbf{A}^{-1} \mathbf{b} - \mathbf{x} \|_{\mathbf{A}} &= \min_{\mathbf{c}\in\mathbb{R}^k} \| \mathbf{A}^{-1} \mathbf{b} - \mathbf{Q}\mathbf{c} \|_{\mathbf{A}} \\&= \min_{\mathbf{c}\in\mathbb{R}^k} \| \mathbf{A}^{1/2} (\mathbf{A}^{-1} \mathbf{b} - \mathbf{Q}\mathbf{c} ) \|_{2} \\&= \min_{\mathbf{c}\in\mathbb{R}^k} \| \mathbf{A}^{-1/2} \mathbf{b} - \mathbf{A}^{1/2} \mathbf{Q}\mathbf{c} \|_{2}. \end{align*}$$ This is a linear least-squares problem in $\mathbf{c}$; solving the normal equations, we find that the minimizer is $$\mathbf{c} = ((\mathbf{A}^{1/2}\mathbf{Q} )^\mathsf{T}(\mathbf{A}^{1/2} \mathbf{Q}) )^{-1} (\mathbf{A}^{1/2} \mathbf{Q})^\mathsf{T}(\mathbf{A}^{-1/2} \mathbf{b}) = (\mathbf{Q}^\mathsf{T}\mathbf{A} \mathbf{Q})^{-1} \mathbf{Q}^\mathsf{T}\mathbf{b} = \| \mathbf{b} \|_2 \mathbf{T}^{-1} \mathbf{e}_1,$$ where the last equality uses $\mathbf{q}_1 = \mathbf{b}/\|\mathbf{b}\|_2$, so that $\mathbf{Q}^\mathsf{T}\mathbf{b} = \|\mathbf{b}\|_2 \mathbf{e}_1$. Thus, the minimizer over $\mathcal{K}_k(\mathbf{A},\mathbf{b})$ is $$\mathbf{x} = \mathbf{Q} \mathbf{c} = \| \mathbf{b}\|_2\, \mathbf{Q} \mathbf{T}^{-1} \mathbf{e}_1 = \mathsf{lan}_{k}(1/x).$$ This proves the theorem. $~~~\square$

This optimality property allows us to derive a number of a priori bounds for the convergence of Lanczos-FA (and, equivalently, CG).

Indefinite systems

If $\mathbf{A}$ is not positive definite, $\mathbf{T}$ may have an eigenvalue at or near zero, and the error of the Lanczos-FA approximation to $\mathbf{A}^{-1}\mathbf{b}$ can be arbitrarily large.
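A tiny (hypothetical) $2\times 2$ example makes this concrete: with the matrix and vector below, at $k=1$ the projected matrix is the scalar $\mathbf{T} = \mathbf{q}_1^\mathsf{T}\mathbf{A}\mathbf{q}_1 \approx \varepsilon/2$, so the first Lanczos-FA iterate blows up even though $\|\mathbf{b}\|_2 = 1$.

```python
import numpy as np

eps = 1e-8
A = np.diag([1.0, -1.0 + eps])            # indefinite; spectrum nearly symmetric about 0
b = np.array([1.0, 1.0]) / np.sqrt(2.0)   # ||b||_2 = 1

# k = 1: Q = [b], and T = b^T A b = eps/2 is nearly singular
T11 = b @ A @ b
x1 = (1.0 / T11) * b                      # lan_1(1/x) = ||b||_2 * Q T^{-1} e_1
r1 = np.linalg.norm(b - A @ x1)           # residual norm on the order of 1/eps
```

Shrinking `eps` makes the residual of the first iterate as large as desired.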

The MINRES iterates are defined as $$\hat{\mathbf{y}}_k := \operatornamewithlimits{argmin}_{\mathbf{y}\in\mathcal{K}_k(\mathbf{A},\mathbf{b})} \| \mathbf{b} - \mathbf{A} \mathbf{y} \|_{2} = \operatornamewithlimits{argmin}_{\mathbf{y}\in\mathcal{K}_k(\mathbf{A},\mathbf{b})} \| \mathbf{A}^{-1}\mathbf{b} - \mathbf{y} \|_{\mathbf{A}^2}.$$ Define the residual vectors $$\mathbf{r}_k := \mathbf{b} - \mathbf{A}\, \mathsf{lan}_{k}(1/x), \qquad \mathbf{r}_k^{\mathrm{M}} := \mathbf{b} - \mathbf{A} \hat{\mathbf{y}}_k,$$ and note that the MINRES residual norms are non-increasing due to the optimality of the MINRES iterates. In [1], it is shown that the CG residual norms are near the MINRES residual norms at iterations where MINRES makes good progress. More precisely, the algorithms are related by $$\| \mathbf{r}_k \|_2 = \frac{\| \mathbf{r}_{k}^{\mathrm{M}} \|_2}{\sqrt{1- \left( \| \mathbf{r}_{k}^{\mathrm{M}} \|_2 / \| \mathbf{r}_{k-1}^{\mathrm{M}} \|_2 \right)^2}}.$$ This identity says that when MINRES makes progress ($\| \mathbf{r}_{k}^{\mathrm{M}} \|_2 \ll \| \mathbf{r}_{k-1}^{\mathrm{M}} \|_2$) the CG residual norm is very similar, but when MINRES stagnates the CG residual norm spikes. However, it does not clearly guarantee that the convergence of CG is similar to that of MINRES.

The following theorem from [2] provides such a guarantee. To the best of our knowledge, the result is new.

Theorem. For every $k\geq 1$, $$\min_{0\leq j\leq k} \|\mathbf{r}_{j}\|_2 \leq \sqrt{k+1} \cdot \|\mathbf{r}_k^{\mathrm{M}}\|_2,$$ where $\mathbf{r}_0 := \mathbf{b}$. (In [2], the result is stated for the FOM and GMRES residuals, which for symmetric $\mathbf{A}$ coincide with the Lanczos-FA/CG and MINRES residuals, respectively.)

It also turns out that the bound is sharp.

Theorem. For every $k\geq 1$ and $\varepsilon > 0$, there exist a matrix $\mathbf{A}$ and a vector $\mathbf{b}$ for which $$\min_{0\leq j\leq k} \|\mathbf{r}_j\|_2 \geq \left( \sqrt{k+1} - \varepsilon \right) \cdot \|\mathbf{r}_k^{\mathrm{M}}\|_2.$$

References

1. Cullum, J.; Greenbaum, A. Relations Between Galerkin and Norm-Minimizing Iterative Methods for Solving Linear Systems. SIAM Journal on Matrix Analysis and Applications 1996, 17, 223–247, doi:10.1137/S0895479893246765.

2. Chen, T.; Meurant, G. Near-Optimal Convergence of the Full Orthogonalization Method. 2024.