Bounds for Lanczos-FA on linear systems

Tyler Chen

Introduction

Krylov subspace methods are among the most widely used classes of algorithms for solving $\mathbf{A}\mathbf{x} = \mathbf{b}$ when $\mathbf{A}$ is a very large, sparse matrix. These methods work by constructing the Krylov subspace $$\mathcal{K}_{k}(\mathbf{A},\mathbf{b}) := \operatorname{span}\{\mathbf{b}, \mathbf{A}\mathbf{b}, \ldots, \mathbf{A}^{k-1}\mathbf{b} \}.$$ This page focuses on the Lanczos method for matrix function approximation (Lanczos-FA) used to approximate $\mathbf{A}^{-1}\mathbf{b}$. The Lanczos-FA iterate is defined as $$\mathsf{lan}_{k}(1/x) := \| \mathbf{b} \|_2\, \mathbf{Q} \mathbf{T}^{-1} \mathbf{e}_1,$$ where $\mathbf{Q} = [\mathbf{q}_1, \ldots, \mathbf{q}_k]$ is an orthonormal basis for $\mathcal{K}_k(\mathbf{A},\mathbf{b})$ such that $\operatorname{span}\{\mathbf{q}_1, \ldots, \mathbf{q}_j\} = \mathcal{K}_{j}(\mathbf{A},\mathbf{b})$ for all $j\leq k$, and $\mathbf{T}:=\mathbf{Q}^\mathsf{T}\mathbf{A} \mathbf{Q}$.
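In practice, $\mathbf{Q}$ and the tridiagonal matrix $\mathbf{T}$ are produced by the Lanczos three-term recurrence. The following is a minimal NumPy sketch of Lanczos-FA for $\mathbf{A}^{-1}\mathbf{b}$, not a robust implementation: it assumes exact arithmetic, and in floating point the basis loses orthogonality at large $k$ unless one reorthogonalizes.

```python
import numpy as np

def lanczos_fa_inv(A, b, k):
    """Sketch of Lanczos-FA for approximating A^{-1} b.

    Builds an orthonormal Krylov basis Q and tridiagonal T = Q^T A Q by the
    Lanczos recurrence, then returns ||b||_2 * Q T^{-1} e_1. No
    reorthogonalization, so this is only a didactic sketch.
    """
    n = len(b)
    Q = np.zeros((n, k))
    alpha = np.zeros(k)       # diagonal of T
    beta = np.zeros(k)        # off-diagonal of T; beta[j] couples q_j and q_{j+1}
    Q[:, 0] = b / np.linalg.norm(b)
    for j in range(k):
        v = A @ Q[:, j]
        if j > 0:
            v -= beta[j - 1] * Q[:, j - 1]
        alpha[j] = Q[:, j] @ v
        v -= alpha[j] * Q[:, j]
        if j < k - 1:
            beta[j] = np.linalg.norm(v)   # assumes no breakdown (beta[j] != 0)
            Q[:, j + 1] = v / beta[j]
    T = np.diag(alpha) + np.diag(beta[: k - 1], 1) + np.diag(beta[: k - 1], -1)
    e1 = np.zeros(k)
    e1[0] = 1.0
    return np.linalg.norm(b) * Q @ np.linalg.solve(T, e1)
```

For $k = n$ (and a generic $\mathbf{b}$), the Krylov subspace is the whole space and the iterate recovers $\mathbf{A}^{-1}\mathbf{b}$ up to rounding.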

Positive definite systems

If $\mathbf{A}$ is positive definite, the Lanczos-FA iterate is equivalent to the iterate of the well-known conjugate gradient algorithm (CG). The simplest proof of this equivalence amounts to showing that the Lanczos-FA iterate satisfies the same optimality property as CG.

Theorem. If $\mathbf{A}$ is positive definite, then $$\| \mathbf{A}^{-1} \mathbf{b} - \mathsf{lan}_{k}(1/x) \|_{\mathbf{A}} = \min_{\mathbf{x}\in\mathcal{K}_k(\mathbf{A},\mathbf{b})} \| \mathbf{A}^{-1} \mathbf{b} - \mathbf{x} \|_{\mathbf{A}}.$$

Proof. An arbitrary vector in $\mathcal{K}_k(\mathbf{A},\mathbf{b})$ can be written $\mathbf{Q}\mathbf{c}$ for some $\mathbf{c} \in \mathbb{R}^k$. Thus, $$\begin{align*} \min_{\mathbf{x}\in\mathcal{K}_k(\mathbf{A},\mathbf{b})} \| \mathbf{A}^{-1} \mathbf{b} - \mathbf{x} \|_{\mathbf{A}} &= \min_{\mathbf{c}\in\mathbb{R}^k} \| \mathbf{A}^{-1} \mathbf{b} - \mathbf{Q}\mathbf{c} \|_{\mathbf{A}} \\&= \min_{\mathbf{c}\in\mathbb{R}^k} \| \mathbf{A}^{1/2} (\mathbf{A}^{-1} \mathbf{b} - \mathbf{Q}\mathbf{c} ) \|_{2} \\&= \min_{\mathbf{c}\in\mathbb{R}^k} \| \mathbf{A}^{-1/2} \mathbf{b} - \mathbf{A}^{1/2} \mathbf{Q}\mathbf{c} \|_{2}. \end{align*}$$ This is a linear least-squares problem in $\mathbf{c}$; solving the normal equations, we find that the minimizer is $$\mathbf{c} = ((\mathbf{A}^{1/2}\mathbf{Q} )^\mathsf{T}(\mathbf{A}^{1/2} \mathbf{Q}) )^{-1} (\mathbf{A}^{1/2} \mathbf{Q})^\mathsf{T}(\mathbf{A}^{-1/2} \mathbf{b}) = (\mathbf{Q}^\mathsf{T}\mathbf{A} \mathbf{Q})^{-1} \mathbf{Q}^\mathsf{T}\mathbf{b} = \| \mathbf{b} \|_2 \mathbf{T}^{-1} \mathbf{e}_1,$$ where the last equality uses $\mathbf{q}_1 = \mathbf{b}/\|\mathbf{b}\|_2$, so that $\mathbf{Q}^\mathsf{T}\mathbf{b} = \|\mathbf{b}\|_2 \mathbf{e}_1$. Thus, the minimizer over $\mathcal{K}_k(\mathbf{A},\mathbf{b})$ is $$\mathbf{x} = \mathbf{Q} \mathbf{c} = \| \mathbf{b}\|_2\, \mathbf{Q} \mathbf{T}^{-1} \mathbf{e}_1 = \mathsf{lan}_{k}(1/x).$$ This proves the theorem. $~~~\square$

This optimality property allows us to derive a number of a priori bounds for the convergence of Lanczos-FA (and, equivalently, CG).

Indefinite systems

If $\mathbf{A}$ is not positive definite, $\mathbf{T}$ may have an eigenvalue at or near zero, and the error of the Lanczos-FA approximation to $\mathbf{A}^{-1}\mathbf{b}$ can be arbitrarily large.
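A tiny (hypothetical) $2\times 2$ example makes this concrete: with the matrix and vector below, at $k=1$ the projected matrix is the scalar $\mathbf{T} = \mathbf{q}_1^\mathsf{T}\mathbf{A}\mathbf{q}_1 \approx \varepsilon/2$, so the first Lanczos-FA iterate blows up even though $\|\mathbf{b}\|_2 = 1$.

```python
import numpy as np

eps = 1e-8
A = np.diag([1.0, -1.0 + eps])            # indefinite; spectrum nearly symmetric about 0
b = np.array([1.0, 1.0]) / np.sqrt(2.0)   # ||b||_2 = 1

# k = 1: Q = [b], and T = b^T A b = eps/2 is nearly singular
T11 = b @ A @ b
x1 = (1.0 / T11) * b                      # lan_1(1/x) = ||b||_2 * Q T^{-1} e_1
r1 = np.linalg.norm(b - A @ x1)           # residual norm on the order of 1/eps
```

Shrinking `eps` makes the residual of the first iterate as large as desired.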

The MINRES iterates are defined as $$\hat{\mathbf{y}}_k := \operatornamewithlimits{argmin}_{\mathbf{y}\in\mathcal{K}_k(\mathbf{A},\mathbf{b})} \| \mathbf{b} - \mathbf{A} \mathbf{y} \|_{2} = \operatornamewithlimits{argmin}_{\mathbf{y}\in\mathcal{K}_k(\mathbf{A},\mathbf{b})} \| \mathbf{A}^{-1}\mathbf{b} - \mathbf{y} \|_{\mathbf{A}^2}.$$ Define the residual vectors $$\mathbf{r}_k := \mathbf{b} - \mathbf{A}\, \mathsf{lan}_{k}(1/x), \qquad \mathbf{r}_k^{\mathrm{M}} := \mathbf{b} - \mathbf{A} \hat{\mathbf{y}}_k,$$ and note that the MINRES residual norms are non-increasing due to the optimality of the MINRES iterates. In [1], it is shown that the CG residual norms are near the MINRES residual norms at iterations where MINRES makes good progress. More precisely, the algorithms are related by $$\| \mathbf{r}_k \|_2 = \frac{\| \mathbf{r}_{k}^{\mathrm{M}} \|_2}{\sqrt{1- \left( \| \mathbf{r}_{k}^{\mathrm{M}} \|_2 / \| \mathbf{r}_{k-1}^{\mathrm{M}} \|_2 \right)^2}}.$$ This identity says that when MINRES makes progress ($\| \mathbf{r}_{k}^{\mathrm{M}} \|_2 \ll \| \mathbf{r}_{k-1}^{\mathrm{M}} \|_2$) the CG residual norm is very similar, but when MINRES stagnates the CG residual norm spikes. However, it does not clearly guarantee that the convergence of CG is similar to that of MINRES.

The following theorem from [2] provides such a guarantee. To the best of our knowledge, the result is new.

Theorem. For every $k\geq 1$, $$\min_{0\leq j\leq k} \|\mathbf{r}_{j}\|_2 \leq \sqrt{k+1} \cdot \|\mathbf{r}_k^{\mathrm{M}}\|_2,$$ where $\mathbf{r}_0 := \mathbf{b}$. (In [2], the result is stated for the FOM and GMRES residuals, which for symmetric $\mathbf{A}$ coincide with the Lanczos-FA/CG and MINRES residuals, respectively.)

It also turns out that the bound is sharp.

Theorem. For every $k\geq 1$ and $\varepsilon > 0$, there exist a matrix $\mathbf{A}$ and a vector $\mathbf{b}$ for which $$\min_{0\leq j\leq k} \|\mathbf{r}_j\|_2 \geq \left( \sqrt{k+1} - \varepsilon \right) \cdot \|\mathbf{r}_k^{\mathrm{M}}\|_2.$$

References

1. Cullum, J.; Greenbaum, A. Relations Between Galerkin and Norm-Minimizing Iterative Methods for Solving Linear Systems. SIAM Journal on Matrix Analysis and Applications 1996, 17, 223–247, doi:10.1137/S0895479893246765.

2. Chen, T.; Meurant, G. Near-Optimal Convergence of the Full Orthogonalization Method. 2024.