Error bounds for Lanczos-based matrix function approximation

Tyler Chen

This is a companion piece to https://doi.org/10.1137/21M1427784

Code to replicate the figures in the corresponding paper can be found on Github.

Introduction

The Lanczos method for matrix function approximation (Lanczos-FA) can be used to approximate f(A)bf(\mathbf{A})\mathbf{b} (read more about this task here), and in the case that f(x)=1/xf(x) = 1/x and A\mathbf{A} is positive definite, this approximation is optimal over Krylov subspace. This case is very well studied and a range of error bounds and estimates exist. However, for other functions, the standard bounds are often too pessimistic as they do not take into account fine grained information about the spectrum of A\mathbf{A} such as outlaying or clustered eigenvalues. This makes it difficult to know when Lanczos-FA has reached a suitable accuracy for a given problem.

In this paper we show how to reduce the error of approximating f(A)bf(\mathbf{A})\mathbf{b} with Lanczos-FA to the error of solving a certain linear system with the Lanczos-FA. This allows us to leverage the range of existing bounds for the convergence of Lanczos-FA on linear systems to easily obtain a priori and a posteriori bounds for other matrix functions including piecewise analytic functions such as the sign function. Our a posteriori bounds are highly accurate and can be used as practical stopping criteria.

The basic idea

The kk-th Lanczos-FA approximation to f(A)bf(\mathbf{A}) \mathbf{b} is defined as lank(f):=Qf(T)QTb, \mathsf{lan}_k(f) := \mathbf{Q} f(\mathbf{T}) \mathbf{Q}^{\mathsf{T}} \mathbf{b}, where Q\mathbf{Q} and T\mathbf{T} are produced by the Lanczos method run for kk steps on (A,b)(\mathbf{A},\mathbf{b}).

Suppose A\mathbf{A} is a Hermitian matrix and If ff is analytic in a neighborhood of the eigenvalues of A\mathbf{A} and Γ\Gamma is a contour in this neighborhood containing these eigenvalues, f(A)b=12πiΓf(z)(AzI)1bdz. f(\mathbf{A})\mathbf{b} = - \frac{1}{2 \pi i} \oint_{\Gamma} f(z) (\mathbf{A} - z \mathbf{I} )^{-1} \mathbf{b} \, \mathrm{d}{z}. If the eigenvalues of T\mathbf{T} are also contained in Γ\Gamma, we similarly have Qf(T)QTb=12πiΓf(z)Q(TzI)1QTbdz. \mathbf{Q} f(\mathbf{T}) \mathbf{Q}^\mathsf{T}\mathbf{b} = - \frac{1}{2 \pi i} \oint_{\Gamma} f(z) \mathbf{Q} (\mathbf{T} - z \mathbf{I} )^{-1} \mathbf{Q}^\mathsf{T}\mathbf{b} \, \mathrm{d}{z} . Combining, these, we see that the Lanczos-FA error can be written as f(A)bQf(T)QTb=12πiΓf(z)errk(z)dz. f(\mathbf{A}) \mathbf{b} - \mathbf{Q} f( \mathbf{T} ) \mathbf{Q}^{\mathsf{T}} \mathbf{b} = - \frac{1}{2 \pi i} \oint_{\Gamma} f(z) \, \mathsf{err}_k(z) \, \mathrm{d}{z}.

For zCz \in \mathbb{C}, define the kk-th Lanczos-FA error and residual for the linear system (AzI)x=b(\mathbf{A}-z\mathbf{I}) \mathbf{x} = \mathbf{b} as, errk(z,A,b):=(AzI)1bQ(TzI)1QTb,resk(z,A,b):=b(AzI)Q(TzI)1QTb.\begin{align*} \mathsf{err}_k(z,\mathbf{A},\mathbf{b}) &:= (\mathbf{A} - z \mathbf{I})^{-1} \mathbf{b} - \mathbf{Q}(\mathbf{T}-z\mathbf{I})^{-1}\mathbf{Q}^\mathsf{T}\mathbf{b}%\lan_k ( h_z ) ,\\ \mathsf{res}_k(z,\mathbf{A},\mathbf{b}) &:= \mathbf{b} - (\mathbf{A} - z \mathbf{I}) \mathbf{Q}(\mathbf{T}-z\mathbf{I})^{-1}\mathbf{Q}^\mathsf{T}\mathbf{b}.%\,\lan_k ( h_z ). \end{align*}

It is a well-known fact that resk(z)=((1)kdet(TzI)j=1kβj)b2qk+1. \mathsf{res}_k(z) = \left( \frac{(-1)^{k}}{\det(\mathbf{T} -z \mathbf{I}) }\prod_{j=1}^{k} \beta_j \right) \| \mathbf{b} \|_2\: \mathbf{q}_{k+1}. Consequently, for all z,wCz , w \in \mathbb{C}, where AzI\mathbf{A} - z \mathbf{I} and AwI\mathbf{A} - w \mathbf{I} are both invertible, errk(z)=det(hw,z(T))hw,z(A)errk(w)resk(z)=det(hw,z(T))resk(w),\begin{align*} \mathsf{err}_k(z) &= \det(h_{w,z}(\mathbf{T})) h_{w,z}(\mathbf{A}) \,\mathsf{err}_k(w) \\ \mathsf{res}_k(z) &= \det(h_{w,z}(\mathbf{T})) \,\mathsf{res}_k(w), \end{align*} where hw,z(x)=xwxz. h_{w,z}(x) = \frac{x-w}{x-z}.

Thus, if Γ\Gamma is a simple closed curve or union of simple closed curves inside this neighborhood and enclosing the eigenvalues of A\mathbf{A} and T\mathbf{T} and ww a point not in Λ(T)Λ(A)\Lambda(\mathbf{T})\cup\Lambda(\mathbf{A}), f(A)blank(f)=(12πiΓf(z)det(hw,z(T))hw,z(A)dz)errk(w). f(\mathbf{A}) \mathbf{b} - \mathsf{lan}_k(f) = \left( - \frac{1}{2\pi i} \oint_{\Gamma} f(z) \det(h_{w,z}(\mathbf{T})) h_{w,z}(\mathbf{A}) \mathrm{d}{z} \right) \, \mathsf{err}_k(w).

Applying the triangle inequality and the submultiplicitivity of matrix-norms, we can then obtain a bound
f(A)blank(f)(12πΓf(z)(i=1khw,zSi)hw,zS0dz)integral termerrk(w),linear system error \| f(\mathbf{A})\mathbf{b} - \mathsf{lan}_k(f) \| \leq \underbrace{\vphantom{ \bigg| }\left( \frac{1}{2\pi}\oint_{\Gamma} |f(z)| \left(\prod_{i=1}^{k} \| h_{w,z}\|_{S_i}\right) \|h_{w,z}\|_{S_0} | \mathrm{d}{z} | \right)}_{\text{integral term}} \hspace{-.5 em}\underbrace{\vphantom{ \Bigg| } \| \mathsf{err}_k(w) \| , \hspace{-.4em} }_{\text{linear system error}} \hspace{-.5em} where SiS_i are some suitably chosen sets and gS:=maxxSg(x)\|g\|_S:= \max_{x\in S}|g(x)|.

Note that the integral term and linear system error term in the theorem are entirely decoupled! Thus, once the integral term is computed, bounding the error of Lanczos-FA for f(A)bf(\mathbf{A})\mathbf{b} is reduced to bounding errk(w)\| \mathsf{err}_k(w) \|, and if the integral term can be bounded independently of kk, implies that, up to a constant factor, the Lanczos-FA approximation to f(A)bf(\mathbf{A})\mathbf{b} converges at least as fast as errk(w)\| \mathsf{err}_k(w) \|.

Much of the paper is focused on practical aspects regarding the use of such a bound. In particular:

It’s worth noting that similar ideas have been previously used to get error bounds for Stieltjes functions [13]. As discussed in Section 2.2 of the paper, our bounds differ in a number of key ways. One key difference is that we reduce error bounds for Lanczos-FA a general function to error bounds for Lanczos-FA on a fixed linear system. This allows intuition about the convergence of Lanczos-FA on linear systems to be transferred to other functions. In addition, our analysis allows bounds for piecewise continuous functions like the sign function.

References

1. Ilic, M.D.; Turner, I.W.; Simpson, D.P. A Restarted Lanczos Approximation to Functions of a Symmetric Matrix. IMA Journal of Numerical Analysis 2009, 30, 1044–1061, doi:10.1093/imanum/drp003.

2. Frommer, A.; Güttel, S.; Schweitzer, M. Efficient and Stable Arnoldi Restarts for Matrix Functions Based on Quadrature. SIAM Journal on Matrix Analysis and Applications 2014, 35, 661–683, doi:10.1137/13093491x.

3. Frommer, A.; Schweitzer, M. Error Bounds and Estimates for Krylov Subspace Approximations of Stieltjes Matrix Functions. BIT Numerical Mathematics 2015, 56, 865–892, doi:10.1007/s10543-015-0596-3.