Error bounds for Lanczos-based matrix function approximation

This is a companion piece to https://doi.org/10.1137/21M1427784

Code to replicate the figures in the corresponding paper can be found on GitHub.

Introduction

The Lanczos method for matrix function approximation (Lanczos-FA) can be used to approximate $f(\mathbf{A})\mathbf{b}$ (read more about this task here). In the case that $f(x) = 1/x$ and $\mathbf{A}$ is positive definite, this approximation is optimal (in the $\mathbf{A}$-norm) over the Krylov subspace. This case is very well studied, and a range of error bounds and estimates exist. However, for other functions, the standard bounds are often too pessimistic, as they do not take into account fine-grained information about the spectrum of $\mathbf{A}$ such as outlying or clustered eigenvalues. This makes it difficult to know when Lanczos-FA has reached a suitable accuracy for a given problem.

In this paper we show how to reduce the error of approximating $f(\mathbf{A})\mathbf{b}$ with Lanczos-FA to the error of solving a certain linear system with Lanczos-FA. This allows us to leverage the range of existing bounds for the convergence of Lanczos-FA on linear systems to easily obtain a priori and a posteriori bounds for other matrix functions, including piecewise analytic functions such as the sign function. Our a posteriori bounds are highly accurate and can be used as practical stopping criteria.

The basic idea

The $k$-th Lanczos-FA approximation to $f(\mathbf{A}) \mathbf{b}$ is defined as $\mathsf{lan}_k(f) := \mathbf{Q} f(\mathbf{T}) \mathbf{Q}^{\mathsf{T}} \mathbf{b},$ where $\mathbf{Q}$ and $\mathbf{T}$ are produced by the Lanczos method run for $k$ steps on $(\mathbf{A},\mathbf{b})$.
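The definition above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: $\mathbf{A}$ is taken to be real symmetric, full reorthogonalization is used so that the computed quantities stay close to their exact-arithmetic counterparts (this is a choice made for the sketch, not part of the definition), and the function names `lanczos` and `lanczos_fa` are hypothetical.

```python
import numpy as np

def lanczos(A, b, k):
    """Run k Lanczos steps on (A, b); return Q (n x k) and tridiagonal T (k x k).

    Assumes A is real symmetric and that no breakdown (beta = 0) occurs.
    """
    Q = np.zeros((b.size, k + 1))
    alpha, beta = np.zeros(k), np.zeros(k)
    Q[:, 0] = b / np.linalg.norm(b)
    for j in range(k):
        v = A @ Q[:, j]
        alpha[j] = Q[:, j] @ v
        for _ in range(2):  # orthogonalize twice against all previous vectors
            v -= Q[:, : j + 1] @ (Q[:, : j + 1].T @ v)
        beta[j] = np.linalg.norm(v)
        Q[:, j + 1] = v / beta[j]
    T = np.diag(alpha) + np.diag(beta[:-1], 1) + np.diag(beta[:-1], -1)
    return Q[:, :k], T

def lanczos_fa(A, b, k, f):
    """Return lan_k(f) = Q f(T) Q^T b, using Q^T b = ||b|| e_1."""
    Q, T = lanczos(A, b, k)
    theta, S = np.linalg.eigh(T)       # T = S diag(theta) S^T
    fT_e1 = S @ (f(theta) * S[0])      # f(T) e_1
    return np.linalg.norm(b) * (Q @ fT_e1)

# Toy problem (an assumption of this sketch): diagonal A, f = sqrt.
lam = np.linspace(1.0, 10.0, 100)
A = np.diag(lam)
b = np.ones(100)
exact = np.sqrt(lam) * b
errors = {k: np.linalg.norm(exact - lanczos_fa(A, b, k, np.sqrt)) for k in (5, 10, 20)}
print(errors)
```

On this toy problem the error should shrink as $k$ grows, which is the behavior the bounds discussed here aim to quantify.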

Suppose $\mathbf{A}$ is a Hermitian matrix. If $f$ is analytic in a neighborhood of the eigenvalues of $\mathbf{A}$ and $\Gamma$ is a contour in this neighborhood enclosing these eigenvalues, then $f(\mathbf{A})\mathbf{b} = - \frac{1}{2 \pi i} \oint_{\Gamma} f(z) (\mathbf{A} - z \mathbf{I} )^{-1} \mathbf{b} \, \mathrm{d}{z}.$ If the eigenvalues of $\mathbf{T}$ are also enclosed by $\Gamma$, we similarly have $\mathbf{Q} f(\mathbf{T}) \mathbf{Q}^\mathsf{T}\mathbf{b} = - \frac{1}{2 \pi i} \oint_{\Gamma} f(z) \mathbf{Q} (\mathbf{T} - z \mathbf{I} )^{-1} \mathbf{Q}^\mathsf{T}\mathbf{b} \, \mathrm{d}{z} .$ Combining these, we see that the Lanczos-FA error can be written as $f(\mathbf{A}) \mathbf{b} - \mathbf{Q} f( \mathbf{T} ) \mathbf{Q}^{\mathsf{T}} \mathbf{b} = - \frac{1}{2 \pi i} \oint_{\Gamma} f(z) \, \mathsf{err}_k(z) \, \mathrm{d}{z},$ where $\mathsf{err}_k(z)$ is the Lanczos-FA error on the shifted linear system $(\mathbf{A}-z\mathbf{I}) \mathbf{x} = \mathbf{b}$, defined next.
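As a sanity check on the contour integral representation of $f(\mathbf{A})\mathbf{b}$, the following sketch discretizes the integral with the trapezoidal rule on a circle. The test matrix, the choice $f(x) = \sqrt{x}$, the circle (center 5.5, radius 5, which encloses the spectrum and avoids the branch cut of the square root), and the number of quadrature nodes are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20
lam = np.linspace(1.0, 10.0, n)                  # eigenvalues of A
U, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = U @ np.diag(lam) @ U.T
A = (A + A.T) / 2                                # enforce exact symmetry
b = rng.standard_normal(n)

f = np.sqrt
exact = U @ (f(lam) * (U.T @ b))                 # f(A) b via the eigendecomposition

# Circle Gamma: z(t) = c + r exp(i t), with dz = i r exp(i t) dt.
c, r, N = 5.5, 5.0, 400
t = 2 * np.pi * np.arange(N) / N
z = c + r * np.exp(1j * t)
dz = 1j * r * np.exp(1j * t) * (2 * np.pi / N)

# Trapezoidal rule for -1/(2 pi i) * oint f(z) (A - z I)^{-1} b dz.
integral = np.zeros(n, dtype=complex)
for zj, dzj in zip(z, dz):
    integral += f(zj) * np.linalg.solve(A - zj * np.eye(n), b) * dzj
approx = (-integral / (2j * np.pi)).real

rel_err = np.linalg.norm(approx - exact) / np.linalg.norm(exact)
print(rel_err)
```

Replacing $\mathbf{A}$ by $\mathbf{T}$ and $\mathbf{b}$ by $\mathbf{Q}^\mathsf{T}\mathbf{b}$ in the same quadrature gives the corresponding representation of the Lanczos-FA iterate.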

For $z \in \mathbb{C}$, define the $k$-th Lanczos-FA error and residual for the linear system $(\mathbf{A}-z\mathbf{I}) \mathbf{x} = \mathbf{b}$ as $\begin{align*} \mathsf{err}_k(z,\mathbf{A},\mathbf{b}) &:= (\mathbf{A} - z \mathbf{I})^{-1} \mathbf{b} - \mathbf{Q}(\mathbf{T}-z\mathbf{I})^{-1}\mathbf{Q}^\mathsf{T}\mathbf{b},\\ \mathsf{res}_k(z,\mathbf{A},\mathbf{b}) &:= \mathbf{b} - (\mathbf{A} - z \mathbf{I}) \mathbf{Q}(\mathbf{T}-z\mathbf{I})^{-1}\mathbf{Q}^\mathsf{T}\mathbf{b}. \end{align*}$ When $\mathbf{A}$ and $\mathbf{b}$ are clear from context, we write simply $\mathsf{err}_k(z)$ and $\mathsf{res}_k(z)$.

It is a well-known fact that $\mathsf{res}_k(z) = \left( \frac{(-1)^{k}}{\det(\mathbf{T} -z \mathbf{I}) }\prod_{j=1}^{k} \beta_j \right) \| \mathbf{b} \|_2\: \mathbf{q}_{k+1}.$ Consequently, for all $z , w \in \mathbb{C}$ , where $\mathbf{A} - z \mathbf{I}$ and $\mathbf{A} - w \mathbf{I}$ are both invertible, $\begin{align*} \mathsf{err}_k(z) &= \det(h_{w,z}(\mathbf{T})) h_{w,z}(\mathbf{A}) \,\mathsf{err}_k(w) \\ \mathsf{res}_k(z) &= \det(h_{w,z}(\mathbf{T})) \,\mathsf{res}_k(w), \end{align*}$ where $h_{w,z}(x) = \frac{x-w}{x-z}.$
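Both the residual formula and the relation between $\mathsf{err}_k(z)$ and $\mathsf{err}_k(w)$ can be checked numerically. The sketch below does so on a small diagonal test matrix with $z = -1 + 0.5i$ and $w = 0$; the test problem, the helper names (`lanczos`, `iterate`, `err`, `res`), and the use of full reorthogonalization are assumptions of this example rather than anything prescribed by the paper.

```python
import numpy as np

def lanczos(A, b, k):
    """k Lanczos steps with full reorthogonalization (real symmetric A).

    Returns Q (n x (k+1)), T (k x k), and beta = (beta_1, ..., beta_k).
    """
    Q = np.zeros((b.size, k + 1))
    alpha, beta = np.zeros(k), np.zeros(k)
    Q[:, 0] = b / np.linalg.norm(b)
    for j in range(k):
        v = A @ Q[:, j]
        alpha[j] = Q[:, j] @ v
        for _ in range(2):  # orthogonalize twice against all previous vectors
            v -= Q[:, : j + 1] @ (Q[:, : j + 1].T @ v)
        beta[j] = np.linalg.norm(v)
        Q[:, j + 1] = v / beta[j]
    T = np.diag(alpha) + np.diag(beta[:-1], 1) + np.diag(beta[:-1], -1)
    return Q, T, beta

n, k = 100, 10
A = np.diag(np.linspace(1.0, 10.0, n))
b = np.ones(n)
Q, T, beta = lanczos(A, b, k)
theta = np.linalg.eigvalsh(T)
normb = np.linalg.norm(b)
I_n, I_k = np.eye(n), np.eye(k)

def iterate(z):
    """Q (T - z I)^{-1} Q^T b, using Q^T b = ||b|| e_1."""
    e1 = np.zeros(k)
    e1[0] = 1.0
    return normb * (Q[:, :k] @ np.linalg.solve(T - z * I_k, e1))

def err(z):
    return np.linalg.solve(A - z * I_n, b) - iterate(z)

def res(z):
    return b - (A - z * I_n) @ iterate(z)

z, w = -1.0 + 0.5j, 0.0

# res_k(z) = ((-1)^k / det(T - z I)) * prod_j beta_j * ||b|| * q_{k+1}
res_formula = (-1) ** k * np.prod(beta) / np.prod(theta - z) * normb * Q[:, k]

# err_k(z) = det(h_{w,z}(T)) h_{w,z}(A) err_k(w), with h_{w,z}(x) = (x - w) / (x - z)
det_h = np.prod((theta - w) / (theta - z))
err_formula = det_h * np.linalg.solve(A - z * I_n, (A - w * I_n) @ err(w))

res_gap = np.linalg.norm(res(z) - res_formula) / np.linalg.norm(res(z))
err_gap = np.linalg.norm(err(z) - err_formula) / np.linalg.norm(err(z))
print(res_gap, err_gap)
```

Both relative gaps should be at the level of rounding error, reflecting that the identities are exact in exact arithmetic.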

Thus, if $\Gamma$ is a simple closed curve or union of simple closed curves lying inside the neighborhood on which $f$ is analytic and enclosing the eigenvalues of $\mathbf{A}$ and $\mathbf{T}$, and $w$ is a point not in $\Lambda(\mathbf{T})\cup\Lambda(\mathbf{A})$, then $f(\mathbf{A}) \mathbf{b} - \mathsf{lan}_k(f) = \left( - \frac{1}{2\pi i} \oint_{\Gamma} f(z) \det(h_{w,z}(\mathbf{T})) h_{w,z}(\mathbf{A}) \mathrm{d}{z} \right) \, \mathsf{err}_k(w).$

Applying the triangle inequality and the submultiplicativity of matrix norms, we can then obtain a bound
$\| f(\mathbf{A})\mathbf{b} - \mathsf{lan}_k(f) \| \leq \underbrace{\vphantom{ \bigg| }\left( \frac{1}{2\pi}\oint_{\Gamma} |f(z)| \left(\prod_{i=1}^{k} \| h_{w,z}\|_{S_i}\right) \|h_{w,z}\|_{S_0} | \mathrm{d}{z} | \right)}_{\text{integral term}} \hspace{-.5 em}\underbrace{\vphantom{ \Bigg| } \| \mathsf{err}_k(w) \| , \hspace{-.4em} }_{\text{linear system error}} \hspace{-.5em}$ where $S_i$ are some suitably chosen sets and $\|g\|_S:= \max_{x\in S}|g(x)|$ .

Note that the integral term and the linear system error term in this bound are entirely decoupled! Thus, once the integral term is computed, bounding the error of Lanczos-FA for $f(\mathbf{A})\mathbf{b}$ is reduced to bounding $\| \mathsf{err}_k(w) \|$. Moreover, if the integral term can be bounded independently of $k$, then, up to a constant factor, the Lanczos-FA approximation to $f(\mathbf{A})\mathbf{b}$ converges at least as fast as $\| \mathsf{err}_k(w) \|$.
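To make the decoupling concrete, here is a rough sketch that evaluates the integral term numerically and compares the resulting bound with the true error in the 2-norm. Several choices are assumptions made only for this illustration: $S_i = \{\theta_i\}$ (the eigenvalues of $\mathbf{T}$), so that the product equals $|\det(h_{w,z}(\mathbf{T}))|$; $S_0 = \Lambda(\mathbf{A})$, which is available here because the test matrix is diagonal (in practice one would use a set known to contain the spectrum, such as an interval); a circular contour; $w = 0$; and $f(x) = \sqrt{x}$. The helper names are hypothetical.

```python
import numpy as np

def lanczos(A, b, k):
    """k Lanczos steps with full reorthogonalization (real symmetric A)."""
    Q = np.zeros((b.size, k + 1))
    alpha, beta = np.zeros(k), np.zeros(k)
    Q[:, 0] = b / np.linalg.norm(b)
    for j in range(k):
        v = A @ Q[:, j]
        alpha[j] = Q[:, j] @ v
        for _ in range(2):  # orthogonalize twice against all previous vectors
            v -= Q[:, : j + 1] @ (Q[:, : j + 1].T @ v)
        beta[j] = np.linalg.norm(v)
        Q[:, j + 1] = v / beta[j]
    T = np.diag(alpha) + np.diag(beta[:-1], 1) + np.diag(beta[:-1], -1)
    return Q[:, :k], T

lam = np.linspace(1.0, 10.0, 100)   # spectrum of the diagonal test matrix
A = np.diag(lam)
b = np.ones(100)
k, w, f = 10, 0.0, np.sqrt

Q, T = lanczos(A, b, k)
theta, S = np.linalg.eigh(T)
normb = np.linalg.norm(b)

def lan(g):
    """Lanczos-FA iterate Q g(T) Q^T b."""
    return normb * (Q @ (S @ (g(theta) * S[0])))

err_f = np.linalg.norm(f(lam) * b - lan(f))                           # true error
err_w = np.linalg.norm(b / (lam - w) - lan(lambda x: 1.0 / (x - w)))  # ||err_k(w)||

# Circle enclosing [1, 10] and avoiding the branch cut of sqrt; |dz| = r dt.
c, r, N = 5.5, 5.0, 2000
z = c + r * np.exp(2j * np.pi * np.arange(N) / N)

# |det(h_{w,z}(T))| and max over Lambda(A) of |h_{w,z}| at each quadrature node.
det_h = np.prod(np.abs((theta[None, :] - w) / (theta[None, :] - z[:, None])), axis=1)
max_h = np.max(np.abs((lam[None, :] - w) / (lam[None, :] - z[:, None])), axis=1)

# (1 / 2 pi) * oint |f(z)| |det h| max|h| |dz| by the trapezoidal rule.
integral_term = (r / N) * np.sum(np.abs(f(z)) * det_h * max_h)
bound = integral_term * err_w
print(err_f, bound)
```

A circle is a crude contour, so the bound computed this way may be noticeably larger than the true error; choosing a contour better adapted to $f$ and the spectrum is one of the practical aspects that affects how tight the bound is.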

Much of the paper is focused on practical aspects regarding the use of such a bound. In particular:

we provide a detailed discussion and analysis of the validity of our bounds when the Lanczos iterate was computed using finite precision arithmetic,
we provide several analytic examples where the integral term is computed directly,
we use numerical experiments to demonstrate the effectiveness of our bounds on a variety of functions such as the square root, inverse square root, and sign functions, and
we derive similar bounds for quadratic forms.

It’s worth noting that similar ideas have been previously used to get error bounds for Stieltjes functions [1–3]. As discussed in Section 2.2 of the paper, our bounds differ in a number of key ways. One key difference is that we reduce error bounds for Lanczos-FA on a general function to error bounds for Lanczos-FA on a fixed linear system. This allows intuition about the convergence of Lanczos-FA on linear systems to be transferred to other functions. In addition, our analysis allows bounds for piecewise continuous functions like the sign function.

References

1. Ilic, M.D.; Turner, I.W.; Simpson, D.P. A Restarted Lanczos Approximation to Functions of a Symmetric Matrix. IMA Journal of Numerical Analysis 2009, 30, 1044–1061, doi:10.1093/imanum/drp003.

2. Frommer, A.; Güttel, S.; Schweitzer, M. Efficient and Stable Arnoldi Restarts for Matrix Functions Based on Quadrature. SIAM Journal on Matrix Analysis and Applications 2014, 35, 661–683, doi:10.1137/13093491x.

3. Frommer, A.; Schweitzer, M. Error Bounds and Estimates for Krylov Subspace Approximations of Stieltjes Matrix Functions. BIT Numerical Mathematics 2015, 56, 865–892, doi:10.1007/s10543-015-0596-3.