Randomized matrix-free quadrature for spectrum and spectral sum approximation

Tyler Chen

This is a companion piece to the paper: https://arxiv.org/abs/2204.01941

Code to replicate the figures in this document can be found on Github.

Introduction

The cumulative empirical spectral measure (CESM) Φ:R[0,1]\Phi : \mathbb{R}\to [0,1], induced by A\mathbf{A}, is defined by Φ(x)=ΦA:=i=0n1n1I[λix]. \Phi(x) = \Phi_{\mathbf{A}} := \sum_{i=0}^{n-1} n^{-1} \mathbb{I}[ \lambda_i \leq x]. Here I[true]=1\mathbb{I}[ \texttt{true} ] =1 and I[false]=0\mathbb{I}[\texttt{false}] = 0. Traces of matrix functions are related to the CESM by tr(f(A))=nfdΦ. \operatorname{tr}(f(\mathbf{A})) = n \int f\,\mathrm{d}\Phi. You can read more about applications of these quantities here.

In this paper, we provide a unified perspective of algorithms for randomized matrix-free quadrature. Specifically, the class of algorithms we study is characterized by the use of a Krylov subspace method to approximate independent and identically distributed samples of vTf(A)v\mathbf{v}^\mathsf{T}f(\mathbf{A})\mathbf{v}, where v\mathbf{v} is an isotropic random vector. Since many widely used algorithms, such as stochastic Lanczos quadrature (SLQ) [1] and the kernel polynomial method (KPM) [2], fall into this class, our unified perspective allows the algorithms’ tradeoffs to be more clearly understood.

Define the weighted CESM Ψ:R[0,1]\Psi : \mathbb{R}\to[0,1], an estimator for the CESM defined by Ψ(x):=vTI(Ax)v=i=1nvTui2I(λix), \Psi(x) := \mathbf{v}^\mathsf{T}\mathbb{I}(\mathbf{A}\leq x) \mathbf{v} = \sum_{i=1}^{n} |\mathbf{v}^\mathsf{T}\mathbf{u}_i|^2 \mathbb{I}(\lambda_i\leq x), where v\mathbf{v} is a random vector satisfying E[vvT]=n1I\mathbb{E}[\mathbf{v}\mathbf{v}^\mathsf{T}] = n^{-1} \mathbf{I}; i.e., v\mathbf{v} is isotropic with appropriate scale.

Then, using basic properties of the trace and expectation, we have E[Ψ(x)]=E[vTI(Ax)v]=E[tr(vTI(Ax)v)]=E[tr(I(Ax)vvT)]=tr(E[I(Ax)vvT])=tr(I(Ax)E[vvT])=n1tr(I(Ax))=Φ(x).\begin{align*} \mathbb{E}[\Psi(x)] &= \mathbb{E}[\mathbf{v}^\mathsf{T}\mathbb{I}(\mathbf{A}\leq x) \mathbf{v}] = \mathbb{E}[\operatorname{tr}(\mathbf{v}^\mathsf{T}\mathbb{I}(\mathbf{A}\leq x) \mathbf{v})] \\& \hspace{3em} = \mathbb{E}[\operatorname{tr}(\mathbb{I}(\mathbf{A}\leq x) \mathbf{v} \mathbf{v}^\mathsf{T})] = \operatorname{tr}(\mathbb{E}[ \mathbb{I}(\mathbf{A}\leq x) \mathbf{v} \mathbf{v}^\mathsf{T}]) \\& \hspace{6em} = \operatorname{tr}( \mathbb{I}(\mathbf{A}\leq x) \mathbb{E}[ \mathbf{v} \mathbf{v}^\mathsf{T}] ) = n^{-1} \operatorname{tr}(\mathbb{I}(\mathbf{A}\leq x)) = \Phi(x). \end{align*} Further, almost by definition, we see that f(x)dΨ(x)=vTf(A)v\int_{-\infty}^{\infty} f(x) \mathrm{d}\Psi(x) = \mathbf{v}^\mathsf{T}f(\mathbf{A})\mathbf{v} is an unbiased estimator for n1tr(f(A))n^{-1} \operatorname{tr}(f(\mathbf{A})).

The Lanczos method for matrix function approximation (Lanczos-FA) can be used to approximate vTf(A)v\mathbf{v}^\mathsf{T}f(\mathbf{A})\mathbf{v} (read more about this task here).

The prototypical algorithm

A natural approach to approximate the CESM Ψ\Psi and therefore tr(f(A))\operatorname{tr}(f(\mathbf{A})) is to average several copies of Ψ\Psi. However, computing Ψ\Psi requires knowledge of the eigenvalues of A\mathbf{A}. Therefore, we can approximate Ψ\Psi with a quadrature rule; i.e. with a simpler distribution function. The most common way to do this is by obtaining the moments of Ψ\Psi, and using this information to ensure that polynomials up to some degree are integrated exactly.

This is typically done with a Krylov subspace method. Indeed, for any i,ji,j with 0i,jk0\leq i,j \leq k, (Ajv)T(Ajv)=vTAi+jv=xi+jdΨ(x). (\mathbf{A}^j\mathbf{v})^\mathsf{T}(\mathbf{A}^j\mathbf{v}) = \mathbf{v}^\mathsf{T}\mathbf{A}^{i+j} \mathbf{v} = \int_{-\infty}^{\infty} x^{i+j} \mathrm{d}\Psi(x). Thus, the moments of Ψ\Psi through degree 2k2k can be computed using the information in the Krylov subspace Kk(A,v):=span{v,Av,,Akv}={p(A)v:deg(p)k}. \mathcal{K}_{k}(\mathbf{A},\mathbf{v}) := \operatorname{span}\{\mathbf{v}, \mathbf{A}\mathbf{v}, \ldots, \mathbf{A}^{k}\mathbf{v} \} = \{ p(\mathbf{A})\mathbf{v} : \deg(p) \leq k \}.

This can be summarized with the following algorithm:

Algorithm. Prototypical randomized matrix free quadrature 1:protoalg(A,nv,k,)2:for =1,2,,nv do3:define (implicitly) ΨiidΨ by sampling viidvE[vvT]=n1I4:compute the moments of Ψ through degree s by constructing Kk(A,v)5:approximate Ψ by a quadrature rule6:return average of individual quadrature rules\begin{aligned} \small{1:}&\hspace{1em} \textsf{protoalg}(\mathbf{A}, n_{\text{v}}, k, \circ) \\ \small{2:}&\hspace{1em} \textbf{for}~ \ell = 1, 2, \ldots, n_{\textup{v}}~ \textbf{do} \\ \small{3:}&\hspace{2em} \text{define (implicitly) \( \Psi_\ell \stackrel{\text{iid}}{\sim} \Psi \) by sampling \( \mathbf{v}_\ell \stackrel{\text{iid}}{\sim} \mathbf{v} \), \( \mathbb{E}[\mathbf{v}\mathbf{v}^\mathsf{T}] = n^{-1} \mathbf{I} \)} \\ \small{4:}&\hspace{2em} \text{compute the moments of \( \Psi_{\ell} \) through degree \( s \) by constructing \( \mathcal{K}_k(\mathbf{A},\mathbf{v}_\ell) \)} \\ \small{5:}&\hspace{2em} \text{approximate \( \Psi_{\ell} \) by a quadrature rule} \\ \small{6:}&\hspace{1em}\textbf{return}~\text{average of individual quadrature rules} \end{aligned}

A unified perspective

Common algorithms like SLQ and KPM are special instances of the prototypical algorithm described above. As such, they can all be analyzed through the lens of randomized quadrature. Our analysis shows that all algorithms of this form satisfy similar theoretical bounds. However there are many practical concerns which separate the algorithms.

A number of important past works have argued that Lanczos-based approaches are unstable, and therefore should not be used without costly reorthogonalization schemes . Based on the analysis in [3] , we argue that, even without reorthogonalzation, Lanczos-based approaches tend to outperform approaches based on explicit polynomials. In fact, we go so far as to argue that explicit polynomial methods such as the kernel polynomial method should typically be implemented using Lanczos. The qualitative tradeoffs between the algorithms are further illustrated by our numerical experiments in a wite variety of numerical experiments.

References

1. Ubaru, S.; Chen, J.; Saad, Y. Fast Estimation of tr (f(A)) via Stochastic Lanczos Quadrature. SIAM Journal on Matrix Analysis and Applications 2017, 38, 1075–1099, doi:10.1137/16m1104974.

2. Weiße, A.; Wellein, G.; Alvermann, A.; Fehske, H. The Kernel Polynomial Method. Reviews of Modern Physics 2006, 78, 275–306, doi:10.1103/revmodphys.78.275.

3. Knizhnerman, L.A. The Simple Lanczos Procedure: Estimates of the Error of the Gauss Quadrature Formula and Their Applications. Comput. Math. Math. Phys. 1996, 36, 1481–1492.