Krylov-aware stochastic trace estimation

Tyler Chen

This is a companion piece to the paper: https://arxiv.org/abs/2205.01736

Code to replicate the figures in this document can be found on GitHub.

Introduction

Familiarity with Lanczos-FA or other Krylov subspace methods will be useful.
An understanding of Hutch++ is important for understanding the broader context, but not strictly necessary for this introduction.

Let $\mathbf{B}$ be a $d\times d$ symmetric matrix. An important primitive in a number of state-of-the-art trace estimation algorithms, such as Hutch++ [1], is finding an orthonormal matrix $\mathbf{Q}$ so that $$\mathbf{B} \approx \mathbf{Q} (\mathbf{Q}^\mathsf{T}\mathbf{B} \mathbf{Q}) \mathbf{Q}^\mathsf{T}.$$ That is, finding the approximate span of the dominant eigenspace of $\mathbf{B}$.

A standard technique for this task is sketching [2]. In particular, we can use the following algorithm:

Algorithm. Randomized range finder $$\begin{aligned} \small{1:}&\hspace{1em} \textsf{randomized-range-finder}(\mathbf{A},b) \\ \small{2:}&\hspace{2em} \text{sample Gaussian $d\times b$ matrix $\mathbf{\Omega}$} \\ \small{3:}&\hspace{2em} \text{compute $\mathbf{A}\mathbf{\Omega}$}\\ \small{4:}&\hspace{2em} \text{compute orthonormal basis $\mathbf{Q}$ for $\mathbf{A}\mathbf{\Omega}$}\\ \small{5:}&\hspace{1em}\textbf{return}~\mathbf{Q} \end{aligned}$$
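In code, the range finder is only a few lines. Here is a minimal NumPy sketch (the function name and the `rng` argument are my own, not from the paper):

```python
import numpy as np

def randomized_range_finder(A, b, rng=None):
    """Sketch-based range finder: return an orthonormal Q whose span
    approximates the dominant range of A (cf. the algorithm above)."""
    rng = np.random.default_rng(rng)
    d = A.shape[0]
    Omega = rng.standard_normal((d, b))  # Gaussian test matrix
    Y = A @ Omega                        # sketch of the range of A
    Q, _ = np.linalg.qr(Y)               # orthonormal basis for A @ Omega
    return Q
```

The resulting $\mathbf{Q}$ is then used to form the low-rank approximation $\mathbf{Q}(\mathbf{Q}^\mathsf{T}\mathbf{B}\mathbf{Q})\mathbf{Q}^\mathsf{T}$.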

It can be shown that the $\mathbf{Q}$ produced by this algorithm is competitive with the best possible rank-$b$ approximation [2].

Krylov-aware approximation

Often, $\mathbf{B} = f(\mathbf{A})$ for some matrix function $f$. In this case, it is standard to approximate products with $f(\mathbf{A})$ using a Krylov subspace method, for instance the (block) Lanczos method for matrix function approximation. Such methods approximate terms like $f(\mathbf{A})\mathbf{Z}$ by constructing the Krylov subspace $$\mathcal{K}_{q+1}(\mathbf{A},\mathbf{Z}) = \operatorname{span}\{\mathbf{Z},\mathbf{A}\mathbf{Z}, \ldots, \mathbf{A}^q \mathbf{Z} \}.$$
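As a concrete illustration of this kind of method, the following sketch approximates $f(\mathbf{A})\mathbf{Z}$ by projecting $\mathbf{A}$ onto an orthonormal basis for $\mathcal{K}_{q+1}(\mathbf{A},\mathbf{Z})$. It uses explicit orthogonalization of the block power basis rather than the Lanczos three-term recurrence, so it is a simplified stand-in for block Lanczos, not a careful implementation:

```python
import numpy as np

def block_krylov_basis(A, Z, q):
    """Orthonormal basis for K_{q+1}(A, Z) = span{Z, AZ, ..., A^q Z},
    built by explicit QR (a simple stand-in for block Lanczos)."""
    blocks = [Z]
    for _ in range(q):
        blocks.append(A @ blocks[-1])
    V, _ = np.linalg.qr(np.hstack(blocks))
    return V

def krylov_fA_Z(A, Z, q, f):
    """Approximate f(A) @ Z from K_{q+1}(A, Z): project A onto the
    subspace, then apply f via an eigendecomposition of the small matrix."""
    V = block_krylov_basis(A, Z, q)
    T = V.T @ A @ V                  # small projected matrix
    w, S = np.linalg.eigh(T)         # T = S diag(w) S^T
    return V @ (S @ np.diag(f(w)) @ S.T) @ (V.T @ Z)
```

When $q$ is large enough that the Krylov subspace is the whole space, the approximation becomes exact.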

If we are using such a method with the randomized range finder, the first step is to approximate $f(\mathbf{A})\mathbf{\Omega}$ using $\mathcal{K}_{q+1}(\mathbf{A},\mathbf{\Omega})$. Then, we obtain an orthonormal basis $\mathbf{Q}$ for this approximation. However, in doing so, we implicitly construct the space $\mathcal{K}_{q+1}(\mathbf{A},\mathbf{\Omega})$, so we can obtain a better $\mathbf{Q}$ by instead using $\mathbf{\bar{Q}}_{q+1}$, an orthonormal basis for $\mathcal{K}_{q+1}(\mathbf{A},\mathbf{\Omega})$.

The ostensible downside is that $\mathbf{\bar{Q}}_{q+1}$ has $(q+1)b$ columns rather than $b$, so computing $\mathbf{\bar{Q}}_{q+1}^\mathsf{T}f(\mathbf{A})\mathbf{\bar{Q}}_{q+1}$ seems to require many more matrix-vector products with $\mathbf{A}$. However, this is not actually the case. The key observation made in this paper is the following: if $\mathbf{\bar{Q}}_{q+1}$ is a basis for $\mathcal{K}_{q+1}(\mathbf{A},\mathbf{\Omega})$, then $$\mathcal{K}_{n}(\mathbf{A},\mathbf{\bar{Q}}_{q+1}) = \mathcal{K}_{q+n}(\mathbf{A},\mathbf{\Omega}).$$
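This identity is easy to verify numerically. The sketch below (a small synthetic check, not code from the paper) builds both subspaces and compares their orthogonal projectors; an SVD-based orthonormalization is used because the stacked block bases are rank-deficient:

```python
import numpy as np

def orth(M, tol=1e-8):
    """Orthonormal basis for range(M), truncating tiny singular values."""
    U, s, _ = np.linalg.svd(M, full_matrices=False)
    return U[:, s > tol * s[0]]

def krylov_basis(A, Z, k):
    """Orthonormal basis for span{Z, A Z, ..., A^{k-1} Z}."""
    blocks = [Z]
    for _ in range(k - 1):
        blocks.append(A @ blocks[-1])
    return orth(np.hstack(blocks))

rng = np.random.default_rng(1)
d, b, q, n = 20, 2, 1, 2
A = rng.standard_normal((d, d))
A = (A + A.T) / 2                       # symmetric test matrix
Omega = rng.standard_normal((d, b))

Qbar = krylov_basis(A, Omega, q + 1)    # basis for K_{q+1}(A, Omega)
left = krylov_basis(A, Qbar, n)         # basis for K_n(A, Qbar)
right = krylov_basis(A, Omega, q + n)   # basis for K_{q+n}(A, Omega)

# the two subspaces coincide: their orthogonal projectors agree
gap = np.linalg.norm(left @ left.T - right @ right.T)
```

Here `gap` is on the order of machine precision, confirming that extending the Krylov space started from $\mathbf{\bar{Q}}_{q+1}$ is the same as extending the one started from $\mathbf{\Omega}$.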

This means that quantities such as $\mathbf{\bar{Q}}_{q+1}^\mathsf{T}f(\mathbf{A})\mathbf{\bar{Q}}_{q+1}$ can be approximated from $\mathcal{K}_{q+n}(\mathbf{A},\mathbf{\Omega})$. This requires just $nb$ additional matrix-vector products with $\mathbf{A}$, rather than $n(q+1)b$.

While this idea is simple, our algorithm can often perform much better than if matrix-vector products with $f(\mathbf{A})$ are treated as a black box. This is particularly true for functions like $f(x) = \sqrt{x}$, where $q$ may need to be quite large to obtain a good approximation to $f(\mathbf{A})\mathbf{\Omega}$, yet even the basis obtained with $q=1$ (which is basically like sketching $\mathbf{A}$) works well.
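A small synthetic experiment (mine, not one of the paper's figures) illustrates the $f(x)=\sqrt{x}$ case: even the $q=1$ basis for $\mathcal{K}_2(\mathbf{A},\mathbf{\Omega})$ captures $\mathbf{A}^{1/2}$ well when the spectrum decays quickly:

```python
import numpy as np

rng = np.random.default_rng(0)
d, b = 100, 10

# symmetric PSD test matrix with quickly decaying eigenvalues
eigs = 2.0 ** -np.arange(d)
U, _ = np.linalg.qr(rng.standard_normal((d, d)))
A = U @ np.diag(eigs) @ U.T
sqrtA = U @ np.diag(np.sqrt(eigs)) @ U.T  # f(A) = A^{1/2}, for reference

Omega = rng.standard_normal((d, b))
# q = 1: orthonormal basis for K_2(A, Omega) = span{Omega, A Omega}
Qbar, _ = np.linalg.qr(np.hstack([Omega, A @ Omega]))

# project f(A) onto the subspace and measure the relative error
approx = Qbar @ (Qbar.T @ sqrtA @ Qbar) @ Qbar.T
rel_err = np.linalg.norm(sqrtA - approx) / np.linalg.norm(sqrtA)
```

With this spectrum the relative error is a few percent, even though no products with $\mathbf{A}^{1/2}$ were ever computed.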

A similar idea was explored in [3], where it is shown that sketching $\mathbf{A}$ instead of $f(\mathbf{A})$ is a good idea for operator monotone functions.

References

1. Meyer, R.A.; Musco, C.; Musco, C.; Woodruff, D.P. Hutch++: Optimal Stochastic Trace Estimation. In Proceedings of the Symposium on Simplicity in Algorithms (SOSA); SIAM, 2021; pp. 142–155.

2. Halko, N.; Martinsson, P.G.; Tropp, J.A. Finding Structure with Randomness: Probabilistic Algorithms for Constructing Approximate Matrix Decompositions. SIAM Review 2011, 53, 217–288, doi:10.1137/090771806.

3. Persson, D.; Kressner, D. Randomized Low-Rank Approximation of Monotone Matrix Functions. arXiv preprint arXiv:2209.11023, 2022.