# Krylov-aware stochastic trace estimation

This is a companion piece to the paper: https://arxiv.org/abs/2205.01736

Code to replicate the figures in this document can be found on GitHub.

## Introduction

Familiarity with Lanczos-FA or other Krylov subspace methods will be useful.
An understanding of Hutch++ is important for understanding the broader context, but not strictly necessary for this introduction.

Let $\mathbf{B}$ be a $d\times d$ symmetric matrix. An important primitive in a number of state-of-the-art trace estimation algorithms, such as Hutch++ [1], is finding a matrix $\mathbf{Q}$ with orthonormal columns so that $\mathbf{B} \approx \mathbf{Q} (\mathbf{Q}^\mathsf{T}\mathbf{B} \mathbf{Q}) \mathbf{Q}^\mathsf{T}$; that is, finding an approximate basis for the dominant eigenspace of $\mathbf{B}$.
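To make the role of this primitive concrete, here is a minimal NumPy sketch of the Hutch++ idea: the trace over the captured range is computed exactly, and a Hutchinson estimate handles the deflated remainder. The function name, the even three-way split of the product budget `m`, and the test matrix are illustrative choices, not the paper's implementation.

```python
import numpy as np

def hutchpp(matvec, d, m, rng=None):
    """Hutch++-style estimate of tr(B) using roughly m products with B.
    matvec(X) should return B @ X for a d x k block X."""
    rng = np.random.default_rng(rng)
    k = m // 3
    S = rng.standard_normal((d, k))
    Q, _ = np.linalg.qr(matvec(S))        # range finder applied to B
    G = rng.standard_normal((d, k))
    G = G - Q @ (Q.T @ G)                 # deflate the captured range
    # exact trace on the captured subspace + Hutchinson on the remainder
    return np.trace(Q.T @ matvec(Q)) + np.trace(G.T @ matvec(G)) / k

# Illustration: symmetric B with geometrically decaying spectrum
rng = np.random.default_rng(0)
d = 300
U, _ = np.linalg.qr(rng.standard_normal((d, d)))
B = U @ ((0.9 ** np.arange(d))[:, None] * U.T)
est = hutchpp(lambda X: B @ X, d, m=90, rng=1)
true = np.trace(B)
```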

A standard technique for producing such a $\mathbf{Q}$ is sketching [2]. In particular, we can use the following algorithm:

Algorithm. Randomized range finder
$$\begin{aligned} \small{1:}&\hspace{1em} \textsf{randomized-range-finder}(\mathbf{A},b) \\ \small{2:}&\hspace{2em} \text{sample Gaussian $d\times b$ matrix $\mathbf{\Omega}$} \\ \small{3:}&\hspace{2em} \text{compute $\mathbf{A}\mathbf{\Omega}$}\\ \small{4:}&\hspace{2em} \text{compute orthonormal basis $\mathbf{Q}$ for $\mathbf{A}\mathbf{\Omega}$}\\ \small{5:}&\hspace{2em}\textbf{return}~\mathbf{Q} \end{aligned}$$

It can be shown that the $\mathbf{Q}$ produced by this algorithm yields a low-rank approximation competitive with the best possible rank-$b$ approximation to $\mathbf{B}$ [2].
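As a sketch, the algorithm above might look as follows in NumPy; the test matrix with geometrically decaying spectrum is just an illustration:

```python
import numpy as np

def randomized_range_finder(A, b, rng=None):
    """Orthonormal basis Q for the range of A @ Omega,
    where Omega is a d x b Gaussian sketching matrix."""
    rng = np.random.default_rng(rng)
    d = A.shape[0]
    Omega = rng.standard_normal((d, b))   # sample Gaussian d x b matrix
    Y = A @ Omega                         # compute A @ Omega
    Q, _ = np.linalg.qr(Y)                # orthonormal basis for A @ Omega
    return Q

# Illustration: symmetric B with fast-decaying spectrum
rng = np.random.default_rng(0)
d = 200
U, _ = np.linalg.qr(rng.standard_normal((d, d)))
B = U @ ((0.5 ** np.arange(d))[:, None] * U.T)
Q = randomized_range_finder(B, b=10, rng=1)
err = np.linalg.norm(B - Q @ (Q.T @ B @ Q) @ Q.T, 2)
```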

## Krylov-aware approximation

Often, $\mathbf{B} = f(\mathbf{A})$ for some symmetric matrix $\mathbf{A}$ and function $f$. In this case, it is standard to approximate products with $f(\mathbf{A})$ using a Krylov subspace method, for instance the (block) Lanczos method for matrix function approximation. Such methods approximate terms like $f(\mathbf{A})\mathbf{Z}$ by constructing the Krylov subspace $\mathcal{K}_{q+1}(\mathbf{A},\mathbf{Z}) = \operatorname{span}\{\mathbf{Z},\mathbf{A}\mathbf{Z}, \ldots, \mathbf{A}^q \mathbf{Z} \}.$
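For intuition, here is a minimal (non-block) Lanczos sketch for approximating $f(\mathbf{A})\mathbf{z}$ from $\mathcal{K}_{q+1}(\mathbf{A},\mathbf{z})$. It uses full reorthogonalization for clarity rather than efficiency, and follows the standard Lanczos recurrence rather than any particular implementation from the paper.

```python
import numpy as np

def lanczos_fa(A, z, q, f):
    """Approximate f(A) @ z from K_{q+1}(A, z) via Lanczos
    (full reorthogonalization, for clarity)."""
    d = z.shape[0]
    Q = np.zeros((d, q + 1))
    alpha = np.zeros(q + 1)
    beta = np.zeros(q)
    Q[:, 0] = z / np.linalg.norm(z)
    for k in range(q + 1):
        w = A @ Q[:, k]
        alpha[k] = Q[:, k] @ w
        w -= Q[:, :k + 1] @ (Q[:, :k + 1].T @ w)  # reorthogonalize
        if k < q:
            beta[k] = np.linalg.norm(w)
            Q[:, k + 1] = w / beta[k]
    # T = Q^T A Q is tridiagonal; apply f through its eigendecomposition
    T = np.diag(alpha) + np.diag(beta, 1) + np.diag(beta, -1)
    evals, evecs = np.linalg.eigh(T)
    fT = evecs @ np.diag(f(evals)) @ evecs.T
    return np.linalg.norm(z) * (Q @ fT[:, 0])     # ||z|| * Q f(T) e_1

# Illustration: f(x) = exp(x) on a random symmetric matrix
rng = np.random.default_rng(0)
d = 100
M = rng.standard_normal((d, d))
A = (M + M.T) / (2 * np.sqrt(d))
z = rng.standard_normal(d)
approx = lanczos_fa(A, z, q=30, f=np.exp)
```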

If we use such a method within the randomized range finder, the first step is to approximate $f(\mathbf{A})\mathbf{\Omega}$ using $\mathcal{K}_{q+1}(\mathbf{A},\mathbf{\Omega})$, and then obtain an orthonormal basis $\mathbf{Q}$ for this approximation. However, in doing so, we implicitly construct the space $\mathcal{K}_{q+1}(\mathbf{A},\mathbf{\Omega})$, so we can obtain a better $\mathbf{Q}$ by instead using $\mathbf{\bar{Q}}_{q+1}$, an orthonormal basis for $\mathcal{K}_{q+1}(\mathbf{A},\mathbf{\Omega})$.

The ostensible downside is that $\mathbf{\bar{Q}}_{q+1}$ has $(q+1)b$ columns rather than $b$, so computing $\mathbf{\bar{Q}}_{q+1}^\mathsf{T}f(\mathbf{A})\mathbf{\bar{Q}}_{q+1}$ with a black-box Krylov method would seem to cost a factor of $q+1$ more matrix-vector products with $\mathbf{A}$. However, this is not actually the case. The key observation made in this paper is the following: if $\mathbf{\bar{Q}}_{q+1}$ is a basis for $\mathcal{K}_{q+1}(\mathbf{A},\mathbf{\Omega})$, then $\mathcal{K}_{n}(\mathbf{A},\mathbf{\bar{Q}}_{q+1}) = \mathcal{K}_{q+n}(\mathbf{A},\mathbf{\Omega}).$

This means that quantities such as $\mathbf{\bar{Q}}_{q+1}^\mathsf{T}f(\mathbf{A})\mathbf{\bar{Q}}_{q+1}$ can be approximated from $\mathcal{K}_{q+n}(\mathbf{A},\mathbf{\Omega})$. This requires just $nb$ additional matrix-vector products with $\mathbf{A}$, rather than $n(q+1)b$.
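The identity is easy to verify numerically. The sketch below builds orthonormal bases via a rank-revealing SVD (an illustrative choice; the stacked Krylov blocks are rank-deficient) and checks that $\mathcal{K}_{n}(\mathbf{A},\mathbf{\bar{Q}}_{q+1})$ and $\mathcal{K}_{q+n}(\mathbf{A},\mathbf{\Omega})$ span the same subspace:

```python
import numpy as np

def krylov_basis(A, Z, depth, tol=1e-10):
    """Orthonormal basis for K_depth(A, Z) = span{Z, A Z, ..., A^(depth-1) Z},
    truncating numerically dependent directions."""
    blocks = [Z]
    for _ in range(depth - 1):
        blocks.append(A @ blocks[-1])
    U, s, _ = np.linalg.svd(np.hstack(blocks), full_matrices=False)
    return U[:, s > tol * s[0]]

rng = np.random.default_rng(0)
d, b, q, n = 40, 3, 2, 2
M = rng.standard_normal((d, d))
A = (M + M.T) / 2
A /= np.linalg.norm(A, 2)                 # normalize so powers stay tame

Omega = rng.standard_normal((d, b))
Qbar = krylov_basis(A, Omega, q + 1)      # basis for K_{q+1}(A, Omega)
left = krylov_basis(A, Qbar, n)           # K_n(A, Qbar)
right = krylov_basis(A, Omega, q + n)     # K_{q+n}(A, Omega)

# each basis should lie in the span of the other
gap1 = np.linalg.norm(left - right @ (right.T @ left), 2)
gap2 = np.linalg.norm(right - left @ (left.T @ right), 2)
```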

While this idea is simple, our algorithm can often perform much better than if matrix-vector products with $f(\mathbf{A})$ are treated as a black box. This is particularly true for functions like $f(x) = \sqrt{x}$, where $q$ may need to be quite large to obtain a good approximation to $f(\mathbf{A})\mathbf{\Omega}$, yet even the basis obtained with $q=1$ (which essentially amounts to sketching $\mathbf{A}$) works well.
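As a toy illustration of this point (the spectrum and dimensions below are arbitrary choices), one can compare the basis from $\mathcal{K}_2(\mathbf{A},\mathbf{\Omega})$, which costs a single block product with $\mathbf{A}$, against a baseline built from the exact product $f(\mathbf{A})\mathbf{\Omega}$:

```python
import numpy as np

rng = np.random.default_rng(0)
d, b = 100, 8
U, _ = np.linalg.qr(rng.standard_normal((d, d)))
eigs = 1.0 / np.arange(1, d + 1) ** 2          # decaying positive spectrum
A = U @ (eigs[:, None] * U.T)
fA = U @ (np.sqrt(eigs)[:, None] * U.T)        # f(A) = sqrt(A), formed exactly for reference

Omega = rng.standard_normal((d, b))
# q = 1 basis: K_2(A, Omega) = span{Omega, A Omega}, one block product with A
Qbar, _ = np.linalg.qr(np.hstack([Omega, A @ Omega]))
# baseline: range of the exact product f(A) Omega
Qf, _ = np.linalg.qr(fA @ Omega)

err_bar = np.linalg.norm(fA - Qbar @ (Qbar.T @ fA), 2)  # note ||f(A)||_2 = 1
err_f = np.linalg.norm(fA - Qf @ (Qf.T @ fA), 2)
```

Here `err_bar` is bounded below by the best rank-$2b$ error $\sigma_{2b+1}(f(\mathbf{A})) = 1/17$, and in this kind of experiment it is typically comparable to, or better than, the baseline `err_f`, despite never touching $f(\mathbf{A})$.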

A similar idea was explored in [3], where it is shown that sketching $\mathbf{A}$ instead of $f(\mathbf{A})$ is a good idea for operator monotone functions.

## References

1. Meyer, R.A.; Musco, C.; Musco, C.; Woodruff, D.P. Hutch++: Optimal Stochastic Trace Estimation. In Proceedings of the Symposium on Simplicity in Algorithms (SOSA); SIAM, 2021; pp. 142–155.

2. Halko, N.; Martinsson, P.G.; Tropp, J.A. Finding Structure with Randomness: Probabilistic Algorithms for Constructing Approximate Matrix Decompositions. SIAM Review 2011, 53, 217–288, doi:10.1137/090771806.

3. Persson, D.; Kressner, D. Randomized Low-Rank Approximation of Monotone Matrix Functions. arXiv preprint arXiv:2209.11023, 2022.