
In the previous section, we introduced SLQ as a combination of the Girard-Hutchinson trace estimator and a black-box Lanczos-based method for approximating quadratic forms. To truly understand the algorithm, particularly its behavior in finite precision arithmetic, it is informative to view SLQ and related algorithms through their relation to quadrature. This perspective is taken in Chen, 2024. We provide a brief summary here.

Connection to quadrature

Fix a vector $\vec{x}$ and consider the weighted spectral density

$$
\psi(x;\vec{A},\vec{x}) := \frac{1}{n} \sum_{i=1}^{n} |\vec{x}^\T \vec{u}_i|^2 \delta(x-\lambda_i),
$$

and observe that

$$
\vec{x}^\T f(\vec{A})\vec{x} = n \int_{-\infty}^{\infty} f(x) \, \psi(x;\vec{A},\vec{x}) \d{x}.
$$

Note that when $\vec{x}$ is drawn from a suitable distribution (i.e., $\EE[\vec{x}\vec{x}^\T] = \vec{I}$), then $\EE[\psi(x)] = \varphi(x)$.
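For concreteness, here is a minimal numpy sketch of the identity above on a small symmetric test matrix, taking $f(x) = e^x$; the matrix, the vector, and all variable names are illustrative rather than taken from any particular implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 8
A = rng.standard_normal((n, n))
A = (A + A.T) / 2                        # small symmetric test matrix
lam, U = np.linalg.eigh(A)               # eigenpairs (lambda_i, u_i)
f = np.exp

x = rng.standard_normal(n)               # standard Gaussian, so E[x x^T] = I

# Left-hand side: the quadratic form x^T f(A) x.
lhs = x @ (U * f(lam)) @ (U.T @ x)

# Right-hand side: n * int f(x) psi(x; A, x) dx = sum_i |x^T u_i|^2 f(lambda_i).
rhs = np.sum((U.T @ x) ** 2 * f(lam))

print(lhs, rhs)                          # agree to machine precision
```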

Let $\vec{T}$ be the output of the Lanczos algorithm run on $(\vec{A},\vec{x})$ for $k$ iterations, and suppose $\vec{T}$ has eigendecomposition $\vec{T} = \sum_{j=1}^{k} \theta_{j} \vec{s}_{j} \vec{s}_{j}^\T$. Define weights $\omega_{j} = \|\vec{x}\|^2 |\vec{e}_1^\T \vec{s}_{j}|^2$. Consider the probability density

$$
\psi_k(x;\vec{A},\vec{x}) := \frac{1}{n} \sum_{j=1}^{k} \omega_{j} \, \delta(x-\theta_{j}).
$$

This is the Gaussian quadrature approximation to $\psi(x)$; i.e., the unique density supported on $k$ points whose moments match those of $\psi(x)$ through degree $2k-1$.
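The following sketch illustrates this construction in plain numpy, using a simple Lanczos routine with full reorthogonalization written here only for illustration; the test matrix and the parameters $n$ and $k$ are arbitrary choices.

```python
import numpy as np

def lanczos(A, x, k):
    """k iterations of Lanczos on (A, x) with full reorthogonalization.

    Returns the k x k symmetric tridiagonal matrix T.
    """
    n = len(x)
    Q = np.zeros((n, k))
    alpha = np.zeros(k)
    beta = np.zeros(k)
    q = x / np.linalg.norm(x)
    for j in range(k):
        Q[:, j] = q
        w = A @ q
        if j > 0:
            w -= beta[j - 1] * Q[:, j - 1]
        alpha[j] = q @ w
        w -= alpha[j] * q
        w -= Q[:, : j + 1] @ (Q[:, : j + 1].T @ w)   # full reorthogonalization
        beta[j] = np.linalg.norm(w)
        if j + 1 < k:
            q = w / beta[j]
    return np.diag(alpha) + np.diag(beta[: k - 1], 1) + np.diag(beta[: k - 1], -1)

rng = np.random.default_rng(0)
n, k = 50, 10
A = rng.standard_normal((n, n))
A = (A + A.T) / 2
A /= np.abs(np.linalg.eigvalsh(A)).max()             # scale spectrum into [-1, 1]

x = rng.standard_normal(n)
T = lanczos(A, x, k)
theta, S = np.linalg.eigh(T)                         # nodes theta_j, eigenvectors s_j
omega = np.linalg.norm(x) ** 2 * S[0, :] ** 2        # omega_j = ||x||^2 |e_1^T s_j|^2

# Moment matching through degree 2k-1:
#   n * int x^d psi(x) dx   = x^T A^d x
#   n * int x^d psi_k(x) dx = sum_j omega_j theta_j^d
for d in [1, 5, 2 * k - 1]:
    exact = x @ np.linalg.matrix_power(A, d) @ x
    quad = np.sum(omega * theta ** d)
    print(d, exact, quad)                            # agree up to roundoff
```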

Randomized matrix-free quadrature

Hence, the SLQ approximation is nothing more than the average of the Gaussian quadrature approximations to the weighted spectral densities corresponding to the random vectors $\vec{x}_i$.

Observe that the SLQ spectrum approximation defined here and the SLQ trace estimator are compatible:

$$
\Call{SLQ}_{k,m}(f;\vec{A}) = n \int f(x) \, \varphi_{k,m}^{\Call{SLQ}}(x;\vec{A}) \d{x}.
$$
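As a sketch of this equivalence, averaging the per-vector quadrature rules recovers a trace estimate. This continues the example above and reuses its `lanczos` helper together with `A`, `n`, `k`, and `rng`, again with the illustrative choice $f(x) = e^x$.

```python
# Continues the sketch above: same A, rng, n, k, and lanczos() helper.
m = 50
per_vector = []
for _ in range(m):
    x = rng.standard_normal(n)                        # E[x x^T] = I
    T = lanczos(A, x, k)
    theta, S = np.linalg.eigh(T)
    omega = np.linalg.norm(x) ** 2 * S[0, :] ** 2
    per_vector.append(np.sum(omega * np.exp(theta)))  # quadrature for x^T exp(A) x

slq_estimate = np.mean(per_vector)                    # = n * int exp(x) phi_{k,m}^{SLQ}(x) dx
exact_trace = np.sum(np.exp(np.linalg.eigvalsh(A)))   # tr(exp(A))
print(slq_estimate, exact_trace)                      # close, up to Monte Carlo fluctuation
```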

As in the case of spectral sum approximation, increasing $k$ and $m$ increases the quality of the approximation. For instance, we have the following:

Proof

Full proofs can be found in Chen et al., 2021; Chen, 2024; and Chen et al., 2025, but the basic idea is relatively straightforward.

First, one notes that the Wasserstein distance has an equivalent definition

$$
\operatorname{W}(\varphi_1,\varphi_2) = \sup_{f\text{ 1-Lipschitz}} \int_{-\infty}^{\infty} f(x) \, (\varphi_1(x) - \varphi_2(x)) \d{x}.
$$

For a fixed 1-Lipschitz function, (6.16) and Theorem 6.4 give that $m = O(1/(n\varepsilon^2))$ samples and $k = O(1/\varepsilon)$ Lanczos iterations suffice for the corresponding integral against $\varphi_{k,m}^{\Call{SLQ}}$ to be accurate to $\varepsilon$; the latter holds since the best degree-$(2k-1)$ polynomial approximation to a 1-Lipschitz function converges like $\sim 1/k$.
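The $\sim 1/k$ rate quoted here can be illustrated numerically. The sketch below interpolates the 1-Lipschitz function $|x|$ at Chebyshev points on $[-1,1]$ (which is within a modest factor of the best polynomial approximation) and observes the error decay; the degrees chosen are arbitrary.

```python
import numpy as np

f = np.abs                                   # a 1-Lipschitz test function
xs = np.linspace(-1, 1, 2001)
for deg in [4, 8, 16, 32, 64]:
    p = np.polynomial.Chebyshev.interpolate(f, deg)   # interpolation at Chebyshev points
    err = np.max(np.abs(f(xs) - p(xs)))
    print(deg, err)                          # max error decays roughly like 1/deg
```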

However, to bound the Wasserstein distance, we require that the result hold for all 1-Lipschitz functions simultaneously. The key to extending to all 1-Lipschitz functions is to observe that Lipschitz functions have a stable Chebyshev series; i.e., a slight perturbation of their Chebyshev approximation still yields a good approximation. Therefore, if we can approximate the Chebyshev moments of $\varphi(x;\vec{A})$ (traces of the Chebyshev polynomials of $\vec{A}$) well, then we can (up to a slight loss) approximate the integral of any Lipschitz function.
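To make this concrete, the following sketch (continuing the example above and reusing its `lanczos` helper, `A`, `n`, `m`, and `rng`) compares the spectral density of $\vec{A}$ with the SLQ approximation using scipy's weighted one-dimensional Wasserstein distance. Note that scipy renormalizes the weights, which matches the SLQ density only up to the small fluctuation of $\|\vec{x}_i\|^2$ about $n$.

```python
from scipy.stats import wasserstein_distance   # 1d Wasserstein with sample weights

# Continues the sketch above: same A, rng, n, m, and lanczos() helper.
lam = np.linalg.eigvalsh(A)                    # support of phi(x; A), uniform weights
for k_try in [2, 4, 8, 16]:
    nodes, weights = [], []
    for _ in range(m):
        x = rng.standard_normal(n)
        T = lanczos(A, x, k_try)
        theta, S = np.linalg.eigh(T)
        nodes.append(theta)
        weights.append(np.linalg.norm(x) ** 2 * S[0, :] ** 2)
    dist = wasserstein_distance(lam, np.concatenate(nodes),
                                v_weights=np.concatenate(weights))
    print(k_try, dist)                         # distance shrinks as k grows
```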

References
  1. Chen, T. (2024). The Lanczos algorithm for matrix functions: a handbook for scientists. https://arxiv.org/abs/2410.11090
  2. Chen, T., Trogdon, T., & Ubaru, S. (2021). Analysis of stochastic Lanczos quadrature for spectrum approximation. Proceedings of the 38th International Conference on Machine Learning, 139, 1728–1739. http://proceedings.mlr.press/v139/chen21s.html
  3. Chen, T., Trogdon, T., & Ubaru, S. (2025). Randomized Matrix-Free Quadrature: Unified and Uniform Bounds for Stochastic Lanczos Quadrature and the Kernel Polynomial Method. SIAM Journal on Scientific Computing, 47(3), A1733–A1757. https://doi.org/10.1137/23m1600414