
In the previous section, we introduced SLQ as a combination of the Girard-Hutchinson trace estimator and a black-box Lanczos-based method for approximating quadratic forms. To truly understand the algorithm, particularly its behavior in finite precision arithmetic, it is informative to view SLQ and related algorithms through their relation to quadrature. This perspective is taken in Chen, 2024. We provide a brief summary here.

Connection to quadrature

Fix a vector $\vec{x}$ and consider the weighted spectral density

$$
\psi(x;\vec{A},\vec{x}) := \frac{1}{n} \sum_{i=1}^{n} |\vec{x}^\T \vec{u}_i|^2 \delta(x-\lambda_i),
$$

and observe that

$$
\vec{x}^\T f(\vec{A})\vec{x} = n \int_{-\infty}^{\infty} f(x) \, \psi(x;\vec{A},\vec{x}) \d{x}.
$$

Note that when $\vec{x}$ is drawn from a suitable distribution (i.e., $\EE[\vec{x}\vec{x}^\T] = \vec{I}$), then $\EE[\psi(x)] = \varphi(x)$.
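For concreteness, here is a minimal numpy sketch of the identity above on a small symmetric test matrix, taking $f(x) = e^x$; the matrix, the vector, and all variable names are illustrative rather than taken from any particular implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 8
A = rng.standard_normal((n, n))
A = (A + A.T) / 2                        # small symmetric test matrix
lam, U = np.linalg.eigh(A)               # eigenpairs (lambda_i, u_i)
f = np.exp

x = rng.standard_normal(n)               # standard Gaussian, so E[x x^T] = I

# Left-hand side: the quadratic form x^T f(A) x.
lhs = x @ (U * f(lam)) @ (U.T @ x)

# Right-hand side: n * int f(x) psi(x; A, x) dx = sum_i |x^T u_i|^2 f(lambda_i).
rhs = np.sum((U.T @ x) ** 2 * f(lam))

print(lhs, rhs)                          # agree to machine precision
```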

Let $\vec{T}$ be the output of the Lanczos algorithm run on $(\vec{A},\vec{x})$ for $k$ iterations, and suppose $\vec{T}$ has eigendecomposition $\vec{T} = \sum_{j=1}^{k} \theta_{j} \vec{s}_{j} \vec{s}_{j}^\T$. Define weights $\omega_{j} = \|\vec{x}\|^2 |\vec{e}_1^\T \vec{s}_{j}|^2$. Consider the probability density

$$
\psi_k(x;\vec{A},\vec{x}) := \frac{1}{n} \sum_{j=1}^{k} \omega_{j} \, \delta(x-\theta_{j}).
$$

This is the Gaussian quadrature approximation to $\psi(x)$; i.e., the unique density supported on $k$ points whose moments match those of $\psi(x)$ through degree $2k-1$.
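The following sketch illustrates this construction in plain numpy, using a simple Lanczos routine with full reorthogonalization written here only for illustration; the test matrix and the parameters $n$ and $k$ are arbitrary choices.

```python
import numpy as np

def lanczos(A, x, k):
    """k iterations of Lanczos on (A, x) with full reorthogonalization.

    Returns the k x k symmetric tridiagonal matrix T.
    """
    n = len(x)
    Q = np.zeros((n, k))
    alpha = np.zeros(k)
    beta = np.zeros(k)
    q = x / np.linalg.norm(x)
    for j in range(k):
        Q[:, j] = q
        w = A @ q
        if j > 0:
            w -= beta[j - 1] * Q[:, j - 1]
        alpha[j] = q @ w
        w -= alpha[j] * q
        w -= Q[:, : j + 1] @ (Q[:, : j + 1].T @ w)   # full reorthogonalization
        beta[j] = np.linalg.norm(w)
        if j + 1 < k:
            q = w / beta[j]
    return np.diag(alpha) + np.diag(beta[: k - 1], 1) + np.diag(beta[: k - 1], -1)

rng = np.random.default_rng(0)
n, k = 50, 10
A = rng.standard_normal((n, n))
A = (A + A.T) / 2
A /= np.abs(np.linalg.eigvalsh(A)).max()             # scale spectrum into [-1, 1]

x = rng.standard_normal(n)
T = lanczos(A, x, k)
theta, S = np.linalg.eigh(T)                         # nodes theta_j, eigenvectors s_j
omega = np.linalg.norm(x) ** 2 * S[0, :] ** 2        # omega_j = ||x||^2 |e_1^T s_j|^2

# Moment matching through degree 2k-1:
#   n * int x^d psi(x) dx   = x^T A^d x
#   n * int x^d psi_k(x) dx = sum_j omega_j theta_j^d
for d in [1, 5, 2 * k - 1]:
    exact = x @ np.linalg.matrix_power(A, d) @ x
    quad = np.sum(omega * theta ** d)
    print(d, exact, quad)                            # agree up to roundoff
```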

Randomized matrix-free quadrature

Hence, the SLQ approximation is nothing more than the average of the Gaussian quadrature approximations to the weighted spectral densities corresponding to the random vectors $\vec{x}_i$.

Observe that the SLQ spectrum approximation defined here and the SLQ trace estimator are compatible:

$$
\Call{SLQ}_{k,m}(f;\vec{A}) = n \int f(x) \, \varphi_{k,m}^{\Call{SLQ}}(x;\vec{A}) \d{x}.
$$
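As a sketch of this equivalence, averaging the per-vector quadrature rules recovers a trace estimate. This continues the example above and reuses its `lanczos` helper together with `A`, `n`, `k`, and `rng`, again with the illustrative choice $f(x) = e^x$.

```python
# Continues the sketch above: same A, rng, n, k, and lanczos() helper.
m = 50
per_vector = []
for _ in range(m):
    x = rng.standard_normal(n)                        # E[x x^T] = I
    T = lanczos(A, x, k)
    theta, S = np.linalg.eigh(T)
    omega = np.linalg.norm(x) ** 2 * S[0, :] ** 2
    per_vector.append(np.sum(omega * np.exp(theta)))  # quadrature for x^T exp(A) x

slq_estimate = np.mean(per_vector)                    # = n * int exp(x) phi_{k,m}^{SLQ}(x) dx
exact_trace = np.sum(np.exp(np.linalg.eigvalsh(A)))   # tr(exp(A))
print(slq_estimate, exact_trace)                      # close, up to Monte Carlo fluctuation
```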

As in the case of spectral sum approximation, increasing $k$ and $m$ increases the quality of the approximation. For instance, we have the following:

Proof

Full proofs can be found in Chen et al., 2021; Chen, 2024; and Chen et al., 2025, but the basic idea is relatively straightforward.

First, one notes that the Wasserstein distance has an equivalent definition

$$
\operatorname{W}(\varphi_1,\varphi_2) = \sup_{f\text{ 1-Lipschitz}} \int_{-\infty}^{\infty} f(x) \, (\varphi_1(x) - \varphi_2(x)) \d{x}.
$$

For a fixed 1-Lipschitz function, (6.16) and Theorem 6.4 give that $m = O(1/(n\varepsilon^2))$ samples and $k = O(1/\varepsilon)$ Lanczos iterations suffice for the corresponding integral against $\varphi_{k,m}^{\Call{SLQ}}$ to be accurate to $\varepsilon$; the latter holds since the best degree-$(2k-1)$ polynomial approximation to a 1-Lipschitz function converges like $\sim 1/k$.
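The $\sim 1/k$ rate quoted here can be illustrated numerically. The sketch below interpolates the 1-Lipschitz function $|x|$ at Chebyshev points on $[-1,1]$ (which is within a modest factor of the best polynomial approximation) and observes the error decay; the degrees chosen are arbitrary.

```python
import numpy as np

f = np.abs                                   # a 1-Lipschitz test function
xs = np.linspace(-1, 1, 2001)
for deg in [4, 8, 16, 32, 64]:
    p = np.polynomial.Chebyshev.interpolate(f, deg)   # interpolation at Chebyshev points
    err = np.max(np.abs(f(xs) - p(xs)))
    print(deg, err)                          # max error decays roughly like 1/deg
```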

However, to bound the Wasserstein distance, we require that the result hold for all 1-Lipschitz functions simultaneously. The key to extending to all 1-Lipschitz functions is to observe that Lipschitz functions have a stable Chebyshev series; i.e., a slight perturbation of their Chebyshev approximation still yields a good approximation. Therefore, if we can approximate the Chebyshev moments of $\varphi(x;\vec{A})$ (traces of the Chebyshev polynomials of $\vec{A}$) well, then we can (up to a slight loss) approximate the integral of any Lipschitz function.
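To make this concrete, the following sketch (continuing the example above and reusing its `lanczos` helper, `A`, `n`, `m`, and `rng`) compares the spectral density of $\vec{A}$ with the SLQ approximation using scipy's weighted one-dimensional Wasserstein distance. Note that scipy renormalizes the weights, which matches the SLQ density only up to the small fluctuation of $\|\vec{x}_i\|^2$ about $n$.

```python
from scipy.stats import wasserstein_distance   # 1d Wasserstein with sample weights

# Continues the sketch above: same A, rng, n, m, and lanczos() helper.
lam = np.linalg.eigvalsh(A)                    # support of phi(x; A), uniform weights
for k_try in [2, 4, 8, 16]:
    nodes, weights = [], []
    for _ in range(m):
        x = rng.standard_normal(n)
        T = lanczos(A, x, k_try)
        theta, S = np.linalg.eigh(T)
        nodes.append(theta)
        weights.append(np.linalg.norm(x) ** 2 * S[0, :] ** 2)
    dist = wasserstein_distance(lam, np.concatenate(nodes),
                                v_weights=np.concatenate(weights))
    print(k_try, dist)                         # distance shrinks as k grows
```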

References
  1. Chen, T. (2024). The Lanczos algorithm for matrix functions: a handbook for scientists. https://arxiv.org/abs/2410.11090
  2. Chen, T., Trogdon, T., & Ubaru, S. (2021). Analysis of stochastic Lanczos quadrature for spectrum approximation. Proceedings of the 38th International Conference on Machine Learning, 139, 1728–1739. http://proceedings.mlr.press/v139/chen21s.html
  3. Chen, T., Trogdon, T., & Ubaru, S. (2025). Randomized Matrix-Free Quadrature: Unified and Uniform Bounds for Stochastic Lanczos Quadrature and the Kernel Polynomial Method. SIAM Journal on Scientific Computing, 47(3), A1733–A1757. https://doi.org/10.1137/23m1600414