Let $x$ be a random vector satisfying $\mathbb{E}[x] = 0$ and $\mathbb{E}[xx^\top] = I$.
Then, a direct computation reveals that $x^\top A x$ is an unbiased estimator of the trace of $A$.
In particular, by the cyclic property of the trace and linearity of expectation,

$$\mathbb{E}[x^\top A x] = \mathbb{E}[\operatorname{tr}(A x x^\top)] = \operatorname{tr}\big(A\,\mathbb{E}[x x^\top]\big) = \operatorname{tr}(A).$$
This approach unnecessarily computes and stores the off-diagonal entries of $X^\top A X$.
However, in trace estimation we typically view products with $A$ as expensive, so the additional cost of forming $X^\top A X$ is often negligible.
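To make this concrete, here is a minimal NumPy sketch of the Girard-Hutchinson estimator (the function name `girard_hutchinson` and the matvec interface are illustrative choices, not from the text above). It accumulates the quadratic forms $x_i^\top A x_i$ one vector at a time, so the off-diagonal entries of $X^\top A X$ are never formed:

```python
import numpy as np

def girard_hutchinson(matvec, n, m, rng=None):
    """Estimate tr(A) using m matrix-vector products with A.

    matvec : callable computing A @ x for a length-n vector x.
    n      : dimension of A.
    m      : number of random test vectors.
    """
    rng = np.random.default_rng(rng)
    total = 0.0
    for _ in range(m):
        x = rng.standard_normal(n)  # Gaussian test vector: E[x] = 0, E[x x^T] = I
        total += x @ matvec(x)      # quadratic form x^T A x; no X^T A X is built
    return total / m

# Example usage: estimate the trace of a random symmetric matrix.
rng = np.random.default_rng(0)
A = rng.standard_normal((100, 100))
A = (A + A.T) / 2
print(girard_hutchinson(lambda v: A @ v, 100, m=1000), np.trace(A))
```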
We can easily characterize the expectation and variance of the Girard-Hutchinson estimator when $x$ is a Gaussian vector.
Proof
A standard computation reveals that
$$\mathbb{E}[\operatorname{tr}_m(A)] = \operatorname{tr}(A), \qquad \mathbb{V}[\operatorname{tr}_m(A)] = \frac{1}{m}\,\mathbb{V}[x^\top A x].$$
Since $A$ is symmetric, it has an eigendecomposition $A = U \Lambda U^\top$, where $U \in \mathbb{R}^{n \times n}$ is an orthogonal matrix and $\Lambda = \operatorname{diag}(\lambda_1, \ldots, \lambda_n)$ is a diagonal matrix.
By the orthogonal invariance of the Gaussian distribution, $z = U^\top x$ is also a Gaussian vector.
Observe that
$$x^\top A x = z^\top \Lambda z = \sum_{i=1}^{n} \lambda_i z_i^2.$$
Since the $z_i^2$ are independent $\chi_1^2$ random variables, we have

$$\mathbb{V}[x^\top A x] = \sum_{i=1}^{n} \lambda_i^2\, \mathbb{V}[z_i^2] = \sum_{i=1}^{n} \lambda_i^2 \cdot 2 = 2 \lVert A \rVert_F^2.$$
If $A$ is not symmetric, we can use the fact that $\operatorname{tr}_m(A) = \operatorname{tr}_m\big((A + A^\top)/2\big)$, since the quadratic form $x^\top A x$ depends only on the symmetric part of $A$.
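As a quick numerical sanity check of this theorem (a sketch of my own; the matrix, dimension, and sample count are arbitrary), we can compare the empirical mean and variance of $x^\top A x$ over many Gaussian draws against $\operatorname{tr}(A)$ and $2\lVert A \rVert_F^2$:

```python
import numpy as np

rng = np.random.default_rng(0)
n, trials = 50, 200_000
A = rng.standard_normal((n, n))
A = (A + A.T) / 2                          # symmetrize, as in the proof

x = rng.standard_normal((trials, n))       # one Gaussian test vector per row
quad = np.einsum("ti,ij,tj->t", x, A, x)   # x^T A x for each trial

print(quad.mean(), np.trace(A))                        # both approximate tr(A)
print(quad.var(), 2 * np.linalg.norm(A, "fro") ** 2)   # both approximate 2 ||A||_F^2
```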
Besides the Gaussian, two other commonly used distributions are the following (a sampling sketch appears after the list):
Rademacher: Each component of $x$ is iid uniform on $\{-1, +1\}$.
Random unit vector (real): Each component of $x$ is iid $N(0, 1)$, and then $x$ is normalized to have norm $\sqrt{n}$.
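Here is how one might draw these test vectors in NumPy (my own sketch; all three choices satisfy $\mathbb{E}[x] = 0$ and $\mathbb{E}[xx^\top] = I$):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100

x_gauss = rng.standard_normal(n)             # Gaussian: iid N(0, 1) entries
x_rad = rng.choice([-1.0, 1.0], size=n)      # Rademacher: iid random signs
g = rng.standard_normal(n)
x_unit = np.sqrt(n) * g / np.linalg.norm(g)  # uniform on the sphere of radius sqrt(n)
```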
When $A$ is symmetric, the variance of the estimator can be computed explicitly:
| Distribution of $x$ | Variance |
| --- | --- |
| Gaussian | $\frac{1}{m} \cdot 2 \lVert A \rVert_F^2$ |
| Rademacher | $\frac{1}{m} \cdot 2 \big( \lVert A \rVert_F^2 - \sum_i A_{ii}^2 \big)$ |
| Uniform (real) | $\frac{1}{m} \cdot \frac{2n}{n+2} \big( \lVert A \rVert_F^2 - \frac{1}{n} \operatorname{tr}(A)^2 \big)$ |
Deriving the variance for Rademacher vectors is a straightforward (but tedious) exercise in basic probability.
The variance for the uniform distribution can also be derived by elementary techniques, but requires some care to handle the normalization.
Readers can refer to Ethan’s blog post for a derivation of the variances listed above, as well as some additional discussion on interesting optimality properties of certain distributions.
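To see how the table entries bear out in practice, here is a small Monte Carlo comparison (again a sketch of my own; the matrix and sample count are arbitrary) that estimates the single-vector variance $\mathbb{V}[x^\top A x]$ for each distribution, i.e. the table entries with $m = 1$:

```python
import numpy as np

rng = np.random.default_rng(0)
n, trials = 40, 100_000
A = rng.standard_normal((n, n))
A = (A + A.T) / 2

fro2 = np.linalg.norm(A, "fro") ** 2
predicted = {                                   # table entries with m = 1
    "Gaussian":       2 * fro2,
    "Rademacher":     2 * (fro2 - np.sum(np.diag(A) ** 2)),
    "Uniform (real)": (2 * n / (n + 2)) * (fro2 - np.trace(A) ** 2 / n),
}

g = rng.standard_normal((trials, n))
samples = {
    "Gaussian":       rng.standard_normal((trials, n)),
    "Rademacher":     rng.choice([-1.0, 1.0], size=(trials, n)),
    "Uniform (real)": np.sqrt(n) * g / np.linalg.norm(g, axis=1, keepdims=True),
}

for name, x in samples.items():
    quad = np.einsum("ti,ij,tj->t", x, A, x)   # one estimate x^T A x per row
    print(name, quad.var(), predicted[name])   # empirical vs. predicted variance
```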
A task closely related to trace estimation is approximating the diagonal of a matrix $A \in \mathbb{R}^{n \times n}$.
The analog of Definition 6.2 is the Girard-Hutchinson estimator for diagonal estimation:

$$\operatorname{diag}_m(A) = \frac{1}{m} \sum_{i=1}^{m} x_i \odot (A x_i),$$

where $\odot$ denotes the entrywise product and the $x_i$ are iid random vectors with $\mathbb{E}[x_i] = 0$ and $\mathbb{E}[x_i x_i^\top] = I$.
It’s not too hard to analyze the mean and variance to prove a result analogous to Theorem 6.1.
See also Dharangutte & Musco (2023) for a probability bound.
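A minimal NumPy sketch of this diagonal estimator (the function name and matvec interface are my own choices, analogous to the trace sketch above):

```python
import numpy as np

def girard_hutchinson_diag(matvec, n, m, rng=None):
    """Estimate diag(A) using m matrix-vector products with A.

    Each term x * (A @ x) has expectation diag(A) whenever
    E[x] = 0 and E[x x^T] = I.
    """
    rng = np.random.default_rng(rng)
    acc = np.zeros(n)
    for _ in range(m):
        x = rng.standard_normal(n)
        acc += x * matvec(x)        # entrywise product x ⊙ (A x)
    return acc / m
```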
Hutchinson, M. F. (1989). A Stochastic Estimator of the Trace of the Influence Matrix for Laplacian Smoothing Splines. Communications in Statistics - Simulation and Computation, 18(3), 1059–1076. 10.1080/03610918908812806
Cortinovis, A., & Kressner, D. (2021). On Randomized Trace Estimates for Indefinite Matrices with an Application to Determinants. Foundations of Computational Mathematics, 22(3), 875–903. 10.1007/s10208-021-09525-9
Dharangutte, P., & Musco, C. (2023). A Tight Analysis of Hutchinson’s Diagonal Estimator. In Symposium on Simplicity in Algorithms (SOSA) (pp. 353–364). Society for Industrial and Applied Mathematics. 10.1137/1.9781611977585.ch32