
Let $\vec{x}$ be a random vector satisfying $\EE[\vec{x}] = \vec{0}$ and $\EE[\vec{x}\vec{x}^\T] = \vec{I}$. Then a direct computation reveals that $\vec{x}^\T\vec{A}\vec{x}$ is an unbiased estimator of the trace of $\vec{A}$. In particular, by the cyclic property of the trace and linearity of expectation,

$$\EE[ \vec{x}^\T \vec{A}\vec{x}] = \EE[ \tr(\vec{x}^\T \vec{A}\vec{x}) ] = \EE[ \tr(\vec{A}\vec{x}\vec{x}^\T ) ] = \tr(\vec{A}\,\EE[ \vec{x}\vec{x}^\T ] ) = \tr(\vec{A}).$$

This suggests a simple randomized estimator: draw iid copies $\vec{x}_1, \ldots, \vec{x}_m$ of $\vec{x}$ and average the resulting quadratic forms,
$$\widehat{\tr}_m(\vec{A}) = \frac{1}{m} \sum_{i=1}^{m} \vec{x}_i^\T \vec{A} \vec{x}_i.$$

We can implement the Girard-Hutchinson estimator as follows.

import numpy as np

def girard_hutchinson(A, m):
    """Estimate tr(A) by averaging m Gaussian quadratic forms x^T A x."""
    n, _ = A.shape

    # columns of X are m iid standard Gaussian test vectors
    X = np.random.randn(n, m)

    # average the diagonal entries of X^T A X, i.e. the quadratic forms x_i^T A x_i
    trm = np.mean(np.diag(X.T @ (A @ X)))

    return trm
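
As a quick sanity check, here is a possible usage sketch (not from the original text): compare the randomized estimate against the exact trace of a random symmetric matrix.

import numpy as np

np.random.seed(0)
n = 1000
B = np.random.randn(n, n)
A = (B + B.T) / 2                   # symmetric test matrix

print(np.trace(A))                  # exact trace
print(girard_hutchinson(A, 50))     # randomized estimate with m = 50 test vectors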

In the above implementation we use that if $\vec{X} = [\vec{x}_1, \ldots, \vec{x}_m] \in \mathbb{R}^{n \times m}$, then

$$\vec{X}^\T \vec{A} \vec{X} = \begin{bmatrix} \vec{x}_1^\T \vec{A} \vec{x}_1 & \vec{x}_1^\T \vec{A} \vec{x}_2 & \cdots & \vec{x}_1^\T \vec{A} \vec{x}_m \\ \vec{x}_2^\T \vec{A} \vec{x}_1 & \vec{x}_2^\T \vec{A} \vec{x}_2 & \cdots & \vec{x}_2^\T \vec{A} \vec{x}_m \\ \vdots & \vdots & \ddots & \vdots \\ \vec{x}_m^\T \vec{A} \vec{x}_1 & \vec{x}_m^\T \vec{A} \vec{x}_2 & \cdots & \vec{x}_m^\T \vec{A} \vec{x}_m \end{bmatrix},$$

and hence $\widehat{\tr}_m(\vec{A}) = \frac{1}{m} \tr(\vec{X}^\T \vec{A} \vec{X})$.

This approach unnecessarily computes and stores the off-diagonal entries of $\vec{X}^\T \vec{A} \vec{X}$. However, in trace estimation we typically view products with $\vec{A}$ as expensive, so the additional cost of forming $\vec{X}^\T \vec{A} \vec{X}$ is often negligible.
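
If the extra work does matter, one could accumulate only the diagonal entries directly. The following is a minimal sketch of that variant (the function name girard_hutchinson_diag_only is illustrative, not from the original):

import numpy as np

def girard_hutchinson_diag_only(A, m):
    """Same estimator, without forming the full m x m matrix X^T A X."""
    n, _ = A.shape
    X = np.random.randn(n, m)

    # (X * (A @ X)).sum(axis=0) is exactly the vector of quadratic forms x_i^T A x_i
    return np.mean(np.sum(X * (A @ X), axis=0))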

Variance Bounds

We can easily characterize the expectation and variance of the Girard-Hutchinson estimator when $\vec{x}$ is a standard Gaussian vector: the estimator is unbiased, $\EE[\widehat{\tr}_m(\vec{A})] = \tr(\vec{A})$, and its variance is $\VV[\widehat{\tr}_m(\vec{A})] = \frac{2}{m}\|\vec{A}\|_\F^2$.

Proof

A standard computation reveals that

$$\EE[ \widehat{\tr}_m(\vec{A}) ] = \tr(\vec{A}), \qquad \VV[ \widehat{\tr}_m(\vec{A}) ] = \frac{1}{m} \VV[\vec{x}^\T\vec{A}\vec{x}].$$

Since $\vec{A}$ is symmetric, it has an eigendecomposition $\vec{A} = \vec{U}\vec{\Lambda}\vec{U}^\T$, where $\vec{U}\in\mathbb{R}^{n\times n}$ is an orthogonal matrix and $\vec{\Lambda} = \text{diag}(\lambda_1,\ldots,\lambda_n)$ is a diagonal matrix. Since the standard Gaussian distribution is orthogonally invariant, $\vec{z} = \vec{U}^\T\vec{x}$ is also a standard Gaussian vector. Observe that

$$\vec{x}^\T\vec{A}\vec{x} = \vec{z}^\T\vec{\Lambda}\vec{z} = \sum_{i=1}^{n} \lambda_i z_i^2.$$

Since the $z_i^2$ are independent $\chi^2_1$ random variables, we have

$$\VV[\vec{x}^\T\vec{A}\vec{x}] = \sum_{i=1}^{n} \lambda_i^2 \VV[z_i^2] = \sum_{i=1}^{n} 2\lambda_i^2 = 2\|\vec{A}\|_\F^2.$$

Combining this with the previous display gives $\VV[\widehat{\tr}_m(\vec{A})] = \frac{2}{m}\|\vec{A}\|_\F^2$.

If $\vec{A}$ is not symmetric, we can use the fact that $\widehat{\tr}_m(\vec{A})=\widehat{\tr}_m((\vec{A} + \vec{A}^\T)/2)$, since $\vec{x}^\T\vec{A}\vec{x} = \vec{x}^\T((\vec{A} + \vec{A}^\T)/2)\vec{x}$ for every $\vec{x}$.

One can also derive probability bounds; see e.g. Cortinovis & Kressner, 2021.
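
As an informal numerical check (a sketch that reuses the girard_hutchinson function above; it is not part of the original text), the empirical variance over many repetitions should be close to $\frac{2}{m}\|\vec{A}\|_\F^2$:

import numpy as np

np.random.seed(1)
n, m, trials = 200, 10, 2000
B = np.random.randn(n, n)
A = (B + B.T) / 2                              # symmetric test matrix

estimates = [girard_hutchinson(A, m) for _ in range(trials)]

print(np.var(estimates))                       # empirical variance of the estimator
print(2 * np.linalg.norm(A, 'fro')**2 / m)     # predicted variance 2*||A||_F^2 / m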

Other Distributions

Besides the Gaussian distribution, two commonly used choices are listed below (a sampling sketch follows the list):

  • Rademacher: Each component of $\vec{x}$ is iid $\{-1,+1\}$ with equal probability.

  • Random unit vector (real): Each component of $\vec{x}$ is drawn iid from $\mathcal{N}(0,1)$, and then $\vec{x}$ is normalized to have norm $\sqrt{n}$ (so that $\EE[\vec{x}\vec{x}^\T]=\vec{I}$ still holds); this is the distribution labeled "Uniform (real)" in the table below.
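
A minimal sketch of how these test vectors might be sampled in NumPy (the helper names are illustrative, not from the original text):

import numpy as np

def rademacher_vectors(n, m):
    # entries are +1 or -1 with equal probability
    return np.random.choice([-1.0, 1.0], size=(n, m))

def sphere_vectors(n, m):
    # Gaussian columns rescaled to norm sqrt(n), i.e. uniform on the sphere of radius sqrt(n)
    X = np.random.randn(n, m)
    return np.sqrt(n) * X / np.linalg.norm(X, axis=0)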

When $\vec{A}$ is symmetric, the variance of the estimator can be computed explicitly:

| Distribution of $\vec{x}$ | Variance |
| --- | --- |
| Gaussian | $\frac{1}{m} \cdot 2 \Vert\vec{A}\Vert_\F^2$ |
| Rademacher | $\frac{1}{m} \cdot 2 \left( \Vert\vec{A}\Vert_\F^2 - \sum_i A_{ii}^2 \right)$ |
| Uniform (real) | $\frac{1}{m} \cdot \frac{2n}{n+2} \left( \Vert \vec{A} \Vert_\F^2 - \frac{1}{n}\tr(\vec{A})^2 \right)$ |

Deriving the variance for Rademacher vectors is a straightforward (but tedious) exercise in basic probability. The variance for the uniform distribution can also be derived from elementary techniques, but requires some care to handle the normalization. Readers can refer to Ethan’s blog post for a derivation of the variances listed above, as well as some additional discussion on interesting optimality properties of certain distributions.

Diagonal Approximation

A task closely related to trace estimation is approximating the diagonal of a matrix $\vec{A}\in\mathbb{R}^{n\times n}$. The analog of Definition 6.2 is the Girard-Hutchinson estimator for diagonal estimation.
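
Although the definition is not reproduced here, a natural version of this estimator (under the same assumption $\EE[\vec{x}\vec{x}^\T] = \vec{I}$) averages the elementwise products $\vec{x}_i \odot (\vec{A}\vec{x}_i)$, whose expectation is the diagonal of $\vec{A}$. A minimal sketch, with an illustrative function name:

import numpy as np

def girard_hutchinson_diagonal(A, m):
    """Sketch of a Girard-Hutchinson-style estimator for diag(A)."""
    n, _ = A.shape
    X = np.random.randn(n, m)

    # each column x satisfies E[x * (A @ x)] = diag(A), so average over the m columns
    return np.mean(X * (A @ X), axis=1)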

It’s not too hard to analyze the mean and variance to prove a result analogous to Theorem 6.1. See also Dharangutte & Musco, 2023 for a probability bound.

References
  1. Hutchinson, M. F. (1989). A Stochastic Estimator of the Trace of the Influence Matrix for Laplacian Smoothing Splines. Communications in Statistics - Simulation and Computation, 18(3), 1059–1076. 10.1080/03610918908812806
  2. Girard, D. (1987). Un algorithme simple et rapide pour la validation croisée généralisée sur des problèmes de grande taille [A simple and fast algorithm for generalized cross-validation on large-scale problems]. https://membres-ljk.imag.fr/Didier.Girard/TR-665-M-IMAG.pdf
  3. Chen, T. (2022). Lanczos-Based Methods for Matrix Functions (p. 188) [PhD thesis, University of Washington]. https://www.proquest.com/dissertations-theses/lanczos-based-methods-matrix-functions/docview/2719149525/se-2
  4. Cortinovis, A., & Kressner, D. (2021). On Randomized Trace Estimates for Indefinite Matrices with an Application to Determinants. Foundations of Computational Mathematics, 22(3), 875–903. 10.1007/s10208-021-09525-9
  5. Dharangutte, P., & Musco, C. (2023). A Tight Analysis of Hutchinson’s Diagonal Estimator. In Symposium on Simplicity in Algorithms (SOSA) (pp. 353–364). Society for Industrial and Applied Mathematics. 10.1137/1.9781611977585.ch32