
Suppose $\vec{A}$ is positive definite and recall the Nyström approximation

$$\vec{A}\langle\vec{\Omega}\rangle := \vec{A}\vec{\Omega}\,(\vec{\Omega}^\T\vec{A}\vec{\Omega})^+\,\vec{\Omega}^\T\vec{A}.$$

When the columns of $\vec{\Omega}$ are a subset of the columns of the identity, $\vec{A}\vec{\Omega}$ corresponds to subsampling the columns of $\vec{A}$. In particular, let $S \subseteq \{1, \ldots, n\}$ be a tuple containing the $k$ pivots (columns of $\vec{A}$) we will observe, so that $\vec{\Omega} = \vec{I}[:,S]$. Then

$$\vec{A}\langle\vec{\Omega}\rangle = \vec{A}[:,S]\,\vec{A}[S,S]^+\,\vec{A}[S,:] =: \vec{A}\langle S\rangle.$$

In particular, since $\vec{A}$ is symmetric, we can compute $\vec{A}\langle\vec{\Omega}\rangle$ having observed only $\vec{A}[:,S]$, which contains just $kn$ entries of $\vec{A}$.
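As a concrete check, here is a minimal NumPy sketch (with a small synthetic positive definite matrix standing in for $\vec{A}$) of forming $\vec{A}\langle S\rangle$ from the observed columns alone:

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 8, 3

# Synthetic positive definite test matrix.
G = rng.standard_normal((n, n))
A = G @ G.T + n * np.eye(n)

S = [0, 3, 5]  # the k pivot columns we observe
C = A[:, S]    # n x k: the only entries of A we need
W = C[S, :]    # k x k core block A[S, S]

# Column Nystrom approximation A<S> = A[:,S] A[S,S]^+ A[S,:].
A_S = C @ np.linalg.pinv(W) @ C.T

# The approximation reproduces A exactly on the observed columns...
assert np.allclose(A_S[:, S], C)
# ...and the residual A - A<S> is positive semidefinite.
assert np.min(np.linalg.eigvalsh(A - A_S)) >= -1e-8
```

Note that only the $kn$ entries in `C` are ever read from `A`; the full matrix is built here purely for testing.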

The key question is how best to choose the columns of $\vec{A}$. Ideally, we would like to choose $k$ columns so that the Nyström approximation is competitive with the best rank-$k$ approximation to $\vec{A}$.

Partial Cholesky factorization with pivoting

It turns out that, to compute (7.5), we can use a Cholesky factorization algorithm with pivoting and stop after $k$ steps.

Note that the “textbook” Cholesky factorization algorithms maintain $\vec{A} - \vec{F}_i\vec{F}_i^\T$ directly. On the other hand, anticipating that we will terminate for some $k < n$, Algorithm 7.2 only computes the necessary parts of $\vec{A} - \vec{F}_i\vec{F}_i^\T$ as they are needed.
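In code, this lazy-evaluation strategy might look like the following sketch (`partial_cholesky` is a name chosen here for illustration; Algorithm 7.2 in the text may differ in details). Each residual column is formed on demand, so only the pivot columns of $\vec{A}$ are ever touched:

```python
import numpy as np

def partial_cholesky(A, pivots):
    """k steps of pivoted Cholesky; returns F with F F^T = A<S>.

    Each residual column (A - F F^T)[:, s] is computed on demand,
    so only len(pivots) columns of A are ever read.
    """
    n = A.shape[0]
    F = np.zeros((n, len(pivots)))
    for i, s in enumerate(pivots):
        # Column s of the current residual A - F_i F_i^T.
        col = A[:, s] - F[:, :i] @ F[s, :i]
        F[:, i] = col / np.sqrt(col[s])
    return F

rng = np.random.default_rng(1)
G = rng.standard_normal((6, 6))
A = G @ G.T + 6 * np.eye(6)

S = [2, 4, 0]
F = partial_cholesky(A, S)

# F F^T matches the column Nystrom approximation A<S>.
A_S = A[:, S] @ np.linalg.pinv(A[np.ix_(S, S)]) @ A[S, :]
assert np.allclose(F @ F.T, A_S)
```

The final assertion checks the identity that the proof below establishes: after $k$ steps, $\vec{F}\vec{F}^\T = \vec{A}\langle S\rangle$.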

Proof

See Ethan’s blog.

Adaptive pivoting

Nothing in Algorithm 7.2 requires that the pivot $s_i$ be chosen prior to step $i$! In particular, we can choose the $i$-th pivot adaptively, based on the approximation $\vec{A}\langle S_{i-1}\rangle = \vec{F}_{i-1}\vec{F}_{i-1}^\T$, where $S_{i-1} := (s_1, \ldots, s_{i-1})$. While computing the error $\vec{A} - \vec{A}\langle S_{i-1}\rangle$ would let us try to find the column that would reduce the error the most, we want to avoid looking at all of the entries of $\vec{A}$.

Amazingly, we can find good pivots without observing all of $\vec{A}$. Towards this end, note that:

  1. The error $\vec{A} - \vec{A}\langle\vec{\Omega}\rangle$ of a Nyström approximation is positive semidefinite.

  2. For any positive semidefinite $\vec{E}$, $\|\vec{E}\| \leq \|\vec{E}\|_\F \leq \tr(\vec{E})$.
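A quick numerical sanity check of the second fact (a sketch; `E` is just a random positive semidefinite matrix):

```python
import numpy as np

rng = np.random.default_rng(2)
B = rng.standard_normal((5, 5))
E = B @ B.T  # positive semidefinite

spec = np.linalg.norm(E, 2)     # spectral norm = largest eigenvalue
fro = np.linalg.norm(E, 'fro')  # sqrt of sum of squared eigenvalues
tr = np.trace(E)                # sum of eigenvalues

# For PSD E: ||E|| <= ||E||_F <= tr(E).
assert spec <= fro + 1e-9
assert fro <= tr + 1e-9
```

Both inequalities follow from the eigenvalues of a PSD matrix being nonnegative, so the trace (which we can track cheaply) upper-bounds the norms of the error.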

By computing the $n$ diagonal entries of $\vec{A}$, we can keep track of $\operatorname{diag}(\vec{A} - \vec{A}\langle S_{i-1}\rangle)$ and use this to choose the pivot. One approach is to greedily choose the pivot as the largest entry of $\operatorname{diag}(\vec{A} - \vec{A}\langle S_{i-1}\rangle)$. However, this approach has the tendency to focus on outlier entries. Instead, we can sample proportional to the values of $\operatorname{diag}(\vec{A} - \vec{A}\langle S_{i-1}\rangle)$. This results in the Randomly Pivoted Cholesky algorithm introduced in Chen et al., 2024.
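Putting the pieces together, here is a minimal sketch of Randomly Pivoted Cholesky in the spirit of Chen et al., 2024 (the function name and test matrix are chosen here for illustration). Each pivot is sampled proportional to the residual diagonal, which is updated in place:

```python
import numpy as np

def rp_cholesky(A, k, rng):
    """Randomly Pivoted Cholesky: returns F with F F^T ~ A (rank k).

    Reads only the diagonal of A plus one column of A per step.
    """
    n = A.shape[0]
    F = np.zeros((n, k))
    d = np.diag(A).astype(float).copy()  # diag of residual A - F_i F_i^T
    for i in range(k):
        # Sample the pivot proportional to the residual diagonal.
        s = rng.choice(n, p=d / d.sum())
        # Column s of the residual, computed on demand.
        col = A[:, s] - F[:, :i] @ F[s, :i]
        F[:, i] = col / np.sqrt(col[s])
        # Cheap diagonal update; clip guards against rounding below zero.
        d = (d - F[:, i] ** 2).clip(min=0)
    return F

rng = np.random.default_rng(3)
G = rng.standard_normal((50, 5))
A = G @ G.T + 1e-6 * np.eye(50)  # nearly rank-5 positive definite matrix

F = rp_cholesky(A, 10, rng)
# A rank-10 approximation should capture a near-rank-5 matrix well.
err = np.linalg.norm(A - F @ F.T, 'fro') / np.linalg.norm(A, 'fro')
assert err <= 1e-2
```

Note the per-step cost: one column of `A`, one rank-one diagonal update, and no access to the rest of the matrix, exactly the access pattern that makes the method attractive for kernel matrices.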

References
  1. Chen, Y., Epperly, E. N., Tropp, J. A., & Webber, R. J. (2024). Randomly pivoted Cholesky: Practical approximation of a kernel matrix with few entry evaluations. Communications on Pure and Applied Mathematics, 78(5), 995–1041. 10.1002/cpa.22234