
Suppose $\vec{A}$ is positive definite and recall the Nyström approximation

$$\vec{A}\langle\vec{\Omega}\rangle := \vec{A}\vec{\Omega}\,(\vec{\Omega}^\T\vec{A}\vec{\Omega})^+\,\vec{\Omega}^\T\vec{A}.$$

When the columns of $\vec{\Omega}$ are a subset of the columns of the identity, $\vec{A}\vec{\Omega}$ corresponds to subsampling the columns of $\vec{A}$. In particular, let $S \subseteq \{1, \ldots, n\}$ be a tuple containing the $k$ pivots (columns of $\vec{A}$) we will observe, so that $\vec{\Omega} = \vec{I}[:,S]$. Then

$$\vec{A}\langle\vec{\Omega}\rangle = \vec{A}[:,S]\,\vec{A}[S,S]^+\,\vec{A}[S,:] =: \vec{A}\langle S\rangle.$$

In particular, since $\vec{A}$ is symmetric, we can compute $\vec{A}\langle\vec{\Omega}\rangle$ having observed only $\vec{A}[:,S]$, which contains just $kn$ entries of $\vec{A}$.
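As a concrete check, here is a minimal NumPy sketch (with a small synthetic positive definite matrix standing in for $\vec{A}$) of forming $\vec{A}\langle S\rangle$ from the observed columns alone:

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 8, 3

# Synthetic positive definite test matrix.
G = rng.standard_normal((n, n))
A = G @ G.T + n * np.eye(n)

S = [0, 3, 5]  # the k pivot columns we observe
C = A[:, S]    # n x k: the only entries of A we need
W = C[S, :]    # k x k core block A[S, S]

# Column Nystrom approximation A<S> = A[:,S] A[S,S]^+ A[S,:].
A_S = C @ np.linalg.pinv(W) @ C.T

# The approximation reproduces A exactly on the observed columns...
assert np.allclose(A_S[:, S], C)
# ...and the residual A - A<S> is positive semidefinite.
assert np.min(np.linalg.eigvalsh(A - A_S)) >= -1e-8
```

Note that only the $kn$ entries in `C` are ever read from `A`; the full matrix is built here purely for testing.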

The key question is how best to choose the columns of $\vec{A}$. Ideally, we would like to choose $k$ columns so that the Nyström approximation is competitive with the best rank-$k$ approximation to $\vec{A}$.

Partial Cholesky factorization with pivoting

It turns out that, to compute (7.5), we can use a Cholesky factorization algorithm with pivoting and stop after $k$ steps.

Note that the “textbook” Cholesky factorization algorithms maintain $\vec{A} - \vec{F}_i\vec{F}_i^\T$ directly. On the other hand, anticipating that we will terminate for some $k < n$, Algorithm 7.2 only computes the necessary parts of $\vec{A} - \vec{F}_i\vec{F}_i^\T$ as they are needed.
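In code, this lazy-evaluation strategy might look like the following sketch (`partial_cholesky` is a name chosen here for illustration; Algorithm 7.2 in the text may differ in details). Each residual column is formed on demand, so only the pivot columns of $\vec{A}$ are ever touched:

```python
import numpy as np

def partial_cholesky(A, pivots):
    """k steps of pivoted Cholesky; returns F with F F^T = A<S>.

    Each residual column (A - F F^T)[:, s] is computed on demand,
    so only len(pivots) columns of A are ever read.
    """
    n = A.shape[0]
    F = np.zeros((n, len(pivots)))
    for i, s in enumerate(pivots):
        # Column s of the current residual A - F_i F_i^T.
        col = A[:, s] - F[:, :i] @ F[s, :i]
        F[:, i] = col / np.sqrt(col[s])
    return F

rng = np.random.default_rng(1)
G = rng.standard_normal((6, 6))
A = G @ G.T + 6 * np.eye(6)

S = [2, 4, 0]
F = partial_cholesky(A, S)

# F F^T matches the column Nystrom approximation A<S>.
A_S = A[:, S] @ np.linalg.pinv(A[np.ix_(S, S)]) @ A[S, :]
assert np.allclose(F @ F.T, A_S)
```

The final assertion checks the identity that the proof below establishes: after $k$ steps, $\vec{F}\vec{F}^\T = \vec{A}\langle S\rangle$.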

Proof

See Ethan’s blog.

Adaptive pivoting

Nothing in Algorithm 7.2 requires that the pivot $s_i$ be chosen prior to step $i$! In particular, we can choose the $i$-th pivot adaptively, based on the approximation $\vec{A}\langle S_{i-1}\rangle = \vec{F}_{i-1}\vec{F}_{i-1}^\T$, where $S_{i-1} := (s_1, \ldots, s_{i-1})$. While computing the error $\vec{A} - \vec{A}\langle S_{i-1}\rangle$ would let us try to find the column that would reduce the error the most, we want to avoid looking at all of the entries of $\vec{A}$.

Amazingly, we can find good pivots without observing all of $\vec{A}$. Towards this end, note that:

  1. The error $\vec{A} - \vec{A}\langle\vec{\Omega}\rangle$ of a Nyström approximation is positive semidefinite.

  2. For any positive semidefinite $\vec{E}$, $\|\vec{E}\| \leq \|\vec{E}\|_\F \leq \tr(\vec{E})$.
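A quick numerical sanity check of the second fact (a sketch; `E` is just a random positive semidefinite matrix):

```python
import numpy as np

rng = np.random.default_rng(2)
B = rng.standard_normal((5, 5))
E = B @ B.T  # positive semidefinite

spec = np.linalg.norm(E, 2)     # spectral norm = largest eigenvalue
fro = np.linalg.norm(E, 'fro')  # sqrt of sum of squared eigenvalues
tr = np.trace(E)                # sum of eigenvalues

# For PSD E: ||E|| <= ||E||_F <= tr(E).
assert spec <= fro + 1e-9
assert fro <= tr + 1e-9
```

Both inequalities follow from the eigenvalues of a PSD matrix being nonnegative, so the trace (which we can track cheaply) upper-bounds the norms of the error.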

By computing the $n$ diagonal entries of $\vec{A}$, we can keep track of $\operatorname{diag}(\vec{A} - \vec{A}\langle S_{i-1}\rangle)$ and use this to choose the pivot. One approach is to greedily choose the pivot as the largest entry of $\operatorname{diag}(\vec{A} - \vec{A}\langle S_{i-1}\rangle)$. However, this approach has the tendency to focus on outlier entries. Instead, we can sample proportional to the values of $\operatorname{diag}(\vec{A} - \vec{A}\langle S_{i-1}\rangle)$. This results in the Randomly Pivoted Cholesky algorithm introduced in Chen et al., 2024.
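Putting the pieces together, here is a minimal sketch of Randomly Pivoted Cholesky in the spirit of Chen et al., 2024 (the function name and test matrix are chosen here for illustration). Each pivot is sampled proportional to the residual diagonal, which is updated in place:

```python
import numpy as np

def rp_cholesky(A, k, rng):
    """Randomly Pivoted Cholesky: returns F with F F^T ~ A (rank k).

    Reads only the diagonal of A plus one column of A per step.
    """
    n = A.shape[0]
    F = np.zeros((n, k))
    d = np.diag(A).astype(float).copy()  # diag of residual A - F_i F_i^T
    for i in range(k):
        # Sample the pivot proportional to the residual diagonal.
        s = rng.choice(n, p=d / d.sum())
        # Column s of the residual, computed on demand.
        col = A[:, s] - F[:, :i] @ F[s, :i]
        F[:, i] = col / np.sqrt(col[s])
        # Cheap diagonal update; clip guards against rounding below zero.
        d = (d - F[:, i] ** 2).clip(min=0)
    return F

rng = np.random.default_rng(3)
G = rng.standard_normal((50, 5))
A = G @ G.T + 1e-6 * np.eye(50)  # nearly rank-5 positive definite matrix

F = rp_cholesky(A, 10, rng)
# A rank-10 approximation should capture a near-rank-5 matrix well.
err = np.linalg.norm(A - F @ F.T, 'fro') / np.linalg.norm(A, 'fro')
assert err <= 1e-2
```

Note the per-step cost: one column of `A`, one rank-one diagonal update, and no access to the rest of the matrix, exactly the access pattern that makes the method attractive for kernel matrices.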

References
  1. Chen, Y., Epperly, E. N., Tropp, J. A., & Webber, R. J. (2024). Randomly pivoted Cholesky: Practical approximation of a kernel matrix with few entry evaluations. Communications on Pure and Applied Mathematics, 78(5), 995–1041. 10.1002/cpa.22234