The sketch-and-precondition approach solves Linear Regression by using sketching to build a high-quality preconditioner. Unlike sketch-and-solve, which may sacrifice accuracy for speed, sketch-and-precondition maintains high accuracy while achieving significant computational speedups.
Preconditioned Iterative Methods
Given any invertible matrix $R \in \mathbb{R}^{d \times d}$ (called a preconditioner), solving the Linear Regression problem $\min_x \|b - Ax\|_2$ is equivalent to solving
$$\min_y \|b - (AR^{-1})y\|_2, \qquad x = R^{-1}y.$$
The rate of convergence of iterative methods applied to the preconditioned system depends on the condition number $\kappa(AR^{-1})$ of the preconditioned matrix $AR^{-1}$.
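As a quick illustration (on hypothetical toy data), the following snippet checks this equivalence for an arbitrary invertible $R$: solving the preconditioned problem and mapping back by $R^{-1}$ recovers the original least-squares solution.

import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((100, 5))
b = rng.standard_normal(100)
R = np.triu(rng.standard_normal((5, 5))) + 5*np.eye(5)  # any invertible matrix works

x_direct = np.linalg.lstsq(A, b, rcond=None)[0]
y = np.linalg.lstsq(A @ np.linalg.inv(R), b, rcond=None)[0]  # preconditioned problem
print(np.linalg.norm(x_direct - np.linalg.inv(R) @ y))       # ~ machine precision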
Randomized Preconditioning
The basic sketch-and-precondition algorithm follows a two-stage approach: first construct a preconditioner $R$ using a randomized sketch, then solve the preconditioned system using an iterative method run for $t$ iterations.
Any version of the sketched-QR algorithm whose sketch $S$ is an $\varepsilon$-subspace embedding for $\operatorname{range}(A)$ will yield a high-quality preconditioner.
Proof
The sketched-QR algorithm gives a factorization $SA = QR$. By Theorem 3.2, for every $y$,
$$\frac{\|y\|_2}{1+\varepsilon} = \frac{\|SAR^{-1}y\|_2}{1+\varepsilon} \le \|AR^{-1}y\|_2 \le \frac{\|SAR^{-1}y\|_2}{1-\varepsilon} = \frac{\|y\|_2}{1-\varepsilon},$$
where we used that $SAR^{-1} = Q$ has orthonormal columns. Hence the singular values of $AR^{-1}$ lie in $[1/(1+\varepsilon),\, 1/(1-\varepsilon)]$ and $\kappa(AR^{-1}) \le (1+\varepsilon)/(1-\varepsilon)$.
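This is easy to check numerically. In the snippet below, a dense Gaussian sketch stands in for the subspace embedding (an illustrative choice, not the sparse sketch used later):

import numpy as np

rng = np.random.default_rng(0)
n, d, k = 10_000, 50, 200
# build an ill-conditioned A with cond(A) = 1e8
U, _, Vt = np.linalg.svd(rng.standard_normal((n, d)), full_matrices=False)
A = U @ np.diag(np.geomspace(1e-8, 1, d)) @ Vt

S = rng.standard_normal((k, n)) / np.sqrt(k)  # Gaussian sketch
_, R = np.linalg.qr(S @ A, mode='reduced')
AR_inv = np.linalg.solve(R.T, A.T).T          # A @ R^{-1}, via triangular solve
print(np.linalg.cond(A), np.linalg.cond(AR_inv))  # ~1e8 vs. O(1)

Despite $\kappa(A) = 10^8$, the condition number of $AR^{-1}$ comes out close to $(1+\sqrt{d/k})/(1-\sqrt{d/k}) = 3$, consistent with the Gaussian theory discussed below.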
Note that we initialized the iterative method from the sketch-and-solve solution. This ensures that the initial error is bounded.
Together, the above observations yield the following guarantee: if $S$ is an $\varepsilon$-subspace embedding, then the iterates of the preconditioned method satisfy
$$\|A(x_t - x^\star)\|_2 \le 2\varepsilon^t \, \|A(x_0 - x^\star)\|_2 .$$
Sketch-and-precondition can be implemented in NumPy/SciPy as follows:
def sketch_and_precondition(A, b, k, zeta, n_iters, rng):
    n, d = A.shape
    S = sparse_stack_sketch(n, k, zeta, rng)  # k x n sparse sign embedding
    Y = S @ A                                 # sketched matrix: k x d
    Q, R = np.linalg.qr(Y, mode='reduced')    # sketched QR
    # sketch-and-solve solution as the initial iterate: x0 = R^{-1} Q^T (S b)
    x0 = sp.linalg.solve_triangular(R, Q.T@(S@b), lower=False)
    # preconditioner M = R^{-1}
    M = sp.linalg.solve_triangular(R, np.eye(d), lower=False)
    # run n_iters iterations of preconditioned LSQR, starting from x0
    x = lsqr(A, b, x0, M, n_iters)
    return x
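The listing assumes two helpers defined earlier in these notes: sparse_stack_sketch (a sparse sign embedding) and lsqr (a preconditioned LSQR routine). If you are reading this section standalone, minimal stand-ins might look like the following; these are sketches under stated assumptions, not necessarily the versions used in the experiments below.

import numpy as np
import scipy.sparse as sparse
from scipy.sparse.linalg import aslinearoperator, lsqr as scipy_lsqr

def sparse_stack_sketch(n, k, zeta, rng):
    # k x n sparse sign sketch: each column gets one entry +-1/sqrt(zeta)
    # in each of zeta stacked blocks of k // zeta rows (assumes zeta divides k)
    block = k // zeta
    rows = (rng.integers(0, block, size=(zeta, n))
            + block * np.arange(zeta)[:, None]).ravel()
    cols = np.tile(np.arange(n), zeta)
    vals = rng.choice([-1.0, 1.0], size=zeta * n) / np.sqrt(zeta)
    return sparse.csr_matrix((vals, (rows, cols)), shape=(k, n))

def lsqr(A, b, x0, M, n_iters):
    # preconditioned LSQR via SciPy: solve min ||(A M) y - r0|| for the
    # correction y, where r0 = b - A x0, then map back by x = x0 + M y
    AM = aslinearoperator(A) @ aslinearoperator(M)
    y = scipy_lsqr(AM, b - A @ x0, iter_lim=n_iters, atol=0, btol=0)[0]
    return x0 + M @ y

With n_iters = 0 the stand-in returns the sketch-and-solve solution x0, matching the initialization described above.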
Numerical Experiment
We now perform a numerical experiment to illustrate the behavior of sketch-and-precondition. Here we solve an ill-conditioned regression problem and compare the convergence of sketch-and-precondition for several choices of embedding dimension. Using a larger embedding dimension leads to a better preconditioner and faster convergence, but requires some additional time to compute the preconditioner.
# Generate a random matrix with controlled condition number
n = 100_000
d = 800
cond = 1e8
U, _, Vt = np.linalg.svd(np.random.rand(n,d),full_matrices=False)
s = np.geomspace(1/cond,1,d) # define singular values
A = U@np.diag(s)@Vt
# define ||b|| and ||b-Ax*||
b_norm = 1
residual_norm = .1
# construct right hand side
v = np.random.randn(n)
v_span = U @(U.T@v)
v_perp = v - v_span
v_span /= np.linalg.norm(v_span)
v_perp /= np.linalg.norm(v_perp)
b = v_span*np.sqrt(b_norm**2-residual_norm**2)+v_perp*residual_norm
# construct true solution
x_true = Vt.T@(np.diag(1/s)@(U.T@b))
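# sanity check (illustrative): by construction, ||b|| = b_norm and
# ||b - A @ x_true|| = residual_norm, up to roundoff
assert np.isclose(np.linalg.norm(b), b_norm)
assert np.isclose(np.linalg.norm(b - A@x_true), residual_norm)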
# Define experimental configurations
zeta = 4
rng = np.random.default_rng(0)
experiment_configs = [
    {'k': 2*d,  'iters': np.linspace(0,40,5,dtype='int')},
    {'k': 4*d,  'iters': np.linspace(0,23,5,dtype='int')},
    {'k': 12*d, 'iters': np.linspace(0,13,5,dtype='int')},
]
n_repeat = 10
errs = {}
times = {}
for experiment in experiment_configs:
    k = experiment['k']
    iters = experiment['iters']
    for n_iters in iters:
        start = time.time()
        for _ in range(n_repeat):
            x = sketch_and_precondition(A,b,k,zeta,n_iters,rng)
        end = time.time()
        avg_time = (end - start) / n_repeat
        errs[k,n_iters] = np.linalg.norm(x-x_true)/np.linalg.norm(x_true)
        times[k,n_iters] = avg_time
# Time baseline method
start = time.time()
for _ in range(n_repeat):
    x_np = np.linalg.lstsq(A,b,rcond=None)[0]
end = time.time()
avg_time_np = (end - start) / n_repeat
err_np = np.linalg.norm(x_np-x_true)/np.linalg.norm(x_true)
# Plot the results
fig, axs = plt.subplots(1,2,figsize=(10, 4),sharey=True)
for experiment in experiment_configs:
    k = experiment['k']
    iters = experiment['iters']
    times_k = [times[k,n_iters] for n_iters in iters]
    errs_k = [errs[k,n_iters] for n_iters in iters]
    axs[0].plot(iters,errs_k,marker='o',label=f'$k={k//d}d$')
    axs[1].plot(times_k,errs_k,marker='o',label=f'$k={k//d}d$')
axs[1].scatter(avg_time_np,err_np,marker='x',color='k',label='NumPy')
axs[0].set_xlabel('iterations')
axs[1].set_xlabel('time (s)')
axs[0].set_ylabel('error')
plt.legend()
plt.yscale('log')
plt.savefig('sketch_and_precondition.svg')
plt.close()
Even when targeting full accuracy, sketch-and-precondition outperforms NumPy. In settings where moderate accuracy is sufficient or where $A$ is sparse, the speedups may be even more substantial!
Note also that as the embedding dimension $k$ increases, the quality of the preconditioner improves, leading to faster convergence of the iterative method. However, this also increases the time required to compute the preconditioner. A good implementation will balance these trade-offs to achieve the best overall performance.
Parameter selection
For sketches that behave similarly to Gaussian sketches, we can use the Gaussian theory to estimate the largest and smallest eigenvalues of the preconditioned system $(AR^{-1})^\mathsf{T} AR^{-1}$ as a function of the embedding dimension $k$.
This can be used in a variety of ways. For example:

- To get an a priori estimate of the number of iterations required to reach a desired accuracy. This can then be used to balance the time spent computing the preconditioner (increases with $k$) against the time spent in the iterative method (decreases with $k$); see the sketch after this list.
- To use an iterative method like heavy ball momentum, which requires knowledge of the extreme eigenvalues to set its parameters but may be more parallelizable than LSQR.
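As a concrete (hypothetical) illustration of the first point: assuming Gaussian-like behavior, $\kappa(AR^{-1}) \approx (1+\sqrt{d/k})/(1-\sqrt{d/k})$, and LSQR contracts the error by roughly $(\kappa-1)/(\kappa+1)$ per iteration, which gives a back-of-the-envelope iteration estimate:

import numpy as np

def estimated_iterations(d, k, tol):
    # assumes Gaussian-like sketch: cond(A R^{-1}) ~ (1+g)/(1-g) with g = sqrt(d/k)
    g = np.sqrt(d / k)
    kappa = (1 + g) / (1 - g)
    rate = (kappa - 1) / (kappa + 1)  # per-iteration error contraction of LSQR
    return int(np.ceil(np.log(tol) / np.log(rate)))

for c in [2, 4, 12]:
    print(f'k = {c}d: ~{estimated_iterations(800, c*800, 1e-6)} iterations')

For the problem above, this predicts roughly 40, 20, and 12 iterations for $k = 2d, 4d, 12d$, in line with the iteration budgets used in the experiment.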
Further discussion can be found in Chen et al., 2025 and the references therein.
LSQR is mathematically equivalent to the conjugate gradient method applied to the normal equations, but it is more numerically stable.
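A quick (illustrative) check of this equivalence on a small well-conditioned problem, assuming a recent SciPy:

import numpy as np
from scipy.sparse.linalg import cg, lsqr as scipy_lsqr

rng = np.random.default_rng(0)
A = rng.standard_normal((200, 10))
b = rng.standard_normal(200)
x_cg = cg(A.T @ A, A.T @ b, rtol=1e-12)[0]            # CG on the normal equations
x_lsqr = scipy_lsqr(A, b, atol=1e-12, btol=1e-12)[0]  # LSQR on A directly
print(np.linalg.norm(x_cg - x_lsqr))                  # agree to high accuracy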
- Chen, T., Niroula, P., Ray, A., Subrahmanya, P., Pistoia, M., & Kumar, N. (2025). GPU-Parallelizable Randomized Sketch-and-Precondition for Linear Regression using Sparse Sign Sketches. https://arxiv.org/abs/2506.03070