The sketch-and-precondition approach solves Linear Regression by using sketching to build a high-quality preconditioner. Unlike sketch-and-solve, which may sacrifice accuracy for speed, sketch-and-precondition maintains high accuracy while achieving significant computational speedups.
Preconditioned Iterative Methods
Given any invertible matrix $R \in \mathbb{R}^{d \times d}$ (called a preconditioner), solving the Linear Regression problem $\min_x \|b - Ax\|_2$ is equivalent to solving
$$\min_y \|b - (AR^{-1})y\|_2, \qquad x = R^{-1}y.$$
The rate of convergence of iterative methods applied to the preconditioned system depends on the condition number $\kappa(AR^{-1})$ of the preconditioned matrix $AR^{-1}$.
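As a quick illustration (on hypothetical toy data), the following snippet checks this equivalence for an arbitrary invertible $R$: solving the preconditioned problem and mapping back by $R^{-1}$ recovers the original least-squares solution.

import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((100, 5))
b = rng.standard_normal(100)
R = np.triu(rng.standard_normal((5, 5))) + 5*np.eye(5)  # any invertible matrix works

x_direct = np.linalg.lstsq(A, b, rcond=None)[0]
y = np.linalg.lstsq(A @ np.linalg.inv(R), b, rcond=None)[0]  # preconditioned problem
print(np.linalg.norm(x_direct - np.linalg.inv(R) @ y))       # ~ machine precision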
Randomized Preconditioning
The basic sketch-and-precondition algorithm follows a two-stage approach: first construct a preconditioner $R$ using a randomized sketch, then solve the preconditioned system using an iterative method run for $t$ iterations.
Any version of the sketched-QR algorithm whose sketch $S$ is an $\varepsilon$-subspace embedding for $\operatorname{range}(A)$ will yield a high-quality preconditioner.
Proof
The sketched-QR algorithm gives a factorization $SA = QR$. By Theorem 3.2, for every $y$,
$$\frac{\|y\|_2}{1+\varepsilon} = \frac{\|SAR^{-1}y\|_2}{1+\varepsilon} \le \|AR^{-1}y\|_2 \le \frac{\|SAR^{-1}y\|_2}{1-\varepsilon} = \frac{\|y\|_2}{1-\varepsilon},$$
where we used that $SAR^{-1} = Q$ has orthonormal columns. Hence the singular values of $AR^{-1}$ lie in $[1/(1+\varepsilon),\, 1/(1-\varepsilon)]$ and $\kappa(AR^{-1}) \le (1+\varepsilon)/(1-\varepsilon)$.
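This is easy to check numerically. In the snippet below, a dense Gaussian sketch stands in for the subspace embedding (an illustrative choice, not the sparse sketch used later):

import numpy as np

rng = np.random.default_rng(0)
n, d, k = 10_000, 50, 200
# build an ill-conditioned A with cond(A) = 1e8
U, _, Vt = np.linalg.svd(rng.standard_normal((n, d)), full_matrices=False)
A = U @ np.diag(np.geomspace(1e-8, 1, d)) @ Vt

S = rng.standard_normal((k, n)) / np.sqrt(k)  # Gaussian sketch
_, R = np.linalg.qr(S @ A, mode='reduced')
AR_inv = np.linalg.solve(R.T, A.T).T          # A @ R^{-1}, via triangular solve
print(np.linalg.cond(A), np.linalg.cond(AR_inv))  # ~1e8 vs. O(1)

Despite $\kappa(A) = 10^8$, the condition number of $AR^{-1}$ comes out close to $(1+\sqrt{d/k})/(1-\sqrt{d/k}) = 3$, consistent with the Gaussian theory discussed below.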
Note that we initialized the iterative method from the sketch-and-solve solution. This ensures that the initial error is bounded.
Together, the above observations yield the following guarantee: if $S$ is an $\varepsilon$-subspace embedding, then the iterates of the preconditioned method satisfy
$$\|A(x_t - x^\star)\|_2 \le 2\varepsilon^t \, \|A(x_0 - x^\star)\|_2 .$$
Sketch-and-precondition can be implemented in NumPy/SciPy as follows:
def sketch_and_precondition(A, b, k, zeta, n_iters, rng):
    n, d = A.shape
    S = sparse_stack_sketch(n, k, zeta, rng)  # k x n sparse sign embedding
    Y = S @ A                                 # sketched matrix: k x d
    Q, R = np.linalg.qr(Y, mode='reduced')    # sketched QR
    # sketch-and-solve solution as the initial iterate: x0 = R^{-1} Q^T (S b)
    x0 = sp.linalg.solve_triangular(R, Q.T@(S@b), lower=False)
    # preconditioner M = R^{-1}
    M = sp.linalg.solve_triangular(R, np.eye(d), lower=False)
    # run n_iters iterations of preconditioned LSQR, starting from x0
    x = lsqr(A, b, x0, M, n_iters)
    return x
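The listing assumes two helpers defined earlier in these notes: sparse_stack_sketch (a sparse sign embedding) and lsqr (a preconditioned LSQR routine). If you are reading this section standalone, minimal stand-ins might look like the following; these are sketches under stated assumptions, not necessarily the versions used in the experiments below.

import numpy as np
import scipy.sparse as sparse
from scipy.sparse.linalg import aslinearoperator, lsqr as scipy_lsqr

def sparse_stack_sketch(n, k, zeta, rng):
    # k x n sparse sign sketch: each column gets one entry +-1/sqrt(zeta)
    # in each of zeta stacked blocks of k // zeta rows (assumes zeta divides k)
    block = k // zeta
    rows = (rng.integers(0, block, size=(zeta, n))
            + block * np.arange(zeta)[:, None]).ravel()
    cols = np.tile(np.arange(n), zeta)
    vals = rng.choice([-1.0, 1.0], size=zeta * n) / np.sqrt(zeta)
    return sparse.csr_matrix((vals, (rows, cols)), shape=(k, n))

def lsqr(A, b, x0, M, n_iters):
    # preconditioned LSQR via SciPy: solve min ||(A M) y - r0|| for the
    # correction y, where r0 = b - A x0, then map back by x = x0 + M y
    AM = aslinearoperator(A) @ aslinearoperator(M)
    y = scipy_lsqr(AM, b - A @ x0, iter_lim=n_iters, atol=0, btol=0)[0]
    return x0 + M @ y

With n_iters = 0 the stand-in returns the sketch-and-solve solution x0, matching the initialization described above.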
Numerical Experiment
We now perform a numerical experiment to illustrate the behavior of sketch-and-precondition. Here we solve an ill-conditioned regression problem and compare the convergence of sketch-and-precondition for several choices of embedding dimension. Using a larger embedding dimension leads to a better preconditioner and faster convergence, but requires some additional time to compute the preconditioner.
# Generate a random matrix with controlled condition number
n = 100_000
d = 800
cond = 1e8
U, _, Vt = np.linalg.svd(np.random.rand(n,d),full_matrices=False)
s = np.geomspace(1/cond,1,d) # define singular values
A = U@np.diag(s)@Vt
# define ||b|| and ||b-Ax*||
b_norm = 1
residual_norm = .1
# construct right hand side
v = np.random.randn(n)
v_span = U @(U.T@v)
v_perp = v - v_span
v_span /= np.linalg.norm(v_span)
v_perp /= np.linalg.norm(v_perp)
b = v_span*np.sqrt(b_norm**2-residual_norm**2)+v_perp*residual_norm
# construct true solution
x_true = Vt.T@(np.diag(1/s)@(U.T@b))
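# sanity check (illustrative): by construction, ||b|| = b_norm and
# ||b - A @ x_true|| = residual_norm, up to roundoff
assert np.isclose(np.linalg.norm(b), b_norm)
assert np.isclose(np.linalg.norm(b - A@x_true), residual_norm)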
# Define experimental configurations
zeta = 4
rng = np.random.default_rng(0)
experiment_configs = [
    {'k': 2*d,  'iters': np.linspace(0,40,5,dtype='int')},
    {'k': 4*d,  'iters': np.linspace(0,23,5,dtype='int')},
    {'k': 12*d, 'iters': np.linspace(0,13,5,dtype='int')},
]
n_repeat = 10
errs = {}
times = {}
for experiment in experiment_configs:
    k = experiment['k']
    iters = experiment['iters']
    for n_iters in iters:
        start = time.time()
        for _ in range(n_repeat):
            x = sketch_and_precondition(A,b,k,zeta,n_iters,rng)
        end = time.time()
        avg_time = (end - start) / n_repeat
        errs[k,n_iters] = np.linalg.norm(x-x_true)/np.linalg.norm(x_true)
        times[k,n_iters] = avg_time
# Time baseline method
start = time.time()
for _ in range(n_repeat):
    x_np = np.linalg.lstsq(A,b,rcond=None)[0]
end = time.time()
avg_time_np = (end - start) / n_repeat
err_np = np.linalg.norm(x_np-x_true)/np.linalg.norm(x_true)
# Plot the results
fig, axs = plt.subplots(1,2,figsize=(10, 4),sharey=True)
for experiment in experiment_configs:
    k = experiment['k']
    iters = experiment['iters']
    times_k = [times[k,n_iters] for n_iters in iters]
    errs_k = [errs[k,n_iters] for n_iters in iters]
    axs[0].plot(iters,errs_k,marker='o',label=f'$k={k//d}d$')
    axs[1].plot(times_k,errs_k,marker='o',label=f'$k={k//d}d$')
axs[1].scatter(avg_time_np,err_np,marker='x',color='k',label='NumPy')
axs[0].set_xlabel('iterations')
axs[1].set_xlabel('time (s)')
axs[0].set_ylabel('error')
plt.legend()
plt.yscale('log')
plt.savefig('sketch_and_precondition.svg')
plt.close()
Even when targeting full accuracy, sketch-and-precondition outperforms NumPy. In settings where moderate accuracy is sufficient or where $A$ is sparse, the speedups may be even more substantial!
Note also that as the embedding dimension $k$ increases, the quality of the preconditioner improves, leading to faster convergence of the iterative method. However, this also increases the time required to compute the preconditioner. A good implementation will balance these trade-offs to achieve the best overall performance.
Parameter selection
For sketches that behave similarly to Gaussian sketches, we can use the Gaussian theory to estimate the largest and smallest eigenvalues of the preconditioned system $(AR^{-1})^\mathsf{T} AR^{-1}$ as a function of the embedding dimension $k$.
This can be used in a variety of ways. For example:

- To get an a priori estimate of the number of iterations required to reach a desired accuracy. This can then be used to balance the time spent computing the preconditioner (increases with $k$) against the time spent in the iterative method (decreases with $k$); see the sketch after this list.
- To use an iterative method like heavy ball momentum, which requires knowledge of the extreme eigenvalues to set its parameters but may be more parallelizable than LSQR.
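As a concrete (hypothetical) illustration of the first point: assuming Gaussian-like behavior, $\kappa(AR^{-1}) \approx (1+\sqrt{d/k})/(1-\sqrt{d/k})$, and LSQR contracts the error by roughly $(\kappa-1)/(\kappa+1)$ per iteration, which gives a back-of-the-envelope iteration estimate:

import numpy as np

def estimated_iterations(d, k, tol):
    # assumes Gaussian-like sketch: cond(A R^{-1}) ~ (1+g)/(1-g) with g = sqrt(d/k)
    g = np.sqrt(d / k)
    kappa = (1 + g) / (1 - g)
    rate = (kappa - 1) / (kappa + 1)  # per-iteration error contraction of LSQR
    return int(np.ceil(np.log(tol) / np.log(rate)))

for c in [2, 4, 12]:
    print(f'k = {c}d: ~{estimated_iterations(800, c*800, 1e-6)} iterations')

For the problem above, this predicts roughly 40, 20, and 12 iterations for $k = 2d, 4d, 12d$, in line with the iteration budgets used in the experiment.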
Further discussion can be found in Chen et al., 2025 and the references therein.
LSQR is mathematically equivalent to the conjugate gradient method applied to the normal equations, but it is more numerically stable.
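A quick (illustrative) check of this equivalence on a small well-conditioned problem, assuming a recent SciPy:

import numpy as np
from scipy.sparse.linalg import cg, lsqr as scipy_lsqr

rng = np.random.default_rng(0)
A = rng.standard_normal((200, 10))
b = rng.standard_normal(200)
x_cg = cg(A.T @ A, A.T @ b, rtol=1e-12)[0]            # CG on the normal equations
x_lsqr = scipy_lsqr(A, b, atol=1e-12, btol=1e-12)[0]  # LSQR on A directly
print(np.linalg.norm(x_cg - x_lsqr))                  # agree to high accuracy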
- Chen, T., Niroula, P., Ray, A., Subrahmanya, P., Pistoia, M., & Kumar, N. (2025). GPU-Parallelizable Randomized Sketch-and-Precondition for Linear Regression using Sparse Sign Sketches. https://arxiv.org/abs/2506.03070