This section shares its title with a blog post by Ethan Epperly, which inspired the design of the experiments we conduct here.
Over the past several sections, we have introduced a number of oblivious sketching methods. Which one should we use?
In short, we recommend sparse sketches (such as the SparseStack sketch) as a good default choice. In settings where dense matrix-matrix multiplication is highly optimized (e.g., when working with relatively small matrices on GPUs), dense sketches may be preferable.
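To make this recommendation concrete, here is a minimal usage sketch. It assumes the constructors `sparse_stack_sketch(n, k, zeta, rng)` and `gaussian_sketch(n, k, rng)` from the earlier sections; the particular values of `k` and `zeta` are illustrative choices, not prescriptions.

```python
import numpy as np

# Illustrative default workflow: sketch the rows of a tall matrix A.
n, d = 2**14, 200
A = np.random.randn(n, d)

k, zeta = 4 * d, 8                          # sketch size and nonzeros per column (illustrative)
rng = np.random.RandomState(0)
S = sparse_stack_sketch(n, k, zeta, rng)    # default: sparse sketch
# S = gaussian_sketch(n, k, rng)            # alternative: dense sketch (e.g., small matrices on GPUs)
SA = S @ A                                  # k x d sketch of A
```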
## Comparison
The following table summarizes the theoretical performance of the sketching methods we have seen so far; here $S$ is $k \times n$, $A$ is $n \times d$, $\varepsilon$ is the target distortion, and $\zeta$ is the number of nonzeros per column of a SparseStack sketch.
| Sketch Type | Embedding dimension | Apply Cost |
|---|---|---|
| Gaussian Sketch | $k = O(d/\varepsilon^2)$ | $O(k \cdot \mathrm{nnz}(A))$ |
| Trigonometric Sketch | $k = O(d \log(d)/\varepsilon^2)$ | $O(nd \log n)$ |
| SparseStack | $k = O(d/\varepsilon^2)$ | $O(\zeta \cdot \mathrm{nnz}(A))$ |
Theoretically, SparseStack sketches seem to be the best choice: they achieve an optimal embedding dimension and are efficient to apply to both dense and sparse matrices $A$.
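To see where the $O(\zeta \cdot \mathrm{nnz}(A))$ apply cost comes from, here is a minimal illustrative constructor for a SparseStack-style sketch, stored as a SciPy CSR matrix so that applying it reduces to a sparse matrix product. The name `sparse_stack_sketch_demo` and its details are assumptions made for illustration; the `sparse_stack_sketch` used in the experiments below was defined in an earlier section.

```python
import numpy as np
import scipy.sparse as sparse

def sparse_stack_sketch_demo(n, k, zeta, rng):
    """Illustrative k x n sparse sign sketch with zeta nonzeros per column.

    The k rows are split into zeta blocks of size k // zeta; every column
    receives one entry of value +-1/sqrt(zeta) in each block.
    """
    block = k // zeta                                   # rows per block (assumes zeta divides k)
    rows = np.concatenate(
        [b * block + rng.randint(0, block, size=n) for b in range(zeta)]
    )                                                   # one random row per (block, column) pair
    cols = np.tile(np.arange(n), zeta)
    vals = rng.choice([-1.0, 1.0], size=zeta * n) / np.sqrt(zeta)
    return sparse.csr_matrix((vals, (rows, cols)), shape=(k, n))
```

Because such a sketch has only $\zeta n$ nonzeros, forming $SA$ touches each row of $A$ roughly $\zeta$ times, which is the source of the speedups observed below.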
## Numerical Experiments
We now perform some numerical comparisons which show that SparseStack sketches are a good choice in practice as well.
### Generate / Apply Cost
We begin by comparing the cost of generating and applying the different sketches. When timing the apply step, we consider both the time to apply the sketch matrix to a dense matrix $A$ and the time to apply it to a sparse matrix $A_s$.
```python
import time

import numpy as np
import scipy as sp
import pandas as pd
import matplotlib.pyplot as plt

# Generate dense and sparse random matrices
n = 2**14
d = 200
A = np.random.randn(n, d)
A_s = sp.sparse.random(n, d, density=0.01).tocsr()

k = 10*d

# Define sketching methods (gaussian_sketch, trig_sketch, and
# sparse_stack_sketch were defined in the previous sections)
sketch_methods = [
    {'name': 'Gaussian',
     'func': lambda k, rng: gaussian_sketch(n, k, rng)},
    {'name': 'Trig',
     'func': lambda k, rng: trig_sketch(n, k, rng)},
    {'name': 'Sparse (zeta=4)',
     'func': lambda k, rng: sparse_stack_sketch(n, k, 4, rng)},
    {'name': 'Sparse (zeta=8)',
     'func': lambda k, rng: sparse_stack_sketch(n, k, 8, rng)},
]
```
```python
# Time the different sketches
n_repeat = 5

results = []
for sketch_method in sketch_methods:
    method_name = sketch_method['name']

    # Time sketch generation
    start = time.time()
    for i in range(n_repeat):
        rng = np.random.RandomState(i)
        S = sketch_method['func'](k, rng)
    end = time.time()
    avg_gen_time = (end - start) / n_repeat

    # Time applying the sketch to the dense matrix A
    start = time.time()
    for i in range(n_repeat):
        S @ A
    end = time.time()
    avg_dense_apply_time = (end - start) / n_repeat

    # Time applying the sketch to the sparse matrix A_s
    start = time.time()
    for i in range(n_repeat):
        S @ A_s
    end = time.time()
    avg_sparse_apply_time = (end - start) / n_repeat

    # Store timing results
    results.append({
        'method': method_name,
        'gen. time (s)': avg_gen_time,
        'apply to dense time (s)': avg_dense_apply_time,
        'apply to sparse time (s)': avg_sparse_apply_time,
    })

# Collect results into a DataFrame
results_df = pd.DataFrame(results)

# Display results with formatting
results_df.style.format({
    'gen. time (s)': '{:1.2e}',
    'apply to dense time (s)': '{:1.2e}',
    'apply to sparse time (s)': '{:1.2e}',
})
```
The results of the experiment are essentially as expected. We emphasize that SparseStack sketches are able to take advantage of sparsity in $A$, which is not the case for trig sketches.
Note also that generating a Gaussian sketch is extremely expensive!
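A rough back-of-the-envelope count makes both observations plausible. Under the simplifying assumption that generation cost is proportional to the number of random entries drawn and apply cost to the number of scalar multiplications, the counts for this experiment are:

```python
# Rough counts for this experiment: n = 2**14, d = 200, k = 10*d, zeta = 4.
n, d = 2**14, 200
k, zeta = 10 * d, 4

print(f"Gaussian entries to generate:  {k * n:,}")         # ~3.3e7 random normals
print(f"SparseStack nonzeros:          {zeta * n:,}")      # ~6.6e4 signed entries
print(f"Dense-apply mults (Gaussian):  {k * n * d:,}")     # dense k x n times n x d
print(f"Dense-apply mults (Sparse):    {zeta * n * d:,}")  # only zeta nonzeros per column of S
```

Of course, actual timings also depend on memory traffic and on how well each operation maps onto optimized BLAS and sparse kernels, so these counts are only indicative.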
### Distortion
We now consider the distortion of the different sketches for two different subspaces. The first is a "typical" subspace spanned by a random orthonormal basis, and the second is the "hard" subspace spanned by the first $d$ standard basis vectors (the first $d$ columns of the identity matrix), which is known to be a hard example for sparse sketching distributions. We also plot $\sqrt{d/k}$, which is the asymptotic distortion of Gaussian sketches for large matrices.
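Recall what is being measured: if $U$ is an orthonormal basis for the subspace, the distortion of a sketch $S$ on that subspace is determined by the extreme singular values of $SU$. This is exactly the quantity `max(s[0]-1, 1-s[-1])` computed from `np.linalg.svd` in the code below:

$$
\varepsilon = \max\{\sigma_{\max}(SU) - 1,\; 1 - \sigma_{\min}(SU)\},
$$

which guarantees $(1-\varepsilon)\,\|x\| \le \|Sx\| \le (1+\varepsilon)\,\|x\|$ for every $x$ in the subspace.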
```python
# Generate orthonormal bases for the typical (random) and hard (coordinate) subspaces
U, _ = np.linalg.qr(A)

U_hard = np.zeros((n, d))
U_hard[:d] = np.eye(d)
```
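One way to see why this second subspace is hard for sparse sketches: since $U_{\mathrm{hard}}$ consists of the first $d$ columns of the identity, the sketched basis $S U_{\mathrm{hard}}$ is simply the first $d$ columns of $S$, so its conditioning rests on only about $\zeta d$ random entries of the sketch. A quick check, using the illustrative `sparse_stack_sketch_demo` constructor sketched above (an assumption for illustration, not the constructor from earlier sections):

```python
# S @ U_hard selects the first d columns of S, so the quality of the embedding
# depends on only ~zeta*d random entries of the sparse sketch.
rng = np.random.RandomState(0)
S_demo = sparse_stack_sketch_demo(n, 2 * d, 4, rng)   # illustrative constructor from above
Y = S_demo @ U_hard
print(np.allclose(Y, S_demo[:, :d].toarray()))        # True
```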
```python
# Define the test problems
sketch_problems = {
    'typical': {
        'U': U,
    },
    'hard': {
        'U': U_hard,
    },
}

# Look at distortion for different sketches
ks = np.geomspace(2*d, 1000, 5, dtype=int)
n_repeat = 10

results = {}
for problem_name, problem_info in sketch_problems.items():
    U_problem = problem_info['U']

    for sketch_method in sketch_methods:
        method_name = sketch_method['name']

        distortion = np.zeros((n_repeat, len(ks)))
        for i in range(n_repeat):
            rng = np.random.RandomState(i)
            for j, k in enumerate(ks):
                S = sketch_method['func'](k, rng)
                Y = S @ U_problem
                s = np.linalg.svd(Y, compute_uv=False)
                distortion[i, j] = max(s[0] - 1, 1 - s[-1])

        results[problem_name, method_name] = {
            'distortion': distortion,
        }
```
```python
# Plot the results
σ = 0.1

fig, axs = plt.subplots(1, len(sketch_problems), figsize=(10, 4), sharey=True)
for i, (problem_name, problem_info) in enumerate(sketch_problems.items()):
    ax = axs[i]

    for sketch_method in sketch_methods:
        method_name = sketch_method['name']

        # Get the distortion data for this problem and method
        distortion = results[problem_name, method_name]['distortion']
        bot, mid, top = np.quantile(distortion, [σ, .5, 1-σ], axis=0)

        ax.plot(ks/d, mid, label=method_name, linewidth=2)
        ax.fill_between(ks/d, bot, top, alpha=.2)

    # Theoretical bound
    ax.plot(ks/d, np.sqrt(d/ks), ls=':', color='k', label=r'$\sqrt{d/k}$ estimate')

    ax.set_yscale('log')
    ax.set_xscale('log')
    ax.set_xlabel('Sketch size ($k/d$)')
    ax.set_title(f'Distortion vs Sketch Size - {problem_name.capitalize()} Problem')

axs[0].set_ylabel(r'Distortion $\varepsilon$')
axs[0].legend()

plt.savefig('distortion.svg')
plt.close()
```
On the typical problem, all of the sketches track the asymptotic Gaussian rate remarkably closely. Numerical experiments on a wide range of problems further confirm this observation (Dong & Martinsson, 2023; Epperly, 2023; Chen et al., 2025; Camaño et al., 2025). However, on the hard problem, when the number of nonzeros per column $\zeta$ is too small, the performance of SparseStack sketches degrades slightly.
- Dong, Y., & Martinsson, P.-G. (2023). Simpler is better: a comparative study of randomized pivoting algorithms for CUR and interpolative decompositions. Advances in Computational Mathematics, 49(4). 10.1007/s10444-023-10061-z
- Epperly, E. N. (2023). Stochastic trace estimation. https://www.ethanepperly.com/index.php/2023/01/26/stochastic-trace-estimation/
- Chen, T., Niroula, P., Ray, A., Subrahmanya, P., Pistoia, M., & Kumar, N. (2025). GPU-Parallelizable Randomized Sketch-and-Precondition for Linear Regression using Sparse Sign Sketches. https://arxiv.org/abs/2506.03070
- Camaño, C., Epperly, E. N., Meyer, R. A., & Tropp, J. A. (2025). Faster Linear Algebra Algorithms with Structured Random Matrices. https://arxiv.org/abs/2508.21189