
This section shares the title of Ethan’s blog post, which inspired the design of the experiments we conduct here.

Over the past several sections, we have seen a number of oblivious sketching methods. Which one should we use?

In short, we recommend sparse sketches as a good default choice. In settings where dense matrix-matrix multiplication is highly optimized (e.g., when working with relatively small matrices on GPUs), dense sketches may be preferable.

Comparison

The following summarizes the performance of the different sketching methods we have seen so far.

| Sketch Type | Embedding dimension $k$ | Apply Cost |
| --- | --- | --- |
| Gaussian Sketch | $O(d)$ | $O(\nnz(\vec{A})\, k)$ |
| Trigonometric Sketch | $O(d\log(d))$ | $O(n d \log(n))$ |
| SparseStack | $O(d)$ | $O(\nnz(\vec{A}) \log(d))$ |

Theoretically, SparseStack sketches seem to be the best choice. They achieve an optimal embedding dimension of $O(d)$ and are efficient to apply to both dense and sparse $\vec{A}$.
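To make the apply costs in the table concrete, here is a minimal sparse sign sketch assembled as a SciPy CSR matrix. This is an illustrative construction only (the helper name simple_sparse_sign_sketch is ours, and it is not the sparse_stack_sketch constructor used in the experiments below): each column of the $k \times n$ matrix gets $\zeta$ nonzero entries equal to $\pm 1/\sqrt{\zeta}$ in uniformly random rows, so applying the sketch to $\vec{A}$ touches $O(\nnz(\vec{A})\,\zeta)$ entries.

# A minimal sparse sign sketch (illustrative only; not the
# sparse_stack_sketch constructor used in the experiments below)
import numpy as np
from scipy.sparse import csr_matrix

def simple_sparse_sign_sketch(n, k, zeta, rng):
    # each column gets zeta nonzeros equal to +/- 1/sqrt(zeta) in random rows
    rows = np.concatenate(
        [rng.choice(k, size=zeta, replace=False) for _ in range(n)])
    cols = np.repeat(np.arange(n), zeta)
    vals = rng.choice([-1.0, 1.0], size=n * zeta) / np.sqrt(zeta)
    return csr_matrix((vals, (rows, cols)), shape=(k, n))

With $\zeta = O(\log d)$ nonzeros per column, the apply cost matches the $O(\nnz(\vec{A})\log(d))$ entry in the table.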

Numerical Experiments

We now perform some numerical comparisons which show that SparseStack sketches are a good choice in practice as well.

Generate / Apply Cost

We begin by comparing the cost of generating and applying the different sketches. When timing the apply step, we consider both the time to apply the sketch matrix to a dense matrix $\vec{A}$ and the time to apply it to a sparse matrix.

import time

import numpy as np
import scipy as sp
import pandas as pd
import matplotlib.pyplot as plt

# gaussian_sketch, trig_sketch, and sparse_stack_sketch are the sketch
# constructors defined in the preceding sections.

# Generate dense and sparse random matrices
n = 2**14
d = 200

A = np.random.randn(n,d)
A_s = sp.sparse.random(n,d, density=0.01).tocsr()

# Embedding dimension
k = 10*d

# Define sketching methods
sketch_methods = [
    {'name': 'Gaussian',
     'func': lambda k,rng: gaussian_sketch(n,k,rng)},
    {'name': 'Trig',
     'func': lambda k,rng: trig_sketch(n,k,rng)},
    {'name': 'Sparse (zeta=4)',
     'func': lambda k,rng: sparse_stack_sketch(n,k,4,rng)},
    {'name': 'Sparse (zeta=8)',
     'func': lambda k,rng: sparse_stack_sketch(n,k,8,rng)},
]


# Time the different sketches


n_repeat = 5
results = []
for sketch_method in sketch_methods:
    method_name = sketch_method['name']
    
    # Time sketch generation
    start = time.time()
    for i in range(n_repeat):
        rng = np.random.RandomState(i)
        S = sketch_method['func'](k,rng)
    end = time.time()
    
    avg_gen_time = (end - start) / n_repeat

    # Time applying the sketch to the dense matrix A
    start = time.time()
    for i in range(n_repeat):
        S@A
    end = time.time()
    
    avg_dense_apply_time = (end - start) / n_repeat 

    # Time applying the sketch to the sparse matrix A_s
    start = time.time()
    for i in range(n_repeat):
        S@A_s
    end = time.time()
    
    avg_sparse_apply_time = (end - start) / n_repeat 
    
    # Record timing results
    results.append({
        'method': method_name,
        'gen. time (s)': avg_gen_time,
        'apply to dense time (s)': avg_dense_apply_time,
        'apply to sparse time (s)': avg_sparse_apply_time,
    })


# Create DataFrame and compute relative performance
results_df = pd.DataFrame(results)

# Display results with formatting
results_df.style.format({
    'gen. time (s)': '{:1.2e}',
    'apply to dense time (s)': '{:1.2e}',
    'apply to sparse time (s)': '{:1.2e}'
})

The results of the experiment are essentially as expected. We emphasize that SparseStack sketches are able to take advantage of sparsity in $\vec{A}$, which is not the case for trigonometric sketches.

Note also that generating a Gaussian sketch is extremely expensive!
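A quick back-of-the-envelope count (using this experiment's dimensions; purely illustrative) explains why: a dense Gaussian sketch needs one normal sample per entry of the $k \times n$ matrix, whereas a sparse sketch with $\zeta$ nonzeros per column only needs $\zeta n$ signs and positions.

# Rough count of random samples each sketch must draw (illustrative)
n, d = 2**14, 200
k, zeta = 10 * d, 8

gaussian_samples = k * n   # one Gaussian per entry of the k x n matrix
sparse_samples = zeta * n  # one sign (and one position) per nonzero

print(f"Gaussian samples: {gaussian_samples:.1e}")  # about 3.3e7
print(f"Sparse nonzeros:  {sparse_samples:.1e}")    # about 1.3e5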

Distortion

We now consider the distortion of the different sketches for two different subspaces. The first is a typical subspace: the range of the random matrix $\vec{A}$ from above, for which we compute an orthonormal basis. The second is the coordinate subspace spanned by the first $d$ standard basis vectors, which is known to be a hard example for sparse sketching distributions. We also plot $\epsilon = \sqrt{d/k}$, the asymptotic distortion of large Gaussian sketches.
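Concretely, for a subspace with orthonormal basis $\vec{U}$, the distortion measured in the experiment below is

$$\epsilon = \max\{\sigma_{\max}(\vec{S}\vec{U}) - 1,\; 1 - \sigma_{\min}(\vec{S}\vec{U})\},$$

which is exactly the quantity max(s[0]-1, 1-s[-1]) computed from the singular values of $\vec{S}\vec{U}$ in the code.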

# Generate orthonormal bases for the typical and hard subspaces
U,_ = np.linalg.qr(A)

U_hard = np.zeros((n,d))
U_hard[:d] = np.eye(d)
# Define the test problems
sketch_problems = {
    'typical': {
        'U': U,
    },
    'hard': {
        'U': U_hard,
    },
}


# Look at distortion for different sketches
ks = np.geomspace(2*d,1000,5,dtype=int)

n_repeat = 10

results = {}

for problem_name,problem_info in sketch_problems.items():

    U_problem = problem_info['U']
    for sketch_method in sketch_methods:
        method_name = sketch_method['name']

        distortion = np.zeros((n_repeat,len(ks)))
        for i in range(n_repeat):
            rng = np.random.RandomState(i)
            for j,k in enumerate(ks):
            
                S = sketch_method['func'](k,rng)
                Y = S@U_problem

                s = np.linalg.svd(Y,compute_uv=False)
                distortion[i,j] = max(s[0]-1,1-s[-1]) 

        results[problem_name,method_name] = {
            'distortion': distortion,
        }


# Plot the results
σ = 0.1  # quantile level for the shaded bands (10th to 90th percentile)
fig, axs = plt.subplots(1,len(sketch_problems),figsize=(10, 4),sharey=True)


for i, (problem_name, problem_info) in enumerate(sketch_problems.items()):
    ax = axs[i]
    
    for sketch_method in sketch_methods:
        method_name = sketch_method['name']
        # Get the distortion data for this problem and method
        distortion = results[problem_name, method_name]['distortion']
        
        bot, mid, top = np.quantile(distortion, [σ, .5, 1-σ], axis=0)
        ax.plot(ks/d, mid, label=method_name, linewidth=2)
        ax.fill_between(ks/d, bot, top, alpha=.2)

    # Theoretical bound
    ax.plot(ks/d, np.sqrt(d/ks), ls=':', color='k', label=r'$\sqrt{d/k}$ estimate')

    ax.set_yscale('log')
    ax.set_xscale('log')
    ax.set_xlabel('Sketch size ($k/d$)')
    ax.set_title(f'Distortion vs Sketch Size - {problem_name.capitalize()} Problem')

axs[0].set_ylabel(rf'Distortion $\varepsilon$')
axs[0].legend()

plt.savefig('distortion.svg')
plt.close()
Figure: distortion versus sketch size for the typical problem (left) and the hard problem (right). On the typical problem, all methods perform similarly and follow the $\sqrt{d/k}$ rate; on the hard problem, the sparse sketches degrade slightly compared to the dense methods.

On the typical problem, all sketches perform remarkably similarly to the asymptotic Gaussian rate. Numerical experiments on a wide range of problems further confirm this observation (Dong & Martinsson, 2023; Epperly, 2023; Chen et al., 2025; Camaño et al., 2025). However, on the hard problem, the performance of SparseStack sketches degrades slightly when the number of nonzeros per column ζ is too small.
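A natural follow-up, sketched below but not run here, is to probe whether increasing the number of nonzeros per column ζ repairs the hard case; the same distortion computation applies, just with denser sparse sketches.

# Probe the hard (coordinate) subspace with denser sparse sketches.
# Reuses n, d, U_hard, and sparse_stack_sketch from above; output not shown.
rng = np.random.RandomState(0)
k = 5 * d

for zeta in [4, 8, 16, 32]:
    S = sparse_stack_sketch(n, k, zeta, rng)
    s = np.linalg.svd(S @ U_hard, compute_uv=False)
    print(f"zeta = {zeta:2d}: distortion = {max(s[0] - 1, 1 - s[-1]):.3f}")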

References
  1. Dong, Y., & Martinsson, P.-G. (2023). Simpler is better: a comparative study of randomized pivoting algorithms for CUR and interpolative decompositions. Advances in Computational Mathematics, 49(4). 10.1007/s10444-023-10061-z
  2. Epperly, E. N. (2023). Stochastic trace estimation. https://www.ethanepperly.com/index.php/2023/01/26/stochastic-trace-estimation/
  3. Chen, T., Niroula, P., Ray, A., Subrahmanya, P., Pistoia, M., & Kumar, N. (2025). GPU-Parallelizable Randomized Sketch-and-Precondition for Linear Regression using Sparse Sign Sketches. https://arxiv.org/abs/2506.03070
  4. Camaño, C., Epperly, E. N., Meyer, R. A., & Tropp, J. A. (2025). Faster Linear Algebra Algorithms with Structured Random Matrices. https://arxiv.org/abs/2508.21189