The sketch-and-solve paradigm is perhaps the simplest randomized approach to the linear regression problem (Linear Regression). This method produces a low-accuracy solution, so it is only suitable for special computational problems where higher-accuracy alternatives are not viable. Nevertheless, it is pedagogically important.
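To make the paradigm concrete, here is a minimal sketch-and-solve computation in NumPy. The problem sizes and the use of a dense Gaussian sketch are illustrative assumptions; any of the sketching distributions discussed in this book could be used in its place.
import numpy as np

# least-squares problem: A is n x d (n >> d), b has length n
rng = np.random.default_rng(0)
n, d, k = 2000, 20, 200                  # k is the (illustrative) sketch dimension
A = rng.standard_normal((n, d))
b = rng.standard_normal(n)

# sketch-and-solve with a dense Gaussian sketch S (k x n)
S = rng.standard_normal((k, n)) / np.sqrt(k)
x_hat = np.linalg.lstsq(S @ A, S @ b, rcond=None)[0]

# compare the sketched residual to the optimal residual
x_star = np.linalg.lstsq(A, b, rcond=None)[0]
print(np.linalg.norm(b - A @ x_hat) / np.linalg.norm(b - A @ x_star))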
To quantify the accuracy, it is common to compare the residual of the approximate solution to the optimal residual. Specifically, we want to ensure that, for some accuracy parameter $\varepsilon > 0$,
$$ \|b - A\widehat{x}\| \le (1+\varepsilon)\,\|b - Ax^*\|, \qquad (4.1) $$
where $x^*$ is the solution to Linear Regression. Such a guarantee on the residual is equivalent to a bound on the forward error $\|A(x^* - \widehat{x})\|$.
Proof
Note that the true residual $b - Ax^*$ is orthogonal to the range of $A$; i.e., $A^\mathsf{T}(b - Ax^*) = 0$. Thus, for any $x$, by the Pythagorean theorem,
$$ \|b - Ax\|^2 = \|b - Ax^*\|^2 + \|A(x^* - x)\|^2. $$
Rearranging, we find that
$$ \|A(x^* - x)\|^2 = \|b - Ax\|^2 - \|b - Ax^*\|^2, $$
so a bound on the residual $\|b - Ax\|$ immediately yields a bound on the error $\|A(x^* - x)\|$, and vice versa.
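As a quick sanity check on this identity, the following snippet (with arbitrary illustrative data) verifies the Pythagorean relation between the residual and the error numerically.
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((500, 10))
b = rng.standard_normal(500)
x_star = np.linalg.lstsq(A, b, rcond=None)[0]

x = x_star + 0.1 * rng.standard_normal(10)   # an arbitrary approximate solution
lhs = np.linalg.norm(b - A @ x)**2
rhs = np.linalg.norm(b - A @ x_star)**2 + np.linalg.norm(A @ (x_star - x))**2
print(abs(lhs - rhs))                        # agrees up to rounding error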
Active Regression
An important setting where sketch-and-solve is particularly useful is the active regression problem. In this task, the cost of the algorithm is measured by the number of entries of $b$ that we observe. By using a subsampling sketch, we can solve the active regression problem while observing only a few entries of $b$. We discuss how sketch-and-solve applies to active regression in more detail in the Chapter on Sampling Based Methods.
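As an illustration, a row-subsampling sketch only ever reads the sampled entries of $b$. The snippet below is a minimal sketch of this idea; the uniform sampling and reweighting used here are illustrative assumptions, and the Chapter on Sampling Based Methods uses more carefully chosen sampling distributions.
import numpy as np

rng = np.random.default_rng(2)
n, d, k = 10_000, 20, 400
A = rng.standard_normal((n, d))
b = rng.standard_normal(n)            # in practice, entries of b are expensive to observe

# uniform row subsampling: only the k sampled entries of b are ever read
idx = rng.choice(n, size=k, replace=False)
SA = A[idx] * np.sqrt(n / k)          # reweighted sampled rows of A
Sb = b[idx] * np.sqrt(n / k)          # only these k entries of b are observed
x_hat = np.linalg.lstsq(SA, Sb, rcond=None)[0]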
Analysis
The Gaussian case
To get a handle on the sketch-and-solve algorithm, we will consider the case where the sketching matrix $S$ is a Gaussian random matrix. Since other mixing-based sketching methods behave similarly to a Gaussian sketch with regard to the required embedding dimension $k$, this analysis will give us insight into the performance of sketch-and-solve with other sketching methods. Note that using a Gaussian sketch results in an algorithm that is not computationally efficient, so other mixing-based sketching methods should be used in practice.
For a Gaussian sketch with embedding dimension $k > d+1$, the sketch-and-solve solution $\widehat{x} = (SA)^\dagger S b$ satisfies
$$ \mathbb{E}\,\|A(x^* - \widehat{x})\|^2 = \frac{d}{k-d-1}\,\|b - Ax^*\|^2. $$
Thus we see that we solve (4.1), in expectation, when $k \gtrsim d/\varepsilon$.
Proof
Decompose
$$ b = Ax^* + r, $$
where $r = b - Ax^*$ is the residual vector. We can then write
$$ \widehat{x} = (SA)^\dagger S b = x^* + (SA)^\dagger S r, $$
since $SA$ has full column rank almost surely. Let $U$ be an orthonormal basis for the range of $A$ and let $U_\perp$ be an orthonormal basis for the orthogonal complement of the range of $A$. Decompose
$$ S = S[\,U ~~ U_\perp\,][\,U ~~ U_\perp\,]^\mathsf{T} = (SU)U^\mathsf{T} + (SU_\perp)U_\perp^\mathsf{T}. $$
Since $[\,U ~~ U_\perp\,]$ is an orthogonal matrix, by the Gaussian orthogonal invariance property, we have that $SU$ and $SU_\perp$ are independent Gaussian matrices. Moreover, by the optimality of $x^*$, $r$ is orthogonal to the range of $A$. Hence,
$$ Sr = (SU_\perp)U_\perp^\mathsf{T} r. $$
Therefore,
$$ A(\widehat{x} - x^*) = A(SA)^\dagger (SU_\perp) U_\perp^\mathsf{T} r. $$
Next, note that
$$ \|A(\widehat{x} - x^*)\| = \|U(SU)^\dagger (SU_\perp) U_\perp^\mathsf{T} r\| = \|(SU)^\dagger (SU_\perp) U_\perp^\mathsf{T} r\| \quad\text{and}\quad \|U_\perp^\mathsf{T} r\| = \|r\|, $$
where we used the fact that $r$ is orthogonal to the range of $A$, the Pythagorean theorem and orthogonal invariance of the Euclidean norm, and the pseudoinverse identity $A(SA)^\dagger = U(SU)^\dagger$. Finally, by a direct computation (averaging first over $SU_\perp$ and then over $SU$, using the expected trace of an inverse Wishart matrix),
$$ \mathbb{E}\,\|(SU)^\dagger (SU_\perp) U_\perp^\mathsf{T} r\|^2 = \frac{d}{k-d-1}\,\|U_\perp^\mathsf{T} r\|^2 = \frac{d}{k-d-1}\,\|r\|^2. $$
Combining the above equations gives the result.
A similar proof can be found on Ethan’s blog.
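The expectation formula above is easy to check empirically. The following short Monte Carlo experiment (with illustrative problem sizes) compares the average squared error of sketch-and-solve with a Gaussian sketch against $\frac{d}{k-d-1}\,\|b - Ax^*\|^2$.
import numpy as np

rng = np.random.default_rng(3)
n, d, k = 2000, 10, 60
A = rng.standard_normal((n, d))
b = rng.standard_normal(n)
x_star = np.linalg.lstsq(A, b, rcond=None)[0]
r_norm2 = np.linalg.norm(b - A @ x_star)**2

errs = []
for _ in range(500):
    S = rng.standard_normal((k, n)) / np.sqrt(k)
    x_hat = np.linalg.lstsq(S @ A, S @ b, rcond=None)[0]
    errs.append(np.linalg.norm(A @ (x_star - x_hat))**2)

print(np.mean(errs), d / (k - d - 1) * r_norm2)   # the two values should be close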
Bounds based on Subspace Embeddings
If $S$ is a subspace embedding for $\operatorname{range}([\,A ~~ b\,])$ with distortion $\varepsilon < 1$, then it is easy to see that the solution $\widehat{x}$ returned by the sketch-and-solve algorithm is accurate (Theorem 4.3):
$$ \|b - A\widehat{x}\| \le \frac{1+\varepsilon}{1-\varepsilon}\,\|b - Ax^*\|. $$
Proof
Observe that, for any $x$, the vector $b - Ax$ lies in $\operatorname{range}([\,A ~~ b\,])$ and hence, since $S$ is a subspace embedding for $\operatorname{range}([\,A ~~ b\,])$, we have
$$ (1-\varepsilon)\,\|b - Ax\| \le \|S(b - Ax)\| \le (1+\varepsilon)\,\|b - Ax\|. $$
Therefore,
$$ \|b - A\widehat{x}\| \le \frac{\|S(b - A\widehat{x})\|}{1-\varepsilon} \le \frac{\|S(b - Ax^*)\|}{1-\varepsilon} \le \frac{1+\varepsilon}{1-\varepsilon}\,\|b - Ax^*\|, $$
where the middle inequality uses the fact that $\widehat{x}$ minimizes $\|S(b - Ax)\|$ over all $x$.
While the proof of Theorem 4.3 is extremely simple, it does not recover the rate predicted by our analysis of the Gaussian case (recall a Gaussian matrix is a subspace embedding with distortion $\varepsilon$ when $k \gtrsim d/\varepsilon^2$). In fact, as outlined on Ethan’s blog, a more careful analysis actually yields a quadratically better rate:
$$ \|b - A\widehat{x}\| \le \big(1 + O(\varepsilon^2)\big)\,\|b - Ax^*\|, \qquad\text{equivalently}\qquad \|A(x^* - \widehat{x})\| \le O(\varepsilon)\,\|b - Ax^*\|. $$
Explicit versions of these bounds, with concrete constants, are available.
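To see how the subspace-embedding guarantee plays out numerically, the following snippet (a minimal sketch with illustrative sizes and a Gaussian sketch) estimates the distortion $\varepsilon$ of $S$ on $\operatorname{range}([\,A ~~ b\,])$ from the singular values of $SQ$, where $Q$ is an orthonormal basis for that subspace, and compares the observed residual ratio with the $(1+\varepsilon)/(1-\varepsilon)$ bound.
import numpy as np

rng = np.random.default_rng(4)
n, d, k = 2000, 10, 400
A = rng.standard_normal((n, d))
b = rng.standard_normal(n)

S = rng.standard_normal((k, n)) / np.sqrt(k)

# distortion of S on range([A b]) from the singular values of S @ Q
Q, _ = np.linalg.qr(np.column_stack([A, b]))
sing = np.linalg.svd(S @ Q, compute_uv=False)
eps = max(sing.max() - 1, 1 - sing.min())

x_star = np.linalg.lstsq(A, b, rcond=None)[0]
x_hat = np.linalg.lstsq(S @ A, S @ b, rcond=None)[0]
ratio = np.linalg.norm(b - A @ x_hat) / np.linalg.norm(b - A @ x_star)
print(ratio, (1 + eps) / (1 - eps))   # the ratio is typically far below the worst-case bound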
Numerical Experiment
Let’s look at the accuracy of the sketch-and-solve algorithm as a function of the sketching dimension $k$ for the different mixing-based sketches.
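The experiment below uses the sketch constructors gaussian_sketch, trig_sketch, and sparse_stack_sketch, which are defined elsewhere in this book. So that the code here can be run on its own, the following are minimal stand-in implementations under my own assumptions (a dense Gaussian sketch, an explicitly formed subsampled randomized DCT, and a stacked sparse sign sketch); they are illustrative and may differ from the book's implementations, which should be preferred in practice.
import numpy as np
import matplotlib.pyplot as plt
from scipy import sparse

def gaussian_sketch(n, k, rng):
    # k x n matrix with i.i.d. N(0, 1/k) entries
    return rng.randn(k, n) / np.sqrt(k)

def trig_sketch(n, k, rng):
    # subsampled randomized DCT (formed explicitly here for illustration only):
    # random column signs, k sampled rows of an orthonormal DCT-II matrix, rescaled
    signs = rng.choice([-1.0, 1.0], size=n)
    rows = rng.choice(n, size=k, replace=False)
    j = np.arange(n)
    C = np.sqrt(2.0 / n) * np.cos(np.pi * rows[:, None] * (2 * j + 1) / (2 * n))
    C[rows == 0, :] = np.sqrt(1.0 / n)      # first DCT row has a different scaling
    return np.sqrt(n / k) * C * signs

def sparse_stack_sketch(n, k, zeta, rng):
    # stacked sparse sign sketch: each column gets one +-1/sqrt(zeta) entry in each
    # of zeta blocks of k//zeta rows (any leftover rows when zeta does not divide k stay empty)
    block = k // zeta
    rows = (np.arange(zeta)[:, None] * block
            + rng.randint(0, block, size=(zeta, n))).ravel(order='F')
    cols = np.repeat(np.arange(n), zeta)
    vals = rng.choice([-1.0, 1.0], size=zeta * n) / np.sqrt(zeta)
    return sparse.csr_matrix((vals, (rows, cols)), shape=(k, n))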
# Generate a random matrix with controlled condition number
n = 50_000
d = 100
cond = 1e8
U, s, Vt = np.linalg.svd(np.random.rand(n,d),full_matrices=False)
s = np.geomspace(1/cond,1,d) # define singular values
A = U@np.diag(s)@Vt
# define ||b|| and ||b-Ax*||
b_norm = 1
residual_norm = .1
# construct right hand side
v = np.random.randn(n)
v_span = U @(U.T@v)
v_perp = v - v_span
v_span /= np.linalg.norm(v_span)
v_perp /= np.linalg.norm(v_perp)
b = v_span*np.sqrt(b_norm**2-residual_norm**2)+v_perp*residual_norm
# construct true solution
Ax_true = U@(U.T@b)
# Define sketching methods
sketch_methods = [
    {'name': 'Gaussian',
     'func': lambda k,rng: gaussian_sketch(n,k,rng)},
    {'name': 'Trig',
     'func': lambda k,rng: trig_sketch(n,k,rng)},
    {'name': 'Sparse (zeta=4)',
     'func': lambda k,rng: sparse_stack_sketch(n,k,4,rng)},
    {'name': 'Sparse (zeta=8)',
     'func': lambda k,rng: sparse_stack_sketch(n,k,8,rng)},
]
# Run experiment
ks = np.geomspace(2*d, 3000, 10, dtype=int)
n_repeat = 20
results = {}
for sketch_method in sketch_methods:
    method_name = sketch_method['name']
    errors = np.zeros((n_repeat, len(ks)))
    for i in range(n_repeat):
        rng = np.random.RandomState(i)
        for j, k in enumerate(ks):
            S = sketch_method['func'](k, rng)
            x = np.linalg.lstsq(S@A, S@b, rcond=None)[0]
            errors[i, j] = np.linalg.norm(Ax_true - A@x)
    results[method_name] = {
        'error': errors,
    }
# Plot the results
σ = 0.1
fig, axs = plt.subplots(1, 1, figsize=(10,4))
for sketch_method in sketch_methods:
    method_name = sketch_method['name']
    # Get the error data for this problem and method
    error = results[method_name]['error']
    bot, mid, top = np.quantile(error, [σ, .5, 1-σ], axis=0)
    plt.plot(ks/d, mid, label=method_name, linewidth=2)
    plt.fill_between(ks/d, bot, top, alpha=.2)
# Theoretical bound
plt.plot(ks/d, np.sqrt(d/(ks-d-1))*residual_norm, ls=':', color='k', label=r'Gaussian expectation')
plt.ylabel(r'accuracy: $\|A(x^*-\widehat{x})\|$')
plt.xlabel('Sketch size ($k/d$)')
plt.yscale('log')
plt.xscale('log')
plt.legend()
plt.savefig('sketch_and_solve.svg')
plt.close()
As we observed in our numerical study of the embedding dimension, all sketching distributions behave very similarly to the Gaussian sketch.