
The sketch-and-solve paradigm is perhaps the simplest randomized approach to the linear regression problem (Linear Regression). It produces only a low-accuracy solution, so it is best suited to special computational settings where more accurate alternatives are not viable. Nevertheless, it is pedagogically important.
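Concretely, the method amounts to sketching the regression problem and solving the much smaller sketched problem. The following is a minimal, illustrative NumPy implementation (the Gaussian sketch and the function name here are illustrative choices of ours, not the helpers used later in this chapter):

```python
import numpy as np

def sketch_and_solve(A, b, k, rng=None):
    """Minimal sketch-and-solve for min_x ||b - A x|| using a Gaussian sketch.

    A is n-by-d, b has length n, and k >= d is the sketch size.
    """
    rng = np.random.default_rng(rng)
    n = A.shape[0]
    S = rng.standard_normal((k, n)) / np.sqrt(k)   # Gaussian sketching matrix
    # Solve the much smaller k-by-d least-squares problem
    return np.linalg.lstsq(S @ A, S @ b, rcond=None)[0]
```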

To quantify the accuracy, it is common to compare the residual of the approximate solution $\widehat{\vec{x}}$ to the optimal residual. Specifically, we want to ensure that, for some accuracy parameter $\varepsilon > 0$,

$$\|\vec{b}-\vec{A}\widehat{\vec{x}}\| \leq (1+\varepsilon) \|\vec{b}-\vec{A}\vec{x}^*\|, \tag{4.1}$$

where $\vec{x}^*$ is the solution to Linear Regression. Such a guarantee on the residual is equivalent to a bound on the error $\|\vec{A}(\vec{x}^*-\widehat{\vec{x}})\|$.

Proof

Note that the true residual $\vec{b} - \vec{A}\vec{x}^*$ is orthogonal to the range of $\vec{A}$; i.e., $\vec{A}^\T(\vec{b} - \vec{A}\vec{x}^*) = \vec{0}$. Thus, for any $\tilde{\vec{x}} \in \R^d$, by the Pythagorean theorem,

$$\begin{aligned} \|\vec{b} - \vec{A}\tilde{\vec{x}}\|^2 &= \|\vec{b} - \vec{A}(\vec{x}^* + \tilde{\vec{x}} - \vec{x}^*)\|^2 \\&= \| \vec{b} - \vec{A}\vec{x}^* - \vec{A}(\tilde{\vec{x}} - \vec{x}^*) \|^2 \\&= \|\vec{b} - \vec{A}\vec{x}^*\|^2 + \|\vec{A}(\vec{x}^* - \tilde{\vec{x}})\|^2. \end{aligned}$$

Rearranging, we find that

$$\|\vec{A}(\vec{x}^* - \tilde{\vec{x}})\| = \left(\|\vec{b} - \vec{A}\tilde{\vec{x}}\|^2 - \|\vec{b} - \vec{A}\vec{x}^*\|^2\right)^{1/2}.$$
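This identity is easy to verify numerically. The sketch below uses a small random instance of our own choosing purely for illustration:

```python
import numpy as np

# Verify ||A(x* - x~)|| = (||b - A x~||^2 - ||b - A x*||^2)^{1/2} on a random instance
rng = np.random.default_rng(0)
A = rng.standard_normal((200, 10))
b = rng.standard_normal(200)

x_star = np.linalg.lstsq(A, b, rcond=None)[0]        # optimal solution
x_tilde = x_star + 0.01 * rng.standard_normal(10)    # any other candidate

lhs = np.linalg.norm(A @ (x_star - x_tilde))
rhs = np.sqrt(np.linalg.norm(b - A @ x_tilde)**2 - np.linalg.norm(b - A @ x_star)**2)
print(lhs, rhs)  # the two quantities agree up to rounding error
```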

Active Regression

An important setting where sketch-and-solve is particularly useful is the active regression problem. In this task, the cost is measured by the number of entries of $\vec{b}$ that we observe. By using a subsampling sketch, we can solve the active regression problem while observing only a few entries of $\vec{b}$; we return to this problem in detail in the Chapter on Sampling Based Methods. A minimal illustration of the idea appears below.
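As a hedged illustration only, the following sketch solves a regression problem while reading just $k$ entries of $\vec{b}$. Uniform row subsampling and the `b_query` interface are our own simplifications; better sampling distributions are discussed in the sampling chapter.

```python
import numpy as np

def active_regression_uniform(A, b_query, k, rng=None):
    """Sketch-and-solve with a uniform row-subsampling sketch.

    b_query(idx) returns the entries of b at the requested indices, so
    only k entries of b are ever observed.
    """
    rng = np.random.default_rng(rng)
    n = A.shape[0]
    idx = rng.choice(n, size=k, replace=True)     # sampled row indices
    scale = np.sqrt(n / k)                        # rescaling for uniform sampling
    SA = scale * A[idx]                           # subsampled, rescaled matrix
    Sb = scale * b_query(idx)                     # only k entries of b observed
    return np.linalg.lstsq(SA, Sb, rcond=None)[0]
```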

Analysis

The Gaussian case

To get a handle on the sketch-and-solve algorithm, we will consider the case where the sketching matrix $\vec{S}$ is a Gaussian random matrix. Since other mixing-based sketching methods behave similarly to a Gaussian sketch with regard to the embedding dimension $k$, this analysis gives us insight into the performance of sketch-and-solve with other sketching methods. Note that a Gaussian sketch does not yield a computationally efficient algorithm, so other mixing-based sketching methods should be used in practice.

The analysis below shows that $\EE[\|\vec{b}-\vec{A}\widehat{\vec{x}}\|^2] = \big(1 + \tfrac{d}{k-d-1}\big)\|\vec{b}-\vec{A}\vec{x}^*\|^2$, so (4.1) is satisfied, in expectation, when $k = O(d/\varepsilon)$.

Proof

Decompose

$$\vec{b} = \vec{A}\vec{A}^+ \vec{b} + (\vec{b} - \vec{A}\vec{A}^+ \vec{b}) = \vec{A}\vec{x}^* + \vec{r},$$

where $\vec{r} = \vec{b} - \vec{A}\vec{x}^*$ is the residual vector. Since $\vec{S}\vec{A}$ has full column rank (almost surely), $(\vec{S}\vec{A})^+\vec{S}\vec{A} = \vec{I}$, and we can write

$$\widehat{\vec{x}} = \vec{x}^* + (\vec{S}\vec{A})^+ \vec{S}(\vec{b} - \vec{A}\vec{A}^+ \vec{b}).$$

Let $\vec{U}_1$ be an orthonormal basis for the range of $\vec{A}$ and $\vec{U}_2$ be an orthonormal basis for the orthogonal complement of the range of $\vec{A}$. Decompose

$$\vec{S} = \vec{S}_1\vec{U}_1^\T + \vec{S}_2\vec{U}_2^\T, \qquad \vec{S}_1 = \vec{S}\vec{U}_1, \quad \vec{S}_2 = \vec{S}\vec{U}_2.$$

Since $[\vec{U}_1 \, \vec{U}_2]$ is an orthogonal matrix, by the Gaussian orthogonal invariance property, $\vec{S}_1\sim \Call{Gaussian}(k,d)$ and $\vec{S}_2\sim \Call{Gaussian}(k,n-d)$ are independent. Moreover, by the optimality of $\vec{x}^*$, $\vec{r}$ is orthogonal to the range of $\vec{A}$. Hence,

$$\widehat{\vec{x}} = \vec{x}^* + (\vec{S}_1\vec{U}_1^\T\vec{A})^+ \vec{S}_2\vec{U}_2^\T\vec{r}.$$

Therefore, using the independence of $\vec{S}_1$ and $\vec{S}_2$ and the fact that $\EE[\vec{S}_2] = \vec{0}$,

$$\EE[\widehat{\vec{x}}] = \vec{x}^* + \EE[(\vec{S}_1\vec{U}_1^\T\vec{A})^+]\, \EE[\vec{S}_2]\,\vec{U}_2^\T\vec{r} = \vec{x}^*.$$

Next, note that

$$\begin{aligned} \|\vec{b} - \vec{A}\widehat{\vec{x}}\|^2 &= \| \vec{r} - \vec{A}(\vec{S}_1\vec{U}_1^\T\vec{A})^+ \vec{S}_2\vec{U}_2^\T\vec{r}\|^2 \\&= \|\vec{r}\|^2 + \|\vec{A}(\vec{S}_1\vec{U}_1^\T\vec{A})^+ \vec{S}_2\vec{U}_2^\T\vec{r}\|^2 \\&= \|\vec{r}\|^2 + \|\vec{U}_1^\T\vec{A}(\vec{S}_1\vec{U}_1^\T\vec{A})^+ \vec{S}_2\vec{U}_2^\T\vec{r}\|^2 \\&= \|\vec{r}\|^2 + \|\vec{S}_1^+ \vec{S}_2\vec{U}_2^\T\vec{r}\|^2, \end{aligned}$$

where we used the fact that $\vec{r}$ is orthogonal to the range of $\vec{A}$, the Pythagorean theorem and orthogonal invariance of the Euclidean norm, and the pseudoinverse identity $\vec{U}_1^\T\vec{A}(\vec{S}_1\vec{U}_1^\T\vec{A})^+ = \vec{S}_1^+$ (which holds since $\vec{U}_1^\T\vec{A}$ is invertible and $\vec{S}_1$ has full column rank).

Finally, a direct computation using the independence of $\vec{S}_1$ and $\vec{S}_2$, together with the fact that $\|\vec{U}_2^\T\vec{r}\| = \|\vec{r}\|$, gives

$$\EE[\|\vec{S}_1^+ \vec{S}_2\vec{U}_2^\T\vec{r}\|^2] = \EE[ \| \vec{S}_1^+ \|_\F^2 ] \|\vec{U}_2^\T\vec{r}\|^2 = \frac{d}{k-d-1} \|\vec{r}\|^2.$$

Combining the above equations gives the result.

A similar proof can be found on Ethan’s blog.
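The expectation formula (and the unbiasedness of $\widehat{\vec{x}}$) can be checked by Monte Carlo. This is a minimal check, assuming $\Call{Gaussian}(k,d)$ denotes i.i.d. standard normal entries; any fixed scaling of $\vec{S}$ cancels in $\widehat{\vec{x}}$, and the small problem sizes here are chosen only to keep the check fast:

```python
import numpy as np

rng = np.random.default_rng(1)
n, d, k = 500, 20, 60
A = rng.standard_normal((n, d))
b = rng.standard_normal(n)

x_star = np.linalg.lstsq(A, b, rcond=None)[0]
r2 = np.linalg.norm(b - A @ x_star)**2               # ||b - A x*||^2

trials = 2000
res2 = np.zeros(trials)
xhat_mean = np.zeros(d)
for t in range(trials):
    S = rng.standard_normal((k, n))                  # Gaussian sketch
    xhat = np.linalg.lstsq(S @ A, S @ b, rcond=None)[0]
    res2[t] = np.linalg.norm(b - A @ xhat)**2
    xhat_mean += xhat / trials

print(res2.mean() / r2, 1 + d / (k - d - 1))         # empirical vs predicted ratio
print(np.linalg.norm(xhat_mean - x_star))            # small: xhat is unbiased
```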

Bounds based on Subspace Embeddings

If $\vec{S}$ is a subspace embedding for $[\vec{A},\vec{b}]$ with distortion $\varepsilon$, then it is easy to see that the solution $\widehat{\vec{x}}$ returned by the sketch-and-solve algorithm is accurate: $\|\vec{b}-\vec{A}\widehat{\vec{x}}\| \leq \frac{1+\varepsilon}{1-\varepsilon}\|\vec{b}-\vec{A}\vec{x}^*\|$.

Proof

Observe that, for any $\vec{z}\in\R^{d}$, $\vec{b} - \vec{A}\vec{z} \in \range([\vec{A}, \vec{b}])$ and hence, since $\vec{S}$ is a subspace embedding for $[\vec{A},\vec{b}]$, we have

$$(1-\varepsilon)\|\vec{b} - \vec{A}\vec{z}\|\leq \| \vec{S}(\vec{b} - \vec{A}\vec{z}) \| \leq (1+\varepsilon)\|\vec{b} - \vec{A}\vec{z}\|.$$

Therefore,

$$\begin{aligned} \|\vec{b} - \vec{A}\widehat{\vec{x}}\| &\leq \frac{1}{1-\varepsilon} \|\vec{S}(\vec{b} - \vec{A}\widehat{\vec{x}})\| \\&= \frac{1}{1-\varepsilon} \min_{\vec{x}} \|\vec{S}(\vec{b} - \vec{A}\vec{x})\| \\&\leq \frac{1+\varepsilon}{1-\varepsilon} \min_{\vec{x}} \|\vec{b} - \vec{A}\vec{x}\| \\&= \frac{1+\varepsilon}{1-\varepsilon} \|\vec{b} - \vec{A}\vec{x}^*\|. \end{aligned}$$

While the proof of Theorem 4.3 is extremely simple, it does not recover the rate predicted by our analysis of the Gaussian case (recall that a Gaussian matrix is an $\varepsilon$ subspace embedding when $k = O(d/\varepsilon^2)$). In fact, as outlined on Ethan’s blog, a more careful analysis yields a quadratically better rate

$$\|\vec{b}-\vec{A}\widehat{\vec{x}}\| \leq (1+4\varepsilon^2 + O(\varepsilon^3)) \min_{\vec{x}} \|\vec{b}-\vec{A}\vec{x}\|.$$

Explicit bounds are available.
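To get a feel for the quadratic improvement, the snippet below compares the simple factor $(1+\varepsilon)/(1-\varepsilon)$ with the leading term $1+4\varepsilon^2$ of the sharper bound for a few small values of $\varepsilon$ (the comparison ignores the $O(\varepsilon^3)$ term, so it is only meaningful for small $\varepsilon$):

```python
# Compare the simple and refined sketch-and-solve bounds for small distortion eps
for eps in [0.1, 0.05, 0.01]:
    naive = (1 + eps) / (1 - eps)   # from the simple subspace-embedding argument
    refined = 1 + 4 * eps**2        # leading term of the sharper analysis
    print(f"eps={eps}: naive={naive:.4f}, refined~{refined:.4f}")
```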

Numerical Experiment

Let’s look at the accuracy of the sketch-and-solve algorithm as a function of the sketching dimension $k$ for the different mixing-based sketches.

import numpy as np

# Generate a random matrix with controlled condition number
n = 50_000
d = 100
cond = 1e8

U, s, Vt = np.linalg.svd(np.random.rand(n,d),full_matrices=False)
s = np.geomspace(1/cond,1,d)  # define singular values
A = U@np.diag(s)@Vt

# define ||b|| and ||b-Ax*||
b_norm = 1
residual_norm = .1

# construct right hand side
v = np.random.randn(n)
v_span = U @(U.T@v)
v_perp = v - v_span
v_span /= np.linalg.norm(v_span)
v_perp /= np.linalg.norm(v_perp)

b = v_span*np.sqrt(b_norm**2-residual_norm**2)+v_perp*residual_norm

# construct true solution
Ax_true = U@(U.T@b)
# Define sketching methods; gaussian_sketch, trig_sketch, and sparse_stack_sketch
# are the sketch constructors defined earlier in these notes
sketch_methods = [
    {'name': 'Gaussian',
     'func': lambda k,rng: gaussian_sketch(n,k,rng)},
    {'name': 'Trig',
     'func': lambda k,rng: trig_sketch(n,k,rng)},
    {'name': 'Sparse (zeta=4)',
     'func': lambda k,rng: sparse_stack_sketch(n,k,4,rng)},
    {'name': 'Sparse (zeta=8)',
     'func': lambda k,rng: sparse_stack_sketch(n,k,8,rng)},
]
# Run experiment 
ks = np.geomspace(2*d,3000,10,dtype=int)

n_repeat = 20
results = {}

for sketch_method in sketch_methods:
    method_name = sketch_method['name']

    errors = np.zeros((n_repeat,len(ks)))
    for i in range(n_repeat):
        rng = np.random.RandomState(i)
        for j,k in enumerate(ks):
        
            S = sketch_method['func'](k,rng)
            
            x = np.linalg.lstsq(S@A,S@b,rcond=None)[0]
            errors[i,j] = np.linalg.norm(Ax_true-A@x)

    results[method_name] = {
        'error': errors,
    }

import matplotlib.pyplot as plt

# Plot the results
σ = 0.1  # quantile level for the shaded uncertainty bands

fig,axs = plt.subplots(1,1,figsize=(10,4))

for sketch_method in sketch_methods:
    method_name = sketch_method['name']
    # Get the error data for this problem and method
    error = results[method_name]['error']
    
    bot, mid, top = np.quantile(error, [σ, .5, 1-σ], axis=0)
    plt.plot(ks/d, mid, label=method_name, linewidth=2)
    plt.fill_between(ks/d, bot, top, alpha=.2)

# Theoretical prediction for a Gaussian sketch: (E||A(x*-xhat)||^2)^(1/2) = sqrt(d/(k-d-1)) * ||b-Ax*||
plt.plot(ks/d, np.sqrt(d/(ks-d-1))*residual_norm, ls=':', color='k', label=r'Gaussian expectation')

plt.ylabel(r'accuracy: $\|A(x^*-\widehat{x})\|$')
plt.xlabel('Sketch size ($k/d$)')
plt.yscale('log')
plt.xscale('log')
plt.legend()

plt.savefig('sketch_and_solve.svg')
plt.close()
Figure: accuracy $\|A(x^*-\widehat{x})\|$ versus relative sketch size $k/d$ (log-log) for the Gaussian, trigonometric, and sparse sketches. All methods perform similarly and closely track the Gaussian expectation curve $\sqrt{d/(k-d-1)}\,\|b-Ax^*\|$.

As in our numerical study of the embedding dimension, all sketching distributions behave very similarly to the Gaussian sketch.