
The sketch-and-solve paradigm is perhaps the simplest randomized approach to the linear regression problem (Linear Regression). It produces only a low-accuracy solution, so it is best suited to special computational settings where more accurate alternatives are not viable. Nevertheless, it is pedagogically important.
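Concretely, the method amounts to sketching the regression problem and solving the much smaller sketched problem. The following is a minimal, illustrative NumPy implementation (the Gaussian sketch and the function name here are illustrative choices of ours, not the helpers used later in this chapter):

```python
import numpy as np

def sketch_and_solve(A, b, k, rng=None):
    """Minimal sketch-and-solve for min_x ||b - A x|| using a Gaussian sketch.

    A is n-by-d, b has length n, and k >= d is the sketch size.
    """
    rng = np.random.default_rng(rng)
    n = A.shape[0]
    S = rng.standard_normal((k, n)) / np.sqrt(k)   # Gaussian sketching matrix
    # Solve the much smaller k-by-d least-squares problem
    return np.linalg.lstsq(S @ A, S @ b, rcond=None)[0]
```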

To quantify the accuracy, it is common to compare the residual of the approximate solution $\widehat{\vec{x}}$ to the optimal residual. Specifically, we want to ensure that, for some accuracy parameter $\varepsilon > 0$,

$$\|\vec{b}-\vec{A}\widehat{\vec{x}}\| \leq (1+\varepsilon) \|\vec{b}-\vec{A}\vec{x}^*\|, \tag{4.1}$$

where $\vec{x}^*$ is the solution to Linear Regression. Such a guarantee on the residual is equivalent to a bound on the error $\|\vec{A}(\vec{x}^*-\widehat{\vec{x}})\|$.

Proof

Note that the true residual $\vec{b} - \vec{A}\vec{x}^*$ is orthogonal to the range of $\vec{A}$; i.e., $\vec{A}^\T(\vec{b} - \vec{A}\vec{x}^*) = \vec{0}$. Thus, for any $\tilde{\vec{x}} \in \R^d$, by the Pythagorean theorem,

$$\begin{aligned} \|\vec{b} - \vec{A}\tilde{\vec{x}}\|^2 &= \|\vec{b} - \vec{A}(\vec{x}^* + \tilde{\vec{x}} - \vec{x}^*)\|^2 \\&= \| \vec{b} - \vec{A}\vec{x}^* - \vec{A}(\tilde{\vec{x}} - \vec{x}^*) \|^2 \\&= \|\vec{b} - \vec{A}\vec{x}^*\|^2 + \|\vec{A}(\vec{x}^* - \tilde{\vec{x}})\|^2. \end{aligned}$$

Rearranging, we find that

$$\|\vec{A}(\vec{x}^* - \tilde{\vec{x}})\| = \left(\|\vec{b} - \vec{A}\tilde{\vec{x}}\|^2 - \|\vec{b} - \vec{A}\vec{x}^*\|^2\right)^{1/2}.$$
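This identity is easy to verify numerically. The sketch below uses a small random instance of our own choosing purely for illustration:

```python
import numpy as np

# Verify ||A(x* - x~)|| = (||b - A x~||^2 - ||b - A x*||^2)^{1/2} on a random instance
rng = np.random.default_rng(0)
A = rng.standard_normal((200, 10))
b = rng.standard_normal(200)

x_star = np.linalg.lstsq(A, b, rcond=None)[0]        # optimal solution
x_tilde = x_star + 0.01 * rng.standard_normal(10)    # any other candidate

lhs = np.linalg.norm(A @ (x_star - x_tilde))
rhs = np.sqrt(np.linalg.norm(b - A @ x_tilde)**2 - np.linalg.norm(b - A @ x_star)**2)
print(lhs, rhs)  # the two quantities agree up to rounding error
```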

Active Regression

An important setting where sketch-and-solve is particularly useful is the active regression problem. In this task, the cost is measured by the number of entries of $\vec{b}$ that we observe. By using a subsampling sketch, we can solve the active regression problem while observing only a few entries of $\vec{b}$; we return to this problem in detail in the Chapter on Sampling Based Methods. A minimal illustration of the idea appears below.
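As a hedged illustration only, the following sketch solves a regression problem while reading just $k$ entries of $\vec{b}$. Uniform row subsampling and the `b_query` interface are our own simplifications; better sampling distributions are discussed in the sampling chapter.

```python
import numpy as np

def active_regression_uniform(A, b_query, k, rng=None):
    """Sketch-and-solve with a uniform row-subsampling sketch.

    b_query(idx) returns the entries of b at the requested indices, so
    only k entries of b are ever observed.
    """
    rng = np.random.default_rng(rng)
    n = A.shape[0]
    idx = rng.choice(n, size=k, replace=True)     # sampled row indices
    scale = np.sqrt(n / k)                        # rescaling for uniform sampling
    SA = scale * A[idx]                           # subsampled, rescaled matrix
    Sb = scale * b_query(idx)                     # only k entries of b observed
    return np.linalg.lstsq(SA, Sb, rcond=None)[0]
```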

Analysis

The Gaussian case

To get a handle on the sketch-and-solve algorithm, we will consider the case where the sketching matrix $\vec{S}$ is a Gaussian random matrix. Since other mixing-based sketching methods behave similarly to a Gaussian sketch with regard to the embedding dimension $k$, this analysis gives us insight into the performance of sketch-and-solve with other sketching methods. Note that a Gaussian sketch does not yield a computationally efficient algorithm, so other mixing-based sketching methods should be used in practice.

The analysis below shows that $\EE[\|\vec{b}-\vec{A}\widehat{\vec{x}}\|^2] = \big(1 + \tfrac{d}{k-d-1}\big)\|\vec{b}-\vec{A}\vec{x}^*\|^2$, so (4.1) is satisfied, in expectation, when $k = O(d/\varepsilon)$.

Proof

Decompose

$$\vec{b} = \vec{A}\vec{A}^+ \vec{b} + (\vec{b} - \vec{A}\vec{A}^+ \vec{b}) = \vec{A}\vec{x}^* + \vec{r},$$

where $\vec{r} = \vec{b} - \vec{A}\vec{x}^*$ is the residual vector. Since $\vec{S}\vec{A}$ has full column rank (almost surely), $(\vec{S}\vec{A})^+\vec{S}\vec{A} = \vec{I}$, and we can write

$$\widehat{\vec{x}} = \vec{x}^* + (\vec{S}\vec{A})^+ \vec{S}(\vec{b} - \vec{A}\vec{A}^+ \vec{b}).$$

Let $\vec{U}_1$ be an orthonormal basis for the range of $\vec{A}$ and $\vec{U}_2$ be an orthonormal basis for the orthogonal complement of the range of $\vec{A}$. Decompose

$$\vec{S} = \vec{S}_1\vec{U}_1^\T + \vec{S}_2\vec{U}_2^\T, \qquad \vec{S}_1 = \vec{S}\vec{U}_1, \quad \vec{S}_2 = \vec{S}\vec{U}_2.$$

Since $[\vec{U}_1 \, \vec{U}_2]$ is an orthogonal matrix, by the Gaussian orthogonal invariance property, $\vec{S}_1\sim \Call{Gaussian}(k,d)$ and $\vec{S}_2\sim \Call{Gaussian}(k,n-d)$ are independent. Moreover, by the optimality of $\vec{x}^*$, $\vec{r}$ is orthogonal to the range of $\vec{A}$. Hence,

$$\widehat{\vec{x}} = \vec{x}^* + (\vec{S}_1\vec{U}_1^\T\vec{A})^+ \vec{S}_2\vec{U}_2^\T\vec{r}.$$

Therefore, using the independence of $\vec{S}_1$ and $\vec{S}_2$ and the fact that $\EE[\vec{S}_2] = \vec{0}$,

$$\EE[\widehat{\vec{x}}] = \vec{x}^* + \EE[(\vec{S}_1\vec{U}_1^\T\vec{A})^+]\, \EE[\vec{S}_2]\,\vec{U}_2^\T\vec{r} = \vec{x}^*.$$

Next, note that

$$\begin{aligned} \|\vec{b} - \vec{A}\widehat{\vec{x}}\|^2 &= \| \vec{r} - \vec{A}(\vec{S}_1\vec{U}_1^\T\vec{A})^+ \vec{S}_2\vec{U}_2^\T\vec{r}\|^2 \\&= \|\vec{r}\|^2 + \|\vec{A}(\vec{S}_1\vec{U}_1^\T\vec{A})^+ \vec{S}_2\vec{U}_2^\T\vec{r}\|^2 \\&= \|\vec{r}\|^2 + \|\vec{U}_1^\T\vec{A}(\vec{S}_1\vec{U}_1^\T\vec{A})^+ \vec{S}_2\vec{U}_2^\T\vec{r}\|^2 \\&= \|\vec{r}\|^2 + \|\vec{S}_1^+ \vec{S}_2\vec{U}_2^\T\vec{r}\|^2, \end{aligned}$$

where we used the fact that $\vec{r}$ is orthogonal to the range of $\vec{A}$, the Pythagorean theorem and orthogonal invariance of the Euclidean norm, and the pseudoinverse identity $\vec{U}_1^\T\vec{A}(\vec{S}_1\vec{U}_1^\T\vec{A})^+ = \vec{S}_1^+$ (which holds since $\vec{U}_1^\T\vec{A}$ is invertible and $\vec{S}_1$ has full column rank).

Finally, a direct computation using the independence of $\vec{S}_1$ and $\vec{S}_2$, together with the fact that $\|\vec{U}_2^\T\vec{r}\| = \|\vec{r}\|$, gives

$$\EE[\|\vec{S}_1^+ \vec{S}_2\vec{U}_2^\T\vec{r}\|^2] = \EE[ \| \vec{S}_1^+ \|_\F^2 ] \|\vec{U}_2^\T\vec{r}\|^2 = \frac{d}{k-d-1} \|\vec{r}\|^2.$$

Combining the above equations gives the result.

A similar proof can be found on Ethan’s blog.
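The expectation formula (and the unbiasedness of $\widehat{\vec{x}}$) can be checked by Monte Carlo. This is a minimal check, assuming $\Call{Gaussian}(k,d)$ denotes i.i.d. standard normal entries; any fixed scaling of $\vec{S}$ cancels in $\widehat{\vec{x}}$, and the small problem sizes here are chosen only to keep the check fast:

```python
import numpy as np

rng = np.random.default_rng(1)
n, d, k = 500, 20, 60
A = rng.standard_normal((n, d))
b = rng.standard_normal(n)

x_star = np.linalg.lstsq(A, b, rcond=None)[0]
r2 = np.linalg.norm(b - A @ x_star)**2               # ||b - A x*||^2

trials = 2000
res2 = np.zeros(trials)
xhat_mean = np.zeros(d)
for t in range(trials):
    S = rng.standard_normal((k, n))                  # Gaussian sketch
    xhat = np.linalg.lstsq(S @ A, S @ b, rcond=None)[0]
    res2[t] = np.linalg.norm(b - A @ xhat)**2
    xhat_mean += xhat / trials

print(res2.mean() / r2, 1 + d / (k - d - 1))         # empirical vs predicted ratio
print(np.linalg.norm(xhat_mean - x_star))            # small: xhat is unbiased
```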

Bounds based on Subspace Embeddings

If $\vec{S}$ is a subspace embedding for $[\vec{A},\vec{b}]$ with distortion $\varepsilon$, then it is easy to see that the solution $\widehat{\vec{x}}$ returned by the sketch-and-solve algorithm is accurate: $\|\vec{b}-\vec{A}\widehat{\vec{x}}\| \leq \frac{1+\varepsilon}{1-\varepsilon}\|\vec{b}-\vec{A}\vec{x}^*\|$.

Proof

Observe that, for any $\vec{z}\in\R^{d}$, $\vec{b} - \vec{A}\vec{z} \in \range([\vec{A}, \vec{b}])$ and hence, since $\vec{S}$ is a subspace embedding for $[\vec{A},\vec{b}]$, we have

$$(1-\varepsilon)\|\vec{b} - \vec{A}\vec{z}\|\leq \| \vec{S}(\vec{b} - \vec{A}\vec{z}) \| \leq (1+\varepsilon)\|\vec{b} - \vec{A}\vec{z}\|.$$

Therefore,

$$\begin{aligned} \|\vec{b} - \vec{A}\widehat{\vec{x}}\| &\leq \frac{1}{1-\varepsilon} \|\vec{S}(\vec{b} - \vec{A}\widehat{\vec{x}})\| \\&= \frac{1}{1-\varepsilon} \min_{\vec{x}} \|\vec{S}(\vec{b} - \vec{A}\vec{x})\| \\&\leq \frac{1+\varepsilon}{1-\varepsilon} \min_{\vec{x}} \|\vec{b} - \vec{A}\vec{x}\| \\&= \frac{1+\varepsilon}{1-\varepsilon} \|\vec{b} - \vec{A}\vec{x}^*\|. \end{aligned}$$

While the proof of Theorem 4.3 is extremely simple, it does not recover the rate predicted by our analysis of the Gaussian case (recall that a Gaussian matrix is an $\varepsilon$ subspace embedding when $k = O(d/\varepsilon^2)$). In fact, as outlined on Ethan’s blog, a more careful analysis yields a quadratically better rate

$$\|\vec{b}-\vec{A}\widehat{\vec{x}}\| \leq (1+4\varepsilon^2 + O(\varepsilon^3)) \min_{\vec{x}} \|\vec{b}-\vec{A}\vec{x}\|.$$

Explicit bounds are available.
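To get a feel for the quadratic improvement, the snippet below compares the simple factor $(1+\varepsilon)/(1-\varepsilon)$ with the leading term $1+4\varepsilon^2$ of the sharper bound for a few small values of $\varepsilon$ (the comparison ignores the $O(\varepsilon^3)$ term, so it is only meaningful for small $\varepsilon$):

```python
# Compare the simple and refined sketch-and-solve bounds for small distortion eps
for eps in [0.1, 0.05, 0.01]:
    naive = (1 + eps) / (1 - eps)   # from the simple subspace-embedding argument
    refined = 1 + 4 * eps**2        # leading term of the sharper analysis
    print(f"eps={eps}: naive={naive:.4f}, refined~{refined:.4f}")
```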

Numerical Experiment

Let’s look at the accuracy of the sketch-and-solve algorithm as a function of the sketching dimension $k$ for the different mixing-based sketches.

import numpy as np

# Generate a random matrix with controlled condition number
n = 50_000
d = 100
cond = 1e8

U, s, Vt = np.linalg.svd(np.random.rand(n,d),full_matrices=False)
s = np.geomspace(1/cond,1,d)  # define singular values
A = U@np.diag(s)@Vt

# define ||b|| and ||b-Ax*||
b_norm = 1
residual_norm = .1

# construct right hand side
v = np.random.randn(n)
v_span = U @(U.T@v)
v_perp = v - v_span
v_span /= np.linalg.norm(v_span)
v_perp /= np.linalg.norm(v_perp)

b = v_span*np.sqrt(b_norm**2-residual_norm**2)+v_perp*residual_norm

# construct true solution
Ax_true = U@(U.T@b)
# Define sketching methods; gaussian_sketch, trig_sketch, and sparse_stack_sketch
# are the sketch constructors defined earlier in these notes
sketch_methods = [
    {'name': 'Gaussian',
     'func': lambda k,rng: gaussian_sketch(n,k,rng)},
    {'name': 'Trig',
     'func': lambda k,rng: trig_sketch(n,k,rng)},
    {'name': 'Sparse (zeta=4)',
     'func': lambda k,rng: sparse_stack_sketch(n,k,4,rng)},
    {'name': 'Sparse (zeta=8)',
     'func': lambda k,rng: sparse_stack_sketch(n,k,8,rng)},
]
# Run experiment 
ks = np.geomspace(2*d,3000,10,dtype=int)

n_repeat = 20
results = {}

for sketch_method in sketch_methods:
    method_name = sketch_method['name']

    errors = np.zeros((n_repeat,len(ks)))
    for i in range(n_repeat):
        rng = np.random.RandomState(i)
        for j,k in enumerate(ks):
        
            S = sketch_method['func'](k,rng)
            
            x = np.linalg.lstsq(S@A,S@b,rcond=None)[0]
            errors[i,j] = np.linalg.norm(Ax_true-A@x)

    results[method_name] = {
        'error': errors,
    }

import matplotlib.pyplot as plt

# Plot the results
σ = 0.1  # quantile level for the shaded uncertainty bands

fig,axs = plt.subplots(1,1,figsize=(10,4))

for sketch_method in sketch_methods:
    method_name = sketch_method['name']
    # Get the error data for this problem and method
    error = results[method_name]['error']
    
    bot, mid, top = np.quantile(error, [σ, .5, 1-σ], axis=0)
    plt.plot(ks/d, mid, label=method_name, linewidth=2)
    plt.fill_between(ks/d, bot, top, alpha=.2)

# Theoretical prediction for a Gaussian sketch: (E||A(x*-xhat)||^2)^(1/2) = sqrt(d/(k-d-1)) * ||b-Ax*||
plt.plot(ks/d, np.sqrt(d/(ks-d-1))*residual_norm, ls=':', color='k', label=r'Gaussian expectation')

plt.ylabel(r'accuracy: $\|A(x^*-\widehat{x})\|$')
plt.xlabel('Sketch size ($k/d$)')
plt.yscale('log')
plt.xscale('log')
plt.legend()

plt.savefig('sketch_and_solve.svg')
plt.close()
Figure: accuracy $\|A(x^*-\widehat{x})\|$ versus relative sketch size $k/d$ (log-log) for the Gaussian, trigonometric, and sparse sketches. All methods perform similarly and closely track the Gaussian expectation curve $\sqrt{d/(k-d-1)}\,\|b-Ax^*\|$.

As in our numerical study of the embedding dimension, all sketching distributions behave very similarly to the Gaussian sketch.