The active regression problem is a variant of the standard linear regression problem.

Most of the algorithms presented in Chapter 4 on linear regression require reading the entire vector $\vec{b}$. In the active regression problem, however, we measure cost by the number of entries of $\vec{b}$ that are observed, so such methods are off the table.

This section outlines basic sampling-based approaches to the active regression problem that aim to use a small number of entry evaluations.

Leverage score sampling

Recall that $\Call{leverage-dist}(\vec{A})$ is the distribution corresponding to sampling an index from $\{1, \ldots, n\}$ with probability proportional to the leverage scores $(\ell_1, \ldots, \ell_n)$ of $\vec{A}$.
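As a concrete illustration, here is a minimal NumPy sketch of this sampling distribution. It computes the leverage scores as the squared row norms of an orthonormal basis for the column span of $\vec{A}$ (via a reduced QR factorization), then draws indices proportionally; the function names are illustrative, not from the text.

```python
import numpy as np

def leverage_scores(A):
    """Leverage scores of A: squared row norms of an orthonormal
    basis Q for the column span of A (reduced QR factorization)."""
    Q, _ = np.linalg.qr(A, mode="reduced")
    return np.sum(Q**2, axis=1)

def leverage_dist_sample(A, k, seed=None):
    """Draw k row indices from {0, ..., n-1} with probability
    proportional to the leverage scores of A."""
    rng = np.random.default_rng(seed)
    ell = leverage_scores(A)
    p = ell / ell.sum()  # leverage scores of a rank-d matrix sum to d
    idx = rng.choice(A.shape[0], size=k, replace=True, p=p)
    return idx, p

# Example: for a tall full-rank 100 x 3 matrix, the scores sum to 3
A = np.random.default_rng(0).standard_normal((100, 3))
idx, p = leverage_dist_sample(A, k=20, seed=1)
```

Note that each leverage score lies in $[0, 1]$ and the scores sum to the rank of $\vec{A}$, which is why normalizing them yields a valid probability distribution.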

Analysis

Note that Algorithm 7.1 is nothing more than Algorithm 4.3 (sketch-and-solve) using the leverage-score sketch. Theorem 2.9 guarantees that the leverage-score sketch is a subspace embedding for $\vec{A}$. However, we cannot immediately apply the analysis techniques used for sketch-and-solve, because these require that the sketch is a subspace embedding for $[\vec{A}, \vec{b}]$. The standard approach is instead to make use of an approximate matrix multiplication guarantee.
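The sketch-and-solve procedure with a leverage-score sketch can be sketched in code as follows. This is a hedged illustration, not a transcription of Algorithm 7.1: the function names and the entry oracle `b_entry(i)` are assumptions introduced here to emphasize that only the sampled entries of $\vec{b}$ are ever read.

```python
import numpy as np

def active_leverage_lstsq(A, b_entry, k, seed=None):
    """Sketch-and-solve least squares with a leverage-score sketch.
    Only k entries of b are observed, through the oracle
    b_entry(i) -> b[i] (an illustrative interface, not from the text)."""
    rng = np.random.default_rng(seed)
    # Leverage scores via a reduced QR factorization of A
    Q, _ = np.linalg.qr(A, mode="reduced")
    ell = np.sum(Q**2, axis=1)
    p = ell / ell.sum()
    # Sample k rows proportional to leverage and rescale by
    # 1/sqrt(k * p_i) so the sketch is unbiased in expectation
    idx = rng.choice(A.shape[0], size=k, replace=True, p=p)
    w = 1.0 / np.sqrt(k * p[idx])
    SA = w[:, None] * A[idx]
    Sb = w * np.array([b_entry(i) for i in idx])  # k entry observations
    x, *_ = np.linalg.lstsq(SA, Sb, rcond=None)
    return x

# Example: a noiseless problem where b lies in the range of A,
# so the sketched solution recovers x_true exactly
rng = np.random.default_rng(0)
A = rng.standard_normal((200, 3))
x_true = np.array([1.0, -2.0, 0.5])
b = A @ x_true
x_hat = active_leverage_lstsq(A, lambda i: b[i], k=30, seed=1)
```

The rescaling by $1/\sqrt{k\, p_i}$ is what makes the sampled sketch match the subspace embedding guarantee of Theorem 2.9 in expectation; without it, heavily sampled rows would be overweighted.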

Proof

See Raphel’s wiki.