Familiar facts about scalars also extend to matrices. The following results are particularly useful for analyzing the performance of randomized matrix algorithms.
Proof
We use the algebraic identity $(a - b)^2 = (a - c)^2 + (c - b)^2 + 2(a - c)(c - b)$, applied with $a = A$, $b = X$, and $c = \mathbb{E}[X]$.
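As a sketch of how the identity yields the decomposition, take expectations on both sides; the Frobenius-norm form below is an assumption about the version stated in the text:
\[
\mathbb{E}\,\|A - X\|_{\mathrm{F}}^2
  = \|A - \mathbb{E}[X]\|_{\mathrm{F}}^2
  + \mathbb{E}\,\|\mathbb{E}[X] - X\|_{\mathrm{F}}^2
  + 2\,\mathbb{E}\bigl\langle A - \mathbb{E}[X],\; \mathbb{E}[X] - X \bigr\rangle_{\mathrm{F}} .
\]
The cross term vanishes because $A - \mathbb{E}[X]$ is deterministic while $\mathbb{E}[X] - X$ has mean zero, which leaves the sum of the squared bias and the variance.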
We can define the variance of a random matrix similarly to the variance of a random scalar.
The bias-variance decomposition shows that the mean-square error of any estimator can be separated into a systematic bias component and a random variance component.
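As a quick numerical illustration, the snippet below checks the Frobenius-norm form of the decomposition on a toy biased estimator (a minimal sketch; the estimator, dimensions, and NumPy usage are illustrative choices, not taken from the text):
\begin{verbatim}
import numpy as np

rng = np.random.default_rng(0)

m, n = 50, 40
A = rng.standard_normal((m, n))        # target matrix
B = 0.1 * rng.standard_normal((m, n))  # fixed systematic bias (toy choice)

def estimator():
    # Toy biased estimator of A: X = A + B + noise, so E[X] = A + B.
    G = rng.standard_normal((m, n))    # zero-mean fluctuation
    return A + B + G

trials = 10_000
mse = np.mean([np.linalg.norm(A - estimator(), "fro")**2 for _ in range(trials)])

bias_sq = np.linalg.norm(B, "fro")**2  # ||A - E[X]||_F^2
variance = m * n                       # E||X - E[X]||_F^2 for standard normal noise

print(mse, bias_sq + variance)         # the two numbers should nearly agree
\end{verbatim}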
In many RandNLA algorithms, we average iid copies of a random matrix estimator to reduce variance.
As in the scalar case, the variance decreases in inverse proportion to the number of copies averaged.
Proof
By Theorem 1.4 and Definition 1.6, without loss of generality, we can assume $\mathbb{E}[X] = 0$.
Then, expanding the square and using linearity of expectation together with the fact that $X_i$ and $X_j$ are independent (if $i \neq j$), the cross terms vanish and the variance of the average is $1/n$ times the variance of a single copy.
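To spell out the calculation, write the average of the $n$ copies as $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$; with the Frobenius-norm definition of matrix variance (an assumption about the definition used in the text) and $\mathbb{E}[X_i] = 0$,
\[
\mathbb{E}\,\bigl\|\bar{X}\bigr\|_{\mathrm{F}}^2
  = \frac{1}{n^2} \sum_{i,j=1}^{n} \mathbb{E}\bigl\langle X_i, X_j \bigr\rangle_{\mathrm{F}}
  = \frac{1}{n^2} \sum_{i=1}^{n} \mathbb{E}\,\bigl\|X_i\bigr\|_{\mathrm{F}}^2
  = \frac{1}{n}\,\mathbb{E}\,\bigl\|X\bigr\|_{\mathrm{F}}^2 ,
\]
since the off-diagonal terms vanish by independence and the zero-mean assumption.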
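The $1/n$ scaling is also easy to observe numerically. The following minimal sketch uses a rank-one column-sampling estimator as an illustrative (not prescribed) example:
\begin{verbatim}
import numpy as np

rng = np.random.default_rng(1)

m, n = 30, 20
A = rng.standard_normal((m, n))

def one_copy():
    # Unbiased rank-one estimator of A: keep one uniformly random column, rescaled.
    j = rng.integers(n)
    X = np.zeros((m, n))
    X[:, j] = n * A[:, j]
    return X

def averaged(k):
    # Average of k iid copies of the estimator.
    return sum(one_copy() for _ in range(k)) / k

def empirical_variance(k, trials=2000):
    # Monte Carlo estimate of E||Xbar - A||_F^2 when averaging k copies.
    return np.mean([np.linalg.norm(averaged(k) - A, "fro")**2 for _ in range(trials)])

for k in (1, 2, 4, 8):
    print(k, empirical_variance(k))    # roughly halves each time k doubles
\end{verbatim}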
Markov’s inequality implies that a random variable is unlikely to be far from its mean, with the deviation probability controlled by the variance (this is Chebyshev’s inequality).
Proof
Apply Markov’s inequality to $(X - \mathbb{E}[X])^2$.
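Spelling out that one-liner (a sketch, with the deviation threshold written as $t$ and the variance notation assumed):
\[
\mathbb{P}\bigl\{ |X - \mathbb{E}[X]| \ge t \bigr\}
  = \mathbb{P}\bigl\{ (X - \mathbb{E}[X])^2 \ge t^2 \bigr\}
  \le \frac{\mathbb{E}\,(X - \mathbb{E}[X])^2}{t^2}
  = \frac{\operatorname{Var}[X]}{t^2} .
\]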
There are many other concentration inequalities (e.g., Bernstein, Chernoff, and Hoeffding), which provide better dependence on $t$ for “nice” random variables.
Such bounds are used in many RandNLA analyses, but will not be particularly important in this book.