
Properties of the expectation

The expectation operator has several useful properties that are frequently used in the analysis of randomized algorithms.

The first is the law of total expectation, which is useful for handling separate sources of randomness one at a time.
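As a quick sanity check (a minimal sketch of my own, not from the text, using a toy scalar example), the following Monte Carlo experiment conditions on one source of randomness at a time and verifies that $\EE[X] = \EE[\EE[X \mid Y]]$:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Two sources of randomness: first draw Y, then draw X conditional on Y.
Y = rng.integers(1, 7, size=n)    # a fair die roll, values 1..6
X = rng.normal(loc=Y, scale=1.0)  # X | Y ~ Normal(Y, 1)

# Left-hand side: E[X] estimated directly from the samples.
lhs = X.mean()

# Right-hand side: E[E[X | Y]], handling the randomness in Y (known to be
# uniform on {1,...,6}) and the randomness in X | Y separately.
rhs = sum((1 / 6) * X[Y == y].mean() for y in range(1, 7))

print(lhs, rhs)  # both are approximately E[Y] = 3.5
```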

Matrix equations

Familiar facts about scalars also extend to matrices. The following results are particularly useful for analyzing the performance of randomized matrix algorithms.

Proof

We use the algebraic identity $(a-b)^2 = (a-c)^2 + (c-b)^2 + 2(a-c)(c-b)$ with $a = \vec{A}$, $b = \vec{X}$, and $c = \EE[\vec{X}]$:

$$
\|\vec{A} - \vec{X}\|_\F^2 = \|\vec{A} - \EE[\vec{X}]\|_\F^2 + \|\EE[\vec{X}] - \vec{X}\|_\F^2 + 2\langle\vec{A} - \EE[\vec{X}], \EE[\vec{X}] - \vec{X}\rangle_\F,
$$

where $\langle \vec{X},\vec{Y}\rangle_\F := \tr(\vec{X}^\T\vec{Y})$. Next, by the linearity of expectation,

$$
\begin{aligned}
\EE\left[ \langle\vec{A} - \EE[\vec{X}], \EE[\vec{X}] - \vec{X}\rangle_\F \right]
&= \langle\vec{A} - \EE[\vec{X}], \EE[\EE[\vec{X}] - \vec{X}]\rangle_\F
\\&= \langle\vec{A} - \EE[\vec{X}], \EE[\vec{X}] - \EE[\vec{X}]\rangle_\F
\\&= 0.
\end{aligned}
$$

The result follows by taking the expectation of the first identity and dropping the cross term, which has zero expectation.

We can define the variance of a random matrix analogously to the variance of a random scalar, as $\VV[\vec{X}] := \EE[\|\vec{X} - \EE[\vec{X}]\|_\F^2]$.

The bias-variance decomposition shows that the mean squared error of any estimator separates into a systematic bias component and a random variance component.
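As a hedged illustration (a toy example of my own, not from the text), the following Monte Carlo experiment checks the decomposition $\EE[\|\vec{A} - \vec{X}\|_\F^2] = \|\vec{A} - \EE[\vec{X}]\|_\F^2 + \VV[\vec{X}]$ for a deliberately biased column-sampling estimator:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 4))      # fixed target matrix

def estimator():
    # Hypothetical toy estimator: scaled uniform column sampling,
    # shrunk by 0.9 so that E[X] != A (i.e., the estimator is biased).
    j = rng.integers(A.shape[1])
    X = np.zeros_like(A)
    X[:, j] = A.shape[1] * A[:, j]
    return 0.9 * X

samples = [estimator() for _ in range(100_000)]
EX = np.mean(samples, axis=0)        # approximates E[X]

mse      = np.mean([np.linalg.norm(A - X, "fro") ** 2 for X in samples])
bias_sq  = np.linalg.norm(A - EX, "fro") ** 2
variance = np.mean([np.linalg.norm(X - EX, "fro") ** 2 for X in samples])

print(mse, bias_sq + variance)       # the two numbers should nearly agree
```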

In many RandNLA algorithms, we average iid copies of a random matrix estimator to reduce variance. As in the scalar case, the variance decreases in inverse proportion to the number of copies averaged.

Proof

By Theorem 1.4 and Definition 1.6, we can assume without loss of generality that $\EE[\vec{X}] = \vec{0}$. Then, expanding and using the linearity of expectation together with the fact that $\vec{X}_i$ and $\vec{X}_j$ are independent (and hence $\EE[\langle\vec{X}_i,\vec{X}_j\rangle_\F] = 0$) whenever $i \neq j$, we find

$$
\begin{aligned}
\VV\left[\frac{1}{m}\sum_{i=1}^m \vec{X}_i\right]
&= \EE\left[ \left\| \frac{1}{m} \sum_{i=1}^{m} \vec{X}_i \right\|_\F^2 \right]
\\&= \frac{1}{m^2} \sum_{i=1}^{m} \sum_{j=1}^{m} \EE[\langle\vec{X}_i,\vec{X}_j\rangle_\F]
\\&= \frac{1}{m^2} \sum_{i=1}^{m} \EE[\|\vec{X}_i\|_\F^2]
\\&= \frac{1}{m} \EE[\|\vec{X}\|_\F^2]
= \frac{1}{m}\VV[\vec{X}].
\end{aligned}
$$
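The $1/m$ variance reduction can also be checked numerically. Below is a minimal sketch (my own toy setup, assuming an unbiased column-sampling estimator of a fixed matrix $\vec{A}$), comparing the empirical variance of the average of $m$ iid copies to $\VV[\vec{X}]/m$:

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((5, 4))

def unbiased_estimator():
    # Hypothetical unbiased estimator: pick a column uniformly at random
    # and rescale so that E[X] = A.
    j = rng.integers(A.shape[1])
    X = np.zeros_like(A)
    X[:, j] = A.shape[1] * A[:, j]
    return X

def empirical_variance(m, trials=20_000):
    # Monte Carlo estimate of V[(1/m)(X_1 + ... + X_m)] = E[||Xbar - A||_F^2].
    errs = [
        np.linalg.norm(sum(unbiased_estimator() for _ in range(m)) / m - A, "fro") ** 2
        for _ in range(trials)
    ]
    return np.mean(errs)

v1 = empirical_variance(1)
for m in (2, 5, 10):
    print(m, empirical_variance(m), v1 / m)  # last two columns nearly agree
```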

Properties of probability

In addition to expectations, we sometimes work directly with probabilities.

Concentration inequalities

Concentration inequalities bound the probability that a random variable deviates far from some reference value, such as its mean. The most basic is Markov's inequality: if $X \geq 0$, then for any $t > 0$, $\PP[X \geq t] \leq \EE[X]/t$.

Proof

Let $f_X(x)$ be the density of $X$. We have

$$
\begin{aligned}
\EE[X] &= \int_0^\infty x f_X(x) \, \d{x}
\\ &= \int_0^t x f_X(x) \, \d{x} + \int_t^\infty x f_X(x) \, \d{x}
\\ &\geq \int_t^\infty x f_X(x) \, \d{x}
\\ &\geq t \int_t^\infty f_X(x) \, \d{x}
\\ &= t \PP[X \geq t].
\end{aligned}
$$

Rearranging gives the result.
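For intuition, here is a small numerical check (a sketch with an arbitrary nonnegative distribution of my choosing, not from the text) that the empirical tail probability never exceeds the Markov bound $\EE[X]/t$:

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.exponential(scale=2.0, size=1_000_000)  # nonnegative samples, E[X] = 2

for t in (2.0, 5.0, 10.0):
    empirical = (X >= t).mean()  # P[X >= t], estimated from the samples
    markov    = X.mean() / t     # Markov's bound E[X] / t
    print(t, empirical, markov)  # empirical tail <= Markov bound
```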

Markov’s inequality implies that a random variable is unlikely to be far from its mean (relative to its variance); this is Chebyshev’s inequality.

Proof

Apply Markov’s inequality to the nonnegative random variable $\|\vec{X} - \EE[\vec{X}]\|_\F^2$ with threshold $t^2$:

$$
\PP\big[\|\vec{X} - \EE[\vec{X}]\|_\F \geq t\big] = \PP\big[\|\vec{X} - \EE[\vec{X}]\|_\F^2 \geq t^2\big] \leq \frac{\EE[\|\vec{X} - \EE[\vec{X}]\|_\F^2]}{t^2} = \frac{\VV[\vec{X}]}{t^2}.
$$
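And a corresponding check for Chebyshev’s inequality (again a toy sketch of my own, using iid Gaussian random matrices so that $\EE[\vec{X}] = \vec{0}$ and $\VV[\vec{X}]$ is easy to estimate):

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.standard_normal((200_000, 3, 3))         # samples of a 3x3 random matrix
dev = np.linalg.norm(X, ord="fro", axis=(1, 2))  # ||X - E[X]||_F with E[X] = 0
var = np.mean(dev ** 2)                          # V[X] = E[||X - E[X]||_F^2], about 9

for t in (4.0, 5.0, 6.0):
    print(t, (dev >= t).mean(), var / t ** 2)    # empirical tail <= V[X] / t^2
```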

There are many other concentration inequalities (e.g., Bernstein, Chernoff, and Hoeffding), which provide better dependence on $t$ for “nice” random variables. Such bounds are used in many RandNLA analyses, but they will not be particularly important in this book.