Familiar facts about scalars also extend to matrices. The following results are particularly useful for analyzing the performance of randomized matrix algorithms.
Proof
We use the algebraic identity $(a - b)^2 = (a - c)^2 + (c - b)^2 + 2(a - c)(c - b)$, applied with $a = A$, $b = X$, and $c = \mathbb{E}[X]$.
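As a sketch of how the identity yields the decomposition, take expectations on both sides; the Frobenius-norm form below is an assumption about the version stated in the text:
\[
\mathbb{E}\,\|A - X\|_{\mathrm{F}}^2
  = \|A - \mathbb{E}[X]\|_{\mathrm{F}}^2
  + \mathbb{E}\,\|\mathbb{E}[X] - X\|_{\mathrm{F}}^2
  + 2\,\mathbb{E}\bigl\langle A - \mathbb{E}[X],\; \mathbb{E}[X] - X \bigr\rangle_{\mathrm{F}} .
\]
The cross term vanishes because $A - \mathbb{E}[X]$ is deterministic while $\mathbb{E}[X] - X$ has mean zero, which leaves the sum of the squared bias and the variance.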
We can define the variance of a random matrix similarly to the variance of a random scalar.
The bias-variance decomposition shows that the mean-square error of any estimator can be separated into a systematic bias component and a random variance component.
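As a quick numerical illustration, the snippet below checks the Frobenius-norm form of the decomposition on a toy biased estimator (a minimal sketch; the estimator, dimensions, and NumPy usage are illustrative choices, not taken from the text):
\begin{verbatim}
import numpy as np

rng = np.random.default_rng(0)

m, n = 50, 40
A = rng.standard_normal((m, n))        # target matrix
B = 0.1 * rng.standard_normal((m, n))  # fixed systematic bias (toy choice)

def estimator():
    # Toy biased estimator of A: X = A + B + noise, so E[X] = A + B.
    G = rng.standard_normal((m, n))    # zero-mean fluctuation
    return A + B + G

trials = 10_000
mse = np.mean([np.linalg.norm(A - estimator(), "fro")**2 for _ in range(trials)])

bias_sq = np.linalg.norm(B, "fro")**2  # ||A - E[X]||_F^2
variance = m * n                       # E||X - E[X]||_F^2 for standard normal noise

print(mse, bias_sq + variance)         # the two numbers should nearly agree
\end{verbatim}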
In many RandNLA algorithms, we average iid copies of a random matrix estimator to reduce variance.
As in the scalar case, the variance decreases in inverse proportion to the number of copies averaged.
Proof
By Theorem 1.4 and Definition 1.6, without loss of generality, we can assume $\mathbb{E}[X] = 0$.
Then, expanding the square and using linearity of expectation together with the fact that $X_i$ and $X_j$ are independent (if $i \neq j$), the cross terms vanish and the variance of the average is $1/n$ times the variance of a single copy.
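To spell out the calculation, write the average of the $n$ copies as $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$; with the Frobenius-norm definition of matrix variance (an assumption about the definition used in the text) and $\mathbb{E}[X_i] = 0$,
\[
\mathbb{E}\,\bigl\|\bar{X}\bigr\|_{\mathrm{F}}^2
  = \frac{1}{n^2} \sum_{i,j=1}^{n} \mathbb{E}\bigl\langle X_i, X_j \bigr\rangle_{\mathrm{F}}
  = \frac{1}{n^2} \sum_{i=1}^{n} \mathbb{E}\,\bigl\|X_i\bigr\|_{\mathrm{F}}^2
  = \frac{1}{n}\,\mathbb{E}\,\bigl\|X\bigr\|_{\mathrm{F}}^2 ,
\]
since the off-diagonal terms vanish by independence and the zero-mean assumption.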
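The $1/n$ scaling is also easy to observe numerically. The following minimal sketch uses a rank-one column-sampling estimator as an illustrative (not prescribed) example:
\begin{verbatim}
import numpy as np

rng = np.random.default_rng(1)

m, n = 30, 20
A = rng.standard_normal((m, n))

def one_copy():
    # Unbiased rank-one estimator of A: keep one uniformly random column, rescaled.
    j = rng.integers(n)
    X = np.zeros((m, n))
    X[:, j] = n * A[:, j]
    return X

def averaged(k):
    # Average of k iid copies of the estimator.
    return sum(one_copy() for _ in range(k)) / k

def empirical_variance(k, trials=2000):
    # Monte Carlo estimate of E||Xbar - A||_F^2 when averaging k copies.
    return np.mean([np.linalg.norm(averaged(k) - A, "fro")**2 for _ in range(trials)])

for k in (1, 2, 4, 8):
    print(k, empirical_variance(k))    # roughly halves each time k doubles
\end{verbatim}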
Markov’s inequality implies that a random variable is unlikely to be far from its mean, with the deviation probability controlled by the variance (this is Chebyshev’s inequality).
Proof
Apply Markov’s inequality to $(X - \mathbb{E}[X])^2$.
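Spelling out that one-liner (a sketch, with the deviation threshold written as $t$ and the variance notation assumed):
\[
\mathbb{P}\bigl\{ |X - \mathbb{E}[X]| \ge t \bigr\}
  = \mathbb{P}\bigl\{ (X - \mathbb{E}[X])^2 \ge t^2 \bigr\}
  \le \frac{\mathbb{E}\,(X - \mathbb{E}[X])^2}{t^2}
  = \frac{\operatorname{Var}[X]}{t^2} .
\]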
There are many other concentration inequalities (e.g., Bernstein, Chernoff, and Hoeffding), which provide better dependence on $t$ for “nice” random variables.
Such bounds are used in many RandNLA analyses, but will not be particularly important in this book.