7.1 Approximate Matrix Multiplication
Approximate matrix multiplication is the quintessential example of a sampling-based RandNLA algorithm (Drineas et al., 2006).
Consider the matrix-matrix product
$$
\vec{A}\vec{B}^\T = \sum_{i=1}^{n} \vec{a}_i \vec{b}_i^\T,
\qquad
\vec{A} := \begin{bmatrix}
| & & | \\
\vec{a}_1 & \cdots & \vec{a}_n \\
| & & |
\end{bmatrix},
\quad
\vec{B} := \begin{bmatrix}
| & & | \\
\vec{b}_1 & \cdots & \vec{b}_n \\
| & & |
\end{bmatrix}.
$$
We can approximate (7.1) by sampling.
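As a quick numerical sanity check of this rank-one decomposition (a minimal NumPy sketch; the matrices and sizes here are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 4, 10                      # illustrative dimensions
A = rng.standard_normal((d, n))   # columns a_1, ..., a_n
B = rng.standard_normal((d, n))   # columns b_1, ..., b_n

# A B^T as a sum of n rank-one outer products a_i b_i^T
outer_sum = sum(np.outer(A[:, i], B[:, i]) for i in range(n))

print(np.allclose(A @ B.T, outer_sum))  # True
```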
In particular, consider a probability distribution $\mathcal{P} := (p_1, \ldots, p_n)$ on the indices $\{1, \ldots, n\}$ such that if $s \sim \mathcal{P}$, then $\PP[s = i] = p_i$.
Then clearly,
$$
\EE\left[ \frac{1}{p_s}\vec{a}_s \vec{b}_s^\T \right] = \vec{A}\vec{B}^\T.
$$
This suggests a simple estimator.
Fix an integer $m \geq 1$. The approximate matrix multiplication estimator corresponding to the distribution $\mathcal{P}$ is
$$
\Call{AMM}_m(\vec{A},\vec{B}) :=
\frac{1}{m} \sum_{i=1}^{m} \frac{1}{p_{s_i}} \vec{a}_{s_i} \vec{b}_{s_i}^\T,
$$
where $s_1, \ldots, s_m$ are iid samples from $\mathcal{P}$.
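The estimator can be sketched in a few lines of NumPy (the function name `amm` and the example matrices below are illustrative choices, not from the text):

```python
import numpy as np

def amm(A, B, m, p, rng=None):
    """Approximate A @ B.T by sampling m column indices from p.

    A and B have n columns each; p is a probability vector over {0, ..., n-1}.
    Each sampled rank-one term a_s b_s^T is reweighted by 1 / (m * p_s),
    which makes the estimator unbiased.
    """
    rng = np.random.default_rng(rng)
    n = A.shape[1]
    s = rng.choice(n, size=m, p=p)  # iid samples s_1, ..., s_m
    return sum(np.outer(A[:, i], B[:, i]) / p[i] for i in s) / m

# Example usage with uniform sampling probabilities
rng = np.random.default_rng(1)
A = rng.standard_normal((5, 100))
B = rng.standard_normal((5, 100))
p = np.full(100, 1 / 100)
est = amm(A, B, m=2000, p=p, rng=2)
print(np.linalg.norm(A @ B.T - est, "fro") / np.linalg.norm(A @ B.T, "fro"))
```

Increasing `m` drives the relative error down at the usual Monte Carlo rate $O(1/\sqrt{m})$.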
Variance Bound

One can directly compute the variance of the approximate matrix multiplication estimator.
For any $\vec{A}$ and $\vec{B}$, $\EE\left[\Call{AMM}_m(\vec{A},\vec{B})\right] = \vec{A}\vec{B}^\T$ and
$$
\EE\left[ \| \vec{A}\vec{B}^\T - \Call{AMM}_m(\vec{A},\vec{B}) \|_\F^2 \right]
= \frac{1}{m}\left( \sum_{i=1}^{n} p_i^{-1} \| \vec{a}_i \|^2 \| \vec{b}_i \|^2 - \|\vec{A}\vec{B}^\T\|_\F^2\right).
$$

Proof
Clearly the estimator is unbiased, and by basic properties of the expectation,
$$
\begin{aligned}
\EE[ \| \vec{A}\vec{B}^\T - \Call{AMM}_m(\vec{A},\vec{B}) \|_\F^2]
&= m^{-1} \EE[ \| \vec{A}\vec{B}^\T - \Call{AMM}_1(\vec{A},\vec{B}) \|_\F^2]
\\&= m^{-1} \left(\EE[ \|\Call{AMM}_1(\vec{A},\vec{B}) \|_\F^2] - \|\vec{A}\vec{B}^\T\|_\F^2\right).
\end{aligned}
$$
Now, by direct computation,
$$
\begin{aligned}
\EE[ \| \Call{AMM}_1(\vec{A},\vec{B}) \|_\F^2]
&= \EE\left[ \left\| p_s^{-1}\vec{a}_s \vec{b}_s^\T \right\|_\F^2\right]
\\&= \sum_{i=1}^{n} p_i \left\| p_i^{-1}\vec{a}_i \vec{b}_i^\T \right\|_\F^2
\\&= \sum_{i=1}^{n} p_i^{-1} \| \vec{a}_i \|^2 \| \vec{b}_i \|^2.
\end{aligned}
$$
In particular, note that if $p_i \propto \|\vec{a}_i\|^2$ (or $p_i \propto \|\vec{b}_i\|^2$), then
$$
\EE\left[ \| \vec{A}\vec{B}^\T - \Call{AMM}_m(\vec{A},\vec{B}) \|_\F^2 \right]
= \frac{1}{m}\left( \|\vec{A}\|_\F^2\|\vec{B}\|_\F^2 - \|\vec{A}\vec{B}^\T\|_\F^2\right).
$$
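This closed-form expectation can be checked empirically under $p_i \propto \|\vec{a}_i\|^2$ (a hypothetical numerical experiment; sizes, seeds, and variable names are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, m = 5, 200, 50
A = rng.standard_normal((d, n))
B = rng.standard_normal((d, n))

# Sampling probabilities proportional to squared column norms of A
p = np.sum(A**2, axis=0)
p /= p.sum()

exact = A @ B.T
# Predicted mean squared error: (||A||_F^2 ||B||_F^2 - ||A B^T||_F^2) / m
predicted = (np.linalg.norm(A, "fro")**2 * np.linalg.norm(B, "fro")**2
             - np.linalg.norm(exact, "fro")**2) / m

def amm_once():
    s = rng.choice(n, size=m, p=p)
    return sum(np.outer(A[:, i], B[:, i]) / p[i] for i in s) / m

# Average squared Frobenius error over repeated independent trials
errs = [np.linalg.norm(exact - amm_once(), "fro")**2 for _ in range(500)]
print(np.mean(errs) / predicted)  # ratio of empirical to predicted MSE
```

By the identity above, the printed ratio should concentrate around 1 as the number of trials grows.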
Drineas, P., Kannan, R., & Mahoney, M. W. (2006). Fast Monte Carlo Algorithms for Matrices I: Approximating Matrix Multiplication. SIAM Journal on Computing, 36(1), 132–157. doi:10.1137/S0097539704442684