The “median trick” (sometimes called “boosting”) is a clever way of turning a (scalar-valued) randomized algorithm that succeeds with some constant probability (e.g. $2/3$) into one that succeeds with arbitrarily high probability $1 - \delta$ with only logarithmic overhead.
The idea is simple: suppose we have a randomized algorithm that produces an approximation $x$ to $x^*$ satisfying $\mathbb{P}[|x^* - x| < \varepsilon] > 2/3$.
We can boost the success probability by running the algorithm multiple times and taking the median of the results.
Run the algorithm $m$ times to get estimates $x_1, x_2, \ldots, x_m$.
Let $x = \mathrm{median}(x_1, \ldots, x_m)$.
Fact. For $m = O(\log(1/\delta))$, we have that $\mathbb{P}[|x^* - x| < \varepsilon] > 1 - \delta$.
Notably, the dependence on the target failure probability $\delta$ is only logarithmic.
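As a concrete illustration, here is a minimal Python sketch of the trick; `noisy_estimate` is a made-up base estimator standing in for any routine with the guarantee above.

```python
import random
import statistics

def median_boost(base_estimator, m):
    """Return the median of m independent runs of a randomized estimator."""
    return statistics.median(base_estimator() for _ in range(m))

# Made-up base estimator of x* = 5: within eps = 0.1 of the target with
# probability 0.7, and wildly off otherwise.
def noisy_estimate():
    if random.random() < 0.7:
        return 5 + random.uniform(-0.1, 0.1)
    return 5 + random.uniform(-100, 100)

print(median_boost(noisy_estimate, m=31))  # almost always within 0.1 of 5
```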
The proof is simple and standard, so we will just give a high-level overview.
Proof.
By definition of the median, at least half of the estimates $x_i$ lie at or below $x$ and at least half lie at or above it.
Thus, the only way for $x$ to be outside of $[x^* - \varepsilon, x^* + \varepsilon]$ is if at least half of the estimates $x_i$ are outside of this interval.
However, since each $x_i$ lives within this interval with probability at least $2/3$, it is (exponentially) unlikely that half of them live outside of the interval. $\square$
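To make the last step quantitative, one standard route (via Hoeffding's inequality; a Chernoff bound works equally well) is: let $Z_i$ be the indicator that $x_i$ falls outside the interval, so that $\mathbb{E}[Z_i] < 1/3$ and
$$\mathbb{P}\Big[\sum_{i=1}^m Z_i \ge \frac{m}{2}\Big] \le \mathbb{P}\Big[\sum_{i=1}^m (Z_i - \mathbb{E}[Z_i]) \ge \frac{m}{6}\Big] \le e^{-2(m/6)^2/m} = e^{-m/18},$$
which is at most $\delta$ once $m \ge 18 \ln(1/\delta)$.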
High dimensions
What about in high dimensions?
I.e. if we have a randomized algorithm that produces an approximation $x$ to $x^* \in \mathbb{R}^d$ satisfying $\mathbb{P}[\|x^* - x\| < \varepsilon] > 2/3$?
While the median trick in 1d appears in almost every set of course notes for a class on randomized algorithms, a statement of a high-dimensional analog is surprisingly hard to find.
The simplest approach is to apply the one-dimensional estimator along each dimension.
The main issue with this is that $\|x^* - x\|$ being small is not necessarily the same thing as each coordinate being close (i.e. $\|x^* - x\|_\infty$ being small).
So we will lose something in the accuracy from converting between norms.
For instance, if $\|\cdot\|$ is the standard Euclidean norm, then we will lose a factor of $\sqrt{d}$ in the accuracy, since $\|v\| \le \sqrt{d}\,\|v\|_\infty$ for $v \in \mathbb{R}^d$.
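Here is what the coordinate-wise approach looks like as a short NumPy sketch; `base_estimator` is a hypothetical routine returning a vector in $\mathbb{R}^d$. (To make all $d$ coordinate medians accurate simultaneously, one would take a union bound over coordinates and use $m = O(\log(d/\delta))$ copies.)

```python
import numpy as np

def coordinatewise_median(base_estimator, m):
    """Boost a vector-valued estimator by a per-coordinate median.

    Stacks m independent d-dimensional estimates into an (m, d) array
    and returns the coordinate-wise median, a vector of length d.
    """
    X = np.stack([base_estimator() for _ in range(m)])
    return np.median(X, axis=0)
```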
A better estimator
Here is a better approach that loses only a factor of 3 in the accuracy.
I was introduced to this approach by Chris Musco, but I’d expect it’s a folklore-type result.
Make independent copies $x_1, \ldots, x_m$ of $x$.
For all $i, j \in [m]$, define $d(i,j) = \|x_i - x_j\|$.
For all $i \in [m]$, define $c(i) = \mathrm{median}(d(i,1), \ldots, d(i,m))$.
Let $i^* = \operatorname{argmin}_{i \in [m]} c(i)$.
Output $x = x_{i^*}$.
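A minimal NumPy sketch of this estimator (assuming, as before, a hypothetical vector-valued `base_estimator` with the stated guarantee):

```python
import numpy as np

def highdim_median_trick(base_estimator, m):
    """High-dimensional median trick: among m independent estimates,
    return the one whose median distance to the others is smallest."""
    X = np.stack([base_estimator() for _ in range(m)])          # (m, d)
    # Pairwise distances d(i, j) = ||x_i - x_j||.
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)  # (m, m)
    c = np.median(D, axis=1)   # c(i) = median over j of d(i, j)
    return X[np.argmin(c)]     # x = x_{i*}
```

Note that, unlike the coordinate-wise approach, this estimator always outputs one of the actual estimates $x_i$, so the output inherits whatever structure the base estimator guarantees.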
Fact. For $m = O(\log(1/\delta))$, we have that $\mathbb{P}[\|x^* - x\| < 3\varepsilon] > 1 - \delta$.
Proof.
Let $G = \{i : \|x^* - x_i\| < \varepsilon\}$.
Since $\mathbb{E}[|G|] > 2m/3$, it follows from a standard Chernoff bound that $\mathbb{P}[|G| \le m/2] \le \delta$ for $m = O(\log(1/\delta))$.
Therefore, it suffices to show that if $|G| > m/2$, then $\|x^* - x\| < 3\varepsilon$.
Suppose $|G| > m/2$.
By the triangle inequality, for each $i, j \in [m]$,
$$d(i,j) = \|x_i - x_j\| \le \|x^* - x_i\| + \|x^* - x_j\|,$$
so in particular $d(i,j) < 2\varepsilon$ whenever $i, j \in G$.
Therefore, for each $i \in G$,
$$|\{j \in [m] : d(i,j) \le 2\varepsilon\}| \ge |G| > m/2,$$
and hence,
$$c(i) = \mathrm{median}(d(i,1), \ldots, d(i,m)) \le 2\varepsilon.$$
It follows that $c(i^*) = \min_i c(i) \le 2\varepsilon$, and since $c(i^*) = \mathrm{median}(d(i^*,1), \ldots, d(i^*,m))$, we have
$$|\{j \in [m] : d(i^*,j) \le 2\varepsilon\}| \ge m/2.$$
But also $|G| > m/2$.
So, by the pigeonhole principle, there is at least one index $j^* \in G$ for which
$$d(i^*, j^*) \le 2\varepsilon.$$
Therefore, by the triangle inequality,
$$\|x^* - x_{i^*}\| \le \|x^* - x_{j^*}\| + \|x_{i^*} - x_{j^*}\| < \varepsilon + 2\varepsilon = 3\varepsilon,$$
as desired. $\square$