
While CholeskyQR has high flop efficiency, it is unstable if $\vec{A}$ is not well-conditioned, due to working with the Gram matrix $\vec{A}^\T\vec{A}$. Fortunately, we can fix this by preconditioning the problem so that it is better conditioned.
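To see the danger concretely: forming the Gram matrix squares the condition number, $\kappa(\vec{A}^\T\vec{A}) = \kappa(\vec{A})^2$. A quick numerical illustration (the test matrix here is an arbitrary choice for the demo):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 500, 10

# Build a matrix with condition number ~1e6 via an SVD construction
U, _ = np.linalg.qr(rng.standard_normal((n, d)))
V, _ = np.linalg.qr(rng.standard_normal((d, d)))
A = U @ np.diag(np.logspace(0, -6, d)) @ V.T

print(np.linalg.cond(A))        # ~1e6
print(np.linalg.cond(A.T @ A))  # ~1e12: the condition number is squared
```

Once $\kappa(\vec{A})^2$ approaches $1/\varepsilon_{\text{mach}}$, the Gram matrix is numerically singular and Cholesky can break down entirely.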

Subspace Embedding for approximate QR

We begin with an extremely simple sketching-based algorithm.

This algorithm is easily implemented in NumPy. Here we use a SparseStack sketch.

import numpy as np
import scipy as sp
import scipy.linalg

def sketched_qr(A, k, zeta, rng):

    n, d = A.shape
    S = sparse_stack_sketch(n, k, zeta, rng)  # k x n sketching matrix
    Y = S @ A                                 # sketched matrix, k x d
    R = np.linalg.qr(Y, mode='r')             # R factor of the sketch
    Q = sp.linalg.solve_triangular(R.T, A.T, lower=True).T  # Q = A R^{-1}
    
    return Q, R
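The helper `sparse_stack_sketch` is carried over from an earlier notebook. If it is not available, a minimal sketch of a $k \times n$ sparse sign embedding with $\zeta$ nonzeros per column is one possible stand-in (the name, signature, and construction here are assumptions, not the notebook's definitive implementation):

```python
import numpy as np
import scipy.sparse as sps

def sparse_stack_sketch(n, k, zeta, rng):
    """k x n sparse sign embedding: each column has zeta nonzeros of value +-1/sqrt(zeta)."""
    # For each of the n columns, pick zeta distinct rows for the nonzero entries
    rows = np.concatenate([rng.choice(k, size=zeta, replace=False) for _ in range(n)])
    cols = np.repeat(np.arange(n), zeta)
    vals = rng.choice([-1.0, 1.0], size=n * zeta) / np.sqrt(zeta)
    return sps.csr_matrix((vals, (rows, cols)), shape=(k, n))
```

Each column has unit norm by construction, and applying the sketch to $\vec{A}$ costs only $O(\zeta \cdot nd)$ operations.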

As long as the sketching matrix $\vec{S}$ gives a subspace embedding, the output of Algorithm 3.2 will be an approximate QR factorization.

Proof

By definition,

$$\forall \vec{x}\in\range(\vec{A}): \quad (1-\varepsilon)\|\vec{x}\|_2 \leq \|\vec{S}\vec{x}\|_2 \leq (1+\varepsilon)\|\vec{x}\|_2.$$

We can reparameterize $\vec{x} = \vec{A}\vec{R}^{-1}\vec{c} = \vec{Q}\vec{c}$ for some $\vec{c}\in\R^d$. Since $\vec{S}\vec{A}\vec{R}^{-1} = \vec{Q}'$ has orthonormal columns, we have $\|\vec{S}\vec{x}\|_2 = \|\vec{S}\vec{A}\vec{R}^{-1}\vec{c}\|_2 = \|\vec{c}\|_2$. Thus,

$$(1-\varepsilon) \|\vec{Q}\vec{c}\|_2 \leq \|\vec{c}\|_2 \leq (1+\varepsilon)\|\vec{Q}\vec{c}\|_2.$$

Hence,

$$\smin(\vec{Q}) = \min_{\vec{c}\neq 0} \frac{\|\vec{Q}\vec{c}\|_2}{\|\vec{c}\|_2} \geq \frac{1}{1+\varepsilon}, \qquad \smax(\vec{Q}) = \max_{\vec{c}\neq 0} \frac{\|\vec{Q}\vec{c}\|_2}{\|\vec{c}\|_2} \leq \frac{1}{1-\varepsilon}.$$
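In other words, $\kappa(\vec{Q}) \leq (1+\varepsilon)/(1-\varepsilon)$, no matter how ill-conditioned $\vec{A}$ is. We can check this numerically; to keep the demo self-contained it uses a dense Gaussian sketch in place of the sparse sketch (an arbitrary substitution for illustration):

```python
import numpy as np
import scipy as sp
import scipy.linalg

rng = np.random.default_rng(0)
n, d = 2000, 20

# Ill-conditioned test matrix: singular values from 1 down to 1e-10
U, _ = np.linalg.qr(rng.standard_normal((n, d)))
V, _ = np.linalg.qr(rng.standard_normal((d, d)))
A = U @ np.diag(np.logspace(0, -10, d)) @ V.T

# Sketched QR with a Gaussian embedding, k = 4d
k = 4 * d
S = rng.standard_normal((k, n)) / np.sqrt(k)
R = np.linalg.qr(S @ A, mode='r')
Q = sp.linalg.solve_triangular(R.T, A.T, lower=True).T  # Q = A R^{-1}

print(np.linalg.cond(A))  # ~1e10
print(np.linalg.cond(Q))  # modest: a small constant, far below cond(A)
```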

Unfortunately, oblivious sketching methods require $k = \Omega(d/\varepsilon^2)$. The polynomial dependence on $\varepsilon$ means that to get a very well-conditioned basis, we need a very large sketching dimension $k$, which is impractical for $\varepsilon$ close to machine precision.

Preconditioning for QR

While Algorithm 3.2 doesn't produce a highly orthogonal basis, it does produce a well-conditioned basis. Therefore, we can use the Cholesky QR method to orthonormalize the output of Algorithm 3.2, thereby mitigating the stability issues of Cholesky QR.
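The `cholesky_QR` routine used below is the one from the previous notebook. For completeness, a minimal sketch of one standard implementation (the version there may differ in details):

```python
import numpy as np
import scipy as sp
import scipy.linalg

def cholesky_QR(A):
    """QR factorization via Cholesky of the Gram matrix A^T A."""
    G = A.T @ A                    # Gram matrix, d x d
    L = np.linalg.cholesky(G)      # G = L L^T, so R = L^T
    R = L.T
    Q = sp.linalg.solve_triangular(L, A.T, lower=True).T  # Q = A R^{-1}
    return Q, R
```

Since `cholesky_QR` is applied to the already well-conditioned basis from Algorithm 3.2, the squared condition number of its Gram matrix stays small.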

We can now use our implementation of Algorithm 3.2 to implement Algorithm 3.3 in NumPy.

def randomized_cholesky_QR(A, k, zeta, rng):

    Q1, R1 = sketched_qr(A, k, zeta, rng)  # well-conditioned basis: A = Q1 R1
    Q, R2 = cholesky_QR(Q1)                # orthonormalize: Q1 = Q R2
    R = R2 @ R1                            # combine: A = Q (R2 R1)
    
    return Q, R

Numerical Experiment

Let’s compare the performance and accuracy of different QR factorization methods, including our new randomized Cholesky QR approach. We’ll use the same numerical experiment as in the previous notebook, but now we include the randomized Cholesky QR method.
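The experiment assumes the test matrix `A` (with `n` rows and `d` columns) and the standard imports from the previous notebook are already in scope. One possible setup, with sizes and conditioning chosen purely for illustration:

```python
import time
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n, d = 10_000, 100

# Moderately ill-conditioned test matrix via an SVD construction
U, _ = np.linalg.qr(rng.standard_normal((n, d)))
V, _ = np.linalg.qr(rng.standard_normal((d, d)))
A = U @ np.diag(np.logspace(0, -6, d)) @ V.T
```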

k = int(1.5*d)
zeta = 4

# Define QR factorization methods
qr_methods = [
    {'name': 'Numpy',
     'func': lambda: np.linalg.qr(A, mode='reduced')},
    {'name': 'Cholesky QR',
     'func': lambda: cholesky_QR(A)},
    {'name': 'Rand. Cholesky QR',
     'func': lambda: randomized_cholesky_QR(A,k,zeta,rng)}
]
rng = np.random.default_rng(0) 

# Time the QR factorization methods
n_repeat = 10  # Number of repetitions for averaging

results = []

for qr_method in qr_methods:
    method_name = qr_method['name']
    
    # Time the method
    start = time.time()
    for _ in range(n_repeat):
        Q, R = qr_method['func']()
    end = time.time()
    
    avg_time = (end - start) / n_repeat
    
    # Compute accuracy metrics
    results.append({
        'method': method_name,
        'time (s)': avg_time,
        'orthogonality': np.linalg.norm(Q.T @ Q - np.eye(d)),
        'reconstruction': np.linalg.norm(A - Q @ R)
    })

# Create DataFrame and compute relative performance
results_df = pd.DataFrame(results)
results_df['speedup'] = results_df['time (s)'].max() / results_df['time (s)']

# Display results with formatting
results_df.reindex(columns=['method','time (s)','speedup','orthogonality','reconstruction']).style.format({
    'time (s)': '{:.4f}',
    'orthogonality': '{:1.1e}',
    'reconstruction': '{:1.1e}',
    'speedup': '{:.1f}x',
})

We observe that randomized Cholesky QR achieves most of the speedup of Cholesky QR, while still producing an accurate QR factorization!