
The main downside to classical QR algorithms is that they are sequential. As we have seen, we get a much higher flop rate with matrix-matrix multiplication than with matrix factorization. We can use this to our advantage by computing a QR factorization via the Cholesky factorization of the Gram matrix $\vec{A}^\T\vec{A}$. Such an approach pushes most of the work into a single matrix-matrix product.

This algorithm is easily implemented with NumPy and SciPy.

import numpy as np
import scipy as sp

def cholesky_QR(A):

    # Form the Gram matrix A^T A
    X = A.T@A
    # Upper-triangular Cholesky factor R, so that R^T R = A^T A
    R = np.linalg.cholesky(X).T
    # Q = A R^{-1}, computed by solving the triangular system R^T Q^T = A^T
    Q = sp.linalg.solve_triangular(R.T,A.T,lower=True).T

    return Q,R

The cost of Algorithm 3.1 is dominated by the matrix-matrix product in the first line, which costs $O(nd^2)$ operations. Thus, we might hope that this algorithm runs faster in practice than standard QR factorization algorithms, since matrix-matrix multiplication has a very high flop rate. In addition, Algorithm 3.1 is mathematically exact; i.e., in exact arithmetic, it produces a true QR factorization.

Proof

By construction, $\vec{R}$ is upper triangular and $\vec{A} = \vec{Q}\vec{R}$. Since $\vec{R}$ is the Cholesky factor of $\vec{A}^\T\vec{A}$, we have that $\vec{R}^\T\vec{R} = \vec{A}^\T\vec{A}$. This means that

$$\vec{Q}^\T \vec{Q} = \vec{R}^{-\T}\vec{A}^\T\vec{A}\vec{R}^{-1} = \vec{R}^{-\T}\vec{R}^\T\vec{R}\vec{R}^{-1} = \vec{I},$$

so $\vec{Q}$ has orthonormal columns, as required.
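
As a quick sanity check of this exactness claim, we can run cholesky_QR on a small, well-conditioned matrix; this is a minimal sketch (the variables A_test, Q_test, and R_test are illustrative, not part of the experiment below), and both errors should come out near machine precision.

# Sanity check on a small, well-conditioned random matrix
A_test = np.random.randn(100,20)
Q_test, R_test = cholesky_QR(A_test)

print(np.linalg.norm(A_test - Q_test@R_test))        # reconstruction error, ~1e-15
print(np.linalg.norm(Q_test.T@Q_test - np.eye(20)))  # orthogonality error, ~1e-15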

Numerical Experiment

Let’s try to understand the performance of Cholesky QR relative to the default QR factorization method in NumPy. In addition to the runtime, we will compute the orthogonality and reconstruction errors

$$\|\vec{Q}^\T\vec{Q} - \vec{I}\| \quad\text{and}\quad \|\vec{A} - \vec{Q}\vec{R}\|,$$

which are both zero for a perfect QR factorization.
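
For convenience, these two metrics can be wrapped in a small helper. The function qr_errors below is a hypothetical convenience of ours; the experiment code that follows computes the same quantities inline.

def qr_errors(A,Q,R):
    # Deviation of Q from having orthonormal columns
    orthogonality = np.linalg.norm(Q.T@Q - np.eye(Q.shape[1]))
    # How well the product Q R reproduces A
    reconstruction = np.linalg.norm(A - Q@R)
    return orthogonality, reconstruction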

# Generate a random matrix with controlled condition number
n = 5000
d = 300

U,s,Vt = np.linalg.svd(np.random.rand(n,d),full_matrices=False)
s = np.geomspace(1e-4,1,d) # replace singular values, so cond(A) = 1e4
A = U@np.diag(s)@Vt
# Define QR factorization methods
qr_methods = [
    {'name':'Householder QR',
     'func': lambda: np.linalg.qr(A,mode='reduced')},
    {'name':'Cholesky QR',
     'func': lambda: cholesky_QR(A)}
]
import time
import pandas as pd

# Time the QR factorization methods
n_repeat = 10  # number of repetitions for averaging

results = []

for qr_method in qr_methods:

    method_name = qr_method['name']

    # Time the method
    start = time.time()
    for _ in range(n_repeat):
        Q, R = qr_method['func']()
    end = time.time()
    
    avg_time = (end - start) / n_repeat
    
    # Compute accuracy metrics
    results.append({
        'method': method_name,
        'time (s)': avg_time,
        'orthogonality': np.linalg.norm(Q.T @ Q - np.eye(d)),
        'reconstruction': np.linalg.norm(A - Q @ R)
    })

# Create DataFrame and compute relative performance
results_df = pd.DataFrame(results)
results_df['speedup'] = results_df['time (s)'].max() / results_df['time (s)']

# Display results with formatting
results_df.reindex(columns=['method','time (s)','speedup','orthogonality','reconstruction']).style.format({
    'time (s)': '{:.4f}',
    'speedup': '{:.1f}x',
    'orthogonality': '{:1.1e}',
    'reconstruction': '{:1.1e}',
})

As expected, Cholesky QR is much faster than the standard Householder QR factorization.

Too good to be true?

While Cholesky QR is faster than a standard QR method, our numerical experiment reveals that the $\vec{Q}$ matrix produced by Cholesky QR is much less orthogonal! The numerical instability can be traced back to the presence of the Gram matrix $\vec{A}^\T\vec{A}$. Indeed, $\cond(\vec{A}^\T\vec{A}) = \cond(\vec{A})^2$, so by forming the Gram matrix we end up making the conditioning way worse 💔.
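
We can observe this squaring directly. Here is a minimal sketch using the matrix $\vec{A}$ generated above, whose condition number is $10^4$ by construction:

# Forming the Gram matrix squares the condition number
print(np.linalg.cond(A))      # ~1e4 by construction
print(np.linalg.cond(A.T@A))  # ~1e8 = (1e4)^2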

In the next section, we will explore how RandNLA can be used to produce a more accurate factorization, while maintaining the efficiency of Cholesky QR.