This chapter focuses on the (tall) linear regression problem $\min_{x \in \mathbb{R}^d} \|Ax - b\|_2$, where $A \in \mathbb{R}^{n \times d}$ and $b \in \mathbb{R}^n$ with $n \gg d$.
Direct Methods
Classical direct solvers for Linear Regression, such as the LAPACK routine underlying np.linalg.lstsq, work in two stages.
First, $A$ is factorized (e.g., via a QR factorization).
Second, the factorization is used to solve the linear system.
When implemented properly, such algorithms are numerically stable and require $O(nd^2)$ operations.
However, the dominant cost of this approach is computing a matrix factorization, which, as we have seen in our discussion of the cost of linear algebra, does not have a particularly high flop-rate. Moreover, the total number of flops is $O(nd^2)$, which might be too expensive when $n$ and $d$ are large.
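To make the two stages concrete, here is a minimal NumPy/SciPy sketch (the problem sizes and random data are illustrative, not from the text) that solves a tall least-squares problem by a reduced QR factorization followed by a triangular solve, and compares against np.linalg.lstsq:

```python
import numpy as np
from scipy.linalg import solve_triangular

rng = np.random.default_rng(0)
n, d = 10_000, 50                      # tall problem: n >> d (illustrative sizes)
A = rng.standard_normal((n, d))
b = rng.standard_normal(n)

# Stage 1: factorize A = QR (reduced QR, O(n d^2) flops)
Q, R = np.linalg.qr(A, mode="reduced")

# Stage 2: solve the triangular system R x = Q^T b
x_qr = solve_triangular(R, Q.T @ b)

# Reference solution from the LAPACK driver behind np.linalg.lstsq
x_ref, *_ = np.linalg.lstsq(A, b, rcond=None)
print(np.linalg.norm(x_qr - x_ref))    # should be near machine precision
```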
Iterative Methods
Iterative methods such as LSQR begin with an initial guess $x_0$ and iteratively produce a sequence of approximate solutions $x_1, x_2, \ldots$, where ideally $x_t$ is close to the solution of Linear Regression. At each iteration, such methods perform a matrix-vector product with $A$ and with $A^\mathsf{T}$, in addition to some vector operations. Thus, the matrix-vector products are typically the dominant cost, and require $O(\operatorname{nnz}(A))$ operations per iteration. While iterative methods are able to take advantage of sparsity in $A$, they may require many iterations to converge when $A$ is ill-conditioned.
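As an illustration, the following sketch (the sparse test matrix, density, and tolerances are assumptions, not from the text) runs SciPy's LSQR, which only touches $A$ through products with $A$ and $A^\mathsf{T}$ and can therefore exploit sparsity:

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import lsqr

rng = np.random.default_rng(0)
n, d = 100_000, 200
A = sp.random(n, d, density=0.01, format="csr", random_state=0)  # sparse, tall
b = rng.standard_normal(n)

# Each LSQR iteration costs one product with A and one with A^T,
# i.e. O(nnz(A)) work per iteration.
x, istop, itn, r1norm = lsqr(A, b, atol=1e-10, btol=1e-10)[:4]
print(f"stop flag {istop}, {itn} iterations, residual norm {r1norm:.3e}")
```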
Further Reading
- Sarlos (2006), Rokhlin & Tygert (2008): Early works in RandNLA that respectively introduce sketch-and-solve and sketch-and-precondition.
- Clarkson & Woodruff (2013), Chenakkod et al. (2024), Chenakkod et al. (2025): A line of work in TCS on sparse sketches that gives algorithms for regression running in time close to the optimal $O(\operatorname{nnz}(A))$.
- Epperly (2024), Epperly et al. (2025): A line of work on the stability of sketch-and-precondition-type algorithms.
- Sarlos, T. (2006). Improved Approximation Algorithms for Large Matrices via Random Projections. 2006 47th Annual IEEE Symposium on Foundations of Computer Science (FOCS’06), 143–152. 10.1109/focs.2006.37
- Rokhlin, V., & Tygert, M. (2008). A fast randomized algorithm for overdetermined linear least-squares regression. Proceedings of the National Academy of Sciences, 105(36), 13212–13217. 10.1073/pnas.0804869105
- Clarkson, K. L., & Woodruff, D. P. (2013). Low rank approximation and regression in input sparsity time. Proceedings of the Forty-Fifth Annual ACM Symposium on Theory of Computing, 81–90. 10.1145/2488608.2488620
- Chenakkod, S., Dereziński, M., Dong, X., & Rudelson, M. (2024). Optimal Embedding Dimension for Sparse Subspace Embeddings. Proceedings of the 56th Annual ACM Symposium on Theory of Computing, 1106–1117. 10.1145/3618260.3649762
- Chenakkod, S., Dereziński, M., & Dong, X. (2025). Optimal Subspace Embeddings: Resolving Nelson-Nguyen Conjecture Up to Sub-Polylogarithmic Factors. https://arxiv.org/abs/2508.14234
- Epperly, E. N. (2024). Fast and Forward Stable Randomized Algorithms for Linear Least-Squares Problems. SIAM Journal on Matrix Analysis and Applications, 45(4), 1782–1804. 10.1137/23m1616790
- Epperly, E., Meier, M., & Nakatsukasa, Y. (2025). Fast randomized least-squares solvers can be just as accurate and stable as classical direct solvers. Communications on Pure and Applied Mathematics.