Approximating f(A)b

Tyler Chen

Computational approaches to today's most pressing and world-changing questions rely on subroutines for fundamental linear algebraic tasks. There is a large body of work on the design and analysis of algorithms for an increasingly prevalent subset of such tasks: those involving matrix functions of Hermitian (or real symmetric) matrices. Here $\mathbf{A}$ will be an $n\times n$ Hermitian matrix with eigenvalues $\Lambda := \{ \lambda_i \}_{i=0}^{n-1}$ and (orthonormal) eigenvectors $\{ \mathbf{u}_i \}_{i=0}^{n-1}$; i.e.,
$$\mathbf{A} = \sum_{i=0}^{n-1} \lambda_i \mathbf{u}_i \mathbf{u}_i^\mathsf{T}.$$
A matrix function transforms the eigenvalues of a Hermitian (or symmetric) matrix according to some scalar function, while leaving the eigenvectors untouched.

The matrix function $f(\mathbf{A})$, induced by $f:\mathbb{R}\to\mathbb{R}$ and $\mathbf{A}$, is defined as
$$f(\mathbf{A}) := \sum_{i=0}^{n-1} f(\lambda_i) \mathbf{u}_i \mathbf{u}_i^\mathsf{T}.$$
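This definition translates directly into code. The following sketch (my own illustration, not from the original text) forms $f(\mathbf{A})$ from a full eigendecomposition; the function name `matrix_function` and the test matrix are assumptions for the example.

```python
import numpy as np

def matrix_function(A, f):
    """Apply the scalar function f to the eigenvalues of a Hermitian matrix A,
    leaving the eigenvectors untouched."""
    eigvals, eigvecs = np.linalg.eigh(A)          # orthonormal eigenvectors of Hermitian A
    return eigvecs @ np.diag(f(eigvals)) @ eigvecs.conj().T

# example: a random real symmetric matrix
rng = np.random.default_rng(0)
X = rng.standard_normal((4, 4))
A = (X + X.T) / 2
expA = matrix_function(A, np.exp)                 # the matrix exponential of A
```

Note that this costs $O(n^3)$ and is only practical for small matrices; the algorithms the article goes on to discuss avoid forming the eigendecomposition.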

Perhaps the best-known example of a matrix function is the matrix inverse $\mathbf{A}^{-1}$, which corresponds to the scalar function $f(x) = x^{-1}$. Other common matrix functions include the matrix sign, logarithm, exponential, square root, and inverse square root, each of which has many applications throughout the mathematical sciences.

A common task involving matrix functions is computing the product $f(\mathbf{A})\mathbf{v}$ of a matrix function $f(\mathbf{A})$ with a fixed vector $\mathbf{v}$; for instance, the matrix inverse applied to a vector corresponds to the solution of a linear system of equations. Beyond the multitude of applications of linear systems, matrix functions applied to vectors are used for computing the overlap operator in quantum chromodynamics, solving differential equations in applied math, Gaussian process sampling in statistics, principal component projection and regression in data science, and a range of other applications.
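As a concrete illustration (my own sketch, with assumed names), $f(\mathbf{A})\mathbf{v}$ can be computed from the eigendecomposition without ever forming the dense matrix $f(\mathbf{A})$, and for $f(x) = 1/x$ the result coincides with the solution of the linear system $\mathbf{A}\mathbf{x} = \mathbf{v}$:

```python
import numpy as np

def matrix_function_times_vector(A, f, v):
    """Compute f(A) @ v for Hermitian A via a full eigendecomposition.
    (Naive O(n^3) reference; practical methods such as Lanczos avoid this.)"""
    eigvals, eigvecs = np.linalg.eigh(A)
    return eigvecs @ (f(eigvals) * (eigvecs.conj().T @ v))

rng = np.random.default_rng(1)
X = rng.standard_normal((5, 5))
A = X @ X.T + np.eye(5)                # symmetric positive definite
v = rng.standard_normal(5)

# f(x) = 1/x: f(A) @ v solves the linear system A x = v
x = matrix_function_times_vector(A, lambda t: 1.0 / t, v)
```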

Another related and especially interesting task involving matrix functions is estimating the spectral sum
$$\operatorname{tr}(f(\mathbf{A})) = \sum_{i=0}^{n-1} f(\lambda_i).$$
Applications of spectral sums include characterizing the degree of protein folding in biology, studying the thermodynamics of spin systems in quantum physics and chemistry, benchmarking quantum devices in quantum information theory, maximum likelihood estimation in statistics, designing better public transit in urban planning, and finding triangle counts and other structure in network science.
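A familiar instance of a spectral sum (again my own illustration, not from the original text) is the log-determinant: taking $f(x) = \log x$ for a positive definite $\mathbf{A}$ gives $\operatorname{tr}(\log(\mathbf{A})) = \log\det(\mathbf{A})$, which appears in Gaussian maximum likelihood estimation:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.standard_normal((6, 6))
A = X @ X.T + np.eye(6)               # symmetric positive definite, so log is defined

# tr(f(A)) = sum of f over the eigenvalues; with f = log this is log(det(A))
eigvals = np.linalg.eigvalsh(A)
logdet = np.sum(np.log(eigvals))
```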

The trace of matrix functions is intimately related to the spectral measure of $\mathbf{A}$, which encodes the eigenvalues of $\mathbf{A}$.

The cumulative empirical spectral measure (CESM) $\Phi : \mathbb{R}\to [0,1]$, induced by $\mathbf{A}$, is defined by
$$\Phi(x) = \Phi_{\mathbf{A}}(x) := \sum_{i=0}^{n-1} n^{-1} \, 1[\lambda_i \leq x].$$
Here $1[\texttt{true}] = 1$ and $1[\texttt{false}] = 0$.

Not only is $\Phi(x)$ itself a spectral sum for each $x\in\mathbb{R}$, but
$$\operatorname{tr}(f(\mathbf{A})) = n \int f\,\mathrm{d}\Phi.$$
In this sense, approximating the CESM $\Phi$ is equivalent to approximating spectral sums. However, approximations to $\Phi$ are also useful in that they provide a global picture of the spectrum of $\mathbf{A}$. Such coarse-grained approximations are used in electronic structure computations and other tasks in physics1, probing the behavior of neural networks in machine learning, load balancing modern parallel eigensolvers in numerical linear algebra, and computing the product of matrix functions with vectors.

  1. In physics, the "density" $\mathrm{d}\Phi/\mathrm{d}x$ is often called the density of states (DOS).↩︎