# Approximating f(A)b

Computational approaches to today’s most pressing and world-changing questions are reliant on subroutines for fundamental linear algebraic tasks. There is a lot of work on the design and analysis of algorithms for an increasingly prevalent subset of such tasks: those involving matrix functions of Hermitian (or real symmetric) matrices. Here $\mathbf{A}$ will be a $n\times n$ Hermitian matrix with eigenvalues $\Lambda := \{ \lambda_i \}_{i=0}^{n-1}$ and (orthonormal) eigenvectors $\{ \mathbf{u}_i\}_{i=0}^{n-1}$; i.e., $\mathbf{A} = \sum_{i=0}^{n-1} \lambda_i \mathbf{u}_i \mathbf{u}_i^\mathsf{T}.$ A matrix function transforms the eigenvalues of a Hermitian (or symmetric) matrix according to some scalar function, while leaving the eigenvectors untouched.

The matrix function $f(\mathbf{A})$, induced by $f:\mathbb{R}\to\mathbb{R}$ and $\mathbf{A}$, is defined as $f(\mathbf{A}):= \sum_{i=0}^{n-1} f(\lambda_i) \mathbf{u}_i \mathbf{u}_i^\mathsf{T}.$

Perhaps the most well known example of a matrix function is the matrix inverse $\mathbf{A}^{-1}$, which corresponds to the inverse function $f(x) = x^{-1}$. Other common matrix functions including the matrix sign, logarithm, exponential, square root, and inverse square root, each of which has many applications throughout the mathematical sciences.

A common task involving matrix functions is computing the product $f(\mathbf{A})\mathbf{v}$ of a matrix function $f(\mathbf{A})$ with a fixed vector $\mathbf{v}$; for instance, the matrix inverse applied to a vector corresponds to the solution of a linear system of equations. Beyond the multitude of applications of linear systems, matrix functions applied to vectors are used for computing the overlap operator in quantum chromodynamics , solving differential equations in applied math , Gaussian process sampling in statistics , principle component projection and regression in data science , and a range of other applications .

Another related and especially interesting task involving matrix functions is estimating the , $\operatorname{tr}(f(\mathbf{A})) = \sum_{i=0}^{n-1} f(\lambda_i).$ Applications of spectral sums include characterizing the degree of protein folding in biology , studying the thermodynamics of spin systems in quantum physics and chemistry , benchmarking quantum devices in quantum information theory , maximum likelihood estimation in statistics , designing better public transit in urban planning , and finding triangle counts and other structure in network science .

The trace of matrix functions is intimately related to the spectral measure of $\mathbf{A}$ which encodes the eigenvalues of $\mathbf{A}$.

The cumulative empirical spectral measure (CESM) $\Phi : \mathbb{R}\to [0,1]$, induced by $\mathbf{A}$, is defined by $\Phi(x) = \Phi_{\mathbf{A}} := \sum_{i=0}^{n-1} n^{-1} 1[ \lambda_i \leq x].$ Here $1[ \texttt{true} ] =1$ and $1[\texttt{false}] = 0$.

Not only is $\Phi(x)$ itself a spectral sum for each $x\in\mathbb{R}$, but $\operatorname{tr}(f(\mathbf{A})) = n \int f\,\mathrm{d}\Phi.$ In this sense, approximating the CESM $\Phi$ is equivalent to approximating spectral sums. However, approximations to $\Phi$ are also useful in that they provide a global picture of the spectrum of $\mathbf{A}$. Such coarse grained approximations are used in electronic structure computations and other tasks in physics1 , probing the behavior of neural networks in machine learning , load balancing modern parallel eigensolvers in numerical linear algebra , and computing the product of matrix functions with vectors .

1. In physics, the density’’ $\mathrm{d}\Phi/\mathrm{d}x$ is often called the density of states (DOS).↩︎