Exponential families have some really cool features when seen through the lens of the Legendre transform, which explain otherwise obscure & almost magical properties of the log-partition function in statistical mechanics. Let me share some ideas about this…
Gaussians, Gamma, Poisson, Multinomial, and almost all distributions we care about can be put in exponential family form. Exponential family distributions come as families indexed by natural parameters θ — which look quite unnatural, but bring in all the magic.
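As a quick sketch of what "exponential family form" means, here is the Poisson distribution rewritten as p(x|θ) = h(x)·exp(θ·x − A(θ)), with natural parameter θ = log(λ), base measure h(x) = 1/x!, and log-partition A(θ) = e^θ (the function names below are just illustrative):

```python
import math

# Sketch: the Poisson pmf in exponential family form
#   p(x | theta) = h(x) * exp(theta * x - A(theta)),
# with natural parameter theta = log(lam), sufficient statistic T(x) = x,
# base measure h(x) = 1/x!, and log-partition A(theta) = exp(theta).

def poisson_pmf(x, lam):
    # the usual Poisson pmf: lam^x * e^{-lam} / x!
    return lam**x * math.exp(-lam) / math.factorial(x)

def poisson_expfam(x, theta):
    h = 1.0 / math.factorial(x)   # base measure h(x)
    A = math.exp(theta)           # log-partition A(theta) = e^theta
    return h * math.exp(theta * x - A)

lam = 3.5
theta = math.log(lam)
for x in range(10):
    assert abs(poisson_pmf(x, lam) - poisson_expfam(x, theta)) < 1e-12
```

The same exercise works for the Gaussian, Gamma, and Multinomial, just with different choices of h, T and A.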
To make things simple, let’s consider a discrete variable X (whose distribution is an exponential family), and its entropy parametrised on the probability vector p, denoted by H(p). This happens to be concave, so -H(p) is convex.
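The concavity of H(p) is easy to check numerically — a minimal sketch (the vectors p and q below are just arbitrary examples):

```python
import math

# Numerical check that the discrete entropy H(p) is concave:
# for probability vectors p, q and lam in [0, 1],
#   H(lam*p + (1-lam)*q) >= lam*H(p) + (1-lam)*H(q).

def entropy(p):
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

p = [0.7, 0.2, 0.1]
q = [0.1, 0.3, 0.6]
for lam in (0.0, 0.25, 0.5, 0.75, 1.0):
    mix = [lam * pi + (1 - lam) * qi for pi, qi in zip(p, q)]
    assert entropy(mix) >= lam * entropy(p) + (1 - lam) * entropy(q) - 1e-12
```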
We know that the Legendre transform gives two dual objects: (i) a conjugate coordinate to each convex function, and (ii) a convex conjugate function. Can you guess which are the duals of -H(p)?
The dual coordinates are the natural parameters of the exponential family, and the convex conjugate function is the log-partition function! Why did nobody tell me this in school?!?
This holds in general: the log-partition function, as a function of the natural parameters, is the convex conjugate of the (minus) entropy as a function of the mean sufficient statistics of X. This has a number of nice consequences.
(0) The log-partition function could be considered the “free entropy”, since, in a similar way, the free energy is the Legendre transform of the energy. I’m not sure if this has consequences, but it sounds pretty cool…
(1) Because the Legendre transform establishes a bijection between the primal and dual coordinates, this implies that an exponential family can be equally well described by the natural parameters or by the averages of its sufficient statistics.
(As a nice example of this, the Ising model can be described either by the spin-spin couplings plus the external magnetic field (the natural parameters), or by the spin means and covariance matrix (the mean sufficient statistics).)
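For the Bernoulli case (again an assumed example), the bijection is just the sigmoid/logit pair: the gradient of the log-partition maps natural → mean coordinates, and the gradient of the negative entropy maps back:

```python
import math

# Bernoulli sketch of the primal/dual bijection:
#   mean    mu    = sigmoid(theta)  (gradient of the log-partition A)
#   natural theta = logit(mu)       (gradient of the negative entropy -H)

def to_mean(theta):
    return 1 / (1 + math.exp(-theta))   # sigmoid

def to_natural(mu):
    return math.log(mu / (1 - mu))      # logit

# the two maps are inverses of each other
for theta in (-3.0, 0.0, 2.5):
    assert abs(to_natural(to_mean(theta)) - theta) < 1e-9
```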
(2) The derivative of the convex conjugate is, by definition, equal to the original coordinate. So, for a discrete X, the derivative of the log-partition function has to equal a component p_x = P(X=x) of the probability vector p.
This explains why, for an arbitrary exponential family, the derivative of the log-partition is equal to the mean value of a sufficient statistic. So this surprising feature seen in Stat Mech is just a consequence of exponential families and Legendre transforms…
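This can be checked numerically for the Bernoulli example (an assumption for concreteness): differentiating the log-partition A(θ) = log(1 + e^θ) should give back the mean sufficient statistic E[X] = sigmoid(θ):

```python
import math

# Bernoulli sketch: the derivative of the log-partition
# A(theta) = log(1 + e^theta) equals the mean sufficient
# statistic E[X] = sigmoid(theta), i.e. the dual coordinate.

def log_partition(theta):
    return math.log(1 + math.exp(theta))

def numerical_derivative(f, x, eps=1e-6):
    # central finite difference
    return (f(x + eps) - f(x - eps)) / (2 * eps)

def mean_sufficient_stat(theta):
    # E[X] for a Bernoulli with natural parameter theta
    return 1 / (1 + math.exp(-theta))

for theta in (-2.0, 0.0, 1.5):
    dA = numerical_derivative(log_partition, theta)
    assert abs(dA - mean_sufficient_stat(theta)) < 1e-6
```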
(3) The Fisher information matrix of X parametrised by the natural parameters can be shown to be equal to the Hessian of the log-partition function, which in turn equals the covariance matrix of the sufficient statistics. Now, the Legendre transform tells us that the Hessian of the convex conjugate is…
equal to the inverse of the Hessian of the original function. This implies that the Hessian of the (minus) entropy as a function of the sufficient statistics, which is also the Fisher information of X parametrised on the sufficient stats, is equal to the inverse of the covariance matrix.
This nicely explains how the Fisher information can sometimes be equal to the covariance matrix (when parametrised on the natural params) and sometimes be equal to its inverse (when parametrised on the sufficient stats)… which can be rather confusing!
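In the Bernoulli example (an assumption, as before) this inverse relation is a one-liner to check: A''(θ) = μ(1−μ) = Var(X) while (−H)''(μ) = 1/(μ(1−μ)), so at dual coordinate pairs the two Hessians multiply to 1:

```python
import math

# Bernoulli sketch: Hessian of A (in theta) and Hessian of -H (in mu)
# are inverses of each other at dual coordinate pairs mu = sigmoid(theta).

def second_derivative(f, x, eps=1e-4):
    # central finite difference for f''(x)
    return (f(x + eps) - 2 * f(x) + f(x - eps)) / eps**2

def log_partition(theta):
    return math.log(1 + math.exp(theta))

def neg_entropy(mu):
    return mu * math.log(mu) + (1 - mu) * math.log(1 - mu)

for theta in (-1.0, 0.0, 2.0):
    mu = 1 / (1 + math.exp(-theta))                  # dual coordinate
    hess_A = second_derivative(log_partition, theta)  # = Var(X) = mu*(1-mu)
    hess_negH = second_derivative(neg_entropy, mu)    # = 1/(mu*(1-mu))
    assert abs(hess_A * hess_negH - 1) < 1e-3
```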
All this suggests that Statistical Mechanics could be considered to be an “ode to the exponential family”…
You can follow @_fernando_rosas.