Statistical physics vs Bayesian inference

Statistical physics and Bayesian inference are closely related (see Andrew Gelman’s remarks here for example). A good way to illustrate the relationship is to simulate a statistical physics model using the “state-of-the-art full Bayesian statistical inference platform” stan. stan uses Hamiltonian Monte Carlo (HMC) to sample the typical set of the posterior distribution efficiently (Neal 2012) but without the main drawback of HMC, namely, the need to hand-tune the algorithm parameters (Betancourt 2017).

2D XY model

The 2D XY model has played a pivotal role in physics of the past half-century. The model describes a collection of interacting rotors or spins are free to rotate in the xy-plane. Each spin is associated with a site \bf{r} on a D \times D square lattice and is described it’s angle parameter \theta_{\bf{r}} \in [-\pi,\pi). Spins interact with their four nearest neighbours \bf{\boldsymbol{\tau}}. The energy of a particular configuration \{\theta_{\bf{r}}\} is

(1)   \begin{equation*} E = -\sum_{\bf{r} \boldsymbol{\tau}} \cos(\theta_{\bf{r}+\boldsymbol{\tau}}-\theta_{\bf{r}}) \end{equation*}

The negative sign in Eqn (1) implies that parallel arrangements of spins have lowest energy (“ferromagnet”). The likelihood of a configuration \{\theta_{\bf{r}}\} is given by the Boltzmann distribution of statistical physics. With uniform priors[*] the posterior joint probability distribution is

(2)   \begin{equation*} P\left( \{\theta_{\bf{r}}\} | \beta \right) \propto e^{-\beta E\left( \{\theta_{\bf{r}}\} \right)} \end{equation*}

where \beta is the inverse temperature.

In statistical physics the number of internal parameters N can be very large (e.g. Avogadro’s number for a 2D system N \sim 7 \times 10^{15}). On the other hand \beta is the sole external data value. This is an extreme limit of Bayesian inference where the inference dataset is small but there are an enormous number of model parameters. In statistical physics, interesting posterior distributions arise from the explicit interactions between parameters even in the absence of a rich inference dataset.

The statistical physics of the 2-D XY model is far from trivial. When \beta \rightarrow \infty spins align in some direction to minimise the energy [**].  For large but finite \beta spins are nearly parallel and the cosine terms in Eqn (1) can be expanded as a quadratic in the small angular differences between neighbouring spins. This “spin-wave” model can be solved analytically. For large N it gives “quasi long range order” i.e. power law decay of spin correlations with distance with exponent \eta = \frac{1}{2 \pi \beta}.

    \[\langle \cos\left(\theta_{\bf r^\prime}-\theta_{\bf r }\right) \rangle \sim | \bf{r}-\bf{r^\prime} |^{-\eta}\]

In the opposite weak coupling limit \beta \ll 1 entropy dominates and typical draws from the posterior distribution consist of nearly randomly oriented spins and short range (exponentially decaying) correlations. This suggests that a phase transition exists between a weak coupling disordered phase and a strong coupling phase with quasi long range order. The transition is known to occur at \beta_{BKT} \approx 1.11996 for very large N. As the transition is approached from below \eta \rightarrow \frac{1}{4}.

It is a remarkable fact that the phase transition in the 2D XY model is driven by a proliferation of vortices (topological defects). This discovery was made in the 1970s and awarded a Nobel prize in 2016. Some vortex configurations on an ordered background are illustrated below. Note that the spins return to their undisturbed state far from a vortex anti-vortex pair but not in the other cases shown. [***]

stan

It is an easy task to simulate Equations (1) & (2) in stan. The stan code (uniform priors and periodic boundary conditions) is

xy_model <- '
functions{
//function f to impose periodic boundary condition
int f(int i, int L){
if (i == 0)
return L;
else if (i == L+1)
return 1;
else
return i;
}

}

data {
int D; //dimension of square lattice
real beta; //coupling or inverse temperature
}

parameters {
matrix<lower=-pi(),upper=pi()>[D,D] theta; //the spin parameters
}

model {
//the site energy
matrix[D,D] energy;
for(i in 1:D)
for(j in 1:D)
energy[i,j] = cos(theta[f(i+1,D),f(j,D)]-theta[f(i,D),f(j,D)])
+ cos(theta[f(i-1,D),f(j,D)]-theta[f(i,D),f(j,D)])
+ cos(theta[f(i,D),f(j+1,D)]-theta[f(i,D),f(j,D)])
+ cos(theta[f(i,D),f(j-1,D)]-theta[f(i,D),f(j,D)]);

for(i in 1:D)
for(j in 1:D) {
target += exponential_lpdf(4-energy[i,j]| beta);
}
}
'

To run the model in R,

library(rstan)
xy_mod <- stan_model(model_code = xy_code)
Nchain <- 8
D <- 32
init <- lapply(1:Nchain, function(i) list(theta=matrix(runif(1,-pi,pi),D,D))) #random orientations
xy_samples <- sampling(xy_mod,chains=Nchain,init=init, data = list(D=D,beta=0.9), iter = 2000, warmup=1000)

Here are some posterior draws generated by the above code after warm-up for D=32 (N=1024).

Firstly, low-temperature “spin-wave” phase with power law correlations…

Secondly, the high temperature disordered phase with unbound vortices …

Finally, close to the transition, weakly bound vortex-ant-vortex pairs…

It is straightforward to compute quantities of physical interest from the HMC sample draws. The graph below shows mean energy per site, mean site magnetisation and mean square vorticity averaged over ~ 10,000 draws. Vortices proliferate in the vicinity of the transition as expected.

[*]Non-uniform priors can also be used in Eqn. (2). A simple choice is the circular Von Mises distribution

    \[p\left(\theta_{\bf r}\right) \propto e^{\kappa \cos \left(\theta_{\bf r}- \mu \right)}\]

which favours spin direction \mu. This is mathematically identical to uniform prior and an external “magnetic field” (H) term in Eqn. (1) with \kappa = \beta H.

[**] A very small external magnetic field can be used to select the direction.

[***] Net vorticity N_v of a region is measured by summing angular differences around it’s perimeter L, \sum_{i \in L} \Delta \theta_i = 2 \pi N_v (where each angular difference is wrapped to the interval -\pi,\pi. ) This formula can be applied to an individual plaquette to locate vortex cores, for example.

2+

Leave a Comment

Your email address will not be published. Required fields are marked *