The map shows tracks of drifting buoys deployed in the Southern Ocean between 2005 and 2016. The underlying dataset consists of 15-minute GPS data collected from 4608 buoys, available from NOAA at ftp://ftp.aoml.noaa.gov/pub/phod/buoydata/. Colour indicates drift speed.

This is simply a visualisation of the raw drifter data with no further analysis or modelling. Nevertheless, all of the well-known features of the surface circulation in the Southern Ocean emerge: the Antarctic Circumpolar Drift, western boundary currents along the eastern coasts of South America (Brazil Current) and Africa (Agulhas), the powerful Agulhas Return Current in the southern Indian Ocean, the South Atlantic gyre (centred north-east of the Falkland Islands) etc.

A data frame *buoydata* was created using *fread()*,

    library(data.table)
    buoydata <- fread("../buoydata_10001_15000.dat")

transformed to stereographic coordinates using *spTransform()*,

    library(dplyr)
    library(sp)
    # lat_0=-90 centres the projection on the South Pole (PROJ defaults to lat_0=0)
    antarctic <- CRS("+proj=stere +lat_0=-90 +datum=WGS84")
    lonlat <- CRS("+proj=longlat +datum=WGS84")
    buoydata.sp <- SpatialPointsDataFrame(select(buoydata, lon, lat),
                                          data = select(buoydata, -lon, -lat),
                                          proj4string = lonlat)
    buoy.df <- as.data.frame(spTransform(buoydata.sp, antarctic))

and plotted using *geom_path()*.

    library(ggplot2)
    # one path per buoy (group=ID), coloured by drift speed
    ggplot(buoy.df) +
      geom_path(aes(lon, lat, group=ID, colour=speed), alpha=0.15, size=0.2)

Note the use of transparency (alpha). This can be varied to highlight different aspects of the flow.


q-state Potts models are a simple type of interacting spin (discrete state) model. Each site $i$ on a graph or lattice has a state $s_i$ in $\{1, \dots, q\}$. The site ‘energy’ is determined by the number of neighbours which agree with $s_i$. The total energy of a configuration is,

$$E = -J \sum_{\langle i,j \rangle} \delta(s_i, s_j) \tag{1}$$

where $\delta(s_i, s_j) = 1$ if $s_i = s_j$ and $0$ otherwise. The sum is over “neighbours” $\langle i,j \rangle$. According to the simple rules of statistical physics, the probability of a particular configuration follows Boltzmann’s law,

$$P(s_1, \dots, s_N) \propto e^{-E} \tag{2}$$

with the temperature absorbed into the coupling $J$.

Instead of spins, the states can be re-interpreted as representing $q$ flavours of political opinion. The coupling $J$ determines how strongly associates influence each others’ opinions. If $J$ is close to zero, then opinions are independent and randomly distributed. When $J$ is large, groupthink dominates. In a population of economists or social misfits who like to disagree with one another, $J < 0$. However, $J > 0$ is probably the most relevant and certainly the most interesting case.

Much is known about Potts models, and equations (1) & (2) are easy to simulate using Gibbs sampling (results below were obtained in *C++* linked to *R* using *Rcpp*). For simplicity, I took the case of a square lattice with nearest-neighbour coupling. This means that each opinion-holder has four associates, and each person’s network overlaps with four other networks. Results for the average agreement (i.e. the average number of associates with whom each person agrees, a number between 0 and 4) on 50×50 and 100×100 lattices are shown below. The coupling $J$ was slowly increased in steps, with up to 60,000 thermalisation sweeps at each step. At a critical value of $J$ there is a sharp jump in the level of agreement (from about 2 to 3).
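For readers who want to experiment without a C++ toolchain, the Gibbs sampling step can be sketched in plain *R*. This is a minimal sketch and not the *Rcpp* code used for the results: the function names `potts_gibbs` and `agreement`, the periodic boundary conditions and the small lattice size are all assumptions made for illustration.

```r
# Gibbs sampler for a q-state Potts model on an L x L square lattice.
# Assumptions (not from the original post): periodic boundaries, energy
# E = -J * (number of agreeing neighbour pairs), temperature absorbed into J.
potts_gibbs <- function(L = 16, q = 8, J = 1, sweeps = 100, s = NULL) {
  if (is.null(s)) s <- matrix(sample.int(q, L * L, replace = TRUE), L, L)
  up   <- c(L, 1:(L - 1))   # index of the previous row/column (wraps around)
  down <- c(2:L, 1)         # index of the next row/column (wraps around)
  for (sweep in seq_len(sweeps)) {
    for (i in 1:L) for (j in 1:L) {
      nb <- c(s[up[i], j], s[down[i], j], s[i, up[j]], s[i, down[j]])
      # conditional weight of each state given the four neighbours
      w <- exp(J * tabulate(nb, nbins = q))
      s[i, j] <- sample.int(q, 1, prob = w)
    }
  }
  s
}

# Average number of agreeing neighbours per site (between 0 and 4)
agreement <- function(s) {
  L <- nrow(s)
  down <- c(2:L, 1)
  2 * mean((s == s[down, ]) + (s == s[, down]))
}

# Example: agreement at weak, intermediate and strong coupling
set.seed(1)
for (J in c(0.5, 1.3, 2.0)) {
  s <- potts_gibbs(L = 16, q = 8, J = J, sweeps = 100)
  cat("J =", J, " agreement =", round(agreement(s), 2), "\n")
}
```

Each site is redrawn from its conditional distribution given its four neighbours, which is all Gibbs sampling requires. For comparison, the exact critical coupling of the square-lattice Potts model with energy (1) is known to be $\ln(1+\sqrt{q})$, which is roughly 1.34 for $q = 8$. Short runs on small lattices like this one will only approximate the transition.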

The transition can also be seen in opinion percentages. Below, opinion favourable to the blue candidate jumps from 12.5% to over 50% at the transition. All other candidates follow the pink line.

There is no model bias favouring the blue candidate. Their selection is a case of “spontaneous symmetry breaking”. The jump is already very sharp at these lattice sizes and becomes a true mathematical discontinuity in the limit of an infinite lattice.

In this model an opinion poll carried out below the transition has nothing to say about which candidate would emerge if $J$ increased beyond the transition by election day. Is this one reason why opinion polls often turn out mysteriously wrong?


The Peirce Quincunx (PQ) has nice features for the display of global weather and climate data. Firstly, as a conformal map, it does not distort the shapes of relatively localized objects such as the cyclones present in global weather patterns. The Mercator projection is also conformal, but is notorious for its large scale distortion. PQ scale distortion is generally smaller, becoming large only near the four equatorial singular points.

Secondly, PQ is pole-friendly. The poles are special points in the climate system because the jet streams circulate about them. It seems perverse to display climate information using map projections (such as geographic) which are singular at the poles. Other examples of pole-friendly projections include transverse Mercator and oblique cylindrical equal area.

The video shows 850 mb PQ wind speed raster maps for October 2015 based on 6-hourly 0.25° resolution GDAS data. One second of video corresponds to two days of weather data.

There is a nice discussion of PQ implementation in *R* here. To create raster maps, both forward and inverse projections were required. These functions are available in *tcl*‘s *mapproj* library and can be called from *R*. More details later.

*“Considering that these islands are placed directly under the equator, the climate is far from being excessively hot; this seems chiefly caused by the singularly low temperature of the surrounding water, brought here by the great southern Polar current.. very little rain falls, and even then it is irregular.”* – Darwin (Voyage of the Beagle) describing the climate of the Galapagos.

Ascension Island is located in the tropical South Atlantic (8°S). A visitor expecting to find lush tropical vegetation is surprised, just as Darwin was surprised by the climate of the Galapagos. Saint Helena, 1300 km to the south-east (16°S), is also dry. So too is the island of Santiago, Cabo Verde, 2700 km to the north (15°N). On a recent trip to these islands I was left wondering where the tropical rainfall was.

Tropical rainfall is tied to the intertropical convergence zone (ITCZ), where trade winds from the Northern and Southern hemispheres (NH & SH) converge. Convergence implies uplift, and uplift of moist air produces convective rainfall. The ITCZ tends to follow maximum sunshine seasonally about the equator, which explains the timing of the wet season at a given location. This raises the question, *why are some tropical oceanic islands almost entirely dry while others at similar latitude are very wet?*

I used historical atmospheric circulation data from the ERA-Interim reanalysis to try to shed light on this puzzle. Mean monthly divergence of the surface (10 m) wind field was computed at 0.75° resolution based on the years 1979-2015. The ITCZ corresponds to the band of strongly negative divergence (convergence) near the equator (in mathematical notation, $\nabla \cdot \mathbf{u} < 0$).
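As an illustration of the divergence calculation, here is a minimal *R* sketch using centred finite differences on a regular longitude-latitude grid. It is not the code used for the maps: the function name `wind_divergence`, the matrix layout (longitude along rows, latitude along columns) and the assumption of a global, evenly spaced grid that wraps in longitude are all illustrative choices. Latitudes at the poles should be excluded because of the division by the cosine of latitude.

```r
# Horizontal divergence of a wind field (u, v) on a sphere of radius a:
#   div = 1 / (a cos(phi)) * ( du/dlambda + d(v cos(phi))/dphi )
# u, v: matrices [lon, lat] in m/s; lon, lat: evenly spaced grid in degrees.
# Assumes the longitudes cover the whole globe (periodic wrap-around).
wind_divergence <- function(u, v, lon, lat, a = 6.371e6) {
  lam <- lon * pi / 180
  phi <- lat * pi / 180
  nlon <- length(lam); nlat <- length(phi)
  cosphi <- matrix(cos(phi), nlon, nlat, byrow = TRUE)
  east <- c(2:nlon, 1)            # neighbour to the east (wraps around)
  west <- c(nlon, 1:(nlon - 1))   # neighbour to the west (wraps around)
  dudlam <- (u[east, ] - u[west, ]) / (2 * (lam[2] - lam[1]))
  vc <- v * cosphi
  dvdphi <- matrix(NA_real_, nlon, nlat)   # undefined on first/last latitude
  j <- 2:(nlat - 1)
  dvdphi[, j] <- (vc[, j + 1] - vc[, j - 1]) / (2 * (phi[2] - phi[1]))
  (dudlam + dvdphi) / (a * cosphi)
}

# Example: a purely zonal flow u = 10 cos(phi) m/s has zero divergence
lon <- seq(0, 357.5, by = 2.5)
lat <- seq(-60, 60, by = 2.5)
phi <- matrix(lat * pi / 180, length(lon), length(lat), byrow = TRUE)
div <- wind_divergence(10 * cos(phi), 0 * phi, lon, lat)
range(div, na.rm = TRUE)
```

Averaging such fields over all Januaries, Februaries and so on would then give a monthly climatology in which the convergence band stands out as negative values.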

The ITCZ moves with the seasons as expected. However, ERA data show that over much of the Atlantic and Pacific oceans it is *(a)* narrow, and *(b)* shifted significantly towards the NH (by about 5°). This is why the convergence manages to miss Cabo Verde, Ascension and Saint Helena almost entirely, which accounts for the surprisingly dry climate of these islands. On the other hand, the ITCZ dips far enough south to bring heavy rain to Fernando de Noronha (3°S, off the coast of Brazil) during April.

The large NH bias of the ITCZ[1] is a fundamental fact of the climate system. For example, it is a factor in the rarity of hurricanes in the SH. The NH bias is believed to be related to the observation that the SH is 1.25°C cooler than the NH, which is in turn related to an imbalance of ocean heat transport between the hemispheres.

The video shows the evolution of the deadly category 4 *Hurricane Joaquin*. Isobars are shown on a background of 250 mb wind speeds (i.e. near the top of the troposphere). *Joaquin* develops from a tropical Atlantic depression north-east of the Bahamas on 28 September. It is evident that the hurricane intensified in a region of low wind shear.

data: http://www.ftp.ncep.noaa.gov/data/nccf/com/gfs/prod/

data handling/graphics: wgrib2, *R*, ggplot2, ffmpeg

hurricane physics: Kerry Emanuel

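At each time step $t$, the non-stationary Markov process is governed by a transition matrix

$$T_t = \begin{pmatrix} 1 - p_t & p_t \\ q_t & 1 - q_t \end{pmatrix} \tag{1}$$

where $0 \le p_t, q_t \le 1$. $T_t$ can be coupled to the external covariate $x_t$ using logistic functions, which preserve the stochastic matrix property. For instance, choose:

$$p_t = \frac{1}{1 + e^{-2(1 - 2x_t)}}, \qquad q_t = \frac{1}{1 + e^{-2(1 + 2x_t)}} \tag{2}$$

Also, assume that $x_t$ varies harmonically with timescale (angular period) $\tau$, i.e.

$$x_t = \sin(t/\tau) \tag{3}$$

The graph below shows the expected response (a number between 1 & 2) as a function of $x$, obtained by simulating 200 Markov chains using Equations (1)-(3).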

The striking feature is the appearance of a hysteresis loop. Hysteresis, where the mean response has separate branches for increasing and decreasing $x$, becomes pronounced when $x$ varies rapidly (small $\tau$). On the other hand, when $x$ varies slowly, the mean response collapses to a single branch.

Starting from some initial state, a Markov chain approaches its equilibrium distribution after a *mixing time*. The mixing time is longer when there are bottlenecks, e.g. when the off-diagonal elements of (1) are small. When the covariate varies on a time-scale which is long compared to the mixing time, the system remains close to equilibrium at all times. Hysteresis is absent in this case. Conversely, when the covariate varies rapidly, the transition matrix changes too quickly for the system to reach equilibrium. This leads to hysteresis.
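The link between bottlenecks and mixing time can be made concrete for a two-state chain. The sketch below is an illustration under stated assumptions: it takes the transition matrix to have rows $(1-p, p)$ and $(q, 1-q)$, and it uses the relaxation time $-1/\ln|1-p-q|$ (set by the second eigenvalue) as a stand-in for the mixing time. The function name `two_state_summary` is hypothetical.

```r
# Equilibrium distribution and relaxation time of a two-state Markov chain.
# Assumed parameterisation: p = P(1 -> 2), q = P(2 -> 1), with p + q > 0.
two_state_summary <- function(p, q) {
  Tm <- matrix(c(1 - p, p, q, 1 - q), 2, 2, byrow = TRUE)
  equilibrium <- c(q, p) / (p + q)   # solves pi %*% Tm = pi
  lambda2 <- 1 - p - q               # second eigenvalue (the first is 1)
  list(T = Tm,
       equilibrium = equilibrium,
       lambda2 = lambda2,
       relaxation_time = -1 / log(abs(lambda2)))
}

# Fast mixing: large switching probabilities
two_state_summary(0.4, 0.3)$relaxation_time

# Bottleneck: small switching probabilities give a long relaxation time
two_state_summary(0.01, 0.01)$relaxation_time
```

Deviations from equilibrium shrink by a factor $|1-p-q|$ per step, so hysteresis is expected once the covariate changes appreciably within one relaxation time.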

Logistic regression is valid as long as the mixing time is short compared to the timescale over which the covariate varies. However, if the mixing time is long, multiple branches of the response are present. This aspect of Markov chain dynamics is missing from ordinary logistic regression. This conclusion generalises to the N-state/multinomial regression case.

A good way to simulate non-stationary Markov chains in *R* is to pass a list of transition matrices to a C++ function (*MCsim1*) using *Rcpp*. For example:

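    Rcpp::sourceCpp('~/markov.cpp')
    # Tmat() builds a 2x2 row-stochastic matrix
    # (assumed form: p = P(1 -> 2), q = P(2 -> 1))
    Tmat <- function(p, q) matrix(c(1 - p, p, q, 1 - q), 2, 2, byrow = TRUE)
    # covariate: harmonic forcing with timescale 4
    x <- sin(1:100000/4)
    # one transition matrix per time step
    T.ts <- lapply(x, function(d) Tmat(1/(1+exp(-2*(1-2*d))), 1/(1+exp(-2*(1+2*d)))))
    simulation <- MCsim1(T.ts, start=1)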

where “~/markov.cpp” contains:

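    // [[Rcpp::depends(RcppArmadillo)]]
    #include <RcppArmadilloExtensions/sample.h>
    using namespace Rcpp;

    // Simulate a non-stationary Markov chain.
    // t: list of transition matrices, one per time step
    // start: initial state, numbered from 1 as in R
    // [[Rcpp::export]]
    IntegerVector MCsim1(Rcpp::List t, int start) {
      NumericMatrix t1 = t[0];
      int nrow = t1.nrow();
      int n = t.length();
      IntegerVector states = seq_len(nrow) - 1;  // 0-based state labels
      IntegerVector sim(n);
      sim(0) = start - 1;                        // convert start to 0-based
      for (int j = 1; j < n; j++) {
        NumericMatrix s = t[j];
        // row of the current state gives the next-state probabilities
        NumericVector probs = s(sim(j - 1), _);
        IntegerVector newstate = RcppArmadillo::sample(states, 1, false, probs);
        sim(j) = newstate(0);
      }
      // convert back to 1-based state labels for R
      for (int j = 0; j < n; j++) sim(j) = sim(j) + 1;
      return sim;
    }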


In an efficient market the price of a commodity reflects all available information. Did the coffee price assimilate long-range weather forecast information available in 2014?

The above chart shows monthly rainfall (crosses) and average rainfall (black line). Rainfall totals were extracted from the ERA-Interim reanalysis. Periods of deficit relative to average rainfall are indicated in red. Rainfall was less than 50% of average in both January and February 2014.

Despite the dryness, coffee futures actually trended slightly lower during January 2014 (above). However, this situation reversed dramatically after January 29 (indicated by the red arrow). It is as though the market abruptly woke up to the fact that drought would continue well into February and that this would impact Arabica coffee fruit development.

In fact, well in advance of January 29, long range weather forecasts were indicating a high probability of continued drought in February. The graph below shows a large ensemble of CFSv2 rainfall forecasts for Minas Gerais for December, January and February. Such forecasts[*] are made every 6 hours up to 9 months prior to the forecast month. A high probability of anomalous rainfall for the months of January and February is evident some 2-3 weeks in advance.

This analysis points to a surprising conclusion. For perhaps two weeks, world coffee market prices did not properly reflect probabilistic information available from long range weather forecasts.

[*] Raw forecasts, not bias corrected.

Rainfall derives from uplift of moist air. In mountainous areas, uplift is forced by terrain and so depends on the horizontal wind vector $\mathbf{U}$. The dependence of rainfall on wind direction is illustrated below for a fictitious range of hills (dashed contours indicate the hills).

The “upslope” model for rainfall ($R$) at location $\mathbf{x}$ is:

$$R(\mathbf{x}) = R_0 + C\,\mathbf{U} \cdot \nabla h(\mathbf{x}) \tag{1}$$

$\nabla h$ is the gradient of the terrain height and $C$ is related to the saturated water vapour density. $R_0$ is the background rainfall rate.

In reality, Equation (1) is a poor approximation. Rain spreads downwind because there is a delay before raindrops can form, and another delay before fallout. Airflow dynamics also spreads the influence of an obstacle upwind. These effects can be included in a Fourier transformed version of equation (1):

$$R(\mathbf{x}) = R_0 + C \int \frac{d^2k}{(2\pi)^2}\, e^{i\mathbf{k} \cdot \mathbf{x}}\, (i\,\mathbf{k} \cdot \mathbf{U})\, \hat{h}(\mathbf{k})\, F(\mathbf{k}) \tag{2}$$

$\hat{h}(\mathbf{k})$ is the Fourier transform of the terrain height and $F(\mathbf{k})$ is a form factor which includes the effects of advection and mountain wave dynamics. Equation (2) reduces to the upslope model when $F = 1$.

The figure illustrates orographic rainfall for a fictitious range of Gaussian hills. It is very quick to compute (2) using the discrete FFT. If you use *R*, things can be sped up by constructing the integrand in C++ using *Rcpp* before passing it to *fft*. This is very fast even for realistic landscapes.
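A plain *R* version of the FFT calculation might look like the sketch below. It is illustrative only: the function name `orographic_rain`, the symbols `C`, `R0`, `tau_c` and `tau_f`, and in particular the form factor (two first-order delays for raindrop formation and fallout, with no mountain wave term) are assumptions, not the form factor used for the figure. The grid is treated as periodic, as any FFT method implies.

```r
# Orographic rainfall from terrain height h (matrix, x along rows, y along
# columns) on a periodic grid with spacing dx. U = c(u, v) is the wind.
# The form factor Fk (two first-order delays) is an assumed example.
orographic_rain <- function(h, dx, U, C = 1, R0 = 0, tau_c = 0, tau_f = 0) {
  nx <- nrow(h); ny <- ncol(h)
  wavenumber <- function(n) {            # angular wavenumbers, FFT ordering
    m <- 0:(n - 1)
    m[m > n / 2] <- m[m > n / 2] - n
    2 * pi * m / (n * dx)
  }
  kx <- matrix(wavenumber(nx), nx, ny)
  ky <- matrix(wavenumber(ny), nx, ny, byrow = TRUE)
  sigma <- U[1] * kx + U[2] * ky         # k . U
  Fk <- 1 / ((1 + 1i * sigma * tau_c) * (1 + 1i * sigma * tau_f))
  integrand <- 1i * sigma * fft(h) * Fk
  R0 + C * Re(fft(integrand, inverse = TRUE)) / (nx * ny)
}

# Example: a single Gaussian hill with wind blowing in the +x direction
n <- 64; dx <- 1; s <- 4
x <- (1:n - n / 2) * dx
h <- outer(x, x, function(x, y) exp(-(x^2 + y^2) / (2 * s^2)))
upslope <- orographic_rain(h, dx, U = c(10, 0))                  # F = 1
delayed <- orographic_rain(h, dx, U = c(10, 0), tau_c = 0.5, tau_f = 0.5)
```

With both delays set to zero the form factor is 1 and the result is the upslope model. With non-zero delays the rainfall maximum weakens and moves downwind. Negative values correspond to drying on the lee side, and whether to truncate them at zero is a modelling choice left open here.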

Links to some of the figures from the report are given below.

The output of the best predictive models (determined by cross-validation for example) always shows less variance than the observations. This fact is called *shrinkage*.

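Shrinkage can be understood from an identity known in weather forecasting as the *Murphy-Epstein decomposition[*]*. For an unbiased forecast, the skill score $SS = 1 - \mathrm{MSE}/\sigma_o^2$ can be written as

$$SS = \rho^2 - \left(\rho - \frac{\sigma_f}{\sigma_o}\right)^2$$

$\rho$ is the correlation between forecasts and observations, and $\sigma_o$ and $\sigma_f$ are standard deviations of the observations and forecasts respectively.

To maximise skill, the second term needs to be made as small as possible. Setting it to zero requires $\sigma_f = \rho\,\sigma_o$.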

Having low variance compared to the observations may seem strange. It makes your predictive model seem a less realistic description of reality. Yet shrinkage is a feature of any imperfect ($\rho < 1$) but optimised predictive model.
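Shrinkage is easy to check numerically. The following *R* sketch uses synthetic data and a least-squares fit, both chosen purely for illustration, and the function name `murphy_epstein` is hypothetical. It evaluates the three terms of the full decomposition: the squared correlation, the variance-ratio term, and a bias term which vanishes for an unbiased forecast.

```r
# Murphy-Epstein decomposition of the skill score 1 - MSE / var(obs):
#   skill = rho^2 - (rho - sd_f / sd_o)^2 - ((mean_f - mean_o) / sd_o)^2
murphy_epstein <- function(f, o) {
  pop_sd <- function(z) sqrt(mean((z - mean(z))^2))   # population sd
  rho <- cor(f, o)
  terms <- c(correlation = rho^2,
             variance    = -(rho - pop_sd(f) / pop_sd(o))^2,
             bias        = -((mean(f) - mean(o)) / pop_sd(o))^2)
  list(rho = rho, sd_ratio = pop_sd(f) / pop_sd(o),
       terms = terms, skill = sum(terms))
}

set.seed(42)
x <- rnorm(500)
o <- x + rnorm(500)          # observations = predictable signal + noise
f <- fitted(lm(o ~ x))       # least-squares forecast
me <- murphy_epstein(f, o)
me$rho        # correlation between forecast and observations
me$sd_ratio   # equals rho for a least-squares fit, so it is below 1

# Inflating the forecast to match the observed variance lowers the skill
f_inflated <- mean(f) + (f - mean(f)) / me$sd_ratio
murphy_epstein(f_inflated, o)$skill < me$skill
```

The least-squares forecast has a standard deviation ratio equal to its correlation with the observations, so its variance term is zero. Rescaling it to look as variable as the observations makes that term negative and reduces the skill.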

[*] Statistical Analysis in Climate Research, H. von Storch and F. W. Zwiers, Cambridge University Press, 2002
