Flux Towers: Part II

West of Ireland oak woodland - October 11

Flux tower eddy covariance instruments measure CO2 flux between an ecosystem and the atmosphere. Observed CO2 fluxes (also called NEE, Net Ecosystem Exchange) can be correlated with locally measured solar intensity, temperature, humidity etc. With enough data, a detailed picture of the local response of the ecosystem to environmental variables to be built up. Flux towers are one of the important tools for quantitative understanding the sensitivity of vegetation to climatic variability.

CO2 flux data  are noisy, to say the least. If you want to understand ecosystem reponse, you need a statistical model.  This post gives some simple ideas and R code showing how such a model can be developed. The example used is Harvard Forest [1], Massachussetts, a temperate deciduous broadleaf forest for which a large amount of flux tower data (collected by Munger & Wofsy) is available. It could equally be applied to tropical forest, grassland, cropland etc at one of the 200+ fluxnet sites worldwide. The model falls short of the full scientific rigour given in the references, but it can easily be developed into a more complete tool.

The main interest here is ecosystem response to sunlight i.e. photosynthesis. The light dependent part of the NEE is GEP (Gross Ecosystem Production), while the light independent part is RE (ecosystem respiration).


As always, the first steps are about getting the data into the right form. We use half-hourly data for 1991-2007  for Harvard Forest  i.e Ameriflux Level 2, site code USHa-1. This raw data has not been subject to any gap-filling statistical treatment. Here are the pre-processing steps:

  1. download the annual data files and create a dataframe harvard using the rbind() combining all of the data
  2. replace missing data (e.g. -9999) with “NA”
  3. add a new fornight index, indicating to which fortnight of the year each row of data belongs
  4. select a subset of the Ameriflux variables of interest. e.g. column headings c(“YEAR”,”DTIME”,”FN”,”TA”,”NEE”,”TS1″,”VPD”,”PAR_in”)
  5. drop rows of data when one or more variables are undefined
  6. group the data according to fortnights index i.e. create a list of 26 elements where each element is a dataframe containing all data sharing the same fortnight index.
  7. drop “dark” data i.e. rows of data where PAR < 30.

The environmental variables retained at step 4 are TA (atmospheric temperature), TS1  (soil temperature), VPD (vapour pressure deficit) and PAR (photosynthetically active radiation). PAR is solar radiation in the range 0.4-0.7 microns which is energetic enough to contribute to photosynthesis. It has units μ mols s-1 m-2. (1 μ mole = 6 x 1017.)

Here is what the R code looks like:
#download the annual data files and create a dataframe harvard
loc=file.path("ftp://cdiac.ornl.gov/pub/ameriflux/data/Level2/Sites_ByID/US-Ha1/with_gaps/usmaharv_1991_L2.csv"); # Ameriflux US_Ha1 download.file(loc,"temp.dat");
ha_1991=read.csv("temp.dat", header=FALSE,skip=20); .........................
loc=file.path("ftp://cdiac.ornl.gov/pub/ameriflux/data/Level2/Sites_ByID/US-Ha1/with_gaps/usmaharv_2007_L2.csv"); # Ameriflux US_Ha1 download.file(loc,"temp.dat");
ha_2007=read.csv("temp.dat", header=FALSE,skip=20); harvard <- rbind(ha_1991,ha_1992,ha_....._2006,ha_2007); # merged dataframes
#replace missing data with "NA"
temp <- (harvard[,] == -9999 | harvard[,] == -6999); #replace -9999 or -6999 with NAs
#add a new fortnight index
lims <- seq(from=1,to=365,by=14); # fortnightly data
lims[27] <- 367; # bound adjust, last fortnight has extra days fortnight
<- function(d) {sum(ifelse(d<lims,0,1))}  # convert from DOY to fortnights
harvard <- data.frame( "FN" = sapply(harvardDOY,fortnight),harvard)  # add fortnight labels 1..26 column #select a subset of the Ameriflux variables of interest sub<- c("YEAR","DTIME","FN","TA","NEE","TS1","VPD","PAR_in") harvardS <- harvard[,colnames(harvard) %in% sub, drop=FALSE] # subset dataframe <span style="color: #ff6600;">#  drop data when U* < 0.2 m/s</span> u_cut <- (harvardUST < 0.2|is.na(harvardUST)); <span style="color: #ff6600;">#drop every row of data when one or more variables are undefined</span> x <- is.na(harvardSTA);
y <- is.na(harvardSNEE); z <- is.na(harvardSTS1);
u <- is.na(harvardSVPD); v <- is.na(harvardSPAR_in);
w <- !(x|y|z|u|v|u_cut); #boolean. w is true when all variables are defined and U* > 0.2 m/s
harvardS <- data.frame("YEAR"=harvardYEAR[w],"FN"=harvardFN[w],"TA"=harvardTA[w],"NEE" = harvardNEE[w], "TS1" = harvardTS1[w], "VPD" = harvardVPD[w],"PAR" = harvardPAR_in[w]) <span style="color: #ff6600;">#group the data according to fortnights index</span> getF <- function(i) {subset(harvardS,FN==i)} # list of dataframes according to fortnight index harvardF <- lapply(seq(1:26),getF) # generate list <span style="color: #ff6600;">#drop "dark" data i.e. rows of data where PAR < 30.</span> harvardS.light <- subset(harvardS,PAR>30)#day only getF.light <- function(i) {subset(harvardS.light,FN==i)} <span style="color: #ff6600;"># list of dataframes according to fortnight value</span> harvardF.light <- lapply(seq(1:26),getF.light) # generate list</code>  <em>harvardF.light</em> contains the required clean daytime half-hourly data grouped by fortnight index. <h3>Model-Building</h3> The plots below show NEE versus PAR. Fortnights 8 & 22 (mid April & late October) are winter months with positive NEE and little evidence of photosynthetic response. Fortnights 14 & 18 (early July and late August) show photosynthesis (-ve NEE) and a strong response to PAR. <a href="http://joewheatley.net/wp-content/uploads/2009/10/gep.png"><img class="size-large wp-image-1005  aligncenter" title="gep" src="http://joewheatley.net/wp-content/uploads/2009/10/gep-1024x958.png" alt="gep" width="717" height="671" /></a>  The NEE-PAR curves during summer months saturate at high light levels. This is not too surprising. Photosynthesis is limited by water and CO2 availability and so cannot increase indefinitely with increasing PAR. A simple light saturation model is [2] latex \mathrm{NEE} = \mathrm{R_E} -{{\alpha \beta \mathrm{PAR}} \over {\beta + \alpha \mathrm{PAR}}}  The second term on the right-hand-side represents GEP.  At low light levels GEP ≈ &alpha;PAR. The parameter &alpha; is called the "photosynthetic efficiency". At high light levels,  GEP ≈ &beta; and so &beta; is the light saturation GEP.  This 3-parameter simple light saturation model can be fit to extract parameters for each fortnightly period. However, a better model recognises that photosynthesis responds to changes in moisture availability and temperature. This effect can be introduced by replacinglatex \alpha \rightarrow \alpha \epsilonin the numerator of GEP, where &epsilon; depends on atmospheric temperature TA and vapour pressure deficit VPD. &epsilon;=1 under optimal temperature and humidity conditions, otherwise &epsilon;<1. We need to input a reasonable model for how quickly GEP falls away about the optimal values of TA (at about 25C) and VPD (at zero  kPa). After some experimenting, the following model was found to be reasonable[3] latex \epsilon = {\lambda^2 \over {\mathrm{\left(TA-25 \right)}^2 + \lambda^2} } {\mu^2 \over {\mathrm{VPD}^2 + \mu^2}}latex \lambdaandlatex \muare two additional parameters which must be determined. This 5-parameter model can be fit for each of the photosynthetically active periods 11-20.  The nls() function in R carries out a numerical maximum likelihood fit using least squares. Unlike ordinary linear least squares, it is not guaranteed to converge. The model and initial parameter values have to be chosen carefully. A good approach is to use parameters from the simple light saturation model as initial values in nls(). Here is the code needed to do the 5-parameter model fits  <code> #fits.mid is a list of fitted R objects for(i in start_period:end_period) { fits.mid[[i]] <- nls(NEE~ gamma - alpha*beta*(lambda^2/((TA-25)^2+lambda^2))*mu^2/(VPD^2+mu^2)*PAR/(beta+alpha*PAR), data = harvardF.light[[i]], start=list(gamma=1,alpha=0.02,beta=40,lambda=40,mu=1),trace=F); </code>  In fact, &lambda; and &mu; are not expected to vary much during the summer period, since they depend on leaf properties at the cellular level.  Indeed they turn out to be roughly constant for fortnight periods 11-20. It is reasonable to replace them by their average values &mu;<sub>av</sub> = 2.1kPa and &lambda;<sub>av</sub>=34 &mu; mols s<sup>-1</sup> m<sup>-2</sup>. With &epsilon; determined in this way, a final 3-parameter model fit is carried out. Once again the three parameters are RE, &alpha; and &beta; but this time the model takes better account of temperature and moisture sensitivity, via &epsilon;. <h3>Results</h3> The model suggests that VPD is an important factor limiting GEP. e.g. vapour pressure deficit of 1kPa reduces GEP by 20% whenlatex \mu = 2kPa. This effect is seen in the plots below. <a href="http://joewheatley.net/wp-content/uploads/2009/10/nee.png"><img class="size-large wp-image-1015  aligncenter" title="nee" src="http://joewheatley.net/wp-content/uploads/2009/10/nee-1024x813.png" alt="nee" width="819" height="650" /></a>  The implication of this is that about 30% of the apparent "light saturation" of the NEE PAR response is due to water stress (VPD). The remainder is due to CO2 availability and possibly other constraints.  The plot below shows the variation of parameters R_E , &beta; and &alpha; with fortnight index. Only &beta; shows a strong variation during the summer months, with a peak in early June <a href="http://joewheatley.net/wp-content/uploads/2009/10/params.png"><img class="size-large wp-image-1024  aligncenter" title="params" src="http://joewheatley.net/wp-content/uploads/2009/10/params-1024x481.png" alt="params" width="1024" height="481" /></a>latex \alpha^{-1}$ corresponds to the number of  photosynthetically active photons  arriving at the forest canopy for every CO2 molecule trapped by photosynthesis. This number is about 14.

A common feature of future climate scenarios is competition between increased CO2 availability and increased water stress layed out on a global scale. Clearly flux tower data are good place to look for these effects.

The R script for the model and plots can be found here.


[1] Factors controlling CO2 exchange on timescales from hourly to decadal at Harvard Forest, Urbanski, S., C. Barford, S. Wofsy, C. Kucharik, E. Pyle, J. Budney, K.

McKain, D. Fitzjarrald, M. Czikowsky, J. W. Munger http://www.agu.org/pubs/crossref/2007.../2006JG000293.shtml

[2] See for example, An evaluation of models for partitioning eddy covariance-measured
net ecosystem exchange into photosynthesis and respiration
P. Stoy et al, http://www.geos.ed.ac.uk/homes/pstoy/Stoy_06b.pdf

[3] More detailed models are give in An empirical model simulating diurnal and seasonal CO2 flux for diverse vegetation types and climate conditions, M. Saito, S. Maksyutov, R. Hirata, and A. D. Richardson. http //www.biogeosciences.net/6/585/2009/bg-6-585-2009.html & references.