Posts Tagged ‘ecosystem model’

Mapping Biomes

Recently (2008) the European Space Agency produced GlobCover (ESA GlobCover Project, led by MEDIAS-France), the highest resolution (300m) global land cover map to date. GlobCover uses 21 primary land cover classes and many more sub-classes. Land cover classification (LCC) schemes divide the earth into biomes. Biomes are the simplest way to classify vegetation which can be applied globally. LCC makes sense because the boundaries between different ecosystems (ecotones) are sharp. However, definitions vary and there is no agreed standard set of biomes.[1]

GlobCover Example

Puntarenas is a province on the pacific coast of Costa Rica.  The province has a typical mix of tropical land cover. This includes some spectacular examples of Pacific Rainforest, notably on the Osa Peninsula. Puntarenas has an area of ≈ 11,000 sq. km or about 120,000 GlobCover pixels.

13 land cover types are present in the GlobCover map below. The barplot on the right shows the total amounts present in each class.



The GlobCover legend (above) has mixed land cover classes, where more than one biome occurs  inside a map pixel. This is especially true in the man-made biomes (agriculture) . For example, there are three cropland land cover types depending on the relative amounts of other vegetation present.

GlobCover map making in R

R is a programming language, not a specialised geographic information system (GIS) such as GRASS or commercial packages. However applications of R to spatial problems is a growth industry.[2]

A GlobCover map similar to the above can be produced for any area of interest. The Geospatial Data Abstraction Library (GDAL) should be installed on your system. FWTools is the place to go. You also need R packages sp and rgdal installed. The regional GLOBCOVER map for Central America can be downloaded from ESA here. GlobCover is in GeoTiff format i.e. a Tiff image file which contains georeferencing information. The following GDAL command (from command line, or run from R using shell) creates a 4° x 4° submap centred on Costa Rica.

gdalwarp GLOBCOVER_200412_200606_V2.2_CentralAmerica_Reg.tif -te -86 8 -82 12 costaRica.tif

costaRica.tif is read into R using the rgdal package:

costa <- readGDAL("costaRica.tif")

costa has class SpatialGridDataFrame, which is a class defined in the package sp (loaded when rgdal is loaded).

Administative boundaries for Costa Rica were obtained from Global Administrative Areas (see Revolution R blog post)

con <- url("")

Costa Rican provinces are now contained in the object gadm of class SpatialPolygonsDataFrame. The boundaries of Puntarenas province (excluding Cocos Island) are extracted as follows:

temp <- SpatialPolygons(list(temp),proj4string=CRS(proj4string(gadm)))
punt.sp <- SpatialPolygonsDataFrame(temp, data.frame(cbind(2,2), row.names=c("puntarenas")))  # puntarenas

The overlay() method  is used to extract the land cover map puntarenas from costa:

puntarenas <- costa
puntarenas.overlay <- overlay(costa,punt.sp)  # 1 in interior of puntarenas polygons, 0 outside
puntarenasband1 <- costaband1*puntarenas.overlay

Unfortunately overlay() is rather slow, because it applies to the entire raster. Eventually puntarenas appears as a SpatialGridDataFrame which can be plotted using standard R tools such as image().

The code needed to generate the above plot is here.


[1] For example, the International Biosphere-Geosphere Programme (IGBP) land cover legend used 17 biomes. The University of Maryland map used 14 biomes. At much lower resolution, numerical weather forecasting models such as US National Center for Climate Prediction Global Forecasting System (NCEP-GFS) also use alternative schemes.


[2] Roger S. Bivand, Edzer J. Pebesma, and Virgilio Gómez-Rubio. Applied Spatial Data Analysis with R. Springer, New York, 2008

Flux Towers: Part II


West of Ireland oak woodland - October 11

Flux tower eddy covariance instruments measure CO2 flux between an ecosystem and the atmosphere. Observed CO2 fluxes (also called NEE, Net Ecosystem Exchange) can be correlated with locally measured solar intensity, temperature, humidity etc. With enough data, a detailed picture of the local response of the ecosystem to environmental variables to be built up. Flux towers are one of the important tools for quantitative understanding the sensitivity of vegetation to climatic variability.

CO2 flux data  are noisy, to say the least. If you want to understand ecosystem reponse, you need a statistical model.  This post gives some simple ideas and R code showing how such a model can be developed. The example used is Harvard Forest [1], Massachussetts, a temperate deciduous broadleaf forest for which a large amount of flux tower data (collected by Munger & Wofsy) is available. It could equally be applied to tropical forest, grassland, cropland etc at one of the 200+ fluxnet sites worldwide. The model falls short of the full scientific rigour given in the references, but it can easily be developed into a more complete tool.

The main interest here is ecosystem response to sunlight i.e. photosynthesis. The light dependent part of the NEE is GEP (Gross Ecosystem Production), while the light independent part is RE (ecosystem respiration).


As always, the first steps are about getting the data into the right form. We use half-hourly data for 1991-2007  for Harvard Forest  i.e Ameriflux Level 2, site code USHa-1. This raw data has not been subject to any gap-filling statistical treatment. Here are the pre-processing steps:

  1. download the annual data files and create a dataframe harvard using the rbind() combining all of the data
  2. replace missing data (e.g. -9999) with “NA”
  3. add a new fornight index, indicating to which fortnight of the year each row of data belongs
  4. select a subset of the Ameriflux variables of interest. e.g. column headings c(“YEAR”,”DTIME”,”FN”,”TA”,”NEE”,”TS1″,”VPD”,”PAR_in”)
  5. drop rows of data when one or more variables are undefined
  6. group the data according to fortnights index i.e. create a list of 26 elements where each element is a dataframe containing all data sharing the same fortnight index.
  7. drop “dark” data i.e. rows of data where PAR < 30.

The environmental variables retained at step 4 are TA (atmospheric temperature), TS1  (soil temperature), VPD (vapour pressure deficit) and PAR (photosynthetically active radiation). PAR is solar radiation in the range 0.4-0.7 microns which is energetic enough to contribute to photosynthesis. It has units μ mols s-1 m-2. (1 μ mole = 6 x 1017.)

Here is what the R code looks like:
#download the annual data files and create a dataframe harvard
loc=file.path(""); # Ameriflux US_Ha1 download.file(loc,"temp.dat");
ha_1991=read.csv("temp.dat", header=FALSE,skip=20); .........................
loc=file.path(""); # Ameriflux US_Ha1 download.file(loc,"temp.dat");
ha_2007=read.csv("temp.dat", header=FALSE,skip=20); harvard <- rbind(ha_1991,ha_1992,ha_....._2006,ha_2007); # merged dataframes
#replace missing data with "NA"
temp <- (harvard[,] == -9999 | harvard[,] == -6999); #replace -9999 or -6999 with NAs
#add a new fortnight index
lims <- seq(from=1,to=365,by=14); # fortnightly data
lims[27] <- 367; # bound adjust, last fortnight has extra days fortnight
<- function(d) {sum(ifelse(d<lims,0,1))}  # convert from DOY to fortnights
harvard <- data.frame( "FN" = sapply(harvardDOY,fortnight),harvard)  # add fortnight labels 1..26 column #select a subset of the Ameriflux variables of interest sub<- c("YEAR","DTIME","FN","TA","NEE","TS1","VPD","PAR_in") harvardS <- harvard[,colnames(harvard) %in% sub, drop=FALSE] # subset dataframe <span style="color: #ff6600;">#  drop data when U* < 0.2 m/s</span> u_cut <- (harvardUST < 0.2|; <span style="color: #ff6600;">#drop every row of data when one or more variables are undefined</span> x <-;
y <-; z <-;
u <-; v <-;
w <- !(x|y|z|u|v|u_cut); #boolean. w is true when all variables are defined and U* > 0.2 m/s
harvardS <- data.frame("YEAR"=harvardYEAR[w],"FN"=harvardFN[w],"TA"=harvardTA[w],"NEE" = harvardNEE[w], "TS1" = harvardTS1[w], "VPD" = harvardVPD[w],"PAR" = harvardPAR_in[w]) <span style="color: #ff6600;">#group the data according to fortnights index</span> getF <- function(i) {subset(harvardS,FN==i)} # list of dataframes according to fortnight index harvardF <- lapply(seq(1:26),getF) # generate list <span style="color: #ff6600;">#drop "dark" data i.e. rows of data where PAR < 30.</span> harvardS.light <- subset(harvardS,PAR>30)#day only getF.light <- function(i) {subset(harvardS.light,FN==i)} <span style="color: #ff6600;"># list of dataframes according to fortnight value</span> harvardF.light <- lapply(seq(1:26),getF.light) # generate list</code>  <em>harvardF.light</em> contains the required clean daytime half-hourly data grouped by fortnight index. <h3>Model-Building</h3> The plots below show NEE versus PAR. Fortnights 8 & 22 (mid April & late October) are winter months with positive NEE and little evidence of photosynthetic response. Fortnights 14 & 18 (early July and late August) show photosynthesis (-ve NEE) and a strong response to PAR. <a href=""><img class="size-large wp-image-1005  aligncenter" title="gep" src="" alt="gep" width="717" height="671" /></a>  The NEE-PAR curves during summer months saturate at high light levels. This is not too surprising. Photosynthesis is limited by water and CO2 availability and so cannot increase indefinitely with increasing PAR. A simple light saturation model is [2] latex \mathrm{NEE} = \mathrm{R_E} -{{\alpha \beta \mathrm{PAR}} \over {\beta + \alpha \mathrm{PAR}}}  The second term on the right-hand-side represents GEP.  At low light levels GEP ≈ &alpha;PAR. The parameter &alpha; is called the "photosynthetic efficiency". At high light levels,  GEP ≈ &beta; and so &beta; is the light saturation GEP.  This 3-parameter simple light saturation model can be fit to extract parameters for each fortnightly period. However, a better model recognises that photosynthesis responds to changes in moisture availability and temperature. This effect can be introduced by replacinglatex \alpha \rightarrow \alpha \epsilonin the numerator of GEP, where &epsilon; depends on atmospheric temperature TA and vapour pressure deficit VPD. &epsilon;=1 under optimal temperature and humidity conditions, otherwise &epsilon;<1. We need to input a reasonable model for how quickly GEP falls away about the optimal values of TA (at about 25C) and VPD (at zero  kPa). After some experimenting, the following model was found to be reasonable[3] latex \epsilon = {\lambda^2 \over {\mathrm{\left(TA-25 \right)}^2 + \lambda^2} } {\mu^2 \over {\mathrm{VPD}^2 + \mu^2}}latex \lambdaandlatex \muare two additional parameters which must be determined. This 5-parameter model can be fit for each of the photosynthetically active periods 11-20.  The nls() function in R carries out a numerical maximum likelihood fit using least squares. Unlike ordinary linear least squares, it is not guaranteed to converge. The model and initial parameter values have to be chosen carefully. A good approach is to use parameters from the simple light saturation model as initial values in nls(). Here is the code needed to do the 5-parameter model fits  <code> #fits.mid is a list of fitted R objects for(i in start_period:end_period) { fits.mid[[i]] <- nls(NEE~ gamma - alpha*beta*(lambda^2/((TA-25)^2+lambda^2))*mu^2/(VPD^2+mu^2)*PAR/(beta+alpha*PAR), data = harvardF.light[[i]], start=list(gamma=1,alpha=0.02,beta=40,lambda=40,mu=1),trace=F); </code>  In fact, &lambda; and &mu; are not expected to vary much during the summer period, since they depend on leaf properties at the cellular level.  Indeed they turn out to be roughly constant for fortnight periods 11-20. It is reasonable to replace them by their average values &mu;<sub>av</sub> = 2.1kPa and &lambda;<sub>av</sub>=34 &mu; mols s<sup>-1</sup> m<sup>-2</sup>. With &epsilon; determined in this way, a final 3-parameter model fit is carried out. Once again the three parameters are RE, &alpha; and &beta; but this time the model takes better account of temperature and moisture sensitivity, via &epsilon;. <h3>Results</h3> The model suggests that VPD is an important factor limiting GEP. e.g. vapour pressure deficit of 1kPa reduces GEP by 20% whenlatex \mu = 2kPa. This effect is seen in the plots below. <a href=""><img class="size-large wp-image-1015  aligncenter" title="nee" src="" alt="nee" width="819" height="650" /></a>  The implication of this is that about 30% of the apparent "light saturation" of the NEE PAR response is due to water stress (VPD). The remainder is due to CO2 availability and possibly other constraints.  The plot below shows the variation of parameters R_E , &beta; and &alpha; with fortnight index. Only &beta; shows a strong variation during the summer months, with a peak in early June <a href=""><img class="size-large wp-image-1024  aligncenter" title="params" src="" alt="params" width="1024" height="481" /></a>latex \alpha^{-1}$ corresponds to the number of  photosynthetically active photons  arriving at the forest canopy for every CO2 molecule trapped by photosynthesis. This number is about 14.

A common feature of future climate scenarios is competition between increased CO2 availability and increased water stress layed out on a global scale. Clearly flux tower data are good place to look for these effects.

The R script for the model and plots can be found here.


[1] Factors controlling CO2 exchange on timescales from hourly to decadal at Harvard Forest, Urbanski, S., C. Barford, S. Wofsy, C. Kucharik, E. Pyle, J. Budney, K.

McKain, D. Fitzjarrald, M. Czikowsky, J. W. Munger

[2] See for example, An evaluation of models for partitioning eddy covariance-measured
net ecosystem exchange into photosynthesis and respiration
P. Stoy et al,

[3] More detailed models are give in An empirical model simulating diurnal and seasonal CO2 flux for diverse vegetation types and climate conditions, M. Saito, S. Maksyutov, R. Hirata, and A. D. Richardson. http // & references.