Surface weather stations are an important source of climatic information. They provide standardised daily instrumental time-series with records going back decades or even centuries in some cases. This post shows how to retrieve station data and extract information about climatic variability using the remarkable statistical computing language R.
The U.S. National Climatic Data Center (NCDC) maintains a large dataset called the Global Historical Climatology Network (GHCN): temperature and other data collected at more than 40,000 weather stations. Unfortunately, individual weather stations come and go, and there are many gaps in the data. A much smaller database of about 1,000 stations is maintained by the Global Climate Observing System (GCOS), sponsored by the World Meteorological Organization (WMO) and other organisations. We focus on this subset because these stations' data are up to date and reliable.
The locations of GCOS Global Surface Network (GSN) stations are shown in Google Earth/Maps below.
To see more stations, zoom in on a region of interest and switch to Satellite view. Click on the GCOS logo placemark to see the GHCN station ID and other station metadata. If Google Earth (GE) is installed on your system, it may be better to copy the contents of the text file gcosgsn, paste them into Notepad and save the result as gcosgsn.kml. Opening this .kml file launches GE with all GSN stations visible.
The gcosgsn file was generated using an Excel spreadsheet. Improved GSN geolocation data were obtained by cross-referencing the GCOS list with NOAA's Global Surface Summary of the Day (GSOD) data (0.001° accuracy). Other metadata, such as station start year, can easily be added.
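The gcosgsn file was built in Excel, but the same kind of KML file can also be written directly from R. The following is a minimal sketch, not the method used to build gcosgsn: it assumes a station table with hypothetical columns id, name, lat and lon, filled with made-up placeholder stations, and writes only basic Placemark elements (name, description and point coordinates) without the GCOS logo styling.

```r
# Hypothetical station table: placeholder IDs, names and coordinates only.
stations <- data.frame(
  id   = c("STATION_A", "STATION_B"),
  name = c("Example station A", "Example station B"),
  lat  = c(50.800, -33.900),
  lon  = c(4.350, 18.600),
  stringsAsFactors = FALSE
)

# One KML Placemark per station. KML coordinates are "longitude,latitude".
# Names containing &, < or > would need XML escaping first (not handled here).
make_placemark <- function(id, name, lat, lon) {
  sprintf(paste0("  <Placemark>\n",
                 "    <name>%s</name>\n",
                 "    <description>GHCN ID: %s</description>\n",
                 "    <Point><coordinates>%.3f,%.3f</coordinates></Point>\n",
                 "  </Placemark>"),
          name, id, lon, lat)
}

# Assemble a complete KML document, write it to 'file', return the lines.
write_kml <- function(df, file) {
  placemarks <- mapply(make_placemark, df$id, df$name, df$lat, df$lon,
                       USE.NAMES = FALSE)
  kml <- c('<?xml version="1.0" encoding="UTF-8"?>',
           '<kml xmlns="http://www.opengis.net/kml/2.2">',
           "<Document>", placemarks, "</Document>", "</kml>")
  writeLines(kml, file)
  invisible(kml)
}

out <- file.path(tempdir(), "stations.kml")
kml <- write_kml(stations, out)
```

Opening the resulting file in GE should show one placemark per row of the table; a real station list would simply replace the placeholder data frame.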
How can the climatic variability at a particular GSN station be examined? First, R must be installed on your system; install it from CRAN if required.
In what follows, a monthly time-series analysis is carried out on a user-selected station. Daily GHCN station data are read into R courtesy of a data-scraping script from Climate Audit. By the way, any object name ending in .ts (for example gsn1.ts) denotes an R time-series object.
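If .ts objects are new to you, here is a small sketch on a synthetic sine-wave series (not station data) showing how a monthly ts is defined by its start date and frequency. Setting frequency = 12 is equivalent to the deltat = 1/12 used in the plotting commands, and window() extracts a sub-period.

```r
# Synthetic monthly series (a sine wave, not station data): Jan 2000 to Dec 2001.
x <- 10 + 8 * sin(2 * pi * (0:23) / 12)
x.ts <- ts(x, start = c(2000, 1), frequency = 12)

print(start(x.ts))         # year and month of the first value: 2000 1
print(end(x.ts))           # year and month of the last value: 2001 12
print(deltat(x.ts))        # spacing in years between observations (1/12)
print(time(x.ts)[1:3])     # decimal-year time stamps of the first three values

# Sub-period: June to August 2001.
summer.ts <- window(x.ts, start = c(2001, 6), end = c(2001, 8))
print(summer.ts)
```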
- Start R. You will see the friendly command-line prompt (>) in the R console.
- Enter the following command to load the required R script:
source("https://joewheatley.net/wp-content/uploads/2009/05/ghcntor.txt")
- Select a station of interest in GE or the GE Plugin. Copy the GHCN station ID from the placemark balloon and enter the following command to load the GHCN data file into R:
gsn1 <- read.ghcnd("paste_id_here")
- Plot the mean monthly temperature by copying and pasting the following into R (edit the plot title to match your station):
gsn1.ts <- getTMean.ts(gsn1)        # monthly mean temperature series
plot(gsn1.ts, main = "Mean Monthly Temperature UCCLE Belgium",
     ylab = expression(degree * C), xlab = "Year", font.axis = 2, font.lab = 2)
trend.ts <- getTrend.ts(gsn1.ts)    # long-term trend, computed once and reused
lines(trend.ts, col = 2, lwd = 2)   # solid red trend line
maxTrend <- max(trend.ts, na.rm = TRUE)
maxLine.ts <- ts(rep(maxTrend, length(gsn1.ts)), start = start(gsn1.ts), deltat = 1/12)
lines(maxLine.ts, col = 2, lty = 2) # dashed red line at the trend maximum
- To see the climatic variability, plot the residual (anomaly) series:
plot(getResidual.ts(gsn1.ts), main = "Monthly Temperature Anomaly UCCLE Belgium",
     ylab = expression(degree * C), xlab = "Year", font.axis = 2, font.lab = 2, col = 3)
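The sourced script converts the daily records into a monthly series for you, and its exact method is not reproduced here. As a sketch of what such a step involves, under stated assumptions: GHCN-Daily stores TMAX and TMIN in tenths of a degree Celsius, and one common (assumed here) definition of daily mean temperature is (TMAX + TMIN) / 2. The helper monthly_mean_ts and the data-frame columns date, tmax and tmin are hypothetical, and the input is synthetic.

```r
# Synthetic daily data in GHCN-Daily units (tenths of a degree C).
# Column names date, tmax and tmin are hypothetical.
dates <- seq(as.Date("2000-01-01"), as.Date("2001-12-31"), by = "day")
doy   <- as.numeric(format(dates, "%j"))
tmax  <- 150 + 80 * sin(2 * pi * (doy - 100) / 365)
daily <- data.frame(date = dates, tmax = tmax, tmin = tmax - 90)

# Hypothetical helper: monthly mean of (TMAX + TMIN) / 2, in degrees C.
monthly_mean_ts <- function(d) {
  tmean <- (d$tmax + d$tmin) / 2 / 10          # tenths of a degree -> degrees C
  ym    <- format(d$date, "%Y-%m")
  m     <- tapply(tmean, ym, mean, na.rm = TRUE)
  m     <- setNames(as.vector(m), names(m))     # plain named vector

  # Full run of months, so a month with no data becomes NA instead of
  # silently shifting every later value.
  first_of <- function(x) as.Date(paste0(format(x, "%Y-%m"), "-01"))
  months <- format(seq(first_of(min(d$date)), first_of(max(d$date)),
                       by = "month"), "%Y-%m")
  start_ym <- as.integer(strsplit(months[1], "-")[[1]])
  ts(unname(m[months]), start = start_ym, frequency = 12)
}

tmean.ts <- monthly_mean_ts(daily)
print(round(window(tmean.ts, end = c(2000, 6)), 1))
```

Padding to a full sequence of months matters because ts() assumes equally spaced observations; given the gaps typical of station records, a dropped month would otherwise misalign every later value.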
For example, station ID BE000006447 (UCCLE Belgium) generates the plots below.
The periodic black line in the first plot is the monthly average station temperature; in this instance the record runs continuously from 1831. The red line is the long-term trend that remains once the seasonal cycle and the residual random variations have been separated out using R's stl function. It indicates that mean temperatures at UCCLE are now the highest they have been since 1831: relative to the 1831–2009 average, this anomaly is about 1.36 °C. Global warming is a prime candidate explanation, but local factors such as land-use change or increased urbanisation also affect individual station records.
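The decomposition behind the red trend line can be sketched with stl() from R's stats package, here applied to a synthetic series with built-in warming. The stl settings used inside the sourced script are not shown, so s.window = "periodic" is an assumption, and reading the quoted anomaly as the trend maximum minus the whole-record mean is one plausible interpretation rather than the script's documented calculation.

```r
# Synthetic monthly series (illustrative only): 50 years of slow warming,
# a seasonal cycle and random noise.
set.seed(42)
n <- 12 * 50
m <- 0:(n - 1)
x.ts <- ts(10 + 0.002 * m + 8 * sin(2 * pi * m / 12) + rnorm(n),
           start = c(1950, 1), frequency = 12)

# Seasonal-trend decomposition by loess; "periodic" is an assumed setting.
# stl() stops with an error if the series contains NA values.
fit         <- stl(x.ts, s.window = "periodic")
seasonal.ts <- fit$time.series[, "seasonal"]
trend.ts    <- fit$time.series[, "trend"]
resid.ts    <- fit$time.series[, "remainder"]

# Assumed reading of the post's anomaly: trend maximum minus the record mean.
anomaly <- max(trend.ts) - mean(x.ts)
cat("Trend maximum above the long-term mean:", round(anomaly, 2), "\n")
```

The three stl components add back up exactly to the original series, so the trend and the residual (anomaly) plots together account for all non-seasonal variation. Because stl() refuses NA values, gappy station records need trimming or infilling before decomposition.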
The temperature anomaly plot (green line) shows the residual temperature variations at UCCLE. As expected, these residuals vary much more from year to year than the trend line does.
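The claim that residuals dominate year-to-year variation can be checked numerically. This sketch, again on synthetic data with an assumed stl setting, compares the standard deviation of the monthly remainder with that of 12-month changes in the trend (same calendar month, one year apart).

```r
# Synthetic monthly series (illustrative only): slow warming + seasonal cycle + noise.
set.seed(1)
n <- 12 * 60
m <- 0:(n - 1)
x.ts <- ts(10 + 0.002 * m + 8 * sin(2 * pi * m / 12) + rnorm(n),
           start = c(1950, 1), frequency = 12)

fit      <- stl(x.ts, s.window = "periodic")   # assumed setting
trend.ts <- fit$time.series[, "trend"]
resid.ts <- fit$time.series[, "remainder"]

# Spread of the residual (anomaly) series versus the spread of
# year-on-year changes in the trend.
resid.sd      <- sd(resid.ts)
trend.step.sd <- sd(diff(trend.ts, lag = 12))
cat("sd of monthly residual:        ", round(resid.sd, 2), "\n")
cat("sd of 12-month change in trend:", round(trend.step.sd, 2), "\n")
```

Applied to a real station, gsn1.ts would take the place of the synthetic x.ts, provided it has no missing values.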
If you have followed the steps above, you have done something both cool and complex: you have combined the most reliable climate data with the best available statistical resource.