The impacts of drought depend on time-scale. On short time-scales, drought means dry soil. On long time-scales, it means dry rivers and empty reservoirs. A region may simultaneously experience dry conditions on one time-scale and wet conditions on another e.g. wet soil but low streamflow or visa versa.

**Standardized Precipitation Index** (SPI) is a widely used measure of drought which can be defined for any time-scale of interest. For any location, SPI is normally distributed with zero mean and unit standard deviation. Index values > 2 indicate exceptionally wet conditions for that location, values < -2 indicate exceptionally dry conditions for that location, etc. Historical precipitation is the only input needed to compute SPI.

Australia experienced drought between 2002 and 2007. The image below shows SPI computed for a location in the drought-prone Murray-Darling basin of New South Wales. The time-series run from Jan 1948 to Jan 2010 and the index was calculated for time-scales from 1 to 12 months. Precipitation data is from **NCEP Reanalysis** [1] in a 1.875° × 1.875° grid cell centred at 30°S 145°E.

The drought of 2002 to 2007 shows up very clearly. It was preceeded by a wet period between 2005 and 2001. While 2009 showed an episode of severe drought at short time-scales, SPI at was normal/wet at longer time-scales during 2009. Agricultural yields recovered.

## Calculating SPI-*M*

Empirical rainfall probability distributions are far from normal (gaussian) and often approximate a shifted gamma distribution. The empirical cumulative probability distributions are used to transform the rainfall time-series into time-series of percentile probabilities. A normally distributed precipitation index is found by pretending that these percentile probabilities derive from a standard cumulative normal distribution and inverting to find the index values.

This is simple in *R*. If the vector *data *contains rainfall infall data, then:

`fit.cdf <- ecdf(data)`

cdfs <- sapply(data,fit.cdf)

SPI <- qnorm(cdfs)

Tha rainfall data are M-month moving averages (current and previous months). A separate index is calculated for each calendar month to remove seasonality. The R code used to compute SPI values (based in NCEP Reanalysis or other data sets such as **GCPC**) is **here**.

[1] The NCEP/NCAR 40-year reanalysis project, Bull. Amer. Meteor. Soc., 77, 437-470, 1996

**Noted Added 11 October 2011:** I have uploaded a slightly improved SPI R script** here**. The function *getPrecOnTimescale(precipitation,k)* takes a vector of monthly precipitation values and returns a k-month average (i.e current month and prior k-1 months). *getSPIfromPrec(precip.k)* takes k-month precipitation values and returns the corresponding vector of SPI values.

## 11 Comments

[…] graph below shows the correlation between VCI and 3-month Standardized Precipitation Index (SPI3) computed for Africa by calendar month. In this analysis GIMMS NDVI was upscaled to 200km to […]

Hi

I tried to run your R code. I used the same data as you did (but only till Dec 2009, so 744 months).

First, I had to make a little correction, as you didn’t define N_timescale. I defined N_timescale <- Nt (I hope so far, I understood it correctly).

The point now: I get the same plot as result, but with the warning message "Warning message: In matrix(unlist(spi.m), Nt, 12) : data length [9804] is not a sub-multiple or multiple of the number of rows [744]".

I guess the mistake lies within the calculation of the spi. Do you know this error message already? If yes, can you possibly help me?

Thanks, Claudia

Claudia,

my apologies. replace N_timescale with 12 to reproduce the plot in the post i.e 12 SPI indices SPI1, SPI2, SPI3 … SPI12 are calculated for each month. This should deal with the warning message.

As Claudia indicated I get the following message after replacing N_timescale by 12. How you got the # 756?

Warning message:

In matrix(unlist(spi.m), 756, 12) :

data length [9216] is not a sub-multiple or multiple of the number of rows [756]

Hi Peter,

I uploaded a better version of the SPI script. It takes a vector of precipitation value e.g. from an individual weather station. You might find it more useful.

How can I get your better version of the SPI script ?

please send e-mail

Hi Joe,

Would it be possible for me to get an update version of the SPI script that you refer to above, please?

Many thanks.

Laura

Hi Laura,

you can find it here http://joewheatley.net/20th-century-droughts/

I’m beginning with R, I’m trying to these command

fit.cdf <- ecdf(data)

However I got the following error

Error in approxfun(vals, cumsum(tabulate(match(x, vals)))/n, method = "constant", :

zero non-NA points

In addition: Warning message:

In xy.coords(x, y) : NAs introduced by coercion

My data archive is formed by a single column of precipitation data, what criteria does my file need to meet to run that command? (values, decimal point, undefined data)

ecdf(x) is a base R function which takes a numeric vector x (see ?ecdf). stackoverflow is a good place for general R queries. regards.

Hi Joe,

I figured out, thanks actually stackoverflow is “the” page

Best regards (and happy new year)