A Canonical Correlation Analysis Model to Predict
South African 1997-98 Summer Rainfall and Temperature
contributed by Willem Landman
South African Weather Bureau, Pretoria, South Africa
The Research Group for Statistical Climate Studies (RGSCS) of the South African Weather
Bureau issues long range forecasts for South African rainfall and temperature for 1- and 3-month
periods using several approaches. A general circulation model is used for the forthcoming 1
month, while the canonical correlation analysis (CCA) and optimal climate normals (OCN)
methods are used for 3-month mean forecasts. CCA is a multivariate regression tool, and OCN is
a simple extrapolation of the trends observed over the last K years (where K is typically about 10)
for the season in question. Here, forecasts of South African climate for 1997-98 made with the
CCA method are presented. Our CCA scheme is based on the work of Barnett
and Preisendorfer (1987), and is similar to that implemented in the U.S. for ENSO prediction
(Barnston and Ropelewski 1992) and U.S. surface climate prediction (Barnston 1994). It includes
a cross-validation design to estimate forecast skill largely without skill inflation due to overfitting.
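To make the phrase "multivariate regression tool" concrete, the following is a minimal Python sketch of CCA with EOF pre-filtering, in the spirit of Barnett and Preisendorfer (1987). It is not the RGSCS code: the function names, the array layout (rows are years, columns are grid points or regions), the numbers of retained modes and the synthetic data are all assumptions made for illustration.

```python
import numpy as np

def fit_cca(X, Y, n_x_modes, n_y_modes):
    """Fit an EOF-prefiltered CCA regression of Y (years x regions) on X (years x grid points).

    Illustrative sketch only; the truncation levels are hypothetical inputs.
    """
    x_mean, y_mean = X.mean(axis=0), Y.mean(axis=0)
    # EOF analysis via SVD of the anomalies; keep only the leading modes
    Ux, sx, Vxt = np.linalg.svd(X - x_mean, full_matrices=False)
    Uy, sy, Vyt = np.linalg.svd(Y - y_mean, full_matrices=False)
    Ux, sx, Vxt = Ux[:, :n_x_modes], sx[:n_x_modes], Vxt[:n_x_modes]
    Uy, sy, Vyt = Uy[:, :n_y_modes], sy[:n_y_modes], Vyt[:n_y_modes]
    # The retained PCs are orthonormal, so CCA reduces to an SVD of their
    # cross-product; the singular values are the canonical correlations
    A, rho, Bt = np.linalg.svd(Ux.T @ Uy, full_matrices=False)
    return {"x_mean": x_mean, "y_mean": y_mean, "sx": sx, "Vxt": Vxt,
            "sy": sy, "Vyt": Vyt, "A": A, "rho": rho, "Bt": Bt}

def predict_cca(model, X_new):
    """Predict the predictand fields for new predictor rows."""
    # Project the new predictor anomalies onto the retained predictor EOFs
    pcs_x = (X_new - model["x_mean"]) @ model["Vxt"].T / model["sx"]
    # Predictor canonical variates, scaled by the canonical correlations,
    # then rotated back to predictand PCs
    pcs_y = ((pcs_x @ model["A"]) * model["rho"]) @ model["Bt"]
    # Back from predictand PCs to physical predictand space
    return (pcs_y * model["sy"]) @ model["Vyt"] + model["y_mean"]

# Synthetic stand-in data (not SST or rainfall observations)
rng = np.random.default_rng(0)
latent = rng.normal(size=(40, 2))
sst_like = latent @ rng.normal(size=(2, 6))
rain_like = sst_like @ rng.normal(size=(6, 3))
model = fit_cca(sst_like, rain_like, n_x_modes=2, n_y_modes=2)
print(model["rho"])
```

Truncating both fields to a few EOF modes before the canonical analysis is what keeps the problem well-posed when there are far more grid points than years.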
The predictor for the CCA forecasts is the near-global (45N-45S) SST field, spanning four
consecutive prior 3-month periods. For example, for a Jan-Feb-Mar 1998 forecast at a 1-month lead
time (skipping 1 month before the forecast period begins), the four predictor periods would be
DJF 1996-97, MAM 1997, JJA 1997 and SON 1997. By providing multiple predictor periods,
evolutionary features such as a warming Indian Ocean or a cooling Pacific Ocean can help
determine the forecast. The predictands are rainfall indices for eight homogeneous rainfall regions,
identified by cluster analysis using Euclidean distances and a minimum variance criterion. Monthly
data from 418 stations for the 1961-90 period are used; EOF analysis is performed on the
standardized monthly data, and eight principal components are retained. Each regional index is the
weighted average of standardized monthly rainfall over all stations within the region, with weights
derived from Thiessen polygons. Predictions are issued only for districts
having demonstrable skill for the predictand period in question; otherwise, the climatological
distribution is considered as the best forecast.
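Two of the steps described above can be sketched in a few lines of Python: selecting the four predictor seasons for a given target season and lead time, and forming a regional index as a weighted average of standardized station rainfall. The function names and arguments are hypothetical, and the standardization here is taken over whatever rows are supplied (assumed to be successive years of one calendar month), which may differ from the operational procedure.

```python
import numpy as np

MONTH_INITIALS = "JFMAMJJASOND"

def predictor_periods(target_year, target_start_month, lead_months=1, n_periods=4):
    """Label the consecutive 3-month predictor seasons preceding a target season.

    target_start_month is 1-12 and refers to the first month of the target season.
    """
    # Count months from year 0 so arithmetic across year boundaries stays simple
    target_start = target_year * 12 + (target_start_month - 1)
    last_end = target_start - lead_months - 1  # final month of the latest predictor season
    periods = []
    for k in range(n_periods):
        end = last_end - 3 * k
        months = [end - 2, end - 1, end]
        label = "".join(MONTH_INITIALS[m % 12] for m in months)
        start_year, end_year = months[0] // 12, months[-1] // 12
        if start_year == end_year:
            year = str(end_year)
        else:
            year = f"{start_year}-{end_year % 100:02d}"  # season straddles New Year
        periods.append(f"{label} {year}")
    return periods[::-1]  # earliest season first

def regional_index(station_rain, weights):
    """Weighted average of standardized station rainfall (rows: years, columns: stations).

    weights would come from Thiessen polygon areas; here they are plain inputs.
    """
    rain = np.asarray(station_rain, dtype=float)
    standardized = (rain - rain.mean(axis=0)) / rain.std(axis=0, ddof=1)
    w = np.asarray(weights, dtype=float)
    return standardized @ (w / w.sum())

print(predictor_periods(1998, 1, lead_months=1))
# ['DJF 1996-97', 'MAM 1997', 'JJA 1997', 'SON 1997']
```

Standardizing each station before averaging keeps wet stations from dominating the regional index simply because their rainfall totals are larger.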
The skill of the CCA model, determined using a 45-year (1950-51 to 1994-95) research (or
training) period, is expressed as a probability associated with a categorical real-time forecast. A
categorical system with 3 climatologically equiprobable classes (below, near, and above normal) is
used, where the probabilities of the three classes would be 33.3%, 33.3% and 33.3%,
respectively, in the absence of any skill. Cross-validation is used to estimate the skill of the
forecasts for a given target season and region. The departure from the climatological probability
distribution is determined by the cross-validated skill estimate, as well as by the nature and
strength of the predictor patterns.
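The cross-validated skill estimate for a three-category system can be illustrated with a short Python sketch. Each year is withheld in turn, a model is fitted to the remaining years, and the forecast and observed values for the withheld year are both assigned to a tercile category defined from the training years only. A one-predictor linear regression stands in for the CCA model here, purely as an assumption to keep the example short; the function names are hypothetical.

```python
import numpy as np

def tercile_category(value, training_values):
    """Classify a value as below (B), near (N) or above (A) normal.

    The category boundaries are the terciles of the training sample, so the three
    classes are climatologically equiprobable.
    """
    lower, upper = np.percentile(training_values, [100 / 3, 200 / 3])
    if value < lower:
        return "B"
    if value > upper:
        return "A"
    return "N"

def cross_validated_hit_rate(predictor, predictand):
    """Leave-one-year-out estimate of the categorical hit rate."""
    x = np.asarray(predictor, dtype=float)
    y = np.asarray(predictand, dtype=float)
    hits = 0
    for i in range(len(y)):
        keep = np.arange(len(y)) != i  # withhold year i from the training sample
        # Stand-in model: simple linear regression fitted without year i
        slope, intercept = np.polyfit(x[keep], y[keep], 1)
        forecast = slope * x[i] + intercept
        forecast_cat = tercile_category(forecast, y[keep])
        observed_cat = tercile_category(y[i], y[keep])
        hits += int(forecast_cat == observed_cat)
    return hits / len(y)

# Synthetic 45-season example (not observed data)
rng = np.random.default_rng(0)
index = rng.normal(size=45)
rainfall = -0.6 * index + rng.normal(scale=0.8, size=45)
print(cross_validated_hit_rate(index, rainfall))
```

Because the withheld year plays no part in fitting the model or in setting the category boundaries, the resulting hit rate is largely free of the skill inflation that an in-sample fit would show.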
Figure 1 shows the temperature and precipitation forecasts for South Africa for Dec-Jan-Feb
1997-98 and for Feb-Mar-Apr 1998. The forecast category (B=below, N=near, A=above normal)
is shown for each predictand region, accompanied by the associated probability. The probability is
a function of the expected skill, measured as the categorical hit rate in independent forecast tests
from 1985 onward. The farther this hit rate lies above the chance rate of 33.3%, the higher the forecast skill.
Regions whose forecast skill has been found to be below a minimally usable threshold are labeled
"C". An asterisk is shown for regions having statistically significant skill.
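The labelling described above can be sketched as follows in Python. The hit rate is the fraction of past forecasts whose category matched the observed category. The threshold of 0.40 below which a region is labelled "C" is a hypothetical value (the article does not state the threshold actually used), and equating the displayed probability directly with the hit rate is a simplifying assumption.

```python
CHANCE_HIT_RATE = 1 / 3  # three equiprobable categories

def categorical_hit_rate(forecast_cats, observed_cats):
    """Fraction of forecasts whose category matched the observed category."""
    pairs = list(zip(forecast_cats, observed_cats))
    return sum(f == o for f, o in pairs) / len(pairs)

def map_label(forecast_cat, hit_rate, min_usable_hit_rate=0.40):
    """Return the label for a region: 'C' or the category with a probability in percent.

    min_usable_hit_rate is a hypothetical threshold chosen for illustration.
    """
    if hit_rate < min_usable_hit_rate:
        return "C"  # skill too low: fall back on the climatological distribution
    return f"{forecast_cat} {round(100 * hit_rate)}"

# Made-up verification record for one region
past_forecasts = ["B", "B", "N", "A", "A", "N", "B", "A", "N", "B"]
past_observed = ["B", "N", "N", "A", "B", "N", "B", "A", "A", "B"]
rate = categorical_hit_rate(past_forecasts, past_observed)
print(map_label("B", rate))
```

With the made-up record above, seven of the ten forecasts match, so the hit rate of 0.7 is well above the chance rate and the region keeps its categorical label.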
The forecasts call for near normal to above normal temperature for mid-summer (Dec-Feb), and
near normal temperature for late summer and early autumn (Feb-Mar-Apr). Where near normal is
forecast, there is still a slight tilt in the probabilities toward above normal. Rainfall totals are
expected to be broadly lower than normal, due largely to the strong warm ENSO event in
progress. In certain regions, however, the outlook leans more toward near-normal conditions, such as
in the northeast part of the country (especially later in the season) and the southwestern part of
the Western Cape.
Further discussion of the empirical relationships between SST predictors and the rainfall in the
target regions is found in Landman (1994, 1995). Greater detail about the forecasts presented
here is available in the RGSCS Bulletins issued by the South African Weather Bureau (RGSCS,
Room 5087, South African Weather Bureau, Private Bag X097, Pretoria 0010). This information
can also be accessed on the web site <http://cirrus.sawb.gov.za/www/rgscs/rgscs.htmll>.
Barnett, T.P. and R. Preisendorfer, 1987: Origins and levels of monthly and seasonal forecast
skill for United States surface air temperatures determined by canonical correlation analysis.
Mon. Wea. Rev., 115, 1825-1850.
Barnston, A.G., 1994: Linear statistical short-term climate predictive skill in the Northern
Hemisphere. J. Climate, 7, 1514-1564.
Barnston, A.G. and C.F. Ropelewski, 1992: Prediction of ENSO episodes using canonical
correlation analysis. J. Climate, 5, 1316-1345.
Landman, W.A., 1994: A study of the rainfall variability of the summer rainfall regions of South
Africa, as revealed by principal component analysis. IRICP Pilot Project Final Report. Available
at RGSCS, Room 5087, South African Weather Bureau, Private Bag X097, Pretoria 0010, South
Africa.
Landman, W.A., 1995: Predicting South African seasonal rainfall by means of canonical
correlation analysis. Preprints, 6th International Meeting on Statistical Climatology, June 19-23,
1995, Galway, Ireland, 479-481.
Fig. 1. Temperature and precipitation CCA-based forecasts for Dec-Jan-Feb 1997-98 and
Feb-Mar-Apr 1998, using as predictors the near-global SST anomalies from late 1996 through
late 1997. These forecasts are produced by the South African Weather Bureau. Forecasts are
expressed categorically (B, N, A for below, near, or above normal) for each region, indicating the
probability of occurrence of each category. The climatological probability level is 33%, and
departures from 33% indicate the tilt of the odds for that category.