

A Canonical Correlation Analysis Model to Predict South African 1997-98 Summer Rainfall and Temperature

contributed by Willem Landman

South African Weather Bureau, Pretoria, South Africa

The Research Group for Statistical Climate Studies (RGSCS) of the South African Weather Bureau issues long-range forecasts of South African rainfall and temperature for 1- and 3-month periods using several approaches. A general circulation model is used for the coming month, while canonical correlation analysis (CCA) and optimal climate normals (OCN) are used for 3-month mean forecasts. CCA is a multivariate regression tool; OCN is a simple extrapolation of the trends observed for the season in question over the last K years (where K is typically around 10). Here, CCA forecasts of South African climate for 1997-98 are presented. Our CCA scheme is based on the work of Barnett and Preisendorfer (1987) and is similar to the schemes implemented in the U.S. for ENSO prediction (Barnston and Ropelewski 1992) and for U.S. surface climate prediction (Barnston 1994). It includes a cross-validation design that estimates forecast skill largely free of the skill inflation caused by overfitting.
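As a rough illustration of the CCA step, the NumPy sketch below prefilters the predictor field with EOFs, in the spirit of Barnett and Preisendorfer (1987), and then obtains canonical pairs from the whitened cross-covariance matrix. It is a minimal sketch under stated assumptions, not the RGSCS code: the function names, the synthetic data and the numbers of retained EOF and CCA modes are all hypothetical, and it assumes fewer predictand series than predictor PCs and more seasons than predictor PCs.

```python
import numpy as np


def leading_pcs(field, n_modes):
    """EOF prefilter: time series of the leading principal components.

    field has shape (time, space); columns are grid points or stations.
    """
    anomalies = field - field.mean(axis=0)
    u, s, _ = np.linalg.svd(anomalies, full_matrices=False)
    return u[:, :n_modes] * s[:n_modes]


def _inv_sqrt(c):
    """Inverse square root of a symmetric positive-definite matrix."""
    vals, vecs = np.linalg.eigh(c)
    return vecs @ np.diag(vals ** -0.5) @ vecs.T


def cca_fit(x, y):
    """CCA between predictors x (time, p) and predictands y (time, q), assuming q <= p < time."""
    x_mean, y_mean = x.mean(axis=0), y.mean(axis=0)
    xa, ya = x - x_mean, y - y_mean
    n = x.shape[0]
    cxx = xa.T @ xa / (n - 1)
    cyy = ya.T @ ya / (n - 1)
    cxy = xa.T @ ya / (n - 1)
    wx, wy = _inv_sqrt(cxx), _inv_sqrt(cyy)
    # Singular values of the whitened cross-covariance are the canonical correlations.
    u, rho, vt = np.linalg.svd(wx @ cxy @ wy, full_matrices=False)
    return {
        "x_mean": x_mean,
        "y_mean": y_mean,
        "a": wx @ u,                        # predictor weights -> unit-variance canonical variates
        "rho": rho,                         # canonical correlations, largest first
        "b_inv": vt @ np.linalg.inv(wy),    # maps predictand canonical variates back to y
    }


def cca_predict(model, x_new, n_modes):
    """Forecast y from x using only the leading n_modes canonical pairs."""
    k = n_modes
    u = (x_new - model["x_mean"]) @ model["a"][:, :k]   # predictor canonical variates
    v_hat = u * model["rho"][:k]                         # regress each partner variate on u
    return model["y_mean"] + v_hat @ model["b_inv"][:k, :]


# Hypothetical synthetic example: 45 seasons, 200 SST grid points, 8 regional indices.
rng = np.random.default_rng(0)
sst = rng.standard_normal((45, 200))
x = leading_pcs(sst, 10)
y = x[:, :3] @ rng.standard_normal((3, 8)) + 0.5 * rng.standard_normal((45, 8))
model = cca_fit(x, y)
forecast = cca_predict(model, x, n_modes=2)
print(np.round(model["rho"], 2))
```

With every canonical pair retained, this forecast reduces to ordinary multiple regression of the predictands on the predictor PCs; keeping only the leading pairs is what restrains overfitting. In a proper cross-validation the EOFs, CCA weights and forecasts would be recomputed with the verifying season withheld.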

The predictor for the CCA forecasts is the near-global (45N-45S) SST field over four consecutive 3-month periods preceding the forecast. For example, for a Jan-Feb-Mar 1998 forecast at a 1-month lead time (skipping 1 month before the forecast period begins), the four predictor periods are DJF 1996-97 and MAM, JJA and SON 1997. Using multiple predictor periods allows evolving features, such as a warming Indian Ocean or a cooling Pacific Ocean, to help determine the forecast. The predictands are rainfall indices for eight homogeneous rainfall regions, defined by clustering with Euclidean distances and a minimum variance criterion. Monthly data from 418 stations for the 1961-90 period are used, and EOF analysis is performed on the standardized monthly data; eight principal components are retained from this analysis. For each region, a weighted average of the standardized monthly rainfall at all stations within the region is computed, with the weights derived from Thiessen polygons. Predictions are issued only for regions with demonstrable skill for the target season; otherwise, the climatological distribution is taken as the best forecast.
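The rule for choosing predictor periods can be sketched in Python. The helper below is hypothetical, written only from the example in the text: it assumes the predictor seasons are consecutive, non-overlapping 3-month blocks that end just before the skipped lead month, and it reproduces the Jan-Feb-Mar 1998 case.

```python
MONTH_LETTERS = "JFMAMJJASOND"


def season_name(month_index):
    """Name of the 3-month season starting at an absolute month index (year * 12 + month - 1)."""
    year, m0 = divmod(month_index, 12)          # m0: 0 = January
    label = "".join(MONTH_LETTERS[(m0 + i) % 12] for i in range(3))
    if m0 >= 10:                                # NDJ and DJF seasons cross the year boundary
        return f"{label} {year}-{(year + 1) % 100:02d}"
    return f"{label} {year}"


def predictor_seasons(target_year, target_month, lead_months=1, n_periods=4):
    """Consecutive 3-month predictor seasons preceding a target season.

    lead_months months are skipped between the last predictor month and
    the first month of the target season.
    """
    target = target_year * 12 + (target_month - 1)
    last_start = target - lead_months - 3       # first month of the latest predictor season
    starts = [last_start - 3 * k for k in range(n_periods)]
    return [season_name(s) for s in reversed(starts)]


print(predictor_seasons(1998, 1))
# ['DJF 1996-97', 'MAM 1997', 'JJA 1997', 'SON 1997']
```

Working with absolute month indices avoids special-casing the December-January year change when stepping back through the seasons.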

The skill of the CCA model, determined over a 45-year (1950-51 to 1994-95) research (or training) period, is expressed as a probability attached to a categorical real-time forecast. Three climatologically equiprobable categories are used (below, near and above normal), so that in the absence of any skill each category has a probability of 33.3%. Cross-validation is used to estimate forecast skill for a given target season and region. The departure of the forecast probabilities from this climatological distribution is determined by the cross-validated skill estimate and by the nature and strength of the predictor patterns.
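The leave-one-out idea behind a cross-validated categorical skill estimate can be sketched as follows. To keep it short, a single-predictor linear regression stands in for the full CCA model, and each forecast is assigned to a category using the terciles of the training-period fitted values so that all three forecast categories stay equally likely. Both simplifications, the function names and the synthetic series are assumptions, not details of the RGSCS scheme.

```python
import numpy as np


def tercile_category(value, reference):
    """0 = below, 1 = near, 2 = above normal, relative to the terciles of reference."""
    lower, upper = np.quantile(reference, [1 / 3, 2 / 3])
    if value < lower:
        return 0
    if value > upper:
        return 2
    return 1


def cross_validated_hit_rate(predictor, predictand):
    """Leave-one-out categorical hit rate of a one-predictor linear regression."""
    x = np.asarray(predictor, dtype=float)
    y = np.asarray(predictand, dtype=float)
    hits = 0
    for i in range(len(y)):
        train = np.arange(len(y)) != i                # withhold season i entirely
        slope, intercept = np.polyfit(x[train], y[train], 1)
        fitted = slope * x[train] + intercept
        forecast = slope * x[i] + intercept
        # Class boundaries come from the training seasons only, so season i
        # cannot influence the categories it is verified against.
        if tercile_category(forecast, fitted) == tercile_category(y[i], y[train]):
            hits += 1
    return hits / len(y)


# Hypothetical data: 45 seasons of an SST index and a regional rainfall index.
rng = np.random.default_rng(1)
sst_index = rng.standard_normal(45)
rain_index = 0.8 * sst_index + 0.6 * rng.standard_normal(45)
print(cross_validated_hit_rate(sst_index, rain_index))   # compare with the chance rate of 1/3
```

A hit rate near 1/3 indicates no skill. Scoring each season with a model that never saw it removes the optimistic bias of verifying a regression on its own training data.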

Figure 1 shows the temperature and precipitation forecasts for South Africa for Dec-Jan-Feb 1997-98 and Feb-Mar-Apr 1998. The forecast category (B = below, N = near, A = above normal) is shown for each predictand region, together with its probability. This probability is a function of the expected skill, taken as the categorical hit rate in independent forecast tests from 1985 onward: the farther the hit rate lies above the chance rate of 33.3%, the higher the forecast skill. Regions whose skill falls below a minimally usable threshold are labeled "C", indicating that the climatological distribution is the forecast. An asterisk marks regions with statistically significant skill.
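The text does not say which significance test or usable-skill threshold is applied, so the sketch below only illustrates one common way to turn a verification record into labels like those in Figure 1: a one-sided binomial test of the hit count against the 1/3 chance rate, plus the Heidke skill score, which rescales the hit rate so that chance performance scores 0 and a perfect record scores 1. The threshold values, the function names and the 7-of-10 record are hypothetical.

```python
from math import comb


def binomial_p_value(hits, n, p_chance=1 / 3):
    """One-sided chance probability of scoring at least `hits` hits in n forecasts."""
    return sum(comb(n, k) * p_chance**k * (1 - p_chance) ** (n - k) for k in range(hits, n + 1))


def heidke_skill_score(hits, n, p_chance=1 / 3):
    """0 for chance performance, 1 for a perfect categorical record."""
    expected = n * p_chance
    return (hits - expected) / (n - expected)


def region_label(category, hits, n, min_hit_rate=0.40, alpha=0.05):
    """Return 'C' below the usable-skill threshold, else the category plus '*' if significant.

    min_hit_rate and alpha are hypothetical placeholders, not RGSCS values.
    """
    if hits / n < min_hit_rate:
        return "C"
    return category + ("*" if binomial_p_value(hits, n) < alpha else "")


# Hypothetical verification record: 7 hits in 10 independent forecasts.
print(region_label("A", 7, 10), round(heidke_skill_score(7, 10), 2))
# A* 0.55
```

With only a handful of independent test seasons, even a fairly high hit rate may fail a significance test, which is one reason a region can show usable but not significant skill.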

The forecasts call for near-normal to above-normal temperatures in midsummer (Dec-Jan-Feb) and near-normal temperatures in late summer and early autumn (Feb-Mar-Apr). Even where near normal is forecast, the probabilities are tilted slightly toward above normal. Rainfall totals are expected to be below normal over much of the country, largely because of the strong warm ENSO event in progress. In some regions, however, the outlook leans more toward near-normal conditions, notably in the northeast of the country (especially later in the season) and in the southwestern part of the Western Cape.

Further discussion of the empirical relationships between SST predictors and the rainfall in the target regions is found in Landman (1994, 1995). Greater detail about the forecasts presented here is available in the RGSCS Bulletins issued by the South African Weather Bureau (RGSCS, Room 5087, South African Weather Bureau, Private Bag X097, Pretoria 0010). This information can also be accessed on the web site <http://cirrus.sawb.gov.za/www/rgscs/rgscs.htmll>.

Barnett, T.P. and R. Preisendorfer, 1987: Origins and levels of monthly and seasonal forecast skill for United States surface air temperatures determined by canonical correlation analysis. Mon. Wea. Rev., 115, 1825-1850.

Barnston, A.G., 1994: Linear statistical short-term climate predictive skill in the Northern Hemisphere. J. Climate, 7, 1514-1564.

Barnston, A.G. and C.F. Ropelewski, 1992: Prediction of ENSO episodes using canonical correlation analysis. J. Climate, 5, 1316-1345.

Landman, W.A., 1994: A study of the rainfall variability of the summer rainfall regions of South Africa, as revealed by principal component analysis. IRICP Pilot Project Final Report. Available at RGSCS, Room 5087, South African Weather Bureau, Private Bag X097, Pretoria 0010, South Africa.

Landman, W.A., 1995: Predicting South African seasonal rainfall by means of canonical correlation analysis. Preprints, 6th International Meeting on Statistical Climatology, June 19-23, 1995, Galway, Ireland, 479-481.

Fig. 1. CCA-based temperature and precipitation forecasts for Dec-Jan-Feb 1997-98 and Feb-Mar-Apr 1998, using as predictors near-global SST anomalies from late 1996 through late 1997. These forecasts are produced by the South African Weather Bureau. Each region shows the forecast category (B, N or A for below, near or above normal) and the probability of that category occurring. The climatological probability level is 33%; departures from 33% indicate how far the odds are tilted toward that category.


