Using Screening Multiple Linear Regression
contributed by D. Unger
Climate Prediction Center, NOAA, Camp Springs, Maryland
Screening multiple linear regression (SMLR) is used to predict seasonal mean temperature and
total precipitation for locations over the mainland United States. Predictor data consist of
Northern Hemisphere 700-mb heights, near-global sea surface temperatures (SSTs), and station
values of mean temperature and total precipitation from the 3-mo period prior to the forecast
initial time of June 1, 1997. Forecasts of mean temperature and total precipitation are made for
a series of 13 overlapping 3-mo periods at one-month intervals, beginning with Jul-Aug-Sep 1997
and extending through Jul-Aug-Sep 1998. Regression relationships were derived from data for the
1956-96 period. Forecasts were produced from single-station equations for 59 stations
distributed approximately evenly throughout the U.S.
All predictors and predictands were expressed as standardized anomalies relative to the
developmental data. Precipitation amounts were transformed by taking their square roots prior to
standardization in order to help normalize their distribution. Twenty-five candidate predictors,
selected from gridpoint values in regions of known importance for climate prediction, were
offered for screening in the regression development. A few predictor locations were chosen on
the basis of an examination of data from the first 20 years of the sample, referred to here as the
base period. Information from the most recent 20 years was never used to select candidate
predictors (Unger, 1996a). One additional predictor, the carbon dioxide concentration from Mauna
Loa Observatory, was offered in order to capture long-term trends in the data. This crude trend
variable provides the screening procedure with a convenient predictor with which to identify
stations that have simple trends in their predictand values.
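As a minimal sketch of the standardization step, assuming the anomalies are formed with the sample mean and sample standard deviation of the developmental years (the exact estimator is not stated here), a station's seasonal series can be converted as follows. The function name `standardize` is hypothetical; it returns the developmental mean and standard deviation so that the same statistics can later be applied to independent data.

```python
import numpy as np


def standardize(values, sqrt_transform=False):
    """Convert a seasonal series to standardized anomalies (hypothetical helper).

    values: 1-D sequence of seasonal values for one station or gridpoint,
            taken from the developmental sample.
    sqrt_transform: take square roots first, as is done here for precipitation
                    amounts to bring their distribution closer to normal.
    Returns the standardized anomalies plus the mean and standard deviation,
    so the same developmental statistics can be applied to independent data.
    """
    x = np.asarray(values, dtype=float)
    if sqrt_transform:
        x = np.sqrt(x)
    mean = x.mean()
    std = x.std(ddof=1)  # sample standard deviation (an assumption)
    return (x - mean) / std, mean, std


# Example: seasonal precipitation totals (illustrative numbers, not real data).
z, mean, std = standardize([1.0, 4.0, 9.0, 16.0], sqrt_transform=True)
```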
A variation of the retroactive real-time (RRT) validation technique was used to estimate forecast
skill (Unger, 1996b). To estimate skill by RRT, a forecast equation was derived from the base
period and applied to the following year's data to obtain an independent forecast. That year was
then added to the developmental sample, a new relationship was derived, and it was applied to the
next year's data. Independent-data statistics thus accumulate year by year in exactly the same way
as in an operational forecast procedure, except retroactively. Forecasts for the base-period years
were obtained by applying RRT in reverse, working backward in time. This bi-directional RRT
(BRRT) validation technique lets every available case contribute to the skill estimate as
independent data, much as in cross-validation, but with far less of the distortion that redundant
sampling introduces into cross-validation results (Unger, 1996b).
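The BRRT bookkeeping can be sketched as below. This is a simplified illustration rather than the operational procedure: it fits an ordinary least-squares equation on a fixed set of predictors at each step (the real system also rescreens predictors), and it assumes the reverse pass starts from all of the post-base-period years. The function names `fit`, `predict`, `rrt_forward`, and `brrt` are hypothetical.

```python
import numpy as np


def fit(X, y):
    """Ordinary least squares with an intercept; returns the coefficients."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict(coef, X):
    """Apply fitted coefficients (intercept first) to predictor rows X."""
    A = np.column_stack([np.ones(len(X)), X])
    return A @ coef


def rrt_forward(X, y, n_base):
    """Retroactive real-time pass: for each year k after the first n_base years,
    fit on years 0..k-1 only and forecast year k. Earlier years stay NaN."""
    out = np.full(len(y), np.nan)
    for k in range(n_base, len(y)):
        coef = fit(X[:k], y[:k])  # developmental sample grows by one year per step
        out[k] = predict(coef, X[k:k + 1])[0]
    return out


def brrt(X, y, n_base):
    """Bi-directional RRT: forward pass for the years after the base period,
    reverse pass (time axis flipped) for the base-period years themselves.
    Assumes the reverse pass starts from all post-base-period years."""
    forward = rrt_forward(X, y, n_base)
    reverse = rrt_forward(X[::-1], y[::-1], len(y) - n_base)[::-1]
    out = forward.copy()
    out[:n_base] = reverse[:n_base]
    return out
```

Every year in the returned series is forecast by an equation that never saw that year, which is what allows all cases to contribute to the skill estimate as independent data.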
A forward selection screening procedure was used for equation development, retaining the first
five terms selected for each equation. Separate statistics were accumulated for each equation
length, so that results were calculated for the one-, two-, three-, four-, and five-term equations.
The optimum equation length was then estimated by an objective learning procedure that used the
past performance at each RRT trial to "predict" which equation length would perform best on the
next trial. Verification statistics from this "best guess" forecast were kept separately and were
used to obtain the final skill estimate of the forecasts.
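A greedy forward-selection screening step might look like the sketch below. It assumes the selection criterion is the largest reduction in the residual sum of squares, which the text does not specify. The first m entries of the returned list define the m-term equation, so statistics for the one- through five-term equations can be accumulated separately. `forward_select` is a hypothetical name, and the learning procedure that chooses the best length is not reproduced.

```python
import numpy as np


def forward_select(X, y, max_terms=5):
    """Greedy forward screening: at each step add the candidate predictor
    (column of X) that most reduces the residual sum of squares of an
    intercept-plus-predictors least-squares fit. Returns column indices
    in the order they were selected."""
    n, p = X.shape
    chosen = []
    for _ in range(min(max_terms, p)):
        best_j, best_sse = None, np.inf
        for j in range(p):
            if j in chosen:
                continue
            A = np.column_stack([np.ones(n), X[:, chosen + [j]]])
            coef, *_ = np.linalg.lstsq(A, y, rcond=None)
            sse = np.sum((y - A @ coef) ** 2)
            if sse < best_sse:
                best_j, best_sse = j, sse
        chosen.append(best_j)
    return chosen
```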
The verification is based on the temporal correlation coefficient between forecast and
observation for the 40 independent cases at each of the 59 stations. Field significance was
measured by comparing the spatially averaged correlation coefficient of the forecasts applied to
the actual target years with those of the same forecasts applied to 500 randomly shuffled target
periods. Field significance is expressed as the percentage of shuffled cases in which the random
forecast series outperformed the actual forecasts, so smaller values indicate a more significant
forecast.
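The field-significance test can be sketched as follows. It assumes forecasts and observations are arranged as (year x station) arrays and that "shuffling target periods" means reassigning whole years, with the same permutation applied at every station so that the spatial structure of each year is preserved. The function names and the fixed random seed are illustrative assumptions.

```python
import numpy as np


def mean_station_correlation(forecasts, observations):
    """Average over stations (columns) of the temporal correlation over years (rows)."""
    f = forecasts - forecasts.mean(axis=0)
    o = observations - observations.mean(axis=0)
    r = (f * o).sum(axis=0) / np.sqrt((f ** 2).sum(axis=0) * (o ** 2).sum(axis=0))
    return r.mean()


def field_significance(forecasts, observations, n_shuffles=500, seed=0):
    """Percentage of shuffled target-year orderings whose spatially averaged
    correlation exceeds that of the forecasts matched to the actual years."""
    rng = np.random.default_rng(seed)
    actual = mean_station_correlation(forecasts, observations)
    wins = 0
    for _ in range(n_shuffles):
        # One permutation of years, applied to every station at once.
        perm = rng.permutation(forecasts.shape[0])
        if mean_station_correlation(forecasts[perm], observations) > actual:
            wins += 1
    return 100.0 * wins / n_shuffles
```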
The final forecasts are post-processed to estimate the likelihood of the above-normal,
near-normal, or below-normal class being observed, with the classes defined by the terciles of the
distribution for each forecast element and location. A forecast is assigned a class on the basis of
the forecast distribution and skill. An estimate of the increased likelihood of a given class is made
so that the forecast is in a format similar to the operational long-lead forecasts issued by the CPC
(O'Lenic, 1994).
The probability assignments for temperature forecasts are made by integration of the estimated
forecast error distribution against the 1961-90 temperature class limits. An estimate of the skill
for low and high temporal frequencies (obtained from the 10-yr moving average of the forecasts
and the residual of this value from the raw forecast, respectively) is used to estimate the forecast
error distribution (Unger, 1997). The class whose probability departs most from its
climatological value (33.3 percent) is displayed, with its probability anomaly contoured.
Because precipitation trends are less pronounced,
precipitation probabilities are estimated on the basis of the empirical probabilities associated with
skill and forecast magnitude as determined from historical forecasts.
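The conversion from a continuous temperature forecast to tercile probabilities can be illustrated with a Gaussian error distribution. This is a sketch under stated assumptions: the forecast anomaly and error spread are taken to be in standardized units relative to the 1961-90 normals, the error distribution is assumed normal with a given standard deviation, the low-frequency part uses a trailing 10-yr window, and the way the low- and high-frequency skill estimates combine into the error spread (Unger, 1997) is not reproduced. All function names are hypothetical.

```python
from statistics import NormalDist

# Tercile class limits for a standard normal climatology (about -0.43 and +0.43).
LOWER = NormalDist().inv_cdf(1 / 3)
UPPER = NormalDist().inv_cdf(2 / 3)


def split_frequencies(series, window=10):
    """Split a forecast series into a low-frequency part (trailing moving
    average over `window` years, shorter at the start of the record) and a
    high-frequency residual. The trailing window is an assumption."""
    low = []
    for i in range(len(series)):
        chunk = series[max(0, i - window + 1): i + 1]
        low.append(sum(chunk) / len(chunk))
    high = [x - m for x, m in zip(series, low)]
    return low, high


def tercile_probabilities(anomaly, error_sd):
    """Integrate a normal forecast-error distribution centred on the forecast
    anomaly across the below-, near-, and above-normal class limits."""
    err = NormalDist(mu=anomaly, sigma=error_sd)
    below = err.cdf(LOWER)
    above = 1.0 - err.cdf(UPPER)
    return below, 1.0 - below - above, above


def largest_anomaly(probs):
    """Return (class index, probability anomaly) for the class whose
    probability departs most from the climatological 1/3."""
    anomalies = [p - 1 / 3 for p in probs]
    i = max(range(3), key=lambda k: abs(anomalies[k]))
    return i, anomalies[i]


# Example: a +0.5 standardized anomaly with an assumed error spread of 0.9.
print(largest_anomaly(tercile_probabilities(0.5, 0.9)))
```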
The forecasts for JAS 1997 are shown in Figs. 1 and 3 with the corresponding skill estimates for
each station shown in Figs. 2 and 4. Shading indicates areas of at least 3 percent probability
anomaly. Contours within the shaded areas on the forecast maps show the probability anomaly at
5-percent intervals.
The numbers plotted in Figs. 1 and 3 indicate station values of the post-processed regression
forecasts, damped according to the forecast-observation correlation on independent data to
minimize the squared error. Non-zero numbers plotted outside the shaded regions indicate
forecast anomalies of substantial magnitude at stations that have some skill, but not enough
to select a forecast category with confidence.
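The damping step amounts to one line of arithmetic. If the regression forecast z and the observation are both expressed as standardized anomalies with independent-data correlation r, the least-squares slope of observation on forecast is r, so r * z minimizes the squared error and leaves an error standard deviation of sqrt(1 - r**2). The sketch below assumes unit-variance forecasts and treats negative correlations as no skill; `damp_forecast` is a hypothetical name.

```python
import math


def damp_forecast(z, r):
    """Damp a standardized regression forecast z by its independent-data
    correlation r (hypothetical helper).

    For unit-variance forecast and observation with correlation r, r * z is
    the minimum squared-error forecast and sqrt(1 - r**2) is the remaining
    error standard deviation. Negative correlations are treated as no skill
    (an assumption), which yields a zero anomaly.
    """
    r_eff = max(r, 0.0)
    return r_eff * z, math.sqrt(1.0 - r_eff ** 2)


# Example: a 1.5 standardized anomaly at a station with r = 0.4.
damped, error_sd = damp_forecast(1.5, 0.4)
```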
Regression forecasts for JAS (Fig. 1) show above normal temperatures over the Desert
Southwest, Florida, and the Carolinas and Virginia, extending northwestward into Michigan. Below
normal temperatures are indicated over southern Georgia and Alabama, the Florida Panhandle, the
upper Great Plains near South Dakota, and Maine. Note that the probability anomalies for the
climate divisions in southern Nevada and Florida exceed 20 percent; they largely reflect the fact
that the 1961-1990 mean temperatures are considerably lower than the means observed at these
locations over the recent decade. Because these are localized anomalies, they may be due to
station changes within the climate divisions rather than to large-scale synoptic features. The
probability anomalies of 10 to 15 percent in the surrounding areas may be a more realistic forecast
for these regions.
Precipitation forecasts for JAS 1997 (Fig. 3) have much lower skill (Fig. 4). Below median
precipitation is forecast for western Montana and northern Minnesota. Above median precipitation
is predicted for Iowa. The field significance of this forecast is about 11 percent (that is, about
11 percent of the randomly shuffled forecast series outperformed it), indicating that the
precipitation signal for the season is very weak.
Figure 5 shows the temperature forecast for OND 1997, with the corresponding skill estimates
shown in Fig. 6. Above normal temperatures are forecast for southern Florida, much of the West
Coast, and the central Rockies. There are also some very weak hints of below normal temperatures
over Missouri, South Dakota, and eastern Nevada.
REFERENCES
O'Lenic, E., 1994: A new paradigm for production and dissemination of the NWS's long lead-time
seasonal climate outlooks. Proceedings of the Nineteenth Annual Climate Diagnostics Workshop.
College Park, Maryland, November 14-18, 1994, 408-411.
Unger, D. A., 1996a: Long lead climate prediction using screening multiple linear regression.
Proceedings of the Twentieth Annual Climate Diagnostics Workshop. Seattle, Washington,
October 23-27, 1995, 425-428.
Unger, D. A., 1996b: Skill assessment strategies for screening regression predictions based on a
small sample size. Preprints, Thirteenth Conference on Probability and Statistics in the
Atmospheric Sciences. San Francisco, California, February 21-23, 1996, 260-267.
Unger, D. A., 1997: Conversion of long lead climate predictions from continuous to
probabilistic form. Proceedings of the Twenty-first Annual Climate Diagnostics and Prediction
Workshop. Huntsville, Alabama, October 28-November 1, 1996, 44-47.
Figure 1. A 1-mo lead screening regression-based temperature forecast for JAS 1997. Contours
are estimated probability anomalies of the specified tercile. Shaded areas delineate the area of
sufficient skill to depart from climatology by at least 3 percent. Plotted numbers are station
values of the standardized anomaly.
Figure 2. Distribution of skill for the 1-mo lead regression forecast for JAS 1997 temperatures.
Both the plotted values and the contours are the correlation (x100) between forecast and
observation for the 1956-1996 period.
Figure 3. Same as Fig. 1 except for precipitation forecasts.
Figure 4. Same as Fig. 2 except for precipitation skill.
Figure 5. Same as Fig. 1 except for a 3-mo lead valid for OND 1997.
Figure 6. Same as Fig. 2 except for a 3-mo lead valid for OND 1997.