Screening multiple linear regression (SMLR) is used to predict seasonal temperature and precipitation amounts for locations over the mainland United States. Predictor data consist of northern hemisphere 700-mb heights, near global SSTs and station values of mean temperature and total precipitation amount from the 3-mo period prior to the forecast initial time of September 1, 1996. Forecasts for the mean temperature and total precipitation are made for a series of 13 overlapping 3-mo periods, at one month intervals, beginning with Oct-Nov-Dec 1996 and extending through Oct-Nov-Dec 1997. Regression relationships were derived from data for the 1955-95 period. Forecasts were produced from single station equations for 59 stations approximately evenly distributed throughout the U.S.
All predictors and predictands were expressed as standardized anomalies
relative to the developmental data. Precipitation amounts were transformed
by taking their square roots prior to standardization in order to help
normalize their distribution. Twenty-five candidate predictors, selected
from gridpoint values in regions of known importance for climate prediction,
were offered for screening in the regression development. A few predictor
locations were chosen on the basis of data examination of the first 20
years of the sample, referred to here as the base period. Information from
the most recent 20 years was never used for selection of candidate predictors
(Unger, 1996a).
Initial testing indicated that cross-validation cannot be used for SMLR
(Unger 1996b) so a variation of a retroactive real time (RRT) validation
technique was used. To estimate skill by RRT, a forecast equation was derived
from the base period and applied to the next year's data to obtain independent
data results. The case was then added to the developmental sample, a new
relationship was derived and applied to the following year's data. Independent
data statistics accumulate on a year by year basis in exactly the same
way as an operational forecast procedure, except retroactively. Forecasts
were obtained for the base period years by application of RRT in reverse:
deriving from the future years and applying to the most recent year in
the withheld period (now the first half of the sample). Each earlier case
was then included in the development sample, the relationships re-derived
and applied to the next earlier case. This bi-directional RRT (BRRT) validation
technique provides that each available case contribute to a skill estimate
as independent data in a way similar to cross-validation except with a
great reduction in the distortion of results, due to redundant sampling
in cross-validation (Unger, 1996b).
A forward selection screening procedure was used for equation development.
The top 5 terms were selected for each equation. Separate statistics were
accumulated for each equation length, so that results for all the one,
two, three, four and five term equations were calculated. The optimum equation
length was then estimated by an objective learning procedure that used
the past performance at each RRT trial to "predict" which equation
would perform the best on the next. Verification statistics from this "best
guess" forecast were also kept separately and were used to obtain
the final skill estimate of the forecasts.
The verification used was the temporal correlation coefficient between
forecast and observation on the 40 independent cases at each of the 59
stations. An average correlation coefficient was computed from the root
mean squared correlation coefficient with the signs retained both in the
squaring process and the final square root. Field significance was measured
by comparison of scores from actual target years against scores determined
from 500 randomly shuffled target periods. Field significance expresses
the percentage of cases in which the random forecast series outperformed
the actual forecasts.
The final forecasts are post-processed to obtain an estimate of the likelihood
of the above, normal, or below class being observed, as defined by the
terciles of the distribution for each forecast element and location. A
forecast is assigned a class on the basis of the forecast distribution
and skill. An estimate of the increased likelihood of a given class is
made to place the forecast in a format similar to the operational long
lead forecasts issued by the CPC (O'Lenic, 1994). Currently, these probability
assignments are obtained from the relationship between probability of a
given class being observed, the inflated SMLR forecast and the predictive
skill (inflation sets the forecasts variance equal to observed variance
at each station). This relationship is based on forecast performance on
independent data. If the correlation skill of the forecast is under approximately
.3, the forecast is not assigned to a class and is regarded as a climatological
forecast.
The forecasts for OND 1996 are shown in Figs. 1 and 3 with the corresponding
skill estimates for each station shown in Figs. 2 and 4. Shading indicates
areas of sufficient skill to assign a tercile category to the forecast.
Contours within the shaded areas on the forecast maps indicate an estimate
of a 5 and 10 percent probability anomaly for the category. Note that the
skill estimates are based on the actual forecasts, and not the post processed
category assignments, which are presented only for clarity of presentation.
The numbers plotted in Figs. 1 and 3 indicate station values of the regression
forecast for the standardized anomaly of temperature or the square root
of precipitation amount. Forecasts are damped according to the forecast-observation
correlation on independent data so that the squared error between forecast
and observation will be minimized. Non-zero numbers plotted outside of
shaded regions generally indicate forecast anomalies of substantial magnitude
at stations with some skill, but lower than the skill threshold to choose
a forecast category with confidence.
Regression forecasts (Fig. 1) show below normal temperatures over much
of the western U.S. and in the northern high plains region. Warmer than
normal temperatures are indicated for much of Texas and along most of the
Gulf coast and the southeastern Atlantic coast. The average correlation
for this forecast is .27 with a field significance of .000.
Precipitation forecasts for OND 1996 (Fig. 3) show considerably less skill
than for temperature forecasts, with an average correlation of .11 and
a field significance of .06. There are very slight indicatations of above
median precipitation for southern California into southern Nevada. Above
median precipitation amounts are also expected for western Washington State.
Below median precipitation amounts are ex-pected for extreme southern Texas
and a small area in Wyoming.
Figure 5 shows the temperature forecast for late winter (JFM 1997) in the
U.S. Southern Arizona and western Texas are expected to be warmer than
normal. There are indications of above normal temperatures in portions
of the southeastern U. S and northward along the Atlantic coast to southern
New Jersey. No areas of cool weather are predicted. The skill estimates
for this forecast are shown in Figure 6. The average correlation for this
forecast is .18 with a field significance of .012.
References
O'Lenic, E., 1994: A new paradigm for production and dissemination
of the NWS's long lead-time seasonal climate outlooks. Proceedings of the
Nineteenth Annual Climate Diagnostics Workshop. College Park, Maryland,
November 14-18, 1994, 408-411.
Unger, D. A., 1996a: Long lead climate prediction using screening multiple
linear regression. Proceedings of the Twentieth Annual Climate Diagnostics
Workshop. Seattle, Washington, October 23-27, 1995, 425-428.
Unger, D. A., 1996b: Skill assessment strategies for screening regression
predictions based on a small sample size. Preprints, Thirteenth Conference
on Probability and Statistics in the Atmospheric Sciences. San Francisco,
CA., February 21-23, 1996, 260-267.
Figures