[Previous Article]
[Next Article]
Forecasts of Surface Temperature and Precipitation Anomalies over the U.S.
Using Screening Multiple Linear Regression
contributed by D. Unger
Climate Prediction Center, NOAA, Camp Springs, Maryland
Screening multiple linear regression (SMLR) is used to
predict seasonal temperature and precipitation amounts for locations over
the mainland United States. Predictor data consist of northern hemisphere
700-mb heights, near global SSTs and station values of mean temperature
and total precipitation amount from the 3-mo period prior to the forecast
initial time of June 1, 1996. Forecasts for the mean temperature and total
precipitation are made for a series of 13 overlapping 3-mo periods, at
one month intervals, beginning with Jul-Aug-Sep 1996 and extending through
Jul-Aug-Sep 1997. Regression relationships were derived from data for the
1955-95 period. Forecasts were produced from single station equations for
59 stations approximately evenly distributed throughout the U.S.
All predictors and predictands were expressed as standardized
anomalies relative to the developmental data. Precipitation amounts were
transformed by taking their square roots prior to standardization in order
to help normalize their distribution. 25 candidate predictors, selected
from gridpoint values in regions of known importance for climate prediction,
were offered for screening in the regression development. A few predictor
locations were chosen on the basis of data examination of the first 20
years of the sample, referred to here as the base period. Information from
the most recent 20 years was never used for selection of candidate predictors
(Unger, 1996a).
Initial testing indicated that cross-validation cannot
be used for SMLR (Unger 1996b) so a variation of a retroactive real time
(RRT) validation technique was used. To estimate skill by RRT, a forecast
equation was derived from the base period and applied to the next year=s
data to obtain independent data results. The case was then added to the
developmental sample, a new relationship was derived and applied to the
following year=s data. Independent data statistics accumulate on a year
by year basis in exactly the same way as an operational forecast procedure,
except retroactively. Forecasts were obtained for the base period years
by application of RRT in reverse: deriving from the future years and applying
to the most recent year in the withheld period (now the first half of the
sample). Each earlier case was then included in the development sample,
the relationships re-derived and applied to the next earlier case. This
bi-directional RRT (BRRT) validation technique provides that each available
case contribute to a skill estimate as independent data in a way similar
to cross-validation except with a great reduction in the distortion of
results, due to redundant sampling (Unger, 1996b).
A forward selection screening procedure was used for equation
development. The top 5 terms were selected for each equation. Separate
statistics were accumulated for each equation length, so that results for
all the one, two, three, four and five term equations were calculated.
The optimum equation length was then estimated by an objective learning
procedure that used the past performance at each RRT trial to Apredict@
which equation would perform the best on the next. Verification statistics
from this Abest guess@ forecast were also kept separately and were used
to obtain the final skill estimate of the forecasts.
The verification used was the temporal correlation coefficient
between forecast and observation on the 40 independent cases at each of
the 59 stations. An average correlation coefficient was computed from the
root mean squared correlation coefficient with the signs retained both
in the squaring process and the final square root. Field significance was
measured by comparison of scores from actual target years against scores
determined from 500 randomly shuffled target periods. Field significance
expresses the percentage of time that the random forecast series outperformed
the actual forecasts.
The final forecasts are post-processed to obtain an estimate
of the likelihood of the above, normal, or below class being observed,
as defined by the terciles of the distribution for each forecast element
and location. A forecast is assigned a class on the basis of the forecast
distribution and skill. An estimate of the increased likelihood of a given
class is made to place the forecast in a format similar to the operational
long lead forecasts issued by the CPC (O=Lenic, 1994). Currently, these
probability assignments are obtained from the relationship between probability
of a given class being observed, the inflated SMLR forecast and the predictive
skill (inflation sets the forecasts variance equal to observed variance
at each station). This relationship is based on forecast performance on
independent data. If the correlation skill of the forecast is under approximately
.3, the forecast is not assigned to a class and is regarded as a climatological
forecast.
The forecasts for JAS 1996 are shown in Figs. 1 and 3
with the corresponding skill estimates for each station shown in Figs.
2 and 4. Shading indicates areas of sufficient skill to assign a tercile
category to the forecast. Contours within the shaded areas on the forecast
maps indicate an estimate of a 5 and 10 percent probability anomaly for
the category. Note that the skill estimates are based on the actual forecasts,
and not the post processed category assignments, which are presented only
for clarity of presentation.
The numbers plotted in Figs. 1 and 3 indicate station
values of the regression forecast for the standardized anomaly of temperature
or the square root of precipitation amount. Forecasts are damped according
to the forecast-observation correlation on independent data so that the
squared error between forecast and observation will be minimized. Non-zero
numbers plotted outside of shaded regions generally indicate forecast anomalies
of substantial magnitude at stations with some skill, but lower than the
skill threshold to choose a forecast category with confidence.
Temperature forecasts (Fig. 1) show cold anomalies in
the desert Southwest, and in eastern Georgia. Near normal temperatures
are predicted for the mid-Atlantic states. Warm anomalies are forecast
for southern Texas and a small area near the Great Lakes. The average correlation
for this forecast is .24 with a field significance of .004.
Precipitation forecasts for JAS 1996 (Fig. 3) show considerably
less skill than for temperature forecasts, with an average correlation
of .09 and a field significance of .27. The spatial distribution of skill
(Fig. 4) shows only isolated areas of apparent skill. That fact, taken
together with the low field significance for this season, indicates that
little, if any, usable information is contained in this forecast.
The forecast for fall precipitation (Fig. 5) has more
skill than summertime precipitation forecasts, with an average correlation
of .15 and a field significance of .060. Above median precipitation amounts
are indicated for the Pacific Northwest. The forecast pattern in the Eastern
U.S. is noisy, indicating that the signal is weak. The forecast magnitude
for above median precipitation amount in lower Michigan is quite high,
suggesting a higher confidence in this value than some of the other highlighted
areas in the East. The same is true for the forecast for dry conditions
in northern Vermont. However, an examination of the skill map for this
target season (Fig. 6) shows that stations surrounding these two locations
have little if any skill, suggesting that these forecasts be used with
caution.
References
O'Lenic, E., 1994: A new paradigm for production and dissemination
of the NWS=s long lead-time seasonal climate outlooks. Proceedings of the
Nineteenth Annual Climate Diagnostics Workshop. College Park, Maryland,
November 14-18, 1994, 408-411.
Unger, D. A., 1996a: Long lead climate prediction using
screening multiple linear regression. Proceedings of the Twentieth Annual
Climate Diagnostics Workshop. Seattle, Washington, October 23-27, 1995,
425-428.
Unger, D. A., 1996b: Skill assessment strategies for screening
regression predictions based on a small sample size. Preprints, Thirteenth
Conference on Probability and Statistics in the Atmospheric Sciences. San
Francisco, CA., February 21-23, 1996, 260-267.
Figure 1. A 1-mo lead screening regression-based temperature forecast for JAS 1996. Contours are estimated probability anomalies of the specified tercile. Shaded areas deliniate the area of correlation skill greater than 0.3. Plotted numbers are station values of the standardized anomaly.
Figure 2. Distribution of skill for the 1-mo lead regression forecast for JAS 1996 temperatures. The values shown are for the correlation between forecast and observation for the 1955-1995 period.
Figure 3. Same as Fig. 1 except for precipitation forecasts.
Figure 4. Same as Fig. 2 except for precipitation skill.
Figure 5. Same as Fig. 3 except for a 3-mo lead valid for SON 1996.
Figure 6. Same as Fig. 4 except for a 3-mo lead valid for SON 1996.