Understanding the Degree Day "Probability of Exceedance" Forecast Graphs
Seasonal Format of the forecasts:
Degree day forecasts are presented only for the core heating and cooling times
of the year. These include, for heating: the 5-month period Nov-Dec-Jan-Feb-Mar, and the
three embedded 3-month periods of Nov-Dec-Jan, Dec-Jan-Feb and Jan-Feb-Mar; and
for cooling: the 5-month period May-Jun-Jul-Aug-Sep, and the three embedded 3-month periods of May-Jun-Jul, Jun-Jul-Aug and Jul-Aug-Sep. The heating season degree day
forecasts are posted beginning the prior summer, and the
cooling season degree day forecasts are posted beginning the prior winter.
All degree day forecasts are based on the temperature
forecasts for the same target period, with the added factor of the
correspondence between temperature and degree day totals. That correspondence
is determined using 67 years of data, covering the 1931-1997 period.
During times of
the year when the mean temperature is far from 65 degrees F (e.g. in much of
the northern part of the U.S. in winter, and near the Gulf of Mexico and
southeastern states in summer), there is a one-to-one correspondence between
the degree day forecasts and the temperature forecasts. In locations and
times of the year when daily mean temperature may be on either side
of 65 degrees F, the correspondence between the
temperature forecasts and degree day forecasts is nonlinear, and this
correspondence is accounted for in the degree day forecasts presented here.
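The two regimes described above can be illustrated with a minimal sketch. It assumes the usual definition of degree days relative to a 65 degree F base; the function names and the 4-day temperature samples are hypothetical and serve only as an illustration.

```python
BASE_F = 65.0  # degree day base temperature in degrees F, as used in the text

def heating_degree_days(daily_means_f):
    """Sum of (65 - T) over days whose mean temperature is below 65 F."""
    return sum(max(0.0, BASE_F - t) for t in daily_means_f)

def cooling_degree_days(daily_means_f):
    """Sum of (T - 65) over days whose mean temperature is above 65 F."""
    return sum(max(0.0, t - BASE_F) for t in daily_means_f)

# Hypothetical 4-day samples, chosen only to illustrate the two regimes.
cold_spell = [20.0, 25.0, 30.0, 35.0]   # always well below 65 F
mild_spell = [55.0, 60.0, 70.0, 75.0]   # straddles 65 F

# Far from 65 F: the total equals days * (65 - mean), so it tracks the mean exactly.
cold_mean = sum(cold_spell) / len(cold_spell)
print(heating_degree_days(cold_spell), len(cold_spell) * (BASE_F - cold_mean))

# Straddling 65 F: warm days contribute zero, so the linear formula no longer holds.
mild_mean = sum(mild_spell) / len(mild_spell)
print(heating_degree_days(mild_spell), len(mild_spell) * (BASE_F - mild_mean))
```

In the second sample the mean temperature is exactly 65 degrees F, yet the heating degree day total is not zero, which is the nonlinearity the forecasts have to account for.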
The correspondence between seasonal mean temperature and degree days is
derived empirically. A set of 5 to 7 points of correspondence is developed from
the means of groups of approximately 10 cases apiece, where each group represents
one portion of the temperature distribution (e.g. the top 10%, the second decile, etc.) spanning from one tail of the distribution through the middle to the
other tail. The poorly sampled end points of the distribution are approximated
by linear extrapolation from the results slightly less far out on the tails.
Visual inspection indicates that the resulting fit is good and,
in most cases, shows little or no error relative to the raw data.
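One way to picture the grouping and extrapolation steps is the sketch below. It is not the operational CPC procedure: the function names are hypothetical, the groups are simply consecutive runs of 10 sorted cases, the end extrapolation reuses the outermost segment, and the 67-season record is synthetic.

```python
def group_means(temps, degree_days, group_size=10):
    """Sort seasons by temperature and average consecutive groups of cases."""
    pairs = sorted(zip(temps, degree_days))
    points = []
    for start in range(0, len(pairs), group_size):
        chunk = pairs[start:start + group_size]
        mean_t = sum(t for t, _ in chunk) / len(chunk)
        mean_dd = sum(d for _, d in chunk) / len(chunk)
        points.append((mean_t, mean_dd))
    return points

def degree_days_for(temp, points):
    """Piecewise-linear lookup; beyond the ends, extend the outermost segment."""
    idx = 0
    while idx < len(points) - 2 and temp > points[idx + 1][0]:
        idx += 1
    (t0, d0), (t1, d1) = points[idx], points[idx + 1]
    return d0 + (d1 - d0) * (temp - t0) / (t1 - t0)

# Synthetic, perfectly linear 67-season record (illustration only, not real data).
temps = [40.0 + 0.25 * i for i in range(67)]
hdd = [90.0 * (65.0 - t) for t in temps]

points = group_means(temps, hdd)
print(len(points), "points of correspondence")
print(round(degree_days_for(50.0, points), 1))   # inside the sampled range
print(round(degree_days_for(30.0, points), 1))   # extrapolated beyond the cold tail
```

With 67 cases and groups of 10, the sketch yields 7 points, the last group holding the remaining 7 cases.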
This text description for the degree day graphs is similar to that given for
the temperature and precipitation probability of exceedance graphs, but is
somewhat more abbreviated. The reader is encouraged to consult the
temperature and precipitation description if further explanation is needed.
Purpose of the graphs:
The "probability of exceedance" curves give the forecast probability that a degree day total, shown on the horizontal axis, will be exceeded at the location in
question,
for the given season at the given lead time. The information on these graphs is
consistent with the information given in the maps of temperature probability anomaly--a part of the multi-season climate outlook that has been issued since 1995.
Those forecast maps show the probability anomaly of the most favored tercile of
the climatological temperature distribution: below normal,
near normal, or above normal. The graphs shown here provide additional detail about the degree day forecast probability distribution at an individual location, i.e., any one of 102
climate regions in the mainland U.S. The additional information comes about through the display of the entire probability distribution, as opposed
to just the probability anomaly of the most favored tercile. The probability of
exceedance of the temperature itself is available on another branch of this CPC
web site, < http://www.cpc.ncep.noaa.gov/pacdir/NFORdir/jo1.html >.
Although skill in temperature and degree day forecasting is in most cases modest in absolute terms, there is nonetheless justification to issue a complete forecast
probability distribution. Showing the distribution is our attempt to accurately
convey the sense of the forecast while also showing the degree of uncertainty (which is often high)
contained in that forecast.
What the curves mean:
Each graph contains four curves. One of the four is actually a set of two curves.
The first curve, shown in black, shows the
"normal",
or climatological probability distribution.
Sometimes no black line appears in the graph. When that is the case, the black line is hidden
underneath the thick red line, which shows the forecast distribution (to be explained below). The climatological distribution is derived by computing the average, and also computing a
measure of the amount of year-to-year variation around that average. The curve is therefore called a "fitted" curve, because it is defined using a formula that
makes it possible to fit
a smooth curve to the data. The data may not be so smooth and regular, but the formula only uses the average and the typical deviations from that average to define the curve. The center
(such as the average [mean] or the median) of this distribution is based on the
historical record of observations at the given location and season during the period used as the normal base
period. For example, in 1999 the normal base period is 1961-90. After reaching the year 2001 or 2002, the base period will be updated to 1971-2000. The value of the center of the
distribution, or normal, is printed numerically near the top of the graph. The variability, or range, of the climatological distribution is based on a period longer than the normal base period.
This is done in order to obtain a more accurate estimate of the variability. Getting an acceptably accurate estimate of the variability requires more cases than
getting an acceptably accurate
estimate of the center of the distribution. The additional years used in getting
the variability estimate occur immediately prior to the base period. For temperature and degree days, 10 extra years are used.
Gradual trends within the 40 years are not allowed to contribute to the variability, however; this is accomplished by using a sliding 30-year period, within the
40-year period, for the means about which the variations are defined.
For temperature, the value of the center of the distribution represents the mean, or average, and is the temperature at which the curve crosses
the 50% line from the vertical axis. For degree days this is the case also,
but only when excursions of the daily mean on the "wrong" side of the 65 degree
F threshold are rare or nonexistent. The "wrong" side implies above 65 during
the heating season, and below 65 during the cooling season.
Such excursions occur in locations that are warm in the winter, such as southern Florida,
or that are not hot in the summer, such as the high country of Nevada, Montana or Idaho.
When they occur,
the degree day distribution is no longer a linear, or symmetric, function of the temperature, but rather becomes a skewed function of the temperature. In that case,
the value at which the curve crosses the 50% line (the median) is considered the
normal. That is,
the average is not used to represent the normal for degree days,
because more cases tend to be on one side of the average than on the other--a feature of a skewed, or asymmetric, distribution.
The second curve, shown in yellow, labeled
"observed data",
is a probability of exceedance curve derived from the observed data without any
model fitting. It steps down by 3.33% every time one of the 30 observations in the normal base period
no longer exceeds the value shown on the x-axis.
This curve is displayed so that the user may observe how good a fit the smoothed
climatological curve is to the actual degree day data. The fitted curves are based on a Gaussian
distribution for temperature, and on an empirical temperature-versus-degree day
correspondence for degree day data, using data from the 1931-97 period of record.
The curve based directly on the data is expected to be somewhat rough and irregular, with gaps in some places
and clustering in others. This irregularity is caused by the lack of a very long
sampling period--i.e., 30 years rather than several hundred years of observations. If the same number of
observations were sampled from an earlier period and the underlying climate were
identical, the places having gaps and clusters would be expected to change. When the irregularities are
changeable from one sample to another and have equal chances of appearing in various places in the distribution, the smooth fitted climatological curve is thought to estimate the true
population distribution better than the curve formed from any single sampling of
the data. However, in some cases there may be a physical reason for deviations
from a smooth
distribution. In that case, sampling 500 years of data would not eliminate these
features of the curve. However, these features would be expected to appear somewhat more smoothly (less
"noisy" or jumpy) than features caused purely by sampling variations. For example, a tendency for a plateau of shallow slope might appear near the middle of the
"probability of
exceedance" distribution, where the steepest slope is usually found, or a steep
slope might be found off the center of the distribution. At CPC we believe that
in most cases the fitted curve
is a better representation of nature than the raw data curve. That is, most of the irregularities in the raw data curve occur by chance alone, and would not appear if it were possible to
sample a much larger set of cases.
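The difference between the stepped "observed data" curve and a smooth fitted curve can be sketched as follows. This is a simplification: it fits a Gaussian directly to the degree day totals, whereas the graphs fit temperature and then apply the empirical temperature-versus-degree day correspondence. The function names and the 30-season record are hypothetical.

```python
from statistics import NormalDist, mean, stdev

def empirical_poe(observations, value):
    """Fraction of observed totals that exceed `value` (the stepped curve)."""
    return sum(1 for obs in observations if obs > value) / len(observations)

def fitted_poe(observations, value):
    """Exceedance probability from a Gaussian with the sample mean and spread."""
    dist = NormalDist(mean(observations), stdev(observations))
    return 1.0 - dist.cdf(value)

# Hypothetical 30-season record of degree day totals (not real data).
record = [4000 + 25 * i for i in range(30)]

# Each observation passed on the horizontal axis lowers the stepped curve by 1/30.
for value in (3999, 4000, 4025):
    print(value, round(empirical_poe(record, value), 4), round(fitted_poe(record, value), 4))
```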
The third curve, shown in red, labeled "final forecast",
represents the probability distribution of the final official CPC forecast. The
downward slope of the final forecast curve may
be slightly steeper than the slope of the climatological curve, in proportion to
the confidence associated with the final forecast. (Several aspects of the confidence are indicated in each
graph; these are described below.) This is because when a forecast is thought to
be relatively skillful (as, for example, when there is a strong ENSO event in progress and the location is
one in which an ENSO impact is anticipated), the range of possibilities is smaller than if no useful forecast knowledge were in hand. This represents a decrease
in the uncertainty, which
shows up as a narrower range of degree day values within which the probability of exceedance changes by a given amount. In some cases there is a shift of the forecast
curve relative to the normal curve, but without a steeper slope in the forecast
curve. This would indicate some confidence in the shift away from the normal, but without a decrease in the
range of possibilities in the shifted climate. This might occur when a trend, or
climate change relevant to the present decade as a whole, is believed to be occurring.
The fourth, a pair of curves shown by thin red lines, represents an "error envelope". It is drawn on either side of the main final forecast curve, paralleling that
curve. These lines illustrate our
estimate of the amount of possible error associated with the forecast curve. The
forecast curve itself already conveys uncertainty about the forecast; this is why it is shown in a
probabilistic framework and usually has a downward slope that is not much steeper than the slope of the fitted climatological curve. In addition to this inherent uncertainty, there is also
some uncertainty related to other aspects of the forecast. Examples of these additional error sources are (1) errors in the most recent observed data used to determine the forecasts, (2)
errors in the forecasters' perception, judgement and understanding of the current climate state, and (3) imperfections in the fit of the climatological and forecast distributions to the actual data (as
revealed by the differences between the yellow curve and the black curve).
All of these factors could result in some error in the positioning of the forecast curve as a whole. While an accurate evaluation
of the size of this error is not possible, an approximation is provided by the error envelope. The approximation is based on the expected sampling variability of the climatological probability
of exceedance using 45 years of data.
The resulting envelope is thought to be conservative--i.e. the size of the
error in the position of the forecast curve is more likely to be over-represented than
under-represented.
How to read a "probability of exceedance" curve:
As an example, suppose we first examine the "normal" curve in any one of the graphs. Like the other curves, the normal curve begins near the 100% level in the upper left portion
of the graph. This indicates that the probability that the degree day total will
exceed the amount shown by the number given at the extreme left of the horizontal axis (near the
lower left corner of the graph) is close to 100%. This makes sense, because the
amount has been chosen to be far below the expected normal at the given location
and season--an amount that may never have been observed during the normal base period. This low
value is chosen because it is unlikely, but possible. As the value is increased
from the left toward the right side of the graph, the probability of it being exceeded begins to
decrease, decreasing most rapidly near the middle of the climatological distribution, shown near the middle portion of the graph. The boundaries between the climatologically lowest and
middle tercile, and between the middle and highest tercile, are indicated by vertical lines that intersect the normal curve where the probability of exceedance
is 66.7% and 33.3%,
respectively. Vertical lines indicating the median, or 50% probability of exceedance, and the 10% and 90% probabilities of exceedance, are also shown. In the right portion of the graph, the
"probability of exceedance" line continues to decline and approaches 0% as the horizontal axis values become so large as to be very unlikely to be exceeded. Because the curve
continues to decrease from near 100% to near 0% as the degree day total on the horizontal axis increases, the probability that the degree day total will be
between any two values on the graph can be determined by subtracting the lower probability of exceedance value from the higher probability of exceedance value.
In the case of the
"normal" curve, this probability is with respect to the normally expected climatology for the location and season. When the probability is determined with respect to the "final forecast"
curve, it is a statement of CPC's forecast probability. It is useful to compare
this probability with that for the normal climatology to appreciate the difference attributable to the
current climate state and climate outlook. Because of the modest level of skill
in many cases, this difference may often be minor, and, in the case of the "CL"
(climatological probability)
temperature forecast, there is no difference for degree days at all.
Automatic Probability Evaluation:
In subtracting two "probability of exceedance" values in order to evaluate the probability of occurrence between a lower and upper limit, it is often difficult
to obtain an accurate visual
estimate of the probability of exceedance from the graph. For this reason, the
process will be automated for users' convenience in the future. The option to use this
utility will be provided on each graph. The user will only need to select a region, a lead time, and
the lower and upper limits of degree days within which a probability is to be evaluated. The answer will be computed with respect to both the climatological (normal) and the "final forecast" probability of
exceedance curves.
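The subtraction of two probability of exceedance values, evaluated against both the climatological and the forecast curve, might look like the sketch below. It is not the CPC utility: both curves are assumed to be Gaussian in degree day space, and the means, spreads and limits are made-up numbers.

```python
from statistics import NormalDist

def poe(dist, value):
    """Probability that the degree day total exceeds `value`."""
    return 1.0 - dist.cdf(value)

def prob_between(dist, lower, upper):
    """Higher exceedance probability (at `lower`) minus the lower one (at `upper`)."""
    return poe(dist, lower) - poe(dist, upper)

# Hypothetical curves; the parameters are illustrative only.
climatology = NormalDist(mu=4500.0, sigma=300.0)
forecast = NormalDist(mu=4400.0, sigma=280.0)

lower, upper = 4200.0, 4800.0
print("climatological:", round(prob_between(climatology, lower, upper), 3))
print("final forecast:", round(prob_between(forecast, lower, upper), 3))
```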
Caution required for the tails of the curves:
Each of the curves is constructed on the basis of historical observations, and/or the nature and strength of the impacts of the estimated current and future climate state. Near the middle of
the distribution plentiful data have been sampled, because the middle of the distribution is most likely and most frequently observed. On the other hand, in the tails, or extremes, of the
distribution, there have only been a few cases. Sometimes there may have been no
cases in a large portion of a tail, and then just a single observation far out
in the extreme part of that tail.
Whatever the exact configuration of the observations, the tails are less certain
than the middle and the shoulders of the distribution. Therefore, conclusions based on the extreme tails of
the distribution are particularly unreliable, and should be drawn with caution.
A warning about the upper and lower
7% tails of the curves is posted on each graph. The middle 86% of the probability distribution, ranging from the 93% to the 7% probability of exceedance values,
is considered to be
reasonably well sampled, in contrast with the outer 7% tails. [Technical note: These cutoffs span the ±1.5 standard deviation interval with respect to the mean for a Gaussian distribution,
within which usable numbers of observations are thought to have been sampled.]
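The correspondence between the ±1.5 standard deviation interval and the 7% tails can be checked with a few lines, using the standard Gaussian distribution from Python's standard library (variable names are illustrative):

```python
from statistics import NormalDist

std_normal = NormalDist()                    # mean 0, standard deviation 1

upper_tail = 1.0 - std_normal.cdf(1.5)       # probability of exceeding +1.5 SD
lower_poe = 1.0 - std_normal.cdf(-1.5)       # probability of exceeding -1.5 SD
middle = lower_poe - upper_tail              # probability inside the interval

print(round(100 * upper_tail, 1), round(100 * lower_poe, 1), round(100 * middle, 1))
```

Each tail holds about 6.7% of the distribution and the interval about 86.6%, which the graphs round to 7% and 86%.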
Numerical information printed on the graphs:
Above and to the right of the probability of exceedance curves, selected summary
information is printed.
The block of text at the upper left shows the point forecast,
or best guess of the numerical degree day forecast. While the point forecast is
expressed as an exact number, the certainty that the result
will turn out to be this number is extremely low.
The point forecast is only given to represent the middle, or center, of the forecast distribution, and does not
imply that we can make an accurate numerical forecast.
Underneath the point forecast, the anomaly forecast,
or the departure of the point forecast from the normal, is shown. Beneath the anomaly forecast is the normal (or center),
based on the observations during the normal base period as discussed above. The normal is represented by the degree day total that corresponds to the mean temperature.
To the right of the above set of numbers, the percentile (%ile) shows the percentage of cases in the fitted
climatological distribution that would be expected to be lower than or equal to
the point forecast. For example, if the %ile is 60.0, the point forecast ranks at or above 60.0 percent of the
climatological distribution, and below the remaining 40.0 percent of it. It is a forecast
for a somewhat, but not highly, above normal degree day total. Because of the fairly large degree of
uncertainty in the outcome of our forecasts, it is not common for our point forecasts to rank in the top or bottom 20 percent of the climatological distribution. Beneath the %ile of the
forecast, the forecast percentage of the normal degree day total is shown.
Next is given the linear correlation between the mean temperature and the
degree day total for this location and season. When the mean temperature is
far enough away from 65 degrees F (e.g. Minneapolis in winter, or Miami
in summer), there is a -1.00 correlation between temperature and heating
degree days and a +1.00 correlation between temperature and cooling degree
days. This implies that the degree day forecast is completely determined
by the temperature forecast. When daily temperatures within the season
may fall on the opposite side of 65 degrees from where they normally are, the linear
correlation with temperature is degraded and degree day totals vary more
slowly and less predictably as a function of seasonal mean temperature than would otherwise be the case. The degree to which the degree day versus temperature
relationship departs from linearity is shown by the "Cor with T" statistics.
The first correlation is that of all 67 years in the 1931-97 period. The
second correlation ("mid20%") is the correlation using only the years whose
temperature was in the middle quintile of the distribution (the
40 to 60 percentile range). In restricting the
range of the temperature in this way, deviations of the correlation from
plus or minus 1.00 show up more easily (although the end of the distribution
whose temperature is closest to, or crosses, 65 degrees
generally has lower correlation than the other end), giving the user a more
sensitive indicator of lack of linearity in the heart of the distribution.
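The two correlations can be sketched with synthetic data. Everything below is an assumption made for illustration: the function names, the fixed within-season spread of daily temperatures, and the two 67-season records (one cold, one mild). The middle quintile is taken simply as the seasons ranked between the 40th and 60th percentiles of temperature.

```python
def pearson(xs, ys):
    """Linear (Pearson) correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def middle_quintile(temps, degree_days):
    """Keep the seasons whose temperature ranks in the 40-60 percentile range."""
    pairs = sorted(zip(temps, degree_days))
    lo, hi = int(0.4 * len(pairs)), int(0.6 * len(pairs))
    mid = pairs[lo:hi]
    return [t for t, _ in mid], [d for _, d in mid]

OFFSETS = [-10.0, -5.0, 0.0, 5.0, 10.0]   # hypothetical daily spread about the seasonal mean

def season_hdd(mean_t):
    """Heating degree days for a toy 5-day season centered on `mean_t`."""
    return sum(max(0.0, 65.0 - (mean_t + off)) for off in OFFSETS)

# Synthetic seasonal mean temperatures (illustration only, not real data).
cold = [20.0 + 0.25 * i for i in range(67)]    # never near 65 F
mild = [56.75 + 0.25 * i for i in range(67)]   # daily values cross 65 F
cold_hdd = [season_hdd(m) for m in cold]
mild_hdd = [season_hdd(m) for m in mild]

print("cold, all years:", round(pearson(cold, cold_hdd), 3))
print("mild, all years:", round(pearson(mild, mild_hdd), 3))
print("mild, mid20%:   ", round(pearson(*middle_quintile(mild, mild_hdd)), 3))
```

In the cold record the correlation is -1.00 because no day crosses 65 degrees F; in the mild record both correlations fall short of -1.00 in magnitude.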
In addition to the standard versions of the point forecast, the anomaly,
and the percentage of normal, a weighted version of each of these is also
given, shown in parentheses and indicated with "w:". While the standard
version is based solely on the point forecast, the weighted version
takes the entire degree day distribution into account, incorporating the
possible nonlinear portion of the correspondence with temperature. It
integrates the degree day forecast across its range of associated
temperatures, weighting by the probability. When the degree day versus
temperature relationship is completely linear (i.e. when daily temperatures
never cross 65 degrees), the standard and weighted degree day forecasts
should be identical. When there is some nonlinearity (e.g. southern Florida
heating degree days in Nov-Dec-Jan), then the weighted results come out
more conservative (less anomalous) than the standard results. When the two
versions differ, the version of choice depends upon the user's preference
and philosophy.
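The distinction between the standard and weighted versions can be sketched as below. This is an illustration under stated assumptions, not the CPC computation: the forecast temperature distribution is taken to be Gaussian, the two temperature-to-degree-day curves are hypothetical, and the integration is approximated by averaging over equal-probability slices.

```python
from statistics import NormalDist

def standard_forecast(dd_of_temp, temp_dist):
    """Degree days evaluated at the point (central) temperature forecast only."""
    return dd_of_temp(temp_dist.mean)

def weighted_forecast(dd_of_temp, temp_dist, steps=1000):
    """Probability-weighted average of degree days over the temperature distribution."""
    quantiles = (temp_dist.inv_cdf((k + 0.5) / steps) for k in range(steps))
    return sum(dd_of_temp(t) for t in quantiles) / steps

def linear_hdd(temp):
    """Hypothetical linear correspondence (temperatures never near 65 F)."""
    return 90.0 * (65.0 - temp)

def kinked_hdd(temp):
    """Hypothetical nonlinear correspondence that flattens to zero above 65 F."""
    return 90.0 * max(0.0, 65.0 - temp)

cold_forecast = NormalDist(mu=30.0, sigma=2.0)   # illustrative numbers only
mild_forecast = NormalDist(mu=64.0, sigma=2.0)

print(standard_forecast(linear_hdd, cold_forecast), round(weighted_forecast(linear_hdd, cold_forecast), 1))
print(standard_forecast(kinked_hdd, mild_forecast), round(weighted_forecast(kinked_hdd, mild_forecast), 1))
```

With the linear curve the two versions agree; with the kinked curve they differ, because the weighted version also counts the part of the distribution on the other side of the kink.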
On the upper right side of the graph, 50% and 90% confidence intervals are given for the forecast. The 50% confidence interval gives lower and upper degree day totals. The lower amount
corresponds to the 25%ile (75% probability of exceedance) of the forecast distribution, and the upper amount is the amount that corresponds to the 75%ile (25% probability of exceedance)
of the forecast distribution. The values falling between these two limits form an interval that CPC believes has a 50 percent chance of occurring. The 90% confidence interval covers a
wider range of values, ranging from the 5%ile (95% probability of exceedance) to
the 95%ile (5% probability of exceedance) of the forecast. The ranges of amounts covered by the 50%
and 90% confidence intervals give an idea of the expected error associated with
the point forecast.
When the degree day versus temperature correspondence is nonlinear (skewed), the
distance upward to the top of the confidence interval is not equal to
the distance downward to the lower boundary of the confidence interval.
The ranges covered by the 50% and 90% confidence intervals are usually fairly wide, in keeping with the uncertainty associated with the point forecast. Note that there
is some uncertainty for the confidence intervals themselves, just as there is uncertainty for the probability of exceedance curves themselves (as indicated by the "error envelope").
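The percentile limits of the two confidence intervals can be sketched as follows. The sketch assumes a Gaussian forecast distribution in degree day space, so the resulting intervals are symmetric; in the nonlinear (skewed) case described above they would not be. The function name and the forecast parameters are hypothetical.

```python
from statistics import NormalDist

def confidence_interval(dist, level):
    """Central interval holding `level` of the forecast probability."""
    tail = (1.0 - level) / 2.0
    return dist.inv_cdf(tail), dist.inv_cdf(1.0 - tail)

forecast = NormalDist(mu=4400.0, sigma=280.0)   # illustrative numbers only

low50, high50 = confidence_interval(forecast, 0.50)   # 25%ile to 75%ile
low90, high90 = confidence_interval(forecast, 0.90)   # 5%ile to 95%ile

print("50%:", round(low50), "to", round(high50))
print("90%:", round(low90), "to", round(high90))
```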
Directly below the forecast confidence intervals, three types of measures of
confidence in the forecast
are posted,
both numerically and verbally. These are estimates of three aspects of
the skill expected in the case of the particular forecast. The first, the confidence in shift direction, is a confidence that the degree day total will deviate
from the normal in the direction indicated,
without regard to the size of the deviation. The second, the confidence in the point forecast (contraction of the forecast distribution), is confidence that the
degree day total will be close to the
point forecast. The third, the integrated confidence, is confidence that the probability distribution as a whole will be different from the climatological probability distribution. The verbal
descriptions that accompany these confidence levels are, in ascending order: none, low, fair,
moderate and high. Plus and minus signs appear for cases close to the categorical boundaries. Each of the
three aspects of confidence is further described next, enabling the user to decide which confidence (or confidences) is most applicable to their needs.
Confidence in shift direction:
This is a measure of how confident we are that the degree day total will deviate
from the normal in the direction specified, whether below or above the normal.
The measure, more specifically, is
the ratio of the estimated probability that the climate will deviate in the forecast direction to the estimated probability that it will deviate in the opposite
direction.
For example, if the confidence in the shift direction is 2.00, it indicates that
we believe there is twice the probability of a deviation in that direction as in the opposite direction. If
the forecast direction is below the normal, a 2.00 confidence would mean that the probability of below normal
conditions is 66.7% and the probability of above normal conditions is 33.3%. If
there is no confidence whatsoever regarding which side of the normal will occur,
the ratio is exactly 1. Note
that the ratio is not that of the probability of the more favored outer tercile
to the other, but rather a ratio of the forecast probabilities of occurrence of
one half of the climatological
distribution to the other half. The dividing line between the two halves is the
numerical value of the normal (or center of the distribution) that is posted in
the upper left corner of the graph.
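The ratio described above can be written out in a short sketch. The function names are hypothetical, and the Gaussian forecast curve is an assumption made only to have something to evaluate.

```python
from statistics import NormalDist

def ratio_from_probability(p_forecast_side):
    """Probability on the forecast side of the normal divided by the other side."""
    return p_forecast_side / (1.0 - p_forecast_side)

def shift_direction_confidence(forecast, normal):
    """Same ratio, taken from a forecast distribution split at the normal."""
    p_above = 1.0 - forecast.cdf(normal)
    p_below = 1.0 - p_above
    return max(p_above, p_below) / min(p_above, p_below)

print(round(ratio_from_probability(2.0 / 3.0), 2))   # the 66.7% vs 33.3% example

# Hypothetical forecast curves around a normal of 4500 degree days.
print(shift_direction_confidence(NormalDist(4500.0, 300.0), 4500.0))  # no shift
print(round(shift_direction_confidence(NormalDist(4400.0, 300.0), 4500.0), 2))
```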
When the "climatological probabilities" (CL) forecast is issued, the shift direction confidence is at its minimum of 1. It may also be 1, however, for a non-"climatological probabilities"
forecast--when there is confidence in other aspects of the forecast. One example
would be when the likelihood of the near normal category is higher than would be expected
climatologically--and the chances of above normal and below normal are both reduced from the climatological chance. This could occur, for example, in a season
and at a location having
fairly high sensitivity to the state of the ENSO, in a case when the ENSO condition is expected to be very close to normal (i.e. neither an El Niño nor a La Niña tendency is expected). In
such a case, although the chances of large deviations from normal are reduced, the direction of the shift from normal is just as uncertain as it would be without any knowledge of the ENSO
condition. Forecasts for directional shifts may be considered somewhat useful when this confidence measure exceeds 1.5, and more clearly useful when it exceeds
1.8 or even 2 (which is
uncommon). It should also be noted that a high confidence in the shift direction
usually, but not always, means that the size of the shift is expected to be large. The amount of the expected
shift can be seen in the anomaly of the point forecast. In cases with high confidence in the point forecast (another type of confidence, described below), the confidence in the shift direction
may be high even though the predicted size of the shift is only moderate. This
is possible because the shift direction refers to any amount of shift in the indicated direction, regardless of size.
Confidence in the point forecast (contraction of forecast distribution):
This is a measure of how narrow, or limited, the distribution of possibilities about the point forecast value is believed
to be, compared with the distribution of the historical observations about the normal value. Given the current state of the art in climate prediction, confidence in the point forecast is
often low. When the forecast distribution has the same, or nearly the same, width as the climatological distribution, this indicates a relative absence of forecast knowledge that would
limit the range of possibilities. The measure, more specifically, is the ratio of the width [standard deviation] of the forecast distribution to the width of the climatological distribution. When the forecast
distribution is no narrower than the observed climatological distribution, the confidence is 1. Confidence values of less than 0.9 are considered somewhat helpful, and below 0.8, while rare,
are still more helpful. Confidence of this kind is higher (that is, the ratio is lower) in locations and seasons in which climate conditions are known to be related to governing forces (such
as ENSO), and the status
of these forces can be anticipated with some accuracy for the period being forecast. An example of this would be the degree day total during the winter in Minnesota and other regions in the
northern Plains of the U.S., which is partly determined by the ENSO state, given
that the ENSO state itself is somewhat predictable for forecasts made after the
preceding summer. In such a forecast,
the possibilities for Minnesota degree days are somewhat more limited than they
would be with no knowledge of the influence of ENSO or no knowledge of what the
ENSO state would likely
be during the future period being forecast. This particular confidence measure is not related to the amount of shift of the point forecast from the normal; rather, only the width of the
probability distribution about its own central value (the point forecast) is relevant here. Therefore, forecasts that are close to the normal still may rate relatively high on this confidence
measure.
Likewise, in some cases there may be a noticeable shift of the point forecast from the normal, but little or no narrowing of the distribution. This could occur,
for example, when
there is a gradual, long-term trend that is used in determining the forecast, but when there is little or no information about differences between the climate this year and the last few years of
the same season. In that case, all recent years would be affected by the general
trend approximately equally, but their large differences from one another related to factors besides the trend
are poorly forecast.
[Technical note: This confidence measure is the standard error of estimate in a linear
regression model. For example, when it is 0.866, the expected skill of the
forecast is describable with a linear correlation coefficient of 0.5.]
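The relation in the technical note is the standard one for a linear regression on standardized variables, where the standard error of estimate equals the square root of one minus the squared correlation. A short sketch (function names are illustrative):

```python
import math

def contraction_from_correlation(r):
    """Ratio of forecast width to climatological width for correlation skill r."""
    return math.sqrt(1.0 - r * r)

def correlation_from_contraction(ratio):
    """Correlation skill implied by a given width ratio."""
    return math.sqrt(1.0 - ratio * ratio)

print(round(contraction_from_correlation(0.5), 3))   # the 0.866 example in the note
for ratio in (0.9, 0.8):
    print(ratio, round(correlation_from_contraction(ratio), 2))
```

Under this relation, the 0.9 and 0.8 thresholds mentioned above correspond to correlation skills of roughly 0.44 and 0.6.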
Integrated confidence: an integrated distributional difference from climatology.
This is a measure of the estimated totality of all differences between the forecast distribution and the
climatological distribution. It includes both distributional shifts and narrowing (i.e. confidence in the point forecast over and above the confidence that would be associated with a
climatological forecast), as in the context of the two confidence parameters described above. It would also include distributional deviations of other
types that may prove to be
possible to predict in the future, such as a widening of the distribution (e.g.,
as related to an expectation of greater than normal intraseasonal variation), or asymmetric or irregular features
of the distribution as may be related to specific climate conditions in certain
geographical locations (e.g. involving terrain, or land vs. water). This measure, specifically, is estimated as the
total of the differences in probabilities of exceedance between the climatological distribution and the forecast distribution over the 9 points on the climatological distribution
corresponding to its 0.90, 0.80, 0.70, ....., 0.20, and 0.10 probability of exceedance values. This sum of the differences is then scaled with respect to the result which would be
attained when the forecast distribution is completely separated from the climatological distribution. In the case of complete separation, the climatological probability of exceedance remains
at 1 (or 100%), or at 0 (or 0%), while the forecast distribution moves through all of its intermediate values. Complete separation, which is currently unattainable given today's
state-of-the-art in climate prediction, would produce an integrated confidence score of 1, while a total absence of separation (as in the case of the "climatological forecast") would produce a
score of 0. Integrated confidence values of 0.2 are considered moderately useful
by today's standards, and values of 0.3 are clearly useful. In examining the integrated
confidence values that accompany the graphs, it becomes clear that distributional shifts tend to account for the majority of the integrated confidence value, while distribution narrowing
contributes to a lesser degree. This characteristic implies that occurrences of
strong climate forcing conditions, whether related to ENSO, strong decadal trends in progress, or other
factors, represent "forecasts of opportunity", and that forecast skill (and utility) are not constant from year to year for a given location, season and lead time. Of the three confidence
measures discussed here, the only one that remains nearly constant from year to
year is the confidence in the point forecast, showing the narrowness of the forecast distribution relative
to the climatological distribution.
Fortunately, our current lack of strong point forecast confidence does not prevent us from having fairly
high shift direction confidence under certain circumstances.
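One possible reading of the integrated confidence calculation is sketched below. Several details are assumptions rather than the documented CPC method: both distributions are taken to be Gaussian, the differences are summed as absolute values, and the scaling constant is the sum obtained when the forecast probability of exceedance is 0 (or 1) at all nine points, which is 4.5 in either direction.

```python
from statistics import NormalDist

# The nine climatological probability of exceedance levels named in the text.
LEVELS = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1]

def integrated_confidence(climatology, forecast):
    """Scaled sum of exceedance probability differences at the nine levels."""
    total = 0.0
    for level in LEVELS:
        # Degree day value whose climatological probability of exceedance is `level`.
        value = climatology.inv_cdf(1.0 - level)
        forecast_poe = 1.0 - forecast.cdf(value)
        total += abs(forecast_poe - level)
    # Complete separation gives a sum of 4.5, so dividing by it scales to 0..1.
    return total / sum(LEVELS)

climatology = NormalDist(mu=4500.0, sigma=300.0)   # illustrative numbers only
print(integrated_confidence(climatology, climatology))                       # "CL" forecast
print(round(integrated_confidence(climatology, NormalDist(4400.0, 280.0)), 3))
print(round(integrated_confidence(climatology, NormalDist(9000.0, 50.0)), 3))  # separated
```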
The middle block of text
on the right side of the graph provides
estimated probabilities, for the final forecast, of selected categorical outcomes with respect to the climatological degree day distribution.
Included are the probability of the highest 10% of the climatological distribution, the highest third (called "above normal" in the traditional maps of temperature forecast probability anomaly), the middle third ("near normal"), the lowest
third ("below normal") and the lowest 10% of the climatological distribution. The probabilities are indicated on the right side of each line of text, and are given to the nearest tenth of
one percent. The boundaries of the degree day totals that define these categories are shown in parentheses in each text line.
When the "CL" (climatological probability) is given on the map of temperature forecast probability anomaly, the probabilities shown in this block of
text will show just that: 10.0%,
33.3%, 33.3%, 33.3%, and 10.0%.
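The categorical probabilities can be sketched by taking the category boundaries from the climatological distribution and evaluating the forecast distribution against them. Gaussian curves, the function name and the numbers are assumptions made for illustration.

```python
from statistics import NormalDist

def category_probabilities(climatology, forecast):
    """Forecast probability of each climatologically defined category."""
    q10, q33, q67, q90 = (climatology.inv_cdf(p) for p in (0.10, 1 / 3, 2 / 3, 0.90))
    return {
        "highest 10%": 1.0 - forecast.cdf(q90),
        "above normal": 1.0 - forecast.cdf(q67),
        "near normal": forecast.cdf(q67) - forecast.cdf(q33),
        "below normal": forecast.cdf(q33),
        "lowest 10%": forecast.cdf(q10),
    }

climatology = NormalDist(mu=4500.0, sigma=300.0)   # illustrative numbers only

# A "CL" forecast reproduces the climatological values.
for name, prob in category_probabilities(climatology, climatology).items():
    print(f"{name:13s} {100 * prob:5.1f}%")
```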