CPC "Probability of Exceedance" Temperature Forecast

Understanding the Degree Day "Probability of Exceedance" Forecast Graphs

Seasonal Format of the forecasts:

Degree day forecasts are presented only for the core heating and cooling times of the year. For heating, these are the 5-month period Nov-Dec-Jan-Feb-Mar and the three embedded 3-month periods Nov-Dec-Jan, Dec-Jan-Feb and Jan-Feb-Mar; for cooling, they are the 5-month period May-Jun-Jul-Aug-Sep and the three embedded 3-month periods May-Jun-Jul, Jun-Jul-Aug and Jul-Aug-Sep. The heating season degree day forecasts are posted beginning the prior summer, and the cooling season degree day forecasts are posted beginning the prior winter. All degree day forecasts are based on the temperature forecasts for the same target period, combined with the correspondence between temperature and degree day totals. That correspondence is determined using 67 years of data covering the 1931-1997 period. In locations and seasons where the mean temperature is far from 65 degrees F (e.g. much of the northern U.S. in winter, and areas near the Gulf of Mexico and in the southeastern states in summer), there is a linear, one-to-one correspondence between the degree day forecasts and the temperature forecasts. In locations and seasons where the daily mean temperature may fall on either side of 65 degrees F, the correspondence between the temperature forecasts and the degree day forecasts is nonlinear; this nonlinearity is accounted for in the degree day forecasts presented here.
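To illustrate why crossing 65 degrees F makes the correspondence nonlinear, here is a minimal sketch assuming the conventional definition of daily degree days from the daily mean temperature with a 65 degree F base. The function names and sample temperatures are hypothetical, chosen only for illustration.

```python
BASE_F = 65.0  # degree day base temperature, degrees F


def heating_degree_days(daily_means):
    """Sum of max(0, 65 - T) over a season's daily mean temperatures."""
    return sum(max(0.0, BASE_F - t) for t in daily_means)


def cooling_degree_days(daily_means):
    """Sum of max(0, T - 65) over a season's daily mean temperatures."""
    return sum(max(0.0, t - BASE_F) for t in daily_means)


# Cold location: every day stays below 65 F, so warming each day by 2 F
# lowers the total by exactly 2 per day (linear response).
cold = [20.0, 25.0, 30.0]
print(heating_degree_days(cold))                   # 120.0
print(heating_degree_days([t + 2 for t in cold]))  # 114.0

# Mild location: days above 65 F contribute nothing, so the same 2 F
# warming lowers the total by 3 rather than 6 (nonlinear response).
mild = [60.0, 64.0, 70.0]
print(heating_degree_days(mild))                   # 6.0
print(heating_degree_days([t + 2 for t in mild]))  # 3.0
```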

The correspondence between seasonal mean temperature and degree days is derived empirically. A set of 5 to 7 points of correspondence is developed from the means of groups of approximately 10 cases apiece, where each group represents one portion of the temperature distribution (e.g. the top 10%, the second decile, etc.), spanning from one tail of the distribution through the middle to the other tail. The poorly sampled end points of the distribution are approximated by linearly extrapolating the results from points somewhat less far out on the tails. Visual inspection indicates that the resulting fit is good and, in most cases, departs little or not at all from the raw data.
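One way to picture this empirical correspondence is sketched below, assuming the years are sorted by seasonal mean temperature, split into about 7 groups, and joined by straight line segments. The helper names are hypothetical, and for simplicity the outermost segments are just extended linearly; CPC's exact tail treatment is not specified beyond the description above.

```python
def group_means(temps, dds, n_groups=7):
    """Sort years by seasonal mean temperature, split them into n_groups
    nearly equal groups (about 10 years each for 67 years) and return the
    (mean temperature, mean degree days) point for each group."""
    pairs = sorted(zip(temps, dds))
    n = len(pairs)
    points = []
    for g in range(n_groups):
        chunk = pairs[g * n // n_groups:(g + 1) * n // n_groups]
        t_mean = sum(t for t, _ in chunk) / len(chunk)
        d_mean = sum(d for _, d in chunk) / len(chunk)
        points.append((t_mean, d_mean))
    return points


def temp_to_dd(t, points):
    """Piecewise-linear interpolation between group points; temperatures
    beyond the outermost points are extrapolated along the end segments."""
    i = 0
    while i < len(points) - 2 and t > points[i + 1][0]:
        i += 1
    (t0, d0), (t1, d1) = points[i], points[i + 1]
    return d0 + (d1 - d0) * (t - t0) / (t1 - t0)


# Hypothetical 67 seasons whose heating degree days flatten out when warm.
temps = [60.0 + 0.15 * k for k in range(67)]
dds = [90 * max(1.0, 65.0 - t) for t in temps]
pts = group_means(temps, dds)
print(len(pts), round(temp_to_dd(63.0, pts), 1))
```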

This text description for the degree day graphs is similar to that given for the temperature and precipitation probability of exceedance graphs, but is somewhat more abbreviated. The reader is encouraged to consult the temperature and precipitation description if further explanation is needed.

Purpose of the graphs:

The "probability of exceedance" curves give the forecast probability that a degree day total, shown on the horizontal axis, will be exceeded at the location in question, for the given season at the given lead time. The information on these graphs is consistent with the information given in the maps of temperature probability anomaly--a part of the multi-season climate outlook that has been issued since 1995. Those forecast maps show the probability anomaly of the most favored tercile of the climatological temperature distribution: below normal, near normal, or above normal. The graphs shown here provide additional detail about the degree day forecast probability distribution at an individual location, i.e., any one of 102 climate regions in the mainland U.S. The additional information comes from displaying the entire probability distribution, as opposed to just the probability anomaly of the most favored tercile. The probability of exceedance of the temperature itself is available on another branch of this CPC web site, < http://www.cpc.ncep.noaa.gov/pacdir/NFORdir/jo1.html >. Although skill in temperature and degree day forecasting is in most cases modest in absolute terms, issuing a complete forecast probability distribution is nonetheless justified. Showing the distribution is our attempt to accurately convey the sense of the forecast while also showing the degree of uncertainty (which is often high) contained in that forecast.

What the curves mean:

Each graph contains four curves. One of the four is actually a set of two curves.

The first curve, shown in black, shows the "normal", or climatological probability distribution. Sometimes no black line appears in the graph. When that is the case, the black line is hidden underneath the thick red line, which shows the forecast distribution (described below). The climatological distribution is derived by computing the average, and also computing a measure of the amount of year-to-year variation around that average. The curve is therefore called a "fitted" curve, because it is defined by a formula that fits a smooth curve to the data. The data themselves may not be so smooth and regular, but the formula uses only the average and the typical deviations from that average to define the curve. The center (such as the average [mean] or the median) of this distribution is based on the historical record of observations at the given location and season during the period used as the normal base period. For example, in 1999 the normal base period is 1961-90. After reaching the year 2001 or 2002, the base period will be updated to 1971-2000. The value of the center of the distribution, or normal, is printed numerically near the top of the graph. The variability, or range, of the climatological distribution is based on a period longer than the normal base period, because an acceptably accurate estimate of the variability requires more cases than an acceptably accurate estimate of the center of the distribution. The additional years used in getting the variability estimate occur immediately prior to the base period; for temperature and degree days, 10 extra years are used. Gradual trends within the 40 years are not allowed to contribute to the variability, however; this is accomplished by using a sliding 30-year period, within the 40-year period, for the means about which the variations are defined.
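The construction of the fitted climatological curve can be sketched as follows, assuming a Gaussian curve (as stated for temperature), a center taken from the last 30 of 40 years, and a spread pooled from deviations about each of the 11 sliding 30-year means. The pooling scheme and the function names are illustrative assumptions, not CPC's documented procedure.

```python
from statistics import NormalDist, fmean


def fitted_climatology(values_40yr):
    """values_40yr: seasonal values for 40 consecutive years, oldest first.
    The last 30 years form the normal base period."""
    if len(values_40yr) != 40:
        raise ValueError("expected 40 years of data")
    center = fmean(values_40yr[10:])  # normal from the 30-year base period
    # Measure deviations about sliding 30-year means (11 windows) so that
    # slow trends across the 40 years do not inflate the spread.
    sq_devs = []
    for start in range(11):
        window = values_40yr[start:start + 30]
        m = fmean(window)
        sq_devs.extend((v - m) ** 2 for v in window)
    spread = (sum(sq_devs) / len(sq_devs)) ** 0.5
    return NormalDist(center, spread)


def prob_exceed(dist, x):
    """Probability of exceedance read off the fitted curve."""
    return 1.0 - dist.cdf(x)
```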
For temperature, the value of the center of the distribution represents the mean, or average, and is the temperature at which the curve crosses the 50% level on the vertical axis. For degree days this is also the case, but only when excursions of the daily mean to the "wrong" side of the 65 degree F threshold are rare or nonexistent. The "wrong" side means above 65 during the heating season, and below 65 during the cooling season. Such excursions occur in locations that are warm in the winter, such as southern Florida, or that are not hot in the summer, such as the high country of Nevada, Montana or Idaho. When they occur, the degree day distribution is no longer a linear, or symmetric, function of the temperature, but rather becomes a skewed function of the temperature. In that case, the value at which the curve crosses the 50% line (the median) is considered the normal. That is, the average is not used to represent the normal for degree days, because more cases tend to fall on one side of the average than on the other--a feature of a skewed, or asymmetric, distribution.
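A small numerical example shows why the median is preferred for a skewed degree day distribution. The ten seasonal totals are hypothetical, chosen to mimic a warm location where a few cold seasons create a long upper tail.

```python
from statistics import fmean, median

# Hypothetical heating degree day totals for a warm-winter location.
hdd = [120, 135, 140, 150, 155, 160, 170, 185, 240, 330]

print(fmean(hdd))                         # 178.5, pulled up by the tail
print(median(hdd))                        # 157.5, the 50% crossing point
print(sum(v > fmean(hdd) for v in hdd))   # 3: only 3 of 10 exceed the mean
```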

The second curve, shown in yellow, labeled "observed data", is a probability of exceedance curve derived from the observed data without any model fitting. It steps down by 3.33% every time one of the 30 observations in the normal base period no longer exceeds the value shown on the x-axis. This curve is displayed so that the user may see how well the smoothed climatological curve fits the actual degree day data. The fitted curves are based on a Gaussian distribution for temperature, and on an empirical temperature-versus-degree day correspondence for degree day data, using data from the 1931-97 period of record. The curve based directly on the data is expected to be somewhat rough and irregular, with gaps in some places and clustering in others. This irregularity is caused by the lack of a very long sampling period--i.e., 30 years rather than several hundred years of observations. If the same number of observations were sampled from an earlier period and the underlying climate were identical, the places having gaps and clusters would be expected to change. When the irregularities change from one sample to another and have equal chances of appearing in various places in the distribution, the smooth fitted climatological curve is thought to estimate the true population distribution better than the curve formed from any single sampling of the data. In some cases, however, there may be a physical reason for deviations from a smooth distribution. In that case, sampling 500 years of data would not eliminate these features of the curve, although they would be expected to appear somewhat more smoothly (less "noisy" or jumpy) than features caused purely by sampling variations. For example, a plateau of shallow slope might appear near the middle of the "probability of exceedance" distribution, where the steepest slope is usually found, or a steep slope might be found off the center of the distribution.
At CPC we believe that in most cases the fitted curve is a better representation of nature than the raw data curve. That is, most of the irregularities in the raw data curve occur by chance alone, and would not appear if it were possible to sample a much larger set of cases.
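The yellow "observed data" curve needs no fitting at all. A minimal sketch, with hypothetical function names, simply counts how many of the base-period observations exceed each value; with 30 observations, each step is 1/30, or about 3.33%.

```python
def empirical_exceedance(observations, x):
    """Fraction of observations strictly greater than x."""
    return sum(1 for o in observations if o > x) / len(observations)


def empirical_curve(observations):
    """(value, probability of exceedance) at each observed value; the
    curve drops by 1/len(observations) as each observation is passed."""
    return [(x, empirical_exceedance(observations, x)) for x in sorted(observations)]
```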

The third curve, shown in red, labeled "final forecast", represents the probability distribution of the final official CPC forecast. The downward slope of the final forecast curve may be slightly steeper than the slope of the climatological curve, in proportion to the confidence associated with the final forecast. (Several aspects of the confidence are indicated in each graph; these are described below.) This is because when a forecast is thought to be relatively skillful (as, for example, when there is a strong ENSO event in progress and the location is one in which an ENSO impact is anticipated), the range of possibilities is smaller than if no useful forecast knowledge were in hand. This represents a decrease in the uncertainty, which shows up as a narrower range of degree day values within which the probability of exceedance changes by a given amount. In some cases there is a shift of the forecast curve relative to the normal curve, but without a steeper slope in the forecast curve. This would indicate some confidence in the shift away from the normal, but without a decrease in the range of possibilities in the shifted climate. This might occur when a trend, or climate change relevant to the present decade as a whole, is believed to be occurring. A fourth pair of curves, shown by thin red lines, represents an "error envelope". It is drawn on either side of the main final forecast curve, paralleling that curve. These lines illustrate our estimate of the amount of possible error associated with the forecast curve. The forecast curve itself already conveys uncertainty about the forecast; this is why it is shown in a probabilistic framework and usually has a downward slope that is not much steeper than the slope of the fitted climatological curve. In addition to this inherent uncertainty, there is also some uncertainty related to other aspects of the forecast. 
Examples of these additional error sources are (1) errors in the most recent observed data used to determine the forecasts, (2) errors in the forecasters' perception, judgement and understanding of the current climate state, and (3) imperfections in the fit of the climatological and forecast distributions to the actual data (as revealed by the differences between the yellow curve and the black curve). All of these factors could result in some error in the positioning of the forecast curve as a whole. While an accurate evaluation of the size of this error is not possible, an approximation is provided by the error envelope. The approximation is based on the expected sampling variability of the climatological probability of exceedance using 45 years of data. The resulting envelope is thought to be conservative--i.e. the size of the error in the position of the forecast curve is more likely to be over-represented than under-represented.
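The red forecast curve and its envelope can be approximated as follows, assuming a Gaussian curve whose center is shifted and whose spread is scaled by a width ratio, and assuming the envelope is one binomial standard error of an exceedance probability estimated from 45 years. Both the binomial form and the one-standard-error width are illustrative assumptions; the text states only that the envelope reflects 45-year sampling variability. All numbers are hypothetical.

```python
import math
from statistics import NormalDist


def forecast_distribution(clim, shift, width_ratio):
    """Forecast curve: climatological center moved by `shift`, spread
    scaled by `width_ratio` (1.0 = no narrowing, smaller = narrower)."""
    return NormalDist(clim.mean + shift, clim.stdev * width_ratio)


def error_envelope(p, n_years=45):
    """Assumed envelope: p plus/minus one binomial standard error of an
    exceedance probability estimated from n_years, clipped to [0, 1]."""
    se = math.sqrt(p * (1.0 - p) / n_years)
    return max(0.0, p - se), min(1.0, p + se)


clim = NormalDist(4500, 300)                   # hypothetical heating degree days
fcst = forecast_distribution(clim, -150, 0.9)  # shifted and slightly narrowed
for x in (4000, 4250, 4500, 4750, 5000):
    p = 1.0 - fcst.cdf(x)                      # final forecast curve
    lo, hi = error_envelope(p)                 # thin red lines
    print(f"{x}: normal {1 - clim.cdf(x):.3f}  forecast {p:.3f}  envelope {lo:.3f}-{hi:.3f}")
```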

How to read a "probability of exceedance" curve:

As an example, suppose we first examine the "normal" curve in any one of the graphs. Like the other curves, the normal curve begins near the 100% level in the upper left portion of the graph. This indicates that the probability that the degree day total will exceed the amount given at the extreme left of the horizontal axis (near the lower left corner of the graph) is close to 100%. This makes sense, because the amount has been chosen to be far below the expected normal at the given location and season--an amount that may never have been observed during the normal base period. This low value is chosen because it is unlikely, but possible. As the value increases from the left toward the right side of the graph, the probability of it being exceeded begins to decrease, decreasing most rapidly near the middle of the climatological distribution, shown near the middle portion of the graph. The boundaries between the climatologically lowest and middle terciles, and between the middle and highest terciles, are indicated by vertical lines that intersect the normal curve where the probability of exceedance is 66.7% and 33.3%, respectively. Vertical lines indicating the median, or 50% probability of exceedance, and the 10% and 90% probabilities of exceedance, are also shown. In the right portion of the graph, the "probability of exceedance" line continues to decline and approaches 0% as the horizontal axis values become so large as to be very unlikely to be exceeded. Because the curve decreases continuously from near 100% to near 0% as the degree day total on the horizontal axis increases, the probability that the degree day total will fall between any two values on the graph can be determined by subtracting the probability of exceedance at the higher value from the probability of exceedance at the lower value. In the case of the "normal" curve, this probability is with respect to the normally expected climatology for the location and season.
When the probability is determined with respect to the "final forecast" curve, it is a statement of CPC's forecast probability. It is useful to compare this probability with that for the normal climatology to appreciate the difference attributable to the current climate state and climate outlook. Because of the modest level of skill in many cases, this difference may often be minor, and, in the case of the "CL" (climatological probability) temperature forecast, there is no difference for degree days at all.
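The positions of the vertical reference lines can be computed by inverting the curve. This sketch assumes a Gaussian curve with hypothetical parameters; for a skewed degree day distribution the same idea applies, but the inversion would go through the temperature-to-degree day correspondence.

```python
from statistics import NormalDist


def value_at_exceedance(dist, p):
    """Degree day value that is exceeded with probability p."""
    return dist.inv_cdf(1.0 - p)


clim = NormalDist(4500, 300)  # hypothetical fitted climatology
# Vertical reference lines, labeled by probability of exceedance.
lines = {
    "90%": 0.90,
    "66.7% (lower/middle tercile boundary)": 2 / 3,
    "50% (median)": 0.50,
    "33.3% (middle/upper tercile boundary)": 1 / 3,
    "10%": 0.10,
}
for label, p in lines.items():
    print(f"{label}: {value_at_exceedance(clim, p):.0f}")
```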

Automatic Probability Evaluation:

In subtracting two "probability of exceedance" values in order to evaluate the probability of occurrence between a lower and upper limit, it is often difficult to obtain an accurate visual estimate of the probability of exceedance from the graph. For this purpose, the process will be automated for users' convenience in the future. The option to use this utility will be provided on each graph. The user will only need to select a region, a lead time, and the lower and upper limits of degree days within which a probability is to be evaluated. The answer will be computed with respect to both the climatological (normal) and the "final forecast" probability of exceedance curves.
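Pending that utility, the subtraction is easy to reproduce when the curves are known in closed form. The sketch below assumes Gaussian curves with hypothetical parameters standing in for a selected region and lead time; `evaluate_range` is a hypothetical helper, not the CPC tool.

```python
from statistics import NormalDist


def prob_between(dist, lower, upper):
    """P(exceed lower) - P(exceed upper): chance the total falls between."""
    if lower > upper:
        lower, upper = upper, lower
    return (1.0 - dist.cdf(lower)) - (1.0 - dist.cdf(upper))


def evaluate_range(clim, forecast, lower, upper):
    """Hypothetical helper: the same range evaluated against the normal
    curve and against the final forecast curve."""
    return {
        "normal": prob_between(clim, lower, upper),
        "final forecast": prob_between(forecast, lower, upper),
    }


clim = NormalDist(4500, 300)      # hypothetical normal curve
forecast = NormalDist(4350, 270)  # hypothetical final forecast curve
print(evaluate_range(clim, forecast, 4200, 4800))
```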

Caution required for the tails of the curves:

Each of the curves is constructed on the basis of historical observations and/or the nature and strength of the impacts of the estimated current and future climate state. Near the middle of the distribution, data have been plentifully sampled, because the middle of the distribution is the most likely and most frequently observed part. In the tails, or extremes, of the distribution, on the other hand, there have been only a few cases. Sometimes there may have been no cases in a large portion of a tail, and then just a single observation far out in the extreme part of that tail. Whatever the exact configuration of the observations, the tails are less certain than the middle and the shoulders of the distribution. Therefore, conclusions based on the extreme tails of the distribution are particularly dangerous and should be drawn with caution. A warning about the upper and lower 7% tails of the curves is posted on each graph. The middle 86% of the probability distribution, ranging from the 93% to the 7% probability of exceedance values, is considered to be reasonably well sampled, in contrast with the outer 7% tails. [Technical note: These cutoffs span the ±1.5 standard deviation interval with respect to the mean for a Gaussian distribution, within which usable numbers of observations are thought to have been sampled.]
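The figures in the technical note can be checked directly against the standard Gaussian distribution:

```python
from statistics import NormalDist

z = 1.5
upper_tail = 1.0 - NormalDist().cdf(z)  # probability beyond +1.5 standard deviations
print(round(upper_tail, 4))             # 0.0668, i.e. about 7%
print(round(1.0 - 2 * upper_tail, 4))   # 0.8664, i.e. about 86%
```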

Numerical information printed on the graphs:

Above and to the right of the probability of exceedance curves, selected summary information is printed.
The block of text at the upper left shows the point forecast, or best guess of the numerical degree day forecast. While the point forecast is expressed as an exact number, the certainty that the result will turn out to be this number is extremely low. The point forecast is given only to represent the middle, or center, of the forecast distribution, and does not imply that we can make an accurate numerical forecast. Underneath the point forecast, the anomaly forecast, or the departure of the point forecast from the normal, is shown. Beneath the anomaly forecast is the normal (or center), based on the observations during the normal base period as discussed above. The normal is represented by the degree day total that corresponds to the mean temperature. To the right of this set of numbers, the percentile (%ile) shows the percentage of cases in the fitted climatological distribution that would be expected to be lower than or equal to the point forecast. For example, if the %ile is 60.0, the point forecast ranks at or above 60.0 percent of the climatological distribution, and below the remaining 40.0 percent. It is a forecast for a somewhat, but not highly, above normal degree day total. Because of the fairly large degree of uncertainty in the outcome of our forecasts, it is not common for our point forecasts to rank in the top or bottom 20 percent of the climatological distribution. Beneath the %ile of the forecast, the forecast percentage of the normal degree day total is shown. Next is given the linear correlation between the mean temperature and the degree day total for this location and season. When the mean temperature is far enough away from 65 degrees F (e.g. Minneapolis in winter, or Miami in summer), there is a -1.00 correlation between temperature and heating degree days and a +1.00 correlation between temperature and cooling degree days. This implies that the degree day forecast is completely determined by the temperature forecast.
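The anomaly, percentile and percentage of normal follow directly from the point forecast and the fitted climatology. This sketch assumes a Gaussian climatology, so that mean and median coincide; the function name and numbers are hypothetical.

```python
from statistics import NormalDist


def summary_numbers(point, clim):
    """Anomaly, climatological percentile and percent of normal for a
    point forecast, using an assumed Gaussian fitted climatology."""
    normal = clim.median  # equals the mean for a Gaussian curve
    return {
        "anomaly": point - normal,
        "percentile": 100.0 * clim.cdf(point),  # share at or below the point
        "pct_of_normal": 100.0 * point / normal,
    }


clim = NormalDist(4500, 300)  # hypothetical
print(summary_numbers(4576, clim))
```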
When daily temperatures within the season may fall on the opposite side of 65 degrees F from where they normally are, the linear correlation with temperature is degraded, and degree day totals vary more slowly and less predictably as a function of seasonal mean temperature than would otherwise be the case. The degree to which the degree day versus temperature relationship departs from linearity is shown by the "Cor with T" statistics. The first correlation uses all 67 years in the 1931-97 period. The second correlation ("mid20%") uses only the years whose temperature was in the middle quintile of the distribution (the 40 to 60 percentile range). Restricting the range of the temperature in this way makes deviations of the correlation from plus or minus 1.00 show up more easily (although the end of the distribution whose temperature is closest to, or crosses, 65 degrees generally has lower correlation than the other end), giving the user a more sensitive indicator of lack of linearity in the heart of the distribution. In addition to the standard versions of the point forecast, the anomaly, and the percentage of normal, a weighted version of each of these is also given, shown in parentheses and indicated with "w:". While the standard version is based solely on the point forecast, the weighted version takes the entire degree day distribution into account, incorporating any nonlinear portion of the correspondence with temperature. It integrates the degree day forecast across its range of associated temperatures, weighting each by its probability. When the degree day versus temperature relationship is completely linear (i.e. when daily temperatures never cross 65 degrees), the standard and weighted degree day forecasts should be identical. When there is some nonlinearity (e.g. southern Florida heating degree days in Nov-Dec-Jan), the weighted results come out more conservative (less anomalous) than the standard results.
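The difference between the standard and weighted versions can be sketched numerically. The temperature-to-degree day correspondence here is a hypothetical stand-in: the expected seasonal heating total if daily means scatter normally about the seasonal mean. The weighted value is approximated by averaging over equally spaced quantiles of the temperature forecast.

```python
from statistics import NormalDist

BASE_F = 65.0
UNIT = NormalDist()  # standard normal


def seasonal_hdd(mean_temp, days=90, daily_sd=6.0):
    """Hypothetical correspondence: expected seasonal HDD when daily means
    scatter normally (sd = daily_sd) about mean_temp, using
    E[max(0, 65 - T)] = (65 - m) * Phi(z) + s * phi(z), z = (65 - m) / s."""
    z = (BASE_F - mean_temp) / daily_sd
    per_day = (BASE_F - mean_temp) * UNIT.cdf(z) + daily_sd * UNIT.pdf(z)
    return days * per_day


def standard_and_weighted(temp_forecast, dd_of_temp, n=2000):
    """Standard: degree days at the point (median) temperature forecast.
    Weighted: degree days averaged over the temperature forecast
    distribution, using n equally spaced quantiles as equal-weight samples."""
    standard = dd_of_temp(temp_forecast.median)
    qs = ((i + 0.5) / n for i in range(n))
    weighted = sum(dd_of_temp(temp_forecast.inv_cdf(q)) for q in qs) / n
    return standard, weighted


# Hypothetical warm forecast for a mild-winter location.
std, wtd = standard_and_weighted(NormalDist(70.0, 1.5), seasonal_hdd)
print(round(std), round(wtd))  # weighted > standard for this convex curve
```

Because this correspondence curves upward as temperatures fall, averaging over the whole forecast distribution pulls a warm (low-HDD) forecast back toward normal, matching the behavior described above.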
When the two versions differ, the version of choice depends upon the user's preference and philosophy. On the upper right side of the graph, 50% and 90% confidence intervals are given for the forecast. The 50% confidence interval gives lower and upper degree day totals. The lower amount corresponds to the 25%ile (75% probability of exceedance) of the forecast distribution, and the upper amount corresponds to the 75%ile (25% probability of exceedance) of the forecast distribution. The values falling between these two limits form an interval that CPC believes has a 50 percent chance of occurring. The 90% confidence interval covers a wider range of values, from the 5%ile (95% probability of exceedance) to the 95%ile (5% probability of exceedance) of the forecast. The ranges of amounts covered by the 50% and 90% confidence intervals give an idea of the expected error associated with the point forecast. When the degree day versus temperature correspondence is nonlinear (skewed), the distance upward to the top of the confidence interval is not equal to the distance downward to the bottom of the confidence interval. The ranges covered by the 50% and 90% confidence intervals are usually fairly wide, in keeping with the uncertainty associated with the point forecast. Note that there is some uncertainty in the confidence intervals themselves, just as there is uncertainty in the probability of exceedance curves themselves (as indicated by the "error envelope").
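Because percentiles carry over unchanged through a monotonic correspondence, the intervals can be sketched by converting temperature-forecast percentiles into degree days. The smooth correspondence used here is a hypothetical stand-in that flattens as the seasonal mean approaches and passes 65 degrees F; it shows how the interval becomes lopsided about the point forecast.

```python
import math
from statistics import NormalDist


def confidence_interval(temp_forecast, dd_of_temp, level):
    """Central `level` interval (e.g. 0.5 or 0.9) of the degree day forecast.
    Temperature percentiles are mapped to degree days; heating degree days
    fall as temperature rises, so the two ends are re-sorted."""
    tail = (1.0 - level) / 2.0
    a = dd_of_temp(temp_forecast.inv_cdf(tail))
    b = dd_of_temp(temp_forecast.inv_cdf(1.0 - tail))
    return min(a, b), max(a, b)


def smooth_hdd(t):
    """Hypothetical stand-in correspondence for a 90-day season."""
    return 90.0 * math.log1p(math.exp(65.0 - t))


fcst = NormalDist(64.0, 2.0)  # hypothetical temperature forecast
point = smooth_hdd(fcst.median)
for level in (0.5, 0.9):
    lo, hi = confidence_interval(fcst, smooth_hdd, level)
    print(level, round(lo), round(point), round(hi))  # hi - point > point - lo
```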

Directly below the forecast confidence intervals, three types of measures of confidence in the forecast are posted, both numerically and verbally. These are estimates of three aspects of the skill expected in the case of the particular forecast. The first, the confidence in shift direction, is a confidence that the degree day total will deviate from the normal in the direction indicated, without regard to the size of the deviation. The second, the confidence in the point forecast (contraction of the forecast distribution), is confidence that the degree day total will be close to the point forecast. The third, the integrated confidence, is confidence that the probability distribution as a whole will be different from the climatological probability distribution. The verbal descriptions that accompany these confidence levels are, in ascending order: none, low, fair, moderate and high. Plus and minus signs appear for cases close to the categorical boundaries. Each of the three aspects of confidence is further described next, enabling the user to decide which confidence (or confidences) is most applicable to their needs.

Confidence in shift direction: This is a measure of how confident we are that the degree day total will deviate from the normal in the direction specified, whether below or above the normal. More specifically, the measure is the ratio of the estimated probability that the climate will deviate in the forecast direction to the estimated probability that it will deviate in the opposite direction. For example, a confidence in the shift direction of 2.00 indicates that we believe a deviation in that direction is twice as probable as a deviation in the opposite direction. If the forecast direction is below the normal, a 2.00 confidence would mean that the probability of below normal conditions is 66.7% and the probability of above normal conditions is 33.3%. If there is no confidence whatsoever regarding which side of the normal will occur, the ratio is exactly 1. Note that the ratio is not that of the probability of the more favored outer tercile to the other, but rather a ratio of the forecast probabilities of occurrence of one half of the climatological distribution to the other half. The dividing line between the two halves is the numerical value of the normal (or center of the distribution) that is posted in the upper left corner of the graph. When the "climatological probabilities" (CL) forecast is issued, the shift direction confidence is at its minimum of 1. It may also be 1, however, for a non-"climatological probabilities" forecast--when there is confidence in other aspects of the forecast. One example would be when the likelihood of the near normal category is higher than would be expected climatologically--and the chances of above normal and below normal are both reduced from the climatological chance. This could occur, for example, in a season and at a location having fairly high sensitivity to the state of the ENSO, in a case when the ENSO condition is expected to be very close to normal (i.e. neither an El Niño nor a La Niña tendency is expected). In such a case, although the chances of large deviations from normal are reduced, the direction of the shift from normal is just as uncertain as it would be without any knowledge of the ENSO condition. Forecasts for directional shifts may be considered somewhat useful when this confidence measure exceeds 1.5, and more clearly useful when it exceeds 1.8 or even 2 (which is uncommon). It should also be noted that a high confidence in the shift direction usually, but not always, means that the size of the shift is expected to be large. The amount of the expected shift can be seen in the anomaly of the point forecast. In cases with high confidence in the point forecast (another type of confidence, described below), the confidence in the shift direction may be high even though the predicted size of the shift is only moderate. This is possible because the shift direction refers to any amount of shift in the indicated direction, regardless of size.
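The ratio can be computed from any forecast curve and the posted normal. This sketch assumes a Gaussian forecast curve and uses hypothetical function names; `favored_probability` converts a ratio r back into the probability of the favored half, r / (1 + r).

```python
from statistics import NormalDist


def shift_direction_confidence(forecast, normal):
    """Ratio of the forecast probability on the favored side of the normal
    to the probability on the other side, plus the favored direction."""
    p_below = forecast.cdf(normal)
    p_above = 1.0 - p_below
    if p_above > p_below:
        return p_above / p_below, "above"
    if p_below > p_above:
        return p_below / p_above, "below"
    return 1.0, "none"


def favored_probability(ratio):
    """Probability of the favored half implied by a ratio r: r / (1 + r)."""
    return ratio / (1.0 + ratio)


print(favored_probability(2.0))  # about 0.667, matching the 66.7% / 33.3% example
```

A narrower forecast that stays centered on the normal, such as `NormalDist(0, 0.5)` with a normal of 0, still yields a ratio of 1, mirroring the ENSO-neutral case described above.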

Confidence in the point forecast (contraction of forecast distribution): This is a measure of how narrow, or limited, the distribution of possibilities about the point forecast value is believed to be, compared with the distribution of the historical observations about the normal value. Given the current state of the art in climate prediction, confidence in the point forecast is often small. When the forecast distribution has the same, or nearly the same, width as the climatological distribution, this indicates a relative absence of forecast knowledge that would limit the range of possibilities. More specifically, the measure is the ratio of the width [standard deviation] of the forecast distribution to the width of the climatological distribution, so smaller values indicate greater confidence. When the forecast distribution is no narrower than the observed climatological distribution, the measure is 1. Values below 0.9 are considered somewhat helpful, and values below 0.8, while rare, are more helpful still. Confidence in the point forecast is greater in locations and seasons where climate conditions are known to be related to governing forces (such as ENSO), and the status of these forces can be anticipated with some success for the period being forecast. An example of this would be the degree day total during the winter in Minnesota and other regions in the northern Plains of the U.S., which is partly determined by the ENSO state, given that the ENSO state itself is somewhat predictable for forecasts made after the preceding summer. In such a forecast, the possibilities for Minnesota degree days are somewhat more limited than they would be with no knowledge of the influence of ENSO or no knowledge of what the ENSO state would likely be during the future period being forecast.
This particular confidence measure is not related to the amount of shift of the point forecast from the normal; rather, only the width of the probability distribution about its own central value (the point forecast) is relevant here. Therefore, forecasts that are close to the normal still may rate relatively high on this confidence measure. Likewise, in some cases there may be a noticeable shift of the point forecast from the normal, but little or no narrowing of the distribution. This could occur, for example, when there is a gradual, long-term trend that is used in determining the forecast, but when there is little or no information about differences between the climate this year and the last few years of the same season. In that case, all recent years would be affected by the general trend approximately equally, but their large differences from one another related to factors besides the trend are poorly forecast. [Technical note: This confidence measure is the standard error of estimate in a linear regression model. For example, when it is 0.866, the expected skill of the forecast is describable with a linear correlation coefficient of 0.5.]
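The technical note's arithmetic can be verified with the standardized standard error of estimate, sqrt(1 - r^2). The same formula also runs in reverse, from a width ratio to the implied correlation, which suggests roughly what correlations the 0.9 and 0.8 thresholds correspond to. The function names are hypothetical.

```python
import math


def point_forecast_confidence(forecast_sd, clim_sd):
    """Forecast spread relative to climatology (1.0 = no narrowing)."""
    return forecast_sd / clim_sd


def regression_width_ratio(x):
    """sqrt(1 - x**2): maps a correlation to the standardized standard error
    of estimate, and (being its own inverse) a width ratio back to r."""
    return math.sqrt(1.0 - x * x)


print(round(regression_width_ratio(0.5), 3))  # 0.866, as in the technical note
print(round(regression_width_ratio(0.9), 2))  # 0.44: correlation implied by 0.9
print(round(regression_width_ratio(0.8), 2))  # 0.6: correlation implied by 0.8
```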

Integrated confidence: an integrated distributional difference from climatology. This is a measure of the estimated totality of all differences between the forecast distribution and the climatological distribution. It includes both distributional shifts and narrowing (i.e. confidence in the point forecast over and above the confidence that would be associated with a climatological forecast), as in the context of the two confidence parameters described above. It would also include distributional deviations of other types that may become possible to predict in the future, such as a widening of the distribution (e.g., as related to an expectation of greater than normal intraseasonal variation), or asymmetric or irregular features of the distribution as may be related to specific climate conditions in certain geographical locations (e.g. involving terrain, or land vs. water). Specifically, this measure is estimated as the total of the differences in probabilities of exceedance between the climatological distribution and the forecast distribution over the 9 points on the climatological distribution corresponding to its 0.90, 0.80, 0.70, ..., 0.20, and 0.10 probability of exceedance values. This sum of the differences is then scaled with respect to the result that would be attained if the forecast distribution were completely separated from the climatological distribution. In the case of complete separation, the climatological probability of exceedance remains at 1 (or 100%), or at 0 (or 0%), while the forecast distribution moves through all of its intermediate values. Complete separation, which is currently unattainable given today's state of the art in climate prediction, would produce an integrated confidence score of 1, while a total absence of separation (as in the case of the "climatological forecast") would produce a score of 0. Integrated confidence values of 0.2 are considered moderately useful by today's standards, and values of 0.3 are clearly useful.
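A sketch of this score follows, under two stated assumptions. First, absolute differences are summed, so that narrowing without a shift still counts. Second, the differences are evaluated at the 9 fixed climatological points. Under complete separation the forecast exceedance is then 1 (or 0) at every point, giving 0.1 + 0.2 + ... + 0.9 = 4.5 either way, and that total is used as the scaling. Gaussian curves are assumed for simplicity.

```python
from statistics import NormalDist

CLIM_LEVELS = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1]
SEPARATION_TOTAL = 4.5  # 0.1 + 0.2 + ... + 0.9


def integrated_confidence(clim, forecast):
    """Sum of |forecast - climatological| exceedance probabilities at the
    nine climatological exceedance levels, scaled so that complete
    separation scores 1 and the climatological forecast scores 0."""
    total = 0.0
    for p in CLIM_LEVELS:
        x = clim.inv_cdf(1.0 - p)  # value exceeded with probability p under climatology
        total += abs((1.0 - forecast.cdf(x)) - p)
    return total / SEPARATION_TOTAL


clim = NormalDist(4500, 300)  # hypothetical curves
print(round(integrated_confidence(clim, NormalDist(4350, 270)), 3))
```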
In examining the integrated confidence values that accompany the graphs, it becomes clear that distributional shifts tend to account for the majority of the integrated confidence value, while distribution narrowing contributes to a lesser degree. This characteristic implies that occurrences of strong climate forcing conditions, whether related to ENSO, strong decadal trends in progress, or other factors, represent "forecasts of opportunity", and that forecast skill (and utility) are not constant from year to year for a given location, season and lead time. Of the three confidence measures discussed here, the only one that remains nearly constant from year to year is the confidence in the point forecast, showing the narrowness of the forecast distribution relative to the climatological distribution. Fortunately, our current lack of strong point forecast confidence does not prevent us from having fairly high shift direction confidence under certain circumstances.

The middle block of text on the right side of the graph provides estimated probabilities, for the final forecast, of selected categorical outcomes with respect to the climatological degree day distribution. Included are the probability of the highest 10% of the climatological distribution, the highest third (called "above normal" in the traditional maps of temperature forecast probability anomaly), the middle third ("near normal"), the lowest third ("below normal") and the lowest 10% of the climatological distribution. The probabilities are indicated on the right side of each line of text, and are given to the nearest tenth of one percent. The boundaries of the degree day totals that define these categories are shown in parentheses in each text line. When the "CL" (climatological probability) forecast is given on the map of temperature forecast probability anomaly, the probabilities shown in this block of text will show just that: 10.0%, 33.3%, 33.3%, 33.3%, and 10.0%.
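The categorical probabilities are found by locating the category boundaries on the climatological curve and then reading the forecast curve at those boundaries. This sketch assumes Gaussian curves with hypothetical parameters; when the forecast equals climatology it reproduces the 10.0 / 33.3 / 33.3 / 33.3 / 10.0 pattern.

```python
from statistics import NormalDist


def categorical_probabilities(clim, forecast):
    """Forecast probabilities (percent) of categories defined on the
    climatological distribution."""
    q10, q33, q67, q90 = (clim.inv_cdf(q) for q in (0.1, 1 / 3, 2 / 3, 0.9))
    cdf = forecast.cdf
    return {
        "highest 10%": 100 * (1 - cdf(q90)),
        "above normal": 100 * (1 - cdf(q67)),
        "near normal": 100 * (cdf(q67) - cdf(q33)),
        "below normal": 100 * cdf(q33),
        "lowest 10%": 100 * cdf(q10),
    }


clim = NormalDist(4500, 300)  # hypothetical
for name, pct in categorical_probabilities(clim, NormalDist(4350, 270)).items():
    print(f"{name}: {pct:.1f}%")
```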