CPC - Introduction to Probability of Exceedance


Understanding the "Probability of Exceedance" Forecast Graphs for Temperature and Precipitation

(For a tutorial on the degree day "probability of exceedance" graphs, go back and then click on the DEGREE DAY option)

Purpose of the Graphs

The "probability of exceedance" curves give the forecast probability that a temperature or precipitation quantity, shown on the horizontal axis, will be exceeded at the location in question, for the given season at the given lead time. The information on these graphs is consistent with the information given in the forecast maps of probability anomaly (the multi-season climate outlook) that have been issued since the beginning of 1995. Those forecast maps show the probability anomaly of the most favored tercile of the climatological distribution: below normal, near normal, or above normal. The graphs shown here are intended to provide additional detail about the forecast probability distribution at an individual location, i.e., any one of 102 climate regions in the mainland U.S., or an individual station in other regions. The additional information comes from displaying the entire probability distribution, as opposed to just the probability anomaly of the most favored tercile. With the entire distribution, users may select any cutoffs or categories that are of particular interest to them, and are not limited to pre-established tercile categories. Although skill in climate forecasting is in most cases modest in absolute terms, there is nonetheless justification to issue a complete forecast probability distribution. Showing the distribution is our attempt to accurately convey the sense of the forecast while also showing the degree of uncertainty (which is often high) contained in that forecast. In the future, an additional web site facility will provide a flexible, user-prescribed probability assessment, so that the "probability of exceedance" curve can be used to automatically determine the forecast probability of occurrence with respect to the user's own upper and lower limits.

What the Curves Mean

Each graph contains four curves. One of the four is actually a set of two curves.

The first curve, shown in black, shows the "normal", or climatological probability distribution. Sometimes no black line appears in the graph. When that is the case, the black line is hidden underneath the thick red line, which shows the forecast distribution (to be explained below). The climatological distribution is derived by computing the average, and also computing a measure of the degree of year-to-year variation around that average. The curve is therefore called a "fitted" curve, because it is defined using a formula that makes it possible to fit a smooth curve to the data. The data may not be so smooth and regular, but the formula only uses the average and the typical deviations from that average to define the curve. The center (such as the average [mean] or the median) of this distribution is based on the historical record of observations at the given location and season during the period used as the normal base period. For example, in 1999 the normal base period is 1961-90. After reaching the year 2001, the base period is expected to be updated to 1971-2000. The value of the center of the distribution, or normal, is printed numerically near the top of the graph. The variability, or range, of the climatological distribution is based on a period longer than the normal base period. This is done in order to obtain a more accurate estimate of the variability. Getting an acceptably accurate estimate of the variability requires more cases than getting an acceptably accurate estimate of the center of the distribution. The additional years used in getting the variability estimate occur immediately prior to the base period. For temperature, 10 extra years are used, while 20 extra years are used for precipitation. Gradual trends are not allowed to contribute to the breadth, however; this is accomplished by using a sliding 30-year period, within the 40- or 50-year period, for the means about which the variations are defined.
For temperature, the value of the center of the distribution represents the mean, or average, and is the temperature at which the curve crosses the 50% line from the vertical axis. For precipitation, whose distribution is often not symmetric, the value at which the curve crosses the 50% line (the median) is considered the normal. For precipitation, the average is normally greater than the median, and is not used to represent the normal the way it is for temperature. The average is not used to represent the normal for precipitation because, quite often, more cases are below the average than are above it. The average precipitation is higher than the typical amount because the wettest few cases are much farther above the average than the driest cases are below it. This is the main feature of a positively skewed, or asymmetric, distribution.

The second curve, shown in yellow, labeled "observed data", is a probability of exceedance curve derived from the observed data without any model fitting. It steps down every time an observed datum no longer exceeds the value shown on the x-axis. Each of the 30 years in the base period is used for this stepped curve. Each step, therefore, represents a 3.33% change in the probability of exceedance. This curve is displayed so that the user may observe how good a fit the smoothed climatological curve is to the actual data. The fitted curves are based on a Gaussian distribution for temperature, and on a Gaussian distribution for precipitation after using a flexible power transformation to eliminate the asymmetry of the original precipitation data. (More is said about the fitting of precipitation near the bottom of this web site.) The curve based directly on the data is expected to be somewhat rough and irregular, with gaps in some places and clustering in others. This irregularity is caused by the lack of a very long sampling period, i.e., 30 years rather than several hundred years of observations. If the same number of observations were sampled from an earlier period and the underlying climate were identical, the places having gaps and clusters would be expected to change. When the irregularities are changeable from one sample to another and have equal chances of appearing in various places in the distribution, the smooth fitted climatological curve is thought to estimate the true population distribution better than the curve formed from any single sampling of the data. However, in some cases there may be a physical reason for deviations from a smooth distribution. In that case, sampling 500 years of data would not eliminate these features of the curve. However, these features would be expected to appear somewhat more smoothly (less "noisy" or jumpy) than features caused purely by sampling variations.
For example, a tendency for a plateau of shallow slope might appear near the middle of the "probability of exceedance" distribution, where the steepest slope is usually found, or a steep slope might be found off the center of the distribution. At CPC we believe that in most cases the fitted curve is a better representation of nature than the raw data curve. That is, most of the irregularities in the raw data curve occur by chance alone, and would not appear if it were possible to sample a much larger set of cases.
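The stepped "observed data" curve described above can be reproduced from any 30-year sample. The following is a minimal sketch, not CPC's code: the helper name `empirical_exceedance` and the sample temperatures are made up for illustration.

```python
def empirical_exceedance(observations, threshold):
    """Fraction of observations strictly greater than the threshold."""
    exceed = sum(1 for value in observations if value > threshold)
    return exceed / len(observations)

# Hypothetical 30-year sample of seasonal mean temperatures (deg F).
sample = [50.0 + 0.5 * i for i in range(30)]   # 50.0, 50.5, ..., 64.5

step = 1 / len(sample)                           # one step per observed year
below_all = empirical_exceedance(sample, 49.0)   # every year exceeds 49.0
after_one = empirical_exceedance(sample, 50.0)   # the lowest year no longer exceeds
above_all = empirical_exceedance(sample, 64.5)   # no year exceeds the highest value

print(f"step size: {step:.2%}")
print(f"exceedance at 49.0, 50.0, 64.5: {below_all:.3f}, {after_one:.3f}, {above_all:.3f}")
```

With 30 years in the sample, each observation moves the curve by 1/30, which is the 3.33% step quoted in the text.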

The third curve, shown in red, labeled "final forecast", represents the probability distribution of the final official CPC forecast. This curve is consistent with the probability anomaly maps that it is designed to accompany, which have been issued since early 1995. Thus, the probability printed for the most favored tercile in this new product should correspond to the probability anomaly shown in the probability anomaly maps. When the maps indicate "CL", or climatological probabilities, this product displays a final forecast curve that coincides with the normal curve (and the final forecast curve hides the normal curve). The final forecast curve incorporates all information leading to the forecast, including (if applicable) the ENSO state, gradual trends, the NAO state, and other factors including indications provided by individual dynamical or statistical forecast tools. The downward slope of the final forecast curve may be slightly steeper than the slope of the climatological curve, in proportion to the confidence associated with the final forecast. (Several aspects of the confidence are indicated in each graph; these will be described below.) This is because when a forecast is thought to be relatively skillful (as, for example, when there is a strong ENSO event in progress and the location is one in which an ENSO impact is anticipated), the range of possibilities is smaller than if no useful forecast knowledge were in hand. This represents a decrease in the uncertainty, which shows up as a narrower range of temperature or precipitation values within which the probability of exceedance changes by a given amount. In some cases there is a shift of the forecast curve relative to the normal curve, but without a steeper slope in the forecast curve. This would indicate some confidence in the shift away from the normal, but without a decrease in the range of possibilities in the shifted climate.
This might occur when a trend, or climate change relevant to the present decade as a whole, is believed to be occurring.
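The link between a shifted or narrowed forecast curve and the tercile probabilities can be illustrated with two Gaussian curves. This is only a sketch for temperature: the means and standard deviations are hypothetical, and the helper `exceedance` is not part of any CPC tool.

```python
from statistics import NormalDist

clim = NormalDist(mu=60.0, sigma=2.0)       # hypothetical climatological curve
forecast = NormalDist(mu=60.6, sigma=1.8)   # hypothetical shifted, narrower forecast

def exceedance(dist, x):
    """Probability that the outcome exceeds x."""
    return 1.0 - dist.cdf(x)

# Tercile boundaries: values the climatology exceeds 66.7% and 33.3% of the time.
lower_bound = clim.inv_cdf(1 / 3)
upper_bound = clim.inv_cdf(2 / 3)

p_above = exceedance(forecast, upper_bound)   # forecast chance of "above normal"
p_below = forecast.cdf(lower_bound)           # forecast chance of "below normal"
p_near = 1.0 - p_above - p_below              # forecast chance of "near normal"

print(f"below {p_below:.1%}, near {p_near:.1%}, above {p_above:.1%}")
```

Under climatology each tercile would have probability 1/3. In this made-up case the upward shift raises the "above normal" probability and lowers the "below normal" probability.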

A fourth pair of curves, shown by thin red lines, represents an "error envelope". It is drawn on either side of the main final forecast curve, paralleling that curve. These lines illustrate our estimate of the amount of possible error associated with the forecast curve. The forecast curve itself already conveys uncertainty about the forecast; this is why it is shown in a probabilistic framework and usually has a downward slope that is not much steeper than the slope of the fitted climatological curve. In addition to this inherent uncertainty, there is also some uncertainty related to other aspects of the forecast. Examples of these additional error sources are (1) errors in the most recent observed data used to determine the forecasts, (2) errors in the forecasters' perception, judgement and understanding of the current climate state, and (3) imperfections in the fit of the climatological and forecast distributions to the actual data (as revealed by the differences between the yellow curve and the black curve). All of these factors could result in some error in the positioning of the forecast curve as a whole. While an accurate evaluation of the size of this error is not possible, an approximation is provided by the error envelope. The approximation is based on the expected sampling variability of the climatological probability of exceedance using 45 years of data. The resulting envelope is thought to be nonconservative, i.e., the size of the error in the position of the forecast curve is more likely to be over-represented than under-represented. In fact, during most seasons at most U.S. locations the probability anomaly for the favored tercile would need to be 6 to 9 percent in order for the error envelope to exclude the fitted climatological (normal) probability of exceedance curve over a major portion of the range of the forecast curve.
We believe that a large error envelope is prudent in underscoring the need for caution and conservatism on the users' part. The error envelope is smaller at the tails of the distribution than near the middle. However, it is much larger at the tails in terms of percentage of the difference of the forecast probability from 100% for the left tail, and from 0% for the right tail. This is a reminder that conclusions based on the tails of the forecast curve are dangerous, such as a statement that the chance of being in the 1% tail is 4 times as much as it would be climatologically. The forecast's error envelope shows that quantitative statements about the probability of such an extreme event are not warranted.
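The statement that the envelope is smaller at the tails in absolute terms, yet larger relative to the tail probability, can be illustrated with a simple sampling calculation. The text says only that the envelope reflects sampling variability for 45 years of data. The binomial standard error used here is an assumption made for illustration, not CPC's documented formula.

```python
import math

N_YEARS = 45   # sample size quoted in the text

def sampling_se(p, n=N_YEARS):
    """Binomial standard error of a probability p estimated from n years (assumed model)."""
    return math.sqrt(p * (1.0 - p) / n)

se_middle = sampling_se(0.50)          # near the middle of the curve
se_tail = sampling_se(0.01)            # far out in the right tail

relative_middle = se_middle / 0.50     # error as a fraction of the probability itself
relative_tail = se_tail / 0.01

print(f"absolute: middle {se_middle:.3f}, tail {se_tail:.3f}")
print(f"relative: middle {relative_middle:.2f}, tail {relative_tail:.2f}")
```

Under this assumed model the absolute error shrinks toward the tails, but it becomes larger than the tail probability itself, which is why ratio statements about extreme events are not supported.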

Near the bottom of the graph, below the horizontal "0%" line, the observations of the last 10 years (but 15 years for precipitation) are shown by the last two digits of the year. (For example, a "98" would indicate the observation for the year 1998.) These digits provide information about the climate at the given location and season during recent years. The average (or median for precipitation) of the years shown is indicated by an asterisk on the horizontal "0%" line. The purpose of the display is to show how the most recent observations compare with the overall distribution. In some cases the climate of the recent years may tend to be different from the "normal" shown by the entire curve--perhaps mainly lower or mainly higher. Or, there may be more extremes on both sides of the distribution's center than would be expected. If they appear to be unrepresentative of the normal in any respect, the user is faced with the question of whether the climate to occur for the forecast period will follow the tendency of the recent years. At CPC this issue is examined thoroughly, and is considered as a potentially major component of the forecast. In some cases a strong trend is believed to exist, and is clearly reflected in the forecast, while in other cases a difference in the climate of recent years may be considered only a random occurrence. The lowest block of printed information to the right of the curves gives some descriptive information about the tendency of the climate in recent years, including the difference of the mean (or median) from the overall normal.

How to Read a "Probability of Exceedance" Curve

As an example, suppose we first examine the "normal" curve in any one of the graphs. Like the other two curves, the normal curve begins very near the 100% level in the upper left portion of the graph. This indicates that the probability that the temperature or precipitation will exceed the amount shown by the number given at the extreme left of the horizontal axis (near the lower left corner of the graph) is very close to 100%. This makes sense, because the amount has been chosen to be far below the expected normal at the given location and season, an amount that probably has never been observed during the normal base period (except for the occasional case of zero precipitation at certain locations at certain times of the year). This low value is chosen because it is highly unlikely, but not impossible. As the value is increased from the left toward the right side of the graph, the probability of it being exceeded begins to decrease, decreasing most rapidly near the middle of the climatological distribution, shown near the middle portion of the graph. The boundaries between the climatologically lowest and middle tercile, and between the middle and highest tercile, are indicated by vertical lines that intersect the normal curve where the probability of exceedance is 66.7% and 33.3%, respectively. Vertical lines indicating the median, or 50% probability of exceedance, and the 10% and 90% probabilities of exceedance, are also shown. In the right portion of the graph, the "probability of exceedance" line continues to decline and approaches 0% as the horizontal axis values become so large as to be highly unlikely to be exceeded.
Because the curve continues to decrease from near 100% to near 0% as the temperature or precipitation value on the horizontal axis increases, the probability that the temperature or precipitation will be between any two values on the graph can be determined by subtracting the lower probability of exceedance value from the higher probability of exceedance value. In the case of the "normal" curve, this probability is with respect to the normally expected climatology for the location and season. When the probability is determined with respect to the "final forecast" curve, it is a statement of CPC's official forecast probability. It is useful to compare this probability with that for the normal climatology to appreciate the difference attributable to the current climate state and climate outlook. Because of the modest level of skill in many cases, this difference may often be minor, and, in the case of the "CL" (climatological probability) forecast, there is no difference at all.
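The subtraction rule described above can be written out directly. This is a minimal sketch assuming Gaussian temperature curves with hypothetical parameters; `exceedance` and `prob_between` are illustrative helper names.

```python
from statistics import NormalDist

def exceedance(dist, x):
    """Probability that the outcome exceeds x."""
    return 1.0 - dist.cdf(x)

def prob_between(dist, lower, upper):
    """Chance of falling between two values: higher exceedance minus lower exceedance."""
    return exceedance(dist, lower) - exceedance(dist, upper)

clim = NormalDist(mu=60.0, sigma=2.0)       # hypothetical "normal" curve
forecast = NormalDist(mu=60.6, sigma=1.8)   # hypothetical "final forecast" curve

p_clim = prob_between(clim, 58.0, 62.0)     # climatological chance of 58 to 62
p_fcst = prob_between(forecast, 58.0, 62.0) # forecast chance of the same range

print(f"climatology: {p_clim:.1%}, forecast: {p_fcst:.1%}")
```

Comparing the two results shows how much of the probability for a user-chosen range is attributable to the forecast rather than to climatology.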

Automatic Probability Evaluation:

In subtracting two "probability of exceedance" values in order to evaluate the probability of occurrence between a lower and upper limit, it is often difficult to obtain an accurate visual estimate of the probability of exceedance from the graph. For this purpose, the process will be automated for users' convenience in the future. The option to use this utility will be provided on each graph. The user will only need to select a region, a lead time, a variable (temperature or precipitation), and the lower and upper limits within which a probability is to be evaluated. The answer will be computed with respect to both the climatological (normal) and the "final forecast" probability of exceedance curves.

Caution Required for the Tails of the Curves

Each of the curves is constructed on the basis of historical observations, and/or the nature and strength of the impacts of the estimated current and future climate state. Near the middle of the distribution there has been plentiful data sampled, because the middle of the distribution is most likely and most frequently observed. On the other hand, in the tails, or extremes, of the distribution, there have only been a few cases. Sometimes there may have been no cases in a large portion of a tail, and then just a single observation far out in the extreme part of that tail. Whatever the exact configuration of the observations, the tails are less certain than the middle and the shoulders of the distribution. Therefore, conclusions based on the extreme tails of the distribution are particularly dangerous, and should be made with caution. The probability curves are based on a Gaussian distribution for temperature, and a flexible power-transformed Gaussian distribution for precipitation. In both cases, the shape and length of the tails are based both on the extreme values and on the variability of the values closer to the middle of the distribution. The tails express only an educated guess of the actual extreme value probabilities, and should not be taken literally. A warning about the upper and lower 7% tails of the curves is posted on each graph. The middle 86% of the probability distribution, ranging from the 93% to the 7% probability of exceedance values, is considered to be reasonably well sampled, in contrast with the outer 7% tails. (Technical note: These cutoffs span the ±1.5 standard deviation interval with respect to the mean for a Gaussian distribution, within which usable numbers of observations are thought to have been sampled.)
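The technical note relating the 7% tails to the ±1.5 standard deviation interval can be checked for a Gaussian distribution with Python's standard library. This is only a verification sketch of that one statement.

```python
from statistics import NormalDist

standard = NormalDist()                            # mean 0, standard deviation 1

upper_tail = 1.0 - standard.cdf(1.5)               # chance of exceeding +1.5 SD
middle = standard.cdf(1.5) - standard.cdf(-1.5)    # chance of falling within +/-1.5 SD

print(f"each tail: {upper_tail:.1%}, middle: {middle:.1%}")
```

Each tail comes out slightly under 7% and the middle slightly over 86%, consistent with the rounded figures posted on the graphs.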

Numerical Information Printed on the Graphs

Above and to the right of the probability of exceedance curves, selected summary information is printed. The block of text at the upper left shows the point forecast, or best guess of the numerical forecast. While the point forecast is expressed as an exact number, the certainty that the climate will turn out to be this number is extremely low. That is the idea conveyed by the probability of exceedance curves, which always descend slowly to the right, and never suddenly. The gradual rate with which the curves decline implies a large uncertainty. The point forecast is only given to represent the middle, or center, of the forecast distribution, and in no way is intended to imply that we can make a highly accurate numerical forecast. By analogy, when two 6-sided dice are rolled, the midpoint of the distribution of outcomes for the total is 7. However, an outcome of 7, while more likely than any other outcome, is expected to occur only 16.7% of the time on average, if both dice are fair. In the case of our forecasts, the probability of an outcome of exactly (to two decimal places for temperature and precipitation) what our point forecast indicates is very small, often less than 1%. Again, it is provided to indicate the center of a large distribution of possible outcomes, the size of which is expressed by the probability of exceedance graphs and the computations that can be done on their basis. This warning is very important because our forecasts are highly imperfect, and should never be interpreted as being exact in the way that a predicted time and height of a local high tide at a seaport is exact. Underneath the point forecast, the anomaly forecast, or the departure of the point forecast from the normal, is shown. Beneath the anomaly forecast is the normal (or center), based on the observations during the normal base period as discussed above. The normal is represented by the mean for temperature, and by the fitted median for precipitation.
In the precipitation graphs, the raw (non-fitted) median is also shown to the right of the fitted median. This is the middle-ranked precipitation amount over the 30-year normal period. It is the average of the 15th and 16th highest amounts (also 15th and 16th lowest amounts). When there is an odd number of amounts, the raw (unfitted) median is simply the single middle-ranked observation. The fitted median is considered a better representation of the center of the precipitation distribution than the unfitted median, because the latter is affected noticeably more by sampling variations.
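The dice analogy used above for the point forecast can be confirmed by enumerating all 36 equally likely outcomes of two fair dice.

```python
from collections import Counter
from itertools import product

# Count how often each total occurs over all 36 equally likely rolls.
totals = Counter(a + b for a, b in product(range(1, 7), repeat=2))

most_likely, count = totals.most_common(1)[0]
p_seven = totals[7] / 36

print(f"most likely total: {most_likely}, probability: {p_seven:.1%}")
```

Even the single most likely total occurs only about one time in six, which mirrors how a point forecast marks the center of a wide distribution rather than a near-certain outcome.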

To the right of the point forecast, the anomaly and the normal, more numerical information about the forecast is shown. The percentile (%ile) shows the percentage of cases in the fitted climatological distribution that would be expected to be lower than or equal to the point forecast. For example, if the %ile is 60.0, the point forecast ranks at or above 60.0 percent of the climatological distribution, and less than 40.0 percent of it. In other words, it is a forecast for somewhat, but not highly, above normal conditions. Because of the fairly large degree of uncertainty in the outcome of our forecasts, it is not common for our point forecasts to rank in the top or bottom 20 percent of the climatological distribution. Beneath the %ile of the forecast, the number of standard deviations (#SDs) away from the climatological normal is given for the temperature forecasts. This is a technical measure that is intended for scientific users, and gives exactly the same information as the %ile of the forecast. That is, for the temperature forecasts every %ile has a corresponding number of SDs away from the mean. (For precipitation this correspondence is not obeyed because of the skewness of the precipitation distribution, so the #SDs is not shown.) The #SDs is given for those who are more used to working with SDs than with %iles. Underneath the number of SDs is the standard deviation (SD) of the climatological distribution for temperature. This is a measure of the variability of the temperature from year to year. Specifically, it gives the amount below and above the normal that forms an interval such that about 68 percent of the cases fall within that interval, assuming a Gaussian distribution. If twice the SD amount is used to form the interval, about 95 percent of the cases would be included. 
Locations and seasons that have small variations in climate (such as Florida in summer) have a low SD, while places with more year-to-year (or decade-to-decade) variation, such as Montana in winter, have a high SD. Together with the normal (center) number, the SD gives the user an approximate idea of the type of climate normally observed in the given season and location. [Technical note: The #SDs number equals the anomaly forecast divided by the SD.]
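The correspondence between the %ile and the #SDs for temperature follows from the Gaussian assumption. A minimal sketch, with a hypothetical normal, SD, and point forecast:

```python
from statistics import NormalDist

normal, sd = 60.0, 2.0      # hypothetical climatological mean and SD (deg F)
point_forecast = 60.5       # hypothetical point forecast

anomaly = point_forecast - normal
n_sds = anomaly / sd                           # the technical note: #SDs = anomaly / SD
percentile = 100.0 * NormalDist().cdf(n_sds)   # %ile of the point forecast

# Share of cases within one and two SDs of the normal for a Gaussian distribution.
within_1sd = NormalDist().cdf(1.0) - NormalDist().cdf(-1.0)
within_2sd = NormalDist().cdf(2.0) - NormalDist().cdf(-2.0)

print(f"#SDs: {n_sds:.2f}, %ile: {percentile:.1f}")
print(f"within 1 SD: {within_1sd:.1%}, within 2 SD: {within_2sd:.1%}")
```

In this made-up case a forecast one quarter of an SD above the normal ranks close to the 60th percentile, similar to the 60.0 example discussed in the text.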

On the upper right side of the graph, confidence intervals are given for the forecast. The 50% confidence interval gives two temperature (or precipitation) amounts. The lower amount corresponds to the 25%ile (75% probability of exceedance) of the forecast distribution, and the upper amount is the amount that corresponds to the 75%ile (25% probability of exceedance) of the forecast distribution. The amounts that fall between these two limits form an interval that CPC believes has a 50 percent chance of occurring. The 90% confidence interval covers a wider range of amounts, ranging from the 5%ile (95% probability of exceedance) to the 95%ile (5% probability of exceedance) of the forecast. The ranges of amounts covered by the 50% and 90% confidence intervals give an idea of the expected error associated with the point forecast. For temperature, the confidence intervals are formed by moving an equal distance on either side of the point forecast. For precipitation, where the distribution is usually asymmetric (skewed), the distance upward to the top of the confidence interval is usually greater than the distance downward to the lower boundary of the confidence interval. In either case, the regions outside of the confidence interval limits may be considered to use up the remaining probability equally. The ranges covered by the 50% and 90% confidence intervals are usually fairly wide, in keeping with the uncertainty associated with the point forecast. Note that there is some uncertainty for the confidence intervals themselves, just as there is uncertainty for the probability of exceedance curves themselves (as indicated by the "error envelope"). The limits of the 90% interval reach into the tails of the forecast distribution, and should therefore be considered in an approximate manner.
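For temperature, the 50% and 90% confidence intervals can be read off a Gaussian forecast distribution as percentile pairs. The sketch below assumes hypothetical forecast parameters, and `interval` is an illustrative helper name.

```python
from statistics import NormalDist

forecast = NormalDist(mu=60.6, sigma=1.8)   # hypothetical forecast distribution

def interval(dist, coverage):
    """Central interval that leaves equal probability in each tail."""
    tail = (1.0 - coverage) / 2.0
    return dist.inv_cdf(tail), dist.inv_cdf(1.0 - tail)

ci50 = interval(forecast, 0.50)   # 25%ile to 75%ile
ci90 = interval(forecast, 0.90)   # 5%ile to 95%ile

print(f"50% interval: {ci50[0]:.2f} to {ci50[1]:.2f}")
print(f"90% interval: {ci90[0]:.2f} to {ci90[1]:.2f}")
```

Because the Gaussian is symmetric, both intervals are centered on the point forecast. For a skewed precipitation distribution the same percentile pairs would give an interval that extends farther above the point forecast than below it.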

Directly below the forecast confidence intervals, three types of measures of confidence in the forecast are posted, both numerically and verbally. These are estimates of three aspects of the skill expected in the case of the particular forecast. The first, the confidence in shift direction, is a confidence that the climate will deviate from the normal in the direction indicated, without regard to the size of the deviation. The second, the confidence in the point forecast (contraction of the forecast distribution), is confidence that the climate will be close to the point forecast. The third, the integrated confidence, is confidence that the probability distribution as a whole will be different from the climatological probability distribution. The verbal descriptions of the confidence levels are, in ascending order: none, low, fair, moderate and high. Plus and minus signs appear for cases close to the categorical boundaries. Each of the three aspects of confidence is further described next, enabling the user to decide which confidence (or confidences) is most applicable to their particular needs.

Confidence in shift direction: This is a measure of how confident we are that the climate will deviate from the normal in the direction specified, whether below or above the normal. The direction of deviation from the normal is positive when the point forecast exceeds the value of the normal, and negative when it is lower than the normal. The measure, more specifically, is the ratio of the estimated probability that the climate will deviate in the forecast direction to the estimated probability that it will deviate in the opposite direction. That is, it is the odds of the climate deviating in the forecast direction as opposed to it not doing so. For example, if the confidence in the shift direction is 2.00, it indicates that we believe there is twice the probability of a deviation in that direction than in the opposite direction. If the forecast direction is below the normal, a 2.00 confidence would mean that the probability of below normal conditions is 66.7% and the probability of above normal conditions is 33.3%. If there is no confidence whatsoever regarding which side of the normal will occur, the ratio is exactly 1. Note that the ratio is not that of the probability of the more favored outer tercile to the other, but rather a ratio of the forecast probabilities of occurrence of one half of the climatological distribution to the other half. The dividing line between the two halves is the numerical value of the normal (or center of the distribution) that is posted in the upper left corner of the graph. When the "climatological probabilities" (CL) forecast is issued, the shift direction confidence is at its minimum of 1. It may also be 1, however, for a non-"climatological probabilities" forecast--when there is confidence in other aspects of the forecast. 
One example would be when the likelihood of the near normal category is higher than would be expected climatologically–and the chance of above normal and below normal are both reduced from the climatological chance. This could occur, for example, in a season and at a location having fairly high sensitivity to the state of the ENSO, in a case when the ENSO condition is expected to be very close to normal (i.e. neither an El Niño nor a La Niña tendency is expected). In such a case, although the chances of large deviations from normal are reduced, the direction of the shift from normal is just as uncertain as it would be without any knowledge of the ENSO condition. Forecasts for directional shifts may be considered somewhat useful when this confidence measure exceeds 1.5, and more clearly useful when it exceeds 1.8 or even 2 (which is uncommon). It should also be noted that a high confidence in the shift direction usually, but not always, means that the size of the shift is expected to be large. The amount of the expected shift can be seen in the value of the point forecast. In cases with high confidence in the point forecast (another type of confidence, described below), the confidence in the shift direction may be high even though the predicted size of the shift is only moderate. This is possible because the shift direction refers to any amount of shift in the indicated direction, whether large or small.
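The shift-direction measure is an odds ratio across the normal. The sketch below assumes Gaussian forecast curves with hypothetical parameters. The function name `shift_direction_confidence` is illustrative, and the calculation follows the description in the text rather than any published CPC code.

```python
from statistics import NormalDist

def shift_direction_confidence(forecast, normal):
    """Odds that the outcome falls on the forecast side of the normal."""
    p_above = 1.0 - forecast.cdf(normal)
    favored = max(p_above, 1.0 - p_above)
    return favored / (1.0 - favored)

normal = 60.0
cl_forecast = NormalDist(mu=60.0, sigma=2.0)     # "CL": identical to climatology
narrowed_only = NormalDist(mu=60.0, sigma=1.6)   # narrower, but not shifted
shifted = NormalDist(mu=60.6, sigma=1.8)         # hypothetical upward shift

conf_cl = shift_direction_confidence(cl_forecast, normal)
conf_narrow = shift_direction_confidence(narrowed_only, normal)
conf_shifted = shift_direction_confidence(shifted, normal)

# The text's example: a 66.7% / 33.3% split corresponds to odds of 2.
odds_example = (2 / 3) / (1 / 3)

print(f"CL: {conf_cl:.2f}, narrowed only: {conf_narrow:.2f}, shifted: {conf_shifted:.2f}")
```

The narrowed but unshifted case returns 1, matching the point above that a forecast can differ from climatology while still giving no information about the direction of the shift.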

Confidence in the point forecast (contraction of forecast distribution): This is a measure of how narrow, or limited, the distribution of possibilities about the point forecast value is believed to be, compared with the distribution of the historical observations about the normal value. Given our current state-of-the-art in climate prediction, confidence in the point forecast is typically small. When the forecast distribution has the same, or nearly the same, width as the climatological distribution, this indicates a relative absence of forecast knowledge that would limit the range of possibilities. The measure, more specifically, is the fraction of the width of the forecast distribution to the width of the climatological distribution. When the forecast distribution is no narrower than the observed climatological distribution, the confidence is 1. Confidence values of less than 0.9 are considered somewhat helpful, and below 0.8, while rare, are still more helpful. This confidence measure is increased in locations and seasons when climate conditions are known to be related to governing forces (such as ENSO), and the status of these forces is able to be somewhat correctly anticipated for the period being forecast. An example of this would be the precipitation during the winter in Florida and other regions in the southern U.S., which is partly determined by the ENSO state, given that the ENSO state itself is somewhat predictable for forecasts made after the preceding summer. In such a forecast, the possibilities for Florida precipitation are somewhat more limited than they would be with no knowledge of the influence of ENSO or no knowledge of what the ENSO state would likely be during the future period being forecast. This particular confidence measure is not related to the amount of shift of the point forecast from the normal; rather, only the width of the probability distribution about its own central value (the point forecast) is relevant here. 
Therefore, forecasts that are close to the normal still may rate relatively high on this confidence measure. Likewise, in some cases there may be a noticeable shift of the point forecast from the normal, but little or no narrowing of the distribution. This could occur, for example, when there is a gradual, long-term trend that is used in determining the forecast, but when there is little or no information about differences between the climate this year and the last few years of the same season. In that case, all recent years would be affected by the general trend approximately equally, but their large differences from one another related to factors besides the trend are poorly forecast.
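
The width ratio described above can be illustrated with a minimal sketch. It assumes that "width" is measured by the standard deviation of each distribution (the text does not say which width measure is used), and the function names are hypothetical, chosen only for this illustration.

```python
def point_forecast_confidence(forecast_width, climatological_width):
    """Ratio of forecast-distribution width to climatological width.

    Assumption: both widths are standard deviations in the same units.
    A value of 1 means no narrowing; smaller values mean more confidence.
    """
    if climatological_width <= 0:
        raise ValueError("climatological width must be positive")
    return forecast_width / climatological_width


def describe_ratio(ratio):
    """Apply the thresholds quoted in the text (0.9 and 0.8)."""
    if ratio < 0.8:
        return "more helpful (rare)"
    if ratio < 0.9:
        return "somewhat helpful"
    return "little or no narrowing"
```

Note that the point forecast itself does not enter the calculation, which matches the statement that this measure is unrelated to the amount of shift from the normal.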

Integrated confidence: an integrated distributional difference from climatology. This is a measure of the estimated totality of all differences between the forecast distribution and the climatological distribution. It includes both distributional shifts and narrowing (i.e. confidence in the point forecast over and above the confidence that would be associated with a climatological forecast), as discussed in the context of the two confidence parameters described above. It would also include distributional deviations of other types that may prove to be possible to predict in the future, such as a widening of the distribution (e.g., as related to an expectation of greater than normal intraseasonal variation), or asymmetric or irregular features of the distribution as may be related to specific climate conditions in certain geographical locations (e.g. involving terrain, or land vs. water). This measure, specifically, is estimated as the total of the differences in probabilities of exceedance between the climatological distribution and the forecast distribution over the 11 points on the climatological distribution corresponding to its 0.98, 0.90, 0.80, 0.70, ..., 0.20, 0.10, and 0.02 probability of exceedance values. This sum of the differences is then scaled with respect to the result which would be attained when the forecast distribution is completely separated from the climatological distribution. In the case of complete separation, the climatological probability of exceedance remains at 1 (or 100%), or at 0 (or 0%), while the forecast distribution moves through all of its intermediate values. Complete separation, which is currently unattainable given the current state of the art in climate prediction, would produce an integrated confidence score of 1, while a total absence of separation (as in the case of the "climatological forecast") would produce a score of 0.
Integrated confidence values of 0.2 would be considered moderately useful by today's standards, and values of 0.3 would be clearly useful. In examining the integrated confidence values that accompany the graphs, it becomes clear that distributional shifts tend to account for the majority of the integrated confidence value, while distribution narrowing contributes to a lesser degree. This characteristic implies that occurrences of strong climate forcing conditions, whether related to ENSO, strong decadal trends in progress, or other factors, represent "forecasts of opportunity", and that forecast skill (and utility) are not constant from year to year for a given location, season and lead time. Of the three confidence measures discussed here, the only one that remains nearly constant from year to year is the confidence in the point forecast, showing the narrowness of the forecast distribution relative to the climatological distribution. From a practical standpoint, the shift of the forecast distribution from its normal position may be more important to users than its narrowness. This becomes clearer when one considers, for example, the precipitation over a season. If there is a high probability for abnormal wetness, the exact amount of observed precipitation, and its deviation from what was forecast, may be less important to a user than the fact that the precipitation amount was correctly forecast to be above the normal. A forecast of exactly normal temperature or precipitation, even with a very narrow forecast distribution, might not represent information as important to the managers of energy companies, or to farmers, as a forecast of deviant conditions with a wider probability distribution. It must also be noted that our ability to forecast likely shifts from the normal is currently greater than our ability to narrow the width of the forecast distribution. 
If, at a distant future time, we become able to significantly narrow the width of the forecast distribution (as we can do currently in 1-day weather forecasts), this would automatically improve our shift direction confidence as well. Fortunately, our current lack of strong point forecast confidence does not prevent us from having fairly high shift direction confidence under certain circumstances.
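
One possible reading of the integrated confidence calculation is sketched below. Several details are assumptions rather than statements from the text: both distributions are taken as Gaussian, the differences are summed as absolute values (so that narrowing without a shift does not cancel out), and the scaling constant is the sum obtained when the forecast probability of exceedance is 1 at every one of the 11 climatological points.

```python
from statistics import NormalDist

# The 11 climatological probability-of-exceedance levels named in the text.
POE_LEVELS = [0.98, 0.90, 0.80, 0.70, 0.60, 0.50, 0.40, 0.30, 0.20, 0.10, 0.02]


def integrated_confidence(clim, fcst):
    """Scaled sum of exceedance-probability differences (illustrative only).

    clim, fcst: statistics.NormalDist objects (Gaussian assumption).
    """
    total = 0.0
    for poe in POE_LEVELS:
        # Value whose climatological probability of exceedance equals `poe`.
        x = clim.inv_cdf(1.0 - poe)
        fcst_poe = 1.0 - fcst.cdf(x)
        total += abs(fcst_poe - poe)  # assumption: absolute differences
    # Complete separation: forecast exceedance is 1 at every point,
    # so each difference is (1 - poe).
    full_separation = sum(1.0 - poe for poe in POE_LEVELS)
    return total / full_separation


climatology = NormalDist(mu=50.0, sigma=2.0)
shifted = NormalDist(mu=51.0, sigma=2.0)    # shift only
narrowed = NormalDist(mu=50.0, sigma=1.8)   # narrowing only
print(integrated_confidence(climatology, shifted))
print(integrated_confidence(climatology, narrowed))
```

Under these assumptions a climatological forecast scores 0 and a completely separated forecast scores 1, as the text requires.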

The middle block of text on the right side of the graph provides estimated probabilities, for the final forecast, of selected categorical outcomes with respect to the climatological distribution. Included are the probability of the highest 10% of the climatological distribution, the highest third (called "above normal"), the middle third (called "near normal"), the lowest third (called "below normal") and the lowest 10% of the climatological distribution. The probabilities are indicated on the right side of each line of text, and are given to the nearest tenth of one percent. The boundaries of the variable being evaluated (temperature or precipitation) that define these categories are shown in parentheses in each text line. Note that the upper and lower 10%, while not lying entirely within the extreme 7% tails of the distribution (see the warning above, which is also posted on each graph), are subject to greater uncertainty than the terciles. When the "CL" (climatological probability) is given on the traditional map of forecast probability anomaly, the probabilities shown in this block of text will show just that: 10.0%, 33.3%, 33.3%, 33.3%, and 10.0%. When a non-CL forecast is shown on the map, the departure from the climatological probability of the favored tercile in the present product should agree with that given on the map. For example, if there is a 10% probability anomaly for the above normal tercile on the map for the location in question, then the "prob of above normal" should be 43.3% (33.3% + 10%). Because a Gaussian (gamma for precipitation) fit is used in the present product, the borrowing rules that are assumed as rough approximations for the probability anomaly maps are not closely followed. The probability results shown here are considered to be more realistic than the "rules of thumb" (with borrowing) assumed for the maps.
Thus, we do not always preserve 33.3% probability for the near normal tercile when a tilt toward above or below normal is indicated, and we do not assume a reduction in the probability of the least favored outer tercile that is equal to the increase in the most favored outer tercile. In some cases the middle tercile may have the highest probability anomaly here (usually by a small amount), even when a slight tilt toward one of the two outer terciles is indicated on the maps.
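
The categorical probabilities in this block of text can be reproduced in outline as follows. This is a sketch for the Gaussian (temperature) case only; the function name and the example numbers are hypothetical. The category boundaries come from the climatological distribution, and the probabilities come from the forecast distribution.

```python
from statistics import NormalDist


def category_probabilities(clim, fcst):
    """Forecast probabilities of categories defined on the climatology.

    clim, fcst: statistics.NormalDist objects (Gaussian assumption).
    """
    # Category boundaries taken from the climatological distribution.
    low10 = clim.inv_cdf(0.10)
    lower_tercile = clim.inv_cdf(1.0 / 3.0)
    upper_tercile = clim.inv_cdf(2.0 / 3.0)
    high10 = clim.inv_cdf(0.90)
    return {
        "highest 10%": 1.0 - fcst.cdf(high10),
        "above normal": 1.0 - fcst.cdf(upper_tercile),
        "near normal": fcst.cdf(upper_tercile) - fcst.cdf(lower_tercile),
        "below normal": fcst.cdf(lower_tercile),
        "lowest 10%": fcst.cdf(low10),
    }


climatology = NormalDist(mu=50.0, sigma=2.0)
forecast = NormalDist(mu=50.6, sigma=1.9)   # hypothetical shifted forecast
for name, prob in category_probabilities(climatology, forecast).items():
    print(f"{name}: {100.0 * prob:.1f}%")
```

With a shifted and slightly narrowed forecast, the near normal probability generally does not stay at 33.3%, and the loss in the least favored tercile does not equal the gain in the most favored one, which is the departure from the borrowing rules described above.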

The lower block of text pertains to the observations of the most recent 10 years (for temperature) or 15 years (for precipitation). The "center" refers to the mean of these recent years when there is no skew in the data (e.g. for temperature), and refers to the fitted median, or representative center of the distribution, if there is skew (as found in most cases for precipitation). The "mean" is the average over the 10 (or 15) most recent years. The unfitted sample median, shown to the right, is the center-ranked value out of the group of 10 (or 15) years. When there are 10 years, the median is the average of the 5th and 6th highest (also the 6th and 5th lowest) values; when there are 15 years, the median is simply the 8th highest (also 8th lowest) value. The median gives an idea of the middle value, without being affected by the extremeness of the higher and lower values. The "anomaly of the 10 (or 15) year center" is the center (given in the line above) minus the overall normal that is given in the upper left block of text, discussed above. It indicates the departure of the 10 (or 15) year center from the longer (and older) normal, and in some cases may indicate a systematic trend. Near the bottom of the graph, below the "0%" line on the vertical axis, the observations of the last 10 (or 15) years are shown by the last two digits of the year. These digits are positioned horizontally so that they indicate the temperature or precipitation levels of that year, as printed on the horizontal axis just beneath them. An asterisk on the "0%" line shows the central value of these observations (mean for temperature, median for precipitation).
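
The quantities in the lower block of text can be illustrated for the temperature case, where the "center" is simply the mean. The values and the normal below are made up for the illustration, and the function name is hypothetical. The fitted median used as the center for skewed precipitation data is not reproduced here.

```python
from statistics import mean, median


def recent_years_summary(values, normal):
    """Mean, unfitted sample median, and anomaly of the center.

    Assumption: no skew (temperature case), so the center is the mean.
    """
    center = mean(values)
    return {
        "mean": center,
        # For 10 values, statistics.median averages the 5th and 6th
        # ranked values; for 15 values it returns the 8th ranked value.
        "median": median(values),
        "anomaly_of_center": center - normal,
    }


# Hypothetical seasonal mean temperatures for the last 10 years.
recent = [50.1, 51.3, 49.8, 52.0, 50.7, 51.9, 50.2, 51.1, 52.4, 50.5]
print(recent_years_summary(recent, normal=50.4))
```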

How These Graphs Should NOT be Used

The graphs are to be used at the users' own risk. The probabilities of exceedance are expressions of uncertainty inherent in the climatological distribution and in the final forecast which is conditional on the current and expected climate state. The informed user is aware that in any individual case, the implications of the forecasts may be misleading, as for example when the direction of the shift from the normal turns out to be incorrect. The value of the forecasts is likely to become visible with repeated use, in which case the frequency of successes will exceed the frequency of failure by an amount that is roughly conveyed by the confidence estimates given with the graphs. This value may or may not show up clearly in individual cases or a small set of cases. To help show how this product should be used, the following are EXAMPLES OF IMPROPER USE OF THIS PRODUCT:

Treating the "point forecast" as a literal or exact forecast, as in a forecast of tomorrow's maximum temperature. The point forecast is only the center of a wide range of possibilities.

Using the forecast categorically, without any hedging. The amount of hedging, or weighing and acting on the possibility that the forecast will be incorrect, should be carried out using the probability differences among the alternative user-defined outcomes, in conjunction with the costs and the savings associated with each possible sequence of decision and climate outcome. In any individual case, the forecasts should be regarded as probability statements, not as absolute or "all-or-nothing" categorical statements.

Trusting probability anomalies at face value when they are completely embedded in either of the 7% tails of the distribution. For example: "This year, the chance of a 1-in-100 year drought is 5 times the normal chance." Probability anomalies in the tails should be regarded as rough estimates at best.

Regarding the climate over the last 10 (or 15) years as an indication of the recent trend. In some cases, the departure from normal over the recent years may be due entirely to chance. Some idea of the confidence associated with a recent shift may be obtained from the text accompanying the multi-season climate outlook forecast maps.

Assuming that exactitude in the probabilities, the tercile boundaries, the upper and lower 10% boundaries, the point forecast, or the recent trend implies precision in the forecast itself, or in our knowledge of the "normal" or the recent trend. Any precision in the quantities presented reflects only our most precise guesses. The exact probabilities attempt to convey precisely how uncertain we are about the forecasts. THIS PRECISION SHOULD NOT BE INTERPRETED AS IMPLYING FORECAST ACCURACY! Precision of presentation and accuracy of the forecast are two entirely different things.
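
The hedging described in the second example above, which weighs outcome probabilities against the costs of each combination of decision and climate outcome, can be sketched as an expected-cost comparison. All decisions, costs, and probabilities below are hypothetical and serve only to show the arithmetic.

```python
def expected_costs(outcome_probs, cost_table):
    """Expected cost of each decision.

    outcome_probs: {outcome: probability}, summing to 1
    cost_table:    {decision: {outcome: cost}}
    """
    return {
        decision: sum(outcome_probs[o] * costs[o] for o in outcome_probs)
        for decision, costs in cost_table.items()
    }


def best_decision(outcome_probs, cost_table):
    """Decision with the lowest expected cost."""
    costs = expected_costs(outcome_probs, cost_table)
    return min(costs, key=costs.get)


# Hypothetical user: preparing costs 10 regardless of outcome;
# not preparing costs 25 if the season turns out above normal.
cost_table = {
    "prepare": {"below": 10, "near": 10, "above": 10},
    "do nothing": {"below": 0, "near": 0, "above": 25},
}
climatology = {"below": 1 / 3, "near": 1 / 3, "above": 1 / 3}
forecast = {"below": 0.25, "near": 0.32, "above": 0.43}

print(best_decision(climatology, cost_table))
print(best_decision(forecast, cost_table))
```

The forecast changes the preferred action only because it changes the probabilities; in any single season either decision may still turn out to have been the more costly one.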

A Word about the Fitting of Precipitation Distributions

Precipitation is fitted here using a Gaussian distribution following a power transformation of the original precipitation data. The original data are raised to a power that is determined by the degree of skewness. The skewness of the raw precipitation data is indicated for the precipitation graphs. When skewness exceeds about 0.6, its effects become marked, and the need for a transformation (or an asymmetric modeling scheme) becomes obvious. When there is a positive skew (featuring a long positive tail of the distribution, a shorter negative tail, and a mean that is higher than the median), the power is less than 1. The power is calibrated to be that which approximately eliminates the skewness, based on a large number of empirical simulations. Prior to applying the power transformation, the seasonal cycle of the skewness at each location is subjected to a harmonic analysis (using the first 3 harmonics) and the resulting smoother seasonal cycle of skewness is averaged with the raw skewness numbers to yield a lightly smoothed seasonal cycle. The power transformation follows, using powers determined from the empirical skewness-to-power correspondence. Once the data are power-transformed, Gaussian statistics are applied as they are for the temperature data. Finally, the results are raised to the reciprocal of the power that was used earlier, to restore the original skewed frame of reference. The above technique is effective for skewed distributions that do not contain many zero amounts. The presence of more than about 5 to 8 percent of zero amounts causes a violation of the Gaussian assumption even after a power transformation, because it represents a "floor effect", or a bunching of observations at the lower end of the distribution. The 3-month total precipitation data processed here normally have no zero amounts, and when they do (in certain climate divisions in the Southwest U.S. in summer), the frequencies are very low.
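
The transform, fit, and back-transform sequence described above can be sketched as follows. The power is chosen here by a simple grid search that minimizes the absolute skewness of the transformed data; this is a stand-in, assumed for illustration, for the empirical skewness-to-power correspondence mentioned in the text, and the harmonic smoothing of the seasonal cycle is omitted. The data values are made up.

```python
from statistics import NormalDist, mean, pstdev


def skewness(data):
    """Sample skewness (third standardized moment)."""
    m, s = mean(data), pstdev(data)
    return sum(((x - m) / s) ** 3 for x in data) / len(data)


def find_power(data):
    """Power in (0, 1] that leaves the least absolute skewness.

    Assumption: a grid search replaces the calibrated correspondence.
    """
    candidates = [k / 20 for k in range(1, 21)]  # 0.05, 0.10, ..., 1.00
    return min(candidates, key=lambda p: abs(skewness([x ** p for x in data])))


def fitted_quantile(data, prob_below):
    """Quantile from a Gaussian fit in power-transformed space.

    Requires positive, non-constant data (no zero amounts).
    """
    p = find_power(data)
    transformed = [x ** p for x in data]
    fit = NormalDist(mean(transformed), pstdev(transformed))
    z = max(fit.inv_cdf(prob_below), 0.0)  # guard against negative values
    return z ** (1.0 / p)  # reciprocal power restores original units


# Hypothetical positively skewed seasonal precipitation totals (inches).
precip = [1.0, 1.2, 1.5, 1.7, 2.0, 2.2, 2.6, 3.1, 3.9, 5.5, 8.0, 12.0]
print("raw skewness:", skewness(precip))
print("power used:", find_power(precip))
print("tercile boundaries:", fitted_quantile(precip, 1 / 3), fitted_quantile(precip, 2 / 3))
```

For positively skewed data the selected power is less than 1 and the fitted median falls below the mean, consistent with the description above.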

Another option for the transformation of precipitation data for statistical processing is the gamma distribution. The gamma distribution can fit skewed distributions because it has one parameter that fits the general scale of the distribution (e.g. it distinguishes generally wet climates from dry ones), and another parameter that fits the shape, or skewness, of the distribution. In the case of the 3-parameter gamma developed recently (Richard Lehman, 1997, 13th Conference on Hydrology, AMS), there is also an allowance for a distribution whose range ends abruptly, as in the "floor effect" created by a significant number of zero amounts. Recent research at CPC has shown that the 3-parameter gamma has strong potential for utility in fitting precipitation distributions for shorter time periods, when the floor effect of zeros is a prominent feature. The conventional 2-parameter gamma distribution could also be used for that application but would extend a substantial portion of its probabilities to negative numbers when there are many zeros in the distribution. The 2-parameter gamma could also be used here for seasonal data, and this has been considered. Sensitivity tests comparing the flexible power transformation, used here presently, and the 2-parameter gamma, indicate that neither option has clear superiority over the other. The power fit is somewhat more sensitive to outliers than the conventional gamma; this has both positive and negative consequences. Determination of the forecast distribution can also be based on either fitting model, although it is somewhat more straightforward with the power-Gaussian fitting scheme. The climatological distribution, including the tercile class boundaries, is slightly different for the gamma fit than for the power-Gaussian fit.
If the 2- or 3-parameter gamma is adopted as the official precipitation model fitting technique at CPC, then that gamma will also be used for the "probability of exceedance" graphs displayed here, for the sake of consistency among CPC's products. This consistency is important for precision in the verification process, particularly for categorical (i.e. tercile) verification. For the precipitation forecasts themselves, in physical units, this issue will usually have a minimal or nonexistent impact. That is why the displays provided in this web site are not being delayed pending resolution of the precipitation fitting issue.
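
The roles of the two parameters of the conventional gamma distribution can be illustrated with a method-of-moments fit. This estimator is chosen here only for simplicity and is an assumption; the text does not say how a gamma fit would be estimated at CPC. It relies on the standard relations for a gamma distribution with shape k and scale s: mean = k*s, variance = k*s^2, and skewness = 2/sqrt(k).

```python
from math import sqrt
from statistics import mean, pvariance


def gamma_moments_fit(data):
    """Method-of-moments estimates (shape, scale) for a 2-parameter gamma.

    Requires positive data with nonzero variance.
    """
    m, v = mean(data), pvariance(data)
    shape = m * m / v   # controls skewness
    scale = v / m       # controls overall magnitude (wet vs. dry climate)
    return shape, scale


def gamma_skewness(shape):
    """Skewness implied by the fitted shape parameter."""
    return 2.0 / sqrt(shape)


# Hypothetical seasonal precipitation totals (inches).
precip = [1.0, 1.2, 1.5, 1.7, 2.0, 2.2, 2.6, 3.1, 3.9, 5.5, 8.0, 12.0]
shape, scale = gamma_moments_fit(precip)
print("shape:", shape, "scale:", scale, "implied skewness:", gamma_skewness(shape))
```

Multiplying every observation by a constant changes only the scale estimate and leaves the shape, and therefore the implied skewness, unchanged.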


NOAA/National Weather Service, National Centers for Environmental Prediction, Climate Prediction Center, 5200 Auth Road, Camp Springs, Maryland 20746. Climate Prediction Center Web Team. Page last modified: December 12, 2002