Treatment of Uncertainties for National Estimates of Greenhouse Gas Emissions

Appendix A:
Methods of Assessing Uncertainties

A1 Review of Methods of Assessing Uncertainties
A2 Uncertainty Evaluation: Other Specific National Approaches
A3 Conclusions of Review of Approaches
A4 References

A1.   Review of Methods of Assessing Uncertainties

A focused review has been carried out of methods by which sources of uncertainty in estimates of emissions and sinks of greenhouse gases may be identified and quantified. To be as comprehensive as possible, a wide range of literature, some pertaining to broader environmental topics [34-36], was included in this review.

In general, uncertainty in emission estimates arises from two main types of error: bias and imprecision, and this distinction must always be borne in mind when analytical methods are appraised. A bias is a consistent difference between a measurement or assessed value and its "true" value (which may be inaccessible) that is not due to random chance, whereas an imprecision is a difference between a measurement or assessed value and its "true" value that is due to random error or fluctuations.

In an emissions inventory, bias can result from an estimation process in which a systematic error occurs because some aspect of the inventory process is misrepresented or not taken into account. For example, were the emission factor for a given source category to be developed from a non-representative sample of source types, the emission factor would produce a biased estimate of the emission. Bias may also result from the inability to obtain a comprehensive set of measurements for all conditions. Common examples of bias arise when surrogates are used (e.g. where sales are used to represent consumption).

In contrast to bias, imprecision causes multiple measurements of a parameter to differ; but if the measurements are unbiased, they will cluster about the "true" value of the parameter. This imprecision arises from sampling error and human error, as well as from natural fluctuations in the process being measured. Variability in emissions data results from a number of causes, including temporal or spatial fluctuations in the data used to estimate emissions.
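The distinction between bias and imprecision can be illustrated with a small simulation. The sketch below is illustrative only: the true value, spreads and the 10% surrogate bias are arbitrary assumptions, not figures from any inventory.

```python
import random
import statistics

random.seed(1)

TRUE_VALUE = 100.0  # the (normally inaccessible) true emission rate

# An imprecise but unbiased estimator: individual measurements scatter
# widely around the true value, but their mean converges on it.
unbiased = [random.gauss(TRUE_VALUE, 15.0) for _ in range(10_000)]

# A biased but precise estimator: e.g. a surrogate (sales used to
# represent consumption) that systematically under-reports by 10%.
biased = [random.gauss(TRUE_VALUE * 0.9, 2.0) for _ in range(10_000)]

print(round(statistics.mean(unbiased), 1))   # close to 100: no bias
print(round(statistics.stdev(unbiased), 1))  # large spread: imprecise
print(round(statistics.mean(biased), 1))     # close to 90: biased
print(round(statistics.stdev(biased), 1))    # small spread: precise
```

Averaging more measurements narrows the cluster of the unbiased estimator around the true value, but no amount of averaging removes the systematic offset of the biased one.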

In both of the above cases, it is clear that the issue of uncertainty is closely related to that of the quality of the data used to compose the emissions database.

Sources of uncertainty in emissions data can usefully be resolved into three general classes [37]: variability, parameter uncertainty, and model or conceptual uncertainty. In the case of variability, more research would simply lead to better estimates of the underlying distribution, but would not reduce its width. In the case of parameter uncertainty, research would hopefully lead to a reduction in the width of the relevant distributions. Model or conceptual uncertainty is more problematic in that it concerns the inappropriate representation of the system, for example in its spatial, temporal or species allocation. Here a better understanding of the simplifications implied by the modelling would help in specifying the direction and magnitude of any biases.

Uncertainties in emission estimates can arise from a range of causes.

In general these may be summarised as due to deficiencies (which may be avoidable or not) related to the completeness, appropriateness and representativeness of the information used.

In this study a spectrum of techniques has been reviewed, ranging from the qualitative methods used to discuss known or suspected biases and imprecisions in the database, through to quantitative techniques that utilise distributions of possible parameter values and likely correlations between these, and translate these as a whole into the resulting uncertainty of the final emission estimate. In this review three main categories of approaches are used: qualitative, semi-quantitative and quantitative.

Throughout all of the methods reviewed, a number of common threads were apparent.

As the information held within any emissions database will be of variable quality, and hence subject to different degrees of uncertainty, a further issue to be addressed relates to the way in which the different approaches to assessing uncertainty could be used in combination to produce some overall assessment or measure of uncertainty.


Qualitative methods are in general rather poorly defined, but in essence they all involve the listing and subsequent discussion of the many sources of uncertainty. Each data item is discussed in terms of the direction of any bias (i.e. whether it is judged to be an over- or under-estimate), together with the relative magnitude of the specific source of uncertainty. The latter might be limited to terms such as 'low', 'unlikely to be significant' or 'considerable', or may be more specific, such as 'factor of two' or 'order of magnitude'. In some cases, statistics such as standard deviations and confidence limits would be available for the variables used to develop the inventory itself, and in these cases it should be possible to reproduce them. If such information exists, this general approach would proceed by discussing in a general way its contribution to the accuracy of the estimates themselves.

The output from qualitative methods of uncertainty assessment would usually be presented in narrative form. However, tables provide a more systematic and concise method of summarising the conclusions. Table A1, based on a recent report of the QA/QC procedures used during the development of an inventory for offshore oil production facilities [38], serves to illustrate the form of output from qualitative surveys of uncertainty. Note that in that work a comprehensive interrogation of the various primary sources of information was undertaken, whereas in the current study secondary sources of information are necessarily relied upon much more heavily. Many of the key sources of uncertainty are generally applicable to any inventory; it would nevertheless be important to list and discuss issues that are particularly relevant to the inventory in question. It should be noted that an additional column describing the direction of any biases or the relative magnitude of any imprecision would add valuable information to the overall assessment.

In summary, qualitative methods such as that outlined have the advantage of being applicable over the entire inventory, and can be undertaken with fairly modest resources. They have the disadvantages of being unable to produce an overall, meaningful estimate of uncertainty from the range of sources, and of being subject to difficulties with aspects of the terminology used (e.g. how to distinguish between the widely used descriptors 'considerable' and 'substantial'). The more general difficulty of combining them with more quantitative techniques is discussed in Section A3.


Following on from the purely qualitative methods, in the context of emissions inventories it is often possible to rank the many individual contributors to parameter uncertainty with respect to each other (i.e. an ordinal presentation), and in some cases to produce more meaningful information (though falling short of being fully quantitative) on the 'absolute' weights to be assigned to individual contributions to overall parameter uncertainty. Such approaches are classified broadly as semi-quantitative methods.

Semi-quantitative techniques are commonly used in the role of verification [39], that is, to establish the reliability of the inventory for the intended applications. The procedures are applied to establish confidence that the data are sufficient in terms of coverage, completeness and reliability to guide decision makers to effective policy options.

Semi-quantitative methods focus on the quality of the input information, allowing future resources to be concentrated on those areas of the inventory that currently hold the least satisfactory information or lowest quality data. Many of these methods propose a coherent scheme for combining the quality ratings of emission factors and activity rates into a rating for their product.

Semi-quantitative ranking methods are relatively easy to implement, and can be used where detailed data on emissions are unavailable. A significant drawback of their use is that it may be difficult to eliminate logical inconsistencies in the rankings themselves (e.g. A > B, B > C, C > A) as subjective criteria are applied by different people at different times.
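The kind of logical inconsistency mentioned above (A > B, B > C, C > A) can be detected mechanically by treating each "X ranks above Y" judgement as a directed edge and searching for a cycle. The sketch below is illustrative; the item names and judgements are invented for the example.

```python
def find_cycle(prefs):
    """Given pairwise judgements as (higher, lower) pairs, return a
    cyclic chain of items if the rankings are inconsistent, else None."""
    graph = {}
    for hi, lo in prefs:
        graph.setdefault(hi, set()).add(lo)
        graph.setdefault(lo, set())

    def dfs(node, path, visiting, done):
        if node in visiting:  # back-edge: a cycle has been found
            return path[path.index(node):] + [node]
        if node in done:
            return None
        visiting.add(node)
        for nxt in graph[node]:
            cycle = dfs(nxt, path + [node], visiting, done)
            if cycle:
                return cycle
        visiting.remove(node)
        done.add(node)
        return None

    done = set()
    for start in graph:
        cycle = dfs(start, [], set(), done)
        if cycle:
            return cycle
    return None

# Consistent rankings: no cycle
print(find_cycle([("A", "B"), ("B", "C")]))  # None
# Inconsistent: A > B, B > C, C > A
print(find_cycle([("A", "B"), ("B", "C"), ("C", "A")]))
```

Running such a check whenever rankings are merged from different assessors would flag contradictory judgements before they propagate into the inventory quality assessment.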

a. The AP-42 scheme

For some time the US Environmental Protection Agency (USEPA) has used a rating system for its preferred emission factor listings included in its AP-42 document [40]. This technique uses a letter rating system of A through E to represent the confidence in emission factors from best to worst. A factor's rating is a general indication of the reliability, or robustness, of that factor. In this system, factors qualified with a rating of 'A' are based on several measurements or a large number of sources, or on more widely accepted test procedures, whereas those based on a single observation of questionable quality, or one extrapolated from another factor for a similar process, or on engineering or expert judgement, are qualified with the lowest rating of 'E'. The USEPA has recently expanded this approach to include a letter-based rating of the emissions estimate as well as of the emission factor. While there are some guidelines for the assignment of the letter score, this approach is largely subjective.

As the ratings do not consider the inherent scatter among the data used to calculate the factors, they do not imply statistical error bounds or confidence intervals. At best, a rating should be considered an indicator of the accuracy and precision of a given factor and, by extension, of any estimates derived with it.

Two steps are involved in factor rating determination. The first step is an appraisal of data quality or the reliability of the basic emission data that will be used to develop the factor. The second step is an appraisal of the ability of the factor to stand as a national annual average emission factor for that source activity. The AP-42 rating system for the quality of the test data consists of four categories, and is presented in Table A2, whereas the emission factor quality ratings are described in Table A3.

The AP-42 emission factor scores are of some value as indicators of the quality of emission estimates. At best, they rate the quality of the original data as applied to estimates for the original point source. However, when applied to other sources or to groups of sources (e.g. area sources), the AP-42 factor score is less meaningful because it does not consider how similar the original source and the modelled source(s) are and, unless qualified by a corresponding rating for the activity data, does not address the quality of the overall emission estimate.

b. Systems derived from the AP-42 scheme

In the review of inventory quality rating systems recently compiled for USEPA [41], several systems similar to the AP-42 system are described.

  1. A method used in the UK for assessing the overall quality of emissions estimates is based on letter ratings [42]. These are assigned to both emission and activity data. The emission factor criteria for the letter scores (Table A4) are similar to those applied in the USEPA approach [40], and scores for the activity data are based largely on the origin of the data. As with the USEPA system, it should be noted that the allocation of ratings to estimates may have a significant subjective component. Ratings are assigned to the activity or production data using the following general guidance [42,43]:
    • Activity measured accurately and with high precision would receive a rating of A or B
    • Data published either by a government agency or through an industry trade association would be assigned a rating of C
    • Activity data developed by extrapolation from some measured activity, or from a nearby country, would be assigned a rating of D or E

    The overall quality rating would be determined by a combination following the schedule given in Table A5.
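The combination step can be sketched in code. Table A5 is not reproduced here, so the sketch below assumes a common pessimistic convention, taking the worse of the two letters; the actual published schedule may combine the ratings differently.

```python
RATINGS = "ABCDE"  # A = best quality, E = worst

def overall_rating(emission_factor_rating, activity_rating):
    """Combine the emission factor and activity data letter ratings
    into an overall quality rating for the emission estimate.

    NOTE: this takes the worse (pessimistic) of the two letters as an
    illustrative assumption; the real schedule is defined in Table A5.
    """
    return max(emission_factor_rating, activity_rating,
               key=RATINGS.index)

print(overall_rating("B", "D"))  # -> D
print(overall_rating("A", "A"))  # -> A
```

A pessimistic rule of this kind reflects the view that an estimate can be no more reliable than the weaker of its two inputs.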

  2. The IPCC has included a rating system in its guidelines for reporting of greenhouse gas inventories through international conventions [44]. This scheme uses a different approach. For each pollutant associated with major source categories, a code is specified to indicate the coverage of the data included in the estimate, together with a judgement on the supporting documentation (Table A6). The codes indicate whether the estimate includes full coverage of all sources or only partial coverage due to incomplete data or other causes. Additional codes can be specified to indicate that the estimate was not performed, is included in some other category, or that the activity is not occurring or not applicable. A further rating is then applied to each pollutant for each source category to indicate the quality assessment of the estimate as high, medium or low. Two additional ratings are requested that apply to the source categories without reference to specific pollutants.

    These ratings cover the quality of the documentation supporting the estimates, rated as high, medium or low; and a rating to indicate the level of aggregation represented in the estimate. The possible choices are 1 for total emissions estimated, 2 for a sectoral split, and 3 for a sub-sectoral split. The rating scheme is more detailed, but retains a simplicity that allows the analyst to review the quality ratings quickly and to compare them with those of other estimates.

  3. A method developed in the Netherlands [45] recognises the difficulties in agreeing widely accepted definitions of data quality. In this approach, two specific issues are addressed concurrently in the rating scheme. The first is an assessment of the accuracy or uncertainty in the emission estimate, and the second is an assessment of whether decision makers have confidence in the application of the estimates for regulatory and policy activities.

    Two scaling indicators are applied to represent these two concerns. The first is a letter grade from A to E, where A implies the highest quality and accuracy, and E would imply that the estimate is an educated guess. The second rating scale applies a letter code to indicate the purpose for which the estimate was prepared. A rating of N would imply that the emission factor is based on nationally averaged numbers and is therefore aimed at estimating the national total emissions. Such a factor would only be applied to any specific plant (or perhaps a specific region of the country) with caution. At the other extreme, a rating of P would indicate that the estimate was based on plant-level data, and should not be used to give high-confidence emissions at a regional or national level.

c. The Data Attribute Rating System (DARS)

Compared with the methods outlined above, this rating system [46], which is still undergoing development, permits the quality of parameter values within the emissions inventory to be recorded with a greater degree of quantification. As in some of the previously reviewed methods, the system disaggregates emission estimates into emission factors and activity data, but DARS then allows the attributes of these to be examined and assigned a numerical score for ranking purposes. These scores are determined against a set of numerical criteria to represent the reliability of each attribute estimate. In this way, each score is based on what is known about the emission factor and activity rate parameters. The resulting emission factor and activity data scores are then combined to arrive at an overall confidence rating for the inventory estimate.

The theoretical basis of DARS is described by Beck et al. [46]. Scores are assigned to four data attributes: measurement/method, source specificity, spatial congruity and temporal congruity. A key feature of DARS is that each attribute is scored independently of the others. However, the emission factor and activity rate scores for a given attribute are not necessarily independent, because the choice of one is usually limited by the selection of the other. The method to be used for filling in a scoring box is given in Table A7. The composite scores for emission factors, activity rates and emissions are computed by averaging the scores in a column.

The individual scores themselves are produced by following attribute-specific flow charts. Low data quality would indicate a low value (1 to 3), whereas more information or greater confidence would merit a higher score (up to 10). These values are then divided by 10 and entered into the table as shown. In the absence of sufficient information on the derivation of factors, activity or emissions, the score entered is the highest that can be confidently justified. If the source or derivation of data is totally undocumented, the highest possible score is 1. Specific guidance on how to assign scores to the individual attributes is given in Appendix F of Reference 47. The composite scores themselves (which will take values between 0 and 1), although intended to be as auditable as possible, inevitably involve qualitative and rather subjective assessments. Hence it is essential that sufficient care is taken in the assignment of values to preserve the validity of the final scores.
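The arithmetic of the composite scores can be sketched as follows. The attribute values here are invented for illustration, and the sketch assumes that the per-attribute emissions score is the product of the emission factor and activity rate scores; consult Beck et al. [46] for the published scheme.

```python
# The four DARS data attributes; raw scores of 1-10 are divided by 10
# before entry, so each value below lies between 0.1 and 1.0.
ATTRIBUTES = ["measurement/method", "source specificity",
              "spatial congruity", "temporal congruity"]

emission_factor = {"measurement/method": 0.8, "source specificity": 0.6,
                   "spatial congruity": 0.5, "temporal congruity": 0.7}
activity_rate   = {"measurement/method": 0.9, "source specificity": 0.7,
                   "spatial congruity": 0.6, "temporal congruity": 0.8}

# Per-attribute emissions score: assumed here to be the product of the
# emission factor and activity rate scores for that attribute.
emissions = {a: emission_factor[a] * activity_rate[a] for a in ATTRIBUTES}

def composite(scores):
    """Composite score: the average of the four attribute scores."""
    return sum(scores.values()) / len(scores)

print(round(composite(emission_factor), 3))  # -> 0.65
print(round(composite(activity_rate), 3))    # -> 0.75
print(round(composite(emissions), 3))        # -> 0.5
```

Because each attribute score is at most 1, multiplying the two parameter scores means the emissions composite can never exceed either of its inputs, mirroring the principle that an estimate inherits the weaknesses of both its emission factor and its activity data.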

The DARS approach, when applied systematically by inventory analysts, can be used to provide a measure of the merits of one emission estimate relative to another. The proposed inventory data rating system cannot guarantee that an emission inventory with a higher overall rating is of better quality, or more accurate, or closer to the true value. The inventory with the higher overall rating is, however, likely to be a better estimate, given the techniques and methodologies employed in its development.

A software implementation of DARS has recently been produced, greatly enhancing its applicability as a practical tool for assessing emission database quality.