Data collection and LCA‑databases

We develop comprehensive national and regional LCA-databases based on economic and environmental statistics. For individual customers we develop proprietary databases, often combining input-output data with specific process data in so-called hybrid databases.


Our data collection strategy

To reduce the time and cost of a life cycle study, data collection should be limited to the data that are most important for answering the questions asked. It is for these key data that high quality (validation, level of detail, completeness, and representativeness) is required; for the remaining parts of the study, default data may be applied, or the data may even be disregarded altogether. Thus, the identification of key data can be used to guide data collection in a specific study.

The identification of key data is based on the following characteristics:

  • Key data are relatively large and have a large variation.
  • Key data explain causes of variation or other causal relations.

Key data are not just those data that are large, but those data for which there is also a large improvement potential. A large improvement potential may be revealed by a large variation in the data (although variation may also be unavoidable).
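The screening idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `screen_key_data`, the two cut-off values, and all numbers are hypothetical, the contributions are assumed to be positive and expressed in one common unit, and the coefficient of variation is used as a simple stand-in for "large variation". It does not judge whether the variation is avoidable.

```python
from statistics import mean, pstdev


def screen_key_data(observations, share_cutoff=0.05, cv_cutoff=0.25):
    """Flag contributions that are both relatively large and highly variable.

    observations maps a name to a list of observed values (all positive,
    all in the same unit). The cut-offs are arbitrary illustrative choices.
    """
    means = {name: mean(values) for name, values in observations.items()}
    total = sum(means.values())
    candidates = []
    for name, values in observations.items():
        share = means[name] / total          # relative size
        cv = pstdev(values) / means[name]    # relative variation
        if share >= share_cutoff and cv >= cv_cutoff:
            candidates.append((name, share, cv))
    # Rank by size times variation, largest first
    return sorted(candidates, key=lambda row: row[1] * row[2], reverse=True)


# Hypothetical observations from three sites, in one common unit
observations = {
    "electricity": [40, 60, 80],      # large and variable
    "packaging": [30, 30, 31],        # large but almost constant
    "cleaning agents": [1, 2, 3],     # variable but small
}
key_data = screen_key_data(observations)
```

With these made-up numbers only the electricity entry passes both cut-offs; the other two are candidates for default data.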

Knowledge of the determining factors for the variation may be just as important as the data themselves, since it may allow the data to be modelled for the specific situation. For example, if it is known that the energy consumption in a dairy is largely determined by the size of the dairy and the degree of process integration (incl. recovery of heat), the key energy consumption data for dairies must be expressed not only as energy per kg of processed milk, but also in relation to dairy size and degree of process integration.
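As a sketch of what "expressing data in relation to their determining factors" could look like, the function below returns energy per kg of milk as a function of dairy size and heat recovery instead of a single number. The functional form (a power law for size, a linear saving for heat recovery), the function name, the parameter names, and the example values are all assumptions made for illustration; real coefficients would have to be fitted to measured data.

```python
def energy_per_kg(size, heat_recovery_share, base, size_exponent, recovery_saving):
    """Illustrative model of dairy energy use per kg of processed milk.

    size                -- dairy throughput (any consistent unit)
    heat_recovery_share -- 0.0 (no heat recovery) to 1.0 (full integration)
    base, size_exponent, recovery_saving -- coefficients to be fitted to
    real data; nothing here is an empirical value.
    """
    scale_effect = size ** size_exponent
    integration_effect = 1 - recovery_saving * heat_recovery_share
    return base * scale_effect * integration_effect


# Hypothetical coefficients, chosen only to show the behaviour of the model
small_basic = energy_per_kg(1_000, 0.0, base=2.0, size_exponent=-0.1, recovery_saving=0.3)
large_integrated = energy_per_kg(100_000, 1.0, base=2.0, size_exponent=-0.1, recovery_saving=0.3)
```

A dataset stored in this form can be adjusted to the dairy actually studied, which a single average value cannot.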

Such causal relations may be particularly important when they determine what processes should be included in a study, including influence on other systems. For example, if milk is identified as a major determining factor for the number of consumer shopping trips, a change in the distribution (e.g. provision of home deliveries of milk) might lead to a reduction of the number of shopping trips, which could be of larger environmental importance than the entire remaining life cycle of the milk itself.

When a specific desired dataset is not available, you may decide to use another dataset as a default, with or without adjustments, e.g. data older than desired, data from a different geographical location, or data from a slightly different technology. Data from statistical input-output databases may be applied for less important parts of the product systems. The additional uncertainty introduced into the study by the use of such data must be assessed, and when more than one default dataset is available (e.g. an old dataset from the right region and a more recent dataset from an adjacent region), the best of these datasets is the one that (with possible adjustments) minimises the additional uncertainty.
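The selection rule in the last sentence can be sketched as follows. The sketch assumes that the additional uncertainty of each candidate has already been assessed as a set of independent components expressed as variances, so that they can simply be added; the function names, component names, and numbers are hypothetical.

```python
def additional_uncertainty(components):
    """Sum independent uncertainty components expressed as variances."""
    return sum(components.values())


def best_default(candidates):
    """Return the name of the candidate with the smallest additional uncertainty."""
    return min(candidates, key=lambda name: additional_uncertainty(candidates[name]))


# Hypothetical assessment of two default datasets (values are made up)
candidates = {
    "old data, right region": {"temporal": 0.04, "geographical": 0.00},
    "recent data, adjacent region": {"temporal": 0.00, "geographical": 0.01},
}
best = best_default(candidates)
```

With these made-up values the recent dataset from the adjacent region is preferred; a different assessment of the components could reverse the choice.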

Data management

Having invested considerable effort in obtaining adequate data, you will want these data to remain available for future use and documentation. Adequate procedures are therefore needed for documenting the data collection. The “data” concept covers both qualitative and quantitative information as well as “meta-data”, i.e. data about the data, e.g. information on how the data were obtained and on their validity and limitations.
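A minimal sketch of a record that keeps a value together with its meta-data is shown below. The class name, the field names, and the example content are hypothetical and do not follow any standardised exchange format; JSON is used here only as a convenient way to show that such a record can be stored and read back without loss.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class DataRecord:
    """One data point plus the meta-data needed to reuse it later."""
    name: str
    value: float
    unit: str
    source: str              # where the value came from
    collection_method: str   # how the value was obtained
    reference_year: int
    region: str
    limitations: str = ""    # known restrictions on validity


def to_json(record):
    """Serialise a record for storage."""
    return json.dumps(asdict(record), indent=2)


def from_json(text):
    """Rebuild a record from its stored form."""
    return DataRecord(**json.loads(text))


# Hypothetical example record (all content is made up)
record = DataRecord(
    name="Electricity use, dairy",
    value=0.12,
    unit="kWh per kg milk",
    source="site measurement",
    collection_method="annual meter reading divided by annual throughput",
    reference_year=2000,
    region="DK",
    limitations="single site, one year only",
)
restored = from_json(to_json(record))
```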

Efficient data documentation, storage, and retrieval for later use require an electronic data format, preferably a standardised one that allows export and import between different software and databases. For this purpose we developed the SPOLD format for LCI data, in co-operation with a large number of LCI data suppliers. A modified version of this format (EcoSPOLD) was later adopted by the leading database supplier Ecoinvent.

A guideline for LCA data collection systems was developed as part of the CASCADE project. This now serves as a key input to managers of national and industry database initiatives.