PEF weighed and found wanting

Our guest-blogger today is Maartje Sevenster, Sevenster Environmental, who has followed and analysed the process leading to the recently published weighting method for the EU Product Environmental Footprint (PEF). Here she shares her serious reservations about the process and the results.

A weighting set for the EU Product Environmental Footprint (PEF) was published last month. The weighting factors have been developed by the Joint Research Centre (JRC) via an elaborate approach that attempts to separate value-based weighting into objective factors. Nevertheless, the result is a poor, semi-qualitative approximation that mixes characterization, distance-to-target weighting, panel weighting, and uncertainty. All in all, the approach comes across as a black box of flawed mathematical operations.

A PEF consists of a characterization result for each of 16 impact categories, a corresponding set of weighted normalised results, and a single-score result. The use of this weighting method and the resulting single score will be required in all PEF studies and is meant to facilitate interpretation.
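
To see how these pieces fit together: the single score is, in essence, a weighted sum of normalised results. Below is a minimal sketch of that arithmetic in Python; the category names, normalisation factors, and weights are invented for illustration, and it assumes normalisation by simple division with per-category factors.

```python
# Sketch of the single-score arithmetic: characterised results are
# normalised per impact category, weighted, and summed.
# Category names and all numbers are invented for illustration.

def single_score(characterised, normalisation_factors, weights):
    """Weighted sum of normalised characterisation results."""
    return sum(
        weights[cat] * characterised[cat] / normalisation_factors[cat]
        for cat in characterised
    )

# Two hypothetical categories out of the sixteen:
print(single_score(
    characterised={"climate_change": 120.0, "acidification": 0.8},
    normalisation_factors={"climate_change": 8000.0, "acidification": 50.0},
    weights={"climate_change": 0.25, "acidification": 0.05},
))  # 0.00455 with these toy numbers
```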

The final weighting set is an average of three independently derived sets, with the average multiplied by a robustness factor. Two of the three sets are derived via traditional panel weighting; the third is based on a hybrid ‘evidence-based’ approach.
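
Schematically, and with all numbers invented, the aggregation described above amounts to something like the following sketch; whether and how JRC renormalises the weights across categories afterwards is left aside here.

```python
# Sketch of the final aggregation for one impact category: average the
# three weighting sets, then multiply by the robustness factor.
# All values are invented; renormalisation across categories is omitted.

def final_weight(panel_a, panel_b, evidence_based, robustness):
    return robustness * (panel_a + panel_b + evidence_based) / 3

print(final_weight(panel_a=6.0, panel_b=5.5, evidence_based=7.0,
                   robustness=0.67))  # about 4.13
```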

The term ‘evidence-based’ has a feel of objectivity about it, but the first part of the approach applies weighting to issues that would be better investigated by natural science, and the second part – which necessarily must be based on subjective preferences – violates basic requirements for good valuation practice.

In the first part of the JRC approach, an expert panel was asked to score the following seven characteristics of impacts on a scale from 1 to 100 (a sketch of the resulting scores follows the list):

  • Spread of impact
  • Time span of generated impact
  • Reversibility of impact
  • Level of impact compared to planetary boundary
  • Severity of effect on human health
  • Severity of effect on ecosystem quality
  • Severity of effect on resources availability
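
To make the structure of this first part concrete: the panel's output is in effect a matrix with one row per impact category and one 1-100 score per characteristic. A sketch with entirely invented scores:

```python
# Sketch of the part-one panel output: one 1-100 score per characteristic
# for each of the 16 impact categories. All scores here are invented.

panel_scores = {
    "climate change": {
        "spread": 100, "time span": 80, "reversibility": 100,
        "planetary boundary": 90, "human health": 70,
        "ecosystem quality": 80, "resource availability": 20,
    },
    # ... 15 further impact categories
}
```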

It is immediately obvious that many of these factors, such as time span, are already covered by the commonly used natural-science-based LCA characterization models, even if they are not always made explicit in the end results. In fact, Annex 13 of the JRC weighting report includes a similar criticism by Mark Goedkoop: “using a panel to link mid to endpoint is really weird. This means we replace science by the verdict of panellist. I am quite aware about some of the uncertainties in the mid to end point factors, but I always thought we prefer science over the laymen’s view. Uncertain science is always better than no science at all.” Another example: the use of GWP100 for characterizing climate change impacts implies a natural-science-based assessment of the timing of impacts when comparing greenhouse gas emissions. Is it then valid to allow an expert panel to subsequently assign a zero weight to time span as a weighting factor, which was theoretically possible for the experts in this approach?

Only two of the seven factors are not part of LCA characterization models, namely reversibility and level compared to planetary boundary. Reversibility is also the only factor that is intrinsically categorical and therefore an excellent illustration of the artificiality of the approach. Is it valid to say that an irreversible impact is (only) 100 times worse than an impact that can be reversed by natural processes within one year? It would certainly be useful to discuss these factors prior to determining a multi-criteria-type panel weighting per impact category.

Factors such as reversibility and time span may well play a role in expert judgements of the severity of one impact as compared to others. However, the JRC approach first introduces a categorical scaling for each of those factors, turning them into artificial ordinal variables. For instance, for time span the following categories are used (see the sketch after this list):

  • Momentary [less than 1 month] = score of 1
  • Very short term [more than 1 month and less than 1 year] = score of 20
  • Short term [1-3 years] = score of 40
  • Medium term [4-30 years] = score of 60
  • Long term [31-100 years] = score of 80
  • Very long term [more than 100 years] = score of 100
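
Written out as code, the scale above is a simple lookup from an impact's actual duration to an ordinal score, which makes the information loss easy to see. The function and its duration input are only an illustration, not part of the JRC method:

```python
# The time-span scale above as a lookup from duration (in years) to its
# ordinal score. The thresholds follow the list; the function itself is
# only an illustration of what the scaling does to the underlying data.

def time_span_score(years):
    if years < 1 / 12: return 1     # momentary: less than 1 month
    if years < 1:      return 20    # very short term
    if years <= 3:     return 40    # short term
    if years <= 30:    return 60    # medium term
    if years <= 100:   return 80    # long term
    return 100                      # very long term

# A one-day impact and a 100,000-year impact differ by a factor of about
# 3.7e7 in duration, but by only a factor of 100 in score:
print(time_span_score(1 / 365), time_span_score(100_000))  # 1 100
```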

Even though natural science can tell us quite precisely that some impacts are instantaneous and others may be spread out over hundreds of thousands of years, such as those of ionising radiation, the difference is here reduced to an arbitrary factor of 100. The bottom line is that most of the seven factors can be evaluated by natural science, albeit with considerable uncertainty, and do not need expert weighting. The scaling wipes out all scientific evidence and, along with it, any understanding of what a resulting indicator might really mean.

The second part of the JRC expert weighting procedure is a more traditional expert panel judgement of the relative importance of the seven factors. This leads us to another troubling aspect: the seven factors are not completely independent, as is required for proper evaluation of (compensatory) weights. The “level compared to planetary boundary”, especially, overlaps with all the other factors to at least some extent. Moreover, averaging categorical variables is mathematically meaningless, even when the categories appear to be “numerical”, because the result depends entirely on the arbitrary numbers attached to the category labels.
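
A small example of why this matters: any monotone relabelling of the six time-span categories is an equally valid ordinal encoding, yet the choice of labels can flip which of two groups of observations has the higher mean. The two scales below are illustrative:

```python
# Averaging ordinal scores is only meaningful if the numbers carry
# interval information. Two monotone encodings of the same six ordered
# categories can flip which group has the higher mean. Illustrative only.

labels = ["momentary", "very short", "short", "medium", "long", "very long"]
scale_a = dict(zip(labels, [1, 20, 40, 60, 80, 100]))  # the JRC-style scores
scale_b = dict(zip(labels, [1, 2, 3, 4, 5, 100]))      # same order, other spacing

group_x = ["momentary", "very long"]
group_y = ["long", "long"]

for scale in (scale_a, scale_b):
    mean_x = sum(scale[c] for c in group_x) / len(group_x)
    mean_y = sum(scale[c] for c in group_y) / len(group_y)
    print(mean_x, mean_y)  # scale_a: 50.5 vs 80.0; scale_b: 50.5 vs 5.0
```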

Finally, this weighting set from the expert panel is averaged with two other weighting sets derived via a different approach. This seriously undermines the transparency of the weighting, which should at all times be straightforward to interpret, not just a set of numbers to arrive at a single score. The problem is further aggravated by the use of a robustness factor to assess what is in essence uncertainty. Again, this factor involves three arbitrarily scaled ordinal variables that are averaged. The report shows some inconsistencies regarding the final choice of this robustness factor, which JRC itself apparently does not consider very robust: toxicity impacts have been excluded from the benchmark calculations despite already having a very low weighting due to their low estimated robustness. The semi-numerical approach gives a false sense of objectivity to this “uncertainty assessment”.
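
As a hypothetical sketch of that construction, assuming three ordinal aspects averaged and rescaled to a multiplier (the aspect names, levels, and rescaling are assumptions for illustration; the report defines its own scales):

```python
# Hypothetical sketch of a robustness factor built by averaging three
# arbitrarily scaled ordinal variables. Aspect levels and the rescaling
# to a 0-1 multiplier are assumptions, not JRC's actual definitions.

LEVELS = {"low": 1, "medium": 2, "high": 3}

def robustness_factor(aspect_levels):
    mean_level = sum(LEVELS[a] for a in aspect_levels) / len(aspect_levels)
    return mean_level / max(LEVELS.values())

# A category judged weak on all three aspects ends up heavily discounted:
print(robustness_factor(["low", "low", "medium"]))  # ~0.44
```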

To summarize, the final weighting set is the result of so many mathematically questionable averaging, scaling, and multiplication steps that it is hard to take seriously. To allow for proper interpretation of results, weighting sets should be based on clear and transparent principles. It is preferable to use a single-step conversion reflecting a fairly limited but unambiguous perspective, such as weighting based on damage costs.

Previous blog posts on PEF:

The clock is ticking for PEF
Harnessing the End‑of‑Life Formula