# How Hausfather et al (2019) mis-estimate the climate sensitivity of the IPCC’s First Assessment Report

**Background**

Climate sensitivity is the increase in atmospheric temperatures caused by an increase in the atmospheric amount or concentration of greenhouse gases. Since carbon dioxide (CO2) is the main forcing agent among man-made greenhouse gases, climate sensitivity has usually been expressed as a function of CO2. For a given increase in CO2, a more sensitive climate will respond with greater increases in air temperature. And, since the forcing or warming effect of CO2 is approximately the same each time its concentration doubles, for reasons of convention sensitivity usually refers to the warming effect caused by a doubling of atmospheric CO2.

If humanity’s only influence on the Earth’s climate were the increase in CO2 concentrations, calculating climate sensitivity would be easy, or at least easier than it is in practice. As long as you’re able to account for natural variations in the Earth’s climate, the only things you need to know are:

· How much CO2 concentration has increased. This has been a solved problem since record-keeping began at the Mauna Loa Observatory, in 1959.

· How much temperature has risen. There is much greater uncertainty around temperature change than around CO2 change, but still, if the timeframe is long enough (e.g. a century) different sources tend to agree.

Unfortunately, CO2 is not the only greenhouse gas created by mankind. Most notably, we have also increased concentrations of methane (CH4) and chlorofluorocarbons (CFCs), along with nitrous oxide (N2O), ozone (O3), and others. How do we account for those in a calculation of climate sensitivity?

Please notice that the calculations below refer to *transient* climate sensitivity, more commonly called transient climate response (TCR). This metric refers to the warming that happens at the same time as forcing from greenhouse gases increases. Equilibrium climate sensitivity (ECS) is higher, because even when forcing is stable you can still expect further warming while the ocean, which absorbs much of the extra heat, gradually approaches equilibrium. But this article will deal exclusively with TCR. So in this article, if I mention only ‘sensitivity’, I’m talking about TCR.

The standard practice in climate science to estimate TCR has been to calculate the effect of each of the different man-made greenhouse gases (and other agents such as aerosols) on the Earth’s radiative budget, and then add them up. For example, let’s say the estimated radiative forcing caused by a doubling of CO2 concentrations is 4 watts per square meter (W/m2). Imagine that we had seen over the historical period a forcing of 2 W/m2 from CO2, along with 1 W/m2 from methane. If so, we’d say that total forcing (3 W/m2) is equivalent to 75% of the forcing that comes from a doubling of CO2. Now, suppose that these 3 W/m2 had led to a warming of 1ºC. If that were the case, we’d calculate climate sensitivity as: (4 / 3) * 1ºC = 1.33ºC.

However, our estimates of the forcing impact of CO2 and other greenhouse gases are always in flux. Following the prior example, if it turned out that forcing from CO2 was 5W/m2 per doubling, then forcing from CO2 over the historical period would be not 2 W/m2, but 2.5W/m2. Methane forcing would be unchanged at 1W/m2, but including it would bring the total to 3.5W/m2, rather than the 3W/m2 we calculated before. And the calculation for climate sensitivity would be: (5 / 3.5) * 1ºC = 1.43ºC.

To offer another example, let’s say our estimate of forcing from CO2 was spot-on initially: 4W/m2 when concentrations double, 2 W/m2 over the historical period. But imagine that, despite getting CO2 forcing right, we had under-estimated methane. If methane forcing had been 1.5W/m2 rather than 1W/m2, then total forcing would have been 3.5W/m2, and climate sensitivity would be: (4 / 3.5) * 1ºC = 1.14ºC.
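The arithmetic in these three made-up cases can be sketched in a few lines of Python. The function name `tcr_from_forcing` is hypothetical, and the inputs are the same illustrative numbers used above, not real data:

```python
def tcr_from_forcing(f2x, forcings, delta_t):
    """Scale observed warming up to a full doubling of CO2.

    f2x      -- assumed forcing from a doubling of CO2 (W/m2)
    forcings -- historical forcing from each agent, e.g. [CO2, methane] (W/m2)
    delta_t  -- observed warming over the same period (degrees C)
    """
    total = sum(forcings)
    return (f2x / total) * delta_t


# The three made-up cases from the text: (F_2x, [CO2 forcing, methane forcing])
cases = {
    "baseline": (4.0, [2.0, 1.0]),
    "higher F_2x": (5.0, [2.5, 1.0]),   # CO2 is still half a doubling: 0.5 * 5
    "more methane": (4.0, [2.0, 1.5]),
}

for name, (f2x, forcings) in cases.items():
    print(f"{name}: {tcr_from_forcing(f2x, forcings, 1.0):.2f} C")
```

The temperature change is identical in all three cases; only the forcing assumptions differ, and that alone moves the estimate from 1.14ºC to 1.43ºC.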

These numbers are all made-up, but the revisions to forcing estimates over the decades have been of similar magnitude. And as you see, even if you know temperature change with complete accuracy, the uncertainty in forcing means the calculation of climate sensitivity is likewise uncertain.

The point I want to make is that the value of forcing that arises from a doubling of CO2, denoted F_2x in the scientific literature, is unknown. We have an estimate today, but we had different estimates decades ago and indeed a paper published this year may offer a slightly different number than one published five years ago. And, in order to calculate climate sensitivity, you have to be sure about which value of F_2x you’re using.

Climate sensitivity is not the amount of warming we get with each watt-per-square-meter, but rather the amount of warming caused by the addition of CO2. Exactly how much radiative forcing is caused by that addition of CO2 is a matter of scientific interest, but it doesn’t represent climate sensitivity. So let’s see how to calculate sensitivity — and how not to.

**What is the value of F_2x?**

To answer that question it helps to first define a word: logarithmic.

If you read about climate change online you may come across statements like ‘the effect of CO2 is logarithmic, not linear’. To understand that, think about a natural logarithm, or ln. You don’t need to be a mathematical genius; indeed you don’t even need a calculator. What you need to know is that the value of ln (x) increases by the same amount each time x doubles:

ln (2) = 0.693

ln (4) = 1.386, which is 0.693 more than ln (2)

ln (8) = 2.079, which is 0.693 more than ln (4)

And so on.

The forcing that arises from a doubling of atmospheric CO2, which is to say F_2x, is normally defined as ln (2) multiplied by some number. In recent decades the ‘consensus’ number has been 5.35. In a formula, this means:

F_2x = 5.35 * ln (2) = 3.71W/m2.
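Both claims can be checked quickly, assuming Python’s standard `math.log`, which returns the natural logarithm:

```python
import math

# ln(x) grows by the same amount, ln(2), every time x doubles
for x in (2, 4, 8):
    print(f"ln({x}) = {math.log(x):.3f}")

# Simplified CO2 forcing expression with the multiplier used in recent decades
ALPHA = 5.35  # W/m2
f_2x = ALPHA * math.log(2)
print(f"F_2x = {f_2x:.2f} W/m2")
```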

So if you read that the forcing caused by a doubling of CO2 is 3.7W/m2, that’s how the number is calculated. But the estimated multiplier has changed along with our understanding of the Earth and the atmosphere.

Now, what’s the point of ln (x)? Isn’t it better to say simply that F_2x = 3.7W/m2? Well, the forcing from greenhouse gases has not yet reached the value equivalent to F_2x, so when discussing historical climate change you’ll virtually always deal with fractions of a doubling. And that’s where ln (x) helps.

For instance, let’s say CO2 concentrations increased from 300 parts per million (ppm) to 360 ppm over the period you’re studying. A naïve calculation would find that the forcing caused by a CO2 increase of 60 ppm is equivalent to 20% of the forcing caused by a doubling of CO2, because 60 / 300 = 0.2. However, if you google ln (1.2), you get a value of 0.182. If you then divide this by ln (2), you’ll find that the forcing involved in such a CO2 increase is 26.3% of F_2x, not 20%.
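The same comparison as a minimal sketch; `fraction_of_doubling` is a hypothetical helper. Notice that the multiplier (5.35 or any other) never appears: it cancels out when forcing is expressed as a share of F_2x.

```python
import math


def fraction_of_doubling(c_initial, c_final):
    """Forcing from a CO2 change, expressed as a share of F_2x."""
    return math.log(c_final / c_initial) / math.log(2)


c0, c1 = 300.0, 360.0
linear = (c1 - c0) / c0                  # naive linear share
logarithmic = fraction_of_doubling(c0, c1)
print(f"linear: {linear:.1%}, logarithmic: {logarithmic:.1%}")
```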

So the difference between logarithmic and linear matters. Increases of CO2 concentrations that don’t reach a doubling mean that forcing is greater than you’d estimate if you made a linear extrapolation. Perhaps more importantly, expressing forcing in terms of ln (x) and fractions of F_2x avoids the confusion caused by W/m2. Keep reading to find out what problems W/m2 causes.

**What is the transient sensitivity of old climate models?**

This is a modern question. Until the 1980s, climate models did not even distinguish between transient and equilibrium sensitivities: they estimated a temperature increase for a doubling of CO2, and in the simulations atmospheric warming instantly followed forcing. Therefore, for old climate models TCR and ECS were the same.

Even when climate models started to simulate the ocean, and the delayed atmospheric warming caused by oceanic heat uptake, the papers and reports describing those simulations did not specify or calculate the models’ TCR. In the case of the IPCC, its first report in 1990 already gave both a ‘best’ or most likely value for ECS, and a ‘likely’ range for that metric (i.e. a confidence interval). But the IPCC did not offer a confidence interval for TCR until the Third Assessment Report, in 2001.

The internet and the scientific literature are full of articles and papers comparing the temperature projections made by climate models in the past with the evolution of temperatures in the observational records. However, there is a growing consensus that such a simple comparison is inadequate, for the climate modellers of the past couldn’t have known the exact amounts of CO2, methane, and so on that people would emit into the atmosphere. This mis-estimation of emissions, combined often with a mis-estimation of the fraction of emissions that would remain in the atmosphere, meant that the actual concentrations of greenhouse gases have usually diverged from those of model projections. That’s why recent comparisons of models and observations have tried to account for possible differences in the greenhouse forcing assumed by climate models of the past and the greenhouse forcing that actually happened.

A notable paper in this regard is Hausfather et al (2019), which I will refer to as H19. This paper looks at a series of past climate models, or more accurately past climate projections. The paper estimates how much warming and forcing these old projections expected, and thus calculates an ‘implied TCR’ for the models. As an illustration: if a model projected a forcing of 2W/m2, Hausfather compares this with an assumed F_2x of 3.71W/m2, so the model’s forcing amounts to 2 / 3.71 of a doubling. This would mean that, if the temperature change projected by the model were 1ºC, the model’s implied TCR would be: (3.71 / 2) * 1ºC ≈ 1.86ºC.

Here’s the bottom panel of Hausfather’s Figure 2; the ‘C’ is meant to represent degrees centigrade, i.e. ºC. The red circles are the authors’ estimates of the models’ TCR, while the blue circles represent their estimate of real-world TCR over equivalent periods:

(Side note: this is not a serious way to estimate real-world TCR. You shouldn’t calculate one TCR for 1970–2000, another for 1990–2017, etc. because natural variability will affect the results far more than if you used a longer period. Besides, using short periods widens the confidence interval — the estimation is less precise. In fact, just a few months before publishing H19, Hausfather himself co-authored a paper that estimated a single TCR for the entire historical record, with a result both lower and more constrained than the intervals reported by H19. Nevertheless, the primary contribution of H19 is its estimation of *model* TCR, and that’s what I’ll focus on).

There is a lot to digest in the paper. First, a criticism: the calculations for the oldest projections, which involve only changes in CO2, are unnecessary. There is no need to convert from changes in CO2 concentrations to forcing expressed in W/m2. Instead, everything could (and as we’ll see should) have been expressed in terms of CO2 doublings. So, if a model projected an increase in CO2 concentrations of, let’s say, 50%, H19 could have used that percentage (along with the model’s temperature projection) to calculate model TCR without involving W/m2 at all.
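To illustrate the point, here is a sketch that computes an implied TCR from a concentration ratio and a warming figure alone. The 50% CO2 increase and 1ºC of warming are hypothetical inputs, and `tcr_from_concentrations` is an illustrative name, not a function from H19’s code. The second loop routes the same calculation through W/m2 with two different multipliers, to show that for a CO2-only projection the multiplier cancels out as long as it is used consistently:

```python
import math


def tcr_from_concentrations(co2_ratio, delta_t):
    """Implied TCR from the CO2 concentration ratio and the warming only.

    co2_ratio -- final / initial CO2 concentration in the projection
    delta_t   -- projected warming over the same period (degrees C)
    """
    doublings = math.log(co2_ratio) / math.log(2)
    return delta_t / doublings


# Hypothetical projection: CO2 rises by 50% and the model warms by 1 C
print(f"{tcr_from_concentrations(1.5, 1.0):.2f} C")

# Going through W/m2 gives the same answer whatever multiplier is assumed
for alpha in (5.35, 6.3):
    forcing = alpha * math.log(1.5)   # projected CO2 forcing, W/m2
    f_2x = alpha * math.log(2)        # forcing from a doubling, W/m2
    print(f"alpha={alpha}: {(f_2x / forcing) * 1.0:.2f} C")
```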

Another issue with the oldest projections is that several of them are best described not as different models, but rather different scenarios using the same underlying model, which is that of Manabe & Wetherald (1967). Since Manabe & Wetherald’s estimate of TCR was 2.36ºC, it’s not surprising that projections based on their model cluster around that value, with small differences partly caused by rounding issues. In the case of Sawyer 72, the lower ‘implied TCR’ of the forecast is not the result of any physical insight, but rather a confusion on the part of the author, who seems to have assumed a linear effect of CO2, rather than a logarithmic effect, and so predicted the wrong temperature increase in a case in which CO2 increased without reaching the doubling mark. To be clear: the warming that Sawyer 72 expects from a doubling of CO2 is not the 1.9ºC stated in H19’s Figure 2, but rather 2.4ºC. And this number isn’t original: Sawyer cites Manabe’s 1970 paper, which in turn is based on Manabe & Wetherald 1967.

I suppose that ‘seventeen models’ sounds better than ‘fourteen models’, but it’s hard to see the scientific value of Hausfather’s calculations in this regard.

For later models H19 offers quite a bit of insight. In particular, I believe its analysis of Hansen’s projections, both 1981 and 1988, is the most thorough yet. I’m not totally certain the paper’s calculations for Hansen’s models are correct, but they’re definitely an improvement on previous articles, which usually criticized Hansen without even considering the difference in forcing between Hansen’s projections and the real world.

I’ll close this section by emphasizing that H19 *calculates* forcings involved in climate projections of the 1970s and 80s. The problem is that for the projections of the IPCC, which started in 1990, H19 does something different.

**Hausfather et al take forcing numbers from the IPCC reports as given**

In its Supplementary Information, H19 describes how it obtained forcings from the IPCC’s First Assessment Report (FAR):

“External forcing values for the EMB were digitized from Figure 6 (also Figure A.6) using the business-as-usual scenario”

Indeed, I digitized Figure A.6 (which can be seen immediately below) and the values match what Hausfather reports. The forcing numbers from FAR were not ‘adjusted’ in any way by H19.

(The chart is tilted, but it still provides better estimates than Figure 6, as the latter has a thick line representing the Business-as-usual scenario. By the way, EBM stands for ‘energy balance model’; the ‘EMB’ in the quote is a typo.)

In the case of the Second Assessment Report (SAR), I haven’t digitized the values, but H19 reports much the same:

“We digitized EBM values from Figure 19”

For the Third Assessment Report (TAR), the IPCC provided a table:

“Decadal values for both temperatures, total forcings, and CO2 used in the EBM featured in the TAR main text were obtained from Appendix I: https://www.ipcc.ch/site/assets/uploads/2018/03/TAR-APPENDICES.pdf

These decadal values were transformed into annual estimates via linear interpolation.”

So what is the problem?

H19 uses a value of 3.71 W/m2 as F_2x. More specifically, its Equation 7 describes CO2 forcing as 5.35 multiplied by ln (x), where x is the ratio between CO2 concentration at the end of the period and at the beginning. So, if concentration doubles, x = 2, ln (x) = 0.693, and F_2x = 3.71 W/m2.

But this is not the value of F_2x used by the IPCC’s First and Second Assessment Reports. In Table 2.2, page 52, FAR gives CO2 forcing as: 6.3 * ln (CO2 final / CO2 initial). You can work out how many W/m2 a doubling corresponds to, but it’s enough to notice that 6.3 / 5.35 = 1.177. This means **F_2x is 17.7% higher in the IPCC’s report than is assumed by Hausfather**.

For SAR, the formula is at the bottom of page 320; it’s stated differently but the result is the same. And indeed the Third Assessment Report confirms F_2x for the previous reports on page 356: “IPCC (1990) and the SAR used a radiative forcing of 4.37 W/m2 for a doubling of CO2 calculated with a simplified expression.”
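The two multipliers can be compared directly. This sketch assumes the simplified expression F = α * ln (C / C0) used by both H19 and FAR, with only α differing:

```python
import math

ALPHA_H19 = 5.35  # multiplier in H19's Equation 7 (W/m2)
ALPHA_FAR = 6.3   # multiplier in FAR's Table 2.2 (W/m2)

f2x_h19 = ALPHA_H19 * math.log(2)
f2x_far = ALPHA_FAR * math.log(2)

print(f"H19 F_2x: {f2x_h19:.2f} W/m2")
print(f"FAR F_2x: {f2x_far:.2f} W/m2")
print(f"FAR / H19: {f2x_far / f2x_h19:.4f}")
```

The FAR value matches the 4.37 W/m2 quoted by the Third Assessment Report.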

Let’s work through the math and verify how exactly H19 arrived at its results. From H19’s Figure 2, it seems the ‘implied TCR’ calculated for FAR’s Business-as-usual scenario (the only scenario that H19 analyzes) is 1.6ºC. H19 reports:

· A temperature trend of 0.261ºC per decade

· A forcing rate of 0.607 W/m2 per decade. (Rate, not trend — see more info on this in the Data section)

If you divide 3.71 W/m2 by 0.607 W/m2, you get a value just above 6.1. In other words: per Hausfather’s numbers, F_2x is 6.1 times greater than FAR’s decadal forcing. Therefore, to convert from FAR’s decadal temperature trend of 0.261ºC to TCR, you have to multiply 0.261ºC by 6.1. And if you do that you arrive at an implied TCR of **1.59ºC or 1.6ºC** — just as Figure 2 shows.

But, since FAR’s actual F_2x is 4.37 W/m2, the multiplier instead should be 4.37 / 0.607 = 7.19. So FAR’s actual ‘implied TCR’ is 7.19 * 0.261ºC = **1.88ºC**.
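The calculation above, written as a small function. `implied_tcr` is a hypothetical name; the inputs are the temperature trend and forcing rate H19 reports for FAR, and the only thing that changes between the two calls is F_2x. Without intermediate rounding, the results come out at about 1.60ºC and 1.88ºC:

```python
def implied_tcr(temp_trend, forcing_rate, f2x):
    """Implied TCR: the warming trend times the number of decades of forcing
    it takes to add up to one doubling of CO2.

    temp_trend   -- projected warming trend (degrees C per decade)
    forcing_rate -- projected forcing rate (W/m2 per decade)
    f2x          -- forcing from a doubling of CO2 (W/m2)
    """
    return temp_trend * (f2x / forcing_rate)


TEMP_TREND = 0.261    # FAR business-as-usual, as reported by H19
FORCING_RATE = 0.607  # FAR business-as-usual, as reported by H19

print(f"with H19's F_2x: {implied_tcr(TEMP_TREND, FORCING_RATE, 3.71):.2f} C")
print(f"with FAR's F_2x: {implied_tcr(TEMP_TREND, FORCING_RATE, 4.37):.2f} C")
```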

I haven’t looked in detail into SAR’s numbers, but in all likelihood the same upwards revision of nearly 18% should be made.

Additionally, H19 states that “FAR forcings increased 55% faster [than observations]”. Although this is true in raw W/m2, FAR’s forcing projections using that metric are so high compared to the observations partly because its F_2x is also high by modern standards. Correcting for this issue eliminates much of the ‘overestimate’: 1.55 / (6.3 / 5.35) = 1.31. Put another way: when forcing is expressed as a share of F_2x, not in W/m2, FAR’s Business-as-usual scenario over-estimated forcing by 31% rather than 55%.
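A sketch of this correction, under the assumption that H19’s observed forcings correspond to the 5.35 multiplier while FAR’s correspond to 6.3. Dividing the raw ratio by the ratio of the two F_2x values re-expresses both series as fractions of their own doubling:

```python
RAW_RATIO = 1.55        # FAR forcing growth relative to observations, raw W/m2 (H19)
F2X_RATIO = 6.3 / 5.35  # FAR's F_2x relative to the F_2x assumed by H19

# Re-express both forcings as shares of their own F_2x before comparing them
ratio_in_doublings = RAW_RATIO / F2X_RATIO
print(f"over-estimate in doublings: {ratio_in_doublings - 1:.1%}")
```

Unrounded, this gives about 31.6%, in line with the figure above.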

As for TAR, it used the same F_2x as H19, so the fact that the latter took the forcing numbers as-is did not bias the estimate of TCR.

**Why did the First Assessment Report over-estimate forcings from 1990 on?**

This is only tangential to H19, but the issue builds on my previous article on why FAR’s business-as-usual projection over-estimated forcing. In that article I mostly dealt with the accounting of CO2 emissions and concentrations in FAR; my discussion of specific forcing numbers was unfortunately based on H19 and so used raw W/m2 to compare FAR with observations. That part of the article should be viewed as only a very crude attempt to quantify FAR’s mistake.

Now, I readily admit the digitizing of FAR’s figures in H19 is better than anything I could do. So I take the forcing numbers that H19 states for FAR. For comparison with the real world, I use forcings from Lewis & Curry 2018 (LC18 from here on). It’s not that I consider one dataset more ‘correct’ than the other — it’s just that I’m more familiar with LC18’s numbers. And you need to know what you’re counting if your accounting is to make sense.

I choose 2016 as the end year, since that’s the last year for which LC18 offers numbers. To calculate the increase in forcing between 1990 and 2016, I simply subtract the former’s values from the latter’s. In the real world, using the difference between two single years can give misleading results if one or both of them have heavy volcanic activity, but this is not the case for 1990 and 2016. As for non-volcanic forcings, these tend to change very slowly and the overwhelming pattern is a monotonic increase, so this simple method does not create the problems it would create if you applied it to temperatures, for example. In any case, the value I get for FAR is almost the same as if you just multiply Hausfather’s yearly forcing rate by 26.

LC18 has a value of 3.8 W/m2 as F_2x. Taking that into account, the results are:

· For FAR, forcing increases by 1.556 W/m2, or by 0.356 doublings of CO2

· For LC18’s original forcing numbers, the increases are 1.060 W/m2 and 0.279 doublings of CO2

Additionally, LC18 recommend that two adjustments be made to their forcing numbers, to reflect the different efficacies of some forcing agents. These adjustments involve black carbon on snow (which should be multiplied by three) and volcanic forcing (which should be cut approximately in half). When re-running the numbers this way, I get a slightly smaller increase between 1990 and 2016: 1.021 W/m2 and 0.269 doublings of CO2.

It’s unclear to me whether the comparison should be done with LC18’s original or adjusted forcings. The results in either case are similar to those given by Hausfather’s data, suggesting that FAR over-estimated forcing since 1990 by about 30%. One could also say that FAR’s Business-as-usual scenario over-estimated forcing by 0.089 doublings of CO2 (if compared with LC18’s adjusted numbers) or by 0.077 doublings (if compared with LC18’s original figures).
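The conversion from W/m2 to doublings can be sketched with the figures given above. FAR’s increase is divided by FAR’s own F_2x (4.37 W/m2) and LC18’s by LC18’s (3.8 W/m2). Because the inputs are already rounded, the last digit of some results may differ slightly from the figures quoted in the text:

```python
F2X_FAR = 4.37   # W/m2, value used by FAR and SAR (as quoted in the TAR)
F2X_LC18 = 3.8   # W/m2, value used by LC18

# Forcing increase 1990-2016 in W/m2, paired with the F_2x of each dataset
increases = {
    "FAR": (1.556, F2X_FAR),
    "LC18 original": (1.060, F2X_LC18),
    "LC18 adjusted": (1.021, F2X_LC18),
}

doublings = {name: d_f / f2x for name, (d_f, f2x) in increases.items()}
for name, d in doublings.items():
    print(f"{name}: {d:.3f} doublings")

for ref in ("LC18 original", "LC18 adjusted"):
    excess = doublings["FAR"] - doublings[ref]
    ratio = doublings["FAR"] / doublings[ref]
    print(f"FAR vs {ref}: +{excess:.3f} doublings ({ratio - 1:.0%} higher)")
```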

But why did the error arise? Is it because FAR over-estimated future concentrations of Montreal Protocol gases? Of methane and the associated stratospheric water vapor? Of CO2? It’s known that FAR over-estimated future forcings from all three of these agents, and indeed the sum of these over-estimates will be greater than the total over-estimate because FAR also *under*-estimated elsewhere. For example, FAR ignored the positive forcing that has taken place since 1990 due to reduced aerosol concentration and increased tropospheric ozone.

Now look at the main chart (Figure 2.4) that shows the different forcings in FAR’s projection. Its vertical axis spans 10W/m2, which is to say 2.3 doublings of CO2. But the mistake we’re looking for in the chart is only about 0.10 doublings of CO2, so we’re hunting for a discrepancy that makes up 4 or 5% of the chart’s height, and that’s spread around several forcing agents. (Remember that there is a ‘negative mistake’ due to forcings that FAR omitted, hence the net or total mistake is equivalent to 0.08 or 0.09 doublings of CO2.)
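The proportions involved can be checked quickly; the only inputs are the 10 W/m2 range of the chart’s vertical axis and FAR’s F_2x of 4.37 W/m2:

```python
F2X_FAR = 4.37     # W/m2 per doubling of CO2, as used by FAR
AXIS_RANGE = 10.0  # W/m2 spanned by the vertical axis of FAR's Figure 2.4

axis_doublings = AXIS_RANGE / F2X_FAR
print(f"axis height: {axis_doublings:.2f} doublings")
for error in (0.08, 0.09, 0.10):
    print(f"{error:.2f} doublings = {error / axis_doublings:.1%} of the axis")
```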

A really skilled digitizer may find the exact forcing change for the different forcing agents between 1990 and 2016. He would then have to compare this with the numbers from LC18, which is itself a challenge because LC18 aggregates the forcing of methane, N2O and Montreal Protocol gases. I cannot do this while guaranteeing any accuracy, so I won’t.

Still, checking the digitizing of FAR’s Figure 5 from my previous article, I see that the Business-as-usual projection showed an increase in CO2 concentrations of 62.95 ppm between 1990 and 2016; this is 17.74% of the digitized baseline figure (354.88 ppm). For Mauna Loa, the equivalent numbers are 49.85 ppm and 14.07% (the baseline is marginally lower in Mauna Loa data than in my digitization). Given the logarithmic effect of CO2, these increases in ppm should cause a forcing equivalent, respectively, to 23.56% and 18.99% of F_2x. So the difference in CO2 concentrations between FAR’s Scenario A and reality caused FAR’s forcing to exceed real-world forcing by an amount equivalent to 0.04 or 0.05 doublings of CO2.
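The CO2 part of this calculation can be reproduced as a short sketch. `share_of_doubling` is a hypothetical helper, and the Mauna Loa increase is entered as the 14.07% share quoted above rather than taken from the raw record:

```python
import math


def share_of_doubling(ratio):
    """CO2 forcing as a fraction of F_2x, for a final/initial concentration ratio."""
    return math.log(ratio) / math.log(2)


FAR_BASELINE = 354.88  # ppm in 1990, digitized from FAR's Figure 5
FAR_INCREASE = 62.95   # ppm added 1990-2016 in the business-as-usual projection
MLO_SHARE = 0.1407     # Mauna Loa increase 1990-2016, relative to its 1990 value

far = share_of_doubling(1 + FAR_INCREASE / FAR_BASELINE)
mlo = share_of_doubling(1 + MLO_SHARE)
print(f"FAR: {far:.2%} of F_2x, Mauna Loa: {mlo:.2%} of F_2x")
print(f"excess: {far - mlo:.3f} doublings")
```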

Remember the entire mistake in forcings that we were trying to explain was about 0.08 doublings of CO2. Thus, the overshoot in CO2 concentrations by itself accounts for more than half of FAR’s over-projection of forcing.

**Conclusions**

Hausfather et al is the first serious attempt to evaluate the projections of old climate models, looking both at the temperature changes and forcings featured by these models. Like any first attempt, it overlooked some things that are obvious in retrospect. I hope the authors issue a correction for the TCR values given for the IPCC’s First and Second Assessment Reports. And I hope they and other researchers keep working on the issue. The last word on old climate models has not been said yet.

More generally, any serious comparison done nowadays between climate models and reality should account for forcings. We don’t completely know the real world’s forcings, and in fact we don’t exactly know climate model forcings either. But the lack of absolute certainty about a factor does not mean you should ignore that factor altogether.

**Data**

This spreadsheet shows my calculations.

This Google Drive folder shows the calculations and data for my previous article on the IPCC’s First Assessment Report.

The IPCC’s First Assessment Report is here, the Second Assessment Report is here, and the Third Assessment Report is here.

The 2018 paper by Lewis & Curry is here. The forcing data is here, while the Reply to a Comment (2020) is here. The Reply’s online version includes the computer code used in the original paper to calculate forcings.

Hausfather et al publish all their code and data online. Here I list some specific links:

· Yearly temperature and forcing figures digitized from the IPCC’s First Assessment Report

· Forcing and temperature trends for old climate models, including FAR

· Forcing rates for old climate models, including FAR

· Real-world temperature trends

Attentive readers may have noticed that the second link features forcing and temperature ‘trends’ for climate models, whereas the third features forcing ‘rates’ for the same models.

The figures of Hausfather’s Figure 1 are labelled ‘rates’, and indeed they seem to correspond to the numbers given in the third link, not the second. The value under the ‘coef’ column in the third link, for the First Assessment Report, is 0.0607 (representing W/m2/year). This matches Figure 1, which shows a best estimate for FAR of just over 0.6W/m2/decade. The confidence interval likewise matches the numbers for columns coef_low and coef_high. And the central or best estimate also agrees with the 55% over-estimate in forcings in FAR, mentioned in the paper’s text.

That said, you get marginally different numbers if you simply divide the 1990–2017 forcing increase by 27: 0.0601 W/m2/year. This is about 1% less than the rate Hausfather reports.