## HadCRUT3: Cool or Uncool?

#### Posted on 28 March 2012 by Kevin C

The UK Meteorological Office have for many years published estimates of the global mean surface temperature record from 1850. It has been noted that over the last decade this record shows little or no warming. The Skeptical Science trend calculator shows that the difference between the HadCRUT3v trend and the IPCC forecast over the past 15 years is statistically significant at the 95% level. What is going on?

Foster and Rahmstorf (2011) have shown that two natural cycles - the El Niño Southern Oscillation (ENSO), and the solar cycle - have contributed temporarily to this apparent slowdown in global warming. But the slowdown is *much* more obvious in the HadCRUT3v data. Why?

The clues lie in one basic statistical principle, and two features of the data.

## The statistical principle: Sampling a stratified population

Suppose you want to determine some statistic on a large dataset, say the average height of the children of a given age. You could simply measure everyone. But that would be impractical. So normally you would measure the heights of a representative sample group. If the group is large enough, the average height of the sample group will give a good estimate of the average height of the age group as a whole.

Or will it? Suppose three quarters of your sample group are girls. Girls make up approximately half of the population as a whole. But girls in the chosen age group are on average shorter than boys. If girls make up three quarters of your sample group, then the average height of the sample group (the 'sample mean') will be lower than the average for the population as a whole (the true 'population mean'). The sample group is not representative of the population, and as a result produces a biased estimate.

The problem is that the population is stratified - it is divided into groups with different statistics. A representative sample from this population must be both 'big enough', and contain appropriate proportions of the different strata - in this case girls and boys.
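The effect can be sketched in a few lines of Python. The heights below are made up for illustration, and the helper names `sample_mean` and `stratified_mean` are assumptions of this sketch rather than anything from a real survey: a sample that is three-quarters girls underestimates the population mean, while weighting each stratum by its true population share removes the bias.

```python
def sample_mean(groups):
    """Plain mean over all sampled values, ignoring strata."""
    values = [v for members in groups.values() for v in members]
    return sum(values) / len(values)


def stratified_mean(groups, population_share):
    """Mean of the per-stratum means, weighted by true population share."""
    return sum(
        population_share[name] * (sum(members) / len(members))
        for name, members in groups.items()
    )


# Hypothetical heights in cm: the girls average 140, the boys average 144.
sample = {
    "girls": [138.0, 140.0, 142.0, 139.0, 141.0, 140.0],  # 75% of the sample
    "boys": [143.0, 145.0],                               # 25% of the sample
}
true_share = {"girls": 0.5, "boys": 0.5}

biased = sample_mean(sample)                    # pulled towards the girls' mean
unbiased = stratified_mean(sample, true_share)  # each stratum weighted by 0.5
```

With these numbers the plain sample mean is 141 cm, while the stratified estimate is 142 cm, the midpoint of the two group means.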

Now consider a more complex case. Samples are to be taken a year apart to determine the rate at which the children are growing taller. The first sample consists of 50% boys and 50% girls. The second sample, a year later, has about 25% boys and 75% girls. The first sample is unbiased, the second is biased low. The resulting trend may erroneously suggest that the children are growing shorter!

Note that there are two problems in estimating the trend: first, we are undersampling the faster-growing stratum, and second, the proportion of the data coming from the taller group is declining. Both add a downward bias to the estimated trend.
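A small worked example of the spurious trend, again with hypothetical numbers (the stratum means and the helper `mixed_mean` are illustrative assumptions): both groups grow between the two years, yet the change in sample composition makes the average appear to fall.

```python
def mixed_mean(girl_mean, boy_mean, boy_fraction):
    """Mean height of a sample containing the given proportion of boys."""
    return boy_fraction * boy_mean + (1.0 - boy_fraction) * girl_mean


# Hypothetical stratum means in cm. Both groups grow, boys faster than girls.
year1 = {"girls": 140.0, "boys": 146.0}
year2 = {"girls": 141.0, "boys": 148.0}

# Growth measured with a representative 50/50 sample in both years.
true_growth = mixed_mean(year2["girls"], year2["boys"], 0.5) - mixed_mean(
    year1["girls"], year1["boys"], 0.5
)

# Growth measured when the second sample drops to 25% boys.
apparent_growth = mixed_mean(year2["girls"], year2["boys"], 0.25) - mixed_mean(
    year1["girls"], year1["boys"], 0.5
)
```

Here the representative samples give a growth of 1.5 cm, while the unrepresentative second sample turns that into an apparent decline of 0.25 cm.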

## Two pieces of data concerning HadCRUT3

### Land/ocean temperatures

Land surface temperatures have been increasing more quickly than sea surface temperatures, as would be expected given the higher heat capacity of water. The following figure shows the area-average temperature anomalies from CRUTEM3 and HadSST2:

(Alternatively, look at this figure from GISTEMP.)

### Land/ocean coverage

Land coverage in the HadCRUT3v record has been declining over the past 50 years. The following figure shows the proportion of the HadCRUT3v global sample drawn from land measurements. The actual proportion of the Earth's surface covered by land is about 29%.

(The spikes during the world wars are due to poor SST - sea surface temperature - coverage during these periods. You can get a reasonable estimate of this graph from the coverage values given on alternate lines of the CRUTEM3 and HadSST2 data files. However, the values themselves are slightly peculiar: the land and ocean coverage exceed the fractions of the surface covered by land and ocean, and in some cases add up to more than 100%. This is due to the coarse 5-degree grid, and the fact that coastal cells are treated as both 100% land and 100% ocean. The figure above is a more accurate estimate based on the gridded datasets and a high-resolution land mask.)
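To illustrate why coastal cells inflate the coverage figures, here is a rough sketch of one way to estimate the land share of a gridded sample. It assumes a regular 5-degree grid, cell areas proportional to the difference in the sine of latitude at the cell edges, and a fractional land mask; the function names and the toy mask are hypothetical, and this is not a description of how HadCRUT3 itself blends land and ocean data.

```python
import numpy as np


def cell_areas(n_lat=36, n_lon=72):
    """Relative area of each cell on a regular lat/lon grid (sums to 1)."""
    edges = np.deg2rad(np.linspace(-90.0, 90.0, n_lat + 1))
    band = np.sin(edges[1:]) - np.sin(edges[:-1])  # area of each latitude band
    areas = np.repeat(band[:, None], n_lon, axis=1)
    return areas / areas.sum()


def land_share_of_sample(land_frac, has_land_obs, has_sst_obs):
    """Share of the sampled area that comes from land measurements.

    land_frac: fraction of each cell that is land (0 to 1).
    has_land_obs / has_sst_obs: boolean grids marking cells with data.
    """
    areas = cell_areas(*land_frac.shape)
    land_area = (areas * land_frac * has_land_obs).sum()
    ocean_area = (areas * (1.0 - land_frac) * has_sst_obs).sum()
    return land_area / (land_area + ocean_area)


def naive_coverage(land_frac):
    """Coverage when any cell touching land (or sea) is counted in full."""
    areas = cell_areas(*land_frac.shape)
    return areas[land_frac > 0].sum(), areas[land_frac < 1].sum()


# Hypothetical mask: every cell is 29% land, so every cell is "coastal".
land_frac = np.full((36, 72), 0.29)
everywhere = np.ones_like(land_frac, dtype=bool)

share = land_share_of_sample(land_frac, everywhere, everywhere)
naive_land, naive_ocean = naive_coverage(land_frac)  # these sum to more than 1
```

In this extreme toy case the fractional mask recovers a land share of 0.29, whereas counting each coastal cell as fully land and fully ocean gives coverages that add up to 200%.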

## Putting it together

The proportion of land readings in the HadCRUT3v sample has been dropping since the 1960s, and has dropped from ~25% to less than 23% since 1995. Over the same period the land temperature anomalies have been increasing faster than the sea surface temperature anomalies, with the greatest differences occurring since 2000.

What does this mean for the temperature record?

The temperature estimated from the unrepresentative sample will be an average of the temperatures from the land stratum and the ocean stratum (T_{land} and T_{ocean}), weighted by the proportion of the sample drawn from each stratum (P_{land} and P_{ocean}):

T_{biased} = P_{land} T_{land} + P_{ocean} T_{ocean}

However, this is a biased estimate: Not only is it subject to normal sampling errors, it is biased by the fact that the proportions of land and ocean data in the sample are different from the proportions in the real data. An unbiased estimate would use the true land and ocean proportions:

T_{unbiased} = 0.29 T_{land} + 0.71 T_{ocean}

where 0.29 and 0.71 are the actual global land and ocean fractions.

We can calculate the bias from the difference between the biased and unbiased estimates:

Δ_{bias} = T_{biased} - T_{unbiased} = T_{land} (P_{land} - 0.29) + T_{ocean} (P_{ocean} - 0.71)

Since P_{land} + P_{ocean} = 1, we have P_{ocean} - 0.71 = -(P_{land} - 0.29), so:

Δ_{bias} = (T_{land} - T_{ocean}) × (P_{land} - 0.29)
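As a quick check on the algebra, the following Python sketch computes the bias both ways: directly from the compact formula, and as the difference between the biased and unbiased averages. The function names and the anomaly values are hypothetical; the land share of 0.23 is the approximate recent value quoted in this article.

```python
LAND_FRACTION = 0.29  # true land share of the Earth's surface, from the article


def coverage_bias(t_land, t_ocean, p_land, true_land=LAND_FRACTION):
    """Bias from the compact form (T_land - T_ocean) * (P_land - 0.29)."""
    return (t_land - t_ocean) * (p_land - true_land)


def bias_long_form(t_land, t_ocean, p_land, true_land=LAND_FRACTION):
    """Bias as the biased average minus the unbiased average."""
    p_ocean = 1.0 - p_land  # the sample proportions sum to one
    biased = p_land * t_land + p_ocean * t_ocean
    unbiased = true_land * t_land + (1.0 - true_land) * t_ocean
    return biased - unbiased


# Hypothetical anomalies in degrees C, with land sampled at 23% instead of 29%.
bias = coverage_bias(t_land=0.8, t_ocean=0.4, p_land=0.23)
check = bias_long_form(t_land=0.8, t_ocean=0.4, p_land=0.23)
```

With a land/ocean anomaly difference of 0.4°C and land undersampled by six percentage points, both forms give a bias of about -0.024°C.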

The bias in the HadCRUT3v data due to the unrepresentative land/ocean sampling can be calculated by this equation, and is shown in the following figure (as a 60 month moving average):

Until 1980 the bias is small, because the land and ocean temperatures do not differ significantly. After 1980, the difference between the land and ocean temperatures becomes significant, and at the same time the sampling of the land and ocean strata becomes increasingly unrepresentative, amplifying the bias. (This is analogous to boys growing faster than girls at the same time as the proportion of boys in the sample is dropping.)
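A monthly bias series and its 60-month smoothing could be computed along the following lines. This is only a sketch: the input series are invented, the function names are assumptions, and since the article does not say whether the moving average is centred or trailing, the code simply returns one value per complete 60-month window.

```python
import numpy as np

LAND_FRACTION = 0.29  # true land share of the Earth's surface, from the article


def monthly_bias(t_land, t_ocean, p_land):
    """Coverage bias for each month: (T_land - T_ocean) * (P_land - 0.29)."""
    t_land, t_ocean, p_land = map(np.asarray, (t_land, t_ocean, p_land))
    return (t_land - t_ocean) * (p_land - LAND_FRACTION)


def moving_average(series, window=60):
    """Running mean; returns one value per complete window."""
    kernel = np.ones(window) / window
    return np.convolve(series, kernel, mode="valid")


# Hypothetical 10-year series: the land/ocean gap grows while land coverage falls.
months = np.arange(120)
t_ocean = 0.30 + 0.001 * months
t_land = t_ocean + 0.002 * months   # gap widens from 0.0 to about 0.24
p_land = 0.25 - 0.0002 * months     # share falls from 0.25 to about 0.226

smoothed = moving_average(monthly_bias(t_land, t_ocean, p_land))
```

Because the gap and the undersampling grow together in this toy series, the smoothed bias is negative throughout and becomes more negative with time.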

What impact does this have on the temperature trends? If HadCRUT3v is biased over recent years, it looks as though it is biased low. However, before drawing a firm conclusion we need to look for other sources of bias; this will be the subject of the next post in the series.

*Note: While we can estimate the bias by careful statistics, the ideal solution to poor sampling is the one that Hadley and CRU have adopted - improve the data coverage. That of course involves a lot more work.*

**Acknowledgements**

Thanks to Tom Curtis both for helping with this article, and for suggestions which inspired the original analysis.

**Comments**

Steve Case at 04:18 AM on 28 March, 2012

"Until 1980 the bias is small, because the land and ocean temperatures do not differ significantly. After 1980, the difference between the land and ocean temperatures becomes significant ... "

Haven't land and ocean temperatures always been significantly different by around 7.5°C? What am I missing? Here's what NOAA says:

Land Surface Mean Temp. 1901 to 2000 (°C): 8.5

Sea Surface Mean Temp. 1901 to 2000 (°C): 16.1

Source: scroll down half way. Here's a graph I made some time ago that plots out the difference in trend between the two: Here's one I made about the same time that plots just the difference.

KR at 05:57 AM on 28 March, 2012

"I'm thinking that it should be possible to use an alternate method. I have one in mind where each station contributes a measurement that is weighted according to the distance from the station."

What you are describing is the GISS method, as described in Hansen and Lebedeff 1987, where the measurement weighting is driven by the observed correlation of temperature anomaly with distance. Each measurement within a certain radius of a point (up to 1200 km) is weighted by the distance correlation when calculating an estimate at that location.

martin at 06:30 AM on 28 March, 2012

Moderator Response: [DB] "Is that enough to explain why global warming seems to have stopped?" Non sequitur. Please see the following post: http://www.skepticalscience.com/Breaking_News_The_Earth_is_Warming_Still_A_LOT.html

Kevin C at 07:07 AM on 28 March, 2012

Chris: Yes, the grid cells are weighted by area. An improved method, used by GISS, involves allowing the number of cells to vary by latitude to keep roughly constant area. It's pretty simple in practice. As well as GISS, you might want to take a look at what Nick Stokes has done in TempLS. He's looked at weighting each station by the unique area around it and loads of other nice stuff, some of which anticipated the ideas in BEST.

Steve Case: Sorry, I'm talking about anomalies exclusively in the article. I was trying to remember to put the word anomaly in everywhere, despite the repetition, but missed some. Since the temperatures are always converted to anomalies before averaging, the difference in the absolute values disappears.

Martin: The land/ocean bias is not enough on its own to explain the difference between HadCRUT3 and, say, GISTEMP. There is another major source of bias in HadCRUT3 as well - you have probably read about it elsewhere. Once we've looked at that I think you will have your answer. I started with the land/ocean bias because it is obvious and introduces the concepts.

Glenn Tamblyn at 10:19 AM on 28 March, 2012

… itself, then the variability between nearby stations is much less, and we can meaningfully work with fewer stations further apart - the sheet is stiffer. This idea really ties a lot of people up in knots and is the underlying driver for much of the 'Dying of the Thermometers', 'It's bad stations' type memes that have had so much traction. Most people can't get their heads around the difference between working with Temperatures and Temperature Anomalies. And Joe Public probably assumes that the calculations are done using Temperatures.

I did a 4 part series on this nearly a year ago that goes through a lot of this.

Steve Case at 01:45 AM on 29 March, 2012

"Your graphs seem to show a divergence becoming more pronounced about 1980, but that is just the old eye-ometer."

My eye-ometer sees the same thing you do. The question is, will the sea surface temperatures catch up? Kevin C wrote:

"Since the temperatures are always converted to anomalies before averaging, the difference in the absolute values disappears."

Considering how heat flows through the system, sun => surface => atmosphere => out, the difference between the surface and the atmosphere is important and ought not be ignored. As the difference between the two becomes less, there should be less net heat transfer and the ocean surface ought to warm. That difference has narrowed by about (7.75°C - 7.5°C = 0.25°C) over the last 160 years and as Chris's eye-ometer points out much of that is in the last 30 years. I'm thinking that the 0.25°C is probably the signal from increasing CO2. If you plot out the difference using anomalies you get this one: I doubt that the sigmoid shape is due to randomness and it shows the 0.25°C increase very nicely. It also shows that the eye-ometer increase onward from 1980 discussed above isn't all that unusual.

Steve Case at 01:28 AM on 30 March, 2012

"It looks like we have a big cooling event covering the period 1880-1900. Given the 60 month smooth, it would have to start around 1883."

I don't know what to make of the sigmoid shape of the curve. I was interested in the trend when I set out to make the graph. I found out that the several degree gap between the warm ocean and the cooler atmosphere has narrowed by about 0.25°C over the last 160 years. The sigmoid shaped curve that appeared shows us that at times the gap widens. Take a look at the 50 year period from 1920 to 1970. I'm not offering up any theories and in my wonderings around the net I haven't seen any from anyone else.

Tom Curtis at 03:47 AM on 30 March, 2012

… prima facie evidence that the mid century temperature peak was at least in part non-forced. I base this claim on the fact that the difference in the anomalies is declining at the time of that peak. However, this is no comfort for the "it's all oceanic oscillations" crowd, for the recent warming is clearly associated with a very strong positive forcing. As noted before, it is stronger than any forcing shown elsewhere on the record except for the brief excursions due to major volcanic events. An important additional caveat is that temperature records prior to 1950 are incomplete, and particularly so prior to 1880 so that prior to those dates noise is a significant factor.

Also, of course, HadCRUT3 is now obsolete, and its flaws will also constitute noise on the record.

Steve Case at 12:03 PM on 1 April, 2012

"… Science of Doom has an extensive discussion of the difference of the ocean's response to heating by solar radiation and back radiation …"

I suppose this will be considered nit picking, but back radiation from the cooler atmosphere doesn’t do any heating of the ocean. It does slow the cooling of the ocean by canceling out part of the spectrum, but it’s the sun that does the actual heating and reestablishment of equilibrium. Yes, the effect is the same and it’s perhaps just semantics, but claiming that back radiation heats the ocean leads to erroneous thinking.

Moderator Response: [DB] Your statement about back radiation is off-topic on this thread. Any who wish to respond to it please do so on a more appropriate thread. Thank you.