Non-English climate science

Today we are used to receiving new climate research written in English. That has not always been the case. There was even a time when English was a very minor language in science. Some time ago I started thinking that by concentrating on research written in English we might be missing a lot of climate science, especially historically. I decided to take a look at the situation.

I used Google Scholar and Google Translate to search for papers containing the word "climate" in every language supported by Google Translate, and recorded the number of hits for each language. This analysis is very rough, so the numbers should be taken only as directional; the big picture is more meaningful than any individual figure. The numbers carry a lot of uncertainty, some of which I explain below. Here's the result table:

Country/language            Word              Results
English/Latin               climate         2,550,000
Spanish/Italian/Portuguese  clima             954,000
China-simple                气候              614,000
Germany/Norway/Denmark      klima             350,000
France/Romania              climat            318,000
Russia/Serbia               климат             93,800
Japan                       気候               49,400
Turkey                      iklim              43,600
Sweden/Poland               klimat             34,100
Korea                       기후               33,900
China-traditional           氣候               31,100
Netherlands/Afrikaans       klimaat            24,100
Ukraine/Belarus             клімат             23,500
Albania                     klimë               7,600
Arabic                      مناخ                6,610
Lithuania                   klimatas            6,270
Finland                     ilmasto             3,980
Persia                      اقلیم               3,850
Greece                      κλίμα               3,500
Esperanto                   klimato             3,480
Czech                       podnebí             2,830
Vietnam                     khí hậu             1,390
Azerbaijan                  iqlim                 883
Hindi                       जलवायु                821
Estonia                     kliima                584
Slovenia                    podnebne              575
Slovakia                    podnebie              346
Thailand                    ภูมิอากาศ             468
Latvia                      klimats               255
Hebrew                      ??????                244
Iceland                     loftslag              179
Swahili                     hali ya hewa          113
Yiddish                     קלימאַט                83
Welsh                       yn yr hinsawdd         28
Armenia                     կլիմա                  18
Irish                       aeráide                12
Urdu                        آب و ہوا                3
Gujarati                    આબોહવા                  1

There are 2,550,000 hits for English/Latin. Non-English languages (excluding Latin, of course) have 2,235,364 hits. So it seems that almost as many climate papers exist in non-English languages as in English. Some languages are missing from the table because they didn't produce any hits (and of course a lot of others are missing because they are not supported by Google Scholar).
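As a rough illustration of the comparison above, the two totals quoted in the text can be turned into a share and a ratio. This is only a sketch of the arithmetic; the variable names are made up for the example, and the inputs are the hit counts as stated in the post, with all of their uncertainties.

```python
# Totals as quoted in the text (Google Scholar hit counts, not verified paper counts).
english_hits = 2_550_000      # English/Latin "climate"
non_english_hits = 2_235_364  # all other languages combined, as stated in the post

total_hits = english_hits + non_english_hits

# Share of all hits that are English, and non-English hits per English hit.
english_share = english_hits / total_hits
non_english_ratio = non_english_hits / english_hits

print(f"total hits: {total_hits:,}")
print(f"English share: {english_share:.1%}")
print(f"non-English / English: {non_english_ratio:.2f}")
```

The total stays a little under 5 million, and the English share comes out only slightly above half, which is what "almost equal" refers to.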

As I mentioned above, the numbers have a lot of uncertainties. Google Scholar returns a lot more search results than just peer-reviewed papers. There are books, reports, and even some blog posts. This distorts the resulting number of hits. It seems to be a substantial problem, for example, in the search results for my native language, Finnish.

Another source of error is that Google Scholar also matches the search word against author names and journal names. This is a big issue, for example, in the German results. There seem to be a lot of papers published by authors who have the last name "Klima". The 350,000 hits for German therefore seem to be off by quite a lot. A search for "Klimawandel" (climate change) resulted in 21,900 hits. English "climate change" gives 1,570,000 hits, so the ratio of "climate" hits to "climate change" hits is 1.62 for English. Assuming the same ratio for German, 21,900 * 1.62 gives roughly 35,600 hits for "klima" (climate). However, this feels somewhat too low considering that German is a common language in science, and that other comparable languages have many more hits (for example, French has over 318,000 hits - but see below for the need to correct the French results). Also, most of Hungary's results seem to come from authors' names.
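The German estimate above is simple proportional scaling, and can be sketched in a few lines. The variable names are invented for this example, and the key assumption (that German has the same "climate" to "climate change" ratio as English) is the one made in the text, not something the data confirms.

```python
# Hit counts quoted in the text.
english_climate = 2_550_000         # English "climate"
english_climate_change = 1_570_000  # English "climate change"
german_klimawandel = 21_900         # German "Klimawandel"

# Ratio of "climate" hits to "climate change" hits in English.
ratio = english_climate / english_climate_change

# Assumption from the text: the same ratio holds for German.
estimated_german_klima = german_klimawandel * ratio

print(f"English ratio: {ratio:.2f}")
print(f"estimated German 'Klima' hits: {estimated_german_klima:,.0f}")
```

Using the unrounded ratio, the estimate lands near the 35,600 quoted above once it is rounded to three significant figures.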

Yet another problem is that not all of the search results are in the intended language. This is partly due to the issue mentioned above, with Google Scholar matching author and journal names. There are also cases where another language has the same word (or one close enough for Google Scholar) with a different meaning, or has an author's name matching the search word. French search results, for example, include papers in other languages. Judging by the first result page (yes, I know it's not a very big sample...), 20% of the French results are non-French. This would reduce the number of French language hits to 254,400.
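The French correction is a straightforward scaling of the hit count by the share of results that are actually in French. A minimal sketch, with a hypothetical helper name, and with the 20% figure taken from the single first result page mentioned above (so it is a very rough estimate):

```python
def correct_for_other_languages(hits: int, other_language_percent: int) -> int:
    """Scale a hit count down by the percentage of results in other languages."""
    # Integer arithmetic keeps the result exact for whole-number percentages.
    return hits * (100 - other_language_percent) // 100


french_hits = 318_000    # "climat" hits from the table
non_french_percent = 20  # estimated from the first result page only

corrected_french_hits = correct_for_other_languages(french_hits, non_french_percent)
print(f"corrected French hits: {corrected_french_hits:,}")
```

The same helper could be applied to any other language in the table, provided a comparable sample of its results has been checked by hand.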

The Albanian word for climate is "klimë", but almost all search results are for "klime", so Google Scholar sometimes also returns results for words that are merely close to the actual search term.

Search results might also not be climate related. The word "climate" has other, non-meteorological meanings, such as the political climate or a climate of fear. This source of error might be even worse for some other languages.

There are also duplicate entries for some papers, and these are probably not all of the error sources. Some non-English papers have also been published in English (or vice versa), so the ratio of non-English to English papers (about 0.88) might not be accurate. Additionally, some non-English papers have English abstracts.

So, it seems that despite all of my search results, there are not 5 million climate papers out there. But there are a lot of them - and quite a few of them might be in a language other than the English and Finnish that I understand. It sure would be nice to be able to read all those papers when needed.

Posted by Ari Jokimäki on Friday, 25 January, 2013

