The simple truth about statistics
In the age of the internet, there is no reason why anyone should be fooled by statistics
It is the British prime minister Benjamin Disraeli who is famously credited with the phrase "There are three kinds of lies: lies, damned lies, and statistics", but the expression has been around almost as long as the word "statistics" itself (first coined in 1749, for those wondering). What is it about numerical data that sparks such distrust?
Partly, there seems to be an assumption that anything involving numbers is a dark art best left to experts. Few non-mathematicians – including politicians and journalists – seem to have the numerical confidence to question or check the statistics they are given. This led the (then opposition) Conservative party to report in February this year that in deprived areas 54% of women under 18 will fall pregnant (it should have been 5.4%). Then the Independent continued the theme last week (22 September) by deciding that 49% of all girls under 18 have had abortions. If you make the quick assumption that half of all under-18s are under 9, then this implies that virtually every girl from 9 to 18 must have had an abortion. In both cases, anyone who took a moment to think about the statistics could have told that something was amiss.
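That sanity check is nothing more than back-of-envelope arithmetic. A minimal sketch (the 50/50 age split is the same rough assumption made above, not census data):

```python
# Sanity-checking the misreported 49% figure.
claimed_rate = 0.49    # reported share of all under-18 girls who had abortions
share_under_9 = 0.5    # rough assumption: half of under-18s are under 9

# If essentially no girl under 9 has had an abortion, the entire 49%
# must come from the remaining 9-to-18 group:
share_9_to_18 = 1 - share_under_9
implied_rate_9_to_18 = claimed_rate / share_9_to_18

print(implied_rate_9_to_18)  # 0.98 -- i.e. nearly every girl aged 9 to 18
```

A figure that implausible should never survive contact with a calculator, let alone a newsroom.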
Even when statistics are carefully checked, and don't have the decimal point equivalent of a typo, things don't always look right. During the same August week two different media stories broke: one painting a grim picture of breast cancer rates in the UK; the other a much more optimistic picture.
Monday 9 August: "Breast cancer rates in the UK are more than four times higher than those in eastern Africa, the World Cancer Research Fund has revealed." This is the original press release.
Thursday 13 August: "Death rates from breast cancer have fallen more dramatically in the UK than any other European country, cancer researchers have said." Original report.
Both reports were using completely accurate statistics, but simply used different measures to back up their message.
The statistics comparing England and Wales with Europe were published in the British Medical Journal (BMJ). These results measured breast cancer mortality: how many women die from breast cancer in a year for every 100,000 women in the population. Between 1989 and 2006, England and Wales did indeed record the largest drop in breast cancer deaths in Europe.
What wasn't reported was that England started with the worst death rate out of all 30 European countries (of 100,000 English women, 42 would die from breast cancer in 1989 compared with a European average of 30). Despite the biggest decrease in Europe, England still has the seventh worst death rate in Europe (28 women out of 100,000, compared with Romania 23, Poland 21 and Spain 19).
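Both framings can be checked against the figures quoted above in a few lines of arithmetic. A minimal sketch showing how "biggest fall in Europe" and "still among the worst rates" are simultaneously true:

```python
# Breast cancer deaths per 100,000 women, using the figures quoted above.
uk_1989 = 42     # England, 1989
uk_2006 = 28     # England, 2006
eu_avg_1989 = 30 # European average, 1989
spain_2006 = 19  # Spain, 2006

# The fall that made the optimistic headline:
drop = uk_1989 - uk_2006                # 14 fewer deaths per 100,000
percent_drop = drop / uk_1989 * 100     # roughly a one-third fall

# The context that didn't make the headline:
started_worst = uk_1989 > eu_avg_1989   # True: above the 1989 average
still_high = uk_2006 > spain_2006       # True: still well above Spain

print(drop, round(percent_drop), started_worst, still_high)
```

The same table of numbers supports both stories; only the choice of measure differs.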
As for the World Cancer Research Fund (WCRF) report that compared the UK with East Africa: it was looking at how many women in 100,000 were diagnosed with breast cancer, not deaths. There are several problems with trying to compare UK statistics with East Africa's. The report does mention in passing that the eastern Africa numbers cover only reported cases; much of the population does not have sufficient access to medical support to be diagnosed in the first place.
Not only that, but a quick check on the World Health Organisation's website shows that the average life expectancy for women in Zimbabwe is only 42.3 years (compared with the UK's 81.7 years). Most women in East Africa simply do not live long enough to get breast cancer. In the UK, eight out of 10 breast cancers are diagnosed in women aged 50 and over. That women in a different country have half the life expectancy of the UK is the real story, not that our decadent western lifestyle is causing breast cancer.
Untangling the BMJ statistics involved nothing more than looking at the table of figures attached to the published paper, and the WCRF report was quickly put into context with a glance at WHO figures. This doesn't require any form of mathematical training; thanks to the internet, anyone with a curious mind can dig into the statistics to see what the real story is. This shouldn't happen only when two seemingly contradictory stories break in the same week: any message based on statistics should expect to be subjected to lay scrutiny.
By their very nature, statistics can only be misused when the audience doesn't bother checking them. Statistics are just a numerical summary of evidence that has been collected. They give people the starting point to delve directly into that evidence and see if the arguments hold together.
When misused, statistics are less Disraeli's "damned lies" and more another leader's "I did not have sexual relations with that woman". It is by not presenting all of the information and selectively choosing definitions that statistics can appear to lie. But such claims will not stand up under cross-examination.
Matt Parker's website is Stand-up Mathematician
http://www.guardian.co.uk/science/blog/ ... lies-abuse