The Pitfalls of Data




At RCU we spend a lot of time analysing data and advising colleges and Government departments about what data means. We are always careful, however, not to fall into the trap of drawing the wrong conclusions from what appears at first glance to be compelling evidence.

The chart below (taken from the excellent website “Spurious Correlations”) illustrates the old adage that “correlation is not causation”, one of the biggest pitfalls in data research. The chart compares the number of Japanese passenger cars sold in the US between 1999 and 2009 with suicides by crashing of motor vehicles over the same period. The correlation between the two is extremely strong (over 93%). Common sense tells us, however, that Japanese motor cars are probably not the cause of motor vehicle suicides!

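[Chart: Japanese passenger cars sold in the US compared with suicides by crashing of motor vehicles, 1999 to 2009. Source: Spurious Correlations]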

There is clearly a statistical link between the two sets of figures, and it is useful to come up with theories that might explain it. For example, both the sale of Japanese cars and suicides might be related to a third factor, the state of the economy: in periods of recession suicides might rise and people may be more likely to buy cheaper, reliable and economical cars. I have no idea whether this is true, but the next stage would be to try to test this theory against more data. Of course, there remains a small possibility that there is something about Japanese cars driving people to suicide, but we wouldn’t feel happy about accepting this conclusion without a lot more evidence to back it up.
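To make the idea of a hidden common driver more concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the figures are randomly generated rather than real sales or mortality data, and the names hidden_factor, series_a and series_b are hypothetical. It shows how two series that never influence each other can still be strongly correlated when both follow a third factor, and how that correlation falls away once the third factor is accounted for.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def residuals(ys, zs):
    """What is left of ys after removing a straight-line fit on zs."""
    n = len(zs)
    mean_z = sum(zs) / n
    mean_y = sum(ys) / n
    slope = (
        sum((z - mean_z) * (y - mean_y) for z, y in zip(zs, ys))
        / sum((z - mean_z) ** 2 for z in zs)
    )
    return [(y - mean_y) - slope * (z - mean_z) for z, y in zip(zs, ys)]


# Invented data: a hidden factor (standing in for something like the state
# of the economy) drives both series. Neither series affects the other.
rng = random.Random(0)
hidden_factor = [rng.gauss(0, 1) for _ in range(2000)]
series_a = [h + rng.gauss(0, 0.5) for h in hidden_factor]
series_b = [h + rng.gauss(0, 0.5) for h in hidden_factor]

# Correlation taken at face value.
raw_r = pearson(series_a, series_b)

# Correlation after removing the part of each series explained by the
# hidden factor (a simple partial correlation).
adjusted_r = pearson(
    residuals(series_a, hidden_factor),
    residuals(series_b, hidden_factor),
)

print(f"raw correlation:      {raw_r:.2f}")
print(f"adjusted correlation: {adjusted_r:.2f}")
```

With this made-up data the raw correlation is strong and the adjusted one is close to zero, which is the pattern we would expect if a third factor were doing all the work. Real data is rarely this tidy, and in practice the hidden factor is often unmeasured or unknown, which is exactly why more evidence is needed before drawing conclusions.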

In education and skills we need to be equally cautious about making connections between two things without sufficient evidence. This particularly applies to any link between qualifications and their impact on the economy. This is a very complex area involving many factors, such as the alternative options open to individuals if they had not taken a college course, pre-existing skill levels in the local economy, the relative growth or decline of particular local industries, productivity levels in different geographical areas and the move towards self-employment. Data is very important in helping us to understand what is happening but, as with the example above about Japanese cars and suicide rates, we need to recognise the dangers of making spurious correlations involving a single set of figures.

In particular, it is very important that data analysts understand and take into account the unique context in which further education operates.
