Many column inches in newspapers this past week have been filled with diagnoses of what went wrong with the polls predicting the US election outcome.
Why were most media outlets and most polls predicting a Clinton win? More relevant to this article: should it actually matter to Customer Insight leaders?
I believe some of the lessons that can be learnt from this and other recent failures are very relevant to insight leaders. There are two main reasons for that.
Firstly, not everyone got it wrong, or not by as much, so there are examples to learn from. Secondly, some of the lessons concern the relative use of analytics and research methods, so their relevance extends well beyond election polling.
As might be expected, the New York Times ran a well-written piece about the failure of the polls and of many forecasters.
The Problem
The NY Times piece lists the polls that got it wrong and explains why most of them predicted a Clinton win.
But one of the main reasons I like this article is this timely quote:
“But data science is a technology advance with trade-offs. It can see things as never before, but also can be a blunt instrument, missing context and nuance. All kinds of companies and institutions use data quietly and behind the scenes to make predictions about human behaviour. But only occasionally — as with Tuesday’s election results — do consumers get a glimpse of how these formulas work and the extent to which they can go wrong.”
Very well said. A more balanced assessment of the potential for Data Science needs to be heard, especially amidst the enthusiasm of conferences & pundits.
Causal factors in the US election
One can normally rely on Andrew Gelman for a balanced critique of statistical methods & the weight of evidence for claims.
In this post, he does not disappoint. Pointing out that the swing was smaller than you might think (2%), he usefully highlights several factors at play.
Andrew critiques a number of popular theories (for once it looks like it wasn't another case of "Shy Tories" or "Shy Trumpsters"). He then goes on to list numerous factors at work, including voter enthusiasm, the collapse of the third-party vote (despite what some papers say), long queues at polling places putting people off, and potentially an underestimate of the effect of non-traditional campaigning (e.g. Twitter & TV shows).
There is no simple conclusion, but my takeaway is that the evidence points to no single cause for the models getting it wrong. In reality, several factors were at play.
Well worth reading for all statisticians.
Research Methods
Just as Andrew can be trusted on statistical robustness, GreenBook is a trusted source for research best practice.
In this reflection, Tom Anderson reminds us that not everyone got it wrong. His text analytics of social media sentiment proved accurate (even if the conclusion was published with very cautious understatement). Like Nate Silver's forecasts at FiveThirtyEight, their results showed a very real possibility of a Trump win.
What is interesting about this approach, though, is that the 'hero' is not clever model tweaking by skilled statisticians but a change in the research methods used. Tom makes a good case for greater use of behavioural analytics, social media analytics and other non-traditional methods, many of which did predict a Trump win.
I agree with his conclusion, which could also have been drawn after the Brexit polls. The learning point is the growing evidence that "conventional quantitive Likert-scale survey questions—the sort used in every poll—are generally not terrific predictors of actual behaviour".
Well worth reading for all research leaders.
Who is the real villain?
OK, if it’s not true that all pollsters got it wrong, who is really to blame? Should the public and analysts shift their focus to another villain?
As usual, Tim Harford's More or Less podcast points us in the right direction, once again highlighting the critical role of the Electoral College system.
But the most devastating critique I have read of the impact this system actually has on US election results is this piece from Forbes magazine.
In it, statistician Meta Brown (author of great books on Data Mining & Storytelling) lays bare the real villain behind this result. I had not realised how much the Electoral College system favours white voters, nor its dubious history in compromises over slavery.
Compared to those flaws, selecting better research and statistical methods is perhaps a minor improvement. So don't scapegoat the analysts.