This column was originally published by the Times of Israel.
At first, it was just a trickle, a misguided throw-away line here and there, easily ignored. Then it started picking up momentum, showing up in one Israeli commentary after another. And now, it is conventional wisdom in the Israeli press and public that the U.S. election is already over, that polls show President Obama’s reelection is inevitable, and that Republican Mitt Romney might as well throw in the towel now.
Of course, this is nonsense. It is based on the most superficial reading of the most superficial polls. In fact, as described below, while poll results are all over the map, the most historically accurate pollsters consider the race tied.
This is hardly the first time the Israeli media herd has stampeded in the wrong direction based on flawed analysis. This time, everyone seems to be on board with the “only a miracle can save Romney” theme, even as U.S. economic numbers decline, the Middle East is on fire, al-Qaeda flags have been flown at U.S. embassies in the region, and the first of the presidential debates has yet to take place. (In fairness to the Israelis, it should be noted that much of the American media are, to varying degrees, on the same bandwagon, and are only now beginning to question to what degree the polls are skewed.)
But when even my own mathematician brother succumbs to this misreading of the race, it is time to explain what the polls do and do not say.
First, not all polls are created equal. There are all kinds of methodological differences among polling organizations in terms of targeted response rates, when voters are called (just how many Romney-voting small-business owners are home to answer a pollster’s weeknight call between 5:00 and 7:00 pm?), whether pollsters use live interviewers or computer calling, whether only landlines are called or also cellphones, how to weigh the non-response rate, and whether calls with no answer are called back. Statisticians argue endlessly about which methods should yield the most accurate results.
While various statistical methods may be sound, all polling analysis depends on the quality of the assumptions and data that go into it. Garbage in, garbage out: skewed data inputs lead to skewed poll results, no matter how brilliant the statistical methodology. Decisions regarding those inputs are as much art as science.
And no assumption is as controversial, or as influential on presidential polling results, as “partisan weighting,” i.e., adjusting samples according to estimated party affiliation and turnout. The theory makes intuitive sense: if history and current trends indicate that, for example, 5% more of those voting are likely to be Republican (“R+5”), the poll sample should be weighted accordingly, even if more Democrats are initially polled than Republicans.
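To make the mechanics concrete, here is a minimal Python sketch of party weighting. Every number in it is hypothetical and chosen only for illustration, and the function name `weighted_support` and the three-group breakdown are assumptions rather than any pollster's actual procedure.

```python
def weighted_support(sample_counts, support, target_shares):
    """Return (raw, weighted) support for a candidate.

    sample_counts: respondents reached in each party group
    support:       fraction of each group backing the candidate
    target_shares: assumed share of each group among actual voters
    """
    total = sum(sample_counts.values())
    # Raw result: every respondent counts once, whatever the sample mix.
    raw = sum(sample_counts[g] * support[g] for g in sample_counts) / total
    # Weighted result: each group counts by its assumed share of voters.
    weighted = sum(target_shares[g] * support[g] for g in target_shares)
    return raw, weighted


# Hypothetical raw sample: more Democrats reached than Republicans.
sample_counts = {"D": 400, "R": 300, "I": 300}
# Hypothetical support for the Republican candidate within each group.
support = {"D": 0.10, "R": 0.90, "I": 0.50}
# Assumed R+5 electorate (the independent share is also an assumption).
target_shares = {"D": 0.30, "R": 0.35, "I": 0.35}

raw, weighted = weighted_support(sample_counts, support, target_shares)
print(f"raw: {raw:.1%}, weighted to R+5: {weighted:.1%}")
```

With these made-up inputs, the same interview answers put the Republican at 46% unweighted and 52% once the sample is weighted to R+5.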
How much weighting is appropriate? That is the crux of recent debates.
In the 2004 election, in which George W. Bush defeated John Kerry, Democrats and Republicans turned out in roughly even numbers; but in 2008, the electorate consisted of 8% more Democrats than Republicans. Should a pollster adjust the results of his sample using the 2008 election as a model, which will strongly favor Obama, or the 2004 model, which will not?
The dirty secret of polling-as-news stories is that results are essentially pre-determined by the weighting model used. It should hardly come as news that a poll with 10% more Democrats sampled than Republicans will favor the Democratic candidate, or vice versa.
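A rough Python sketch shows how directly the assumed party gap drives the headline number. The function `dem_lead` and all of its defaults (30% independents splitting evenly, 90% party loyalty on both sides, every voter choosing one of the two major candidates) are illustrative assumptions, not figures from any actual poll.

```python
def dem_lead(party_gap, independents=0.30, loyalty=0.90, ind_dem=0.50):
    """Democratic lead, in points, for an assumed D-minus-R turnout gap.

    All defaults are illustrative assumptions. In a two-candidate race
    with equal loyalty on both sides, the lead reduces to:
        gap * (2 * loyalty - 1) + independents * (2 * ind_dem - 1)
    """
    partisan_part = party_gap * (2 * loyalty - 1)
    independent_part = independents * (2 * ind_dem - 1)
    return 100 * (partisan_part + independent_part)


models = [("even, as in 2004", 0.00), ("D+8, as in 2008", 0.08), ("D+11", 0.11)]
for label, gap in models:
    print(f"{label}: Democrat ahead by {dem_lead(gap):.1f} points")
```

Under these assumptions, identical voter preferences produce a tie with an even electorate, a lead of about 6.4 points at D+8, and about 8.8 points at D+11. The only thing that changed was the weighting model.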
Even if we start at the 2008 D+8 figure, does anyone seriously believe that Democratic turnout will be the same for the reelection of a divisive incumbent in a stagnant economy with high unemployment as it was for the historic election of the first black President running on “hope and change” against a particularly weak candidate?
In fact, since 2008 the Democrats’ party identification advantage has not only eroded and disappeared, but has turned into a Republican advantage: the D+8 party identification figure from 2008 had already become R+1.3 by the 2010 midterm elections, which Obama himself described as “a shellacking” of the Democrats. And since then? Has anything in particular changed so dramatically as to move the needle in Obama’s direction? Apparently not, as the Republican advantage has only grown, and now stands at R+4.3.
So, in an R+4.3 country, who in his right mind would take seriously a poll which skews current samples to D+8 or D+11, especially in states where that presumes a greater Democratic advantage than even 2008 would predict?
That is the question every news organization should be asking before reporting such polls. Yet the Israeli press dutifully reports such poll results: just last week, it was big news in the Israeli media when The New York Times and CBS News published their own polls showing Obama up by 9 to 12 points in the important swing states of Ohio, Pennsylvania and Florida. Buried in the story, however, was the pro-Democratic weighting, as much as D+11. Is it really news that a D+11 poll has Obama ahead by 11 points?
In 2008, Obama won Florida by under 3%, and Ohio by under 5%. In other words, Florida and Ohio were less Democratic than the then-D+8 nation as a whole. As party affiliation has swung strongly in the Republican direction (and as Obama’s image has been badly tarnished in both states’ sizable Democrat-leaning Jewish communities), perhaps a bit of skepticism is in order before breathlessly reporting the New York Times/CBS numbers claiming double-digit leads for Obama.
Republicans also consistently vote with their party at a greater rate than do Democrats. For 40 years, Republican candidates have won a remarkably consistent 84% (+/- 3%) of the votes of Republicans, while Democratic candidates have held on to only about 78% of Democrats. Thus, under-weighting Republicans in a polling sample is apt to directly lower Republican poll results.
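The arithmetic behind that last point can be sketched in a few lines of Python. The two loyalty rates are the ones cited above; the turnout shares, the function name, and the simplification that every defecting Democrat votes Republican are assumptions made purely for illustration.

```python
REP_LOYALTY = 0.84  # share of Republicans voting Republican (figure cited above)
DEM_LOYALTY = 0.78  # share of Democrats voting Democratic (figure cited above)


def rep_votes_from_partisans(rep_share, dem_share):
    """Republican vote share drawn from partisans only.

    Assumption: every defecting Democrat votes Republican. Independents
    are left out because the text gives no figure for them.
    """
    return rep_share * REP_LOYALTY + dem_share * (1 - DEM_LOYALTY)


even = rep_votes_from_partisans(0.35, 0.35)    # hypothetical even turnout
skewed = rep_votes_from_partisans(0.31, 0.39)  # hypothetical D+8 weighting
print(f"even: {even:.1%}, D+8: {skewed:.1%}, "
      f"cost: {100 * (even - skewed):.1f} points")
```

Under these simplifying assumptions, weighting the same electorate at D+8 rather than even shaves about 2.5 points off the Republican's total before a single independent is counted.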
Another red flag is that many of the reported polls showing Obama way out front are polls of “adults” or “registered voters”; but the metric that matters, and that has historically proved most accurate, is that of “likely voters.” And those polls show a much tighter race.
Lastly, the polling organization rated most accurate since 2004 (Rasmussen) currently shows the race as a statistical tie, at 47% to 47%.
We have a long way to go before declaring this election over.