November 01, 2004

From the desk of Jane Galt:

Department of Awful Statistics

OOPS! Gregg Easterbrook makes a mistake, in an otherwise interesting column about John Edwards:

Maybe this theory works for people's thoughts, but the key word is "wisdom," and numbers don't contain any wisdom. The electoral vote sites are averaging mere numbers--poll results known to contain margins of error. Remember, averaging errors does not make the errors go away [Emphasis mine]. It is a classic mistake to take a bunch of estimates, run them through a sophisticated formula, and then treat the result as if it is precise. This is basically what computer-modeling of the future does, and why it's almost always wrong. Poll averaging may have similar drawbacks.

But this is not true. Averaging errors does make them go away, as long as the error isn't systematic.

A systematic error, in statistics, is an error in your data-gathering design, which produces results which vary in a standard way from actual reality. An example of systematic error would be the fact that telephone polls do not get people who have only cell phones or VOIP; those households may (and probably do) represent a discrete subset of the population whose attitudes lean (I'd guess) more Democratic. So by excluding them, you bias each and every poll towards Republicans.

Non-systematic error is simply the kind of error you get from taking a smaller sample of the population and trying to generalise from it. Say I want to find out what people over 6'2 think about Bush, and I get a list from somewhere of all the tall people in the country, and call 2,000 of them to find out what they think. My sample of 2000 is probably not going to exactly mirror the entire population of tall people; even if I've avoided systematic error, random chance means that their opinions will probably be a few percentage points off, one way or the other. That's known as sampling error.

One of the ways to correct for sampling error is to get larger samples. For example, if I only called five people, there would be a pretty good chance that, purely by accident, I'd call too many Democrats or Republicans. On the other hand, if I call a million, the law of averages should catch up with me, and it's very likely that (as long as I have no systematic error) the results of my poll will match up pretty well with the opinions of the whole population of tall people.

Averaging results of different polls is another way. If the error in the polls is all non-systematic, then the errors will tend to cancel each other out, and we'll get a pretty good look at where the race really stands: in effect, we've created a larger sample. Of course, it's possible that there's a lot of systematic error in the polls: people with cell phones or people who screen their calls being dropped in favour of lonely senior citizens with plenty of time to answer questions from the nice young man at Gallup. (Senior citizens are apparently, after African-Americans and Jewish people, one of the most consistently Democratic demographics, another problem for the Emerging Democratic Majority theorists: a good portion of their current Democratic Minority is developing a keen interest in the price of tombstones.) Systematic error won't be averaged out; it will show up in every poll running the same direction, and will thus remain in the averaged results. But while it's certainly not true that averaging polls can correct all errors, that doesn't mean it can't correct any: it can. Unfortunately, this race is so tight that systematic error, and late-breaking undecideds, overwhelm the predictive power of the averages.
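A rough way to see the difference between the two kinds of error is to simulate it. The sketch below is only an illustration under made-up assumptions: the true share of 50%, the 1,000 respondents per poll, the 20 polls, and the 3-point bias are all hypothetical numbers, and the function names are invented for this example.

```python
import random
import statistics


def simulate_poll(true_share, sample_size, bias, rng):
    """Return one poll's estimate of a candidate's share.

    Each respondent backs the candidate with probability true_share + bias,
    so a nonzero bias plays the role of a systematic error shared by every poll.
    """
    p = true_share + bias
    supporters = sum(1 for _ in range(sample_size) if rng.random() < p)
    return supporters / sample_size


def average_of_polls(true_share, sample_size, n_polls, bias, rng):
    """Average the estimates of n_polls independent polls."""
    estimates = [
        simulate_poll(true_share, sample_size, bias, rng) for _ in range(n_polls)
    ]
    return statistics.mean(estimates)


rng = random.Random(2004)  # fixed seed so the run is repeatable
TRUE_SHARE = 0.50          # hypothetical true support
SAMPLE_SIZE = 1000         # hypothetical respondents per poll
N_POLLS = 20               # hypothetical number of polls averaged

single_poll = simulate_poll(TRUE_SHARE, SAMPLE_SIZE, 0.0, rng)
unbiased_average = average_of_polls(TRUE_SHARE, SAMPLE_SIZE, N_POLLS, 0.0, rng)
biased_average = average_of_polls(TRUE_SHARE, SAMPLE_SIZE, N_POLLS, 0.03, rng)

print(f"one poll, sampling error only:     {single_poll:.3f}")
print(f"20-poll average, sampling only:    {unbiased_average:.3f}")
print(f"20-poll average, 3-point bias:     {biased_average:.3f}")
```

In a run like this the unbiased average should land much closer to 0.50 than a typical single poll does, while the biased average should sit near 0.53 no matter how many polls go into it.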

Posted by Jane Galt at November 1, 2004 01:20 PM
Comments

A question and/or thought: since most polls seem to take their sample and then massage it heavily according to the pollster's perception of what the elusive 2004 likely voter looks like, I would wager that more inaccuracy is introduced into the final results at this stage (call it "interpretive error" or "pollster bias error") than as a result of any hypothetical systematic error.

Since the various pollsters seem to be using widely different likely voter models (example: assuming turnout will look more like it did in 2000 than 2002), wouldn't that also be a classification of error that would tend to be reduced by averaging?

Posted by: HT on November 1, 2004 01:46 PM

Another thing to bear in mind is that most statistical analysis is implicitly stated in terms of normal distributions, with the hidden assumption that outliers fall off exponentially. If your data is not a good approximation to a normal distribution, this can come back and bite you.

Posted by: Kevin Marks on November 1, 2004 02:27 PM

It should, in fact, be possible to amplify the random errors in polls by averaging if, when averaging, you don't weight the polls appropriately (in your formulation, by n size).
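As a sketch of what weighting by n size buys you, the snippet below compares a plain average with a sample-size-weighted one. The three polls are invented for illustration, and the standard-error formulas assume simple random samples with true support near 50%; neither assumption comes from any real pollster.

```python
import math

# Hypothetical polls as (estimated share, sample size); the last one is tiny.
polls = [(0.48, 2000), (0.51, 1500), (0.60, 50)]


def unweighted_average(polls):
    """Treat every poll as equally informative."""
    return sum(share for share, _ in polls) / len(polls)


def weighted_average(polls):
    """Weight each poll by its sample size, i.e. pool all the respondents."""
    total = sum(n for _, n in polls)
    return sum(share * n for share, n in polls) / total


def se_unweighted(polls, p=0.5):
    """Standard error of the plain average: sqrt(sum of variances) / k."""
    k = len(polls)
    return math.sqrt(sum(p * (1 - p) / n for _, n in polls)) / k


def se_weighted(polls, p=0.5):
    """Standard error of the pooled estimate: one big sample of size sum(n)."""
    total = sum(n for _, n in polls)
    return math.sqrt(p * (1 - p) / total)


se_best_single = math.sqrt(0.5 * 0.5 / max(n for _, n in polls))

print(f"unweighted average: {unweighted_average(polls):.3f}")
print(f"weighted average:   {weighted_average(polls):.3f}")
print(f"SE unweighted:      {se_unweighted(polls):.4f}")
print(f"SE best single:     {se_best_single:.4f}")
print(f"SE weighted:        {se_weighted(polls):.4f}")
```

With these made-up numbers the unweighted average has a larger standard error than the biggest poll taken alone, because the 50-person poll gets a full third of the vote. Weighting by sample size gives the smallest standard error of the three.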

Totally OT: It's sad, but I never read TNR anymore. It was pretty much my introduction to the "thoughtful class" of politically interested, but since Sullivan's editorship there it has just sucked. (FWIW: when I first read TNR, the standout reporter in my mind was Fred Barnes). Now, I'm much more likely to appreciate Slate than TNR (though Slate isn't nearly as good as the old TNR).

You see the same thing happening with other august institutions. When I was really young, TIME was considered an unimpeachable source; now, it's like another Entertainment Weekly, and I trust Newsweek much more. I trust the LA Times on political matters much more than I trust the WP.

Very weird. I must be getting old.

Posted by: SomeCallMeTim on November 1, 2004 02:43 PM

Jane is exactly right on the facts. I despair at the lack of statistical knowledge of otherwise thoughtful people.

And it's not *that* hard to do a little research.

Posted by: old maltese on November 1, 2004 02:44 PM

"If your data is not a good approximation to a normal distribution, this can come back and bite you"

The error around the estimates of Bush and Kerry is probably really close to normally distributed. The error around Nader's estimate is probably going to be something like log normal. Since there's very little chance that Nader's going to get a negative number of votes, it would have to be some sort of right-skewed distribution.

Posted by: CatCube on November 1, 2004 02:45 PM

What I find interesting in statistics is the Law of Large Numbers. I ran through the math (requires two years of calculus) in my sadistics class, so I know it really works, but it still seems like a form of voodoo to me. As long as you are working with both a normal distribution curve and a truly random sample, the difference between sampling 2,000 people and 1,000,000 people is actually quite small. Sampling 2,000 people at random actually gives you a very close approximation to reality, but the rub lies in getting a random sample. The tricky statistical stuff lies in the realm of correcting for samples that are not truly random.
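For a sense of scale, the usual textbook approximation for a 95% margin of error can be computed directly. This is a sketch that assumes a simple random sample and a true share near 50%; the function name and the 8,000 comparison point are added here for illustration.

```python
import math


def margin_of_error(sample_size, share=0.5, z=1.96):
    """Approximate 95% margin of error for a simple random sample."""
    return z * math.sqrt(share * (1 - share) / sample_size)


for n in (2000, 8000, 1_000_000):
    print(f"n = {n:>9,}: +/- {100 * margin_of_error(n):.2f} points")
```

Under those assumptions, 2,000 respondents already give a margin of roughly 2 points, quadrupling the sample only halves it, and a million respondents bring it down to about a tenth of a point.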

Posted by: Rex on November 1, 2004 04:39 PM

I'm not much for polling stats analysis, but there are a few very obvious weaknesses. The polls depend on human honesty. I never considered this a polling weakness before this election, but given the way the polls have vacillated and the high number of so-called undecideds, I have begun to think people lie to the pollsters, either out of boredom or for no reason at all. I've had an immediate hang-up policy for phone surveys and solicitors for years, but I now think some people like the conversation and answer in ways that extend the contact. Then there's the fact that these polls don't consider new voters, and IMO this election is going to swing on them. The battleground states, and all the different polls from each state and each polling company, further confuse the issue, and if this election is decided by the courts again, the electoral college has to go.

Posted by: Begbee on November 1, 2004 07:15 PM

The one aspect of this issue that is not covered by this post is that averaging of polls (like that done at RealClearPolitics.com) assumes the questions are the same.

Unfortunately, though they are probably similar, they aren't the same. Each polling organization asks certain questions before and after the "horse race" question of which candidate the respondent will vote for. And surprisingly, question wording and question order can make a pretty big difference, even when it is on a question that is easy to think about, like who a respondent would vote for in an election. This problem of question order and question wording is increased significantly when it comes to determining who is a likely voter, because those questions are more nebulous.

So, if every polling organization asked the same set of questions in the same order, you could average the results (weighted for sample size), and it would improve the error somewhat. With disparate questions, you cannot do that accurately. The result just includes all kinds of unknown systemic errors inherent in the questionnaires, and no way to account for those errors without disaggregating the data again.

Posted by: Sisyphus on November 1, 2004 08:24 PM

Sisyphus, that's a very good point. It makes you ask yourself: if the polling companies' choice of questions alters the results, are the polling companies shaping rather than reflecting the election?

Posted by: Begbee on November 1, 2004 09:25 PM

Did Hell just freeze over? A sensible comment from Begbee.

Yes, in fact there are many cases where a polling organization has deliberately slanted the poll by how they worded the question. But the blatant cases I know of concerned issues of some complexity, not "Who will you vote for?". E.g., although you'd think abortion was almost a binary issue, you'll get a large difference in responses to "Should abortion be legal" and "Should abortion at will be legal".

And right now over on "The Raving Atheist", a bunch of nonbelievers are having fun with an on-line religion quiz which tends to rate the utterly unspiritual as more Unitarian-Universalist than "nontheist" or "secular humanist" - because of questions like:

"What is the number and nature of the deity (God, gods, higher power)? Choose one."

with answers like: "No God or supreme force. Or not sure. Or not important."

So it lumps a fully committed atheist in with agnostics, the merely confused, and apparently somehow even a UU who sort of believes in some sort of warm fuzzy god. (The latter is my personal summary of several years of UU Sunday school.) And the other questions that bear directly on religious issues (the afterlife, the origin of the universe and life, and the cause of evil) all also have no non-religious answer that doesn't end with "... Or not sure. Or not important."

That is, if you don't want a poll to reflect opinions you dislike, just don't let anyone answer it that way...

Posted by: markm on November 2, 2004 12:20 AM

"Another thing to bear in mind is that most statistical analysis is implicitly stated in terms of normal distributions, with the hidden assumption that outliers fall off exponentially. If your data is not a good approximation to a normal distribution, this can come back and bite you."

This only applies to small samples (under 30). If you have a population that is NOT normally distributed but you take large samples, the sample averages WILL be normally distributed around the TRUE population mean. That's the Central Limit Theorem from stats.
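A quick simulation can make this concrete. The sketch below draws samples from a deliberately lopsided population (an exponential distribution with mean 1, chosen here purely as an example) and checks that the sample averages behave roughly the way a normal curve predicts. The sample size of 50 and the 2,000 repetitions are arbitrary choices, and the match to a normal curve is approximate, not exact.

```python
import math
import random
import statistics

rng = random.Random(30)  # fixed seed so the run is repeatable
SAMPLE_SIZE = 50         # arbitrary "large" sample
N_SAMPLES = 2000         # how many samples we draw


def sample_mean(n, rng):
    """Average of n draws from a skewed population (exponential, mean 1)."""
    return sum(rng.expovariate(1.0) for _ in range(n)) / n


means = [sample_mean(SAMPLE_SIZE, rng) for _ in range(N_SAMPLES)]

mean_of_means = statistics.mean(means)
spread = statistics.stdev(means)
expected_spread = 1.0 / math.sqrt(SAMPLE_SIZE)  # population sd is 1

# A normal curve puts about 95% of values within 1.96 sd of the centre.
share_within = sum(
    1 for m in means if abs(m - 1.0) <= 1.96 * expected_spread
) / N_SAMPLES

print(f"mean of sample means: {mean_of_means:.3f} (population mean is 1)")
print(f"spread of means:      {spread:.3f} (theory: {expected_spread:.3f})")
print(f"share within 1.96 sd: {share_within:.3f}")
```

Even though no individual draw looks remotely normal, the averages should cluster around the population mean with a spread close to the population standard deviation divided by the square root of the sample size.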

Posted by: Boonton on November 2, 2004 10:06 AM

markm, I'm glad you found something I said to be agreeable, but backhanded compliments beg a nasty response. Keep your head on a swivel...

I understand why some take up the cause of atheism, despite being unsure of the existence of a god. All of the world's religions have done a lot more harm than good, and there isn't anything that offers a compelling scientific argument for the existence of Jesus, Allah, Odin, Zeus, etc. The Bush administration's evangelical base, and its belief that evolution is wrong and the earth is only 6000 years old, has damaged both the American political process and Christianity in the eyes of those who believe in the truth of science. When I'm in a foul mood, I'll take up the no-god argument, but only because I'm sure that if there is a God, and if he's aware of us, he's much more concerned with the entertainment we provide than with what we believe.

Posted by: Begbee on November 2, 2004 10:18 AM

I find it helpful to look at it the way my stats book did. Suppose you have a population of 10000 people and you took samples of 100. How many possible samples are there? Lots, but in principle you could write down every possible combination of 100 people. You can then write down their averages. If you mapped out the averages, they would all cluster in a normal distribution around the average of the whole population.

When you imagine it that way it makes more sense. Drawing a single sample is like putting all those samples in a box and plucking one out at random. Chances are you'll get a sample close to the mean. If you pull multiple samples, chances are they will all be close to the mean. You could draw an off-beat sample, but it's unlikely that you'll do that multiple times.
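The box-of-samples picture can be tried out literally on a toy scale. Writing down every sample of 100 from 10000 people is out of reach, so this sketch uses a made-up population of 10 values and every possible sample of 3; the numbers are invented for illustration only.

```python
import itertools
import statistics

# Hypothetical population of 10 values, with one outlier.
population = [1, 2, 2, 3, 3, 3, 4, 5, 8, 19]
SAMPLE_SIZE = 3

# Every possible sample of 3 people, chosen by position so repeated
# values still count as different people.
all_samples = list(
    itertools.combinations(range(len(population)), SAMPLE_SIZE)
)
sample_means = [
    sum(population[i] for i in sample) / SAMPLE_SIZE for sample in all_samples
]

population_mean = statistics.mean(population)
mean_of_sample_means = statistics.mean(sample_means)

print(f"number of possible samples: {len(all_samples)}")
print(f"population mean:            {population_mean}")
print(f"mean of all sample means:   {mean_of_sample_means:.3f}")
print(f"lowest / highest sample mean: {min(sample_means):.2f} / {max(sample_means):.2f}")
```

The averages of all the possible samples centre on the population average, which is why a sample plucked at random from the box is a fair guide. With samples this small the spread is still wide; larger samples pull the averages in tighter.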

Posted by: Boonton on November 2, 2004 11:49 AM

Easterbrook has a point, whether he knows it or not, and whether he's made it or not: averaging the errors doesn't in fact make the errors go away completely if the sample size is finite.

Posted by: Slartibartfast on November 3, 2004 12:17 PM
