The article “Daring to Discuss Women in Science” by John Tierney, published in the New York Times on June 7, 2010, purports to present a dispassionate scientific defense of Larry Summers’s claims, in particular by reviewing and expanding his argument that observed differences in the length of the extreme right tail of the bell curves of men’s and women’s test scores indicate real differences in their innate ability. But any argument like this has to acknowledge a serious difficulty: it is problematic to assume without comment that the abilities of a group can be inferred from the tail of a bell curve. We are so used to invoking bell curves to talk about group abilities that we don’t notice that such arguments usually use only the mean of the curve. Using the tail is a totally different story.
Think about it: it is reasonable to question whether a single data point (the test score of an individual person) is a true indication of that person’s ability. It might not be. Maybe a single test score represents a dunce with hyper-overachieving parents who push him to study all the time. So does that single false reading destroy the validity of the curve? No, of course not, because some other kid might have been a super-genius who was drunk last night and can barely keep his eyes open during the test. One is testing above his “true ability” and the other is testing below his “true ability,” and the two effects cancel out. Thus the means of curves are a good way to measure the ability of large groups, because all the random false readings average out.
But tails are not. On the tail this “canceling out” effect doesn’t work. Look at the extreme right tail. The relatively slow but hyper-motivated kids are not canceled out by a horde of far-above-the-mean super-geniuses who had drunken revels the night before. There just aren’t that many super-geniuses, and they just don’t party that much.
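If you want to see the canceling-out argument in numbers, here is a minimal sketch. It rests on assumptions that are mine and not taken from any real test data: each score is a “true ability” plus an independent random error, both drawn from normal distributions, and the names `simulate_scores` and `mean_noise_in_top` and the parameter values are purely illustrative.

```python
import random
import statistics


def simulate_scores(n=100_000, noise_sd=0.5, seed=0):
    """Hypothetical model: score = true ability + independent random error."""
    rng = random.Random(seed)
    abilities = [rng.gauss(0.0, 1.0) for _ in range(n)]
    noises = [rng.gauss(0.0, noise_sd) for _ in range(n)]
    scores = [a + e for a, e in zip(abilities, noises)]
    return abilities, noises, scores


def mean_noise_in_top(noises, scores, top_fraction=0.01):
    """Average random error among the highest-scoring fraction of test takers."""
    k = max(1, int(len(scores) * top_fraction))
    top = sorted(range(len(scores)), key=scores.__getitem__, reverse=True)[:k]
    return statistics.fmean(noises[i] for i in top)


abilities, noises, scores = simulate_scores()
overall_noise = statistics.fmean(noises)        # errors averaged over everyone
tail_noise = mean_noise_in_top(noises, scores)  # errors averaged over the top 1%

print(f"mean error, whole group: {overall_noise:+.3f}")
print(f"mean error, top 1% of scores: {tail_noise:+.3f}")
```

Under these assumptions the average error over the whole group sits close to zero, while the average error among the top scorers is clearly positive: the people on the right tail are disproportionately the ones whose scores overstate them, and nobody is there to cancel them out.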
Or let’s look at it another way: imagine that you had a large group which you divided in half totally at random. At this point the two halves’ bell curves of test scores look exactly the same. Let’s call one group “boys” and the other group “girls”. But they are two utterly randomly selected groups. Now let’s inject the “boys” with a chemical that gives the ones who are very good already a burning desire to dominate any contest they enter into. And let us inject the “girls” with a chemical that makes the ones who are already good nonetheless unwilling to make anyone feel bad by making themselves look too good. What will happen to the two bell curves? Of course the upper tail of the “boys’” curve will stretch out, while the “girls’” tail will shrink in. It will look like the “boys” whipped the “girls” on the right tail of ability hands down, no contest. But the difference in the tails has nothing to do with ability. Remember, they started out with the same distribution of abilities before they got their injections. It is only the effect of the chemicals on motivation that makes it look like the “boys” beat the “girls” at the tail.
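The thought experiment can be played out as a toy simulation. Everything specific here is an assumption of mine for illustration: the “chemicals” are modeled as a simple stretch or shrink of the part of a score that lies above a threshold, and the function name `tail_counts` and the values of `threshold`, `stretch`, `shrink` and `cutoff` are hypothetical, not estimates of anything real.

```python
import random


def tail_counts(n=50_000, threshold=1.0, stretch=1.3, shrink=0.7,
                cutoff=2.5, seed=1):
    """Count how many members of each group land above `cutoff`."""
    rng = random.Random(seed)
    # One pool of abilities, split at random into two equal halves.
    pool = [rng.gauss(0.0, 1.0) for _ in range(2 * n)]
    rng.shuffle(pool)
    boys, girls = pool[:n], pool[n:]

    def score(ability, factor):
        # Only those already above the threshold are affected (assumed model):
        # their margin over the threshold is multiplied by `factor`.
        if ability <= threshold:
            return ability
        return threshold + factor * (ability - threshold)

    boys_scores = [score(a, stretch) for a in boys]
    girls_scores = [score(a, shrink) for a in girls]
    return {
        "boys_ability": sum(a > cutoff for a in boys),
        "girls_ability": sum(a > cutoff for a in girls),
        "boys_score": sum(s > cutoff for s in boys_scores),
        "girls_score": sum(s > cutoff for s in girls_scores),
    }


counts = tail_counts()
for name, value in counts.items():
    print(f"{name}: {value}")
```

With these made-up settings the two groups have roughly the same number of people above the cutoff in ability, yet the “boys” far outnumber the “girls” above the same cutoff in scores, even though nothing about ability differs between them.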
So, when you see different tails, you can’t automatically conclude that they are caused by a difference in underlying innate ability. It is possible that other factors are at play, especially since, if we were looking to identify these hypothetical chemicals, we might find obvious candidates like “testosterone” and “estrogen”.
The possibility of alternative explanations for these findings calls into question Tierney’s and Summers’s claims to superior dispassionate scientific objectivity. True, moving from the mean to the tail of a bell curve makes systematic effects on averages irrelevant, but the tail is instead susceptible to systematic effects on deviations, which are irrelevant at the mean. An argument that uses this trick to dodge gender differences in averages cannot claim the mantle of scientific responsibility without accounting for gender differences in deviations. I am deeply disappointed that Tierney and Summers did not accompany their assertions with a suitable reminder of this fact.