Women, Science and The Right Tail of a Bell Curve

by Rebecca Frankel

The article Daring to Discuss Women in Science by John Tierney in the New York Times on June 7, 2010 purports to present a dispassionate scientific defense of Larry Summers’s claims, in particular by reviewing and expanding his argument that observed differences in the length of the extreme right tail of the bell curves of men’s and women’s test scores indicate real differences in their innate ability. But in fact any argument like this has to acknowledge a serious difficulty: it is problematic to assume without comment that the abilities of a group can be inferred from the tail of a bell curve. We are so used to invoking bell curves to talk about group abilities, we don’t notice that such arguments usually use only the mean of the curve. Using the tail is a totally different story.

Think about it: it is reasonable to question whether a single data point — the test score of an individual person — is a true indication of his/her ability. It might not be. Maybe a single test score represents a dunce with hyper-overachieving parents who push him to study all the time. So does that single false reading destroy the validity of the curve? No of course not: because some other kid might have been a super-genius who was drunk last night and can barely keep his eyes open during the test. One is testing above his “true ability” and the other is testing below his “true ability,” and the effect cancels out. Thus the means of curves are a good way to measure the ability of large groups, because all the random false readings average out.

But tails are not. On the tail this “canceling out” effect doesn’t work. Look at the extreme right tail. The relatively slow but hyper-motivated kids are not canceled out by the horde of far-above-the-mean super-geniuses who had drunken revels the night before. There just aren’t that many super-geniuses and they just don’t party that much.
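To see the asymmetry concretely, here is a minimal sketch in Python. It is only an illustration under assumptions that are mine, not taken from any real test data: “true ability” is drawn from a standard normal distribution, each test adds independent normal “false reading” noise with standard deviation 0.5, and “the tail” means the top 1% of scores. The names `simulate_noise`, `noise_sd` and `top_fraction` are hypothetical.

```python
import random
import statistics


def simulate_noise(n=200_000, noise_sd=0.5, top_fraction=0.01, seed=0):
    """Return the average false reading overall and among the top scorers.

    Assumptions (illustrative only): ability ~ N(0, 1), noise ~ N(0, noise_sd),
    and the observed score is simply ability + noise.
    """
    rng = random.Random(seed)
    records = []
    for _ in range(n):
        ability = rng.gauss(0.0, 1.0)
        noise = rng.gauss(0.0, noise_sd)
        records.append((ability + noise, noise))

    # Across the whole group, over- and under-performance average out.
    overall = statistics.fmean(noise for _, noise in records)

    # Among the highest scores, lucky readings are over-represented.
    records.sort(key=lambda record: record[0], reverse=True)
    top_count = int(n * top_fraction)
    tail = statistics.fmean(noise for _, noise in records[:top_count])
    return overall, tail


overall_noise, tail_noise = simulate_noise()
print(f"average false reading, everyone: {overall_noise:+.3f}")
print(f"average false reading, top 1%:   {tail_noise:+.3f}")
```

Under these assumptions the first number sits near zero while the second is clearly positive: the people on the extreme right tail are, on average, testing above their “true ability,” and nothing on the tail cancels that out.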

Or let’s look at it another way: imagine that you had a large group which you divided in half totally at random. At this point their bell curves of test scores look exactly the same. Let’s call one of the groups “boys” and the other group “girls”. But they are two utterly randomly selected groups. Now let’s inject the “boys” with a chemical that gives the ones who are very good already a burning desire to dominate any contest they enter into. And let us inject the “girls” with a chemical that makes the ones who are already good nonetheless unwilling to make anyone feel bad by making themselves look too good. What will happen to the two bell curves? Of course the upper tail of the “boys'” curve will stretch out, while the “girls'” tail will shrink in. It will look like the “boys” whipped the “girls” on the right tail of ability hands down, no contest. But the tail has nothing to do with ability. Remember they started out with the same distribution of abilities, before they got their injections. It is only the effect of the chemicals on motivation that makes it look like the “boys” beat the “girls” at the tail.

So, when you see different tails, you can’t automatically conclude that this is caused by difference in underlying innate ability. It is possible that other factors are at play — especially since if we were looking to identify these hypothetical chemicals we might find obvious candidates like “testosterone” and “estrogen”.
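The thought experiment with the two injected groups can be sketched as a small simulation. This is a toy model, not a claim about real data: the threshold of one standard deviation for “already good,” the stretch factor of 0.3, and the cutoff of three standard deviations for “the tail” are all assumptions chosen for illustration, and `split_and_inject` is a hypothetical name.

```python
import random
import statistics


def split_and_inject(n=400_000, threshold=1.0, strength=0.3, cutoff=3.0, seed=1):
    """Split one population at random, then distort only the high scorers."""
    rng = random.Random(seed)
    abilities = [rng.gauss(0.0, 1.0) for _ in range(n)]
    rng.shuffle(abilities)  # mirrors "divided in half totally at random"
    half = n // 2
    group_a, group_b = abilities[:half], abilities[half:]

    def score(ability, direction):
        # Only the part of ability above the threshold is stretched (+1)
        # or shrunk (-1); everyone below the threshold is left untouched.
        excess = max(ability - threshold, 0.0)
        return ability + direction * strength * excess

    boys = [score(a, +1) for a in group_a]
    girls = [score(a, -1) for a in group_b]

    def count_above(values):
        return sum(1 for v in values if v > cutoff)

    return {
        "ability_tail": (count_above(group_a), count_above(group_b)),
        "score_tail": (count_above(boys), count_above(girls)),
        "score_means": (statistics.fmean(boys), statistics.fmean(girls)),
    }


result = split_and_inject()
print("above cutoff before injection (boys, girls):", result["ability_tail"])
print("above cutoff after injection  (boys, girls):", result["score_tail"])
print("mean score after injection    (boys, girls):", result["score_means"])
```

Before the injections the two groups have about the same number of people above the cutoff. Afterwards the “boys” outnumber the “girls” there many times over, even though the two means barely move and the underlying abilities were never changed.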

The possibility of alternative explanations for these findings calls into question Tierney and Summers’ claims to superior dispassionate scientific objectivity. Moving from the mean to the tail of a bell curve makes systematic effects on averages irrelevant, true, but it is instead susceptible to systematic effects on deviations, which are irrelevant at the mean. An argument that uses this trick to dodge gender differences in averages cannot claim the mantle of scientific responsibility without accounting for gender differences in deviations. I am deeply disappointed that Tierney and Summers did not accompany their assertions with a suitable reminder of this fact.


16 Comments

  1. Sue VanHattum:

    There is also the work of Janet Mertz et al, showing massive cultural variability in the percentage of women in the far right tail, making it clear that there is more nurture than nature in this.

    Thank you for this post. I hadn’t known about Tierney joining Summers in this sexist nonsense.

  2. Thomas:

    So “ability” now has a new definition. It is a hypothetical state of equality that is disturbed by a natural difference between males and females. And the fact that this natural difference has an influence on performance is somehow “proof” that males and females are born equally able. By that kind of reasoning, the fact that I cannot see well enough to hit a major-league fastball proves that I belong in the Hall of Fame, along with Babe Ruth. If you’re looking for “sexist nonsense,” look no further than Rebecca Frankel’s hypothesis.

  3. Max:

    It seems that this effect (if real) would thin out the tail, but not affect the middle, thus making the distribution deviate (more) from normal, but it would still have small effect on the standard deviation (the bulk of data being unaffected). What I’m trying to say is – higher variability shows not only on the tail, but in the bulk of the curve.

    Hypothesis and data aside, why does any discussion of this deteriorate into accusations of “sexist nonsense” – and not just eventually, as in Godwin’s law, but right away? Sure, actual real life behavior of individuals has real and tangible effects on real people. But is anyone accusing Tierney or Frankel of real life sexism? Answer arguments with counterarguments. After all, Merriam-Webster tells us, “diatribe” used to mean “prolonged discourse” before it became “bitter and abusive writing”.

  4. math games for the classroom:

    I like the example of looking at it as a large group divided! Thanks for the post. It got me thinking!

  5. Adam:

    Excuse me, as a mathematician, do you mind giving some enlightening comments here: http://community.discovery.com/eve/forums/a/tpc/f/7501919888/m/75719488501?r=66819698501
    Thanks.

  6. Rebecca Frankel:

    Note to Thomas: My article asserts no hypothesis about how to think about ability. My own views are complicated and would require more space to explain, which wouldn’t be appropriate in this venue anyway. The only assertion I made was that I was disappointed in Summers and Tierney for claiming scientific objectivity without including the disclaimers that I thought ought to be attached to such a non-standard statistical argument. I’m not even saying I think they are wrong, or even that they shouldn’t have presented their argument: only that they were using statistics with excessive license and insufficient qualification. I included the final paragraph and fought pretty hard through the editing process to get it right because I wanted to make it clear the conclusion about my political views that should be drawn from the otherwise technical argument. This blog is about math, not politics, the argument presented here is narrow and technical, and the only political fight in which I mean to be staking out a position at this time is a fight about the use of statistics, and more broadly, perhaps, claims of scientific objectivity, in political discourse.

  7. JBL:

    Rebecca Frankel, thank you for this excellent piece, and thank you Tanya for hosting it. It’s a shame that the three comments from identifiably male user names include one that is hostile and incoherent and one that is completely off-topic. As for Max’s comment, I’m not sure what “real life sexism” is supposed to mean. Surely pushing the unconfirmable (for the reasons Rebecca details, among others) hypothesis that women are inherently less able serves no legitimate purpose and serves only to promote or reinforce a pernicious stereotype.

  8. colorblind:

    Hitler’s Nazism is to Germany as Mussolini’s Italy is to Fascism. Ok, now that we’ve satisfied Godwin’s law with a Nazi analogy…

    As in the article, let’s introduce 2 chemicals that act in a slightly different way than proposed by the author. Let’s say these two chemicals, we’ll call them “X chromosomes” and “Y chromosomes”, act in a manner to induce the production of other secondary chemicals suggested by the author – the “X chromosomes” induce a chemical that makes one not want to make others feel too bad and the “Y chromosomes” the other one…..

    I need not go further; the remainder is trivial. A chemical argument IS a genetic argument. Now if we were giving girls one set of shots at birth and boys another, that would be different. But we don’t.

    The mathematical model of what the author presents is only slightly more interesting. The means of each set might be equivalent, but the variation might be different. But that’s not really how we ought to be viewing this argument. There is a variable that should be considered: the probability of making a meaningful advance in a field of study, in this case mathematics. At a certain point and left of that point, the probability is 0. To the right of that point, the variable increases approaching 100% as the deviance from the mean approaches infinity. The resultant curves will look significantly different, most likely favoring the gender with the greater variability.

    I suspect this is more what Summers and Tierney are looking at, not a bell curve of natural ability, but a secondary curve of ability to contribute, which is significantly different.

    Having 3 daughters, I know these debates will eventually enter their consciousness. And maybe they won’t be making critical advances in a given field. But that’s no reason to expect any less of them than their best.

  9. Felipe Pait:

    I don’t want to be disparaging of anyone. So let me say that I find this blog WAY above and beyond anything that appears in Tierney’s columns. Which I stopped reading.

  10. Anonymous:

    I don’t want to make any claims on whether Tierney is right, but I think there is some misunderstanding of his argument. Tierney’s point is that it is important to look at tails because those who work professionally in math or science are usually those at the tails of the overall distribution. To take Rebecca’s example, suppose that to succeed in math one needs to put in a huge amount of work, and suppose further that people who are hyper-competitive are more likely to put in a lot of work. Then the majority of people who succeed will be those who are hyper-competitive, even if there is no discrimination and no difference in innate ability.

  11. Dave L. Renfro:

    Here’s something I remember reading somewhere (in John Allen Paulos’ book “Innumeracy”?) that may be of interest.

    Consider two normal distributions with the same standard deviation, say sigma = 1, one with a mean of 0 and the other with a mean that is almost the same, say a mean of 0.1. If you DIVIDE the former by the latter, then you get (after simplifying some):

    exp[ 0.005 – (x)(0.1) ].

    For x = 0, 1, 2, 3 (i.e. for these numbers of standard deviations to the right of x = 0), the RATIOS wind up being approximately

    1.005, 0.91, 0.82, 0.74.

    Now if the normal distribution were a simple exponential (i.e. the exponent was linear in the independent variable), then these ratios would be constant, like you see in population growth or radioactive decay. However, the normal distribution is “super-exponential”, due to the fact that the exponent is quadratic in the independent variable, and thus the output RATIOS for constant input INCREMENTS are not constant.

    This illustrates that if two populations with the same standard deviation have slightly different means, then the relative proportions of the two populations can vary a lot when you look several standard deviations away from their (nearly identical) means.

  13. Jonathan:

    Rebecca, I appreciate the fact that you are just looking at the statistical side of this discussion, especially since the relevance of just about any of these measurements is questionable. Having said that, while you illustrate your statistical points well, I am having trouble understanding what your piece has to do with what Summers and Tierney have actually said.

  14. Tanya Khovanova:

    Just stumbled upon “Are tests biased against students who do not give a shit?” in the Onion:

    http://www.theonion.com/video/in-the-know-are-tests-biased-against-students-who,17966/

  15. involved:

    So boys are also injected with a chemical that makes those on the left tail of the curve try to fail? How’s that for nonsense!

  16. Sue VanHattum:

    I didn’t see these comments until now. The reason I refer to Tierney’s article as ‘sexist nonsense’ is because it’s written so badly. I’ve explained further in my post here.

    Thomas, many of the differences between women and men are not ‘natural’, they are enforced in many ways. Walk into a Toys R Us store some day to get a feel for how solid is the wall between pink and blue.
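
The density-ratio arithmetic in Dave L. Renfro’s comment above can be checked with a few lines of Python. This is only a sketch for verification: it assumes exactly the setup described in that comment (two normal densities with sigma = 1 and means 0 and 0.1), and the helper names `normal_pdf` and `density_ratio` are mine.

```python
import math


def normal_pdf(x, mean, sd=1.0):
    """Density of a normal distribution with the given mean and sd."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def density_ratio(x, mean_a=0.0, mean_b=0.1):
    """Ratio of the mean-0 density to the mean-0.1 density at x."""
    return normal_pdf(x, mean_a) / normal_pdf(x, mean_b)


points = (0, 1, 2, 3)
ratios = [density_ratio(x) for x in points]

# The simplified form quoted in the comment: exp[0.005 - 0.1 x].
closed_form = [math.exp(0.005 - 0.1 * x) for x in points]

for x, ratio, check in zip(points, ratios, closed_form):
    print(f"x = {x}: ratio = {ratio:.4f}, closed form = {check:.4f}")
```

The direct ratio and the simplified exponential agree, and the ratio falls steadily as x moves to the right, which is the point of the comment: a small difference in means becomes a large difference in relative proportions far out on the tail.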