Archive for the ‘Statistics’ Category.

Statistics Jokes

* * *

Do you know a statistics joke?
Probably, but it’s mean!

* * *

Twelve different world statisticians studied Russian roulette. Ten of them proved that it is perfectly safe. The other two scientists were unfortunately not able to join the final discussion.

* * *

A statistician bought a new tool that finds correlations between different fields in databases. Hoping for new discoveries he ran his new tool on his large database and found highly correlated events. These are his discoveries:

  • The most correlated fields were the title and the gender. If the title is Mr., then the gender is male.
  • The children have the same last names as parents.
  • The children are much younger than the parents.
  • The main cause of divorces is weddings.

* * *

Scientists discovered that the main cause of living till old age is an error on the birth certificate.

* * *

Scientists concluded that children do not really use the Internet. This is proven by the fact that the percentage of people saying ‘No’ when asked ‘Are you over 18?’ is close to zero.

* * *

— Please, close the window, it is cold outside.
— Do you think it will get warmer, after I close it?


Puzzling Grades Resolved

This story started when my student asked for an explanation of his B grade in linear algebra. He was slightly above average on every exam, and the cutoff for an A was the top 50 percent of the class. I wrote a post in which I asked my readers to explain the situation. Here is my explanation.

The picture below contains a histogram for a typical first midterm exam in linear algebra.

First Midterm Histogram

The spike in the lowest range indicates zeros for those who missed the exam.

The mean is 74.7 and the median is 81.5. As you can see, the median is about 7 points higher than the mean. That means that if a student performs around average on all the exams, s/he is in the bottom half of the class.

But this is not the whole story. In addition to the above, MIT allows students to drop the class after the second midterm. Suppose 30 students with lower grades drop the class; then the recalculated median for the first midterm for the students who finish the course goes up to 85. This is a difference of more than 10 points from the original average.
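Both effects can be reproduced on a toy data set. The sketch below uses made-up scores, not the real class data, and the helper name median_after_drops is hypothetical. It also assumes that the students who drop are exactly the lowest scorers on this exam, which is a simplification.

```python
from statistics import mean, median

# Hypothetical exam scores (not the real class data): two zeros for students
# who missed the exam, a tail of low scores, and most scores near the top.
scores = [0, 0, 35, 50, 60, 65, 70, 75, 78, 80,
          82, 84, 85, 86, 88, 90, 92, 94, 96, 98]


def median_after_drops(all_scores, n_dropped):
    """Median of the scores left after the n_dropped lowest scorers leave."""
    remaining = sorted(all_scores)[n_dropped:]
    return median(remaining)


print("mean:", mean(scores))
print("median:", median(scores))
print("median after 6 drops:", median_after_drops(scores, 6))
```

In this toy class the zeros and low scores pull the mean well below the median, and removing the lowest scorers pushes the median of the remaining students higher still.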

If this were a statistics class, then I could have told the puzzled student that he deserves that B. Instead I told him that he didn’t even have the highest score among those with Bs. Somehow that fact made him feel better.


Puzzling Grades

I lead recitations for a Linear Algebra class at MIT. Sometimes my students are disappointed with their grades. The grades are based on the final score, which is calculated by the following formula: 15% for homework, 15% for each of the three midterms, and 40% for the final. After all the scores are calculated, we decide on the cutoffs for A, B, and other grades. Last semester, the first cutoff was unusually low. The top 50% got an A.
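As a minimal sketch, the final score described above is a weighted sum. Only the weights come from the text; the function name and the sample student are hypothetical.

```python
# Weights from the text: 15% homework, 15% for each of three midterms, 40% final.
HOMEWORK_WEIGHT = 0.15
MIDTERM_WEIGHT = 0.15
FINAL_WEIGHT = 0.40


def final_score(homework, midterms, final_exam):
    """Return the weighted course score; all inputs are percentages (0-100)."""
    if len(midterms) != 3:
        raise ValueError("expected exactly three midterm scores")
    return (HOMEWORK_WEIGHT * homework
            + MIDTERM_WEIGHT * sum(midterms)
            + FINAL_WEIGHT * final_exam)


# Hypothetical student: perfect homework, 80 on every exam.
print(round(final_score(100, [80, 80, 80], 80), 2))
```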

Some students who were above average on every exam assumed they would get an A, but nonetheless received a B. The average scores for the three midterm exams and for the final exam were made public, so everyone knew where they stood relative to the average.

The average scores for homework are not publicly available, but they didn’t have much relevance because everyone was close to 100%. However, a hypothetical person who is slightly above average on everything, including the homework, should not expect an A, even if half the class gets an A. There are two different effects that cause this. Can you figure them out?


Hat Puzzle: Create a Distribution

Here is a setup that works for the several puzzles that follow it:

The sultan decides to test his hundred wizards. Tomorrow at noon he will randomly put a red or a blue hat—from his inexhaustible supply—on every wizard’s head. Each wizard will be able to see every hat but his own. The wizards will not be allowed to exchange any kind of information whatsoever. At the sultan’s signal, each wizard needs to write down the color of his own hat. Every wizard who guesses wrong will be executed. The wizards have one day to decide together on a strategy.

I wrote about puzzles with this setup before in my essay The Wizards’ Hats. My first request had been to maximize the number of wizards who are guaranteed to survive. It is easy to show that you cannot guarantee more than 50 survivors. Indeed, each wizard will be right with probability 0.5. That means whatever the strategy, the expected number of wizards guessing correctly is 50. My second request had been to maximize the probability that all of them will survive. Again, the counting argument shows that this probability can’t be more than 0.5.

Now here are some additional puzzles, including the first two mentioned above, based on the same setup. Suggest a strategy—or prove that it doesn’t exist—in which:

  1. 50 wizards will be guaranteed to survive.
  2. 100 wizards will survive with probability 0.5.
  3. 100 wizards will survive with probability 0.25 and 50 wizards will survive with probability 0.5.
  4. 75 wizards will survive with probability 1/2, and 25 wizards will survive with probability 1/2.
  5. 75 wizards will survive with probability 2/3.
  6. The wizards will survive according to a given distribution. For which distributions is it possible?

As I mentioned, I already wrote about the first two questions. Below are the solutions to those questions. If you haven’t seen my post and want to think about it, now is a good time to stop reading.

To guarantee the survival of 50 wizards, designate 50 wizards who will assume that the total number of red hats is odd, and the rest of the wizards will assume that the total number of red hats is even. The total number of red hats is either even or odd, so one of the groups is guaranteed to survive.

To make sure that all of them survive together with probability 0.5, they all need to assume that the total number of red hats is even.
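The two parity strategies above can be checked with a short simulation. This is only a sketch: the function names are hypothetical, a red hat is encoded as True, and the seed is fixed only so that the run is repeatable.

```python
import random


def guess(visible_reds, assume_total_even):
    """Guess own hat (True = red) so that the assumed parity of red hats holds."""
    visible_even = visible_reds % 2 == 0
    # If the visible count already has the assumed parity, own hat must be blue.
    return visible_even != assume_total_even


def survivors(hats, assume_even_flags):
    """Count the wizards whose guess matches their own hat."""
    total_reds = sum(hats)
    count = 0
    for hat, assume_even in zip(hats, assume_even_flags):
        visible_reds = total_reds - hat  # every hat but the wizard's own
        if guess(visible_reds, assume_even) == hat:
            count += 1
    return count


rng = random.Random(0)
n = 100
split = [i < n // 2 for i in range(n)]  # first 50 assume even, the rest assume odd
all_even = [True] * n                   # everyone assumes even

for _ in range(5):
    hats = [rng.random() < 0.5 for _ in range(n)]
    print(survivors(hats, split), survivors(hats, all_even))
```

In every trial the first number is 50, and the second is either 0 or 100, depending on the actual parity of the red hats.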


Kolmogorov Student Olympiad in Probability

There are too many Olympiads. Now there is even a special undergraduate Olympiad in probability, called the Kolmogorov Student Olympiad in Probability. It is run by the Department of Probability Theory of Moscow State University. I just discovered this tiny Olympiad, though it has been around for 13 years.

A small portion of the problems are accessible for high school students. These are the problems that I liked. I edited them slightly for clarity.

Second Olympiad. Eight boys and seven girls went to the movies and sat in the same row of 15 seats. Assuming that all the 15! permutations of their seating arrangements are equally probable, compute the expected number of pairs of neighbors of different genders. (For example, the seating BBBBBBBGBGGGGGG has three pairs.)
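To make the counting rule in the example concrete, here is a small sketch that counts the mixed pairs in a seating string. The function name is hypothetical, and the code checks only the example, not the expected value the problem asks for.

```python
def mixed_pairs(seating):
    """Count adjacent seats occupied by children of different genders."""
    return sum(1 for a, b in zip(seating, seating[1:]) if a != b)


example = "BBBBBBBGBGGGGGG"
# Prints the row length, the number of boys, the number of girls,
# and the number of mixed pairs: 15 8 7 3
print(len(example), example.count("B"), example.count("G"), mixed_pairs(example))
```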

Third Olympiad. One hundred passengers bought assigned tickets for a 100-passenger railroad car. The first 99 passengers to enter the car get seated randomly so that all the 100! possible permutations of their seating arrangements are equally probable. However, the last passenger decides to take his reserved seat. So he arrives at his seat and if it is taken he asks the passenger in his seat to move elsewhere. That passenger does the same thing: she arrives at her own seat and if it is taken, she asks the person to move, and so on. Find the expected number of moved passengers.

Third Olympiad. There are two 6-sided dice with numbers 1 through 6 on their faces. Is it possible to “load” the dice so that when the two dice are thrown the sum of the numbers on the dice is distributed uniformly on the set {2,…,12}? By loading the dice we mean assigning probabilities to each side of the dice. You do not have to “load” both dice the same way.

Sixth Olympiad. There are M green and N red apples in a basket. We take apples out randomly one by one until all the apples left in the basket are red. What is the probability that at the moment we stop the basket is empty?

Seventh Olympiad. Prove that there exists a square matrix A of order 11 such that all its elements are equal to 1 or −1, and det A > 4000.

Twelfth Olympiad. In a segment [0,1] n points are chosen randomly. For every point one of the two directions (left or right) is chosen randomly and independently. At the same moment in time all n points start moving in the chosen direction with speed 1. The collisions of all points are elastic. That means, after two points bump into each other, they start moving in the opposite directions with the same speed of 1. When a point reaches an end of the segment it sticks to it and stops moving. Find the expected time when the last point sticks to the end of the segment.

Thirteenth Olympiad. Students who are trying to solve a problem are seated on one side of an infinite table. The probability that a student can solve the problem independently is 1/2. In addition, each student will be able to peek into the work of his or her right and left neighbor with a probability of 1/4 for each. All these events are independent. Assume that if student X gets a solution by solving or copying, then the students who had been able to peek into the work of student X will also get the solution. Find the probability that student Vasya gets the solution.


IQ Migration

The Russian website has a big collection of math problems. I use it a lot in my work as a math Olympiad coach. Recently I was giving a statistics lesson. While there was only one statistics problem on the website, it was a good one.

Assume that every person in every country was tested for IQ. A country’s IQ rating is the average IQ of the population. We also assume that for the duration of this puzzle no one is born and no one dies.

  • A group of citizens of country A emigrated to B. Show that the rating of both countries can go up.
  • After that a group of citizens of B (which may include former citizens of A) emigrated to A. Is it possible that the ratings of both countries go up again?
  • A group of citizens of A emigrated to B, and a group of citizens of B emigrated to C. As a result, the ratings of each country increased. After that the migration went the opposite way: some citizens of C moved to B, and some citizens from B moved to A. As a result, the ratings of all three countries went up once more. Is this possible? If yes, then how? If no, then why not?

Reverse Bechdel Test

A movie passes the Bechdel Test if these three statements about it are true:

  • There are at least two named women in it
  • Who talk to each other
  • About something besides a man.

Surely there should be a movie where two women talk about the Bechdel test. But I digress.

The Bechdel test website rates famous movies. Currently they have rated 4,683 movies and 56% pass the test. More than half of the movies pass the test. There is hope. Right? Actually they have a separate list of the top 250 famous movies. Only 70 movies, or 28%, from this list pass the test.

My son Alexey suggested the obvious reverse Bechdel test, which is more striking than the Bechdel test. A movie doesn’t pass the test if it

  • Has at least two named male characters
  • Whenever they talk to each other
  • They only talk about women.

I can’t think of any movie like that. Can you?


Fraternal Birth Order and Fecundity

Two interesting research results about male homosexuality are intertwined. The first one shows that the probability of homosexuality in a man increases with the number of older brothers. That is, if a boy is the third son in a family, the probability of him being a homosexual is greater than the probability of a first son in a family being homosexual. The second research result shows that the probability of homosexuality increases with the number of children the mother has. So if a woman is fertile and has many children, the probability that each of her sons is a homosexual is greater than the probability that an only child is a homosexual.

Many people conclude from the first result that a woman undergoes hormonal or other changes while being pregnant with boys that influence the probability of future boys being homosexual. Looking at the second result, researchers conclude that homosexuality has a genetic component. Moreover, that component is tied up with the mother’s fecundity. The same genes are responsible for both the mother having many children and for her sons being homosexual. This assumption explains why homosexuality is not dying out in the evolution process.

In one of my previous essays I showed that the first result influences the second result. If each next son is homosexual with higher probability, then the more children a mother has, the more probable it is that her sons are homosexuals. That means that the second result is a mathematical consequence of the first result. Therefore, the conclusion that the second result implies a genetic component might be wrong. The correlation between homosexuality and fecundity could be the consequence of hormonal changes.
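A minimal numeric sketch of this direction of the argument: if the probability depends only on birth order and grows with it, then the average over all sons in a family grows with the number of sons. The function name and the probabilities below are hypothetical and are not taken from any study.

```python
def average_by_family_size(q):
    """q[k] is the hypothetical probability for the (k+1)-th son.

    Returns, for each family size n = 1..len(q), the average probability
    over all the sons in a family with n sons.
    """
    return [sum(q[:n]) / n for n in range(1, len(q) + 1)]


# Made-up probabilities that increase with birth order.
print(average_by_family_size([0.02, 0.03, 0.04]))
```

The printed averages increase with family size, even though nothing in this model depends on the mother’s fecundity.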

Now let’s look at this from the opposite direction. I will show that the first result is a mathematical consequence of the second result: namely, if fertile women are more likely to give birth to homosexuals, then the probability that second sons are gay is higher than the probability that first sons are gay.

For simplicity, let’s only consider mothers with one or two boys. Suppose the probability that a son of a one-son mother is homosexual is p1, and the probability that a son of a two-son mother is homosexual is p2. The data shows that p2 is greater than p1. What is the consequence? Suppose the number of mothers with one son is m1 and the number of mothers with two sons is m2. Then in the whole population the probability that a boy who is a first son is gay is (p1m1+p2m2)/(m1+m2), and the probability that a boy who is a second son is gay is p2. The first probability is a weighted average of p1 and p2, so it is smaller than p2, the second one.

Let me create an extreme hypothetical example. Suppose mothers of one son always have straight sons, and mothers of two sons always have gay sons. Now consider a random boy in this hypothetical setting. If he’s the second son, he is always gay, while if he is the first son he is not always gay.
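The formula and the extreme example can be checked numerically. In this sketch the function names and the family counts are hypothetical; the symbols p1, p2, m1 and m2 are the ones defined above.

```python
def first_son_probability(p1, p2, m1, m2):
    """Probability that a randomly chosen first son is gay.

    First sons come from both one-son mothers (m1 of them, probability p1)
    and two-son mothers (m2 of them, probability p2).
    """
    return (p1 * m1 + p2 * m2) / (m1 + m2)


def second_son_probability(p2):
    """Second sons exist only in two-son families, so the probability is p2."""
    return p2


# Extreme example from the text: p1 = 0 and p2 = 1, with made-up family counts.
print(first_son_probability(0, 1, 100, 100), second_son_probability(1))

# A milder made-up case: the first-son probability lies between p1 and p2.
print(first_son_probability(0.02, 0.04, 300, 100), second_son_probability(0.04))
```

With equal numbers of one-son and two-son mothers, the extreme example gives 0.5 for first sons and 1 for second sons.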

We can conclude that if the probability of having homosexual sons depends on fecundity, then the higher numbered children would be gay with higher probability than the first-born. This means that if the genetics argument is true and being a homosexual depends on the mother’s fecundity gene, then it would follow mathematically that the probability of homosexuality increases with birth order. The conclusion that homosexuality depends on hormonal changes might not be valid.

So what is first, chicken or egg? Is homosexuality caused by fecundity, while birth order correlation is just the consequence? Or vice versa? Is homosexuality caused by the birth order, while correlation with fecundity is just the consequence?

What do we do when the research results are so interdependent? To untangle them we need to look at the data more carefully. And that is easy to do.

To show that homosexuality depends on the order of birth independently of the mother’s fertility, we need to take all the families with two boys (or the same number of boys) and show that in such families the second child is more likely to be homosexual than the first child.

To show the dependence on fertility, without the influence of the birth order, we need to take all first-born sons and show that they are more likely to be homosexuals if their mothers have more children.

It would be really interesting to look at this data.


Was I Dead?

Once when I was working at Telcordia, I received a phone call from my doctor’s office. Here is how it went:

— Are you Tanya Khovanova?
— Yes.
— You should come here immediately and redo your blood test ASAP.
— What’s going on?
— Your blood count shows that you are dead.
— If I’m dead, then what’s the hurry?

Given that I wasn’t dead, the conclusion was that there had been a mistake in the test. If there had been a mistake, the probability that something was wrong after the test was the same as it was before the test. There was no hurry.


Happy Nobel Prize Winners

I stumbled upon an article, Winners Live Longer, that says:

“When 524 nominees for the Nobel Prize were examined and compared to the actual winners from 1901 to 1950, the winners lived longer by 1.4 years. Why? It seems just having won and knowing you are on top gives you a boost of 1.8% to your life expectancy.”

This goes on top of the pile of Bad Conclusions From Statistics. With any kind of awards where people can be nominated several times, winners on average would live longer. The reason is that nominees who die early lose their chance to be nominated again and to win.
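The effect is easy to reproduce in a simulation in which winning has no influence on lifespan at all. The model below is a deliberately crude sketch, and every number in it is an assumption: each nominee is first nominated at age 50, dies at a uniformly random age between 51 and 90, and has the same small chance of winning in each year of life after 50.

```python
import random
from statistics import mean


def simulate(n_nominees=20000, yearly_win_chance=0.03, seed=1):
    """Return (mean lifespan of winners, mean lifespan of non-winners).

    Lifespans are drawn before any prize is awarded, so in this model
    winning cannot affect how long anyone lives.
    """
    rng = random.Random(seed)
    winners, others = [], []
    for _ in range(n_nominees):
        lifespan = rng.randint(51, 90)
        # One chance to win for each year lived after the first nomination.
        won = any(rng.random() < yearly_win_chance
                  for _ in range(50, lifespan))
        (winners if won else others).append(lifespan)
    return mean(winners), mean(others)


winners_mean, others_mean = simulate()
print(round(winners_mean, 1), round(others_mean, 1))
```

The winners come out with a longer average life, simply because those who live longer get more chances to win.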

I wonder what would happen if we were to compare Fields medal nominees and winners. There is a cutoff age of 40 for receiving a Fields medal. If we compare the life span of Fields medal winners and nominees who survived past 40, we might get a better picture of how winning affects life expectancy.

Living a long life increases your chances of getting a Nobel Prize, but doesn’t help you get a Fields medal.