## A Son Named Luigi

Suppose that we choose all families with two children such that one of them is a son named Luigi. Given that a boy is named Luigi with probability p, what is the probability that the other child is also a son?

Here is a potential “solution.” Luigi is the younger brother’s name in one of the most popular video games, Super Mario Bros. Probably the parents loved the game and decided to name their first son Mario and the second Luigi. Hence, if one of the children is named Luigi, then he must be a younger son. The second child is certainly an older son named Mario. So, the answer is 1.

The solution above is not mathematical, but it reflects the fact that children’s names are highly correlated with each other.

Let’s try some mathematical models that describe how the parents might name their children and see what happens. It is common to assume that the names of siblings are chosen independently. In this case the first son (as well as the second son) will be named Luigi with probability p. Therefore, the answer to the puzzle above is (2-p)/(4-p).
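As a quick check of the independent-names answer, here is a minimal sketch that weighs the four equally likely birth orders by the chance that the family contains a son named Luigi. The function name `prob_other_is_son` is only an illustrative choice, and the sketch assumes boys and girls are equally likely and that names are drawn independently.

```python
from fractions import Fraction

def prob_other_is_son(p):
    """P(two sons | at least one son named Luigi), with independent naming."""
    p = Fraction(p)
    quarter = Fraction(1, 4)            # BB, BG, GB, GG are equally likely
    luigi_given_bb = 1 - (1 - p) ** 2   # either son may be Luigi: 2p - p^2
    luigi_given_mixed = p               # only one son who could be Luigi
    numerator = quarter * luigi_given_bb
    denominator = quarter * (luigi_given_bb + 2 * luigi_given_mixed)
    return numerator / denominator

p = Fraction(1, 100)
assert prob_other_is_son(p) == (2 - p) / (4 - p)
print(prob_other_is_son(p))  # 199/399
```

The p^2 term in the two-son case is exactly the probability of two sons both named Luigi.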

The problem with this model is that there is a noticeable probability that the family has two sons, both named Luigi.

As parents usually want to give different names to their children, many researchers suggest the following naming model to avoid naming two children in the same family with the same name. A potential family picks a child’s name at random from a distribution list. Children are named independently of each other. Families in which two children are named the same are crossed out from the list of families.

There is a problem with this approach. When we cross out families we may disturb the balance in the family gender distributions. If we assume that boys’ and girls’ names are different then we will only cross out families with children of the same gender. Thus, the ratio of different-gender families to same-gender families will stop being 1/1. Moreover, it could happen that the number of boy-boy families will differ from the number of girl-girl families.
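To see the imbalance concretely, here is a small sketch of the cross-out model. The two name distributions are made up for illustration, and the sketch assumes that boys’ and girls’ names never coincide, so only same-gender families can be crossed out.

```python
from fractions import Fraction

def surviving_family_weights(boy_names, girl_names):
    """Weights of family types left after same-name families are crossed out."""
    quarter = Fraction(1, 4)
    same_boy = sum(p * p for p in boy_names)    # P(two sons share a name)
    same_girl = sum(p * p for p in girl_names)  # P(two daughters share a name)
    return {
        "BB": quarter * (1 - same_boy),
        "GG": quarter * (1 - same_girl),
        "mixed": Fraction(1, 2),                # never crossed out
    }

# Hypothetical distributions: three boys' names, four equally likely girls' names.
boys = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)]
girls = [Fraction(1, 4)] * 4
weights = surviving_family_weights(boys, girls)
print(weights["BB"], weights["GG"], weights["mixed"])  # 5/32 3/16 1/2
```

With these numbers, mixed families outnumber same-gender families 16 to 11, and girl-girl families outnumber boy-boy families 6 to 5.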

There are several ways to adjust the model. Suppose there is a probability distribution of names that is used for the first son. If another son is born, the name of the first son is crossed out from the distribution and following that we proportionately adjust the probabilities of all other names for this family. In this model the probability of naming the first son by some name and the second son by the same name changes. For example, the most popular name’s probability decreases with consecutive sons, while the least popular name’s probability increases.

I like this model, because I think that it reflects real life.

Here is another model, suggested by my son Alexey. Parents give names to their children independently of each other from a given distribution list. If they give the same name to both children, the family is crossed out and replaced with another family with children of the same genders. The advantage of this model is that the first child and the second child are named independently of each other with the same probability distribution. The disadvantage is that the probability distribution of names in the resulting set of families will be different from the probability distribution of names in the original preference list.
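Here is a rough sketch of how the name distribution shifts in this model. It assumes that a replacement family is itself redrawn until the two names differ, which amounts to conditioning on distinct names. The distribution is made up, and `sibling_name_marginal` is just an illustrative name.

```python
from fractions import Fraction

def sibling_name_marginal(probs):
    """Name distribution of either son in two-son families,
    conditioned on the two sons having different names."""
    keep = 1 - sum(p * p for p in probs)   # P(the two names differ)
    return [p * (1 - p) / keep for p in probs]

# Hypothetical preference list with one popular name and two rarer ones.
original = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)]
print(sibling_name_marginal(original))
```

Under this reading, the popular name drops from 1/2 to 2/5 among two-son families, while each rarer name rises from 1/4 to 3/10.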

I would like my readers to comment on the models and how they change the answer to the original problem.


1. #### Tanya Khovanova:

This piece was inspired by an email I received from JeffJo:

Tanya

I tripped across your paper about the Tuesday Boy problem on arXiv.
Unfortunately, I doubt that your argument, which I agree with, will convince
anybody. When it comes to trivial probability problems like this one, people
have a way of making up reasons for their own answer to be right, and any
other answer to be wrong. Which is why such controversies exist in the first
problem, not an alternate event in the same problem.

What is doubly ironic is that (1) in 1889, using an almost identical problem
as a cautionary tale, Joseph Bertrand warned the world about the
incorrectness of the arguments that lead to the 1/3 answer; and (2) most
published accounts that claim 1/3 is correct will also discuss either the
Monty Hall Problem, or the Three Prisoner’s Problem, which are
mathematically (but not logically, in some senses) equivalent to the one
Bertrand used. Many even link them to it. They always use the solution
method that would get 1/2 in the Two Child Problem.

Say a random process produces a set of equally likely arrangements. Say you
learn information X about the arrangement of one trial, such that some
arrangements become impossible. It is tempting, but generally incorrect as
Bertrand warned us, to say the remaining cases are still equally likely.
Instead, you need to know the probability that you would learn X in each of
the remaining cases.

All of these problems fit a pattern where you could only learn X in A cases,
you would learn X half of the time in B cases, and you can’t learn X in C
cases. If you ask for the conditional probability for the A cases or B cases
as a set, which these problems also always do, the answers are
P(A|X)=A/(A+B/2) and P(B|X)=(B/2)/(A+B/2). And technically, the denominator
should be (A+B/2+0*C). I point that out because it helps to see what the
formula is doing with these probabilities.
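
The two formulas are easy to evaluate mechanically. The following is only a sketch; the function name `restricted_choice` is an illustrative choice, not something from the email.

```python
from fractions import Fraction

def restricted_choice(a, b, c=0):
    """Return (P(A|X), P(B|X)) when X is always learned in the `a` cases,
    learned half of the time in the `b` cases, and never in the `c` cases."""
    weight_a = Fraction(a)        # each A case contributes weight 1
    weight_b = Fraction(b, 2)     # each B case contributes weight 1/2
    total = weight_a + weight_b + 0 * c
    return weight_a / total, weight_b / total

p_a, p_b = restricted_choice(1, 1, 1)
print(p_a, p_b)  # 2/3 1/3
p_a, p_b = restricted_choice(1, 2, 1)
print(p_a, p_b)  # 1/2 1/2
```

The C cases are passed in only to show that they contribute nothing to the denominator.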

In Monty Hall (you pick from three doors, one of which has a prize; Monty
Hall opens one you didn’t pick that does not, and offers to let you switch
to the remaining door), most people count the two cases that remain
possible. They say the answer is that the probability for both remaining
doors is 1/2. But A=1 is the case where you originally chose incorrectly and
Monty had to open the only other door without the prize, B=1 is the case
where you chose correctly and Monty could open either of the other two
doors, and C=1 is the case where you chose incorrectly but Monty opened the
door with the prize (i.e., it is impossible). The probability your door
holds the prize is P(B|X) = (1/2)/(1+1/2)=1/3, and that the other unopened
door holds it is P(A|X)=1/(1+1/2)=2/3.

In Three Prisoners (either you, or one of two other convicts, will be
pardoned; the warden knows who but can’t say, but you convince him to tell
you which of the others won’t get it), most people think you tricked the
warden and your chances have gone up to 1/2.
But A=1 is the case the other prisoner the warden didn’t name gets the
pardon so he had only one choice to name, B=1 is the case where you get it
and he had to choose between two names, and C=1 is the case where the named
prisoner gets it. Your chances are still P(B|X)=1/3.

Bridge offers a similar kind of problem, and they call this the Principle of
Restricted Choice (you are missing the King and Queen of a suit; East had a
chance where it was best for him to play one of them if he could, and played
the King; what are the chances he has the Queen?). Most people would say it
could be in two places, so the probability is 1/2 it is in either. But A=1
is the case where he had only the King so he had to play it, B=1 is the case
where he had both cards, and C=2 are the cases where he did not have the
King. P(B|X) = (1/2)/(1+1/2)=1/3; the case where his choice of cards to play
was restricted – King only – is increased in probability. I mention this
because it actually gets tested, and has been verified by experience.

Bertrand’s Box Paradox (you are told that three identical boxes each hold
two coins; one holds two gold coins, one has two silver coins, and one has
one of each; a box is chosen, and someone looks in and pulls out a gold
coin?). Most people will say this box is either GG or GS, so the probability
it is GG is 1/2. That is wrong. The arrangement is either G1G2-G1, G1G2-G2,
or G1S2-G1, where I indicated the coin you were shown, so the answer is 2/3.
A conditional probability analysis gets the same formulas as before, but is
logically different since the choices work differently. A=1 is the case
where the chosen box has two gold coins, B=1 is the case where it holds a
gold and a silver coin, and C=1 is the case where it holds two silver coins.

Now, in Bertrand’s Box Paradox replace silver coins with bronze ones, and
add a second box with one of each kind. This is now identical, even in the
labels it uses, to the Two Child Problem. A=1 is the BB case, B=2 is the BG
and GB cases, and C=1 is the GG case. P(BB|X)=1/(1+2/2)=1/2.

+++++

But you said that you had never seen a similar problem about girls. There
have been some published, with what appears to be a long history linking
them that makes it similar to Tuesday Boy. In 2008, it went just a little
less viral than that one did in 2010.

In 1988, John Allen Paulos of Temple University published a book titled
“Innumeracy,” comparing the lack of mathematical skills to illiteracy, the
lack of reading skills. It was updated sometime in the 1990s, and he
apparently clarified his. I read the original, but this quote is the edited
one: “Consider now some randomly selected family of four which is known to
have at least one daughter. Say Myrtle is her name. Given this, what is the
conditional probability that Myrtle’s sibling is a brother? Given that
Myrtle has a younger sibling, what is the conditional possibility that her
sibling is a brother? The answers are, respectively, 2/3 and 1/2.” The
version I recall was more like “Consider now some randomly selected family
of four. What is the probability that Myrtle has a brother?”

In 1995, J. Laurie Snell of Dartmouth edited a collection of articles titled
“Topics in Contemporary Probability and Its Applications.” In it, he
co-authored an article with Robert Vanderbei of Princeton, titled “Three
Bewitching Paradoxes.” You can preview the appropriate selection at
ook_result&ct=result&resnum=1&ved=0CBMQ6AEwAA#v=onepage&q&f=false. They said
that Paulos changed the problem when he called the girl Myrtle. Assume that
the probability a girl would be named Myrtle is p. Because we know the name,
just like knowing “Tuesday,” the answer to the more classic question, that
she has a sister, should be (2-p)/(4-p). This is wrong; and not just for the
“Myrtle-centric” arguments you might provide. There are at least two other
errors that I’ll get to. But for now note that a first-order approximation
to this function of p is 1/2-p/8.

In 2008, Leonard Mlodinow of Cal Tech published a book called “The
Drunkard’s Walk: How Randomness Rules Our Lives.” He asked both the
“classic” version, and the “name” version. These quotes may not be exactly
how they appeared in the book, but come from a NY Times on-line review: “You
know that a certain family has two children, and that at least one is a
girl. But you can’t recall whether both are girls. What is the probability
that the family has two girls?” And for the second version, he added “you
remember that at least one is a girl with a very unusual name (that, say,
one in a million females share).” He used the name “Florida” as an example.
I think it is quite possible that Mlodinow based this problem on Snell &
Vanderbei’s assessment of Paulos’ problem; but that is pure speculation.

Mlodinow hinted at Snell & Vanderbei’s first error, that they included a
family with two girls named Myrtle/Florida in their count (it’s also
possible that Gary Foshee got his by removing the concern). I assume he
didn’t want to try to redistribute that probability in the table, so he
argued that the p^2 term was too small to be concerned with, and the proper
answer was (2-p)/(4-p) which is “very nearly 1/2.” But in fact, by his other
assumptions (which include the second error I mentioned above) you don’t
need to do any complicated calculations. If the probability an older Florida
has a younger sister is 1/2, and the probability a younger Florida has an
older sister is 1/2, then the Law of Total Probability (which you use in
This is different than when people try, incorrectly, to use this law for the
classic problem. The law requires independent events, which we have if names
aren’t repeated, but not if the names are left out and “Tuesday” is used

An Italian mathematician named D’Agostini published a paper, rebutting
Mlodinow’s answer, on arXiv last January. He derived 1/2 as the answer, in a
quite roundabout and incorrect way, based on the assumption that you can’t
have two girls named Florida. It was wrong because you also must assume
every other name is similarly limited, which produces some interesting
results. It turns out that the table of the probability for the nine family
types (it’s 3×3 based on boys, non-Floridas, and Floridas) is not symmetric!
To see why, imagine a culture with only three girls’ names: Ann, Beth, and
Mary. The names Ann and Beth are equally popular, but Mary is twice as
popular as either one. It is easy to see that the probability the first girl
in a family gets named Mary is 1/2. But that means that half of the second
girls – who must be the younger in a 2-child family – can’t be named Mary,
while the other half have a 2/3 probability. And that in turn means the
overall probability a girl with an older sister will be named Mary is 1/3,
while it is 1/2 if she has an older brother! Similarly, the probabilities
for Anns (or Beths) are 1/3 if she has an older sister, and 1/4 if she has
an older brother. It is a general property that common names become less
probable, and uncommon ones become more probable.
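
The Ann, Beth, and Mary numbers can be reproduced with a few lines of exact arithmetic. This is only a sketch of the rule described above (remove the older sister’s name, then rescale the rest); the function name is an illustrative choice.

```python
from fractions import Fraction

first = {"Ann": Fraction(1, 4), "Beth": Fraction(1, 4), "Mary": Fraction(1, 2)}

def second_girl_distribution(first):
    """Name distribution for a girl with an older sister, when the older
    sister's name is removed and the remaining probabilities are rescaled."""
    second = {name: Fraction(0) for name in first}
    for older, p_older in first.items():
        for name, p_name in first.items():
            if name != older:
                second[name] += p_older * p_name / (1 - p_older)
    return second

second = second_girl_distribution(first)
for name in first:
    # name, probability with an older brother, probability with an older sister
    print(name, first[name], second[name])
```

In this particular example all three names end up at 1/3 for a girl with an older sister.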

I have derived the exact answer to the probability Florida has a sister,
based on the assumed distribution function for names, but it is rather
pointless since there is no good distribution to plug into it. But it does
have an interesting property. It depends on a function that has similar
properties to a moment-generating function, and if m is the moment under
that function, a first-order approximation is 1/2 + (m-p)/8. It is a
function that is essentially parallel to Snell and Vanderbei’s, but strictly
greater! And if we divide names into “common” or “uncommon” based on this
moment, common names mean the probability of a sister is less than 1/2,
uncommon names mean it is greater, and the difference from 1/2 for “very
unusual names” depends more on m than on p so it is observable!

But Mlodinow’s problem is not ambiguous at all. The Florida-centric
assumption is flat-out wrong, based on the wording above. You don’t have to
use age in the “classic” problem to divide family types into the four
groups. Any measure that is independent of gender works as well. I prefer to
alphabetize the children’s names. You could also order them by where they
sit, relative to mother, at the dinner table, or – and this is what provides
the answer – whichever child has a name that is more memorable to Leonard
Mlodinow. It does not matter that this ordering can’t be defined in general;
it does exist. Using it reduces the Florida problem to the Older Girl
problem, regardless of any funky properties caused by the probability
distribution function for names. The answer is an unambiguous 1/2.

And I claim this same argument – ordering by however you learned one gender
– applies to most versions of the Two Child Problem.


3. #### JeffJo:

“Suppose there is a probability distribution of names that is used for the first son. If another son is born, the name of the first son is crossed out from the distribution and following that we proportionately adjust the probabilities of all other names for this family. In this model the probability of naming the first son by some name and the second son by the same name changes. For example, the most popular name’s probability decreases with consecutive sons, while the least popular name’s probability increases.”

This clearly is the model to use; but you seem to balk at doing so – probably because the changes depend on the entire, unknowable distribution. But the entire distribution can be characterized with just one parameter, that can be expressed in a number of forms.

1. Assume the set of all possible names is {NI}, with distribution {PI}, for the first son. Note that sum(all I, PI)=1.

2. The second son can’t have the same name NJ as the first; but we can assume that the remaining names retain the same relative probabilities. So the distribution is {PI/(1-PJ)} for I≠J.

3. Prob(2nd son named NI) = sum(all J≠I, PJ*(PI /(1-PJ)))
= PI * sum(all J≠I, PJ /(1-PJ))

4. We can now define QI=PI/(1-PI) and Q=sum(all I, QI). Here, QI is a number that is slightly more than PI, and Q is slightly more than 1. The actual value for Q depends on the entire distribution, but we don’t need to know that value.

5. Prob(2nd son named NI) = PI*(Q-QI). As you can see, this now depends only on the parameter Q and the probability PI, since QI is itself determined by PI. In practice, some of these probabilities must increase and some must decrease so that the sum remains 1.

6. And since QI is a monotonically increasing function of PI, the factor Q-QI decreases as PI grows, so it is the more common names that decrease in probability, and the rarer ones that increase. Let C be the probability of a name that doesn’t change for the second son, so QC=C/(1-C). There may not be an actual name with this probability, but the Intermediate Value Theorem, together with monotonicity, guarantees exactly one such probability exists. It happens to make a good dividing line between “common” and “uncommon.” Then, Q-QC=Q-C/(1-C)=1, so Q=1/(1-C). Any one of the three parameters (Q, QC, or C) can be used by itself to describe the distribution, but it will be easier to continue using Q in this derivation for a little while.

7. We can now make a table of the probability for each of the nine possible family combinations that include or exclude a boy with any particular name. Here, “Boy” means a boy not named NI, and all probabilities are multiplied by 4 to eliminate denominators:

| Older Child \ Younger Child | NI | Boy | Girl |
|---|---|---|---|
| NI | 0 | PI | PI |
| Boy | PI*(Q-QI) | 1-PI*(1+Q-QI) | 1-PI |
| Girl | PI | 1-PI | 1 |

The term in the center cell was chosen to make the sum of all nine equal 4.

8. The definition of conditional probability says P(2 boys|NI) = P(2 boys and NI)/P(NI) = (1+Q-QI)/(3+Q-QI) from the table above. It can also be written (2+QC-QI)/(4+QC-QI), or (2+2C*PI-C-3PI)/(4+4C*PI-3C-5PI). The second version compares most directly to the incorrect answer that allows duplicate names, (2-PI)/(4-PI).

9. I won’t show the derivation, but you can expand the denominator in a power series. Ignoring any terms with two or more of these small probabilities multiplied together, the old, incorrect formula is approximately 1/2-PI/8, and the new one is approximately 1/2+C/8-PI/8. When the number of possible names is large, and it is, this is a very good approximation. It should also be true that C is of the order of magnitude of 1/(# names of reasonable expectancy). By that I mean you should count John and Rufus and Luigi, but not John-Jacob-Jinglehimmer-Smith.
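
The steps above can be tried numerically. This is only a sketch: the Zipf-like distribution over 1000 names is an assumption made up for illustration, not real naming data, and `two_boys_given_name` is a hypothetical helper name.

```python
from fractions import Fraction

def two_boys_given_name(probs, i):
    """Return (second, c, exact, approx) for the name with index i.

    second: Prob(2nd son named N_I) = P_I * (Q - Q_I)
    c:      the probability C that is unchanged for the second son
    exact:  (1 + Q - Q_I) / (3 + Q - Q_I)
    approx: first-order approximation 1/2 + C/8 - P_I/8
    """
    q_each = [p / (1 - p) for p in probs]   # Q_I = P_I / (1 - P_I)
    q = sum(q_each)                         # Q = sum of all Q_I
    c = 1 - 1 / q                           # from Q = 1 / (1 - C)
    p_i, q_i = probs[i], q_each[i]
    second = p_i * (q - q_i)
    exact = (1 + q - q_i) / (3 + q - q_i)
    approx = Fraction(1, 2) + c / 8 - p_i / 8
    return second, c, exact, approx

# Hypothetical Zipf-like distribution over 1000 names (an assumption, not data).
weights = [1 / (k + 1) for k in range(1000)]
total = sum(weights)
probs = [w / total for w in weights]

for i in (0, 99, 999):
    second, c, exact, approx = two_boys_given_name(probs, i)
    print(f"P_I={probs[i]:.5f} second={second:.5f} C={c:.5f} "
          f"exact={exact:.5f} approx={approx:.5f}")
```

For a tiny distribution such as (1/2, 1/4, 1/4), the same function gives C = 2/5 and an exact answer of 5/11 for the most popular name, but the first-order approximation is only meant for the case of many names.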