Suppose that we select all families with two children in which at least one child is a son named Luigi. Given that a boy is named Luigi with probability p, what is the probability that the other child is also a son?
Here is a potential “solution.” Luigi is the younger brother’s name in one of the most popular video games, Super Mario Bros. Probably the parents loved the game and decided to name their first son Mario and their second son Luigi. Hence, if one of the children is named Luigi, then he must be the younger son, and the other child is certainly an older son named Mario. So, the answer is 1.
The solution above is not mathematical, but it reflects the fact that children’s names are highly correlated with each other.
Let’s try some mathematical models that describe how the parents might name their children and see what happens. It is common to assume that the names of siblings are chosen independently. In this case the first son (as well as the second son) will be named Luigi with probability p. Therefore, the answer to the puzzle above is (2-p)/(4-p).
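As a sanity check on that formula, here is a minimal sketch that enumerates the four equally likely gender orders under the independent-naming assumption and computes the conditional probability exactly. The function name other_is_son_independent is made up for this illustration, and the sketch assumes that boys and girls are equally likely and that girls are never named Luigi.

```python
from fractions import Fraction

def other_is_son_independent(p):
    """P(other child is a son | at least one son named Luigi), independent naming."""
    p = Fraction(p)
    quarter = Fraction(1, 4)       # each gender order is assumed equally likely
    luigi_families = Fraction(0)   # families with at least one son named Luigi
    two_son_luigi = Fraction(0)    # ...that also have two sons
    for first_is_boy in (True, False):
        for second_is_boy in (True, False):
            # Probability that no child in this family is a son named Luigi.
            no_luigi = Fraction(1)
            for is_boy in (first_is_boy, second_is_boy):
                if is_boy:
                    no_luigi *= 1 - p
            weight = quarter * (1 - no_luigi)
            luigi_families += weight
            if first_is_boy and second_is_boy:
                two_son_luigi += weight
    return two_son_luigi / luigi_families

p = Fraction(1, 100)
print(other_is_son_independent(p))  # 199/399
print((2 - p) / (4 - p))            # 199/399
```

The two-son families contribute 1 - (1-p)^2 = 2p - p^2, each of the two mixed orders contributes p, and the ratio (2p - p^2)/(4p - p^2) simplifies to (2-p)/(4-p).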
The problem with this model is that it allows a family to have two sons who are both named Luigi: a two-son family ends up this way with probability p^2.
As parents usually want to give different names to their children, many researchers suggest the following naming model, which avoids two children in the same family sharing a name. Each family picks each child’s name at random from a given distribution of names, independently for each child. Families in which both children get the same name are then crossed out of the list of families.
There is a problem with this approach. When we cross out families, we may disturb the gender balance among the remaining families. If we assume that boys and girls never share a name, then we will only cross out families with two children of the same gender. Thus, the ratio of different-gender families to same-gender families will no longer be 1 to 1. Moreover, the number of boy-boy families may differ from the number of girl-girl families, since the two name distributions need not produce repeated names equally often.
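To see the imbalance concretely, here is a small sketch of the crossing-out step. It assumes boys and girls never share a name, that the four gender orders start out equally likely, and it uses two made-up toy name distributions; the function name surviving_shares is hypothetical.

```python
from fractions import Fraction

def surviving_shares(boy_names, girl_names):
    """Shares of family types left after crossing out same-name families."""
    # Probability that two independently named children get the same name.
    same_boy = sum(q * q for q in boy_names.values())
    same_girl = sum(q * q for q in girl_names.values())
    quarter = Fraction(1, 4)
    weights = {
        "boy-boy": quarter * (1 - same_boy),
        "girl-girl": quarter * (1 - same_girl),
        "mixed": 2 * quarter,  # mixed families are never crossed out
    }
    total = sum(weights.values())
    return {kind: w / total for kind, w in weights.items()}

# Toy distributions, invented for illustration only.
boys = {"Mario": Fraction(1, 2), "Luigi": Fraction(1, 2)}
girls = {name: Fraction(1, 4) for name in ("Peach", "Daisy", "Rosalina", "Pauline")}

for kind, share in surviving_shares(boys, girls).items():
    print(kind, share)
# boy-boy 2/13
# girl-girl 3/13
# mixed 8/13
```

With these toy numbers the mixed families outnumber the same-gender families 8 to 5, and there are fewer boy-boy families than girl-girl families because the boys' names are more concentrated.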
There are several ways to adjust the model. Suppose there is a probability distribution of names that is used for the first son. If another son is born, the first son’s name is crossed out of the distribution, and the probabilities of all the other names are scaled up proportionally for this family. In this model the probability that the second son gets a given name differs from the probability that the first son gets it. For example, the most popular name’s probability decreases with consecutive sons, while the least popular name’s probability increases.
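Here is a minimal sketch of that adjustment, under the assumption that the second son's name is drawn from the first son's distribution with the used name removed and the rest rescaled. The three-name distribution and the function name second_son_distribution are invented for illustration.

```python
from fractions import Fraction

def second_son_distribution(first_son_names):
    """Overall name distribution of the second son in two-son families."""
    second = {name: Fraction(0) for name in first_son_names}
    for first, q_first in first_son_names.items():
        remaining = 1 - q_first  # must be positive: no name may have probability 1
        for name, q in first_son_names.items():
            if name != first:
                # First son takes `first`; the rest are rescaled by 1/remaining.
                second[name] += q_first * q / remaining
    return second

# Toy distribution for the first son, invented for illustration only.
names = {"Mario": Fraction(1, 2), "Luigi": Fraction(3, 10), "Wario": Fraction(1, 5)}

for name, q in second_son_distribution(names).items():
    print(name, names[name], "->", q)
# Mario 1/2 -> 19/56
# Luigi 3/10 -> 3/8
# Wario 1/5 -> 2/7
```

In this toy case the most popular name drops from 1/2 to 19/56 for the second son, while the least popular name rises from 1/5 to 2/7.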
I like this model, because I think that it reflects real life.
Here is another model, suggested by my son Alexey. Parents give names to their children independently of each other from a given distribution list. If they give the same name to both children, the family is crossed out and replaced with another family with children of the same genders. The advantage of this model is that the first child and the second child are named independently of each other with the same probability distribution. The disadvantage is that the probability distribution of names in the resulting set of families will be different from the probability distribution of names in the original preference list.
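As a rough sketch of what Alexey's model does to the numbers, the code below treats "cross out and replace" as redrawing the names until they differ, so the gender shares stay at 1/4, 1/2, 1/4. It also assumes boys and girls never share a name, so mixed families are never replaced. The toy distribution and the function name alexey_model are invented for illustration.

```python
from fractions import Fraction

def alexey_model(son_names, target):
    """Return (name distribution of a son in two-son families, puzzle answer)."""
    # Probability that two independently named sons get the same name.
    same = sum(q * q for q in son_names.values())
    # A son keeps name n only if his brother got a different name.
    surviving = {n: q * (1 - q) / (1 - same) for n, q in son_names.items()}
    p = son_names[target]
    # Two-son families where exactly one son has the target name
    # (both sons cannot have it, since names must differ).
    two_sons = Fraction(1, 4) * 2 * p * (1 - p) / (1 - same)
    # Mixed families where the only son has the target name.
    mixed = Fraction(1, 2) * p
    return surviving, two_sons / (two_sons + mixed)

# Toy distribution, invented for illustration only.
names = {"Mario": Fraction(1, 2), "Luigi": Fraction(3, 10), "Wario": Fraction(1, 5)}
surviving, answer = alexey_model(names, "Luigi")

for name, q in surviving.items():
    print(name, names[name], "->", q)
# Mario 1/2 -> 25/62
# Luigi 3/10 -> 21/62
# Wario 1/5 -> 8/31
print(answer)  # 35/66
```

With these assumptions the answer depends not only on p but also on how concentrated the whole name distribution is: in the toy case it comes out to 35/66, compared with (2-p)/(4-p) = 17/37 for independent naming with p = 3/10.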
I would like my readers to comment on the models and how they change the answer to the original problem.