## A Random Pair of Friends

Consider a group of people in which some are friends. We assume that friendship is symmetric: if Alice is Bob’s friend, then Bob is Alice’s friend. That means we can build a friendship graph where vertices are people and edges correspond to friendships. Let’s assume that every person has at least one friend, so the friendship graph doesn’t have isolated vertices.

Darla needs to conduct research by surveying random pairs of friends. But first, she has to find those pairs. To ensure that the pairs are randomly selected, she must pick two random people from the group, contact them, and ask them whether or not they are friends. If they are, she gives them her questionnaire. If not, Darla wasted tons of time and had to keep looking.

The group she is surveying is enormous. So, when she picks two random people, they typically have never even heard of each other. Bother!

Darla decides to speed up the process. She would pick a random person, ask them for a list of friends, and then randomly pick one person from the list. Since every person has at least one friend, Darla always ends up with a filled questionnaire.

Puzzle question. Why is Darla’s method wrong? Can you describe the pairs of friends her method favors?

Share:

1. #### Carl Feynman:

It favors people who have lots of friends who, in turn, have few friends.

2. #### Bruce Fleischer:

The answer should apply to an edge in the graph. Edges belonging to isolated couples, who have no other friends besides each other, are most over-represented.
Darla can do better the more work she is willing to do per questionnaire. Set a probability p < 1. Pick a random person. If that person has k friends and kp 1, generalize this to picking either
ceil(kp) or floor(kp) of their friendships. Smaller p reduces the favoring of friendships sharing a common popular person. If p is small enough, Darla will likely
check every person and not save any work.

The question makes implicit assumptions about the data structure representing the friendship graph, since it is computationally cheap to pick a random vertex but expensive to pick a random edge.

3. #### lvps1000vm:

Does it make sense to weight the answers by number of friends? Say she calls a random person, picks a name from the list of f friends, but the answers are weighted by f when tallying the results — instead of just tallying them take a weighted mean.

With population N, a frienship whose members have f and g friends might be either picked with probability 1/fN weighing f or probability 1/gN weighing g, so on average all friendhips weigh equally 2/N.