Archive for the ‘Algorithms’ Category.

## Hiring the Smartest People in the World

There is an array containing all the integers from 1 to n in some order, except that one integer is missing. Suggest an efficient algorithm for finding the missing number.

A friend gave me the problem above as I was driving him from the airport. He had just been at a job interview where they gave him two problems. This one can be solved in linear time and constant space.
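One way to reach linear time and constant space is to compare the sum of the array with 1 + 2 + … + n. This is only a sketch of one possible approach; the post does not say which solution the interviewer had in mind, and the name `find_missing` is made up for illustration.

```python
def find_missing(arr):
    """Return the integer from 1..n that is absent from arr, where len(arr) == n - 1."""
    n = len(arr) + 1
    expected = n * (n + 1) // 2  # 1 + 2 + ... + n
    return expected - sum(arr)   # the shortfall is the missing number


print(find_missing([3, 1, 5, 2]))  # 4
```

The array is read once and only a running total is stored.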

But my friend was really excited by the next one:

There is an array containing all the integers from 1 to n in some order, except that one integer is missing and another is duplicated. Suggest an efficient algorithm for finding both numbers.

My friend found an algorithm that also works in linear time and constant space. However, the interviewer didn’t know that solution. The interviewer expected an algorithm that works in n log n time.
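The post does not reveal my friend's algorithm. One well-known approach with these bounds, offered here only as a sketch, compares both the sum and the sum of squares with their expected values: if d is the duplicated number and m the missing one, the two differences are d − m and d² − m², and dividing gives d + m. The function name is hypothetical.

```python
def find_missing_and_duplicate(arr):
    """arr holds 1..n with one value missing and another repeated; len(arr) == n."""
    n = len(arr)
    diff = sum(arr) - n * (n + 1) // 2                # d - m (never zero)
    sq_diff = (sum(x * x for x in arr)
               - n * (n + 1) * (2 * n + 1) // 6)      # d^2 - m^2
    total = sq_diff // diff                           # d + m (division is exact)
    duplicate = (total + diff) // 2
    missing = total - duplicate
    return missing, duplicate


print(find_missing_and_duplicate([1, 2, 2, 4, 5]))  # (3, 2)
```

The running totals grow like n³, so "constant space" here assumes they fit in a fixed number of machine words.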

The company claims that they are looking for the smartest people in the world, and my friend had presented them with an impressive solution to the problem. Despite his excitement, I predicted that they would not hire him. Guess who was right?

I reacted like this because of my own story. Many years ago I was interviewing for a company that also wanted the smartest people in the world. At the interview, the guy gave me a list of problems, but said that he didn’t expect me to solve all of them — just a few. The problems were so difficult that he wanted to sit with me and read them together to make sure that I understood them.

The problems were Olympiad style, which is my forte. While we were reading them, I solved half of them. During the next hour I solved the rest. The interviewer was stunned. He told me of an additional problem that he and his colleagues had been trying to solve for a long time and couldn’t. He asked me to try. I solved that one as well. Guess what? I wasn’t hired. Hence, my reaction to my friend’s interview.

The good news: I still remember the problem they couldn’t solve:

A car is on a circular road that has several gas stations. The gas stations are running low on gas and the total amount of gas available at the stations and in the car is exactly enough for the car to drive around the road once. Is it true that there is a place on the road where the car can start driving, stopping to refuel at each station, so that the car completes a full circle without running out of gas? Assume that the car’s tank is large enough not to present a limitation.
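Readers who want to experiment before looking for a proof can try a small simulation. The sketch below makes simplifying assumptions that are not part of the problem: the car starts with an empty tank at one of the stations, gas is measured in units of distance, and all amounts are integers. It simply tries every station as a starting point; `feasible_starts`, `gas`, and `dist` are illustrative names.

```python
def feasible_starts(gas, dist):
    """Return the stations from which an empty car can complete the circle.

    gas[i]  -- gas available at station i (in units of distance)
    dist[i] -- distance from station i to the next station
    """
    n = len(gas)
    starts = []
    for s in range(n):
        tank = 0
        ok = True
        for step in range(n):
            i = (s + step) % n
            tank += gas[i]       # refuel at station i
            tank -= dist[i]      # drive to the next station
            if tank < 0:         # ran dry before reaching it
                ok = False
                break
        if ok:
            starts.append(s)
    return starts


print(feasible_starts([1, 4, 2, 3], [3, 2, 3, 2]))  # [1]
```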

## Interlocking Polyominoes

Sid Dhawan was one of our RSI 2011 math students. He was studying interlocking polyominoes under the mentorship of Zachary Abel.

A set of polyominoes is interlocked if no subset can be moved far away from the rest. It was known that polyominoes that are built from four or fewer squares do not interlock. The project of Dhawan and his mentor was to investigate the interlockedness of larger polyominoes. And they totally delivered.

They quickly proved that you can interlock polyominoes with eight or more squares. Then they proved that pentominoes can’t interlock. This left them with a gray area: what happens with polyominoes with six or seven squares? After drawing many beautiful pictures, they finally found the structure presented in our accompanying image. The system consists of 12 hexominoes and 5 pentominoes, and it is rigid. You cannot move a thing. That means that hexominoes can be interlocked and thus the gray area was resolved.

You can find the proofs and the details in their paper “Complexity of Interlocking Polyominoes”. As you can guess by the title, the paper also discusses complexity. The authors proved that determining interlockedness of a system that includes hexominoes or larger polyominoes is PSPACE-hard.

## Rubik’s Cube Game

My son Sergei invented the following game a couple of years ago. Two people, Alice and Bob, agree on a number, say, four. Alice takes a clean Rubik’s cube and secretly makes four moves. Bob gets the resulting cube and has to rotate it to the initial state in not more than four moves. Bob doesn’t need to retrace Alice’s moves. He just needs to find a short path back, preferably the shortest one. If he is successful, he gets a point and then it is Alice’s turn.

If they are experienced at solving Rubik’s cube, they can increase the difficulty and play this game with five or six moves.

By the way, how many moves do you need to solve any position on a Rubik’s cube if you know the optimal way? The cube is so complicated that people can’t always know the optimal way. They think that God can, so they called the diameter of the set of all possible Rubik’s cube positions, God’s Number. It was recently proven that God’s Number is 20. If Alice and Bob can increase the difficulty level to 20, that would mean that they can find the shortest path back to the initial state from any position of the cube, or, in short, that they would master God’s algorithm.

## Guessing the Suit

I recently published my new favorite math problem:

A deck of 36 playing cards (four suits of nine cards each) lies in front of a psychic with their faces down. The psychic names the suit of the upper card; after that the card is turned over and shown to him. Then the psychic names the suit of the next card, and so on. The psychic’s goal is to guess the suit correctly as many times as possible.
The backs of the cards are asymmetric, so each card can be placed in the deck in two ways, and the psychic can see which way the top card is oriented. The psychic’s assistant knows the order of the cards in the deck; he is not allowed to change the order, but he may orient any card in either of the two ways.
Is it possible for the psychic to make arrangements with his assistant in advance, before the latter learns the order of the cards, so as to ensure that the suits of at least (a) 19 cards, (b) 23 cards will be guessed correctly?
If you devise a guessing strategy for another number of cards greater than 19, explain that too.

If the psychic is only allowed to look at the backs of the cards, then the amount of transmitted information is 2^36 (one bit per card), which is the same amount of information as the suits of 18 cards, since 4^18 = 2^36. This number of guesses is achievable: the backs of every two cards can clue in the suit of the second card in the pair. This way the psychic can guess the suits of all even-numbered cards in the deck. So the problem is to improve on that. Using the info from the cards that the psychic is permitted to turn over can help too.

The problem is from the book Moscow Mathematical Olympiads, 2000-2005. The book and Russian blog discussions provide many different ideas on how to guess more than half of the deck.

Here is the list of ideas.

Idea 1. Counting cards. If you count cards you will know the suits of the last cards.

Idea 2. Trading. As we discussed before, the psychic can correctly guess the suits of even-numbered cards. By randomly guessing the odd-numbered cards he can correctly guess on average the suits of 4.5 additional cards. Unfortunately, this is not guaranteed. But wait. What if we trade the knowledge of the second card’s suit for the majority suit among odd-numbered cards?

Idea 3. Three cards. Suppose we have three cards. Three bits can provide the following knowledge: the majority color, plus the suit of the first and of the second cards in the majority color. Thus, three bits of information will allow the psychic to guess the suits of two cards out of three.

Idea 4. Which card. Suppose the assistant signals the suits of even-numbered cards. With no loss, the psychic can guess the even-numbered card and repeat the same suit for the next card. If this is the plan, the assistant can choose which of the two cards to describe. Which card of the two matches the psychic’s guess provides an additional bit of information.

Idea 5. Surprise. Suppose we have a strategy to inform the psychic about some cards. Suppose the assistant deliberately fails on one of the cards. Then the index of this card provides info to the psychic.

I leave it to my readers to use these ideas to find the solution for 19, 23, 24 and maybe even for 26 cards.

## Binary Bulls Explained

I recently posted an essay Binary Bulls without Cows with the following puzzle:

The test Victor is taking consists of n “true” or “false” questions. In the beginning, Victor doesn’t know any answers, but he is allowed to take the same test several times. After completing the test each time, Victor gets his score, that is, the number of his correct answers. Victor uses the opportunity to re-try the test to figure out all the correct answers. We denote by a(n) the smallest number of times Victor needs to take the test to guarantee that he can figure out all the answers. Prove that a(30) ≤ 24, and a(8) ≤ 6.

There are two different types of strategies Victor can use to succeed. First, after each attempt he can use each score as feedback to prepare his answers for the next test. Such strategies are called adaptive. The other type of strategy is one that is called non-adaptive, and it is one in which he prepares answers for all the tests in advance, not knowing the intermediate scores.

Without loss of generality we can assume that in the first test, Victor answers “true” for all the questions. I will call this the base test.

I would like to describe my proof that a(30) ≤ 24. The inequality implies that on average five questions are resolved in four tries. Suppose we have already proven that a(5) = 4. From this, let us map out the 24 tests that guarantee that Victor will figure out the 30 correct answers.

As I mentioned earlier, the first test is the base test and Victor answers every question “true.” For the second test, he changes the first five answers to “false,” thus figuring out how many “true” answers are among the first five questions. This is equivalent to having a base test for the first five questions. We can resolve the first five questions in three more tests and proceed to the next group of five questions. We do not need the base test for the last five questions, because we can figure out the number of “true” answers among the last five from knowing the total score and knowing the answers for the previous groups of five. Thus we showed that a(mn) ≤ m a(n). In particular, a(5) = 4 implies a(30) ≤ 24.

Now I need to prove that a(5) = 4. I started with a leap of faith. I assumed that there is a non-adaptive strategy, that is, that Victor can arrange all four tests in advance. The first test is TTTTT, where I use T for “true” and F for “false.” Suppose for the next test I change one of the answers, say the first one. If after that I can figure out the remaining four answers in two tries, then that would mean that a(4) = 3. This would imply that a(28) ≤ 21 and, therefore, a(30) ≤ 23. If this were the case, the problem wouldn’t have asked me to prove that a(30) ≤ 24. By this meta reasoning I can conclude that a(4) ≠ 3, which is easy to check anyway. From this I deduced that all the other tests should differ from the base test in more than one answer. Changing one of the answers is equivalent to changing four answers, and changing two answers is equivalent to changing three answers. Hence, we can assume that all the other tests contain exactly two “false” answers. Without loss of generality, the second test is FFTTT.

Suppose for the third test, I choose both of my “false” answers from among the last three questions, for example, TTFFT. This third test gives us exactly the same information as the test TTTTF, but I already explained that having only one “false” answer is a bad idea. Therefore, my next tests should overlap with my previous non-base tests by exactly one “false” answer. The third test, we can conclude, will be FTFTT. Also, there shouldn’t be any group of questions that Victor answers the same for every test. Indeed, if one of the answers in the group is “false” and another is “true,” Victor will not figure out which one is which. This uniquely identifies the last test as FTTFT.

So, if the four tests work they should be like this: TTTTT, FFTTT, FTFTT, FTTFT. Let me prove that these four tests indeed allow Victor to figure out all the answers. Summing up the results of the last three tests modulo 2, Victor will get the parity of the number of correct answers for the first four questions. As he knows the total number of correct answers, he can deduce the correct answer for the last question. After that he will know the number of correct answers for the first four questions and for every pair of them. I will leave it to my readers to finish the proof.

Knop and Mednikov in their paper proved the following lemma:

If there is a non-adaptive way to figure out a test with n questions by k tries, then there is a non-adaptive way to figure out a test with 2n + k − 1 questions by 2k tries.

Their proof goes like this. Let’s divide all questions into three non-overlapping groups A, B, and C that contain n, n, and k − 1 questions, respectively. By our assumptions there is a non-adaptive way to figure out the answers for A or B using k tries. Let us denote the subsets of A that we change to “false” for the k − 1 non-base tests as A1, …, A(k−1). Similarly, we denote the subsets of B as B1, …, B(k−1).

Our first test is the base test that consists of all “true” answers. For the second test we change all the answers in A to “false,” establishing how many “true” answers are in A. In addition we have k − 1 tests of each of two types, where Ci denotes the i-th question of C. Type Sum: we switch answers to the questions in Ai ∪ Bi ∪ Ci. Type Diff: we switch answers to the questions in (A ∖ Ai) ∪ Bi. The parity of the sum of “false” answers in (A ∖ Ai) ∪ Bi and Ai ∪ Bi ∪ Ci is the same as in A plus Ci. But we know A’s score from the second test. Hence we can derive Ci. After that we have two equations with two unknowns and can derive the scores of Ai and Bi. From knowing the number of “true” answers in A and C, we can derive the same for B. Knowing A and Ai gives all the answers in A. Similarly for B. QED.

This lemma is powerful enough to answer the original puzzle. Indeed, a(2) = 2 implies a(5) ≤ 4, and a(3) = 3 implies a(8) ≤ 6.
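The construction in the lemma is mechanical enough to try out in code. The sketch below represents each test by the set of questions answered “false” (everything else is “true”), builds the 2k tests from a given k-test strategy, and verifies by brute force that all answer keys are told apart. The starting strategies for 2 and 3 questions are my own simple choices, and all function names are illustrative.

```python
from itertools import product


def scores(flips, key):
    """Score of each test; a test answers "false" on its flip set, "true" elsewhere."""
    return tuple(
        sum((q not in flip) == is_true for q, is_true in enumerate(key))
        for flip in flips
    )


def identifies_all(n, flips):
    """True if all 2^n keys give distinct score tuples."""
    keys = product([True, False], repeat=n)
    return len({scores(flips, key) for key in keys}) == 2 ** n


def lift(n, flips):
    """From k tests on n questions build 2k tests on 2n + k - 1 questions.

    flips[0] must be the empty set (the base test).
    Questions 0..n-1 form A, n..2n-1 form B, and 2n..2n+k-2 form C.
    """
    k = len(flips)
    a_all = set(range(n))
    new_flips = [set(), a_all]                    # base test, then all of A "false"
    for i, a_i in enumerate(flips[1:]):
        b_i = {q + n for q in a_i}                # the same subset, shifted into B
        c_i = 2 * n + i                           # the i-th question of C
        new_flips.append(a_i | b_i | {c_i})       # test of type Sum
        new_flips.append((a_all - a_i) | b_i)     # test of type Diff
    return 2 * n + k - 1, new_flips


n5, tests5 = lift(2, [set(), {0}])
n8, tests8 = lift(3, [set(), {0}, {1}])
print(n5, len(tests5), identifies_all(n5, tests5))  # 5 4 True
print(n8, len(tests8), identifies_all(n8, tests8))  # 8 6 True
```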

## Binary Bulls without Cows

The following variation of a Bulls and Cows problem was given at the Fall 2008 Tournament of the Towns:

A test consists of 30 true or false questions. After the test (answering all 30 questions), Victor gets his score: the number of correct answers. Victor doesn’t know any answer, but is allowed to take the same test several times. Can Victor work out a strategy that guarantees that he can figure out all the answers after the 29th attempt? after the 24th attempt?

Let’s assume that we have a more general problem. There are n questions, and a(n) is the smallest number of times we need to take the test to guarantee that we can figure out the answers. First we can try all combinations of answers. This way we are guaranteed to know all the answers after 2^n attempts. The next idea is to start with a baseline test, for example, to say that all the answers are true. Then we change answers one by one to see if the score goes up or down. After changing n − 1 answers we will know the answers to the first n − 1 questions. Plus we know the total number of true answers, so we know the answers to all the questions. We just showed that a(n) ≤ n.
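The one-by-one strategy can be sketched as follows. The `grade` callback stands in for taking the test and is an assumption of this example, as are the other names; the wrapper counts how many attempts are used.

```python
def solve_one_by_one(grade, n):
    """Recover an n-question true/false key using n calls to grade()."""
    base = grade([True] * n)              # baseline: number of "true" answers in the key
    key = []
    for i in range(n - 1):
        trial = [True] * n
        trial[i] = False                  # change a single answer
        key.append(grade(trial) < base)   # score drops exactly when question i is "true"
    key.append(base - sum(key) == 1)      # last answer follows from the total count
    return key


def make_grader(secret):
    """Return a grading function for the given key and a list that logs each call."""
    calls = []

    def grade(answers):
        calls.append(1)
        return sum(a == s for a, s in zip(answers, secret))

    return grade, calls


secret = [True, False, False, True, True]
grade, calls = make_grader(secret)
print(solve_one_by_one(grade, 5) == secret, len(calls))  # True 5
```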

This is not enough to answer the warm-up question in the problem. We need something more subtle.

Let’s talk about the second part of the problem. As we know, 24 = 4 ⋅ 6. So to solve the second part, on average, we need to find five correct answers per four tests. Is it true that a(5) ≤ 4? If so, can we use it to show that a(30) ≤ 24?

The following three cases are the most fun to prove: a(5) = 4, a(8) ≤ 6, and a(30) ≤ 24. Try it!

By the way, K. Knop and L. Mednikov wrote a paper (available in Russian) where they proved that a(n) is not more than the smallest number k such that the total number of ones in the binary expansion of numbers from 1 to k is at least n − 1. Which means they proved that a(30) ≤ 16.

## Rotor-Router Networks

I have two admirers, Alex and Mike. Alex lives next to my home and Mike lives next to my MIT office. I have a lousy memory, so I invented the following system to guarantee that I see both of my friends and also manage to come to my office from time to time. I have a sign hanging on the inside of my home door that says Office on one side and Alex on the other. When I approach the door, I can see right away where I went last time. So I flip the sign and that tells me where next to go. I have a similar sign inside my office door that tells me to go either to home or to Mike. Every evening I spend with one of my admirers discussing puzzles or having coffee. Late at night I come home to sleep in my own bed. Now let’s see what happens if today my home sign shows Office and the office sign shows Mike:

• Today. I flip the home sign to Alex and spend the evening with Alex.
• Tomorrow. I flip the home sign to Office and go to MIT. Later I flip the office sign to Home and return home. As I cannot stand to spend the evening at home alone, I go out again. I flip the home sign to Alex and spend the evening with Alex.
• The day after tomorrow. I flip the home sign to Office and go there. Later I flip the office sign to Mike and spend the evening with Mike.

After three days the signs return to their original positions, meaning that the situation is periodic and I will repeat this three-day pattern forever.
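The three-day pattern is easy to reproduce in a few lines. This is a sketch of this particular two-sign network only, with names of my choosing: each sign shows where I went last time, I flip it, and I follow the new side.

```python
def evening_visits(days, home_sign="Office", office_sign="Mike"):
    """Simulate the two door signs and return whom I visit each evening."""
    visits = []
    for _ in range(days):
        place = "Home"                      # every day starts at home
        while place in ("Home", "Office"):  # keep moving until I reach a friend
            if place == "Home":
                home_sign = "Alex" if home_sign == "Office" else "Office"
                place = home_sign
            else:
                office_sign = "Home" if office_sign == "Mike" else "Mike"
                place = office_sign
        visits.append(place)
    return visits


print(evening_visits(6))
# ['Alex', 'Alex', 'Mike', 'Alex', 'Alex', 'Mike']
```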

Let’s get back to reality. I am neither memory-challenged nor addicted to coffee. I invented Alex and Mike to illustrate a rotor-router network. In general my home is called a source: the place where I wake up and start the day. There can only be one source in the network. My admirers are called targets and I can have an infinite number of them. The network needs to be constructed in such a way that I always end up with a friend by the end of the day. There could be many other places that I can visit, other than my office: for example, the library, the gym, the opera and so on. These places are other vertices of a network that could be very elaborate. At every place I go there is a sign that describes a pattern of where I go from there. The sign is called a rotor.

The patterns at every rotor might be more complicated than a simple sign. Those patterns are called rotor types. My sign is called the 12 rotor type, as it switches between the first and the second directions at every non-friend place I visit.

The sequence of admirers that I visit is called a hitting sequence and it can be proved that the sequence is eventually periodic. Surprisingly, the stronger result is also true: the hitting sequence is purely periodic.

The simple 12 rotor is universal. That means that given a set of friends and a fancy periodic schedule that designates the order I want to visit them in, I can create a network of my activities where every place has a sign of this type 12 and where I will end up visiting my friends according to my pre-determined periodic schedule.

It is possible to see that not every rotor type is universal. For example, palindromic rotor types generate only palindromic hitting sequences, thus they are not universal. The smallest such example is rotor type 121. Also, block-repetitive rotor types, like 1122, generate block-repetitive hitting sequences.

It is a difficult and an interesting question to describe universal rotor types. My PRIMES student Xiaoyu He was given a project, suggested by James Propp, to prove or disprove the universality of the 11122 rotor type. This was the smallest rotor type the universality of which was not known. Xiaoyu He proved that 11122 is universal and discovered many other universal rotor types. His calculations support the conjecture that only palindromic or block-repetitive types are not universal. You can find these results and many more in his paper: On the Classification of Universal Rotor-Routers.

## All Roads Lead to Philosophy

Recently I stumbled on a cute xkcd comic with the hidden message:

Wikipedia trivia: if you take any article, click on the first link in the article text not in parentheses or italics, and then repeat, you will eventually end up at “Philosophy”.

Naturally, I started to experiment. The first thing I tried was mathematics. Here is the path: Mathematics — Quantity — Property — Modern philosophy — Philosophy.

Then I tried physics, which led me to mathematics: Physics — Natural science — Science — Knowledge — Fact — Information — Sequence — Mathematics.

Then I tried Pierre de Fermat, who for some strange reason led to physics first: Pierre de Fermat — French — France — Unitary state — Sovereign state — State — Social sciences — List of academic disciplines — Academia — Community — Living — Life — Objects — Physics.

The natural question is: what about philosophy? Yes, philosophy goes in a cycle: Philosophy — Reason — Rationality — philosophy.

The original comic talks about spark plugs. So I tried that and arrived at physics: Spark plug — Cylinder head — Internal combustion engine — Engine — Machine — Machine (mechanical) — Mechanical system — Power — Physics.

Then I tried to get far away from philosophy and attempted sex, unsuccessfully: Sex — Biology — Natural science. Then I tried dance: Dance — Art — Sense — Physiology — Science.

It is interesting to see how many steps it takes to get to philosophy. Here is the table for the words I tried:

| Word | # Steps |
|---|---|
| Mathematics | 4 |
| Physics | 11 |
| Pierre de Fermat | 24 |
| Spark plug | 19 |
| Sex | 12 |
| Dance | 13 |

Mathematics wins. It thoroughly beats all the other words I tried. For now. Fans of sex might be disappointed by these results and tomorrow they might change the wiki essay about sex to start as:

Modern philosophy considers sex …

## Complexity of Periodic Strings

I recently stumbled upon some notes (in Russian) of a public lecture given by Vladimir Arnold in 2006. In this lecture Arnold defines a notion of complexity for finite binary strings.

Consider a set of binary strings of length n. Let us first define the Ducci map acting on this set. The result of this operator acting on a string a1a2…an is a string of length n such that its i-th character is |ai − a(i+1)| for i < n, and the n-th character is |an − a1|. We can view this as a difference operator in the field F2, and we consider strings wrapped around. Or we can say that strings are periodic and infinite in both directions.

Let’s consider as an example the action of the Ducci map on strings of length 6. Since the Ducci map respects cyclic permutation as well as reflection, I will only check strings up to cyclic permutation and reflection. If I denote the Ducci map as D, then the Ducci operator is determined by its action on the following 13 strings, which represent all 64 strings up to cyclic permutation and reflection: D(000000) = 000000, D(000001) = 000011, D(000011) = 000101, D(000101) = 001111, D(000111) = 001001, D(001001) = 011011, D(001011) = 011101, D(001111) = 010001, D(010101) = 111111, D(010111) = 111001, D(011011) = 101101, D(011111) = 100001, D(111111) = 000000.
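The map is short to write down in code, which makes it easy to experiment with other lengths. A minimal sketch, with strings of the characters 0 and 1 and a function name of my choosing:

```python
def ducci(s):
    """Apply the Ducci map to a binary string, treating it as wrapped around."""
    n = len(s)
    return "".join(
        str(abs(int(s[i]) - int(s[(i + 1) % n])))  # |a_i - a_(i+1)|, indices mod n
        for i in range(n)
    )


print(ducci("000001"))  # 000011
print(ducci("000111"))  # 001001
print(ducci("111111"))  # 000000
```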

Now suppose we take a string and apply the Ducci map several times. Because of the pigeonhole principle, this procedure is eventually periodic. On strings of length 6, there are 4 cycles. One cycle of length 1 consists of the string 000000. One cycle of length 3 consists of the strings 011011, 101101 and 110110. Finally, there are two cycles of length 6: the first one is 000101, 001111, 010001, 110011, 010100, 111100, and the second one is shifted by one character.

We can represent the strings as vertices and the Ducci map as a collection of directed edges between vertices. All 64 vertices corresponding to strings of length 6 generate a graph with 4 connected components, each of which contains a unique cycle.

The Ducci map is similar to a differential operator. Hence, sequences that end up at the point 000000 are similar to polynomials. Arnold decided that polynomials should have lower complexity than other functions. I do not completely agree with that decision; I don’t have a good explanation for it. In any case, he proposes the following notion of complexity for such strings.

Strings that end up at cycles of longer length should be considered more complex than strings that end up at cycles with shorter length. Within the connected component, the strings that are further away from the cycle should have greater complexity. Thus the string 000000 has the lowest complexity, followed by the string 111111, as D(111111) = 000000. Next in increasing complexity are the strings 010101 and 101010. At this point the strings that represent polynomials are exhausted and the next more complex strings would be the three strings that form a cycle of length three: 011011, 101101 and 110110. If we assign 000000 a complexity of 1, then we can assign a number representing complexity to any other string. For example, the string 111111 would have complexity 2, and strings 010101 and 101010 would have complexity 3.

I am not completely satisfied with Arnold’s notion of complexity. First, as I mentioned before, I think that some high-degree polynomials are so much uglier than other functions that there is no reason to consider them having lower complexity. Second, I want to give a definition of complexity for periodic strings. There is a slight difference between periodic strings and finite strings that are wrapped around. Indeed, the string 110 of length 3 and the string 110110 of length 6 correspond to the same periodic string, but as finite strings it might make sense to think of string 110110 as more complex than string 110. As I want to define complexity for periodic strings, I want the complexity of the periodic strings corresponding to 110 and 110110 to be the same. So this is my definition of complexity for periodic strings: let’s call the complexity of the string the number of edges we need to traverse in the Ducci graph until we get to a string we saw before. For example, let us start with string 011010. Arrows represent the Ducci map: 011010 → 101110 → 110011 → 010100 → 111100 → 000101 → 001111 → 010001 → 110011. We saw 110011 before, so the number of edges, and thus the complexity, is 8.
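My definition translates directly into a short loop: keep applying the Ducci map and count edges until a string repeats. The sketch below is self-contained; the function names are illustrative.

```python
def ducci(s):
    """Apply the Ducci map to a binary string, treating it as wrapped around."""
    n = len(s)
    return "".join(str(abs(int(s[i]) - int(s[(i + 1) % n]))) for i in range(n))


def complexity(s):
    """Number of Ducci edges traversed until we reach a string seen before."""
    seen = {s}
    edges = 0
    while True:
        s = ducci(s)
        edges += 1
        if s in seen:
            return edges
        seen.add(s)


print(complexity("011010"))                      # 8
print(complexity("110"), complexity("110110"))   # 3 3
```

The second line illustrates that 110 and 110110, which describe the same periodic string, get the same complexity.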

The table below describes the complexity of the binary strings of length 6. The first column shows one string in a class up to rotation and reflection. The second column shows the number of strings in a class. The next column provides the Ducci map of the given string, followed by the length of the cycle. The last two columns show Arnold’s complexity and my complexity.

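| String s | # of Strings | D(s) | Length of the end cycle | Arnold’s complexity | My complexity |
|---|---|---|---|---|---|
| 000000 | 1 | 000000 | 1 | 1 | 1 |
| 000001 | 6 | 000011 | 6 | 9 | 8 |
| 000011 | 6 | 000101 | 6 | 8 | 7 |
| 000101 | 6 | 001111 | 6 | 7 | 6 |
| 000111 | 6 | 001001 | 3 | 6 | 5 |
| 001001 | 3 | 011011 | 3 | 5 | 4 |
| 001011 | 12 | 011101 | 6 | 9 | 8 |
| 001111 | 6 | 010001 | 6 | 7 | 6 |
| 010101 | 2 | 111111 | 1 | 3 | 3 |
| 010111 | 6 | 111001 | 6 | 8 | 7 |
| 011011 | 3 | 101101 | 3 | 4 | 3 |
| 011111 | 6 | 100001 | 6 | 9 | 8 |
| 111111 | 1 | 000000 | 1 | 2 | 2 |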

As you can see, for examples of length six my complexity doesn’t differ much from Arnold’s complexity, but for longer strings the difference will be more significant. Also, I am pleased to see that the sequence 011010, the one that I called The Random Sequence in one of my previous essays, has the highest complexity.

I know that my definition of complexity is only for periodic sequences. For example, the binary expansion of pi will have a very high complexity, though it can be represented by one Greek letter. But for periodic strings it always gives a number that can be used as a measure of complexity.

## A Nerd’s Way to Walk Up the Stairs

The last time I talked to John H. Conway, he taught me to walk up the stairs. It’s not that I didn’t know how to do that, but he reminded me that a nerd’s goal in climbing the steps is to establish the number of steps at the end of the flight. Since it is boring to just count the stairs, we’re lucky to have John’s fun system.

His invention is simple. Your steps should be in a cycle: short, long, long. Long in this case means a double step. Thus, you will cover five stairs in one short-long-long cycle. In addition, you should always start the first cycle on the same foot. Suppose you start on the left foot, then after two cycles you are back on the left foot, having covered ten stairs. While you are walking the stairs in this way, it is clear where you are in the cycle. By the end of the staircase, you will know the number of stairs modulo ten. Usually there are not a lot of stairs in a staircase, so you can easily estimate the total if you know the last digit of that number.

I guess I am not a true nerd. I have lived in my apartment for eight years and have never bothered to count the number of steps. That is, until now. Having climbed my staircase using John’s method, I now know that the ominous total is 13. Oh dear.