## Find the Disappearing Bits

Konstantin Knop posted the following puzzle on Facebook.

Puzzle. An agent sends messages to the command center. The messages have to be encrypted as a stream of 512 characters that can only be zeros and ones. Unfortunately, his transmitter is malfunctioning and gobbles 16 characters of each message. The missing 16 characters are always in the same positions in any message. As a result, the command center receives a sequence of 496 bits. Neither the center nor the agent knows which 16 bits of the sequence are eaten up by the device.
They cannot replace the broken transmitter. However, they can agree ahead of time to send K test messages, the content of which they both know. Find the smallest possible K needed to determine the positions of the disappearing 16 bits.

Share:

1. #### Lazar Ilic:

5 suffices with the test messages: 11…100…0…, 1111111100000000111…, 1111000011110000111…, 11001100110011001…, 10101010101010101…. I would imagine that a relatively simple proof exists. For example, consider the 16 subsets where the final bit and 15 of the first 16 bits are removed. Then by the Pigeonhole Principle >= 8 of these subsets return the same string after our first query no matter what it is… and then deducing which 1 of the first 16 bits was not disappearing… something like this might be easily repairable and made to work.

2. #### Sanandan Swaminathan:

My initial thought is that K = 5 test messages is the upper bound. Split the 512 bits into 32 alternating groups. Left to right, the first group would contain 16 zeros, the second group would contain 16 ones, the third group 16 zeros, and so on, and ending with the 32nd group containing 16 ones. The agent sends the 512-bit message. The command center receives 496 bits, and immediately knows how many bits are missing in various groups. For example, there could be 2 zeros missing in group1, 5 ones missing in group6, etc. Also, the command center now knows the boundaries of the 32 groups in a 496-bit string. There could be 1 to 16 “faulty” groups which have been identified, and a faulty group can have 1 to 16 missing bits. The command center knows how many bits are missing in each of the faulty groups. The problem is now reduced to 16 smaller problems which can be solved in parallel using subsequent test messages. The second message can contain 64 alternating groups. The first group would contain 8 zeros, the second group 8 ones, and so on. Within each faulty group identified after the first message, the command center would now narrow it down further. For example, if group1 of the first message had 6 zeros missing, then the command center knows that the first 10 bits of the 496-bit string belong to group1. If the first 10 bits of the second message contain 6 zeros and 4 ones, then the command center know that group1 of the second message has 2 bits missing and group2 of the second message has 4 bits missing. The third message can contain 128 alternating groups. The first group would contain 4 zeros, the second group 4 ones, and so on. The fourth message would contain 256 alternating groups. The fifth message would contain 512 alternating groups, basically a string of alternating zeros and ones. In the worst case, the positions of all 16 missing bits would be identified after 5 test messages. An example of the worst case would be a single bit missing in 16 of the 32 groups of the first message. Of course, in the best case, only 1 test message is needed, like if a single group of 16 zeros (or 16 ones) was found to be missing in the first message.
Regardless of however long the message is supposed to be, if there are 2^M bits known to be missing, they should be able to determine the positions of the missing bits with K = M+1 test messages in the worst case. Wondering if a lower number of tests can be used in the worst case.

3. #### Lazar Ilic:

This is not a Lean theorem proof, but my argument above from yesterday seems repaired by considering the 17 subsets where 16 of the first 17 bits are removed. So then by the Pigeonhole Principle >= 9 of these subsets return the same string after our first query no matter what and then deducing which of these 9 bits was the unique remaining 1 would require >= 4 more queries via the usual binary searching argumentation.

4. #### tanyakh:

Update: The problem is from Code Jam 2019 – Qualification Round.

5. #### Andrew:

The optimal solution may be 6, but it’s possible to do it in 9 without even thinking hard (let the n’th bit of the m’th message be the m’th binary digit of n, collect all 9 messages, and see which values are missing). Unless sending a message is *very* expensive or dangerous, that’s the solution I would use in the real world.

Yeah, I know. Engineer’s solution, not puzzler’s solution.