## Archive Labeling Sequences, jointly with Gregory Marton

What follows is the story of a pair of integer sequences, which started life as a Google interview puzzle back in the previous century when VHS video tapes were in use:

Suppose you are buying VHS tapes and want to label them using the stickers that came in the package. You want to number the tapes consecutively starting from 1 and the stickers that come with each package are exactly one of each digit [“0″…”9”]. For your first tape you use only the digit “1”, and save all the other digit stickers for later tapes. The next time you will need a digit “1” will be for tape number 10. By this time you will have several unused “1” stickers. What is the next tape number such that after labeling the tape with that number you will not have any “1” stickers remaining?

We can formalize this puzzle in the following way:

Consider a function

fwhich, for a given whole numberx, returns the number of times that the digit 1 is needed to write all of the numbers between one andx. For example,f(13) = 6. Notice thatf(1) = 1. What is the next largestxsuch thatf(x) = x?

Thus *f(x)* is the number of “1” stickers needed to label all the tapes up to tape *x*. When *f(x) = x* then we have used all of the “1” stickers in labeling the first *x* tapes.

Let’s consider any non-zero digit. In the single and double-digit numbers, there are ten of each digit in the ones column, and ten of each digit in the tens column, so 20 altogether. Early on, the tape number is ahead of the digit count. By the time we get to 20-digit numbers, though, there should be, on average, two of any single non-zero digit per number. Thus, the number of times that any digit is used should eventually catch up with the tape numbers.

Encouraged by assurance of reaching our goal somewhere, we might continue our estimate. In the up-to three-digit numbers there are 300 of each non-zero digit; in the numbers below 10,000 there are 4000; then 50,000, and so on up to 10^{10}, where *f(x)* and *x* must (almost) meet. In particular, there are 10,000,000,000 counts for any non-zero digit in the numbers below 10,000,000,000. Hence, were the puzzle asking about any of the digits 2–9, then ten billion could have been an easy answer or, at least a limit on how far we need to search.

Sadly, there is a 1 in the decimal representation of ten billion (and a few zeroes), so we require 10^{10}+1 of the digit “1” to write the numbers [1…10^{10}]. Thus, *f*(10^{10}) ≠ 10^{10}, so 10^{10} cannot be the answer to the original puzzle. Thus stymied, Gregory wrote a program to find the solution to the original Google’s puzzle. And the answer turned out to be 199,981.

Gregory was overstymied, so he actually wrote a program to solve the puzzle for any non-zero digit. He calculated the beginning of the sequence *a(n)*, where *a(n)* is the smallest number *x* > 1 such that the decimal representation of *n* appears as a substring of the decimal representations of the numbers *[1…x]* exactly *x* times.

We already know that *a(1)* is 199,981. The sequence continues as follows: 199,981, 28,263,827, 371,599,983, 499,999,984, 10,000,000,000, 9,500,000,000, 9,465,000,000, 9,465,000,000, 10,000,000,000, ….

Did you expect this sequence to be increasing? You could have, because smaller numbers tend to contain smaller digits than larger numbers. Then why is the sequence not increasing? As Gregory failed to find a value for the digit 5 below ten billion, he noticed that it’s fairly easy to imagine a scenario where you have one less than the number you need, and then the next value has more than you need for equality, and then you equalize again later. In response, Alexey Radul, a friend of one author and a son of the other, suggested a related sequence:

Let

a(n)be the smallest numberx> 1 such that the decimal representation ofnappears as a substring of the decimal representations of the numbers[1…x]more thanxtimes.

The key difference being “more than” rather than “exactly”. Starting at 1 then, here are the first nine terms of each sequence:

n | = | > |
---|---|---|

1 | 199,981 | 199,991 |

2 | 28,263,827 | 28,263,828 |

3 | 371,599,983 | 371,599,993 |

4 | 499,999,984 | 499,999,994 |

5 | 10,000,000,000 | 5,555,555,555 |

6 | 9,500,000,000 | 6,666,666,666 |

7 | 9,465,000,000 | 7,777,777,777 |

8 | 9,465,000,000 | 8,888,888,888 |

9 | 10,000,000,000 | 9,999,999,999 |

Some of these pairs are interesting in their own right. Notice that 199,991 is ten more than the previously found 199,981. For all the numbers in between, the initial equality holds. Likewise for *n=3*, each of the numbers between 371,599,983 and 371,599,993 has exactly one three. Hence, the increase in a number by one is the same as the increase in the count of threes. A similar situation holds for *n=4*.

Gregory has submitted these two new sequences to the Online Encyclopedia of Integer Sequences, as they turned out not to be there yet, and they can be found using the identifiers A163500 for the “=” sequence and A164321 for the “>” sequence. It’s not surprising that the values matching this relaxed second condition are more well behaved than those with equality. Do you think the second sequence is always increasing? Wait a minute, let’s check that sequence for zero.

In counting zeroes, it is important to remember that we start with one, not zero. In this case the smallest number *x* such that *x* is less than or equal to the number of 0s in the decimal representations of [1 … *x*] is 100,559,404,366. But what is the corresponding number for the “=” sequence? It appears that no such number exists. The challenge of accurately proving it, as they say, is left as an exercise to the reader.

There is no reason that we should be constrained to single digits. The formal statement of the problem provides an obvious generalization, where we consider substrings of each of the numbers [1 … *x*] rather than digits in those numbers. We should note that we count every occurrence of a substring separately. Thus 11 will be counted twice as a substring of 1113.

We can prove that the “more than” sequence is increasing after the first term. Indeed, for two integers *i* and *j*, if *i* is less than *j*, then for every occurrence of *j*, by replacing *j* with *i* we get a smaller number with an occurrence of *i*.

Inspired, Tanya wrote another fancier and faster program to find values of this sequence for two-digit numbers. Here is the smallest number *x* for which the number of “10”s as substrings of the numbers *[1…x]* is more than or equal to *x*. And by a lucky strike the equality holds. The number is: 109,

The value of *a _{≥}(11)* might seem like a miracle: 119,

*a*and

_{≥}(10)*a*?

_{≥}(11)Sadly, *a _{≥}(12)* is not so pretty: 1,296,624,070,230,872,

We cannot leave off without at least mentioning that the sequence function should next take one more parameter: the base of representation.

We have found only the first few members of these new sequences, and there are many related sequences to be catalogued. We would love to hear tales from your explorations. Enjoy the sequence hunt!

Share:
## Gregory Marton:

Donovan Johnson has recently noted (correctly) that a_{=}(6) should be 9,500,000,000. I’m about to submit the correction to OEIS. Sorry, and thanks to Donovan for double-checking!

31 December 2010, 2:32 pm