I recently stumbled upon some notes (in Russian) of a public lecture given by Vladimir Arnold in 2006. In this lecture Arnold defines a notion of complexity for finite binary strings.
Consider a set of binary strings of length n. Let us first define the Ducci map acting on this set. The result of this operator acting on a string a1a2…an is a string of length n such that its i-th character is |ai − a(i+1)| for i < n, and the n-th character is |an − a1|. We can view this as a difference operator in the field F2, and we consider strings wrapped around. Or we can say that strings are periodic and infinite in both directions.
Let’s consider as an example the action of the Ducci map on strings of length 6. Since the Ducci map respects cyclic permutation as well as reflection, I will only check strings up to cyclic permutation and reflection. If I denote the Ducci map as D, then the Ducci operator is determined by its action on the following 13 strings, which represent all 64 strings up to cyclic permutation and reflection: D(000000) = 000000, D(000001) = 000011, D(000011) = 000101, D(000101) = 001111, D(000111) = 001001, D(001001) = 011011, D(001011) = 011101, D(001111) = 010001, D(010101) = 111111, D(010111) = 111101, D(011011) = 101101, D(011111) = 100001, D(111111) = 000000.
Now suppose we take a string and apply the Ducci map several times. Because of the pigeonhole principle, this procedure is eventually periodic. On strings of length 6, there are 4 cycles. One cycle of length 1 consists of the string 000000. One cycle of length 3 consists of the strings 011011, 101101 and 110110. Finally, there are two cycles of length 6: the first one is 000101, 001111, 010001, 110011, 010100, 111100, and the second one is shifted by one character.
We can represent the strings as vertices and the Ducci map as a collection of directed edges between vertices. All 64 vertices corresponding to strings of length 6 generate a graph with 4 connected components, each of which contains a unique cycle.
The Ducci map is similar to a differential operator. Hence, sequences that end up at the point 000000 are similar to polynomials. Arnold decided that polynomials should have lower complexity than other functions. I do not completely agree with that decision; I don’t have a good explanation for it. In any case, he proposes the following notion of complexity for such strings.
Strings that end up at cycles of longer length should be considered more complex than strings that end up at cycles with shorter length. Within the connected component, the strings that are further away from the cycle should have greater complexity. Thus the string 000000 has the lowest complexity, followed by the string 111111, as D(111111) = 000000. Next in increasing complexity are the strings 010101 and 101010. At this point the strings that represent polynomials are exhausted and the next more complex strings would be the three strings that form a cycle of length three: 011011, 101101 and 110110. If we assign 000000 a complexity of 1, then we can assign a number representing complexity to any other string. For example, the string 111111 would have complexity 2, and strings 010101 and 101010 would have complexity 3.
I am not completely satisfied with Arnold’s notion of complexity. First, as I mentioned before, I think that some high-degree polynomials are so much uglier than other functions that there is no reason to consider them having lower complexity. Second, I want to give a definition of complexity for periodic strings. There is a slight difference between periodic strings and finite strings that are wrapped around. Indeed, the string 110 of length 3 and the string 110110 of length 6 correspond to the same periodic string, but as finite strings it might make sense to think of string 110110 as more complex than string 110. As I want to define complexity for periodic strings, I want the complexity of the periodic strings corresponding to 110 and 110110 to be the same. So this is my definition of complexity for periodic strings: let’s call the complexity of the string the number of edges we need to traverse in the Ducci graph until we get to a string we saw before. For example, let us start with string 011010. Arrows represent the Ducci map: 011010 → 101110 → 110011 → 010100 → 111100 → 000101 → 001111 → 010001 → 110011. We saw 110011 before, so the number of edges, and thus the complexity, is 8.
The table below describes the complexity of the binary strings of length 6. The first column shows one string in a class up to a rotation or reflections. The second column shows the number of strings in a class. The next column provides the Ducci map of the given string, followed by the length of the cycle. The last two columns show Arnold’s complexity and my complexity.
|String s||# of Strings||D(s)||Length of the end cycle||Arnold’s complexity||My complexity|
As you can see, for examples of length six my complexity doesn’t differ much from Arnold’s complexity, but for longer strings the difference will be more significant. Also, I am pleased to see that the sequence 011010, the one that I called The Random Sequence in one of my previous essays, has the highest complexity.
I know that my definition of complexity is only for periodic sequences. For example, the binary expansion of pi will have a very high complexity, though it can be represented by one Greek letter. But for periodic strings it always gives a number that can be used as a measure of complexity.