Complexity of Periodic Strings

I recently stumbled upon some notes (in Russian) of a public lecture given by Vladimir Arnold in 2006. In this lecture Arnold defines a notion of complexity for finite binary strings.

Consider a set of binary strings of length n. Let us first define the Ducci map acting on this set. The result of this operator acting on a string a1a2…an is a string of length n such that its i-th character is |ai − a(i+1)| for i < n, and the n-th character is |an − a1|. We can view this as a difference operator in the field F2, and we consider strings wrapped around. Or we can say that strings are periodic and infinite in both directions.

Let’s consider as an example the action of the Ducci map on strings of length 6. Since the Ducci map respects cyclic permutation as well as reflection, I will only check strings up to cyclic permutation and reflection. If I denote the Ducci map as D, then the Ducci operator is determined by its action on the following 13 strings, which represent all 64 strings up to cyclic permutation and reflection: D(000000) = 000000, D(000001) = 000011, D(000011) = 000101, D(000101) = 001111, D(000111) = 001001, D(001001) = 011011, D(001011) = 011101, D(001111) = 010001, D(010101) = 111111, D(010111) = 111101, D(011011) = 101101, D(011111) = 100001, D(111111) = 000000.

Now suppose we take a string and apply the Ducci map several times. Because of the pigeonhole principle, this procedure is eventually periodic. On strings of length 6, there are 4 cycles. One cycle of length 1 consists of the string 000000. One cycle of length 3 consists of the strings 011011, 101101 and 110110. Finally, there are two cycles of length 6: the first one is 000101, 001111, 010001, 110011, 010100, 111100, and the second one is shifted by one character.

We can represent the strings as vertices and the Ducci map as a collection of directed edges between vertices. All 64 vertices corresponding to strings of length 6 generate a graph with 4 connected components, each of which contains a unique cycle.

The Ducci map is similar to a differential operator. Hence, sequences that end up at the point 000000 are similar to polynomials. Arnold decided that polynomials should have lower complexity than other functions. I do not completely agree with that decision; I don’t have a good explanation for it. In any case, he proposes the following notion of complexity for such strings.

Strings that end up at cycles of longer length should be considered more complex than strings that end up at cycles with shorter length. Within the connected component, the strings that are further away from the cycle should have greater complexity. Thus the string 000000 has the lowest complexity, followed by the string 111111, as D(111111) = 000000. Next in increasing complexity are the strings 010101 and 101010. At this point the strings that represent polynomials are exhausted and the next more complex strings would be the three strings that form a cycle of length three: 011011, 101101 and 110110. If we assign 000000 a complexity of 1, then we can assign a number representing complexity to any other string. For example, the string 111111 would have complexity 2, and strings 010101 and 101010 would have complexity 3.

I am not completely satisfied with Arnold’s notion of complexity. First, as I mentioned before, I think that some high-degree polynomials are so much uglier than other functions that there is no reason to consider them having lower complexity. Second, I want to give a definition of complexity for periodic strings. There is a slight difference between periodic strings and finite strings that are wrapped around. Indeed, the string 110 of length 3 and the string 110110 of length 6 correspond to the same periodic string, but as finite strings it might make sense to think of string 110110 as more complex than string 110. As I want to define complexity for periodic strings, I want the complexity of the periodic strings corresponding to 110 and 110110 to be the same. So this is my definition of complexity for periodic strings: let’s call the complexity of the string the number of edges we need to traverse in the Ducci graph until we get to a string we saw before. For example, let us start with string 011010. Arrows represent the Ducci map: 011010 → 101110 → 110011 → 010100 → 111100 → 000101 → 001111 → 010001 → 110011. We saw 110011 before, so the number of edges, and thus the complexity, is 8.

The table below describes the complexity of the binary strings of length 6. The first column shows one string in a class up to a rotation or reflections. The second column shows the number of strings in a class. The next column provides the Ducci map of the given string, followed by the length of the cycle. The last two columns show Arnold’s complexity and my complexity.

String s # of Strings D(s) Length of the end cycle Arnold’s complexity My complexity
000000 1 000000 1 1 1
000001 6 000011 6 9 8
000011 6 000101 6 8 7
000101 6 001111 6 7 6
000111 6 001001 3 6 5
001001 3 011011 3 5 4
001011 12 011101 6 9 8
001111 6 010001 6 7 6
010101 2 111111 1 3 3
010111 6 111001 6 8 7
011011 3 101101 3 4 3
011111 6 100001 6 9 8
111111 1 000000 1 2 2

As you can see, for examples of length six my complexity doesn’t differ much from Arnold’s complexity, but for longer strings the difference will be more significant. Also, I am pleased to see that the sequence 011010, the one that I called The Random Sequence in one of my previous essays, has the highest complexity.

I know that my definition of complexity is only for periodic sequences. For example, the binary expansion of pi will have a very high complexity, though it can be represented by one Greek letter. But for periodic strings it always gives a number that can be used as a measure of complexity.

Share:Facebooktwitterredditpinterestlinkedinmail

Leon Vaserstein’s Problems

I met Leon Vaserstein at a party. What do you think I do at parties? I bug people for their favorite problems, of course. The first riddle Leon gave me is a variation on a famous problem I had already written about. Here’s his version:

The hypotenuse of a right triangle is 10 inches, and one of the altitudes is 6 inches. What is the area?

When Leon told me that he had designed some problems for the Soviet Olympiads, naturally I wanted to hear his favorite:

A closed polygonal chain has its vertices on the vertices of a square grid and all the segments are the same length. Prove that the number of segments is even.

Share:Facebooktwitterredditpinterestlinkedinmail

Recent Geeky Jokes

* * * A Generic Limerick (submitted by Michael Chepovetsky)

There once was an X from place B,
Who satisfied predicate P,
The X did thing A,
In a specified way,
Resulting in circumstance C.

* * *

I just learned that 4,416,237 people got married in the US in 2010. Not to nitpick, but shouldn’t it be an even number?

* * *

We are happy to announce that 100% of Russian citizens are computer-savvy and use the Internet on a regular basis (according to a recent Internet survey).

* * *

Two math teachers had a fight. It seems they couldn’t divide something.

* * *

Do you know that if you start counting seconds, once you reach 31,556,926 you discover that you have wasted a whole year?

* * *

What I need after a visit to the hairdresser is a “Save” button.

* * *

— Hello! Is this a fax machine?
— Yes.

* * *

— I am not fat at all! My girlfriend tells me that I have a perfect figure.
— Your girlfriend is a mathematician. For her a perfect figure is a sphere.

* * *

A: Hi, how are you?
B: +
A: Will you come to classes today?
B: –
A: You will be kicked out!
B: =
A: Are you using your calculator to chat?

Share:Facebooktwitterredditpinterestlinkedinmail

Large Numbers, Few Characters

I wonder what the largest number is that can be represented with one character. Probably 9. How about two characters? Is it 99? What about three or four?

I guess I should define a character. Let’s have two separate cases. In
the first one you can only use keyboard characters. In the second one
you can use any Unicode characters.

I’m awaiting your answers to this.

Share:Facebooktwitterredditpinterestlinkedinmail

A Nerd’s Way to Walk Up the Stairs

The last time I talked to John H. Conway, he taught me to walk up the stairs. It’s not that I didn’t know how to do that, but he reminded me that a nerd’s goal in climbing the steps is to establish the number of steps at the end of the flight. Since it is boring to just count the stairs, we’re lucky to have John’s fun system.

His invention is simple. Your steps should be in a cycle: short, long, long. Long in this case means a double step. Thus, you will cover five stairs in one short-long-long cycle. In addition, you should always start the first cycle on the same foot. Suppose you start on the left foot, then after two cycles you are back on the left foot, having covered ten stairs. While you are walking the stairs in this way, it is clear where you are in the cycle. By the end of the staircase, you will know the number of stairs modulo ten. Usually there are not a lot of stairs in a staircase, so you can easily estimate the total if you know the last digit of that number.

I guess I am not a true nerd. I have lived in my apartment for eight years and have never bothered to count the number of steps. That is, until now. Having climbed my staircase using John’s method, I now know that the ominous total is 13. Oh dear.

Share:Facebooktwitterredditpinterestlinkedinmail

Moscow Math Olympiad

The Moscow Math Olympiad has a different set of problems for every grade. Students need to write a proof for every problem. These are the 8th grade problems from this year’s Olympiad:

Problem 1. There were 6 seemingly identical balls lying at the vertices of the hexagon ABCDEF: at A — with a mass of 1 gram, at B — with a mass of 2 grams, …, at F — with a mass of 6 grams. A hacker switched two balls that were at opposite vertices of the hexagon. There is a balance scale that allows you to say in which pan the weight of the balls is greater. How can you decide which pair of balls was switched, using the scale just once?

Problem 2. Peter was born in the 19th century, while his brother Paul was born in the 20th. Once the brothers met at a party celebrating both birthdays. Peter said, “My age is equal to the sum of the digits of my birth year.” “Mine too,” replied Paul. By how many years is Paul younger than Peter?

Problem 3. Does there exist a hexagon which can be divided into four congruent triangles by a single line?

Problem 4. Every straight segment of a non-self-intersecting path contains an odd number of sides of cells of a 100 by 100 square grid. Any two consecutive segments are perpendicular to each other. Can the path pass through all the grid vertices inside and on the border of the square?

Problem 5. Denote the midpoints of the non-parallel sides AB and CD of the trapezoid ABCD by M and N respectively. The perpendicular from the point M to the diagonal AC and the perpendicular from the point N to the diagonal BD intersect at the point P. Prove that PA = PD.

Problem 6. Each cell in a square table contains a number. The sum of the two greatest numbers in each row is a, and the sum of the two greatest numbers in each column is b. Prove that a = b.

Share:Facebooktwitterredditpinterestlinkedinmail

How Much is Two and Two?

A mathematician is someone who pauses when asked “How much is two and two?”

Indeed, the answer might be:

  • 0 — modulo 4.
  • 1 — in characteristic 3.
  • 2 — if AND is considered as a logical operation.
  • 4 ± ε — for approximate values of 2.
  • 4 — (almost forgot this case).
  • 10 — in base 4.
  • 11 — in base 3.
  • 22 — for string concatenations.
  • MSC — (this one is your homework).
Share:Facebooktwitterredditpinterestlinkedinmail

Self-Mutilating DNA

I already wrote about the research of my friend Olga Amosova who studied the sickle-cell anemia mutation. She and her colleagues needed to store short fragments of hemoglobin genes for their experiments. All the fragments were identical. They noticed that with time the fragments always broke down in the same place. It was a mystery. When good scientists stumble on a mystery, they start digging.

They found that one of the nucleotides rips off the DNA fragment at the site of the Sickle-cell mutation. That place on the DNA becomes fragile and later breaks down. These sites need to be repaired. The repair is very error-prone and often leads to a mutation.

When DNA strands are left unattended, they want to pair up. There are four types of nucleotides: A, C, G and T. So mathematically the fragment of DNA is a string in the alphabet A, C, G, T. These nucleotides are matched to each other. When two DNA strands pair up, A on one strand always matches T and C matches G. So it is logical that if there are two complementary DNA pieces on the same fragment, they will find each other and pair up. They form a hydrogen bond. For example, a piece AACGT matches perfectly another piece TTGCA. Suppose a substring of DNA consists of a piece AACGT and somewhere later the reverse of the match: ACGTT. Such a string is called an inverted repeat. The DNA fragment I mentioned contains a string AACGT****ACGTT. Two pieces AACGT and ACGTT are complementary and not too far from each other in space. So it is easy for them to find each other and to bond to form a so-called stem-loop or a hairpin structure. The site of Sickle-cell mutation falls into the loop.

Olga and her colleagues discovered that for some particular loops the orientation in space becomes awkward and one of the nucleotides rips off. Such a rip off is called depurination. In further investigation, Olga found examples of when depurination happens. The first sequence of the pair that will bond later has to have at least five nucleotides and has to end in T. Correspondingly the second part in the pair has to begin with A. In the middle there needs to be four nucleotides GTGG. The first G flies away. Enzymes rush like a first aid squad to repair it and introduce mistakes that lead to mutation and diseases like cancer.

DNA was thought to be simply a passive information storage system, not capable of any action. Now we see that DNA is capable of action. DNA can damage itself. Damage provokes a mutation. For all practical purposes it is self-mutilation. Olga and her colleagues scanned the human genome for other sequences that are capable of self-mutilation. They found that such sequences are overwhelmingly present. They are present in much higher numbers than would be expected statistically. The pieces that are capable of damaging themselves occur 40 times more often than would occur if the nucleotides were distributed randomly. They are especially overrepresented in genes linked to cancer.

Self-damaging shouldn’t happen in normal situations. It can be provoked by the environment, for example, the chemistry of the cell. That means, that our cancers are not only in our genes but also in our life-style. There was, for example, a suggestion in a recent NY Times article, Is Sugar Toxic?, that too much sugar in a diet might provoke cancer. If the rate of mutation depends on the environment, we can influence it and prolong our lives.

It is not clear why the ability to self-mutilate survives in the evolutionary process. It is quite possible that if something very bad happens to our planet, we need our genes to be able to mutate very fast in order to adjust to the environment so that humans can survive.

Though I never tried to donate my sperm to a sperm bank, because of my inability to produce it, I know that sperm banks look for people who have ancestors who lived for a very long time. Such sperm is in bigger demand as everyone wants their children to live longer. I wonder if this tendency is a mistake. Global warming is upon us. People with longevity genes might not be flexible enough for their children to survive the changing of the Earth.

Share:Facebooktwitterredditpinterestlinkedinmail

Freedom and Diamonds

As you might have guessed from the title, this essay is about domino tilings.

Suppose a subset of a square grid has area N, and the number of possible domino tilings is T. Let’s imagine that each cell is contributing a factor of x tilings to the total independently of the others. Then we get that xN = T. This mental exercise suggests a definition: we call the nth root of T the degree of freedom per square for a given region.

Let’s consider a 1 by 2k rectangle. There is exactly one way to tile it with dominoes. So the degree of freedom per square of such a rectangle is 1. Now consider a 2 by k rectangle. It has the same area as before, and we know that there should be more than one tiling. Hence, we expect the degree of freedom to be larger than the one in the previous example. The number of tilings of a 2 by k rectangle is Fk-2, where Fk is kth Fibonacci number. So the degree of freedom for large k will be approximately the square root of the golden ratio, which is about 1.272.

You might expect that squares should give larger degrees of freedom than rectangles of the same area. The degree of freedom for a large square is about 1.3385. You can find more information in the beautiful paper Tilings by Federico Ardila and Richard P. Stanley.

 

Aztec Diamonds

Let’s move from rectangles to Aztec diamonds. They are almost like squares but the side of the diamond is aligned with diagonals of the dominoes rather than with their sides. See the sample diamonds in the picture above, which Richard Stanley kindly sent to me for this essay.

It is easier to calculate the degree of freedom for Aztec diamonds than for regular squares. The degree is the fourth root of 2, or 1.1892…. In the picture below created by James Propp’s tiling group you can see a random tiling of a large Aztec diamond.

Look at its colors: horizontal dominoes are yellow and blue; vertical ones are red and aquamarine. You might wonder what rule decides which of the horizontal dominoes are yellow and which are blue. I will not tell you the rule; I will just hint that it is simple.

 

Aztec Diamond

Back to freedom. As you can see from the picture, freedom is highly non-uniform and depends on where you live. Freedom is concentrated inside a circle called the arctic circle, perhaps because the areas outside it are frozen for lack of freedom.

Now I would like to expand the notion of freedom to give each cell its own freedom. For a large Aztec diamond, I will approximate freedom with a function that is one outside the arctic circle and is uniform inside. The Aztec diamond AZ(n) consists of 2n(n+1) squares, shaped like a square with side-length n√2. So the area of the circle is πn2/2. Hence we can calculate the freedom inside the circle as the πth root of 2, which is about 1.247. This number is still much less than the degree of freedom of a cell in a large square.

Share:Facebooktwitterredditpinterestlinkedinmail

Averaging Averages

Jorge Tierno sent me a link to the following puzzle:

There is a certain country where everybody wants to have a son. Therefore, each couple keeps having children until they have a boy, then they stop. What fraction of the children are female?

If we assume that a boy is born with probability 1/2 and children do not die, then every birth will produce a boy with the same probability as a girl, so girls will comprise half of all children.

Now, I wonder why everyone would want a boy? Y-chromosomes are much shorter than X-chromosomes. If a man wants to pass his genes to the next generation, a daughter should be preferable as she keeps more genes from the father. I am a mother of two boys, so my granddaughters will have my X-chromosome while my grandsons will have my ex-husband’s Y-chromosome, so to keep my genes in the pool I should be more interested in granddaughters.

But I digress. I started writing this essay because in the original puzzle link the answer was different from mine. Here is how the other argument goes:

Half of all families have zero girls, a quarter have 1/2 girls, 1/8 have 2/3 girls, and so on. If we sum this up the expected ratio of girls to boys is (1/2)0 + (1/4)(1/2) + (1/8)(2/3) + (1/16)(3/4) + … which adds to 1 − ln 2, which is about 30%.

What’s wrong with this solution?

Share:Facebooktwitterredditpinterestlinkedinmail