Archive for the ‘Privacy’ Category.

I Was Attacked

Not personally. Someone hacked into my website.

I would like to thank my readers Qiaochu Yuan, Mark Rudkin, “ano” and Paul who alerted me to the problem. Viewers who were using the Google Chrome browser and who tried to visit my website got this message: “This site contains content from, a site known to distribute malware.”

It took me some time to figure out what was going on. It appears that on June 19 someone from hacked into my hosting account and added a script to all my html files and to my blog header. It seems that the script was dormant and wasn’t yet doing bad things.

As soon as I grasped what was going on, I replaced all the affected files.

I have had my website for many years without changing my hosting password. Unfortunately, passwords, not dissimilar to humans, have this annoying tendency to become weaker with age. I wasn’t paying attention to the declining strength of my password and so I was punished.

Now I have fixed the website and my new password is: qwP35q2054uWiedfj052!@#$%.

Just kidding.


Turing Tests’ Race

In a Turing test a human judge on one end of an interface interacts with either a computer or another human through this interface. If the judge can’t differentiate a machine from a human, then the computer is said to pass the test. One big goal of folks working in Artificial Intelligence is to build a computer that, when subjected to this test, is indistinguishable from a human.

However, while some people are working hard trying to build programs that can pass as humans, other people are working hard inventing tests that can differentiate between humans and those programs. Such tests are sometimes called Reverse Turing tests. As computer science progresses, the programs that are pretending to be humans as well as reverse tests are becoming increasingly complex.

For example, banks frequently want to prevent malicious computer programs from trying to log into their customers’ accounts. As a nice touch the judges are computers in this case. There are different methods designed to confirm that a human is trying to log in. In one of them a picture of a word, called a CAPTCHA, is presented on a screen, and the program requires that this word be typed in.

I wanted a CAPTCHA with the words “Turing Test” in it for this posting. I looked online trying to find a way to do it. I couldn’t. There is a ton of software that can produce random CAPTCHAs from a dictionary, but nothing could do a particular word. Finally, rather than looking for software, I found a human, a kind gentleman named Leonid Grinberg, who with some GIMP help manually implemented a self-referencing CAPTCHA symbolizing the race between computers and tests.

As text recognition software becomes better and better, these CAPTCHAs become more and more difficult for a human to read. The last time I tried to log in, I was only able to type the right word on my fourth try. Very soon computers will be better than humans at parsing CAPTCHAs. Humans are losing the race on visual methods like this one.

Here’s another example. Some malicious software can recognize and capture email addresses on webpages to use for spam. While we don’t want them to recognize email addresses, we do want people to be able to do so. Thus we need a way to present email addresses as a reverse Turing test.

The standard safety recommendation is to avoid writing out the full and exact email address. Here’s an example: billgates AT gmail DOT com. Actually, I think computers are so smart nowadays that they can learn this trick. Another idea is to show a picture of your email address instead of using characters. Here we return to the image idea, which most computers can nowadays recognize.
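To see how little effort the “AT/DOT” trick demands from a bot, here is a minimal sketch in Python. It assumes the obfuscation uses exactly the uppercase words AT and DOT separated by spaces; the function name `deobfuscate` is made up for this illustration, and real harvesting software may work quite differently.

```python
import re


def deobfuscate(text: str) -> str:
    """Undo the 'name AT host DOT com' disguise (illustrative sketch only)."""
    # Assumption: the separators are the uppercase words AT and DOT,
    # each surrounded by whitespace.
    text = re.sub(r"\s+AT\s+", "@", text)
    text = re.sub(r"\s+DOT\s+", ".", text)
    return text


print(deobfuscate("billgates AT gmail DOT com"))
```

Two substitutions are enough, which is why this disguise is unlikely to slow down a determined program for long.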

Another idea of how to hide an email address is to give simple clues, which point to characters in the email address. For example, if you have “4” in your address, you might say that the character is the sum of two and two. I already invented a version of my email address in which each letter of my username is an answer to a simple question. Unfortunately, I think that the question answering systems like Start, as well as its huge new competitor Wolfram Alpha, will learn to answer these questions very soon. I can construct more sophisticated questions, but that would require my readers to spend more time to figure it out including going back to school for a calculus class.

So, recently, I’ve come up with a new idea. I made the description of my email simpler, but the paragraph describing my email didn’t contain all the necessary information:

I have an email account with Yahoo. My account name consists of seven lower case letters: five letters of my first name concatenated with the first two letters of my last name.

People who want to contact me can easily find my name in the title of my webpage or in my url, but I hoped it would take the evil computers some time to figure out what to look for, where it’s located and how to turn it into an address.
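The last step, turning a name into an address, is the easy part once a program has found the name. A minimal sketch, assuming a hypothetical page owner called “Alice Example” and assuming that “an email account with Yahoo” means the domain yahoo.com (the description above does not spell the domain out):

```python
def address_from_name(full_name: str, domain: str = "yahoo.com") -> str:
    """Build the address described above from a full name (sketch only)."""
    parts = full_name.split()
    first, last = parts[0], parts[-1]
    # Five letters of the first name plus the first two letters of the
    # last name, all in lower case.
    user = (first[:5] + last[:2]).lower()
    return user + "@" + domain


# "Alice Example" is a made-up name used purely for illustration.
print(address_from_name("Alice Example"))
```

The hard part for a bot is everything before this function: understanding the paragraph, and knowing where on the page to look for the name.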

The day after I changed my contact web page, I went to my math coaching work at AMSA. During my break, I wanted to unwind by solving a light up puzzle, but it appeared that the new security system at AMSA forbids Internet access to all gaming sites. Thus, being still wound up I decided to do some work and went to my personal page for some materials. I was blocked again. The software politely informed me that access to personal websites was not permitted either. Oops. If a computer can understand that it is a personal website, it probably can figure out the name of the corresponding person. Oops-Oops-Oops. I am loosing the race against computers again. My recent idea to protect my email address from spam lasted one day until my first reality check.


Contact Me

I enjoyed a recent discussion on the sequences fans mailing list. David Wilson suggested an idea for hiding email addresses on webpages from bots: change your email slightly and explain how to change it back.

For example, if you want to contact me, you should reverse my login name in the following email address:

Or, remove all the digits from the following email address:


Unrevealing Coin Weighings

In 2007 Alexander Shapovalov suggested a very interesting coin problem. Here is the kindergarten version:

You present 100 identical coins to a judge, who knows that among them are either one or two fake coins. All the real coins weigh the same and all the fake coins weigh the same, but the fake coins are lighter than the real ones.

You yourself know that there are exactly two fake coins and you know which ones they are. Can you use a balance scale to convince the judge that there are exactly two fake coins without revealing any information about any particular coin?

To solve this problem, divide the coins into two piles of 50 so that each pile contains exactly one fake coin. Put the piles in the different cups of the scale. The scale will balance, which means that you can’t have the total of exactly one fake coin. Moreover, this proves that each group contains exactly one fake coin. But for any particular coin, the judge won’t have a clue whether it is real or fake.

The puzzle is solved, and though you do not reveal any information about a particular coin, you still give out some information. I would like to introduce the notion of a revealing coefficient. The revealing coefficient is the portion of information you reveal, in addition to proving that there are exactly two fake coins. Before you weighed them all, any two coins out of 100 could have been the two fakes, so the number of equally probable possibilities was 100 choose 2, which is 4950. After you’ve weighed them, it became clear that there was one fake in each pile, so the number of possibilities was reduced to 2500. The revealing coefficient shows the portion by which your possibilities have been reduced. In our case, it is (4950 − 2500)/4950, slightly less than one half.
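The arithmetic for the kindergarten version can be checked with a few lines of Python. This is only a sketch of the definition above; the helper name `revealing_coefficient` is mine, and exact fractions are used so that no rounding creeps in.

```python
from fractions import Fraction
from math import comb


def revealing_coefficient(before: int, after: int) -> Fraction:
    """Portion by which the number of possibilities has been reduced."""
    return Fraction(before - after, before)


before = comb(100, 2)  # any 2 of the 100 coins could be the fakes
after = 50 * 50        # one fake in each pile of 50
coefficient = revealing_coefficient(before, after)

print(before, after, coefficient, float(coefficient))
```

The same helper can be reused for the harder versions below: count the possibilities before and after your weighings and plug in the two numbers.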

Now that I’ve explained the kindergarten version, it’s time for you to try the elementary version. This problem is the same as above, except that this time you have 99 coins, instead of 100.

Hopefully you’ve finished that warm-up problem and we can move on to Shapovalov’s original problem, which was designed for high schoolers.

A judge is presented with 100 coins that look the same, knowing that there are two or three fake coins among them. All the real coins weigh the same and all the fake coins weigh the same, but the fake coins are lighter than the real ones.

You yourself know that there are exactly three fake coins and you know which ones they are. Can you use a balance scale to convince the judge that there are exactly three fake coins, without revealing any information about any particular coin?

If you are lazy and do not want to solve this problem, but not too lazy to learn Russian, you can find several solutions to this problem in Russian in an essay by Konstantin Knop.

Your challenge is to solve the original Shapovalov puzzle, and for each solution to calculate the revealing coefficient. The best solution will be the one with the smallest revealing coefficient.


Is Anyone Watching?

Recently I conducted an experiment. I wrote an essay “What’s Hidden?” in which I claimed that the essay had a hidden secret message in it. I coded the message using a very simple method — to read it you need to combine together all the capital letters in the essay.
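Decoding a message hidden this way takes one line of code, which is the point: any agency sieving pages through standard techniques could do it. A minimal sketch in Python; the sample sentence is invented for the illustration and is not taken from the essay.

```python
def hidden_message(text: str) -> str:
    """Combine all the capital letters of a text, in order."""
    return "".join(ch for ch in text if ch.isupper())


# A made-up sample, not the text of the actual essay.
sample = "Hello, Everyone. Life Leads Onward."
print(hidden_message(sample))
```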

The goal of the experiment was to audit intelligence agencies of different countries. I wanted to check if this essay would draw any special attention.

Intelligence agencies should crawl around the web and check places that might have secret messages. They might also want to sieve Internet data through some standard coding techniques and check if there are coded messages out there. But the Internet is so vast that most agencies might not have the resources to parse through all the web pages. They probably only analyze suspected pages.

Anyway, I wanted to see if my traffic for this essay would be different from the usual. I have a tool for that — Google Analytics, which provides aggregated geographical data of my traffic. Looking at the results I can see that the visits to this particular essay were mostly from the United States, with a few from Europe. The total number of visits was small, especially compared to my essay on masturbation.

If an intelligence agency has any intelligence it should hide its visits from Google Analytics and crawl around the web without being registered. For example, they can use cached Google pages.

So my intention in this experiment was to check for any agency that had so much time and money on their hands that they were monitoring the entire web and, at the same time, was dumb enough to leave a trail. I am happy to conclude that there is no such agency, with only one potential exception: my home country — the United States.


Challenging Start

Start is the STate of the ART question-answering system. You can ask Start any question in plain English — for example, “What is the population of Moscow?” — and instead of producing millions of pages like Google, it provides one exact answer: “The population of Moscow, Russia, is 8,746,700.” I am not sure where this number comes from, as Russian sites suggest that the population of Moscow is more than 10 million people. But anyway, back to my challenge.

I have my email address in plain sight on my webpage. As a result, I get a lot of spam. So, I am thinking about a way to present my address so that humans can easily deduce it, but computers can’t. Here it is: my email server is Yahoo and my user name consists of 7 lower case letters. Each letter answers one of the questions below, in the right order. As of today, Start can’t answer any of these questions.

  1. What is the first letter of the word 3?
  2. What is the first letter of the alphabet?
  3. What is the only common letter in the words “knowledge” and “triumphant”?
  4. What is the last letter of all the days of the week?
  5. What is the first letter of almost all the continents?
  6. What is the first letter of the word “knight”?
  7. What is the most frequent letter in the word “although”?

One advantage of presenting my user name in this manner is that I will restrict my new correspondence to people who are sufficiently eager to write to me that they can spare ten seconds figuring out my email address. The main advantage is that Start can’t answer these questions, giving me hope that spamming software can’t do it either.

I do think that the state of the art question-answering system should know the first letter of the alphabet. Start: these questions are a challenge for you. How much time will it take you to do it?

Watch out. Maybe Google can do it faster.



I tried to enroll on a website recently, but they didn’t allow me to continue without choosing five security questions out of about ten samples they supplied. I started in good faith to do what they asked.

Question: What is your father’s middle name?
Answer: They do not have middle names in Russia; they have something called “otchestvo” and I know seven different ways to spell my father’s.

Question: What is the name of the street on which you were born?
Answer: I am glad it was not Lenin Street, but it was equally bad. Besides, it was renamed and I am not sure which name to choose.

Question: What is the name of your high school?
Answer: Finally, an easy question. In Russia we didn’t have names, but rather numbers for schools. I happily entered 444, and oops — the applet wouldn’t accept numbers.

I couldn’t find five questions that I could answer uniquely and reliably. I felt that the designers of these questions were clueless and disrespectful to other cultures. Then I thought about whether I really wanted some creepy database to know the name of my best friend. No, I didn’t.

Now I have established a file where I put the answers to security questions and I can have all the fun I want with my new biography. I can name my first dog Tom Cruise and have my wedding date be 20 years before I was born. I can name my husband Freedom Of Speech and my city of birth IHateSecurityQuestions. Maybe next time I will switch: Freedom Of Speech will be my dog and Tom Cruise my husband.

If you are lazy like me, you can choose your questions so you have the same answer for everything. This way you do not need to type much into your file. For example, you can name your city, your cat and your best friend George Washington. Or, if you are really lazy, God.


Do Gas Stations Need Your Zip Code?

Recently I was buying gas at a gas station. I was paying with my credit card and the machine asked for my zip code. If you read my previous post, you know what happened: I got furious. First, I tried 00000 as a zip code. The machine was smart enough to tell me that it was invalid. No matter what combination I tried, it wouldn’t accept another zip code.

Finally, the machine got frustrated with me and printed on its screen that I needed to talk to the cashier. Then, I argued with the cashier. He could have suggested that I pay cash, but he didn’t. I left without gas.

Luckily for me, I had dinner that day with my son, Alexey Radul, and his friend, Grem. Grem explained to me that gas stations are not collecting customers’ zip codes. They use zip codes as a security measure for checking that the credit card is not stolen. I guess that means I behaved exactly like a person who stole a card. Protecting your own privacy sometimes makes you look like a thief.

After dinner, I went back to the same gas station to conduct some experiments. This time I was armed with several valid zip codes. It didn’t work. Grem was right: only the zip code corresponding to the card could have worked. I looked like a thief again. I paid for my gas with cash and left.

The next day I called my credit card company and asked them about this. The manager I talked to told me the same thing as the manager Grem talked to. Apparently credit card companies do not give your zip code information to gas stations. Your zip code is used instead of a PIN to verify that your credit card is not stolen. So in this case you do not need to worry that you are providing extra information, since credit card companies know your zip code anyway.


Resisting Databases

Nowadays, supporting a database is cheap. Computer storage is cheap, too. This gives companies an opportunity to collect more and more information about us. If you think, as I do, that this is an invasion of your privacy, here are some ways to resist.

When you buy something over the Internet or through a catalog, they ask for both your email address and telephone number. They may need a way to contact you in case something happens with your order, but they do not need both. When you are ordering online and their default contact is by email, they do not need your phone number. If the website requires your phone number, you can put in a fake number. Of course, you are a nice person and do not want to provide some innocent soul’s phone number instead of yours. Here is the perfect solution. Put the number 555-5555 as your home number with any area code.

The phone numbers of the format 555-xxxx are reserved for the movie industry. That is, if Hugh Grant calls Julia Roberts in a movie, there would be hundreds of bored or not very smart people who would try to call the same number Hugh dialed hoping to talk to Julia Roberts. For these situations, the movie industry reserves all the numbers of the form 555-xxxx. This way they guarantee that all of these fans will not bother a real person. So you can use these numbers without any guilt.

If you are ordering by phone, they might see your number on the caller ID. In this case, you can always say that you do not have an email address. You can also use a one-time email address offered through Sneakmail or AddressGuard at Yahoo.

Store shopping cards also scare me very much. When you use your store shopping card, they know exactly what and when and in what amounts you are buying. If you do not want anyone to know that you are buying 100 Tylenol pills a month, do not use your store card, and consider paying cash.

My friend Sam Steingold suggested I try card swapping. You have a CVS card and your friend has a CVS card — you can swap them. CVS’s database will register that you quit buying Tylenol in Boston, but started buying cigarettes in Atlanta. If you continue swapping, CVS’s database will be totally confused. The good part of this idea is that if someone tries to hold your purchases against you, you have a way to prove that you are not responsible.

The disadvantage of card swapping is that for the transition time you lose targeted coupons. Your friend in Atlanta will get all the Tylenol coupons he doesn’t need. But you still will be able to buy sale items with discounts.

Here’s what I did – I put another last name on my CVS card. They didn’t notice. If they were to notice, I would have told them that I am in the process of changing my last name to my newly acquired husband’s last name and would ask for newlyweds’ coupons.

Sometimes when you buy things, they ask you for your phone number at the cash register. It is even worse than shopping cards. They have your information on file without giving you your discounts. Just remember: you can always refuse. Or if you’re not comfortable refusing, let us all agree to give the same number: (area code)-555-5555. Let their analysts wonder why the same person is buying morning-sickness pills in one store and condoms in another.