prime number problem of google treasure hunt 2008

I just found out about Google's Treasure Hunt Challenge. They say that "it's a puzzle contest designed to test yer problem-solving skills in computer science, networking, and low-level UNIX trivia."

Apparently I have missed the first three puzzles (first, second and third) but I'll give the fourth puzzle a shot.

The fourth problem is about prime numbers and is formulated as follows:

Find the smallest number that can be expressed as the sum of 7 consecutive prime numbers, the sum of 17 consecutive prime numbers, the sum of 41 consecutive prime numbers, the sum of 541 consecutive prime numbers, and is itself a prime number. For example, 41 is the smallest prime number that can be expressed as the sum of 3 consecutive primes (11 + 13 + 17 = 41) and the sum of 6 consecutive primes (2 + 3 + 5 + 7 + 11 + 13 = 41).

The Solution

Here's how I approached and solved this problem.

I had no desire to generate lists of prime numbers myself, as it has been done thousands of times already. I didn't even want to copy any existing prime-generating code. I decided to just use a publicly available list of prime numbers. Here is a list of the first fifty million primes that I found.

Next, I used my Unix-fu to find the solution.

I noticed that the primes were zipped and split into chunks of one million primes per file. The file names were "primes1.zip" through "primes50.zip".

A quick loop from 1 to 50 and wget got all these files to my hard drive:

$ for i in $(seq 50); do wget "http://primes.utm.edu/lists/small/millions/primes$i.zip"; done

Next, I unzipped all these files and removed the zips to save disk space:

$ for i in $(seq 50); do unzip "primes$i.zip" && rm -f "primes$i.zip"; done

After doing that and looking at what I got, I realized the files were in a strange format: eight primes per line, space-padded, with some text on the first two lines. Here is how the first five lines of the primes1.txt file looked:

                 The First 1,000,000 Primes (from primes.utm.edu)

         2         3         5         7        11        13        17        19
        23        29        31        37        41        43        47        53
        59        61        67        71        73        79        83        89

This is not great. I wanted all my primes in one file, one prime per line, so that I could extract the N-th prime by looking at the N-th line.

I used the following command to merge all the files into a single file:

$ for i in $(seq 50); do (awk 'BEGIN { OFS="\n" } NR > 2 {print $1,$2,$3,$4,$5,$6,$7,$8}' primes$i.txt >> primes.txt) && rm -f primes$i.txt; done
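To double-check the extraction, the same AWK program can be fed the sample lines shown above: NR > 2 skips the two header lines, and OFS="\n" makes the eight space-padded columns come out one per line. A quick self-contained check:

```shell
# Reproduce the extraction on the sample: skip the 2 header lines,
# then print each of the 8 space-padded columns on its own line.
printf '%s\n' \
  '                 The First 1,000,000 Primes (from primes.utm.edu)' \
  '' \
  '         2         3         5         7        11        13        17        19' |
awk 'BEGIN { OFS="\n" } NR > 2 { print $1,$2,$3,$4,$5,$6,$7,$8 }'
```

This prints the primes 2 through 19, one per line, which is exactly the format I wanted.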

A quick verification that I didn't lose any primes:

$ wc -l primes.txt
50000000 primes.txt
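Another cheap sanity check, not part of my original session but easy to add: the merged list must be strictly increasing. A one-pass AWK check does it, demonstrated here on a tiny inline list (run it on primes.txt the same way):

```shell
# Verify that a numeric list is strictly increasing: prints OK,
# or the line number of the first out-of-order value.
printf '%s\n' 2 3 5 7 11 |
awk 'NR > 1 && $1 <= prev { print "out of order at line " NR; bad = 1; exit 1 }
     { prev = $1 }
     END { if (!bad) print "OK" }'
```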

Now I created four files containing the sums of 7, 17, 41 and 541 consecutive primes, with each sum not exceeding the biggest prime in the primes.txt file. I did that with the following AWK one-liner:

$ last=$(tail -1 primes.txt)
$ for N in 7 17 41 541
  do
   awk 'BEGIN { prev[0] = 0 } NR < '$N' {prev[NR] = $1; sum += $1 } NR >= '$N' { psum += prev[NR-'$N']; delete prev[NR-'$N']; prev[NR] = $1; sum += $1; if (sum - psum > '$last') { exit } printf "%d\n", sum - psum }' primes.txt > primes$N.txt
  done

The command created the primes7.txt, primes17.txt, primes41.txt and primes541.txt files. These files contained sums of consecutive primes, but only some of those sums were themselves prime.
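To see how the one-liner behaves, you can run the exact same program on a toy input: here N=3 over the first eight primes, with a pretend last of 31 so the early exit triggers (both values are made up purely for illustration). The prev array holds the primes currently in the window, sum is the running total of everything read, and psum is the total of everything that has slid out of the window, so sum - psum is the sum of the last N primes:

```shell
# Window sums of N=3 consecutive primes over the first 8 primes,
# stopping once a sum exceeds last=31 (so 41 and 49 are never printed).
N=3 last=31
printf '%s\n' 2 3 5 7 11 13 17 19 |
awk 'BEGIN { prev[0] = 0 }
     NR < '$N' { prev[NR] = $1; sum += $1 }
     NR >= '$N' { psum += prev[NR-'$N']; delete prev[NR-'$N'];
                  prev[NR] = $1; sum += $1;
                  if (sum - psum > '$last') { exit }
                  printf "%d\n", sum - psum }'
```

This prints 10, 15, 23 and 31 (the sums 2+3+5, 3+5+7, 5+7+11 and 7+11+13) and then exits, because the next window sum, 41, exceeds the cutoff.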

The solution, if it existed in the given data set, was in the intersection of all these files. If the intersection contained several numbers, the smallest one that was itself a prime would be the answer.
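This intersection trick works because each sums file is sorted and duplicate-free (the sums are strictly increasing), so merging two sorted files with sort -nm and keeping only repeated lines with uniq -d yields exactly their common values, and the result can be piped straight into the next merge. A tiny sketch of the chain, using throwaway files with made-up numbers:

```shell
# Intersect three sorted, duplicate-free lists via merge sort + uniq -d.
# Creates and removes temporary files a.txt, b.txt, c.txt.
printf '%s\n' 2 5 8 11 > a.txt
printf '%s\n' 3 5 9 11 > b.txt
printf '%s\n' 5 7 11 13 > c.txt
sort -nm a.txt b.txt | uniq -d | sort -nm c.txt - | uniq -d
rm -f a.txt b.txt c.txt
```

This prints 5 and 11, the only values present in all three lists.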

$ sort -nm primes541.txt primes41.txt | uniq -d | sort -nm primes17.txt - | uniq -d | sort -nm primes7.txt - | uniq -d
7830239
$ grep -m1 7830239 primes.txt
7830239
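The grep above only shows that 7830239 appears in the downloaded list. As an independent primality check (an extra step, not part of my original session), GNU coreutils' factor can confirm it, since a prime is printed with itself as its only factor:

```shell
# factor prints "N: p1 p2 ..."; a prime has exactly one factor: itself.
factor 7830239
```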

And I found the solution! It was 7830239. I submitted the answer and after a few minutes it was confirmed to be correct.

Your question: [7, 17, 41, 541]
Your answer: 7830239
Time received: 2008-06-06 23:33:26.268414 UTC
Correct answer: 7830239
Your answer was: Correct

Awesome! Now tell me how you solved this problem.