prime number problem of google treasure hunt 2008I just found out about Google’s Treasure Hunt 2008. They say that “it’s a puzzle contest designed to test yer problem-solving skills in computer science, networking, and low-level UNIX trivia.”

Apparently I have missed the first three puzzles (first, second and third) but I’ll give the fourth puzzle a shot.

The fourth problem is about prime numbers and it’s formulated as following:

Find the smallest number that can be expressed as
the sum of 7 consecutive prime numbers,
the sum of 17 consecutive prime numbers,
the sum of 41 consecutive prime numbers,
the sum of 541 consecutive prime numbers,
and is itself a prime number.

For example, 41 is the smallest prime number that can be expressed as
the sum of 3 consecutive primes (11 + 13 + 17 = 41) and
the sum of 6 consecutive primes (2 + 3 + 5 + 7 + 11 + 13 = 41).

The Solution

I’ll write how I got the solution as I go, so I’ll mix the past and present tenses in this post. Sometimes I’ll write what I am going to do and sometimes I’ll write what I just did.

I have no desire to generate lists of prime numbers myself as it has been done infinitely many times already. I’ll just use a publicly available list of prime numbers! Here is a list of first fifty million primes.

I’ll use my Unix-fu to find the solution.

First I noticed that the primes are zipped and split into chunks of million primes per file. The file names are like “primes1.zip”, … “primes50.zip”.

A quick loop from 1 to 50 and wget gets all these files to my hard drive:

$ for i in $(seq 50); do wget "http://primes.utm.edu/lists/small/millions/primes$i.zip"; done

Next, I unzip all these files, and remove those zips to save space:

$ for i in $(seq 50); do unzip "primes$i.zip" && rm -f "primes$i.zip"; done

After doing that and looking at what I got, I realized that they were in some strange format, 8 primes per line, space padded and with some text on the first two lines. Here is an example how the first five lines look in primes1.txt file:

                 The First 1,000,000 Primes (from primes.utm.edu)

         2         3         5         7        11        13        17        19
        23        29        31        37        41        43        47        53
        59        61        67        71        73        79        83        89

I want all my primes to be in one file and one prime per line so I can extract N-th prime by looking at that line.
I used the following command to merge all the files into a single file:

for i in $(seq 50); do (awk 'BEGIN { OFS="\n" } NR > 2 {print $1,$2,$3,$4,$5,$6,$7,$8}' primes$i.txt >> primes.txt) && rm -f primes$i.txt; done

A quick verification that I did not lose any primes:

$ wc -l primes.txt
50000000 primes.txt

Now I’ll create four files which contain sums of 7, 17, 41 and 541 consecutive primes, not exceeding the biggest prime in primes.txt file. I did that with the following AWK one-liner:

$ last=$(tail -1 primes.txt)
$ for N in 7 17 41 541
  do
   awk 'BEGIN { prev[0] = 0 } NR < '$N' {prev[NR] = $1; sum += $1 } NR >= '$N' { psum += prev[NR-'$N']; delete prev[NR-'$N']; prev[NR] = $1; sum += $1; if (sum - psum > '$last') { exit } printf "%d\n", sum - psum }' primes.txt > primes$N.txt
  done

The command created primes7.txt, primes17.txt, primes 41.txt and primes541.txt files. These files contain sums of prime numbers and just some of them are primes.

The solution, if it exists in the given data set, is the intersect of all these files. If there are multiple items in the intersect, the smallest should be chosen and checked if it really was a prime.

$ sort -nm primes541.txt primes41.txt | uniq -d | sort -nm primes17.txt - | uniq -d | sort -nm primes7.txt - | uniq -d
7830239
$ grep -m1 7830239 primes.txt
7830239

We have found the solution! It’s 7830239!

I submitted the answer and after a few minutes it was confirmed to be correct! Awesome!

Your question: [7, 17, 41, 541]
Your answer: 7830239
Time received: 2008-06-06 23:33:26.268414 UTC

Correct answer: 7830239
Your answer was: Correct

Now leave a comment and tell me how you solved this problem!