Hacker's Approach June 06, 2008

# Solving Google Treasure Hunt Puzzle 4: Prime Numbers

I just found out about Google's Treasure Hunt 2008. They say that "it's a puzzle contest designed to test yer problem-solving skills in computer science, networking, and low-level UNIX trivia."

Apparently I have missed the first three puzzles (first, second and third) but I'll give the fourth puzzle a shot.

The fourth problem is about prime numbers and it's formulated as following:

Find the smallest number that can be expressed as
the sum of 7 consecutive prime numbers,
the sum of 17 consecutive prime numbers,
the sum of 41 consecutive prime numbers,
the sum of 541 consecutive prime numbers,
and is itself a prime number.

For example, 41 is the smallest prime number that can be expressed as
the sum of 3 consecutive primes (11 + 13 + 17 = 41) and
the sum of 6 consecutive primes (2 + 3 + 5 + 7 + 11 + 13 = 41).

## The Solution

I'll write how I got the solution as I go, so I'll mix the past and present tenses in this post. Sometimes I'll write what I am going to do and sometimes I'll write what I just did.

I have no desire to generate lists of prime numbers myself as it has been done infinitely many times already. I'll just use a publicly available list of prime numbers! Here is a list of first fifty million primes.

I'll use my Unix-fu to find the solution.

First I noticed that the primes are zipped and split into chunks of million primes per file. The file names are like "primes1.zip", ... "primes50.zip".

A quick loop from 1 to 50 and wget gets all these files to my hard drive:

```\$ for i in \$(seq 50); do wget "http://primes.utm.edu/lists/small/millions/primes\$i.zip"; done
```

Next, I unzip all these files, and remove those zips to save space:

```\$ for i in \$(seq 50); do unzip "primes\$i.zip" && rm -f "primes\$i.zip"; done
```

After doing that and looking at what I got, I realized that they were in some strange format, 8 primes per line, space padded and with some text on the first two lines. Here is an example how the first five lines look in primes1.txt file:

```                 The First 1,000,000 Primes (from primes.utm.edu)

2         3         5         7        11        13        17        19
23        29        31        37        41        43        47        53
59        61        67        71        73        79        83        89
```

I want all my primes to be in one file and one prime per line so I can extract N-th prime by looking at that line.
I used the following command to merge all the files into a single file:

```for i in \$(seq 50); do (awk 'BEGIN { OFS="\n" } NR > 2 {print \$1,\$2,\$3,\$4,\$5,\$6,\$7,\$8}' primes\$i.txt >> primes.txt) && rm -f primes\$i.txt; done
```

A quick verification that I did not lose any primes:

```\$ wc -l primes.txt
50000000 primes.txt
```

Now I'll create four files which contain sums of 7, 17, 41 and 541 consecutive primes, not exceeding the biggest prime in primes.txt file. I did that with the following AWK one-liner:

```\$ last=\$(tail -1 primes.txt)
\$ for N in 7 17 41 541
do
awk 'BEGIN { prev[0] = 0 } NR < '\$N' {prev[NR] = \$1; sum += \$1 } NR >= '\$N' { psum += prev[NR-'\$N']; delete prev[NR-'\$N']; prev[NR] = \$1; sum += \$1; if (sum - psum > '\$last') { exit } printf "%d\n", sum - psum }' primes.txt > primes\$N.txt
done
```

The command created primes7.txt, primes17.txt, primes 41.txt and primes541.txt files. These files contain sums of prime numbers and just some of them are primes.

The solution, if it exists in the given data set, is the intersect of all these files. If there are multiple items in the intersect, the smallest should be chosen and checked if it really was a prime.

```\$ sort -nm primes541.txt primes41.txt | uniq -d | sort -nm primes17.txt - | uniq -d | sort -nm primes7.txt - | uniq -d
7830239
\$ grep -m1 7830239 primes.txt
7830239
```

We have found the solution! It's 7830239!

I submitted the answer and after a few minutes it was confirmed to be correct! Awesome!

Your question: [7, 17, 41, 541]

Fuck, man, awesomely eclectic. I did mine in python and generated the primes (it was computationally less expensive for me to do that than to download them from the net). I then did the same algorithm in python. I'm confident it runs much faster than yours, but I'm also ashamedly confident that it took me a while longer to actually code it.

Good Lord you were lucky; I got 5, 25, 343, and 1017. Not sure what I was on, but I did binary lookups for the 1017s and the brute-forced the successes to check 5, 25, and 343. We can call it the Crack Algorithm&tm;, respecting what I must've been on to ignore all the possible optimizations.

Still, you get 41 and 541, and I get 343 and 1017. Sometimes life's just no fair.

nice ..i also though of doing in shell script first but then it became messy. Then i did it in c, though procedure is same.

I wish i know how to use shell in that way!

:D

I did my solution in F#. but im pretty sure it was pure brute force.

I did a simple python script that used binary search to find the wanted sums, I'm pretty sure it's faster.

Jeez, that must have taken a long time. I decided to do it with a sample of about 5 million or so primes and that took a long time.

Also, btw, why do you need to keep the psum? As far as I can see you are outputing the sum up to that point minus the sum up the the point i primes before that. Just keep the sum minus the prime number i primes ago.

sum += \$1 - prev[NR-'\$i']; prev[NR] = \$1; delete prev[NR-'\$i']; print sum

also, why do you quit when the sum is greater than the last prime? Wouldn't that skip the prime numbers at the end of the list? In your case it's ok since the answer comes out WAY before the end of the list, but from the looks of it it would miss the solution if it was at the end of the primes.txt file.

Ian, true, I could have summed it that way.

I quit when the sum is greater than the last prime because I wouldn't be able to verify if the sum was a prime (I don't want to write any verification algorithms myself. I just want to use that primes.txt list).

And you are right about primes at the end of the list, this wouldn't find it. You'd need a bigger prime list 450 million primes.

I did it in perl - and not terribly efficiently. Generated the primes up to 30,000,000 because that's about all that the simple sieve code I found would do in the 4Gb of memory I had available, figured it was probably enough.

And keeping ALL the lists in memory as both arrays and hashes was actually not too bad - wrote it in about 10 minutes and it took another 5 to run. Woot.

My first-pass recursive attempt at the robots thing was totally intractable, and the second time when I forgot to "use bigint" I actually submitted the wrong number through doing insufficient checking (oh the embarassment) - but once you figure that: value(N, M) == value(N - 1, M) + value (N, M - 1) and value (1, x) == value (y, 1) == 1, then it's pretty trivial to just build an N x M array and calculate the values.

It is nice having a multi-core modern machine with 4Gb of ram for the prime problem though - it makes brute force much cheaper, and coming up with a good algorithm for a once-off computation rather worthless.

Looks like a nice showcase for Python's stream-processing approach via generators and other iterators/itertools/etc. Here's the relevant snippet from the itoo.py in my personal toolbox:

```def sieve():
"""Sieve of Eratosthenes, yield each prime in sequence."""
yield 2
D = {}
q = 3
while True:
p = D.pop(q, 0)
if p:
x = q + p
while x in D: x += p
D[x] = p
else:
yield q
D[q*q] = 2*q
q += 2

def running_sum(it, k):
"""Yield sums of the latest k items from it"""
n = it.next
d = collections.deque(n() for i in range(k))
s = sum(d)
while True:
yield s
s -= d.popleft()
d.append(n())
s += d[-1]

def inters(*its, **k):
"""Yield items that are in ALL of sorted iterators *its"""
heap = []
for it in its:
try: heap.append((it.next(), it.next))
except StopIteration: return
heapq.heapify(heap)
mx = max(h[0] for h in heap)
while True:
v, n = heapq.heappop(heap)
if v==mx: yield v
try: v = n()
except StopIteration: return
if v>mx: mx = v
heapq.heappush(heap, (v, n))
```

[of course there are several other parts of itoo.py but I've picked only the ones used in the following]. Now, using these tools I keep lying around, I wrote uit.py:

```import itertools
import itoo

def prunsums(*totals):
ps = itertools.tee(itoo.sieve(), len(totals)+1)
sums = [itoo.running_sum(p, t) for p, t in zip(ps, totals)]
return itoo.inters(ps[-1], *sums)

def main():
gen = prunsums(3, 6)
print gen.next()
gen = prunsums(7, 17, 41, 541)
print gen.next()

main()

and:

alex-martellis-macbook-air:~ alex\$ time python uit.py
41
7830239

real	0m15.552s
user	0m14.541s
sys	0m0.440s
```

Of course, it IS only a Macbook Air, so a faster machine would get better times than 15 seconds!-) But, 15 seconds IS about the additional time it took to write uit.py, and I'd say that's about par for the course.

Alex

Thanks for your excellent comment, Alex! Great usage of Python.

with windows platform? can u tell me how it could be done without shell

That was really great unix-fu. I used it solve the 4th puzzle, thanks.

One suggestion though - I don't think you need to download all the fifty million prime numbers. I used your code to download just the first zip file, and it worked. My maximum number in the conditionals was 707.

Anyway, thanks once again!

llcoolj, you'd need CygWin. It's a linux environment for Windows. Once you install it, you'll be able to use all these commands from cygwin shell.

I used more or less the same approach as you to sum the primes, up to using awk and "delete a[NR-41]". :-) I used "tr -s" instead to split the files to one prime per line. And I used "fgrep -xf" to compute the intersections of the files, which is probably slower than sorting and using uniq.

My solutions:
http://leonardo-m.livejournal.com/63888.html
http://leonardo-m.livejournal.com/64165.html
The faster Python version requires 1 second, the faster D version runs in 0.07 seconds or so.

actually my summing code is the same as Ian's.

I used Python to solve this:
http://kourge.net/node/116

The need to write code for doing list intersections was a bit ugly; if I were using Ruby it wouldn't have needed an "intersection" function because consecutives.inject(:"&") would have done the job, although I'm not sure if Ruby's built-in & method is efficient enough.

My implementation runs about 7 seconds for five conditions. ([3, 5, 143, 865, 1089] => [5990183])

Nice work! I solved this question using similar methods of scripting to you, but utilizing MySQL when performing the sums. Created a 2 column table where I was running the query:

```SELECT p FROM primes WHERE p=(SELECT sum(p) from primes WHERE n>=\$i AND n

where \$i is the index of the prime (1 being 2, 2 being 3, 3 being 5, 4 being 7 etc) and \$a being the number of consecutive primes to sum.

Here's some MySQL:

<pre>mysql> describe primes;
+-------+------------------+------+-----+---------+----------------+
| Field | Type             | Null | Key | Default | Extra          |
+-------+------------------+------+-----+---------+----------------+
| n     | int(10) unsigned | NO   | PRI | NULL    | auto_increment |
| p     | int(10) unsigned | NO   | MUL |         |                |
+-------+------------------+------+-----+---------+----------------+
2 rows in set (0.00 sec)

mysql> select * from primes limit 10;
+----+----+
| n  | p  |
+----+----+
|  1 |  2 |
|  2 |  3 |
|  3 |  5 |
|  4 |  7 |
|  5 | 11 |
|  6 | 13 |
|  7 | 17 |
|  8 | 19 |
|  9 | 23 |
| 10 | 29 |
+----+----+
10 rows in set (0.02 sec)```

Had to create an index on column 'p' to speed up the summation of the primes (very slow otherwise!). In the end, once the actual table had been created and indexed by MySQL, the solution popped out in a few more minutes of PHP and BASH script.

Ended up with 4 text files, containing all possible potential numbers. Then used a final bit of scripting piped through ‘less’ to manually locate the final answer:

```for i in `awk '{print \$1}' 1151.txt`; do grep "^\$i\$" 1151.txt 775.txt 11.txt 5.txt; echo; done | less

…

1151.txt:8256109

1151.txt:8327513

1151.txt:8368331

1151.txt:8429539
775.txt:8429539
11.txt:8429539
5.txt:8429539

1151.txt:8480599

1151.txt:8501039

1151.txt:8582831

…```

And there’s the answer: 8429539 (for problem 5, 11, 775 and 1151).

And this was all done on a small ITX Linux box with 384Mb of RAM running a VIA Nehemiah 1Ghz cpu. Who needs fast processors and gigs of RAM to solve this!! :-)

Great! Thanks for the detailed explanantion. However, I am not familiar with awk, so will appreciate, if you can explain the following:

1. awk 'BEGIN { OFS="\n" } NR > 2 {print \$1,\$2,\$3,\$4,\$5,\$6,\$7,\$8}'

2. awk 'BEGIN { prev[0] = 0 } NR = '\$N' { psum += prev[NR-'\$N'];

Beautiful approach towards solving the problem. I solved the problem using C. Apart from the language advantage, I think my idea is also a little different.

Guys, you are talking about brute force approach.
I mean these scripts are okay, but google could give us a different task to sum REALLY BIG primes?

Without math or a brilliant idea, its just a practice of typing code in your favorite programming language, isn't it?

I think about math approach because number of primes in each sum is usually also prime. That could be a google hint to us...

Runnig: I could not agree more with you. However, Quote -->"number of primes in each sum is usually also prime. That could be a google hint to us ".
Well I am not sure about this since I got 2 of my numbers as non prime.( and same seems to be the case with "K" above who got 25 and 343.

But hell you are right(about the (brilliant idea || math)-whatever you may want to call it). There is more information in the question, than any of the provided solutions here, have explioted.

Has anyone used the fact that the solution sets needs to be of necessarily continious prime numbers.

There is more there than meets the eye !

Great! Very inventive approach, imho. I am not fluent in shell-scripting, so for me your article was some kind of demonstration of its power if used sensibly.

I think I found a nice solution: quick and with a low memory footprint

http://mauro.netsons.org/th08.py

Okay. Runnig, as been already siand is undoubtly right. My first thought here was "such a hot lad," second was chaotic and worthless, but begun vomiting keywords such as: erdos, partition function, generatingfunctionology, ergodic theory. Goolge has to generate questions that have answers and all your methods are inefficient. Neverthell the point of this puzzle surely isn't mathematics, its too hard for undergraduate, too easy or pointless for a phd (too lazy to look up references). Therefore it isn't also a programmer's puzzle, because without aforementioned mathematics cannot be solved memory-efficiently. Peter got it right showing ideal qualifications for a Site Reliability Engineer (uber sysop) at Google. Hot lad.

I think its perfect. But my opinion is still you need to think on your comment.

I've written mine in D using atkin sieve and it goes about 1s :)

http://dsource.org/projects/tango/wiki/BitArrayVsArrayOfBoolsExample

But solution of a friend of mine beat me to the ground even more than yours.
He wrote it in Haskell in 12 line (including coments and blank lines!) which after some shrinking (but still readable though) went down to 4 iirc :) I'm not sure weather his solution is available online.

Hi,
I solved this fourth problem of Google Treasure Hunt 2008 in very easy way. I just write a C program to give the required output. By this program you can solve for the numbers upto 2^31.(upto any 10 digit number). I submited answers for so many problems of this kind. I got the confirmation for the first problem that I solved.

Find the smallest number that can be expressed as
the sum of 15 consecutive prime numbers,
the sum of 37 consecutive prime numbers,
the sum of 369 consecutive prime numbers,
the sum of 1015 consecutive prime numbers,
and is itself a prime number.

I am a good boy and your friend.

Well, I learned about the treasurehunt too late, but I had fun solving these puzzles.

I have bruteforced prime numbers puzzle by generating prime numbers and their sums into the file, and then just grep-ped numbers :)
I did this in Perl first, but I had not enough patience to wait for it, so I've rewritten it in C.
It took about a half an hour to me to solve this :)

Plz help me in doing this problem.
the problem is
n^2-n+41 represent the prime number where n belongs to Natural number.which option is correct for this statement.
(a) n

I got the same answer! But my way, I think was way easier. I just went to the bottom of your post and looked at the answer.

These puzzles remind me of Jon Bentley's "Programming Pearls" columns/books.