prime number problem of google treasure hunt 2008I just found out about Google's Treasure Hunt 2008. They say that "it's a puzzle contest designed to test yer problem-solving skills in computer science, networking, and low-level UNIX trivia."

Apparently I have missed the first three puzzles (first, second and third) but I'll give the fourth puzzle a shot.

The fourth problem is about prime numbers and it's formulated as following:

Find the smallest number that can be expressed as
the sum of 7 consecutive prime numbers,
the sum of 17 consecutive prime numbers,
the sum of 41 consecutive prime numbers,
the sum of 541 consecutive prime numbers,
and is itself a prime number.

For example, 41 is the smallest prime number that can be expressed as
the sum of 3 consecutive primes (11 + 13 + 17 = 41) and
the sum of 6 consecutive primes (2 + 3 + 5 + 7 + 11 + 13 = 41).

The Solution

I'll write how I got the solution as I go, so I'll mix the past and present tenses in this post. Sometimes I'll write what I am going to do and sometimes I'll write what I just did.

I have no desire to generate lists of prime numbers myself as it has been done infinitely many times already. I'll just use a publicly available list of prime numbers! Here is a list of first fifty million primes.

I'll use my Unix-fu to find the solution.

First I noticed that the primes are zipped and split into chunks of million primes per file. The file names are like "primes1.zip", ... "primes50.zip".

A quick loop from 1 to 50 and wget gets all these files to my hard drive:

$ for i in $(seq 50); do wget "http://primes.utm.edu/lists/small/millions/primes$i.zip"; done

Next, I unzip all these files, and remove those zips to save space:

$ for i in $(seq 50); do unzip "primes$i.zip" && rm -f "primes$i.zip"; done

After doing that and looking at what I got, I realized that they were in some strange format, 8 primes per line, space padded and with some text on the first two lines. Here is an example how the first five lines look in primes1.txt file:

                 The First 1,000,000 Primes (from primes.utm.edu)

         2         3         5         7        11        13        17        19
        23        29        31        37        41        43        47        53
        59        61        67        71        73        79        83        89

I want all my primes to be in one file and one prime per line so I can extract N-th prime by looking at that line.
I used the following command to merge all the files into a single file:

for i in $(seq 50); do (awk 'BEGIN { OFS="\n" } NR > 2 {print $1,$2,$3,$4,$5,$6,$7,$8}' primes$i.txt >> primes.txt) && rm -f primes$i.txt; done

A quick verification that I did not lose any primes:

$ wc -l primes.txt
50000000 primes.txt

Now I'll create four files which contain sums of 7, 17, 41 and 541 consecutive primes, not exceeding the biggest prime in primes.txt file. I did that with the following AWK one-liner:

$ last=$(tail -1 primes.txt)
$ for N in 7 17 41 541
  do
   awk 'BEGIN { prev[0] = 0 } NR < '$N' {prev[NR] = $1; sum += $1 } NR >= '$N' { psum += prev[NR-'$N']; delete prev[NR-'$N']; prev[NR] = $1; sum += $1; if (sum - psum > '$last') { exit } printf "%d\n", sum - psum }' primes.txt > primes$N.txt
  done

The command created primes7.txt, primes17.txt, primes 41.txt and primes541.txt files. These files contain sums of prime numbers and just some of them are primes.

The solution, if it exists in the given data set, is the intersect of all these files. If there are multiple items in the intersect, the smallest should be chosen and checked if it really was a prime.

$ sort -nm primes541.txt primes41.txt | uniq -d | sort -nm primes17.txt - | uniq -d | sort -nm primes7.txt - | uniq -d
7830239
$ grep -m1 7830239 primes.txt
7830239

We have found the solution! It's 7830239!

I submitted the answer and after a few minutes it was confirmed to be correct! Awesome!

Your question: [7, 17, 41, 541]
Your answer: 7830239
Time received: 2008-06-06 23:33:26.268414 UTC

Correct answer: 7830239
Your answer was: Correct

Now leave a comment and tell me how you solved this problem!

This article is part of the article series "Musical Geek Friday."
<- previous article next article ->

the bill gates song, photo from microsoft 1983This week on Musical Geek Friday - The Bill Gates Song!

I found this song while watching a TED Talk on good vs. bad software interfaces called "When It Comes to Tech, Simplicity Sells" (scroll to the bottom to view this talk). To funny things up the speaker, David Pogue, bursted into a few songs, one of them being a song about Bill Gates.

David Pogue is the personal-technology columnist for the New York Times. Each week, he contributes a print column, an online column and an online video. His daily blog, "Pogue's Posts," is the Times's most popular blog.

So, here it is! The Bill Gates Song:

[audio:http://www.catonmat.net/download/david_pogue-bill_gates_song.mp3]

Download this song: the bill gates song.mp3 (musical geek friday #8)
Downloaded: 93651 times

Download lyrics: the bill gates song lyrics (musical geek friday #8)
Downloaded: 3202 times

Here is the lyrics of The Bill Gates Song:

I've been a geek forever,
And I wrote the very first DOS.
I put my software and IBM together,
I got profit, and they got the loss!
I write the code that makes the whole world run,
I'm gettin' royalties from everyone.
Sometimes it's garbage, but the press is snowed,
You buy the box, I sell the code.

Every software company
Is doing Microsoft's R&D.
You can't keep a good idea down these days.
Even Windows is a hack
We kinda based loosely on the Mac.
So it's big, so it's slow,
You've got nowhere to go!
I'm not doing it for praise!

I write the code that fits the world today,
Big mediocrity in every way.
We've entered planet domination mode,
You'll have no choice, you'll buy my code.
I am Bill Gates, and I write the code!

Here is the TED Talk with the song. The talk includes another two songs about Steve Jobs and technical support nightmare.

Direct link: http://www.ted.com/index.php/talks/view/id/7

Download "The Bill Gates Song"

Download this song: the bill gates song.mp3 (musical geek friday #8)
Downloaded: 93651 times

Download lyrics: the bill gates song lyrics (musical geek friday #8)
Downloaded: 3202

Click to listen:
[audio:http://www.catonmat.net/download/david_pogue-bill_gates_song.mp3]

Have fun and until next geeky Friday! :)

This article is part of the article series "Musical Geek Friday."
<- previous article next article ->

just one more hack geek songThis week on Musical Geek Friday the only song ever about a debugger - Just One More Hack and I'll Put It on the Net!

This song is written and performed by Mark Wheadon.

He says that once, when he worked for The Kent Koftware Tools Group, he had a colleague Mark Russell who was working on a debugger. This guy had been working on this debugger for quite some time, but he'd never got around to releasing it. There was always a new feature, some new functionality that absolutely had to be added before it could be released to the world. After a while this started getting silly, and so for a laugh Mr. Wheadon wrote a song entitled "Just One More Hack and I'll Put It on the Net" to see if he could chivvy him along.

Here it is! The Just One More Hack geek song (also know as "the ups song"):

[audio:http://www.catonmat.net/download/mark_wheadon-just_one_more_hack.mp3]

Download this song: just one more hack.mp3 (musical geek friday #7)
Downloaded: 10499 times

Download lyrics: just one more hack lyrics (musical geek friday #7)
Downloaded: 3169 times

Here is the lyrics of the Just One More Hack song:

Just one more hack and then I'll put it on the net,
Just one more hack and then I'll put it on the net,
...

Well, I've written a debugger and it suits me just fine
it'll chase away your problems, turn your water into wine
it's got so many features, in fact it's bloody clever
if it can't solve your problem then your problem probably never can be solved
so you might as well pack it on in,
coz it's the best debugger that there's ever been.

Just one more hack and then I'll put it on the net,
Just one more hack and then I'll put it on the net,
...

It's got everything you wanted, everything you desire
it'll handle fancy structures, set your soul on fire
it'll indirect through pointers, and catch a falling star
and if you ask it nicely it'll pop off to the bar and tell your friends
how to solve the problems they're in,
coz it's the best debugger that there's ever been.

Just one more hack and then I'll put it on the net,
Just one more hack and then I'll put it on the net,
...

So if you've got a nasty problem and your data structure's bent
and your pointer's in a tangle with your structure elements
if you're losing all your memory coz your allocator leaks
and your girl's getting nasty coz she's not seen you for weeks
then stoke up Mark's debugger and you know it'll win,
coz it's the best debugger that there's ever been....

Just one more hack and then I'll put it on the net,
Just one more hack and then I'll put it on the net,
...

Download "Just One More Hack" Song

Download this song: just one more hack.mp3 (musical geek friday #7)
Downloaded: 10499 times

Download lyrics: just one more hack lyrics (musical geek friday #7)
Downloaded: 3169

Click to listen:
[audio:http://www.catonmat.net/download/mark_wheadon-just_one_more_hack.mp3]

Have fun and until next geeky Friday! :)

From NAND to Tetris in 12 StepsI watched another really great video lecture. It's called 'From Nand to Tetris - Building a Modern Computer From First Principles'.

At first I thought it was just a video lecture by some computer hobbyist, but it turned out to be a lecture about a new computer science course for both undergraduate and graduate students, developed by professor Shimon Schocken. The goal of this course is to teach students various levels of abstractions in computing by letting them build a computer from scratch and have it run a computer game.

One of Shimon Schocken's colleagues says that, "Computer science is a thousand layers of abstraction", which is quite true. If we write a program in a high level language, it gets translated into assembly language which in turn gets translated into machine code, which in turn gets executed on hardware platform, which in turn uses logic gates to perform computation which is itself electrical engineering and physics.

This video lecture briefly explains all of these abstractions (except physics).

Before you watch the lecture (it's 1 hour long), you might first want to view a 10 minute introduction, to see if you're interested in this topic:

And here is the whole 1 hour lecture:

Moments from the lecture:

  • [06:55] Course is a mishmash from all computer science topics - hardware, architecture, data structures and algorithms, programming languages, compilers and software engineering.
  • [07:46] Students learn both software and hardware.
  • [10:05] Hardware part uses HDL (Hardware Description Language), a special HDL testing language and a hardware simulator.
  • [10:58] Building elementary logic gates in HDL.
  • [15:30] 16 bit adder, full adder, half adder.
  • [19:32] Building combinatorial gates (ALU).
  • [22:58] Building sequential chips (memory).
  • [23:40] Implementing machine language.
  • [26:30] Building the CPU.
  • [27:48] Building a Harvard architecture computer.
  • [29:58] Input/Output devices. Screen and keyboard memory maps.
  • [30:28] CPU Emulator demo.
  • [33:55] Summary of 30 different chips built in HDL.
  • [34:15] Example of ALU HDL project given to students, including HDL program, test file and a compare file
  • [35:32] Software part starts with writing an assembler.
  • [37:24] Implementing a VM (Virtual Machine).
  • [39:31] VM language.
  • [40:33] VM Emulator demo.
  • [42:30] Writing a compiler for "Jack" programming language.
  • [43:06] Demo of a Space Invaders game, written in Jack.
  • [47:02] Jack code gets parsed into XML and later translated into assembly code.
  • [51:33] Building the OS (Operating System).
  • [53:43] Summary of both hardware and software parts.
  • [57:45] Q and A!

Great news about the course this lecture talks about is that you can actually take it on your own, if you buy Mr. Shimon's book The Elements of Computing Systems. The book covers all the 12 steps in building the computer.

Shimon makes the necessary software, such as HDL, the testing suite and other programs available on book's website!

The course goes under a few different titles, such as 'Digital System Construction', 'Elements of Computing Systems', and the same 'From Nand to Tetris', depending on university.

Apart from that, Professor Shimon Schocken mentioned a few famous computer scientists during his talk. I got interested, and wrote them down for your convenience:

Have fun getting smarter with each lecture, until next time!

theorizing from data video talk by peter norvigHere is a video lecture by Google's Director of Research - Peter Norvig. The full title of this lecture is "Theorizing from Data: Avoiding the Capital Mistake".

In 1891 Sir Arthur Conan Doyle said that "it is a capital mistake to theorize before one has data." These words still remain true today.

In this talk Peter gives insight into what large amounts of data can do for problems in language understanding, translation and information extraction. The talk is accompanied with a bunch of examples from various Google services.

Moments from the lecture:

  • [00:35] Peter Norvig came to Google from NASA in 2001 because that's where the data was.
  • [01:30] Peter says that the way to make progress in AI (Artificial Intelligence) is to have more data. If you don't have data you won't make progress just with fancy algorithms.
  • [04:40] In 2001 a meta study of several different algorithms for disambiguating words in sentences showed that the worst algorithms performed better than the best algorithms if they were trained with a larger word database. Link to original meta study paper: Scaling to Very Very Large Corpora for Natural Language Disambiguation
  • [06:30] It took at least 30 years to go from a linguistic text collection of 1 million words (10^6 words, Brown Corpus) to what we now have on Internet (around 100 trillion words (10^14 words)).
  • [06:55] Google harvested one billion words (10^12) from the net, counted them up and published them to Linguistics Data Consortium. Announcement here, you can buy 6 DVDs of the words here (the price is $150).
  • [10:00] Example: Google Sets was the first experiment done using large amounts of data. It's a clustering algorithm which returns a group of similar words. Try "dog and cat" and then "more and cat" :)
  • [11:55] Example: Google Trends shows popularity of a search terms based on data collected over time of searches performed by users.
  • [13:15] Example: Query refinement suggestions.
  • [13:40] Example: Question answering.
  • [15:30] Principles of machine reading - concepts, relational templates, patterns.
  • [16:32] Example of learning relations and patterns with machine reading.
  • [18:40] Learning classes and attributes (for example, computer games and their manufacturers).
  • [21:18] Statistical Machine Translation (See Google Language Tools).
  • [24:25] Example of Chinese to English machine translation.
  • [26:27] Main components of machine translation are Translation Model, Language Model and Decoding Algorithm.
  • [29:35] More data helps!
  • [29:45] Problem: How many bits to use to store probabilities?
  • [31:10] Problem: How to reduce space used for storing words from training data during translation process?
  • [35:25] Three turning points in the history of development of information.
  • [37:00] Q and A!

There were some interesting questions in Q and A session:

  • [37:15] Have you applied any of the theories used in stock markets to language processing?
  • [38:08] Are you working on any tools to assist writers?
  • [39:50] How far you off from automated translation without disfluencies?
  • [41:58] 1) Is GOOG-411 service actually used to gather a huge corpus of spoken data. 2) Are there any advances on other data than text?
  • [43:50] Would the techniques you described in your talk work in speech-to-text processing?
  • [44:50] Will there be any services for fighting comment and form spam?
  • [46:00] Do you also take information like what links do users click into account when displaying search results?
  • [47:22] How do you measure difference between someone finding something, and someone being satisfied what they found?
  • [49:23] When doing machine translation, how can you tell that you're not learning from a website which was already translated with another machine translation service?
  • [50:49] How do you take into account that one uses slang, the other does not, and does it affect your translation tools?
  • [51:40] Can you speak a little about methods in OCR (Optical Character Recognition)?

The question at 44:50 got me very interested. The person asked if Google was going to offer any services for fighting spam. Peter said that it was an interesting idea, but it was better to ask Matt Cutts.

Having a hacker's mindset, I started thinking, what if someone emailed their comments through Gmail? If the comment was spam, Gmail's spam system would detect it and label the message as being spam. Otherwise the message would end up in Inbox folder. All the messages in Inbox folder could then be posted back to the website as good comments. If there were false positives, you could go through the spam folder and move the non-spam messages back to Inbox. What do you think?

Have fun!