Here is a video lecture by Google's Director of Research, Peter Norvig. The full title of this lecture is "Theorizing from Data: Avoiding the Capital Mistake".

In 1891 Sir Arthur Conan Doyle said that "it is a capital mistake to theorize before one has data." These words still remain true today.

In this talk Peter gives insight into what large amounts of data can do for problems in language understanding, translation and information extraction. The talk is accompanied by a bunch of examples from various Google services.

Moments from the lecture:

  • [00:35] Peter Norvig came to Google from NASA in 2001 because that's where the data was.
  • [01:30] Peter says that the way to make progress in AI (Artificial Intelligence) is to have more data. If you don't have data you won't make progress just with fancy algorithms.
  • [04:40] In 2001 a meta study of several different algorithms for disambiguating words in sentences showed that the worst algorithms performed better than the best algorithms if they were trained with a larger word database. Link to original meta study paper: Scaling to Very Very Large Corpora for Natural Language Disambiguation
  • [06:30] It took at least 30 years to go from a linguistic text collection of 1 million words (10^6 words, Brown Corpus) to what we now have on the Internet (around 100 trillion words, 10^14 words).
  • [06:55] Google harvested one trillion words (10^12) from the net, counted them up and published them through the Linguistic Data Consortium. Announcement here, you can buy 6 DVDs of the words here (the price is $150).
  • [10:00] Example: Google Sets was the first experiment done using large amounts of data. It's a clustering algorithm which returns a group of similar words. Try "dog and cat" and then "more and cat" :)
  • [11:55] Example: Google Trends shows the popularity of search terms based on data collected over time from searches performed by users.
  • [13:15] Example: Query refinement suggestions.
  • [13:40] Example: Question answering.
  • [15:30] Principles of machine reading - concepts, relational templates, patterns.
  • [16:32] Example of learning relations and patterns with machine reading.
  • [18:40] Learning classes and attributes (for example, computer games and their manufacturers).
  • [21:18] Statistical Machine Translation (See Google Language Tools).
  • [24:25] Example of Chinese to English machine translation.
  • [26:27] Main components of machine translation are Translation Model, Language Model and Decoding Algorithm.
  • [29:35] More data helps!
  • [29:45] Problem: How many bits to use to store probabilities?
  • [31:10] Problem: How to reduce space used for storing words from training data during translation process?
  • [35:25] Three turning points in the history of development of information.
  • [37:00] Q and A!
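
The counting step mentioned at [06:55] can be illustrated on a toy scale. This is only a minimal sketch and not Google's actual pipeline: the `count_ngrams` helper and the sample sentence are made up for illustration, and a real corpus would need tokenization far more careful than splitting on whitespace.

```python
from collections import Counter


def count_ngrams(text, n):
    """Count every run of n consecutive words in text (hypothetical helper)."""
    words = text.lower().split()
    # Slide a window of length n over the word list.
    grams = zip(*(words[i:] for i in range(n)))
    return Counter(grams)


# Made-up sample text, standing in for a web-scale corpus.
sample = "the cat sat on the mat and the cat slept"
unigrams = count_ngrams(sample, 1)
bigrams = count_ngrams(sample, 2)

print(unigrams[("the",)])       # how often "the" occurs
print(bigrams[("the", "cat")])  # how often "the cat" occurs
```

Counts like these are the raw material for a language model: the more text is counted, the more word sequences get a reliable frequency.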

There were some interesting questions in the Q and A session:

  • [37:15] Have you applied any of the theories used in stock markets to language processing?
  • [38:08] Are you working on any tools to assist writers?
  • [39:50] How far off are you from automated translation without disfluencies?
  • [41:58] 1) Is the GOOG-411 service actually used to gather a huge corpus of spoken data? 2) Are there any advances on data other than text?
  • [43:50] Would the techniques you described in your talk work in speech-to-text processing?
  • [44:50] Will there be any services for fighting comment and form spam?
  • [46:00] Do you also take information like which links users click into account when displaying search results?
  • [47:22] How do you measure the difference between someone finding something, and someone being satisfied with what they found?
  • [49:23] When doing machine translation, how can you tell that you're not learning from a website which was already translated with another machine translation service?
  • [50:49] How do you take into account that one uses slang, the other does not, and does it affect your translation tools?
  • [51:40] Can you speak a little about methods in OCR (Optical Character Recognition)?

The question at 44:50 got me very interested. The person asked if Google was going to offer any services for fighting spam. Peter said that it was an interesting idea, but it was better to ask Matt Cutts.

Having a hacker's mindset, I started thinking: what if someone emailed their comments through Gmail? If a comment was spam, Gmail's spam system would detect it and label the message as spam. Otherwise the message would end up in the Inbox folder. All the messages in the Inbox folder could then be posted back to the website as good comments. If there were false positives, you could go through the spam folder and move the non-spam messages back to the Inbox. What do you think?

Have fun!

This article is part of the article series "Musical Geek Friday."

This week on Musical Geek Friday - a song about The Day the Routers Died!

This song was written and performed live at the 55th RIPE Meeting by Gary Feldman (scroll down for a video of him performing it live).

RIPE (Réseaux IP Européens) is a collaborative forum open to all parties interested in wide area IP networks in Europe and beyond. The objective of RIPE is to ensure the administrative and technical coordination necessary to enable the operation of a pan-European IP network.

A RIPE Meeting is a five-day event where Internet service providers, network operators and other interested parties from Europe and the surrounding regions gather. RIPE Meetings are open to everyone and provide an excellent opportunity for attendees to discuss Internet policy.

The song is about the problem that the current IPv4 address space is running out of IP addresses (read about the problem here). The IPv4 address space can hold just 4'294'967'296 (2^32) addresses. The song suggests that we all move to the IPv6 address space, which can hold 2^128 addresses. That many addresses will never run out.
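
To get a feeling for the difference between the two address spaces, here is a small sketch that computes both sizes with Python's standard `ipaddress` module. The variable names are my own. The only facts it relies on are that IPv4 addresses are 32 bits wide and IPv6 addresses are 128 bits wide.

```python
import ipaddress

# The whole IPv4 and IPv6 address spaces, written as networks
# with a zero-length prefix (every address matches).
ipv4_space = ipaddress.ip_network("0.0.0.0/0")
ipv6_space = ipaddress.ip_network("::/0")

ipv4_total = ipv4_space.num_addresses  # 2 to the power 32
ipv6_total = ipv6_space.num_addresses  # 2 to the power 128

print(f"IPv4: {ipv4_total:,}")
print(f"IPv6: {ipv6_total:,}")
print(f"IPv6 is {ipv6_total // ipv4_total:,} times larger")
```

The last line divides one total by the other, so the ratio is 2 to the power 96: every single IPv4 address could be replaced by an address space that is itself vastly larger than today's whole Internet.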

Here it is! The Day the Routers Died geek song:

[audio:http://www.catonmat.net/download/gary_feldman-the_day_the_routers_died.mp3]

Download this song: the day the routers died.mp3 (musical geek friday #6)
Downloaded: 12707 times

Download lyrics: the day the routers lyrics (musical geek friday #6)
Downloaded: 3578 times

Here are the lyrics of The Day the Routers Died song:

a long long time ago
i can still remember
when my laptop could connect elsewhere

and i tell you all there was a day
the network card i threw away
had a purpose - and it worked for you and me...

but 18 years completely wasted
with each address we've aggregated
the tables overflowing
the traffic just stopped flowing...

and now we're bearing all the scars
and all my traceroutes showing stars...
the packets would travel faster in cars...
the day... the routers died...

Chorus (ALL!!!!!)

so bye bye, folks at RIPE 55
be persuaded to upgrade it or your network will die
IPv6 just makes me let out a sigh
but I s'pose we'd better give it a try
I suppose we'd better give it a try

now did you write an RFC
that dictated how we all should be
did we listen like we should that day

now you back at RIPE fifty-four
where we heard the same things months before
and the people knew they'd have to change their ways...

and we - knew that all the ISPs
could be - future proof for centuries

but that was then not now
spent too much time playing WoW

ooh there was time we sat on IRC
making jokes on how this day would be
now there's no more use for TCP
the day the routers died...

Chorus (chime in now)

so bye bye, folks at RIPE 55
be persuaded to upgrade it or your network will die
IPv6 just makes me let out a sigh
but I s'pose we'd better give it a try
I suppose we'd better give it a try

I remember those old days I mourn
sitting in my room, downloading porn
yeah that's how it used to be...

when the packets flowed from A to B
via routers that could talk IP
There was data... that could be exchanged between you and me...

oh but - I could see you all ignore
the fact - we'd fill up IPv4

but we all lost the nerve
and we got what we deserved!

and while... we threw our network kit away
and wished we'd heard the things they say
put all our lives in disarray

the day... the routers died...

Chorus (those silent will be shot)

so bye bye, folks at RIPE 55
be persuaded to upgrade it or your network will die
IPv6 just makes me let out a sigh
but I s'pose we'd better give it a try
I suppose we'd better give it a try

saw a man with whom I used to peer
asked him to rescue my career
he just sighed and turned away...

I went down to the net cafe
that I used to visit everyday
but the man there said I might as well just leave...

and now we've all lost our purpose..
my cisco shares completely worthless...

no future meetings for me
at the Hotel Krasnapolsky

and the men that make us push and push
like Geoff Huston and Randy Bush
should've listened to what they told us...
The day... the routers... died

Chorus (time to lose your voice)

bye bye, folks at RIPE 55
be persuaded to upgrade it or your network will die
IPv6 just makes me let out a sigh
but I spose we'd better give it a try
I suppose we'd better give it a try

I also found a video of Gary performing the song live at RIPE 55:

P.S. The Internet died once in 1997.

Have fun and until next geeky Friday! :)

Here is something for all you hackers out there reading my blog: all the videos from last year's biggest and greatest hacker conference -- DefCon 15!

I found these videos via this post on Roy/SAC's blog. He bought a full set of DVDs for several hundred dollars and uploaded them to Google Video! I sincerely appreciate his effort!

Total of more than 200 videos!

For your convenience, here is the full DefCon 15 session listing:
Download Full DefCon 15 Session Listing (.pdf).

You're welcome to comment here on lectures you found intriguing and liked the most!

Have fun!

This article is part of the article series "Musical Geek Friday."

This week on Musical Geek Friday - the God Wrote in Lisp song (also known as The Eternal Flame)!

This song was written by Bob Kanefsky and is performed by Julia Ecklar. It's a parody of another song of hers - "God Lives on Terra".

The song is about the question of which programming language God could have used to create us.

God had a tight six-day deadline to create the world, so he had to make a smart decision about which language to use. Some folks say that it could have been C++ or C, but these languages are out as God would not have been able to even count grains of sand with 32-bit integers! Others say it could have been Fortran, Java, COBOL or even APL. But we all know the truth...

God wrote the world in Lisp!

Here it is! The God Wrote in Lisp song:

[audio:http://www.catonmat.net/download/the_eternal_flame-god_wrote_in_lisp.mp3]

Download this song: god wrote in lisp.mp3 (musical geek friday #5)
Downloaded: 37451 times

Download lyrics: god wrote in lisp lyrics (musical geek friday #5)
Downloaded: 2169 times

If you liked this song, it's included in a music CD "Roundworm" which contains parodies about Star Trek, dead cats, Lisp programming (this song), and everything in between!

Here are the lyrics of The Eternal Flame (God Wrote in Lisp) song:

I was taught assembler in my second year of school.
It's kinda like construction work — with a toothpick for a tool.
So when I made my senior year, I threw my code away,
And learned the way to program that I still prefer today.

Now, some folks on the Internet put their faith in C++.
They swear that it's so powerful, it's what God used for us.
And maybe it lets mortals dredge their objects from the C.
But I think that explains why only God can make a tree.

For God wrote in Lisp code
When he filled the leaves with green.
The fractal flowers and recursive roots:
The most lovely hack I've seen.
And when I ponder snowflakes, never finding two the same,
I know God likes a language with its own four-letter name.

Now, I've used a SUN under Unix, so I've seen what C can hold.
I've surfed for Perls, found what Fortran's for,
Got that Java stuff down cold.
Though the chance that I'd write COBOL code
is a SNOBOL's chance in Hell.
And I basically hate hieroglyphs, so I won't use APL.

Now, God must know all these languages, and a few I haven't named.
But the Lord made sure, when each sparrow falls,
that its flesh will be reclaimed.
And the Lord could not count grains of sand with a 32-bit word.
Who knows where we would go to if Lisp weren't what he preferred?

And God wrote in Lisp code
Every creature great and small.
Don't search the disk drive for man.c,
When the listing's on the wall.
And when I watch the lightning
Burn unbelievers to a crisp,
I know God had six days to work,
So he wrote it all in Lisp.

Yes, God had a deadline.
So he wrote it all in Lisp.

Have fun and until next geeky Friday! :)

In my previous post about learning Python programming through video lectures I stopped at three lectures on Design Patterns. This time I continue from there.

If you don't know what a Design Pattern is, think of it as a simple solution to a specific problem that occurs very frequently in software design.

For example, suppose you use a bunch of unrelated pieces of code. It is a nice idea to bring those pieces together behind a single unified interface. This design pattern is called Facade. There are a bunch of patterns like this one!
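
To make the Facade idea concrete, here is a minimal sketch. It is not taken from the lectures: the three small classes (`Tokenizer`, `StopWordFilter`, `FrequencyRanker`) and the `TextStats` facade are hypothetical names invented just to show the shape of the pattern.

```python
from collections import Counter


class Tokenizer:
    """Hypothetical component: turns text into lowercase words."""

    def tokenize(self, text):
        return text.lower().split()


class StopWordFilter:
    """Hypothetical component: drops very common words."""

    def __init__(self, stop_words):
        self.stop_words = set(stop_words)

    def keep(self, words):
        return [w for w in words if w not in self.stop_words]


class FrequencyRanker:
    """Hypothetical component: ranks words by how often they occur."""

    def rank(self, words, limit):
        return Counter(words).most_common(limit)


class TextStats:
    """Facade: one simple entry point that hides the three components."""

    def __init__(self):
        self._tokenizer = Tokenizer()
        self._filter = StopWordFilter({"the", "a", "of"})
        self._ranker = FrequencyRanker()

    def top_words(self, text, limit=3):
        words = self._tokenizer.tokenize(text)
        words = self._filter.keep(words)
        return self._ranker.rank(words, limit)


stats = TextStats()
result = stats.top_words("the pattern of a pattern is a pattern", limit=1)
print(result)
```

The caller only ever sees `top_words`. The pieces behind it can be swapped or rearranged without touching the calling code.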

The three lectures are given by Alex Martelli who works as "Über Tech Lead" for Google.

Python Design Patterns, Part I

Alex briefly covers the history and main principles of Design Patterns and quickly moves to discussing Structural and Behavioral DPs in Python.

Interesting ideas from the lecture:

  • [03:24] The name "Design Patterns" was first used by Christopher Alexander, an architect, who described designing buildings as applying well-known patterns, each of which can be applied to the same problem over and over again without ever doing it the same way twice.
  • [05:30] Design Patterns are mostly applied to Object Oriented programming because it's the most widespread programming paradigm nowadays.
  • [08:36] Design Patterns are not invented, they are discovered.
  • [10:00] Alex says that the original book Design Patterns by the Gang of Four should be read only when you are a master of DPs.
  • [13:10] The three classical categories of DPs are Creational (deal with object instantiation), Structural (deal with composition of objects) and Behavioral (deal with interaction of objects).
  • [14:05] "Program to an interface, not to an implementation."
  • [17:00] Use inheritance only when absolutely necessary, otherwise use "hold or wrap" principle.
  • [18:30] Never have more than one dot - Law of Demeter.
  • [18:50] Inheritance cannot restrict, use wrapping to restrict.
  • [21:41] In most of the cases when you need a single instance of something in Python, use a module instead of a class.
  • [22:23] Otherwise, just make 1 instance (without enforcing one).
  • [22:59] Singleton is also called "Highlander".
  • [24:50] There is basically no way to support subclassing well in Singleton.
  • [25:45] Monostate is also called "Borg".
  • [27:00] Python's data overriding helps in the Monostate Design Pattern.
  • [29:00] Every Python type/class is essentially a factory.
  • [32:06] Python does a "two-phase object construction".
  • [35:30] Adapter Design Pattern (it tweaks the interface to your needs).
  • [41:22] Facade Design Pattern (it provides a simple subset of a complex functionality).
  • [47:25] Bridge Design Pattern (it abstracts interface from the implementation).
  • [49:30] Decorator Design Pattern (it transparently modifies some functionality).
  • [50:24] Proxy Design Pattern (sounds the same as Decorator, just used for access control).
  • [51:21] Q and A!
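
The Monostate ("Borg") idea from [25:45] fits in a few lines. This is a minimal sketch of the widely circulated Borg recipe, not a copy of the lecture slides. The attribute name `language` is just an example.

```python
class Borg:
    """Monostate: many instances, one shared state."""

    _shared_state = {}

    def __init__(self):
        # Rebind the instance dictionary to the class-level one,
        # so an attribute set on any instance is visible on all of them.
        self.__dict__ = self._shared_state


first = Borg()
second = Borg()
first.language = "Python"

print(first is second)   # the instances are distinct objects
print(second.language)   # but they share their state
```

Unlike a Singleton, nothing stops you from creating as many instances as you like. They simply all behave as one.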

Python Design Patterns, Part II

In this lecture Alex discusses behavioral patterns. Unlike in the first part, he goes into depth on some of the patterns and explains how they can be implemented in Python.

Interesting ideas from the lecture:

  • [02:25] Template Method is a great pattern with a lousy name, a better name is "self-delegation".
  • [03:43] Example of Template Method Design Pattern (text pagination).
  • [08:50] Template Method Rationale.
  • [09:45] The "Hollywood Principle" - "don't call us, we'll call you"
  • [12:05] In Python you can also override data.
  • [13:10] Example of Template Method in Queue.Queue.
  • [14:05] If you are a good Python programmer, use Queue in threaded applications.
  • [17:45] Customizing Queue.
  • [19:30] Example of Template Method in cmd.Cmd.cmdloop.
  • [21:22] Example of Template Method in asyncore.dispatcher.
  • [22:30] Variant of Template Method - Mixin (not presented in the Gang of Four book). It's a class meant to be multiply-inherited from, and it supplies organizing methods only.
  • [25:50] Template Method in DictMixin class.
  • [26:45] Example of DictMixin usage.
  • [29:00] Hooks can be factored out into another class. Two examples of this from Python's stdlib are HTML's formatter vs. writer and SAX's parser vs. handler.
  • [32:40] Hook method introspection example of cmd.Cmd.docmd.
  • [33:30] There are three kinds of Template Methods - plain, factored into separate classes, and introspective.
  • [34:35] Example of all three kinds of Template Methods used in unittest.TestCase.
  • [36:17] State and Strategy Design Patterns. They are very similar in what they do: both factor out an object's behavior.
  • [40:40] Ring buffer example done via State Design Pattern.
  • [43:35] Q and A!
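
Here is a minimal sketch of Template Method ("self-delegation"), loosely inspired by the text pagination example mentioned at [03:43]. It is my own reconstruction and not the code from the lecture: the class names `AbstractPager` and `ListPager` and the hook names are assumptions made for illustration.

```python
class AbstractPager:
    """Template Method: write_lines() organizes the work, the hooks vary."""

    def __init__(self, lines_per_page):
        self.lines_per_page = lines_per_page

    def write_lines(self, lines):
        # The skeleton of the algorithm lives here and calls the hooks
        # ("don't call us, we'll call you").
        for index, line in enumerate(lines):
            if index % self.lines_per_page == 0:
                self.start_page(index // self.lines_per_page + 1)
            self.emit_line(line)

    # Hook methods: subclasses are expected to override these.
    def start_page(self, page_number):
        raise NotImplementedError

    def emit_line(self, line):
        raise NotImplementedError


class ListPager(AbstractPager):
    """Hypothetical concrete pager that collects its output in a list."""

    def __init__(self, lines_per_page):
        super().__init__(lines_per_page)
        self.output = []

    def start_page(self, page_number):
        self.output.append(f"--- page {page_number} ---")

    def emit_line(self, line):
        self.output.append(line)


pager = ListPager(lines_per_page=2)
pager.write_lines(["one", "two", "three"])
print("\n".join(pager.output))
```

The base class never knows where the text ends up. A subclass that writes to a file or a terminal only has to override the two hooks.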

Python Design Patterns, A Recap

This video lecture was presented at Google Developer Day. It is a short version of the previous two video lectures. It starts with an example of the Facade Design Pattern, then moves on to the history and all the types of design patterns.

I did not write out the interesting moments from this lecture as it was a subset of the previous two lectures.

If you liked these lectures, check out this geek song about another commonly used design pattern - Model-View-Controller Song :)

Even though these were Python design patterns, to understand some of them I used the Perl Design Patterns website!

Were there any interesting points in the lectures that caught your attention?