A Year of Blogging

My dear readers, it has been a year since I started blogging here! With this post I want to share my blog statistics with you.

During this year (July 14, 2007 - July 20, 2008) I managed to write 58 posts, which received 808 comments. Based on statistics from Statcounter and Google Analytics, these posts received 574,874 views by 424,292 unique visitors.

Here is a Google Analytics graph showing monthly page views for this period (click for a larger version):

catonmat.net page views graph (small)

Here are the top 5 countries my blog readers came from:

  • United States (195,076 visitors)
  • United Kingdom (25,988 visitors)
  • Canada (25,335 visitors)
  • India (13,753 visitors)
  • Germany (13,670 visitors)

Not surprisingly, the top 5 referring sites were all social media and bookmarking sites:

Of all the visitors, 48,386 came to my site directly and 82,502 were sent here by my darling Google.

During the first year, approximately 1000 people subscribed to my blog. Here is the Feedburner subscriber graph for the year:

feedburner statistics for one year of blogging

If you are interested in my blog, you may subscribe here: catonmat rss feed.

According to Technorati, my blog has received 476 blog reactions and ranks 29,521st out of 112.8 million blogs!

Many of my posts have been submitted to Reddit and Digg, and have been Stumbled. Here are the top 5 most visited posts:

Some of my how-to posts came with downloadable cheat sheets. Here are the top 5 cheat sheets:

During the last year I also did several web projects. As I was busy with physics studies, I created only four web projects (all of these projects are open source):

In March 2008 I started posting geek music on Fridays. Here are the top 5 geek songs:

I am satisfied with where this blog is heading. I'd like to thank all my fans and all the visitors who regularly return to the site!

To make things more challenging, I am setting myself a goal of reaching 5000 subscribers by the end of the next year of blogging (July 2009)! I know that this is a very ambitious goal, but I am ready to take the challenge!

birthday portal cake

This article is part of the article series "Musical Geek Friday."

the bittorrent song

This week on Musical Geek Friday - the famous BitTorrent Song!

The song is written and performed by Brent Simon. Brent describes himself as a super nerd who plays the synth and makes original music that's honest and from the heart. He keeps composing and his latest music can be found on his MySpace page - brentsimon.

The song, when I first downloaded it, was actually called "Mininova", in honor of one of the largest BitTorrent sites on the net - Mininova.org.

I was curious about the history of this song and emailed Brent. It turns out the original title of the song is "World Wired West", and Mininova renamed the song to "Mininova" when they put it up for download on their site!

Here is what Brent had to say about the history of the song:

I decided to write "World Wired West", which has become better known as "the Bittorent song", based off of my experience with the net and torrents. I wanted to write something as geeky as possible and see if anyone knew what I was talking about. It looks like you guys do! The second verse is actually a short poem I wrote for the literary journal back in high school. I was transitioning from DOS to Windows at that point; that's why I have the old DOS references of "Bad command or file name" and D-I-R.

I also asked him about the mysterious final verse. He said:

I had just read Naked Lunch by William S. Burroughs, which was the most confusing and disturbing book I've ever read, and my cousin Jim wanted me to write a song about it. That verse is the combined total of what I could make out from the book and what I remember from the movie (which I haven't seen in many years). There was no way I could write a whole song about it, but I was able to fit a verse together and there you have it.

Here it is! The BitTorrent Song!

[audio:http://www.catonmat.net/download/brent_simon-the_bittorrent_song.mp3]

Download this song: the bittorrent song.mp3 (musical geek friday #11)
Downloaded: 15172 times

Download lyrics: the bittorrent song lyrics (musical geek friday #11)
Downloaded: 2854 times

Here are the lyrics of The BitTorrent Song:

-Verse 1-
Gather 'round and hear my story
'Bout the new old west
I sought torrents, rips, and ISO's
Logged in under guest

Movies, music, games and porno,
Warez and killer apps,
Compressors, CODECs and keygens
Compilers, hacks and cracks

-Chorus-
Data's streamin' my hard drive's screamin'
Fragmenting the night
Bit's are flowin' corruption's growin'
Into every last byte

-Verse 2-
Bad command or file name
OK, now what's next
D-I-R where is my brain
It's coded in base hex

Alright now you'll feel my wrath
Don't ever mess with me
Cram this up your Gigabyte
So long Format C:

-Chorus-
Data's streamin' my hard drive's screamin'
Fragmenting the night
Bit's are flowin' corruption's growin'
Into every last byte

-Bonus Verse-
If you watch my lips real closely
You'll see they don't match what I'm saying
My typewriter emits fluids
All of which are intoxicating

Staring at my shoes all after-
Noon on heroin isn't boring
Just to poop I need the surgical
Equivalent of apple coring

-Chorus-
Naked Lunch
Naked Lunch
Naked Lunch
Naked Lunch

Here is Brent Simon performing The Bittorrent Song:

Have fun and until next geeky Friday! :)

hackers steal money

Another great lecture from Google TechTalks.

This lecture is given by Neil Daswani, who has a Ph.D. from Stanford and currently works at Google as a security engineer. He is also an author of a book entitled "Foundations of Security: What Every Programmer Needs to Know", which teaches you state-of-the-art software security design principles, methodology, and concrete programming techniques you need to build secure software systems.

Neil talks about the top three web application vulnerabilities that cybercriminals use to steal money. These three vulnerabilities are:

  • SQL Injection attacks,
  • Cross-Site Request Forgery (XSRF) attacks, and
  • Cross-Site Script Inclusion (XSSI) attacks.

I was surprised that he did not cover plain, old Cross-Site Scripting (XSS) attacks, but jumped right to dynamic XSS. You'll have to get familiar with this type of attack on your own. See the XSS FAQ and XSS Cheat Sheet for more information!

Interesting points from the lecture:

  • [01:48] Years ago cybercriminals were teenagers writing viruses and worms; today they are organized criminals looking to steal money.
  • [03:19] Intermediate goals on the way to stealing money are data theft, extortion and malware distribution.
  • [04:02] Russian Business Network (RBN) is an example of organized cybercrime.
  • [09:00] Attack #1: SQL Injection.
  • [16:30] Preventing SQL injections.
  • [17:00] Don't blacklist (filter) characters in queries. Whitelist (allow) a well-defined set of safe values for each field.
  • [18:30] Take a look at mod_security if you use Apache web server. Mod_security is a Web Application Firewall. It allows you to define a set of rules the web application must follow.
  • [19:30] Prepared statements and bind variables help to avoid SQL injections.
  • [23:00] Other mitigation strategies include limiting the web application user's privileges on the SQL server and hardening the database server and host operating system.
  • [23:45] Second order SQL injections (link to pdf) abuse data that is already in the database.
  • [23:55] Blind SQL injection (link to pdf) is a technique to reverse engineer the structure of the database.
  • [24:25] Attack #2: Cross-Site Request Forgery (XSRF).
  • [26:00] How XSRF Works.
  • [31:30] Drive-By-Pharming (pdf) is an XSRF technique where the attacker changes the DNS settings of a user's broadband router (fact - 50% of home users do not change the default router password).
  • [34:00] Preventing XSRF.
  • [34:20] Check the Referer HTTP header. That doesn't always work because the user might be using a proxy.
  • [36:15] Validate the user by asking him to provide his password or any other token only the user has knowledge of.
  • [37:15] Validate requests via "Action Tokens" which add special tokens to forms to distinguish them from forged forms.
  • [38:30] Attack #3: Cross-Site Script Inclusion (XSSI).
  • [39:10] How XSSI works.
  • [41:20] Dynamic script inclusion example.
  • [47:25] Trends.
  • [50:12] Open Web Application Security Project (OWASP) Top 10 vulnerabilities in 2007 (link).
  • [53:55] Google has some material on Web Security at code.google.com/edu.
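
Two of the mitigations from the list, bind variables ([19:30]) and action tokens ([37:15]), are easy to sketch in code. The following is a minimal Python illustration and not code from the lecture: the 'users' table, the SECRET_KEY value and all function names are made up for the example, and SQLite only stands in for whatever database your application uses.

```python
import hashlib
import hmac
import sqlite3

# Hypothetical server-side secret; a real application would load it from configuration.
SECRET_KEY = b"replace-with-a-random-secret"


def find_user(conn, username):
    """Look up a user with a bind variable instead of string concatenation."""
    # The '?' placeholder makes the driver treat `username` as data, never as SQL.
    cur = conn.execute("SELECT id, name FROM users WHERE name = ?", (username,))
    return cur.fetchall()


def make_action_token(session_id):
    """Derive a per-session token that can be embedded in a hidden form field."""
    return hmac.new(SECRET_KEY, session_id.encode("utf-8"), hashlib.sha256).hexdigest()


def is_valid_action_token(session_id, token):
    """Accept a form submission only if its token matches the session."""
    expected = make_action_token(session_id)
    # compare_digest avoids leaking information through comparison timing.
    return hmac.compare_digest(expected.encode("utf-8"), token.encode("utf-8"))


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("INSERT INTO users (name) VALUES ('alice')")

    print(find_user(conn, "alice"))        # the legitimate lookup
    print(find_user(conn, "' OR '1'='1"))  # the injection attempt is just an odd name

    token = make_action_token("session-123")
    print(is_valid_action_token("session-123", token))  # token from the same session
    print(is_valid_action_token("session-456", token))  # token replayed in another session
```

The injection string is compared as a plain name and matches nothing, and a token made for one session is rejected for another. A real action token scheme would probably also carry an expiry time, which this sketch leaves out.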

Happy hacking! (just kidding ;) )

Koders, Krugle, Codase, Google Code Search

I found a Google Talk on a topic that's related to the motto of my blog - "good coders code, great reuse".

In this talk, professor Tao Xie speaks about his research on using public code repositories together with code search engines for finding common API usage patterns and anti-patterns.

His research software uses the following four code search engines: Koders, Krugle, Codase and Google Code Search.

He suggests reading Raphael Volz's analysis for more information about these search engines.

Tao has developed three tools, which use the aforementioned search engines:

  • PARSEWeb for finding API usage patterns,
  • XWeb for finding forgotten exception handlers, and
  • NEGWeb for finding misuses of API calls.

See the code mining project website for more information.

The lecture is done in a very academic manner and it's very hard to follow. Be sure that you are really interested in this topic before watching it.

Some excerpts from the lecture:

  • [04:26] A problem with data mining on source code is that it might not have enough data points (usages of API) to discover common patterns.
  • [04:58] It is crucial to have a lot of data points to get good results out of data mining.
  • [08:37] Google Code Search indexes publicly hosted SVN and CVS repositories.
  • [09:20] Example of searching for C stdlib's fopen usage on Google Code Search (query: "lang:C file:.c$ fopen\s*\(").
  • [11:08] Example of the same search on Krugle.
  • [16:40] Code search engines return partial code samples. Various heuristics are used for type inference.
  • [22:05] Example of integrating Tao's PARSEWeb into Eclipse.
  • [28:15] Interesting idea of constructing and issuing multiple queries to find more code samples.
  • [36:20] A study showed that proper deallocation of resources after an exception resulted in a 17% performance increase.
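
The fopen query from [09:20] is an ordinary regular expression, so you can try the same pattern locally. Here is a small Python sketch. The function name and the sample C code are my own, it only looks at text you hand to it and does not talk to any code search engine, and I escaped the dot in the file filter so that it matches only names ending in ".c".

```python
import re

# Same pattern as in the query: "fopen", optional whitespace, then "(".
FOPEN_CALL = re.compile(r"fopen\s*\(")
# A stricter take on the "file:.c$" filter: the name must end in ".c".
C_FILE = re.compile(r"\.c$")


def find_fopen_calls(filename, source):
    """Return (line_number, line) pairs for fopen calls in C source text."""
    if not C_FILE.search(filename):
        return []
    hits = []
    for number, line in enumerate(source.splitlines(), start=1):
        if FOPEN_CALL.search(line):
            hits.append((number, line.strip()))
    return hits


if __name__ == "__main__":
    sample = (
        'FILE *f = fopen ("a.txt", "r");\n'
        "int x = 0;\n"
        'fp=fopen(path, "w");\n'
    )
    for number, line in find_fopen_calls("main.c", sample):
        print(number, line)
```

Collecting many such hits is only the first step; the tools from the talk then have to mine them for patterns, which this sketch does not attempt.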

I'd like to hear some comments on websites that you use for finding code examples!

reddit top

Last week I published the Hacker Top program and promised to explain how it was made. Before I do that, let me publish another similar program for Reddit. It's the Reddit Top program.

As I mentioned in the Hacker Top post, Reddit is my favorite source for news because of the great programmer community it has. I actually unsubscribed from all the default subreddits (politics, pics, etc.) and subscribed to some 20 - 30 programming subreddits (like python, erlang, compsci and many others).

Reddit Top was actually derived from Hacker Top. Hacker Top was made in such a manner that I only had to write a new parser for Reddit to create the new program. The program is written in the Python programming language and uses an ncurses interface for displaying the stories.

reddit top, follow reddit from console/shell
Try the 'm' keyboard shortcut to switch to other display modes

Download

Download link: reddit top program
Downloaded: 4123 times

I'll describe the process of creating the program in one of the next posts.
If you want to read about that, I suggest you subscribe to my rss feed.

Note - this program is released under the GNU GPL.

How to run the program?

1) Make sure you are running a Unix-like operating system.

2) Make sure you have Python installed. Any recent version will do.

3) Download and unpack the Reddit Top program archive.

$ wget 'http://www.catonmat.net/download/reddit-top.tgz'
$ tar -xvzf reddit-top.tgz

4) Change to the 'reddit-top' directory, which was created by unpacking the archive.

$ cd reddit-top

5) Give the 'reddit_top.py' program execute permissions.

$ chmod u+x reddit_top.py

6) Run the reddit_top.py program.

$ ./reddit_top.py

(If that does not work out, try running 'python ./reddit_top.py')

Make sure that your terminal is at least 80 columns wide, otherwise the program won't be able to display the results nicely.
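
If you want to check the width before starting the program, something like this Python 3 sketch would do it. It is not part of Reddit Top, and the names MIN_COLUMNS and terminal_is_wide_enough are made up for the example.

```python
import shutil

MIN_COLUMNS = 80  # the minimum width mentioned above


def terminal_is_wide_enough(columns=None):
    """Return True if the terminal has at least MIN_COLUMNS columns."""
    if columns is None:
        # shutil falls back to 80x24 when the size cannot be determined.
        columns = shutil.get_terminal_size().columns
    return columns >= MIN_COLUMNS


if __name__ == "__main__":
    if not terminal_is_wide_enough():
        print("Please widen your terminal to at least %d columns." % MIN_COLUMNS)
```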

Command Line Options

If you run the program with the '--help' argument, it will display the possible command line options:

Usage: ./reddit_top.py [-h|--help] - displays this
Usage: ./reddit_top.py [-s|--subreddit subreddit]
          [-i|--interval interval] [-n|--new]
          [-u|--utf8 <on|off>]

As the help message suggests, the four main options are:

  • -s or --subreddit, which specifies the subreddit to monitor.

    The default is reddit's front page (http://www.reddit.com).

    At the moment it is not possible to specify multiple subreddits. I'll add the feature in the future. Here are a few examples of valid subreddits - 'programming', 'wtf', 'python', 'politics', and others.
  • -i or --interval, which specifies the refresh interval.

    The default refresh interval is 1 minute. Here are a few examples: 10s (10 seconds), 12m (12 minutes), 2h (2 hours).
  • -u or --utf8, which turns on utf8 output mode.

    Default: off. Use this if you know for sure that your terminal supports it, otherwise you might get gibberish.
  • -n or --new, which follows only the newest reddit stories on a given subreddit or front page.

    Default: follow front page stories.
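
To show how options like these could be handled, here is a sketch using Python's standard argparse module. It is an illustration under my own assumptions and not the actual reddit_top.py source: in particular I assume that an interval is always a whole number followed by s, m or h, and I store it as a number of seconds. The names parse_interval and build_parser are hypothetical.

```python
import argparse
import re

# Seconds per unit suffix used in the examples: 10s, 12m, 2h.
UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600}


def parse_interval(text):
    """Convert strings such as '10s', '12m' or '2h' into seconds."""
    match = re.fullmatch(r"(\d+)([smh])", text)
    if match is None:
        raise argparse.ArgumentTypeError("invalid interval: %r" % text)
    value, unit = match.groups()
    return int(value) * UNIT_SECONDS[unit]


def build_parser():
    """Build a parser for the four options described above."""
    parser = argparse.ArgumentParser(prog="reddit_top.py")
    parser.add_argument("-s", "--subreddit", default=None,
                        help="subreddit to monitor (default: front page)")
    parser.add_argument("-i", "--interval", type=parse_interval, default=60,
                        help="refresh interval, e.g. 10s, 12m, 2h (default: 1 minute)")
    parser.add_argument("-n", "--new", action="store_true",
                        help="follow only the newest stories")
    parser.add_argument("-u", "--utf8", choices=["on", "off"], default="off",
                        help="turn utf8 output on or off (default: off)")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["-s", "python", "-i", "12m", "--new"])
    print(args.subreddit, args.interval, args.new, args.utf8)
```

Using a converter function as the argparse type keeps the unit handling in one place, so a malformed interval is reported as a normal usage error.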

Keyboard Shortcuts

There are several keyboard shortcuts which you should know about when using the Reddit Top program:

  • q - quits the program.
  • u - forces an update of the news.
  • up arrow/down arrow (or alternatively j/k keys) - scrolls the news list up or down.
  • m - changes the display mode. There are 5 different display modes to suit your taste.

Enjoy the program! I'll write some details about how I created it in one of the next posts. Stay tuned - subscribe to my rss feed! :)

The program is released under the GNU General Public License.