hackers steal moneyAnother great lecture from Google TechTalks.

This lecture is given by Neil Daswani, who has a Ph.D. from Stanford and currently works at Google as a security engineer. He is also an author of a book entitled "Foundations of Security: What Every Programmer Needs to Know", which teaches you state-of-the-art software security design principles, methodology, and concrete programming techniques you need to build secure software systems.

Neil talks about top three web application vulnerabilities that cybercriminals use to steal money. These three vulnerabilities are:

  • SQL Injection attacks,
  • Cross-Site Request Forgery (XSRF) attacks, and
  • Cross-Site Script Inclusion (XSSI) attacks.

I was surprised that he did not cover plain, old Cross-Site Scripting (XSS) attacks, but jumped right to dynamic XSS. You'll have to get familiar with this type attack on your own. See the XSS Faq and XSS Cheat Sheet for more information!

Interesting points from the lecture:

  • [01:48] Years ago cybercriminals were teenagers writing viruses and worms, today they are organized crime looking for stealing money.
  • [03:19] Intermediate goals to stealing money are data theft, extortion and malware distribution.
  • [04:02] Russian Business Network (RBN) is an example of organized cybercrime.
  • [09:00] Attack #1: SQL Injection.
  • [16:30] Preventing SQL injections.
  • [17:00] Don't blacklist (filter) characters in queries. Whitelist (allow) well-defined set of safe values for each field.
  • [18:30] Take a look at mod_security if you use Apache web server. Mod_security is a Web Application Firewall. It allows you to define a set of rules the web application must follow.
  • [19:30] Prepared statements and bind variables help to avoid SQL injections.
  • [23:00] Other mitigations strategies include - limiting web application user's privileges on the sql server, hardenining database server and host operating system.
  • [23:45] Second order SQL injections (link to pdf) abuse data that is already in the database.
  • [23:55] Blind SQL injection (link to pdf) is a technique to reverse engineer the structure of the database.
  • [24:25] Attack #2: Cross-Site Request Forgery (XSRF).
  • [26:00] How XSRF Works.
  • [31:30] Drive-By-Pharming (pdf) is an XSRF technique where the attacker changes DNS settings of a users broadband router (fact - 50% of home users do not change default router password).
  • [34:00] Preventing XSRF.
  • [34:20] Check Referer HTTP header. That doesn't always work because the user might be using a proxy.
  • [36:15] Validate the user by asking him to provide his password or any other token only the user has knowledge of.
  • [37:15] Validate requests via "Action Tokens" which add special tokens to forms to distinguish them from forged forms.
  • [38:30] Attack #3: Cross-Site Script Inclusion (XSSI).
  • [39:10] How XSSI works.
  • [41:20] Dynamic script inclusion example.
  • [47:25] Trends.
  • [50:12] Open Web Application Security Project (OWASP) Top 10 vulnerabilities in 2007 (link).
  • [53:55] Google has some material on Web Security at code.google.com/edu.

Happy hacking! (just kidding ;) )

Koders, Krugle, Codase, Google Code SearchI found a Google Talk on a topic that's related to the motto of my blog - "good coders code, great reuse".

In this talk, professor Tao Xie speaks about his research on using public code repositories together with code search engines for finding common API usage patterns and anti-patterns.

His research software uses the following four code search engines.

He suggests to view Raphael Volz's analysis for more information about these search engines.

Tao has developed three tools, which use the aforementioned search engines:

  • PARSEWeb for finding API usage patterns,
  • XWeb for finding forgotten exception handlers, and
  • NEGWeb for finding misuses of API calls.

See the code mining project website for more information.

The lecture is done in a very academic manner and it's very hard to follow. Be sure that you are really interested in this topic before watching it.

Some excerpts from the lecture:

  • [04:26] A problem with data mining on source code is that it might not have enough data points (usages of API) to discover common patterns.
  • [04:58] It is crucial to have a lot of data points to get good results out of data mining
  • [08:37] Google Code Search indexes publicly hosted SVN and CVS repositories.
  • [09:20] Example of searching for C stdlib's fopen usage on Google Code Search (query: "lang:C file:.c$ fopen\s*\("
  • [11:08] Example of the same search on Krugle.
  • [16:40] Code search engines return partial code samples. Various heuristics are used for type inference.
  • [22:05] Example of integrating Tao's PARSEWeb into Eclipse.
  • [28:15] Interesting idea of constructing and issuing multiple queries to find more code samples.
  • [36:20] A study showed that a proper deallocation of resources after an exception resulted in 17% performance increase.

I'd like to hear some comments on websites that you use for finding code examples!

reddit topLast week I published the Hacker Top program and promised to explain how it was made. Before I do that, let me publish another similar program for Reddit. It's the Reddit Top program.

As I mentioned in the Hacker Top post, Reddit is my favorite source for news because of the great programmer community it has. I actually unsubscribed from all the default subreddits (politics, pics, etc.) and subscribed to some 20 - 30 programming subreddits (like python, erlang, compsci and many others).

Reddit Top was actually derived from Hacker Top. Hacker Top was made in such a manner that it required me just to write a new parser for Reddit to create the new program. The program is written in Python programming language and uses ncurses interface for displaying the stories.

reddit top, follow reddit from console/shell
Try the 'm' keyboard shortcut to switch to other display modes


Download link: reddit top program
Downloaded: 4343 times

I'll describe the process of creating the program in one of the next posts.
If you want to read about that, I suggest you subscribe to my rss feed.

Note - this program is released under GNU GPL.

How to run the program?

1) Make sure you are running a Unix type operating system.

2) Make sure you have Python installed. Any recent version will do.

3) Download and unpack the hacker top program archive.

$ wget 'http://www.catonmat.net/download/reddit-top.tgz'
$ tar -xvzf reddit-top.tgz

4) Change to 'reddit-top' directory which was created by unpacking the archive.

$ cd reddit-top

5) Give the 'reddit_top.py' program execute permissions.

$ chmod u+x reddit_top.py

6) Run the reddit_top.py program.

$ ./reddit_top.py

(If that does not work out, try running 'python ./reddit_top.py')

Make sure that your terminal is at least 80 columns wide, otherwise the program won't be able to display the results nicely.

Command Line Options

If you run the program with '--help' argument, it will display the possible command line options:

Usage: ./reddit_top.py [-h|--help] - displays this
Usage: ./reddit_top.py [-s|--subreddit subreddit]
          [-i|--interval interval] [-n|--new]
          [-u|--utf8 <on|off>]

As the help message suggests, the four main options are:

  • -s or --subreddit, which specifies the subreddit to monitor.

    The default is reddit's front page (http://www.reddit.com).

    At the moment it is not possible to specify multiple subreddits. I'll add the feature in the future. Here are a few examples of valid subreddits - 'programming', 'wtf', 'python', 'politics', and others.
  • -i or --interval, which specifies refresh interval.

    The default refresh interval is 1 minutes. Here are a few examples: 10s (10 seconds), 12m (12 minutes), 2h (2 hours).
  • -u or --utf8, turns on utf8 output mode.

    Default: off. Use this if you know for sure that your terminal supports it, otherwise you might get gibberish.
  • -n or --new, which follows only the newest reddit stories on a given subreddit or front page.

    Default: follow front page stories.

Keyboard Shortcuts

There are several keyboard shortcuts which you should know about when using the Reddit Top program:

  • q - quits the program.
  • u - forces an update of the news.
  • up arrow/down arrow (or alternatively j/k keys) - scrolls the news list up or down.
  • m - changes the display mode. There are 5 different display modes for your taste.

Enjoy the program! I'll write some details how I created it in one of the next posts. Stay tuned - subscribe to my rss feed! :)

Download Reddit Top Program

Download link: reddit top program
Downloaded: 4343 times

The program is released under GNU General Public License.

This article is part of the article series "Musical Geek Friday."
<- previous article next article ->

teco user’s guideRemember the song from Musical Geek Friday #7 - Just One More Hack? I said that it was the only song ever about a debugger. I must admit that I was wrong! This week I bring to you another song about a debugger - the DDT debugger (and the ancient TECO text editor).

TECO stands for Text Editor and COrrector. It was developed at Massachusetts Institute of Technology (MIT) and was one of the first text editors ever written. It grew over the years, gaining popularity and features. Along the way, it also became a Turing-complete programming language. Several sets of editor macros were developed and used. I was surprised to find that around 1975 Richard Stallman organized these Editor MACroS into the first Emacs-like text editor. Here is an article about the history of TECO.

DDT, on the other hand, was a collection of several debugger programs. These debugging programs included a command shell from which TECO was also used interchangeably with DDT. DDT stands for Dynamic Debugging Technique (initially known as DEC Debugging Tape).

Talking about the 10th geek song, it's written and performed by a band called "red martian".

Enough of facts, here is the TECO and DDT song:


Download this song: teco and ddt.mp3 (musical geek friday #10)
Downloaded: 54164 times

Download lyrics: teco and ddt lyrics (musical geek friday #10)
Downloaded: 2971 times

TECO and DDT Lyrics:

Oh, you can hack anything that you want with just TECO and DDT.
You can hack anything that you want with just TECO and DDT!

Oh, no, you don't have to tell me what would I need because it's simply plain to see...
That I can hack anything that I want with just TECO and DDT!


Oh, you can hack anything that you want with just TECO and DDT.
Oh, you can hack anything that you want with just TECO and DDT!

Oh, no, you don't have to tell me what would I need because it's simply plain to see...
That I can hack anything that I want with just TECO, just TECO, just TECO and DDT!

I'd like to thank Adam for letting me know about this song.

Download "TECO and DDT" Song

Download this song: teco and ddt.mp3 (musical geek friday #10)
Downloaded: 54164 times

Download lyrics: teco and ddt lyrics (musical geek friday #10)
Downloaded: 2971 times

Click to listen:

Have fun and until next geeky Friday! :)

Three Beautiful QuicksortsHave you heard about a book called Beautiful Code? This book is composed of 33 essays by well-known computer scientists and software engineers on the same question - "What is the most beautiful piece of code you know?".

Chapter 3 of this book is written by Jon Bentley. The chapter is curiously entitled "The Most Beautiful Code I Never Wrote" and it talks about a beautiful implementation and experimental analysis of the most widely used sorting algorithm in the world called Quicksort.

Jon also decided to give a talk at Google on the same topic. In this techtalk, he talks about three Quicksort topics. In the first part Jon discusses an elegant and tiny implementation of Quicksort. In the next part he analyzes the Quick sort algorithm by instrumenting this elegant implementation. Finally, he talks about a production quality Quicksort implementation which has been included in C standard library (qsort function).

What I learned the most from this lecture is the experimental part of counting number of comparsions of an algorithm. I had never read about experimental analysis of algorithms.

Interesting ideas from the lecture:

  • [02:30] What's the most beautiful code you've ever written?
  • [04:58] The classical Quicksort algorithm.
  • [05:32] Best case (nlog(n)), worst (n2) and average case (nlog(n)) analysis results of Quicksort.
  • [08:00] Randomization in Quicksort.
  • [09:08] Lomuto's idea of partitioning.
  • [10:48] Partitioning algorithm implementation pseudocode.
  • [11:10] Beautiful Quicksort code.
  • [12:13] Strengths of Quicksort: simple to implement and explain, pretty fast. Weaknesses of Quicksort: quadratic time for increasing elements (fix by randomization), quadratic time for equal elements, not stable.
  • [13:55] "vigorous writing is concise" /William Strunk Jr.'s/
  • [14:49] Simplest semi-production C Quicksort implementation.
  • [15:35] How much time will Quicksort take on average on indistinct elements?
  • [17:00] Graph of (runtime/N, logN) for Quicksort and Heapsort.
  • [18:31-32:00] Finding number of comparisons for Quicksort experimentally.
  • [23:08] Doing various tricks the time to count comparisons reduced from nlog(n) to n and space reduced from n to log(n).
  • [24:25] The Inventor's Paradox: "The more ambitious plan may have more chance of success."
  • [30:13] Perlis: "Simplicity does not precede complexity, but follows it."
  • [30:45] Mathematical solution to Quicksort's comparison complexity.
  • [31:35] The solution is ~ 1.386nlog(n)
  • [32:10] Analysis of comparisons in Quicksort is isomorphic to time taken to insert an element in a Binary Search Tree.
  • [36:25] A beautiful bug report on the original implementation of C library's qsort
  • [39:31] Engineering qsort together with Doug McIlroy. Goals and strategy.
  • [41:23] With equal elements, some Quicksorts go quadratic, some go nlog(n), some go linear.
  • [41:46] Is there a quick algorithm for ternary (<, = , >) partitioning?
  • [46:00] Knuth said that the overheard and comparisons took the same amount of time but the swaps really killed you, where as Jon and Doug found that time(overhead) < time(swaps) < time(comparisons), if swaps are done properly.
  • [46:30] Choosing effective partitioning element was important. First, middle or random gave runtime aproximately 1.386nlog(n), where as median of three or median and three medians gave run times of 1.188nlog(n) and 1.09nlog(n), respectivey.
  • [46:50] The final qsort had ternary partitioning, fast swap, adaptive sample for partition element, insertion sort for small files.
  • [47:40] The complete code of C library's qsort.
  • [48:15] Beauties of qsort (beautiful algorithm, beautiful interface, beautiful bug report, beautiful cost models, beautiful tests)
  • [50:40] Three beautiful Quicksorts - 1) a simple teaching tool, 2) an anaysis of Quicksort, 3) An industrial-strength Quicksort.
  • [51:35] Q and A.

Here is a working C program which runs Quicksort algorithm (in bold) from the Chapter 3 of the book. The algorithm sorts the global array of integers x.

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define S(a) (sizeof(a)/sizeof(a[0]))

int x[] = { 56, 10, 39, 15, 20, 62, 14, 39, 77, 17 };

void swap(int i, int j) {
    int t = x[i];
    x[i] = x[j];
    x[j] = t;

int randint(int l, int u) {
    return rand()%(u-l+1)+l;

<strong>void quicksort(int l, int u) {
    int i, m;
    if (l >= u) return;
    swap(l, randint(l, u));
    m = l;
    for (i = l+1; i <= u; i++)
        if (x[i] < x[l])
            swap(++m, i);
    swap(l, m);
    quicksort(l, m-1);
    quicksort(m+1, u);

int main() {
    int i;
    quicksort(0, S(x)-1);
    for (i = 0; i < S(x); i++) {
        printf("%d ", x[i]);
    return 0;

In a functional programming language, Quicksort can be written even more elegantly. Check out this Quicksort written in Haskell (thanks to bayareaguy.)

  quicksort (x:xs) = 
      quicksort [ i | i <- xs, i < x ] 
      ++ [x] ++
      quicksort [ i | i <- xs, i >= x ]

  quicksort [] = []

And what's the most beautiful code you've ever written?

Ps. If you got interested, I suggest you get the Beautiful Code book. Jon Bentley carefully explains the ticks used for doing the experimental part of analysis in his chapter.