GitHub Social CodingHey everyone, I just pushed 30 of my projects to GitHub. I realized that all the projects were scattered across my blog and there was no central repository. So I took time to organize them, write documentation, and uploaded them to GitHub.

I did all of these projects for fun and to learn better programming. You can't become a great programmer if you don't program a lot. The more you program, the more language idioms and constructs you'll learn. You'll learn common patterns that occur frequently in programming and it will greatly improve your problem solving skills.

These were all relatively small projects and I think I am ready to move to the next level. I have several larger ideas in mind that need to be turned into code. I will post the updates to catonmat.

If you find any of my projects interesting, clone and start hacking. You can also follow my profile at github. :-)

Here they all are. Enjoy!

1. Busy Beaver

Busy beaver is a computer science problem to finding the smallest Turing Machine that outputs the most data and eventually halts. This project is an implementation of a Turing Machine in Python and C++ that runs the busy beavers. It also comes with Turing Machine's tape visualization tool written in Perl.

2. Feedburner Graph Generator

Current Feedburner statistics graphs do not look nice. I wrote this Perl program to generate the nice graphs they had in 2008.

3. CodingHorror Keyword Analyzer

This is a Perl program that parses public statcounter data for codinghorror.com blog and stores the search keywords in an SQLite database.

4. Adns

This is a tiny Python module that does asynchronous DNS resolution with adns library. My benchmarks were 20,000 resolutions per minute on a 512kbit/s connection.

5. Winamp Music Reporter Plugin

This is a Winamp plugin that reports to digitalpoint forums the tracks you are listening to. Written in C and uses Winamp SDK. The code can be modified to make it report to Facebook or Twitter, or anywhere you wish.

6. Bithacks.h

Bithacks.h is a C header file that implements various bit operations.

7. Set Operations in Unix Shell

This is an implementation of 14 set operations by using only Unix utilities such as sort, uniq, diff, comm, cat, head, tail, awk, and others.

8. Hacker Top

Hacker top is a Unix top-like program written in Python for monitoring Hacker News from the console.

9. Reddit Top

Reddit top is a Unix top-like program written in Python for monitoring Reddit from the console.

10. GNU Awk YouTube Downloader

This is a program written in GNU Awk that downloads YouTube videos. It works well but please don't take it too seriously. It's a proof of concept code, written to see what can be done with GNU Awk's networking interface.

11. VBScript YouTube Downloader

This is a program written in VBScript that downloads YouTube videos. I wrote it because when I was a child, I did a lot of programming in Visual Basic and I wanted to remember what it was like.

12. Perl One-Liner YouTube Downloader

This is a Perl one-liner that downloads YouTube videos. I wrote it because I love Perl golf.

13. YouTube Video Uploader

This is a YouTube video uploader that works without any APIs. It just simulates what a browser would do and takes all the steps to post the video and set the video info. Written in Perl.

14. Plurk Translation Tool

This is a GreaseMonkey script that translates Plurks to English.

Plurk is like Twitter but more fun and organized. Come be my friend on Plurk, my profile name is pkrumins. Written in JavaScript.

15. Command Line Plurker

Plurk from your command line. Written in Perl.

16. Find Plurks on Google

This program searches for plurks that were indexed by Google. It outputs URLs to indexed pages. It's written in Python and uses my xgoogle library.

17. Delete Plurks

This is a GreaseMonkey script that adds a "delete" button on individual plurk pages. This way you can delete your or other people plurks (if it's your thread) directly from the plurk page. Written in JavaScript.

18. Xgoogle

A Python library to Google services - Google Search, Google Sets, Google Translate and Sponsored Links.

19. Reddit Media

The old Reddit Media website that I created in 2007. No longer maintained. It was written in Perl.

20. Reddit River

The old Reddit River website for mobile devices. No longer maintained. It was written in Python.

21. Digpicz

This is the digpicz.com website that I created back in 2007. It got massive attention back then because Digg didn't have picture section then. It was written in Perl.

22. Picurls

Picurls.com is a picture aggregator much like popurls.com but for pics. Currently down for maintenance, will be soon up again. Written in PHP.

23. Bash Vi Editing Mode Cheat Sheet

Bash has two input modes - emacs and vi. This is vi input/editing mode keyboard shortcut cheat sheet.

24. Bash Emacs Editing Mode Cheat Sheet

Bash has two input modes - emacs and vi. This is emacs input/editing mode keyboard shortcut cheat sheet.

25. Bash History Cheat Sheet

This is the bash history cheat sheet. It summarizes everything there is to know about working efficiently with command line history in bash.

26. Screen Cheat Sheet

This is the screen terminal emulator cheat sheet. It lists the default keyboard shortcuts for working with screen.

27. Perl Special Variable Cheat Sheet

This is Perl predefined variable cheat sheet. It lists all the variables from `perldoc perlvar` with a concise description and some example usages. I created it when I was mastering Perl. I enjoy Perl golf and I wanted to know all of the variables.

28. Perl pack/unpack and printf/sprintf Cheat Sheet

This is Perl pack/unpack/printf/sprintf cheat sheet. The pack/unpack cheat sheet is on page one, and it lists the pack/unpack template parameters and what they do. The printf/sprintf cheat sheet is on page two, and it lists the printf/sprintf format specifiers and format attributes.

I created this when I was mastering what could be done with pack/unpack. I added printf/sprintf as I could never remember all the format specifiers.

29. Awk Cheat Sheet

AWK programming language cheat sheet.

30. Sed Cheat Sheet

This is sed cheat sheet. Sed is the Unix stream editor. If you don't know it, you don't know Unix.

31. Ed Cheat Sheet

This is ed cheat sheet. Ed is The Unix Text Editor.

One day when I was learning sed, I got interested if it originated from ed, which got me interested in ed itself. I find that cheat sheets are a great way to learn new topics and therefore I created this cheat sheet.

32. The New catonmat.net Website

I just started pushing code to the new catonmat.net repository. It's going to be a state of the art personal website from now on.

I have around 100 ideas for it, and the next big article series on catonmat is going to be "Designing the new catonmat.net together with me." You should subscribe to my blog here if you are interested!

If you have any questions, don't hesitate to ask in the comments!

This article is part of the article series "MIT Linear Algebra."
<- previous article next article ->

MIT Introduction to Linear AlgebraThis is the second post in an article series about MIT's course "Linear Algebra". In this post I will review lecture two on solving systems of linear equations by elimination and back-substitution. The other topics in the lecture are elimination matrices (also known as elementary matrices) and permutation matrices.

The first post covered the geometry of linear equations.

One of my blog readers, Seyed M. Mottaghinejad, had also watched this course and sent me his lecture notes. They are awesome. Grab them here: lecture notes by Seyed M. Mottaghinejad (includes .pdf, .tex and his document class).

Okay, here is the second lecture.

Lecture 2: Elimination with Matrices

Elimination is the way every software package solves equations. If the elimination succeeds it gets the answer. If the matrix A in Ax=b is a "good" matrix (we'll see what a good matrix is later) then the elimination will work and we'll get the answer in an efficient way. It's also always good to ask how can it fail. We'll see in this lecture how elimination decides if the matrix A is good or bad. After the elimination there is a step called back-substitution to complete the answer.

Okay, here is a system of equations. Three equations in three unknowns.

three equations, three unknowns

Remember from lecture one, that every such system can be written in the matrix form Ax=b, where A is the matrix of coefficients, x is a column vector of unknowns and b is the column vector of solutions (the right hand side). Therefore the matrix form of this example is the following:

Ax=b

For the elimination process we need the matrix A and the column vector b. The idea is very simple, first we write them down in the augmented matrix form A|b:

augmented matrix form a|b

Next we subtract rows from one another in such a way that the final result is an upper triangular matrix (a matrix with all the elements below the diagonal being zero).

So the first step is to subtract the first row multiplied by 3 from the second row. This gives us the following matrix:

elimination step one

The next step is to subtract the second row multiplied by 2 from the third row. This is the final step and produces an upper triangular matrix that we needed:

elimination step one

Now let's write down the equations that resulted from the elimination:

Ax=b after elimination

Working from the bottom up we can immediately find the solutions z, y, and x. From the last equation, z = -10/5 = -2. Now we put z in the middle equation and solve for y. 2y = 6 + 2z = 6 + 2(-2) = 6 - 4 = 2 => y = 1. And finally, we can substitute y and z in the first equation and solve for x. x = 2 - 2y - z = 2 - 2(1) - (-2) = 2.

We have found the solution, it's (x=2, y=1, z=-2). The process we used to find it is called the back-substitution.

The elimination would fail if taking a multiple of one row and adding to the next would produce a zero on the diagonal (and there would be no other row to try to exchange the failing row with).

The lecture continues with figuring out how to do the elimination by using matrices. In the first lecture we learned that a matrix times a column vector gave us a combination of the columns of the matrix. Similarly, a row times a matrix gives us a combination of the rows of the matrix.

Let's look at our first step of elimination again. It was to subtract 3 times the first row from the second row. This can be expressed as matrix multiplication (forget the column b for a while):

matrix elimination

Let's call the matrix on the right E as elimination matrix (or elementary matrix), and give it subscript E21 for making a zero in the resulting matrix at row 2, column 1.

The next step was twice the second row minus the third row:

matrix elimination

The matrix on the right is again an elimination matrix. Let's call it E32 for giving a zero at row 3, column 2.

But notice that these two operations can be combined:

matrix elimination

And we can write E32(E21A) = U. Now remember that matrix operations are associative, therefore we can change the parenthesis (E32E21)A = U. If we multiply (E32E21) we get a single matrix E that we will call the elimination matrix. What we have done is expressed the whole elimination process in matrix language!

Next, the lecture continues takes a step back and looks at permutation matrices. The question asked is "what matrix would exchange two rows of a matrix?" and "what matrix would exchange two columns of a matrix?"

Watch the lecture to find the answer to these questions!

Topics covered in lecture two:

  • [00:25] Main topic for today: elimination.
  • [02:35] A system with three equations and three unknowns: two linear equations, two unknowns
  • [03:30] Elimination process. Taking matrix A to U.
  • [08:35] Three pivots of matrix U.
  • [10:15] Relation of pivots to determinant of a matrix.
  • [10:40] How can elimination fail?
  • [14:40] Back substitution. Solution (x=2, y=1, z=-2).
  • [19:45] Elimination with matrices.
  • [21:10] Matrix times a column vector is a linear combination of columns the matrix.
  • [22:15] A row vector times a matrix is a linear combination of rows of the matrix.
  • [23:40] Matrix x column = column.
  • [24:10] Row x matrix = row.
  • [24:20] Elimination matrix for subtracting three times row one from row two.
  • [26:55] The identity matrix.
  • [30:00] Elimination matrix for subtracting two times row two from row three.
  • [32:40] E32E21A = U.
  • [37:20] Permutation matrices.
  • [37:30] How to exchange rows of a 2x2 matrix?
  • [37:55] Permutation matrix P to exchange rows of a 2x2 matrix.
  • [38:40] How to exchange columns of a 2x2 matrix?
  • [39:40] Permutation matrix P to exchange columns of a 2x2 matrix.
  • [42:00] Commutative law does not hold for matrices.
  • [44:25] Introduction to inverse matrices.
  • [47:10] E-1E = I.

Here are my notes of lecture two:

MIT Linear Algebra, Lecture 2: Elimination with Matrices
My notes of linear algebra lecture 2 on elimination with matrices.

Have fun with this lecture! The next post is going to be either on lectures three and four together or just lecture three. Lecture three will touch a bit more on matrix multiplication and then dive into the inverse matrices. Lecture four will cover A=LU matrix decomposition (also called factorization).

PS. This course is taught from Introduction to Linear Algebra textbook. Get it here:

There is a mysterious underground Perl organization that hardly anyone has heard of. Their primary mission is to discover new secret Perl operators but they are not allowed to talk about them. I managed to infiltrate this organization and steal some of their secrets. Here are 8 secret Perl operators that I retrieved from their mysterious hoard:

(to be honest, only the first is a real Perl operator, all others are syntactic tricks.)

The Spaceship Operator <=>

<=> is the spaceship operator. Most commonly it's used to sort a list of numbers. Here is an example:

my @numbers = (-59, 99, 87, 1900, 42, 1, -999, 30000, 0);
my @sorted = sort { $a <=> $b } @numbers;
print "@sorted\n";

# output: -999 -59 0 1 42 87 99 1900 30000

If you don't specify a block with the spaceship operator to sort() function, it will treat the numbers as strings and sort them asciibetically:

my @numbers = (-59, 99, 87, 1900, 42, 1, -999, 30000, 0);
my @sorted = sort @numbers;
print "@sorted\n";

# output: -59 -999 0 1 1900 30000 42 87 99

In general the spaceship operator is defined as following:

  • $a <=> $b is -1 if $a < $b.
  • $a <=> $b is 0 if $a == $b.
  • $a <=> $b is 1 if $a > $b.
  • $a <=> $b is undef if $a and $b are NaN.

The Eskimo Greeting Operator }{

The Eskimo greeting operator can be most frequently met in Perl one-liners.

For example, this one-liner uses the Eskimo greeting to emulate `wc -l` command and prints the number of lines in a file:

perl -lne '}{ print $.' file

Here the Eskimo greets the print function. To understand what happens here, you have to know what the -n command line option does. It causes Perl to assume the following loop around your program:

while (<>) {
  ...
}

Where `...` contains the code specified by the -e command line option. If the code specified is `}{ ...` then it causes the while loop to be closed with no actions to be done and only the `...` part gets executed.

Therefore the one-liner above is equivalent to:

while (<>) {
}
{
print $.
}

This just prints the special variable $. which is the number of input lines processed.

This can be extended further and we can have Eskimo greet code on both sides:

perl -lne 'code1 }{ code2'

Code1 gets executed within the loop and code2 after the loop is done:

while (<>) {
  code1
}
{
  code2
}

If you are interested in the topic of Perl one-liners, see the first part of my article "Perl One-Liners Explained".

The Goatse Operator =()=

The Goatse operator, as nasty as it may sound, doesn't do any nasty things. Instead it does a wonderful thing and causes an expression on the right to be evaluated in array context.

Here is an example,

my $str = "5 foo 6 bar 7 baz";
my $count =()= $str =~ /\d/g;
print $count;

This program prints 3 - the number of digits in $str. How does it do it? Let's deparse the 2nd line:

(my $count = (() = ($str =~ /\d/g)));

What happens here is that the expression ($str =~ /\d/g) gets assigned to the empty list (). Assigning to a list forces the list context. The whole (() = ($str =~ /\d/g)) thing gets evaluated in list context, but then it gets assigned to a scalar which causes it to get evaluated again in scalar context. So what we have is a list assignment in scalar context. The key thing to remember is that a list assignment in scalar context returns the number of elements on the right-hand side of the list assignment. In this example the right-hand side of the list assignment is ($str =~ /\d/g). This matches globally (/g flag) and finds 3 digits in $str. Therefore the result is 3.

The Turtle Operator "@{[]}"

I couldn't find the name of this operator therefore I decided to name it the turtle operator, because it looks a bit like a turtle, @ being the head, and {[]} being the shell.

This operator is useful for interpolating an array inside a string.

Compare these two examples:

print "these people @{[get_names()]} get promoted"

and

print "these people ", join " ",get_names(), " get promoted"

Clearly, the first example wins for code clarity.

More precisely, writing

print "@{[something]}"

is exactly the same as writing

print join $", something

The Inchworm Operator ~~

The inchworm operator can be used to force scalar context.

Here is an example with localtime() function. In scalar context localtime() returns human readable time, but in list context it returns a 9-tuple with various date elements.

$ perl -le 'print ~~localtime'
Mon Nov 30 09:06:13 2009

Here localtime was evaluated in scalar context, even though it was called within print that forces list context. It returned human readable date and time.

$ perl -le 'print localtime'
579301010913330

Here localtime returned a list of 9 elements and print function just printed them one after another. To really see that it's a list of 9 elements, let's use the turtle operator:

$ perl -le 'print "@{[localtime]}"'
5 13 9 30 10 109 1 333 0

The Inchworm-On-A-Stick Operator ~-

For numbers greater than 0, this operator decrements them by one. Example:

my $x = 5;
print ~-$x;

# prints 4

It works because ~-$x parses to (~(-$x)), which on a two-complement machine is effectively the same as $x-1.

The Spacestation Operator -+-

The spacestation operator turns a string starting with positive number into a number. Here are some examples:

print -+-"4zy"   # prints 4
print -+-'3.99'  # prints 3.99
print -+-'2e5'   # prints 200000

The Venus Operator 0+

It's named the Venus operator because the astronomical symbol for the planet Venus looks similar.

It does the same as the spacestation operator, it numifies a string, but it binds less tightly than spacestation. An example:

print 0+"4zy"  # prints 4

Share the secrets!

This article is part of the article series "MIT Linear Algebra."
<- previous article next article ->

MIT Introduction to Linear AlgebraThis is going to be my summary of the freely available* Linear Algebra course from MIT. I watched the lectures of this course in the summer of 2008. This was not the first time I learned linear algebra. I had already had two terms of linear algebra when I studied physics back in 2004. But it was not enough for a curious mind like mine. I wanted to see how it was taught at the world's best university.

The rationale of why I am posting these mathematics lectures on my programming blog is because I believe that if you want to be a great programmer and work on the most exciting and world changing problems, then you have to know linear algebra. Larry and Sergey wouldn't have created Google if they didn't know linear algebra. Take a look at this publication if you don't believe me "Linear Algebra Behind Google." Linear algebra has also tens and hundreds of other computational applications, to name a few, data coding and compression, pattern recognition, machine learning, image processing and computer simulations.

The course contains 35 lectures. Each lecture is 40 minutes long, but you can speed them up and watch one in 20 mins. The course is taught by no other than Gilbert Strang. He's the world's leading expert in linear algebra and its applications and has helped the development of Matlab mathematics software. The course is based on working out a lot of examples, there are almost no theorems or proofs.

The textbook used in this course is Introduction to Linear Algebra by Gilbert Strang:

I'll write the summary in the same style as I did my summary of MIT Introduction to Algorithms. I'll split up the whole course in 30 or so posts, each post will contain one or more lectures together with my comments, my scanned notes, embedded video lectures and a timeline of the topics in the lecture. But, not to flood my blog with just mathematics, I will write one or two programming posts in between. You should subscribe to my posts through RSS here.

* The whole course is available at MIT's Open Course Ware: Course 18.06, Linear Algebra.

I'll review the first lecture today.

Lecture 1: The Geometry of Linear Equations

The first lecture starts with Gilbert Strang stating the fundamental problem of linear algebra, which is to solve systems of linear equations. He proceeds with an example.

The example is a system of two equations in two unknowns:

two linear equations, two unknowns

There are three ways to look at this system. The first is to look at it a row at a time, the second is to look a column at a time, and the third is use the matrix form.

If we look at this equation a row at a time, we have two independent equations 2x - y = 0 and -x + 2y = 3. These are both line equations. If we plot them we get the row picture:

row picture of two equations

The row picture shows the two lines meeting at a single point (x=1, y=2). That's the solution of the system of equations. If the lines didn't intersect, there would have been no solution.

Now let's look at the columns. The column at the x's is (2, -1), the column at y's is (-1, 2) and the column at right hand side is (0, 3). We can write it down as following:

column picture of two equations

This is a linear combination of columns. What this tells us is that we have to combine the right amount of vector (2, -1) and vector (-1, 2) to produce the vector (0, 3). We can plot the vectors in the column picture:

column picture of two equations

If we take one green x vector and two blue y vectors (in gray), we get the red vector. Therefore the solution is again (x=1, y=2).

The third way to look at this system entirely through matrices and use the matrix form of the equations. The matrix form in general is the following: Ax = b where A is the coefficient matrix, x is the unknown vector and b is the right hand side vector.

How to solve the equations written in matrix form will be discussed in the next lectures. But I can tell you beforehand that the method is called Gauss elimination with back substitution.

For this two equations, two unknowns system, the matrix equation Ax=b looks like this:

Matrix form of two equations in two unknowns

The next example in the lecture is a system of three equations in three unknowns:

three equations, three unknowns

We can no longer plot it in two dimensions because there are three unknowns. This is going to be a 3D plot. Since the equations are linear in unknowns x, y, z, we are going to get three planes intersecting at a single point (if there is a solution).

Here is the row picture of 3 equations in 3 unknowns:

Row picture of three equations

The red is the 2x - y = 0 plane. The green is the -x + 2y - z = -1 plane, and the blue is the -3y + 4z = 4 plane.

Notice how difficult it is to spot the point of intersection? Almost impossible! And all this of going one dimension higher. Imagine what happens if we go 4 or higher dimensions. (The intersection is at (x=0, y=0, z=1) and I marked it with a small white dot.)

The column picture is almost as difficult to understand as the row picture. Here it is for this system of 3 equations in 3 unknowns:

Column picture of three equations

The first column (2, -1, 0) is red, the second column (-1, 2, -3) is green, the fourth column (0, -1, 4) is blue, and the result (0, -1, 4) is gray.

Again, it's pretty hard to visualize how to manipulate these vectors to produce the solution vector (0, -1, 4). But we are lucky in this particular example. Notice that if we take none of red vector, none of green vector and one of blue vector, we get the gray vector! That is, we didn't even need red and green vectors!

This is all still tricky, and gets much more complicated if we go to more equations with more unknowns. Therefore we need better methods for solving systems of equations than drawing plane or column pictures.

The lecture ends with several questions:

  • Can Ax = b be solved for any b?
  • When do the linear combination of columns fill the whole space?,
  • What's the method to solve 9 equations with 9 unknowns?

The examples I analyzed here are also carefully explained in the lecture, you're welcome to watch it:

Topics covered in lecture one:

  • [00:20] Information on textbook and course website.
  • [01:05] Fundamental problem of linear algebra: solve systems of linear equations.
  • [01:15] Nice case: n equations, n unknowns.
  • [02:20] Solving 2 equations with 2 unknowns two linear equations, two unknowns
  • [02:54] Coefficient matrix.
  • [03:35] Matrix form of the equations. Ax=b.
  • [04:20] Row picture of the equations - lines.
  • [08:05] Solution (x=1, y=2) from the row picture.
  • [08:40] Column picture of the equations - 2 dimensional vectors.
  • [09:50] Linear combination of columns.
  • [12:00] Solution from the column picture x=1, y=2.
  • [12:05] Plotting the columns to produce the solution vector.
  • [15:40] Solving 3 equations with 3 unknowns three equations, three unknowns
  • [16:46] Matrix form for this 3x3 equation.
  • [17:30] Row picture - planes.
  • [22:00] Column picture - 3 dim vectors.
  • [24:00] Solution (x=0, y=0, z=1) from the column picture by noticing that z vector is equal to b vector.
  • [28:10] Can Ax=b be solved for every b?
  • [28:50] Do the linear combinations of columns fill the 3d space?
  • [32:30] What if there are 9 equations and 9 unknowns?
  • [36:00] How to multiply a matrix by a vector? Two ways.
  • [36:40] Ax is a linear combination of columns of A.

Here are my notes of lecture one. Sorry about the handwriting. It seems that I hadn't written much at that time and the handwriting had gotten really bad. But it gets better with each new lecture. At lecture 5 and 6 it will be as good as it gets.

MIT Linear Algebra, Lecture 1: The Geometry of Linear Equations
Notes of Linear Algebra lecture 1 on The Geometry of Linear Equations.

Have fun with this lecture! The next post is going to be about a systematic way to find a solution to a system of equations called elimination.

PS. This course is taught from Introduction to Linear Algebra textbook. Get it here:

Feedburner used to have a really nice RSS subscriber growth graph. I loved it. But then one day they were acquired by Google and they changed their nice chart to an interactive flash thing that was slow and looked just awful.

Here is how awesome the graph used to look like:


This graph was taken from my "one year of blogging" post.

And how it looks now:

Crappy Feedburner Stats
This graph was taken from Feedburner stats dashboard today.
Choose "Show stats for" -> "all time" to generate this graph.

This critter takes 35MB of RAM, responds in 4 seconds and worst of all, looks very, very ugly. I don't know why would anyone replace a nice 6.5KB image with a 35MB monster.

I don't want to see this ugliness anymore, therefore I'll create a Perl program that generates the awesome graph they used to have. I'll write my thought process in creating this program in this post. Here it goes.

First I need to get the data somehow. I remember they had some kind of an API to get the data. A quick Google search for feedburner api returns this link Feedburner Awareness API. Ya, that's it. This is the documentation of their API.

Accessing the following URL gets me subscriber count data from July 1, 2007 to November 17, 2009:

http://feedburner.google.com/api/awareness/1.0/GetFeedData?uri=catonmat&dates=2007-07-01,2009-11-17

Excellent, now I can write the Perl program. It will need to parse the XML data, draw the chart and save the image to a file.

Hmm, how should I invoke my program? Ok, here is how:

$ generate_feedburner_graph.pl <<strong>feed name</strong>> [<<strong>start date</strong>> [<<strong>end date</strong>>]]

# if <strong>end date</strong> is not specified, it's set to today.
# if <strong>start date</strong> is not specified, it's set to first
# day when the feed had at least one subscriber.

This program will use LibGD to generate the image. It will save it to a file called feed_name-start_date-end_date.png.

Now I need to find the colors used in the awesome feedburner graph. For this purpose I'll use ColorZilla Firefox plugin. The green one is #95CF9C, the background is #F2F8FC, the light grid is #CCCECE, the one that separates the green area from background is #687279, and the x-y axis are #808080.

Alright, now I have everything I need to create the program.

... Several hours later ...

Done!

One thing I forgot to mention is that you will need DejaVuSans TrueType font to run this program (it uses it to draw text). Download it and put the DejaVuSans.ttf in the same directory as the program.

#!/usr/bin/perl
#
# Feedburner graph generator
# Version 1.0
#

use warnings;
use strict;

use WWW::Mechanize;
use List::Util 'max';
use XML::Simple;
use POSIX;
use GD;

# This is the API URL that returns XML data with feed statistics by day.
my $feedburner_url = "http://feedburner.google.com/api/awareness/1.0/GetFeedData?uri=%s&dates=%s,%s";

# This function prints the usage and terminates the program.
sub usage {
    printf "Usage: %s <feed name> [<start date> [<end date>]]\n", $0;
    print "Parameters:\n";
    print "<feed name>  - your feed name, for example 'catonmat'\n";
    print "<start date> - start date (YYYY-MM-DD)\n";
    print "<end date>   - end date (YYYY-MM-DD), today if not specified\n";
    exit(1);
}

# This function checks if DejaVuSans font is present, if not
# it prints the instructions on how to download and terminates the program.
sub check_dejavu_sans {
    unless (-e 'DejaVuSans.ttf') {
        print "Please download DejaVu fonts and put DejaVuSans.ttf file in\n";
        print "the same directory as this program.\n";
        print "http://dejavu-fonts.org/wiki/index.php?title=Download\n";
        exit(1);
    }
}

# Given year, month, day from `struct tm` (man 3 mktime),
# it constructs a YYYY-MM-DD string.
sub format_date {
    my ($y, $m, $d) = @_;
    return sprintf("%04d-%02d-%02d", $y+1900, $m+1, $d);
}

# Given the `struct tm` (man 3 mktime) as a 9-list (perldoc -f localtime),
# it constructs a YYYY-MM-DD string.
sub yyyymmdd_from_9list {
    my ($y, $m, $d) = @_[5,4,3];
    return format_date $y, $m, $d;
}

# This function returns a YYYY-MM-DD string for today.
sub today {
    return yyyymmdd_from_9list localtime
}

# This function constructs the 9-list (perldoc -f localtime) for a 
# date that was $months_ago months ago.
sub months_ago {
    my $months_ago = shift;
    my @date = @_;
    $date[4] -= $months_ago;
    return localtime mktime @date;
}

# Given feed data from feedburner's api (array of hashrefs), it finds
# the first date that had non-zero circulation.
# If no such date exists, it returns undef.
sub find_first_nonzero {
    my @feed_data = @_;
    return if $feed_data[0]->{circulation} != 0;
    my $prev_item;
    for my $item (@feed_data) {
        return $prev_item if $item->{circulation};
        $prev_item = $item;
    }
    return
}

# Given feed's name, this function finds the first date the
# feed had some subscribers, i.e., feed's start date.
sub find_start_date {
    my $feed = shift;
    print "Finding feed's start date...\n";
    my @ago = months_ago 6, localtime;
    my $end_date = today();
    while (1) {
        my $start_date = format_date @ago[5,4,3];

        print "Trying $start_date as start date...\n";
        my @feed_data = get_feed_data($feed, $start_date, $end_date);
        my $non_zero = find_first_nonzero(@feed_data);
        if ($non_zero) {
            print "Found $non_zero->{date} as start date!\n";
            return $non_zero->{date};
        }

        $end_date = yyyymmdd_from_9list @ago;
        @ago = months_ago 6, @ago;
    }
}

# This function returns an array of hashrefs of feeds data.
# Each hash contains 'reach', 'hits', 'date', and 'circulation' keys.
sub get_feed_data {
    my $raw_feed_data = get_raw_feed_data(@_);
    my $feed_data = XML::Simple->new->XMLin($raw_feed_data);
    if ($feed_data->{stat} ne "ok") {
        die $feed_data->{err}{msg}
    }
    return @{$feed_data->{'feed'}{'entry'}};
}

# This function formats the $feedburner_url and uses WWW::Mechanize
# to get the feed data via feedburner's API.
sub get_raw_feed_data {
    my ($feed, $start_date, $end_date) = @_;
    my $url = sprintf($feedburner_url, $feed, $start_date, $end_date);
    return WWW::Mechanize->new(agent => 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1.5) Gecko/20091102 Firefox/3.5.5')->get($url)->content;
}

# This function drops feed items when they can't fit in graph's width.
sub drop_data {
    my ($width, @data) = @_;
    my $len = $#data;
    my $delta = @data - $width;
    my @drop = map { int($len / $delta * $_) } 1..$delta;
    splice @data, $_, 1 for reverse @drop;
    return @data;
}

# This function duplicates feed items when there are not enough items
# to fill the whole graph.
sub dupe_data {
    my ($width, @data) = @_;
    my $len = $#data;
    my $delta = $width - @data;
    my @dupe = map { int($len / $delta * $_) } 1..$delta;
    splice @data, $_, 0, $data[$_] for reverse @dupe;
    return @data;
}

# This function draws the outline of the graph box where the green
# lines are drawn.
sub draw_outline {
    my ($gd, $grid, $xy, $bg) = @_;
    $gd->rectangle(40, 4, 482, 100, $grid);
    $gd->filledRectangle(41, 5, 481, 99, $bg);
    $gd->line(40, 4, 40, 100, $xy);
    $gd->line(38, 100, 482, 100, $xy);
}

# This function draws the grid lines.
sub draw_grid {
    my ($gd, $xy, $grid) = @_;

    # horizontal
    $gd->line(41, 26, 482, 26, $grid);
    $gd->line(38, 26, 40, 26, $xy);
    $gd->line(41, 63, 482, 63, $grid);
    $gd->line(38, 63, 40, 63, $xy);

    # vertical
    for (my $x = 77; $x <= 442; $x += 73) {
        $gd->line($x, 4, $x, 99, $grid);
        $gd->line($x, 100, $x, 102, $xy);
    }
}

# This function saves the $gd image to a file named
# "feed_name-start_date-end_date.png"
sub save_image {
    my ($gd, $feed, $start_date, $end_date, @data) = @_;

    my $filename = "$feed-$start_date-$end_date.png";
    $filename =~ s|/|_|g;

    open my $fh, '>', $filename or die $!;
    print $fh $gd->png;
    close $fh;

    print "Done. Image written to $filename\n";
}

# This function draws the date thingies on the x axis.
sub draw_date {
    my ($gd, $item, $text_color, $x) = @_;
    my @mons = qw/Nul Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec/;

    my ($y, $m, $d) = $item->{date} =~ /(\d{4})-(\d{2})-(\d{2})/;
    $m = $mons[$m];
    my $text1 = sprintf("%s-", $m);
    my $text2 = sprintf("%s-%d", $m, $y);

    my @bounds = GD::Image->stringTTF($text_color, './DejaVuSans.ttf', 7, 0, 0, 0, $text1);
    my $offset = $bounds[4];

    $gd->stringTTF($text_color, './DejaVuSans.ttf', 7, 0, $x-$offset+2, 114, $text2);
}

# This function draws the feed usage numbers on the y axis.
sub draw_count {
    my ($gd, $count, $text_color, $y) = @_;

    my $text = int $count;

    my @bounds = GD::Image->stringTTF($text_color, './DejaVuSans.ttf', 7, 0, 0, 0, $text);
    my $width = $bounds[4] - $bounds[6];

    $gd->stringTTF($text_color, './DejaVuSans.ttf', 7, 0, 34-$width, $y+4, $text);
}

# This function creates the GD image and draws everything.
sub draw_feedburner_image {
    my ($feed, $start_date, $end_date, @data) = @_;

    print "Creating the awesome feedburner image.\n";

    my $gd = GD::Image->new(490, 120, 1);
    my $white  = $gd->colorAllocate(0xff, 0xff, 0xff);
    my $green  = $gd->colorAllocate(0x95, 0xcf, 0x9c);
    my $bg     = $gd->colorAllocate(0xf2, 0xf8, 0xfc);
    my $grid   = $gd->colorAllocate(0xcc, 0xce, 0xce);
    my $xy     = $gd->colorAllocate(0x80, 0x80, 0x80);
    my $alphagrid = $gd->colorAllocateAlpha(0xcc, 0xce, 0xce, 0x30);
    my $border = $gd->colorAllocate(0x68, 0x72, 0x79);
    my $text   = $gd->colorAllocate(0, 0 , 0);

    $gd->alphaBlending(1);
    $gd->filledRectangle(0, 0, 489, 119, $white);
    $gd->setAntiAliased($border);

    draw_outline($gd, $grid, $xy, $bg);

    my $t_height = 90;
    my $t_width = 441;
    my $max_circulation = max map { $_->{circulation} } @data;

    my $compress_factor = @data/$t_width;

    if ($compress_factor > 1) {
        @data = drop_data($t_width, @data);
    }
    elsif ($compress_factor < 1) {
        @data = dupe_data($t_width, @data);
    }

    my ($prev_x1, $prev_y1);
    my $x = 41;
    my %x_markers = (77 => 1, 150 => 1, 223 => 1, 296 => 1, 369 => 1, 442 => 1);
    for my $item (@data) {
        my $height = int($item->{circulation}/$max_circulation * $t_height);
        my ($x1, $y1, $x2, $y2) = ($x, 99, $x, 99-$height);
        $gd->line($x1, $y1, $x2, $y2, $green);
        if ($prev_x1) {
            $gd->line($prev_x1, $prev_y1, $x2, $y2, gdAntiAliased);
        }
        ($prev_x1, $prev_y1) = ($x1, $y2);

        if (exists $x_markers{$x}) {
            draw_date($gd, $item, $text, $x)
        }

        $x++;
    }

    draw_grid($gd, $xy, $alphagrid);
    draw_count($gd, 0,  $text, 100);
    draw_count($gd, $max_circulation * 74/90, $text, 26);
    draw_count($gd, $max_circulation * 37/90, $text, 63);
    save_image($gd, $feed, $start_date, $end_date);
}

# The main function, does everything.
sub main {
    check_dejavu_sans;

    my $feed = shift || usage();
    my $start_date = shift || find_start_date($feed);
    my $end_date = shift || today();

    unless ($start_date =~ /^\d{4}-\d{2}-\d{2}$/) {
        die "Invalid start date. Format: YYYY-MM-DD."
    }
    unless ($end_date =~ /^\d{4}-\d{2}-\d{2}$/) {
        die "Invalid end date. Format: YYYY-MM-DD."
    }

    print "Getting feed data for $feed from $start_date to $end_date\n";
    my @feed_data = get_feed_data($feed, $start_date, $end_date);

    draw_feedburner_image($feed, $start_date, $end_date, @feed_data);
}

main @ARGV;

Download: generate_feedburner_graph.perl

Let's test run it.

$ generate_feedburner_graph.pl catonmat
Finding feed's start date...
Trying 2009-05-17 as start date...
Trying 2008-11-17 as start date...
Trying 2008-05-17 as start date...
Trying 2007-11-17 as start date...
Trying 2007-05-17 as start date...
Found 2007-07-15 as start date!
Getting feed data for catonmat from 2007-07-15 to 2009-11-17
Creating the awesome feedburner image.
Done. Image written to catonmat-2007-07-15-2009-11-17.png

Here is the result:

catonmat feed statistics from 2007-07-15 to 2009-11-17
catonmat.net feed statistics from 2007-07-15 to 2009-11-17.

This looks divine. I love it!!!

As I was writing this I had the coolest idea to make a set of tools for probloggers. I added this idea to my idea-list and will try to make it happen. This tool could be the first in problogger tool suite!

Download "generate_feedburner_graph.pl":

Download: generate_feedburner_graph.perl
Downloaded: 2843 times.
Download url: http://www.catonmat.net/download/generate_feedburner_graph.perl

And finally, help me reach 10,000 subscribers! If you haven't yet subscribed, subscribe to my blog!