Linear Algebra January 04, 2010

MIT Linear Algebra, Lecture 4: A=LU Factorization

<- previous article next article ->

This is the fourth post in an article series about MIT's course "Linear Algebra". In this post I will review lecture four on factorizing a matrix A into a product of a lower-triangular matrix L and an upper-triangular matrix U, or in other words A=LU. The lecture also shows how to find the inverse of matrix product A·B, how to find the inverse of transposed matrix AT, and introduces permutation matrices.

Here are the previous lectures:

Lecture 4: A=LU Factorization

Lecture four starts with a small review of inverse matrices. Remember from previous lecture that the inverse of matrix A is matrix A-1 such that multiplying them together A·A-1 produces the identity matrix I.

Another key fact to remember is that matrix multiplication is associative (can change parenthesis) and that for square matrices the right-inverse is also the left-inverse A·A-1 = A-1·A = I.

Lecture then continues with finding the inverse of matrix product A·B. The answer is found by reasoning what should we multiply A·B to get the identity matrix. Let's try B-1·A-1. If this is the inverse of A·B, then multiplying it with A·B from the left and right sides should produce the identity matrix I. Let's test this.

From the right side: (A·B)·(B-1·A-1). Since we can rearrange parenthesis, this is the same as A·(B·B-1)·A-1. But B·B-1 is identity, therefore A·(B·B-1)·A-1 = A·I·A-1 = A·A-1 = I. Seems good so far.

Now the left side: (B-1·A-1)·(A·B). We rearrange parenthesis again, B-1·(A-1·A)·B and just like in the previous example, A-1·A is identity and B-1·I·B is B-1·B = I.

We have found the inverse of (A·B). It's (B-1·A-1).

Next, while we're at it, the lecture continues with finding the inverse of transposed matrix AT. It's found by reasoning again. Let's first look at the equation A·A-1 = I. We can transpose both sides of this equation and it will still hold. If we transpose the right side, we get the same identity matrix I, because IT=I. But what about the left-hand side? Transposing the left-hand side we get (A·A-1)T = (A-1)T·AT. Now left-hand side is still equal to right-hand side, therefore (A-1)T·AT = I. But from this equation it's instantly obvious that the inverse of AT is (A-1)T.

We can therefore note and remember that (AT)-1 = (A-1)T.

Finally the lecture moves on to today's key topic of A=LU decomposition. As usual, the topic is introduced by an example.

Let's take a look at this 2x2 matrix A:

And let's try to find the elementary matrix E21 that will eliminate the element at row 2, column 1. Multiplying the first row by 4 and subtracting it from the second row will produce a zero at 2, 1.

But look at the right-hand side. It's the upper-triangular matrix U (all elements below the diagonal are 0) that we were looking for!

Now if we multiply both sides by the inverse of E21, we get (E21)-1·E21·A = (E21)-1·U. But (E21)-1·E21 is identity, therefore A = (E21)-1·U. From this equation it's instantly obvious that the lower-triangular matrix L is nothing else but (E21)-1!

Another form of factorization is A = LDU, where D is the diagonal matrix that contains the pivots. For this example it would be:

Now imagine that we have some arbitrary 3x3 matrix A. To reduce matrix A into upper-triangular matrix U, we first eliminate the element at position 2,1, that is, we multiply A (from the left side) with elimination matrix E21. Then we eliminate the element at 3,1 by multiplying (from the left side) with elimination matrix E31. Finally we eliminate the element at 3,2 by multiplying with elimination matrix E32:

E32·E31·E21·A = U

It follows from this equation that the lower-triangular matrix is the inverse of E32·E31·E21, that is,

L = (E32·E31·E21)-1 = E21-1·E31-1·E32-1

We have found the factorization of a 3x3 matrix:

A = E21-1·E31-1·E32-1·U = L·U

The algorithm for finding matrices L and U should now be clear. First do the elimination to find matrix U, then invert the product of elimination matrices used for finding U to find L. Actually it's even easier, you don't even need to keep elimination matrices E, or find their inverse. You can just keep the multipliers used in each elimination step. Please see the video to find out how it works.

Next, the cost of elimination algorithm is discussed. How many steps does it take to go from matrix A to U? Turns out the elimination is O(n3) process (more carefully, it takes around n3/3 steps). (See my notes on MIT's Introduction to Algorithms for more info about O-notation.)

Since we sometimes need to do row exchanged to do elimination, the last ten minutes of lecture are spent on permutation matrices. Remember from lecture two that multiplying a matrix from the left side with a permutation matrix exchanges its rows.

The key facts about permutation matrices P are:

• The inverse of P is its transpose: P-1 = PT.
• There are n! permutation matrices for nxn matrices.

Here is the video of the fourth lecture:

Topics covered in lecture four:

• [01:20] What is the inverse of A·B?
• [05:20] What is the inverse of AT?
• [09:20] A=LU factorization for 2x2 matrices.
• [13:20] A=LDU factorization.
• [14:50] A=LU decomposition for 3x3 matrices.
• [18:30] Why is finding matrix L trivial?
• [27:25] How many operations does elimination take?
• [42:20] Permutation matrices.
• [47:20] What is the inverse of a permutation matrix?
• [48:20] How many P matrices for 3x3 and 4x4 matrices?

Here are my notes of lecture four:

My notes of linear algebra lecture 4 on the A=LU factorization.

Have fun with this lecture! The next post is going to be about transpose matrices, symmetric matrices, vector spaces, their subspaces and column spaces of matrices.

PS. This course is taught from Introduction to Linear Algebra textbook. Get it here:

Unix Shell December 23, 2009

A Unix Utility You Should Know About: lsof

<- previous article next article ->

This is the third post in the article series about Unix and Linux utilities that you should know about. In this post I will take you through the useful lsof tool. If netcat was called the Swiss Army Knife of Network Connections, then I'd call lsof the Swiss Army Knife of Unix debugging.

Lsof follows Unix philosophy closely. It does just one task and it does it perfectly -- it lists information about files opened by processes. An open file may be a regular file, a directory, a NFS file, a block special file, a character special file, a shared library, a regular pipe, a named pipe, a symbolic link, a socket stream, an Internet socket, a UNIX domain socket, and many others. Since almost everything in Unix is a file, you can imagine how incredibly useful lsof is!

See the first post on pipe viewer for the introduction to this article series. If you are interested in articles like this one, I suggest that you subscribe to my rss feed to receive my future posts automatically!

How to use lsof?

In this article I will try to present lsof based on as many use cases as I can think of. Let's start with the simplest (that you probably already know) and proceed to more complicated ones.

List all open files.

# lsof

Running lsof without any arguments lists all open files by all processes.

Find who's using a file.

# lsof /path/to/file

With an argument of a path to a file, lsof lists all the processes, which are using the file in some way.

You may also specify several files, which lists all the processes, which are using all the files:

# lsof /path/to/file1 /path/to/file2

Find all open files in a directory recursively.

# lsof +D /usr/lib

With the +D argument lsof finds all files in the specified directory and all the subdirectories.

Note that it's slower than the usual version with grep:

# lsof | grep '/usr/lib'

It's slower because +D first finds all the files and only then does the output.

List all open files by a user.

# lsof -u pkrumins

The -u option (think user) limits output of files opened only by user pkrumins.

You can use comma separated list of values to list files open by several users:

# lsof -u rms,root

This will list all the files that are open by users rms and root.

Another way to do the same is by using the -u option twice:

# lsof -u rms -u root

Find all open files by program's name.

# lsof -c apache

The -c option selects the listing of files for processes whose name begins with apache.

# lsof | grep foo

You can now write the shorter version:

# lsof -c foo

In fact, you can specify just the beginning part of the process name you're looking for:

# lsof -c apa

This will list all the open files by a processes whose starts with apa.

You can also specify several -c options to output open files by several processes:

# lsof -c apache -c python

This will list all open files by apache and python.

List all open files by a user OR process.

# lsof -u pkrumins -c apache

Lsof options can be combined. The default is to OR between options. It means it will combine outputs of -u pkrumins and -c apache producing a listing of all open files by pkrumins and all open files by apache.

List all open files by a user AND process.

# lsof -a -u pkrumins -c bash

Notice the -a option. It combines the options with AND. The output listing is files opened by bash, which is run under pkrumins user.

List all open files by all users EXCEPT root.

# lsof -u ^root

Notice the ^ character before root username. It negates the match and causes lsof print all open files by all users who are not root.

List all open files by the process with PID.

# lsof -p 1

The -p option (think PID) filters out open files by program's id.

Remember that you can select multiple PIDs by either comma separating the list or using multiple -p arguments:

# lsof -p 450,980,333

This selects processes with PIDs 450, 980 and 333.

List all open files by all the processes EXCEPT process with PID.

# lsof -p ^1

Here the negation operator ^ is used again. It inverts the list and does not include process with PID 1.

List all network connections.

# lsof -i

Lsof with -i option lists all processes with open Internet sockets (TCP and UDP).

List all TCP network connections.

# lsof -i tcp

The -i argument can take several options, one of them is tcp. The tcp option forces lsof to list only processes with TCP sockets.

List all UDP network connections.

# lsof -i udp

The udp option causes lsof to list processes with UDP sockets.

Find who's using a port.

# lsof -i :25

The :25 option to -i makes lsof find processes using TCP or UDP port 25.

You may also use service port name (found in /etc/services) rather than port number:

# lsof -i :smtp

Find who's using a specific UDP port.

# lsof -i udp:53

Similarly, to find who's using a TCP port, use:

# lsof -i tcp:80

Find all network activity by user.

# lsof -a -u hacker -i

Here the -a option combines -u and -i to produce listing of network file usage by user hacker.

List all NFS (Network File System) files.

# lsof -N

This option is easy to remember because -N is NFS.

List all Unix domain socket files.

# lsof -U

This option is also easy to remember because -U is Unix.

List all files for processes with a specific group id.

# lsof -g 1234

Process groups are used to logically group processes. This example finds all files opened by processes with PGID 1234.

List all files associated with specific file descriptors.

# lsof -d 2

This lists all files that have been opened as file descriptor 2.

You may also specify ranges of file descriptors:

# lsof -d 0-2

This would list all files with file descriptors 0, 1 and 2.

There are also many special values, such as mem, that lists memory-mapped files:

# lsof -d mem

Or txt for programs loaded in memory and executing:

# lsof -d txt

Output PIDs of processes using some resource.

# lsof -t -i

The -t option outputs only PIDs of processes. Used together with -i it outputs PIDs of all processes with network connections. It's easy to kill all processes that use network:

# kill -9 lsof -t -i

Repeat listing files.

# lsof -r 1

The -r option makes lsof repeatedly list files until interrupted. Argument 1 means repeat the listing every 1 second. This option is best combined with a narrower query such as monitoring user network file activity:

# lsof -r 1 -u john -i -a

How to install lsof?

Lsof comes preinstalled on many Unix systems. If your system doesn't have it, try to install it from the source.

BSD supplies its own utility that does similar things, it's called fstat.

For the full documentation of lsof see the man lsof page or type lsof -h for a small cheat sheet.

Have fun with lsof!

Linear Algebra December 16, 2009

MIT Linear Algebra, Lecture 3: Matrix Multiplication and Inverse Matrices

<- previous article next article ->

This is the third post in an article series about MIT's course "Linear Algebra". In this post I will review lecture three on five ways to multiply matrices, inverse matrices and an algorithm for finding inverse matrices called Gauss-Jordan elimination.

The first lecture covered the geometry of linear equations and the second lecture covered the matrix elimination.

Here is lecture three.

Lecture 3: Matrix Multiplication and Inverse Matrices

Lecture three starts with five ways to multiply matrices.

The first way is the classical way. Suppose we are given a matrix A of size mxn with elements aij and a matrix B of size nxp with elements bjk, and we want to find the product A·B. Multiplying matrices A and B will produce matrix C of size mxp with elements .

Here is how this sum works. To find the first element c11 of matrix C, we sum over the 1st row of A and the 1st column of B. The sum expands to c11 = a11·b11 + a12·b21 + a13·b31 + ... + a1n·bn1. Here is a visualization of the summation:

We continue this way until we find all the elements of matrix C. Here is another visualization of finding c23:

The second way is to take each column in B, multiply it by the whole matrix A and put the resulting column in the matrix C. The columns of C are combinations of columns of A. (Remember from previous lecture that a matrix times a column is a column.)

For example, to get column 1 of matrix C, we multiply A·(column 1 of matrix B):

The third way is to take each row in A, multiply it by the whole matrix B and put the resulting row in the matrix C. The rows of C are combinations of rows of B. (Again, remember from previous lecture that a row times a matrix is a row.)

For example, to get row 1 of matrix C, we multiply row 1 of matrix A with the whole matrix B:

The fourth way is to look at the product of A·B as a sum of (columns of A) times (rows of B).

Here is an example:

The fifth way is to chop matrices in blocks and multiply blocks by any of the previous methods.

Here is an example. Matrix A gets subdivided in four submatrices A1 A2 A3 A4, matrix B gets divided in four submatrices B1 B2 B3 B4 and the blocks get treated like simple matrix elements.

Here is the visualization:

Element C1, for example, is obtained by multiplying A1·B1 + A2·B3.

Next the lecture proceeds to finding the inverse matrices. An inverse of a matrix A is another matrix, such that A-1·A = I, where I is the identity matrix. In fact if A-1 is the inverse matrix of a square matrix A, then it's both the left-inverse and the right inverse, i.e., A-1·A = A·A-1 = I.

If a matrix A has an inverse then it is said to be invertible or non-singular.

Matrix A is singular if we can find a non-zero vector x such that A·x = 0. The proof is easy. Suppose A is not singular, i.e., there exists matrix A-1. Then A-1·A·x = 0·A-1, which leads to a false statement that x = 0. Therefore A must be singular.

Another way of saying that matrix A is singular is to say that columns of matrix A are linearly dependent (one ore more columns can be expressed as a linear combination of others).

Finally, the lecture shows a deterministic method for finding the inverse matrix. This method is called the Gauss-Jordan elimination. In short, Gauss-Jordan elimination transforms augmented matrix (A|I) into (I|A-1) by using only row eliminations.

Please watch the lecture to find out how it works in all the details:

Topics covered in lecture three:

• [00:51] The first way to multiply matrices.
• [04:50] When are we allowed to multiply matrices?
• [06:45] The second way to multiply matrices.
• [10:10] The third way to multiply matrices.
• [12:30] What is the result of multiplying a column of A and a row of B?
• [15:30] The fourth way to multiply matrices.
• [18:35] The fifth way to multiply matrices by blocks.
• [21:30] Inverses for square matrices.
• [24:55] Singular matrices (no inverse matrix exists).
• [30:00] Why singular matrices can't have inverse?
• [36:20] Gauss-Jordan elimination.
• [41:20] Gauss-Jordan idea A·I -> I·A-1.

Here are my notes of lecture three:

My notes of linear algebra lecture 3 on the matrix multiplication and inverses.

Have fun with this lecture! The next post is going to be about the A=LU matrix decomposition (also known as factorization).

PS. This course is taught from Introduction to Linear Algebra textbook. Get it here:

Computer Science December 14, 2009

Recursive Regular Expressions

The regular expressions we use in our daily lives are actually not that "regular." Most of the languages support some kind of extended regular expressions that are computationally more powerful than the "regular" regular expressions as defined by the formal language theory.

For instance, the so often used capture buffers add auxiliary storage to the regular expressions that allow them to match an arbitrary pattern repeatedly. Or look-ahead assertions that allow the regular expression engine to peek ahead before it making a decision. These extensions make regular expressions powerful enough to describe some context-free grammars.

The Perl programming language has an especially rich with regex engine. One of the engine's features is the lazy regular subexpressions. The lazy regular subexpressions are expressed as (??{ code }), where the "code" is arbitrary Perl code that gets executed when the moment this subexpression may match.

This allows us to construct something really interesting - we can define a regular expression that has itself in the "code" part. The result is a recursive regular expression!

One of the classical problems that a regular expression can't match is the language 0n1n, i.e., a string with a number of zeroes followed by an equal number of ones. Surprisingly, using the lazy regular subexpressions this problem becomes tractable!

Here is a Perl regular expression that matches 0n1n:

$regex = qr/0(??{$regex})?1/;

This regular expression matches a 0 followed by itself zero or one time, followed by a one. If the itself part doesn't match, then the string this regular expression matches is 01. If the itself part matches, the string this regular expression matches is 00($regex)?11, which is 0011 if$regex doesn't match or it's 000($regex)?111 if it matches, ..., etc. Here is a Perl program that matches 050000150000: #!/usr/bin/perl$str = "0"x50000 . "1"x50000;
$regex = qr/0(??{$regex})*1/;

if ($str =~ /^$regex$/) { print "yes, it matches" } else { print "no, it doesn't match" } Now let's look at the Yo Dawg regular expression in the picture above. Can you guess what it does? It matches a fully parenthesized expression such as (foo(bar())baz) or balanced parentheses ((()()())()).$regex = qr/
$$# (1) match an open paren ( ( # followed by [^()]+ # (3) one or more non-paren character | # OR (??{regex}) # (5) the regex itself )* # (6) repeated zero or more times$$                 # (7) followed by a close paren )
/x;

Here is how to think about this regular expression. For an expression to be fully parenthesized, it has to start with an open paren, so we match it (point (1) in the regex). It also has to end with close paren, so we match a close paren at the end (point (7)). Now we have to think what can be in-between the parens? Well, we can either have some text that is neither an open paren or closed paren (point (3)) OR we can have another fully parenthesized expression! (point (5)). And all this may be repeated either zero times (point (6)) to match the smallest fully parenthesized expression () or more times to match a more complex expression.

Without the /x flag (that allows multiline regexes), it can be written more compactly:

$regex = qr/$$([^()]+|(??{regex}))*$$/; But please don't use these regular expressions in production as they are too cryptic. Use Text::Balanced or Regexp::Common Perl modules. And finally, in Perl 5.10 you can use recursive capture buffers instead of lazy code subexpressions to achieve the same result. Here is a regular expression that matches 0n1n and uses the recursive capture buffer syntax (?N): my$rx = qr/(0(?1)*1)/

The (?1)* says "match the first group zero or more times," where the first group is the whole regular expression.

You can try to rewrite the regular expression that matches balanced parens as an exercise.

Have fun!

The New Catonmat December 10, 2009

50 ideas for the new catonmat.net website

I have been working on the new catonmat.net website for quite some time now and I have gathered a very long list ideas of what a modern website should have. They include ideas from psychology, search engine optimization, social media, programming, and the best web design practices.

Since I started to use github last week, all the new catonmat code will be there. I already created a catonmat repository and I am going to be pushing code there daily.

I love to share my ideas, so here are the first 50 of them. I am going to share more later in the upcoming article series "Designing the new catonmat.net together with me." You should subscribe to my blog here, if you are interested in this topic and haven't subscribed yet.

The ideas are in no particular order. If some of them seem fuzzy or unclear, please ask me to clarify in the comments.

1. 301 Redirect Table.

The idea is to maintain a table that will be checked against the request URLs. If the request URL is in the table, it gets redirected to a new destination URL with the 301 HTTP header, which forwards the link juice from one URL to another. This is necessary because some forums break URLs in multiple lines and the resulting clickable URL is 404. If I didn't do a 301 to the new location, this link would be lost. Anyone who clicked it, would end up on the 404 error page. Another example is if someone links to a mistyped URL. Anyone who clicks it ends up being unsatisfied, at 404 page. With a 301 table I can quickly fix these problems. Here is a concrete example: Someone links to www.catonmat.net/artikle when they wanted to link to www.catonmat/article. I'd simply insert an entry to 301 redirect /artikle to /article and everyone's happy.

2. 404 Error Log.

I absolutely have to know what pages trigger 404 errors so that I can fix them ASAP with the 301 redirect table. There may be many false alarms, therefore I have to add filters to the 404 error log to ignore some spammy patterns.

3. A Great 404 Page.

If someone ends up on a non-existing page, log it as described in #2 and then explain to the visitor why the page he visited was not found. Suggest him or her or it to view 5 latest post titles, or 5 most popular posts, or most popular downloads, or something else (still have to think about it when I implement it).

Tracebacks are usually displayed among comments. This is very messy. There is no place for them among comments. The idea is to have them on a separate page like www.catonmat.net/post-title/trackbacks. Also add bulk moderation as spammers love to exploit trackback. (Perhaps drop trackbacks altogether as they are of very little value.)

5. A Page With Last Comments.

Currently I am displaying only the 10 last comments on the right sidebar. This is insufficient. I want to see all the comments (like www.reddit.com/comments) on a single page, so that even if I am away from computer for several days, I can easily navigate through them and reply/hide/edit/delete in-place. Perhaps add a RSS feed for comments page.

6. Comment Statistics.

I need to tell my blog readers who are the most active people on my website. This will stimulate people to be more active. Therefore I should add a comment statistics with most popular commentators, link to their pages without nofollow to give them link juice. Also add statistics of the most commented articles.

7. Add Incoming Search Term Statistics On All Pages.

When people come from Google or Yahoo, save the query they used and display it on the page. This way I will always know what terms were searched for on each of the pages, without having to write complex queries.

Currently I have sloppy download statistics. I want nice graphs and want to see the most popular downloads by day, month, etc. This should be written as a statistics framework as traffic statistics, article statistics, delicious statistics and download statistics will all have more or less same graphs.

9. Public Statistics Section For del.icious.com Bookmarks.

I want to see how often my posts get bookmarked on del.icious.com. I also want nice graphs for them by day, week, month, etc. I also want a list of people who bookmarked my posts the most often, and tags they used. Make this public and also very modular so other people can reuse the code and put it on their sites. Reward the most frequent bookmarkers with links to their sites.

10. Insert Beautiful Images To Give Rest To Eyes.

Insert some beautiful landscape images after serveral paragraphs of text to give rest to eyes and give the reader positive emotions.

An image from ginnerobot's photostream on Flickr.

11. Public Traffic Statistics.

I want the super statistics for my website. And make them public. I want to see which is the most popular article today, which was yesterday. I want nice traffic graphs and trends.

Find who's tweeting about catonmat and put all these tweets on a separate page. Find who's the most active catonmat tweeter and make this person stand out. Link to twitter profiles.

13. Integrate GitHub Directly In Catonmat

Integrate GitHub with status updates, friend count, directly in catonmat. Would have to write some kind of a scraper or use their API, if it's usable for this purpose.

Currently I have linear comments, which are no good. To find whom someone replied to I have to scroll through the comments. This doesn't make any sense. I have to add threaded comments. This will also engage users is more conversations.

I don't like links to comments in a form www.catonmat.net/article/#comment-55. I want them to be in form www.catonmat.net/article/comment-55 so that each comment (or thread of comments) is on its own page. This way, if linked to a comment, person will precisely have that comment loaded. Browsers can misbehave in case of anchor links like #comment-55 and I don't like that.

There are just a few things comments need:

• Quote someone.
• Emphesize part of a comment in bold or italic.
• Share code fragment (auto syntax highlight it).

That's it. Nothing else is necessary, no stupid HTML comments.

My website is getting a lot of spam from Russia with comments in Russian. I am sure no one would write comments in Russian on my site. Add a filter to leave just the English comments. All other are spam.

Gravatar is at www.gravatar.com. It's a map from emails to jpegs of user icons. This will make people stand out. Also display the gravatar user icons on comment statistics (point above). Having gravatars will emotionally associates you with commentators and the next time you see a gravatar you can predict the nature of the comment (depending if you had positive emotions or negative before).

19. Add More "Contact Me" Options Near "About me" on The Right Sidebar.

You want to get yourself out. If you don't do it, no one will come after you. Add links to Facebook, Linked In, Twitter, Plurk, GitHub, FriendFeed, perhaps some other sites. Show the email as an image and add a link to my IRC channel #catonmat on irc.freenode.net. Initially show only Twitter, Facebook and GitHub. Add an arrow down clickable image. After clicked displays all other contact options.

20. Snippets Page.

I have been writing and collecting various programming snippets. I want to have them in a central database on my site. Instead of putting them on some foreign service like GitHub's Gist or Pastebin of some kind, I want to keep them on my website in my database so that I can easily modify them in a single place and integrate within posts.

An image from digital cat's photostream on Flickr.

21. Add Revision Control For All The Pages.

Currently if I edit a page, the previous page is lost and I can't see the changes. It's crucial to keep the changes as sometimes I need to get something from a year ago. I have to add revision control like wikipedia does. The URL scheme could be www.catonmat.net/article/revisions - displays all available revisions, and www.catonmat.net/article/revisions/r1/r2 displays changes between r1 and r2, but I have to think about it a bit more.

22. Create Tiny URLs For Articles On My Own Site

I don't want to depend on some service that may go down. Make short urls like http://catonmat.net/abc, where abc is [a-zA-Z0-9] this will give me 238328 URLs, more than enough. I could even go for something shorter.

23. Optimize Catonmat Load Speed To Maximum.

Use the page-speed FireBug plugin to optimize site loading speed. Page-speed is at http://code.google.com/speed/page-speed/. Things to be optimized include minified javascript, minified html, gzip compressed content, maximizing caching, use asynchronous js loading of google analytics and others.

24. Make The Posts More Available

Currently the posts are only available as HTML documents. I should try to convert them to PDF and put them to Scribd. I have to think about consequences as Scribd may show up on search engines at a higher ranking position than catonmat iself, which would have drastic impact on the traffic. Saving to PDF has a benefit that it's a single file. If saved as HTML, the browser creates a folder with tons of images. Can't be shared easily and is clutter. Perhaps offer a PDF download for all articles.

25. Make Posts Printer Friendly.

Create a nice CSS template for printing articles. At the end of the article include URLs to all the mentioned resources. Add an option to choose whether to print comments or not. URL structure could be www.catonmat.net/article/print

Add "Share this" widget and perhaps "Reddit this", "Digg this", "Stumble this", etc., buttons. This should be based on referrer as I don't want to show "Reddit this" to a Digg visitor as there is a holy war between Reddit and Digg. Also add "Tweet this" button somewhere.

27. Utilize The New Google Feature Of Displaying Named Anchors In Search Results.

See this post: http://googleblog.blogspot.com/2009/09/jump-to-information-you-want-right-from.html Some of my posts utilize this (10 Awk Tips, Tricks and Pitfalls), but I need to utilize it more.

28. Highlight The Python Code As In SQLalchemy Documentation.

I like how the code is highlighted in SQLalchemy documentation. I am not sure if they are using Pygments or not, but I'll try to make mine exactly the same. Example: http://www.sqlalchemy.org/docs/05/ormtutorial.html

29. Create Pagination For Posts/Categories/Tags As In Flickr.com

I like the style of Flickr's pagination. Got to implement the same on catonmat. Example: http://www.flickr.com/photos/frijole/

30. Have Pages Open In A New Window By Default.

I feel that opening links in a new window would keep visitors on the website longer. I haven't tested but I will A/B this. Update: This is not a good idea, won't implement it.

An image from paraflyer's photostream on Flickr.

31. Investigate What Do Various <a rel="..."> Do.

There are a bunch of different relations like rel="bookmark", rel="prev", rel="next". This could improve the website navigation greatly. More info here: http://www.w3schools.com/TAGS/att_a_rel.asp

32. Perhaps Remove The Article Date Altogether.

I have noticed myself that if I search for something and I find an article from 2004, I want to look for something fresher. Got to A/B test this and see how long do people stay on the site.

33. Add An IP Ban List.

Sometimes spammers use the same IPs. Rather than iptables -I INPUT --src IP -j REJECT them, just block them at application level, 404 all pages, or redirect them elsewhere.

34. Automatically Translate All Pages To All Languages Via Google Translate

Sometimes people search for something in their own language and can't find it. Perhaps they don't know English term and therefore can't find what they wanted. If I automatically translate all pages to all languages, people would end up on my website and find what they were looking for. The URL scheme for this could be: www.catonmat.net/xx/..., where xx is two letter international language code.

Dustin Curtis did an experiment where he tried various link texts to invite readers to follow him on Twitter. The first version was "I'm on Twitter", this got 4.70% click through rate. The last version was "You Should follow me on Twitter <here>". This had 12.81% CTR, which is a massive improvement.

People ask me a lot of questions over email. I could answer all the questions on my website instead of email. This way everyone could always find all my answers.

I get asked the same questions over and over again over email. For example,

• What do I have to know for Google interview?
• What books do I read?
• How to learn C++/C/Python/Algorithms?
• etc.

Instead of sending the answer over email, I could send these people to FAQ page where they could find my latest answers (as they change over time).

This idea has the highest priority. A knowledge database is a section on my website where I can write everything that I learned each day. This should be accompanied with a desktop application that has a hotkey that instantly brings up input dialog and I can type what I just learned. I had a database like this in 2002-2004 and my knowledge literally went exponential. I wrote out key facts that I learned each day and could easily locate as necessary.

39. Add A Miniblog For Quick Articles.

Sometimes I have some cool idea or quick hack that I want to share, but as I am used to writing large and well thought out articles, I can't post the quick hacks and my thoughts don't get shared. A miniblog would allow me to share even the smallest thoughts that I have.

I love various smart programming quotes. I should make them more accessible, make them searchable by author/text.

An image from paraflyer's photostream on Flickr.

41. Integrate LaTeX In My Posts.

As I am sometimes writing about maths, I need to integrate LaTeX directly in my posts. I should not forget to do SEO on images it generates - instead of having some ridiculous <img src="latex-generator?q=$\begin{bmatrix}1&2\\3&4\end{bmatrix}$>" I should have it generate an image "<img src="matrix.jpg" title="Matrix">". This way people will be able to find my posts via image search if they search for some mathematical terms like "Matrix".

42. Try Out How The Articles Look With Text-Align: Justify.

I currently have the default left-aligned text. Books and journals have it justified. Not sure how it would look on my website. Have to try it out. A/B. I read somewhere that this may feel confusing to dyslexic users.

43. Have In-Line Code Snippets And Variables Stand Out From Rest Of The Text.

A nice example of this is Github's blog. They have gray background for constant-width things. See this: http://github.com/blog/530-how-we-made-github-fast

44. Have <a> Links Change Background Color On Hover.

I love how mattt.me has done it: http://mattt.me/articles/ I want this.

45. Have Only One Category Per Post But Multiple Tags.

A post should have only one single category. The category must strictly define the main theme of the article. A post can have multiple tags. Tags define topics discussed in the post.

46. Add Crazyegg Tracking For The First Month.

Must add crazyegg to track how the users navigate the site, where they click and what they visit. Optimize based on the results. http://crazyegg.com/

As my site is getting more popular and popular among programmers, it may be a good idea to add a job board. Joel Spolsky made a million \$ in a year with job boards. As the popularity of my site increases, I might make a few dollars out of it as well.

48. Use Statcounter and Google Analytics

This is obvious. Statcounter is for real-time data. The free version is limited to statistics of last 500 visits. Google Analytics is for keeping long term statistics with a day of delay in updates. Load them asynchronously: http://googlecode.blogspot.com/2009/12/google-analytics-launches-asynchronous.html

49. Form Input Fields And Text Fields Should Change Border Color On Focus.

Users should know what field they are focused to without trying to find the cursor.

50. Mandatory Alt Attributes For Images, Title Attributes For Links.

The rationale of this feature is that if images don't get loaded or if blind people are listening to the content of my posts via text-to-speech engine, they should know what the image displays. Alt tags also help the search engines to classify images. The same goes for title attributes for links.

51. Optimize Meta Description For Categories And Tags

Category and tag pages usually have meta description as "Posts in <category>". This is unsatisfactory. I want a description of category and if it's missing or is too short, I want it to have some post titles in it, to make it unique. The same for tags, make meta tags "Posts in <tag>: post title1, post title2, ..." not exceeding 20 words or so.

An image from dsevilla's photostream on Flickr.

The next post in this series will be about Python libraries that I use for the new catonmat and the structure of the site. You can also follow the development on GitHub (I started importing code only yesterday so there is not much yet).