This is the fourth post in the article series "Vim Plugins You Should Know About". This time I am going to introduce you to a plugin called "snipmate.vim".
If you are intrigued by this topic, I suggest that you subscribe to my posts! For the introduction and the first post in this article series, follow this link: Vim Plugins You Should Know About, Part I: surround.vim.
Snipmate.vim is probably the best snippets plugin for Vim. A snippet is a piece of often-typed text or a programming construct that you can insert into your document by using a trigger word followed by a <tab>. The plugin was written by Michael Sanders, who says he modeled it after TextMate's snippets.
Here is an example usage of snipmate.vim. If you are a C programmer, then one of the most often used forms of a loop is "for (i=0; i<n; i++) { ... }". Without snippets you'd have to type this out every time. Even though it takes just another second, these seconds can add up to minutes throughout the day, and minutes can add up to hours over longer periods of time. Why waste your time this way? With snippets you can type just "for<tab>" and snipmate will insert this whole construct into your source code automatically! If "i" or "n" aren't the variables you wanted to use, you can now use <tab> and <shift-tab> to jump to the next/previous item in the loop and rename them!
Michael also created an introduction video for his plugin where he demonstrates how to use it. Check it out:
How to install snipmate.vim?
To get the latest version:
 1. Download snipmate.zip.
 2. Extract snipmate.zip to ~/.vim (on Unix/Linux) or ~\vimfiles (on Windows).
 3. Run :helptags ~/.vim/doc (on Unix/Linux) or :helptags ~/vimfiles/doc (on Windows) to rebuild the tags file (so that you can read :help snipmate.)
 4. Restart Vim.
The plugin comes with predefined snippets for more than a dozen languages (C, C++, HTML, Java, JavaScript, Objective-C, Perl, PHP, Python, Ruby, Tcl, Shell, Mako templates, LaTeX, VimScript). Be sure to check out the snippet files in the "snippets" directory under your ~/.vim or ~\vimfiles directory.
If you need to define your own snippets (which you most likely will), create a new file named "language-foo.snippets" in the "snippets" directory. For example, to define your own snippets for the C language, you'd create a file called "c-foo.snippets" and place the snippets in it.
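For illustration, here's a minimal sketch of what such a file might contain, written in snipmate's snippet syntax (the "doit" trigger name and the placeholder defaults are made up for this example); note that snippet bodies must be indented with tabs:

```
snippet doit
	for (${1:i} = 0; $1 < ${2:count}; $1++) {
		${3:/* code */}
	}
```

After typing "doit<tab>", snipmate expands the construct; ${1:i}, ${2:count} and ${3:/* code */} are tab stops with default values, and the bare $1 mirrors whatever you type at the first stop.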
To learn about snipmate snippet syntax, type ":help snipmate" and locate the syntax section in the help file.
Have Fun!
Have fun with this time saving plugin!
Famous Perl One-Liners Explained, Part II: Line Numbering
This is the second part of a seven-part article on famous Perl one-liners. In this part I will create various one-liners for line numbering. See part one for the introduction to the series.
Famous Perl one-liners is my attempt to create a "perl1line.txt" file that is similar to the "awk1line.txt" and "sed1line.txt" files that have been so popular among Awk and Sed programmers.
The article on famous Perl one-liners will consist of at least seven parts:
 Part I: File spacing.
 Part II: Line numbering (this part).
 Part III: Calculations.
 Part IV: String creation. Array creation.
 Part V: Text conversion and substitution.
 Part VI: Selective printing and deleting of certain lines.
 Part VII: Handy regular expressions.
 Part VIII: Release of perl1line.txt.
 Part IX: Release of the Perl One-Liners ebook.
The one-liners will make heavy use of Perl special variables. A few years ago I compiled all the Perl special variables into a single file and called it the Perl special variable cheat sheet. Even though it's mostly copied out of perldoc perlvar, it's still handy to have in front of you. Print it!
Awesome news: I have written an ebook based on this article series. Check it out:
And here are today's one-liners:
Line Numbering
9. Number all lines in a file.
perl -pe '$_ = "$. $_"'
As I explained in the first one-liner, the "-p" argument causes Perl to assume a loop around the program (specified by "-e") that reads each line of input into the " $_ " variable, executes the program, and then prints the " $_ " variable.
In this one-liner I simply modify " $_ " and prepend the " $. " variable to it. The special variable " $. " contains the current line number of the input.
The result is that each line gets its line number prepended.
10. Number only non-empty lines in a file.
perl -pe '$_ = ++$a." $_" if /./'
Here we employ the "action if condition" statement that executes "action" only if "condition" is true. In this case the condition is the regular expression "/./", which matches any character except the newline (that is, it matches a non-empty line); and the action is " $_ = ++$a." $_" ", which prepends the variable " $a ", incremented by one, to the current line. As we didn't use the strict pragma, $a was created automatically.
The result is that at each non-empty line " $a " gets incremented by one and prepended to that line, while each empty line is left unmodified and gets printed as is.
11. Number and print only non-empty lines in a file (drop empty lines).
perl -ne 'print ++$a." $_" if /./'
This one-liner uses the "-n" program argument that places each line in the " $_ " variable and then executes the program specified by "-e". Unlike "-p", it does not print the line after executing the code in "-e", so we have to call "print" explicitly to get it printed.
The one-liner calls "print" only on lines that have at least one character in them. And exactly like in the previous one-liner, it increments the line number in the variable " $a " by one for each non-empty line.
The empty lines simply get ignored and never get printed.
12. Number all lines but print line numbers only for non-empty lines.
perl -pe '$_ = "$. $_" if /./'
This one-liner is similar to one-liner #10. Here I modify the " $_ " variable that holds the entire line, but only if the line has at least one character. All other lines (the empty ones) get printed without line numbers.
13. Number only lines that match a pattern; print others unmodified.
perl -pe '$_ = ++$a." $_" if /regex/'
Here we again use the "action if condition" statement, but this time the condition is a pattern (regular expression) "/regex/". The action is the same as in one-liner #10; see #10 for the explanation.
14. Number and print only lines that match a pattern.
perl -ne 'print ++$a." $_" if /regex/'
This one-liner is almost exactly like #11. The only difference is that it numbers and prints only the lines that match "/regex/".
15. Number all lines, but print line numbers only for lines that match a pattern.
perl -pe '$_ = "$. $_" if /regex/'
This one-liner is similar to the previous one-liner and to one-liner #12. Here the line gets its line number prepended if it matches /regex/; otherwise it just gets printed without a line number.
16. Number all lines in a file using a custom format (emulate cat -n).
perl -ne 'printf "%-5d %s", $., $_'
This one-liner uses the formatted print function "printf" to print the line number together with the line. In this particular example the "%-5d" format left-aligns the line numbers on a 5-character boundary.
Some other nice format strings are "%5d", which right-aligns line numbers on a 5-character boundary, and "%05d", which zero-fills and right-justifies them.
My Perl printf cheat sheet, which lists all the possible format specifiers, might come in handy here.
17. Print the total number of lines in a file (emulate wc -l).
perl -lne 'END { print $. }'
This one-liner uses the "END" block, a feature that Perl probably took from the Awk language. The END block gets executed after the whole program has run; in this case the program is the implicit loop over the input created by the "-n" argument. After the loop has consumed the input, the special variable " $. " contains the number of lines there were in the input, and the END block prints it. The "-l" argument sets the output record separator for "print" to a newline, so that we don't have to print "$.\n" ourselves.
Another way to do the same is:
perl -le 'print $n=()=<>'
This is a tricky one, but it's easy to understand if you know about Perl contexts. The " ()=<> " part causes the <> operator (the diamond operator) to be evaluated in list context, which makes the diamond operator read the whole file into a list. Next, " $n " gets evaluated in scalar context, and evaluating a list in scalar context returns the number of elements in the list. Thus the " $n=()=<> " construction equals the number of lines in the input, that is, the number of lines in the file. The print statement prints this number out, and the "-l" argument makes sure a newline gets added after it.
This is the same as writing the following, just longer:
perl -le 'print scalar(()=<>)'
And a completely obvious version:
perl -le 'print scalar(@foo=<>)'
Yet another way to do it:
perl -ne '}{print $.'
This one-liner uses the eskimo operator "}{" in conjunction with the "-n" command-line argument. As I explained in one-liner #11, the "-n" argument forces Perl to assume a " while(<>) { } " loop around the program. The eskimo operator forces Perl to escape the loop, and the program turns out to be:
while (<>) {
}{              # eskimo operator here
    print $.;
}
It's easy to see that this program just loops over all the input and after it's done doing so, it prints the " $. ", which is the number of lines in the input.
18. Print the number of non-empty lines in a file.
perl -le 'print scalar(grep{/./}<>)'
This one-liner uses the "grep" function, which is similar to the grep Unix command. Given a list of values, " grep {condition} " returns only those values for which the condition is true. In this case the condition is a regular expression that matches at least one character, so the input gets filtered and "grep{/./}" returns all the lines that were non-empty. To get the number of lines, we evaluate the list in scalar context and print the result. (As I mentioned in the previous one-liner, a list in scalar context evaluates to the number of elements in it.)
A golfer's version of this one-liner would replace "scalar()" with " ~~ " (double bitwise negation), shortening it to:
perl -le 'print ~~grep{/./}<>'
This can be made even shorter:
perl -le 'print~~grep/./,<>'
19. Print the number of empty lines in a file.
perl -lne '$a++ if /^$/; END {print $a+0}'
Here I use the variable $a to count how many empty lines I have encountered. Once I have finished looping over all the lines, I print the value of $a in the END block. I use the " $a+0 " construction to make sure " 0 " gets printed if no lines were empty.
I could have also modified the previous one-liner:
perl -le 'print scalar(grep{/^$/}<>)'
Or write it with " ~~ ":
perl -le 'print ~~grep{/^$/}<>'
These last two versions are not as efficient, though, because they read the whole file into memory, whereas the first one processes it line by line.
20. Print the number of lines in a file that match a pattern (emulate grep -c).
perl -lne '$a++ if /regex/; END {print $a+0}'
This one-liner is basically the same as the previous one, except it increments the line counter $a by one when a line matches the regular expression /regex/.
Perl One-Liners Explained ebook
I've now written the "Perl One-Liners Explained" ebook based on this article series. I went through all the one-liners, improved the explanations, fixed mistakes and typos, added a bunch of new one-liners, and added an introduction to Perl one-liners and a new chapter on Perl's special variables. Please take a look:
Have Fun!
Have fun with these one-liners. They were really easy this time. The next part is going to be about various calculations.
Can you think of other numbering operations that I did not include here?
Holy smokes! It has now been two years since I started this blog. It seems almost like yesterday when I posted the "A Year of Blogging" article. And now it's two! With this post I'd like to celebrate the 2nd birthday and share various interesting statistics that I managed to gather.
During this year (July 20, 2008 to July 26, 2009) I wrote 55 posts, which received around 1000 comments. According to StatCounter and Google Analytics, my blog was visited by 1,050,000 unique people who viewed 1,700,000 pages. Wow, 1 million visitors! That's very impressive!
Here is a Google Analytics graph of monthly page views for the last year (click for a larger version):
In the last three months I did not manage to write much and you can see how that reflected on the page views. A good lesson to be learned is to be persistent and keep writing articles consistently.
Here is the same graph with two years of data, showing a complete picture of my blog's growth:
I like this seemingly linear growth. I hope it continues the same way the next year!
Here are the top 5 referring sites that my visitors came from:
 reddit (292,147 visitors).
 stumbleupon (69,575 visitors).
 hacker news (47,595 visitors).
 delicious (27,109 visitors).
 dzone (15,898 visitors).
And here are the top 5 referring blogs:
 Scott Klarr's blog (7,848 visitors).
 My Free Science Online blog (6,284 visitors).
 Eric Wendelin's blog (1,693 visitors).
 Andy Lester's PerlBuzz blog (1,356 visitors).
 Jurgen Appelo's blog (1,001 visitors).
I found that just a handful of blogs had linked to me during this year. The main reason, I suspect, is that I do not link out much myself... It's something to improve upon.
If you remember, I ended the last year's post with the following words (I had only 1000 subscribers at that time):
I am setting myself a goal of reaching 5000 subscribers by the end of the next year of blogging (July 2009)! I know that this is very ambitious goal but I am ready to take the challenge!
I can proudly say that I reached my ambitious goal! My blog now has almost 7000 subscribers! If you have not yet subscribed, click here to do it!
Here is the RSS subscriber graph for the whole two years:
Several months ago I approximated the subscriber data with an exponential function and it produced a good fit. If I had kept writing articles at the pace I had three months ago, I'd probably have over 10,000 subscribers now.
Anyway, let's now turn to the top 10 most viewed posts:
 1. My job interview experience at Google (144,400 views).
 2. A Unix Utility You Should Know About: Pipe Viewer (124,570 views).
 3. Code Reuse in Google Chrome Browser (115,750 views).
 4. Famous Awk One-Liners Explained, Part I (87,721 views).
 5. MIT Introduction to Algorithms, Part I: Analysis of Algorithms (79,536 views).
 6. A Unix Utility You Should Know About: Netcat (51,354 views).
 7. Famous Sed One-Liners Explained, Part I (49,068 views).
 8. Vim Plugins You Should Know About, Part I: surround.vim (41,388 views).
 9. 10 Awk Tips, Tricks and Pitfalls (29,689 views).
 10. Low Level Bit Hacks You Absolutely Must Know (22,916 views).
The article that I liked the most myself but which didn't make it to top ten was the "Set Operations in Unix Shell". I just love this Unix stuff I did there.
I am also very proud of the following three article series that I wrote:
 1. Review of MIT's Introduction to Algorithms course (14 parts).
 2. Famous Awk One-Liners Explained (4 parts: 1, 2, 3, 4).
 3. Famous Sed One-Liners Explained (3 parts: 1, 2, 3).
Finally, here is a list of ideas that I have thought of for the third year of blogging:
 Publish three ebooks on Awk One-Liners, Sed One-Liners and Perl One-Liners.
 Launch a mathematics, physics and general science blog.
 Write about mathematical foundations of cryptography and try to implement various cryptosystems and cryptography protocols.
 Publish my review of MIT's Linear Algebra course (in the math blog, so that the main topic of catonmat stays computing).
 Publish my review of MIT's Physics courses on Mechanics, Electromagnetism, and Waves (in the physics blog).
 Publish my notes on how I learned the C++ language.
 Write more about computer security and ethical hacking.
 Write several book reviews.
 Create a bunch of various fun utilities and programs.
 Create at least one useful web project.
 Add a knowledge database to catonmat, create software to allow easy publishing to it.
 If time allows, publish reviews of important computer science publications.
I'll document everything here as I go, so if you are interested in these topics stay with me by subscribing to my rss feed!
And to make things more challenging again, I am setting a new goal for the next year of blogging. The goal is to reach 20,000 subscribers by July 2010!
Hope to see you all on my blog again! Now it's time for this delicious cake:
MIT's Introduction to Algorithms, Lectures 22 and 23: Cache Oblivious Algorithms
This is a happy and a sad moment at the same time: I have finally reached the last two lectures of MIT's undergraduate algorithms course. These last two lectures are on a fairly new area of algorithm research called "cache-oblivious algorithms."
Cache-oblivious algorithms take into account something that has been ignored in all the lectures so far, namely the multilevel memory hierarchy of modern computers. Retrieving items from the various levels of memory and cache makes up a dominant factor of running time, so for speed it is crucial to minimize these costs. The main idea of cache-oblivious algorithms is to achieve optimal use of caches on all levels of the memory hierarchy without knowledge of their sizes.
Cache-oblivious algorithms should not be confused with cache-aware algorithms. Cache-aware algorithms and data structures explicitly depend on various hardware configuration parameters, such as the cache size. Cache-oblivious algorithms do not depend on any hardware parameters. An example of a cache-aware (not cache-oblivious) data structure is a B-tree, which has the explicit parameter B, the size of a node. The main disadvantage of cache-aware algorithms is that they are based on knowledge of the memory structure and size, which makes it difficult to move implementations from one architecture to another. Another problem is that it is very difficult, if not impossible, to adapt some of these algorithms to work with multiple levels of the memory hierarchy. Cache-oblivious algorithms solve both problems.
Lecture twenty-two introduces the terminology and notation used in cache-oblivious algorithms, explains the difference between cache-oblivious and cache-aware algorithms, does a simple memory-transfer analysis of several simple algorithms, and culminates with a cache-oblivious algorithm for matrix multiplication.
The final lecture, twenty-three, is the most difficult in the whole course and covers cache-oblivious binary search trees and a cache-oblivious sorting algorithm called funnelsort.
Use this supplementary reading material by Professor Demaine to understand the material better: Cache-oblivious algorithms and data structures (.pdf).
Lecture 22: Cache Oblivious Algorithms I
Lecture twenty-two starts with an introduction to the modern memory hierarchy (CPU caches L1, L2, L3, main memory, disk cache, etc.) and to the notation and core concepts used in cache-oblivious algorithms.
A powerful result in cache-oblivious algorithm design is that if an algorithm is efficient on two levels of cache, then it's efficient on any number of levels. Thus the study of cache-obliviousness can be simplified to a two-level memory hierarchy, say the CPU cache and main memory, where accesses to the cache are essentially instant but accesses to main memory are orders of magnitude slower. Therefore the main question cache-oblivious algorithm analysis tries to address is how many memory transfers (MTs) a problem of size N takes. The notation used for this is MT(N). For an algorithm to be efficient, the number of memory transfers should be as small as possible.
Next the lecture analyzes the number of memory transfers for the basic array-scanning and array-reversal algorithms. Since array scanning is sequential, N elements can be processed with O(N/B) memory transfers, where B is the block size, the number of elements that are fetched automatically when an element is accessed. That is, MT(N) = O(N/B) for array scanning. The same bound holds for reversing an array, since it can be viewed as two scans: one from the beginning and one from the end.
Next it's shown that the classical binary search (covered in lecture 3) is not cache efficient, but the order-statistics algorithm (covered in lecture 6) is.
Finally the lecture describes a cache-efficient way to multiply matrices by storing them block-wise in memory.
You're welcome to watch lecture twentytwo:
Topics covered in lecture twenty-two:
 [00:10] Introduction and history of cache-oblivious algorithms.
 [02:00] Modern memory hierarchy in computers: caches L1, L2, L3, main memory, disk cache.
 [06:15] Formula for calculating the cost to access a block of memory.
 [08:18] Amortized cost to access one element in memory.
 [11:00] Spatial and temporal locality of algorithms.
 [13:45] Two-level memory model.
 [16:30] Notation: total cache size M, block size B, number of blocks M/B.
 [20:40] Notation: MT(N), the number of memory transfers of a problem of size N.
 [21:45] Cache-aware algorithms.
 [22:50] Cache-oblivious algorithms.
 [28:35] Blocking of memory.
 [32:45] Cache-oblivious scanning algorithm (visitor pattern).
 [36:20] Cache-oblivious array-reverse algorithm.
 [39:05] Memory transfers in the classical binary search algorithm.
 [43:45] Divide and conquer algorithms.
 [45:50] Analysis of memory transfers in the order-statistics algorithm.
 [01:00:50] Analysis of classical matrix multiplication (with row-major and column-major memory layouts).
 [01:07:30] Cache-oblivious matrix multiplication.
Lecture twenty-two notes:
Lecture 23: Cache Oblivious Algorithms II
This was probably the most complicated lecture in the whole course. The whole lecture is devoted to two subjects: cache-oblivious search trees and cache-oblivious sorting.
While it's relatively easy to understand the cache-oblivious way of storing search trees in memory, it's amazingly difficult to understand cache-efficient sorting. The algorithm is called funnelsort, and it's basically an n-way merge sort (covered in lecture 1) with a special cache-oblivious merging function called a k-funnel.
You're welcome to watch lecture twenty-three:
Topics covered in lecture twenty-three:
 [01:00] Cache-oblivious static search trees (binary search trees).
 [09:35] Analysis of static search trees.
 [18:15] Cache-aware sorting.
 [19:00] Sorting by repeated insertion in binary tree.
 [21:40] Sorting by binary merge sort.
 [31:20] Sorting by N-way merge sort.
 [36:20] Sorting bound for cache-oblivious sorting algorithms.
 [38:30] Cache-oblivious sorting.
 [41:40] Definition of the k-funnel (cache-oblivious merging).
 [43:35] Funnelsort.
 [54:05] Construction of the k-funnel.
 [01:03:10] How to fill a buffer in the k-funnel.
 [01:07:30] Analysis of the fill-buffer operation.
Lecture twenty-three notes:


Have fun with the cache-oblivious algorithms! I'll do a few more posts that summarize all these lectures and highlight the key ideas.
If you loved this, please subscribe to my blog!
On the Linear Time Algorithm For Finding Fibonacci Numbers
In this article I'd like to show how the theory does not always match the practice. I am sure you all know the linear time algorithm for finding Fibonacci numbers. The analysis says that the running time of this algorithm is O(n). But is it still O(n) if we actually run it? If not, what is wrong?
Let's start with the simplest linear time implementation of the Fibonacci number generating algorithm in Python:
def LinearFibonacci(n):
    fn = f1 = f2 = 1
    for x in xrange(2, n):
        fn = f1 + f2
        f2, f1 = f1, fn
    return fn
The theory says that this algorithm should run in O(n): given the n-th Fibonacci number to find, the algorithm does a single loop up to n.
Now let's verify if this algorithm is really linear in practice. If it's linear then the plot of n vs. running time of LinearFibonacci(n) should be a line. I plotted these values for n up to 200,000 and here is the plot that I got:
Note: each data point was averaged over 10 calculations.
Oh no! This does not look linear at all! It looks quadratic! I fitted the data with a quadratic function and it fit nearly perfectly. Do you know why the seemingly linear algorithm went quadratic?
The answer is that the theoretical analysis assumed that all the operations in the algorithm execute in constant time. But this is not the case when we run the algorithm on a real machine! As the Fibonacci numbers get larger, each addition operation for calculating the next Fibonacci number, "fn = f1 + f2", runs in time proportional to the length of the previous Fibonacci number. That's because these huge numbers no longer fit in the basic units of computation in the CPU, so a big integer library is required, and adding two numbers of length O(n) in a big integer library takes O(n) time.
I'll show you that the running time of the real-life "linear" Fibonacci algorithm really is O(n^2) by taking this hidden cost of the bigint library into account.
So at each iteration i we have a hidden cost of O(number of digits of f_i) = O(digits(f_i)). Let's sum these hidden costs for the whole loop up to n:

O(digits(f_1)) + O(digits(f_2)) + ... + O(digits(f_n)) = O(digits(f_1) + digits(f_2) + ... + digits(f_n))

Now let's find the number of digits in the n-th Fibonacci number. To do that, let's use the well-known Binet's formula, which tells us that the n-th Fibonacci number f_n can be expressed as:

f_n = (phi^n - (1-phi)^n)/sqrt(5), where phi = (1+sqrt(5))/2 is the golden ratio.

It is also well known that the number of digits in a number is the integer part of log10(number) + 1. Thus the number of digits in the n-th Fibonacci number is:

digits(f_n) = floor(log10(f_n)) + 1 = floor(n*log10(phi) - log10(sqrt(5))) + 1 = O(n)

Thus if we now sum all the hidden costs for finding the n-th Fibonacci number, we get:

O(digits(f_1) + ... + digits(f_n)) = O(1 + 2 + ... + n) = O(n*(n+1)/2) = O(n^2)
There we have it. The running time of this "linear" algorithm is actually quadratic if we take into consideration that each addition operation runs proportionally to the length of addends.
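This linear growth of digit counts is easy to check empirically. Here is a quick sketch (the helper name linear_fibonacci is my own; it's the same loop as above, in Python 3):

```python
# Sketch: the number of digits of the n-th Fibonacci number grows
# linearly with n (the ratio digits/n tends to log10(phi) ~ 0.209),
# which is the hidden linear cost that makes the loop quadratic overall.

def linear_fibonacci(n):
    fn = f1 = f2 = 1
    for _ in range(2, n):
        fn = f1 + f2
        f2, f1 = f1, fn
    return fn

for n in (1000, 2000, 4000):
    digits = len(str(linear_fibonacci(n)))
    print(n, digits, round(digits / n, 3))  # the ratio stays ~0.209
```

Doubling n doubles the digit count, exactly as the digits(f_n) = O(n) formula predicts.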
Next time I'll show you that if the addition operation runs in constant time, then the algorithm is truly linear. Later I will do a similar analysis of the logarithmic time algorithm for finding Fibonacci numbers, which uses this awesome matrix identity:

[[1, 1], [1, 0]]^n = [[f_{n+1}, f_n], [f_n, f_{n-1}]]
Don't forget to subscribe if you are interested! It's well worth every byte!