This article is part of the article series "MIT Introduction to Algorithms."

This is the thirteenth post in an article series about MIT's lecture course "Introduction to Algorithms." In this post I will review lectures twenty and twenty-one on parallel algorithms. These lectures cover the basics of multithreaded programming and multithreaded algorithms.

Lecture twenty begins with a good overview of the multithreaded programming paradigm, introduces various concepts of parallel programming, and at the end talks about the Cilk programming language.

Lecture twenty-one implements several multithreaded algorithms, such as nxn matrix addition, nxn matrix multiplication, and parallel merge-sort. It then goes into great detail to analyze the running time and parallelism of these algorithms.

Lecture 20: Parallel Algorithms I

The lecture starts with an introduction to a particular model of parallel computation called dynamic multithreading. It's easiest to understand this model with an example. Here is a multithreaded algorithm that computes Fibonacci numbers:

Fibonacci(n):
1   if n < 2:                     | thread A
2     return n                    | thread A
3   x = spawn Fibonacci(n-1)      | thread A
4   y = spawn Fibonacci(n-2)      | thread B
5   sync                          | thread C
6   return x + y                  | thread C

# this is a really bad algorithm, because it's
# exponential time.
# see lecture three for four different
# classical algorithms that compute Fibonacci numbers

(A thread is defined to be a maximal sequence of instructions not containing the parallel control statements spawn, sync, and return; it is not something like Java threads or POSIX threads.)

I put the most important keywords in bold. They are "spawn" and "sync".

The keyword spawn is the parallel analog of an ordinary subroutine call. Spawn before the subroutine call in line 3 indicates that the subprocedure Fibonacci(n-1) can execute in parallel with the procedure Fibonacci(n) itself. Unlike an ordinary function call, where the parent is not resumed until after its child returns, in the case of a spawn, the parent can continue to execute in parallel with the child. In this case, the parent goes on to spawn Fibonacci(n-2). In general, the parent can continue to spawn off children, producing a high degree of parallelism.

A procedure cannot safely use the return values of the children it has spawned until it executes a sync statement. If any of its children have not completed when it executes a sync, the procedure suspends and does not resume until all of its children have completed. When all of its children return, execution of the procedure resumes at the point immediately following the sync statement. In the Fibonacci example, the sync statement in line 5 is required before the return statement in line 6 to avoid the anomaly that would occur if x and y were summed before each had been computed.

The spawn and sync keywords specify logical parallelism, not "actual" parallelism. That is, these keywords indicate which code may possibly execute in parallel, but what actually runs in parallel is determined by a scheduler, which maps the dynamically unfolding computation onto the available processors.
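To make the spawn/sync control flow concrete, here is a rough Python analogy of the Fibonacci example (a sketch of mine: this is not Cilk, and CPython threads give no real speedup here; the point is only to mimic the semantics of the pseudocode):

```python
# A rough analogy of spawn/sync: starting a thread plays the role of
# spawn, joining it plays the role of sync.
import threading

def fib(n):
    if n < 2:
        return n
    results = {}
    a = threading.Thread(target=lambda: results.update(x=fib(n - 1)))
    b = threading.Thread(target=lambda: results.update(y=fib(n - 2)))
    a.start()            # spawn Fibonacci(n-1)
    b.start()            # spawn Fibonacci(n-2)
    a.join()             # sync: wait until both
    b.join()             # children have completed
    return results['x'] + results['y']

print(fib(10))  # 55
```

Just like in the pseudocode, the parent may not touch the children's results before the joins; reading `results` earlier would be exactly the anomaly the sync statement prevents.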

The lecture then continues with a discussion on how a parallel instruction stream can be viewed as a directed acyclic graph G = (V, E), where threads make up the set V of vertices, and each spawn and each return makes up an edge in E. There are also continuation edges in E between successive threads of the same procedure.

In the Fibonacci example there are three threads: lines 1, 2, 3 make up thread A, line 4 is thread B, and lines 5, 6 are thread C.

Here is what the computation graph of Fibonacci(4) looks like:

Multithreading Directed Acyclic Graph

A dag representing multithreaded computation of Fibonacci(4).

Here the threads are shown as circles, and each group of threads belonging to the same procedure is surrounded by a rounded rectangle. Downward edges are spawn dependencies, horizontal edges represent continuation dependencies within a procedure, and upward edges are return dependencies. Every computation starts with a single initial thread and ends with a single final thread.

So to compute Fib(4), first Fib(3) and Fib(2) have to be computed. Fib(4) spawns one thread to compute Fib(3) and another thread to compute Fib(2), and syncs. In order to compute Fib(3), the values of Fib(2) and Fib(1) have to be known, and to compute Fib(2) the values of Fib(1) and Fib(0) have to be known. Fib(3) and Fib(2) spawn their own threads and sync. This is how the computation unfolds.

Next, the lecture introduces some concepts for measuring performance of multithreaded algorithms.

First some time measures:

  • Tp - the running time of the algorithm on p processors,
  • T1 - the running time of the algorithm on 1 processor (also called the work), and
  • T∞ - the critical path length, that is, the time to execute the algorithm on an infinite number of processors.

Based on these measures, the speedup of a computation on p processors is T1/Tp, that is, how much faster a p-processor machine executes the algorithm than a one-processor machine. If the speedup is Θ(p), we say that the computation has a linear speedup. If, however, speedup > p, we call it a super-linear speedup. The maximum possible speedup is called the parallelism and is computed by T1/T∞.
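These measures can be tried out on the Fibonacci computation above. The sketch below (mine, counting every procedure call as one unit of time, a simplification of the thread-level dag) computes the work T1 and the critical path T∞:

```python
# Work (T1) and critical path (T-infinity) of the multithreaded Fibonacci
# computation, counting each procedure call as one unit of time.
def work(n):
    # Every call costs 1 unit plus the work of both subproblems.
    if n < 2:
        return 1
    return 1 + work(n - 1) + work(n - 2)

def span(n):
    # Spawned children run in parallel, so the critical path is the
    # max over the children, not the sum.
    if n < 2:
        return 1
    return 1 + max(span(n - 1), span(n - 2))

n = 20
print(work(n), span(n), work(n) / span(n))
# T1 = 21891, T-infinity = 20, parallelism = T1/T-infinity ~ 1094
```

The work grows exponentially while the critical path grows only linearly, which is why this (otherwise terrible) algorithm has huge parallelism.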

Next, the lecture moves on to scheduling and the greedy scheduler.

A scheduler maps computation to p processors.

The greedy scheduler schedules as much as it can at every time step. On a p-processor machine every step can be classified into two types. If there are p or more threads ready to execute, the scheduler executes any p of them (this step is also called a complete step). If there are fewer than p threads, it executes all of them (called incomplete step). The greedy scheduler is an offline scheduler and it's 2-competitive (see lecture 13 on online/offline algorithms and the meaning of competitive algorithm).
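To see the definitions in action, here is a tiny Python simulation of a greedy scheduler on a hand-made unit-time dag (my own toy example, not from the lecture); it also checks the bound Tp <= T1/p + T∞ that the lecture proves for greedy schedules:

```python
# A toy greedy scheduler: at every step it runs up to p ready threads
# (a complete step if p or more are ready, an incomplete step otherwise).
def greedy_schedule(succ, indeg, p):
    """Return the number of steps a greedy scheduler takes on p processors."""
    indeg = dict(indeg)                   # don't mutate the caller's copy
    ready = [v for v, d in indeg.items() if d == 0]
    steps = 0
    while ready:
        run, ready = ready[:p], ready[p:]  # execute up to p ready threads
        steps += 1
        for v in run:
            for w in succ.get(v, []):
                indeg[w] -= 1
                if indeg[w] == 0:
                    ready.append(w)
    return steps

# A diamond-shaped dag: a fans out to b, c, d, which all join at e.
succ  = {'a': ['b', 'c', 'd'], 'b': ['e'], 'c': ['e'], 'd': ['e']}
indeg = {'a': 0, 'b': 1, 'c': 1, 'd': 1, 'e': 3}

t1, tinf = 5, 3                  # work: 5 unit threads; span: a -> b -> e
t2 = greedy_schedule(succ, indeg, 2)
print(t2, t1 / 2 + tinf)         # 4 steps, comfortably within the 5.5 bound
```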

The lecture ends with an introduction to Cilk, a parallel, multithreaded programming language. It uses a randomized online scheduler. It was used to write two chess programs, *Socrates and Cilkchess.

See Charles Leiserson's handout for a much more comprehensive overview of dynamic multithreading: A Minicourse on Dynamic Multithreaded Algorithms (pdf).

You're welcome to watch lecture twenty:

Topics covered in lecture twenty:

  • [01:33] Parallel algorithms.
  • [02:33] Dynamic multi-threading.
  • [03:15] A multithreaded algorithm for Fibonacci numbers.
  • [04:50] Explanation of spawn and sync.
  • [06:45] Logical parallelism.
  • [08:00] Multithreaded computation.
  • [12:15] Fibonacci(4) computation graph, and its concepts - init thread, spawn edge, continuation edge, return edge, final thread.
  • [17:45] Edges: spawn, return, continuation.
  • [19:40] Performance measures: running time on p processors Tp, work (serial time) T1, critical path length (longest path) Tinfinity.
  • [24:15] Lower bounds on Tp (running time on p processors).
  • [29:35] Speedup, linear speedup, superlinear speedup.
  • [32:55] Maximum possible speedup.
  • [36:40] Scheduling.
  • [38:55] Greedy scheduler.
  • [43:40] Theorem by Graham and Brent (bound on Tp for the greedy scheduler).
  • [46:20] Proof of the Graham-Brent theorem.
  • [56:50] Corollary on greedy scheduler's linear speedup.
  • [01:02:20] Cilk programming language.
  • [01:06:00] Chess programs: *Socrates, Cilkchess.
  • [01:06:30] Analysis of *Socrates running time on 512 processors.

Lecture twenty notes:

MIT Algorithms Lecture 20 Notes Thumbnail. Page 1 of 2.
Lecture 20, page 1 of 2: dynamic multithreading, multithreaded computation, performance measures, speedup, scheduling, greedy scheduler.

MIT Algorithms Lecture 20 Notes Thumbnail. Page 2 of 2.
Lecture 20, page 2 of 2: Graham-Brent theorem, cilk, socrates, cilkchess.

Lecture 21: Parallel Algorithms II

Lecture twenty-one is all about real multithreaded algorithms and their analysis.

It starts with the hugely important problem of multithreaded matrix multiplication: given two nxn matrices A and B, write a multithreaded algorithm Matrix-Multiply that multiplies them, producing the matrix C. The natural approach is to use the divide and conquer method from lecture three and formulate the problem as follows:

Multithreaded Matrix Multiplication

Each nxn matrix multiplication can be expressed as 8 multiplications and 4 additions of (n/2)x(n/2) submatrices. So the first thing we need is a Matrix-Add algorithm that adds two matrices in parallel.

Here is the parallel Matrix-Add algorithm:

Matrix-Add(C, T, n):
   // Adds matrices C and T in-place, producing C = C + T
   // n is a power of 2 (for simplicity).
   if n == 1:
     C[1, 1] = C[1, 1] + T[1, 1]
   else:
     partition C and T into (n/2)x(n/2) submatrices
     spawn Matrix-Add(C11, T11, n/2)
     spawn Matrix-Add(C12, T12, n/2)
     spawn Matrix-Add(C21, T21, n/2)
     spawn Matrix-Add(C22, T22, n/2)
     sync

The base case for this algorithm is when the matrices are just scalars (n == 1); then it adds the only elements of the two matrices. In all other cases it partitions the matrices C and T into (n/2)x(n/2) submatrices and spawns four subproblems.

Now we can implement the Matrix-Multiply algorithm by doing the 8 multiplications in parallel and then calling Matrix-Add to do the 4 additions.

Here is the parallel Matrix-Multiply algorithm:

Matrix-Multiply(C, A, B, n):
   // Multiplies matrices A and B, storing the result in C.
   // n is a power of 2 (for simplicity).
   if n == 1:
     C[1, 1] = A[1, 1] · B[1, 1]
   else:
     allocate a temporary matrix T[1...n, 1...n]
     partition A, B, C, and T into (n/2)x(n/2) submatrices
     spawn Matrix-Multiply(C11, A11, B11, n/2)
     spawn Matrix-Multiply(C12, A11, B12, n/2)
     spawn Matrix-Multiply(C21, A21, B11, n/2)
     spawn Matrix-Multiply(C22, A21, B12, n/2)
     spawn Matrix-Multiply(T11, A12, B21, n/2)
     spawn Matrix-Multiply(T12, A12, B22, n/2)
     spawn Matrix-Multiply(T21, A22, B21, n/2)
     spawn Matrix-Multiply(T22, A22, B22, n/2)
     sync
     Matrix-Add(C, T, n)

The lecture then proceeds to calculate how good the algorithm is. It turns out that the parallelism (the ratio T1/T∞ of the running time on a single processor to the running time on an infinite number of processors) for 1000x1000 matrix multiplication is about 10^7, which means the parallel algorithm is fast as hell.
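As a quick sanity check of that figure (my own arithmetic, with all constants taken to be 1): the lecture derives work T1(n) = Θ(n^3) and critical path T∞(n) = Θ(lg^2 n) for Matrix-Multiply, so for n = 1000:

```python
# Back-of-the-envelope parallelism of Matrix-Multiply for n = 1000:
# T1 = n^3 and Tinf = (lg n)^2, ignoring constant factors.
import math

n = 1000
parallelism = n ** 3 / math.log2(n) ** 2
print(f"{parallelism:.3g}")   # on the order of 10^7
```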

Matrix-Multiply used a temporary matrix T, which is actually unnecessary. To achieve higher performance, it's often advantageous for an algorithm to use less space, because less space means less time. Therefore a faster algorithm can be devised that multiplies the matrices and adds them in-place. See the lecture at 33:05 for the Matrix-Multiply-Add algorithm. The parallelism of this algorithm for 1000x1000 matrix multiplication is about 10^6, smaller than the previous one, but because no temporary matrices have to be used, it beats the Matrix-Multiply algorithm in practice.

The lecture then proceeds to parallel sorting. The classical sorting algorithms were covered in lectures four and five. This lecture parallelizes the Merge-Sort algorithm.

Here is the parallel merge-sort algorithm:

Parallel-Merge-Sort(A, p, r): // sort A[p...r]
  if p < r:
    q = floor((p+r)/2)
    spawn Parallel-Merge-Sort(A, p, q)
    spawn Parallel-Merge-Sort(A, q+1, r)
    sync
    Merge(A, p, q, r) // merge A[p...q] with A[q+1...r]

Instead of recursively calling Parallel-Merge-Sort on the subproblems, we spawn them, thus utilizing the power of parallelism. After doing the analysis, it turns out that because we use the classical, serial Merge() function, the parallelism is just lg(n). That's puny!

To solve this problem, the merge itself has to be parallelized. The lecture continues by presenting a Parallel-Merge algorithm, and then analyzes the same Parallel-Merge-Sort algorithm, now with the Parallel-Merge function. It turns out this version has parallelism of n/lg^2(n), which is much better. See the lecture at 51:00 for the Parallel-Merge algorithm.
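To get a feel for the gap between lg(n) and n/lg^2(n) parallelism, here is some quick arithmetic (mine, constants ignored) for an input of ten million elements:

```python
# Parallelism of the two merge sorts for n = 10 million, ignoring constants:
# serial Merge() gives Theta(lg n), Parallel-Merge gives Theta(n / lg^2 n).
import math

n = 10_000_000
lg = math.log2(n)
with_serial_merge   = lg           # about 23
with_parallel_merge = n / lg ** 2  # about 18500
print(round(with_serial_merge), round(with_parallel_merge))
```

With the serial merge, no matter how many processors you throw at ten million elements, you can only hope for a ~23x speedup; with the parallel merge the ceiling is three orders of magnitude higher.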

The best parallel sorting algorithm known to date has a parallelism of n/lg(n).

You're welcome to watch lecture twenty-one:

Topics covered in lecture twenty-one:

  • [00:35] Multithreaded algorithms.
  • [01:32] Multithreaded matrix multiplication.
  • [02:05] Parallelizing with divide and conquer.
  • [04:30] Algorithm Matrix-Multiply, that computes C = A*B.
  • [10:13] Algorithm Matrix-Add, that computes C = C+T.
  • [12:45] Analysis of running time of Matrix-Multiply and Matrix-Add.
  • [19:40] Analysis of critical path length of Matrix-Multiply and Matrix-Add.
  • [26:10] Parallelism of Matrix-Multiply and Matrix-Add.
  • [33:05] Algorithm Matrix-Multiply-Add, that computes C = C + A*B.
  • [35:50] Analysis of Matrix-Multiply-Add.
  • [42:30] Multithreaded, parallel sorting.
  • [43:30] Parallelized Merge-Sort algorithm.
  • [45:55] Analysis of multithreaded Merge-Sort.
  • [51:00] Parallel-Merge algorithm.
  • [01:00:30] Analysis of Parallel-Merge algorithm.

Lecture twenty-one notes:

MIT Algorithms Lecture 21 Notes Thumbnail. Page 1 of 2.
Lecture 21, page 1 of 2: multithreaded algorithms, matrix multiplication, analysis (critical path length, parallelism).

MIT Algorithms Lecture 21 Notes Thumbnail. Page 2 of 2.
Lecture 21, page 2 of 2: parallel sorting, merge sort, parallel merge, analysis.

Have fun with parallel, multithreaded algorithms! The next post is going to be on cache oblivious algorithms! They are a class of algorithms that exploit all the various caches in modern computer architecture to make them run notably faster.

I found a really nice video lecture on JavaScript. I'm a JavaScript journeyman myself so I decided to review it to learn something new. I have written about JavaScript in the past -- the previous post on JavaScript was "Learning JavaScript through Video Lectures".

This lecture is given by the JavaScript guru Doug Crockford. He's the creator of JSON, the JSMin JavaScript minifier, and JSLint.

The talk is based on Douglas's recent book "JavaScript: The Good Parts". It's excellent.

The lecture begins with a brief intro of why JavaScript is the way it is and how it came to be so. An interesting point Doug makes is that JavaScript is basically Scheme, except with Java syntax. He talks about the bad and good parts of JavaScript and gives a few examples of common JavaScript problems, their solutions and patterns. Douglas also talks about how he discovered JSON. After the talk there is a 13 minute Q and A.

You're welcome to watch the video lecture. It's 1 hour long and it will definitely make your understanding of JavaScript better:

Here are some interesting points from the lecture:

[09:57] JavaScript was influenced by Scheme, Java and Perl. From Scheme it borrowed lambda functions and loose typing. From Java it took most of the syntax, and from Perl some of its regular expressions. And finally, it derived the idea of prototypal inheritance and dynamic objects from Self. See my previous post on JavaScript for explanation of prototypal inheritance.

[11:38] JavaScript bad parts:

  • Global variables. Global variables make it harder to run independent subprograms in the same program. If the subprograms happen to have global variables that share the same names, then they will interfere with each other and likely fail, usually in difficult to diagnose ways.
  • Newlines get converted into semicolons. JavaScript has a mechanism that tries to correct faulty programs by automatically inserting semicolons. It sometimes inserts semicolons in places where they are not welcome. For example:

        return
        {
            status: true
        };

    returns undefined, because JavaScript inserts ';' right after "return". The correct way is:

        return {
            status: true
        };

  • Operator typeof is not very helpful. For example, "typeof null" is "object", "typeof [1,2,3]" is also "object".
  • Operator + adds numbers and concatenates. The + operator can add or concatenate. Which one it does depends on the types of the parameters. If either operand is an empty string, it produces the other operand converted to a string. If both operands are numbers, it produces the sum. Otherwise, it converts both operands to strings and concatenates them. This complicated behavior is a common source of bugs. If you intend + to add, make sure that both operands are numbers.
  • Operators == and != do type coercion. Use of the === and !== operators is always preferred.
  • Too many falsy values. 0, NaN, '' (empty string), false, null, and undefined are all falsy.
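A few of these pitfalls can be seen directly in any JavaScript console (the snippets are mine, not from the talk):

```javascript
// Small demonstrations of the "bad parts" listed above.
console.log(typeof null);         // "object", not "null"
console.log(typeof [1, 2, 3]);    // "object", not "array"
console.log(1 + 2);               // 3: both operands are numbers, + adds
console.log('1' + 2);             // "12": a string operand makes + concatenate
console.log(0 == '');             // true: == coerces its operands
console.log(0 === '');            // false: === does not
console.log(Boolean(NaN), Boolean('')); // false false: two of the falsy values
```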

[17:25] for ... in operator mixes inherited functions with the desired data members (it does deep reflection). Use object.hasOwnProperty to keep only the names that belong to the object itself:

for (name in object) {
    if (object.hasOwnProperty(name)) {
        // name is one of object's own properties, not an inherited one
    }
}
[22:00] JavaScript good parts:

  • Lambda. Enough said.
  • Dynamic objects. Just add a property to an object, or remove it, no need for classes and derivation to create a similar object.
  • Loose Typing. No need for type declarations.
  • Object Literals. {} for object literals, [] for array literals and // for regexp literals.

[23:00] Two schools of inheritance - classical and prototypal. Prototypal inheritance means that objects can inherit properties directly from other objects. The language is class-free.

[24:35] Realization of prototypal inheritance in JavaScript:

if (typeof Object.create !== 'function') {
    Object.create = function (o) {
        function F() {}
        F.prototype = o;
        return new F();
    };
}
Now newObject can be created by inheriting from oldObject:

newObject = Object.create(oldObject);
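Here is a small usage sketch (the objects and property names are made up for illustration):

```javascript
// Prototypal inheritance in action: newObject inherits from oldObject.
var oldObject = { greet: function () { return 'hello'; } };
var newObject = Object.create(oldObject);

console.log(newObject.greet());  // "hello": found on the prototype chain

newObject.greet = function () { return 'hi'; };
console.log(newObject.greet());  // "hi": an own property shadows the prototype
console.log(oldObject.greet());  // "hello": the prototype itself is untouched
```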

[26:05] Example of global variables and why they are bad and how to solve the problem by using closures.

var my_thing = function () {
    var names = ["zero", "one", ... ];
    return function (n) {
        return names[n];
    };
}();
[29:00] There are four ways to create a new object in JavaScript:

  • Object literal -- var object = { }.
  • New -- var object = new Object().
  • Object.create -- var object = Object.create(old_object).
  • Call another constructor (use different inheritance model).

[42:42] JSLint. JSLint defines a professional subset of JavaScript. JSLint will hurt your feelings.

[52:00] Q/A: Does strict mode change the behavior or does it take things out? -- Can't have "with" in strict mode, changes the way eval() works, changes error handling.

[53:00] Q/A: What's going on with DOM? -- Ajax libraries fix DOM, these changes should be propagated back into DOM API.

[55:30] Q/A: How do you spell lambda in JavaScript? -- function.

[55:54] Q/A: How to solve the global variable problem? -- Each compilation unit should be isolated, but there should be a way for them to be introduced (linked) to each other.

[56:30] Q/A: How do JavaScript objects differ from hash tables, they seem the same to me?

[57:23] Q/A: What's wrong with HTML 5 and web apps? -- They are doing too much and there are way too many of them.

[59:10] Q/A: How come JSON and JavaScript have almost the same syntax? -- Doug admits he forgot to include Unicode Line Separator (LS) and Paragraph Separator (PS) control codes as whitespace chars.

[01:00:32] Q/A: Why does JSON require quotes around the property names? -- Three reasons: 1. Wanted to align with Python where quotes are required, 2. Wanted to make the grammar of standard simpler, 3. JavaScript has stupid reserved word policy.

[01:02:40] Q/A: Are there any prospects for adding concurrency to the language? -- Definitely no threads. Could be some kind of a messaging model.

If you liked this talk, I recommend that you get Doug's book:

Happy javascripting! ;)

This article is part of the article series "Perl One-Liners Explained."

Hi all! I am starting yet another article series here. Remember my two articles on Famous Awk One-Liners Explained and Famous Sed One-Liners Explained? They have received more than 150,000 views total and they attract a few thousand new visitors every week. Inspired by their success, I am going to create my own perl1line.txt file and explain every single one-liner in it. I hope it becomes as famous as the awk1line.txt and sed1line.txt files.

A tiny intro for those unfamiliar with the awk1line.txt and sed1line.txt files -- they contain around 80 useful awk and sed one-liners for doing various text file manipulations. Some examples are double spacing a file, numbering the lines, doing various text substitutions, etc. They were compiled by Eric Pement. Kudos to him!

The article series will be divided into seven or more parts. In parts 1 - 6 I will create the one-liners and explain them, and in the last part I will release the perl1line.txt file. I have found that splitting an article into many parts makes it much easier to write, that's why I do it.

Here is the general plan:

The one-liners will make heavy use of Perl special variables. Luckily, a few years ago I compiled all the Perl special vars in a single file and called it the Perl special variable cheat-sheet. Even though it's mostly copied out of perldoc perlvar, it's still handy to have in front of you. I suggest that you print it.

Awesome news: I have written an e-book based on this article series. Check it out:

Other than that, I can't wait to start writing the article, so here I go:

File Spacing

1. Double space a file.

perl -pe '$\="\n"'

This one-liner double spaces a file. There are three things to explain in this one-liner. The "-p" and "-e" command line options, and the "$\" variable.

First let's start with the "-e" option. The "-e" option can be used to enter a Perl program directly in the command line. Typically you don't want to create source files for every small program. By using "-e" you can handily specify the program to execute on the command line.

Next the "-p" switch. Specifying "-p" to a Perl program causes it to assume the following loop around your program:

while (<>) {
    # your program goes here
} continue {
    print or die "-p failed: $!\n";
}
This construct loops over all the input, executes your code and prints the value of "$_". This way you can effectively modify all or some lines of input. The "$_" variable can be explained as an anonymous variable that gets filled with the good stuff.

The "$\" variable is similar to ORS in Awk. It gets appended after every "print" operation. Without any arguments "print" prints the contents of "$_" (the good stuff).

In this one-liner the code specified by "-e" is '$\="\n"', thus the whole program looks like this:

while (<>) {
    $\ = "\n";
} continue {
    print or die "-p failed: $!\n";
}
What happens is that after reading each line, "$_" gets filled with it (including the existing line's newline), the "$\" gets set to a newline itself and "print" is called. As I already mentioned, without any arguments "print" prints the contents of "$_" and "$\" gets appended at the end. The result is that each line gets printed unmodified and it's followed by the "$\" which has been set to newline. The input has been double-spaced.

There is actually no need to set "$\" to newline on each line. It was just the shortest possible one-liner that double-spaced the file. Here are several others that do the same:

perl -pe 'BEGIN { $\="\n" }'

This one sets the "$\" to newline just once before Perl does anything (BEGIN block gets executed before everything else).

perl -pe '$_ .= "\n"'

This one-liner is equivalent to:

while (<>) {
    $_ = $_ . "\n"
} continue {
    print or die "-p failed: $!\n";
}
It appends another new-line at the end of each line, then prints it out.

The cleanest and coolest way to do it is probably to use the substitution "s///" operator:

perl -pe 's/$/\n/'

It replaces the regular expression "$" that matches at the end of line with a newline, effectively adding a newline at the end.

2. Double space a file, except the blank lines.

perl -pe '$_ .= "\n" unless /^$/'

This one-liner double spaces all lines that are not completely empty. It does it by appending a newline character at the end of each line that is not blank. The "unless" means "if not". And "unless /^$/" means "if the line does not match the regular expression /^$/". This regular expression matches only when there is nothing between the beginning and the end of the line, that is, only when the line is blank.

Here is how it looks when expanded:

while (<>) {
    if ($_ !~ /^$/) {
        $_ .= "\n"
    }
} continue {
    print or die "-p failed: $!\n";
}
A better test that takes spaces and tabs on the line into account is this one:

perl -pe '$_ .= "\n" if /\S/'

Here the line is matched against "\S". "\S" is a regular expression that is the inverse of "\s", which matches any whitespace character (space, tab, vertical tab, etc). The inverse of "\s" is any non-whitespace character. The result is that every line that has at least one non-whitespace character gets double spaced.

3. Triple space a file.

perl -pe '$\="\n\n"'


perl -pe '$_.="\n\n"'

They are the same as one-liner #1, except that two new-lines get appended after each line.

4. N-space a file.

perl -pe '$_.="\n"x7'

This one-liner inserts 7 newlines after each line. Notice how I used '"\n" x 7' to repeat the newline char 7 times. The "x" operator repeats the thing on the left N times.

For example:

perl -e 'print "foo"x5'

would print "foofoofoofoofoo".

5. Add a blank line before every line.

perl -pe 's//\n/'

This one-liner uses the "s/pattern/replacement/" operator. It substitutes the first pattern (regular expression) in the "$_" variable with the replacement. In this one-liner the pattern is empty, meaning it matches any position between chars (and in this case it's the position before first char) and replaces it with "\n". The effect is that a newline char gets inserted before the line.

6. Remove all blank lines.

perl -ne 'print unless /^$/'

This one-liner uses the "-n" flag. The "-n" flag causes Perl to assume the following loop around your program:

while (<>) {
    # your program goes here
}
What happens here is that each line gets read by the diamond "<>" operator and is placed in the dollar underscore special variable "$_". At this moment you are free to do with it whatever you want. You specify that in the main program text.

In this one-liner the main program is "print unless /^$/", it gets inserted in the while loop above and the whole Perl program becomes:

while (<>) {
    print unless /^$/
}
Unraveling it further:

while (<>) {
    print $_ unless $_ =~ /^$/
}
This one-liner prints all non-blank lines.

A few other ways to do the same:

perl -lne 'print if length'

This one uses the "-l" command line argument. The "-l" argument automatically chomps the input line (basically gets rid of the newline at the end). Next the line is tested for its length. If there are any characters left, length evaluates to true and the line gets printed.

perl -ne 'print if /\S/'

This one-liner behaves differently than the "print unless /^$/" one-liner. The "print unless /^$/" one-liner prints lines that consist only of spaces and tabs; this one doesn't.

7. Remove all consecutive blank lines, leaving just one.

perl -00 -pe ''

Ok, this is really tricky, ain't it? First of all, it does not have any code; the -e is empty. Next, it has a silly -00 command line option. This command line option turns paragraph slurp mode on. A paragraph is a chunk of text separated from the next one by one or more blank lines, and however many blank lines separate two paragraphs, they get read as if there were just one. Each paragraph gets put in "$_" and the "-p" option prints it out.

Later I found a shorter version of this one-liner:

perl -00pe0

8. Compress/expand all blank lines into N consecutive ones.

perl -00 -pe '$_.="\n"x4'

This one-liner combines the previous one and one-liner #4. It slurps lines paragraph-wise, then appends (N-1) newlines to each paragraph. This one-liner expands (or compresses) all runs of newlines into 5 ("\n" x 4 prints four, and there was one at the end of the paragraph itself, so 5).

Perl one-liners explained e-book

I've now written the "Perl One-Liners Explained" e-book based on this article series. I went through all the one-liners, improved explanations, fixed mistakes and typos, added a bunch of new one-liners, added an introduction to Perl one-liners and a new chapter on Perl's special variables. Please take a look:

Have Fun!

Have fun with these file spacing one-liners. The next part is going to be about line numbering and calculations one-liners.

Can you think of other file spacing operations that I did not include here?

This article is part of the article series "Unix Utilities You Should Know About."

This is the second post in the article series about Unix utilities that you should know about. In this post I will introduce you to the netcat tool, or simply nc.

Netcat is often referred to as a "Swiss Army knife" utility, and for a good reason. Just like the venerable multi-function Swiss Army pocket knife, netcat is useful for many different jobs. Some of its features include port scanning, transferring files, port listening, and it can even be used as a backdoor.

In 2006 netcat was ranked #4 in "Top 100 Network Security Tools" survey, so it's definitely a tool to know.

See the first post on pipe viewer for the introduction to this article series. If you are interested in this stuff, I suggest that you subscribe to my rss feed to receive my future posts automatically.

How to use nc?

Let's start with a few very simple examples and build up on those.

If you remember, I said that netcat was a Swiss Army knife. What would a Swiss Army knife be if it also wasn't a regular knife, right? That's why netcat can be used as a replacement of telnet:

$ nc 80

It's actually much more handy than the regular telnet because you can terminate the connection at any time with ctrl+c, and it handles binary data as regular data (no escape codes, nothing).

You may add the "-v" parameter for more verbosity, and two -v's (-vv) to get statistics of how many bytes were transmitted during the connection.

Netcat can also be used as a server itself. If you start it as follows, it will listen on port 12345 (on all interfaces):

$ nc -l -p 12345

If you now connect to port 12345 on that host, everything you type will be sent to the other party, which leads us to using netcat as a chat server. Start the server on one computer:

# On a computer A with IP
$ nc -l -p 12345

And connect to it from another:

# On computer B
$ nc 12345

Now both parties can chat!

Talking of which, the same chat setup can be used to make two processes talk to each other, thus making nc do I/O over the network! For example, you can send a whole directory from one computer to another by piping tar to nc on the first computer, and redirecting the output to another tar process on the second.

Suppose you want to send the files in /data from computer A to computer B. It's as simple as this:

# On computer A with IP
$ tar -cf - /data | nc -l -p 6666

# On computer B
$ nc 6666 | tar -xf -

Don't forget to combine the pipeline with pipe viewer from previous article in this series to get statistics on how fast the transfer is going!

A single file can be sent even more easily:

# On computer A with IP
$ cat file | nc -l -p 6666

# On computer B
$ nc 6666 > file

You may even copy and restore the whole disk with nc:

# On computer A with IP
$ cat /dev/hdb | nc -l -p 6666

# On computer B
$ nc 6666 > /dev/hdb

Note: It turns out that "-l" can't be used together with "-p" on a Mac! The solution is to replace "-l -p 6666" with just "-l 6666". Like this:

$ nc -l 6666

# nc now listens on port 6666 on a Mac computer

An uncommon use of netcat is port scanning. Netcat is not the best tool for this job, but it does it ok (the best tool is nmap):

$ nc -v -n -z -w 1 1-1000 
(UNKNOWN) [] 445 (microsoft-ds) open
(UNKNOWN) [] 139 (netbios-ssn) open
(UNKNOWN) [] 111 (sunrpc) open
(UNKNOWN) [] 80 (www) open
(UNKNOWN) [] 25 (smtp) : Connection timed out
(UNKNOWN) [] 22 (ssh) open

The "-n" parameter here prevents DNS lookups, "-z" makes nc not receive any data from the server, and "-w 1" makes the connection time out after 1 second of inactivity.

Another uncommon behavior is using netcat as a proxy. Both ports and hosts can be redirected. Look at this example:

$ nc -l -p 12345 | nc <some-host> 80

This starts an nc server on port 12345, and everything it receives gets redirected to the remote host on port 80. If you now connect to that computer on port 12345 and send a request, you will find that no data gets sent back. That's correct, because we did not set up a bidirectional pipe. If you add another pipe, you can get the data back on another port:

$ nc -l -p 12345 | nc <some-host> 80 | nc -l -p 12346

After you have sent the request on port 12345, connect on port 12346 to get the data.

Probably netcat's most powerful feature is its ability to turn any process into a server:

$ nc -l -p 12345 -e /bin/bash

The "-e" option spawns the executable with its input and output redirected via the network socket. If you now connect to the host on port 12345, you may use bash:

$ nc localhost 12345
ls -las
total 4288
   4 drwxr-xr-x 15 pkrumins users    4096 2009-02-17 07:47 .
   4 drwxr-xr-x  4 pkrumins users    4096 2009-01-18 21:22 ..
   8 -rw-------  1 pkrumins users    8192 2009-02-16 19:30 .bash_history
   4 -rw-r--r--  1 pkrumins users     220 2009-01-18 21:04 .bash_logout

A consequence is that nc is a popular hacker tool, as it makes it so easy to create a backdoor on any computer. On a Linux computer you may spawn /bin/bash, and on a Windows computer cmd.exe, to gain total control over the machine.

That's everything I can think of. Do you know any other netcat uses that I did not include?

How to install nc?

If you're on Debian or a Debian-based system such as Ubuntu, do the following:

$ sudo aptitude install netcat

If you're on Fedora or a Fedora-based system such as CentOS, do:

$ sudo yum install netcat

If you're on Slackware, FreeBSD, NetBSD, Solaris or Mac, download the source code of nc and just:

$ tar -zxf nc-version.tar.gz
$ cd nc-version
$ ./configure && sudo make install

Another way to do it on Mac, if you have MacPorts is:

$ sudo port install netcat

On Slackware you can actually install it as a package from n/ package directory:

$ sudo installpkg nc-1.10-i386-1.tgz

If you're on Windows, download the Windoze port of it from securityfocus.

The manual of the utility can be read by typing man nc.

Have fun netcatting, and until next time!

This article is part of the article series "Awk One-Liners Explained."
<- previous article next article ->

awk programming one-liners explainedThis is an update to my three-part article Famous Awk One-Liners Explained.

I received an email from Eric Pement (the original author of Awk one-liners) and he said that there was a new version of awk1line.txt file available. I did a diff and found that there were seven new one-liners in it!

The new file has two new sections "String Creation" and "Array Creation" and it updates "Selective Printing of Certain Lines" section. I'll explain the new one-liners in this article.

Here is the latest version of awk1line.txt: awk1line-new.txt.

The original Eric Pement's Awk one-liner collection consists of five sections, which I explained in my previous three articles.

Awesome news: I have written an e-book based on this article series. Check it out:

Okay, let's roll with the new one-liners:

String Creation

1. Create a string of a specific length (generate a string of x's of length 513).

awk 'BEGIN { while (a++<513) s=s "x"; print s }'

This one-liner uses the "BEGIN { }" special block that gets executed before anything else in an Awk program. In this block a while loop appends character "x" to variable "s" 513 times. After it has looped, the "s" variable gets printed out. As this Awk program does not have a body, it quits after executing the BEGIN block.

This one-liner printed the 513 x's out, but you could use the string for anything you wish in the BEGIN, main program, or END blocks.

Unfortunately this is not the most efficient way to do it. It's a linear-time solution. My friend waldner (who, by the way, wrote a guest post on 10 Awk Tips, Tricks and Pitfalls) showed me a solution that's logarithmic time (based on the idea of recursive squaring):

function rep(str, num,     remain, result) {
    if (num < 2) {
        remain = (num == 1)
    } else {
        remain = (num % 2 == 1)
        result = rep(str, (num - remain) / 2)
    }
    return result result (remain ? str : "")
}

This function can be used as follows:

awk 'BEGIN { s = rep("x", 513) }'
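To see the logarithmic version in action, here is the complete function together with a BEGIN block that checks the length of the generated string (the length check is my addition for demonstration):

```shell
awk '
function rep(str, num,    remain, result) {
    if (num < 2) {
        remain = (num == 1)        # rep(str, 0) = "", rep(str, 1) = str
    } else {
        remain = (num % 2 == 1)
        result = rep(str, (num - remain) / 2)
    }
    # double the half-sized result, and append one extra copy if num was odd
    return result result (remain ? str : "")
}
BEGIN { print length(rep("x", 513)) }'
# prints 513
```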

2. Insert a string of specific length at a certain character position (insert 49 x's after 6th char).

gawk --re-interval 'BEGIN{ while(a++<49) s=s "x" }; { sub(/^.{6}/,"&" s) }; 1'

This one-liner works only with Gnu Awk, because it uses the interval expression ".{6}" in the Awk program's body. Interval expressions were traditionally not available in awk, which is why you have to use the "--re-interval" option to enable them.

For those who do not know what interval expressions are, they are regular expressions that match the preceding expression a specified number of times. For example, ".{6}" matches any six characters (the any-character is specified by the dot "."). An interval expression "b{2,4}" matches at least two, but not more than four, "b" characters. To repeat whole words, you have to group them with parentheses - "(foo){4}" matches "foo" repeated four times - "foofoofoofoo".

The one-liner starts the same way as the previous one - it creates a 49-character string "s" in the BEGIN block. Next, for each line of input, it calls the sub() function, which replaces the first 6 characters with themselves plus "s" appended. The "&" in the sub() function means the matched part of the regular expression, so '"&" s' is the matched part followed by the contents of variable "s". The "1" at the end of the one-liner prints out the modified line (it's syntactic sugar for "print", which itself is syntactic sugar for "print $0").

The same can be achieved with normal standard Awk:

awk 'BEGIN{ while(a++<49) s=s "x" }; { sub(/^....../,"&" s) }; 1'

Here we just match six chars "......" at the beginning of line, and replace them with themselves + contents of variable "s".
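A quick way to convince yourself how this works is to run the standard-awk variant on a short sample line (the sample text and the 3-x string are my choices for the demo):

```shell
# Build a 3-character string of x's and insert it after the 6th character
echo "Hello, world!" | awk 'BEGIN{ while(a++<3) s=s "x" }; { sub(/^....../,"&" s) }; 1'
# prints "Hello,xxx world!" - the x's appear right after "Hello," (6 chars)
```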

It may get troublesome to insert a string at 29th position for example... You'd have to go tapping "." twenty-nine times ".............................". Better use Gnu Awk then and write ".{29}".

Once again, my friend waldner corrected me and pointed me to the Awk Feature Comparison chart. The chart suggests that the original one-liner with ".{6}" would also work with POSIX awk, Busybox awk, and Solaris awk.

Array Creation

3. Create an array from string.

split("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec", month, " ")

This is not a one-liner per se but a technique to create an array from a string. The split(Str, Arr, Regex) function is used to do that. It splits the string Str into fields by the regular expression Regex and puts the fields in the array Arr. The fields are placed in Arr[1], Arr[2], ..., Arr[N]. The split() function itself returns the number of fields the string was split into.

In this piece of code the Regex is simply space character " ", the array is month and string is "Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec". After the split, month[1] is "Jan", month[2] is "Feb", ..., month[12] is "Dec".
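You can check the result of split() directly from a BEGIN block (the fields I print are picked arbitrarily for the demo):

```shell
awk 'BEGIN {
    n = split("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec", month, " ")
    print n, month[1], month[12]   # number of fields, first field, last field
}'
# prints "12 Jan Dec"
```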

4. Create an array named "mdigit", indexed by strings.

for (i=1; i<=12; i++) mdigit[month[i]] = i

This is another array creation technique and not a real one-liner. This technique creates a reverse lookup array. Remember from the previous "one-liner" that month[1] was "Jan", ..., month[12] was "Dec". Now we want to do the reverse lookup and find the number for each month. To do that, we create a reverse lookup array "mdigit", such that mdigit["Jan"] = 1, ..., mdigit["Dec"] = 12.

It's really trivial, we loop over month[1], month[2], ..., month[12] and set mdigit[month[i]] to i. This way mdigit["Jan"] = 1, etc.
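Putting both techniques together, here is a short sketch that builds the month array and then looks a month up by name (the month "Sep" is an arbitrary example):

```shell
awk 'BEGIN {
    split("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec", month, " ")
    for (i = 1; i <= 12; i++) mdigit[month[i]] = i   # reverse lookup array
    print mdigit["Sep"]
}'
# prints 9
```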

Selective Printing of Certain Lines

5. Print all lines where 5th field is equal to "abc123".

awk '$5 == "abc123"'

This one-liner uses idiomatic Awk - if the given expression is true, Awk prints out the line. The fifth field is referenced by "$5" and it's checked to be equal to "abc123". If it is, the expression is true and the line gets printed.

Unwinding this idiom, this one-liner is really equal to:

awk '{ if ($5 == "abc123") { print $0 } }'
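You can try the idiom on a couple of sample lines (the input lines here are made up):

```shell
# Only the line whose 5th field is "abc123" survives
printf 'a b c d abc123\na b c d other\n' | awk '$5 == "abc123"'
# prints "a b c d abc123"
```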

6. Print any line where field #5 is not equal to "abc123".

awk '$5 != "abc123"'

This is exactly the same as previous one-liner, except it negates the comparison. If the fifth field "$5" is not equal to "abc123", then print it.

Unwinding it, it's equal to:

awk '{ if ($5 != "abc123") { print $0 } }'

Another way is to literally negate the whole previous one-liner:

awk '!($5 == "abc123")'

7. Print all lines whose 7th field matches a regular expression.

awk '$7 ~ /^[a-f]/'

This is also idiomatic Awk. It uses the "~" operator to test whether the seventh field "$7" matches the regular expression "^[a-f]", which matches values that start with a lower-case letter a, b, c, d, e, or f. If the field matches, the line gets printed.

awk '$7 !~ /^[a-f]/'

This one-liner negates the previous one and prints all lines whose 7th field does not start with a lower-case letter a, b, c, d, e, or f.

Another way to write the same is:

awk '$7 ~ /^[^a-f]/'

Here we negated the group of letters [a-f] by adding "^" in the group. That's a regex trick to know.
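Here is a quick demonstration of the "~" and "!~" variants on two made-up input lines:

```shell
# 7th field starts with a letter in a-f: printed by "~", dropped by "!~"
printf '1 2 3 4 5 6 apple\n1 2 3 4 5 6 zebra\n' | awk '$7 ~ /^[a-f]/'
# prints "1 2 3 4 5 6 apple"

printf '1 2 3 4 5 6 apple\n1 2 3 4 5 6 zebra\n' | awk '$7 !~ /^[a-f]/'
# prints "1 2 3 4 5 6 zebra"
```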

Awk one-liners explained e-book

I have written my first e-book called "Awk One-Liners Explained". I improved the explanations of the one-liners in this article series, added new one-liners and added three new chapters - introduction to awk one-liners, summary of awk special variables and idiomatic awk. Please take a look:

Have Fun!

Have fun with these Awk one-liners!

If you haven't already, I recommend that you download my Awk cheat sheet, read the "10 Awk Tips, Tricks and Pitfalls" article, and study the source code of my YouTube Video Downloader, written entirely in Gnu Awk.