This article is part of the article series "Awk One-Liners Explained."
<- previous article next article ->

awk programming one-liners explainedThis is an update post on my three-part article Awk One-Liners Explained.

I received an email from Eric Pement (the original author of Awk one-liners) and he said that there was a new version of awk1line.txt file available. I did a diff and found that there were seven new one-liners in it!

The new file has two new sections "String Creation" and "Array Creation" and it updates "Selective Printing of Certain Lines" section. I'll explain the new one-liners in this article.

Here is the latest version of awk1line.txt: awk1line-new.txt.

The original Eric Pement's Awk one-liner collection consists of five sections, and I explained them in my previous three articles:

Awesome news: I have written an e-book based on this article series. Check it out:

Okay, let's roll with the new one-liners:

String Creation

1. Create a string of a specific length (generate a string of x's of length 513).

awk 'BEGIN { while (a++<513) s=s "x"; print s }'

This one-liner uses the "BEGIN { }" special block that gets executed before anything else in an Awk program. In this block a while loop appends character "x" to variable "s" 513 times. After it has looped, the "s" variable gets printed out. As this Awk program does not have a body, it quits after executing the BEGIN block.

This one-liner printed the 513 x's out, but you could have used it for anything you wish in BEGIN, main program or END blocks.

Unfortunately this is not the most effective way to do it. It's a linear time solution. My friend waldner (who, by the way, wrote a guest post on 10 Awk Tips, Tricks and Pitfalls) showed me a solution that's logarithmic time (based on idea of recursive squaring):

function rep(str, num,     remain, result) {
    if (num < 2) {
        remain = (num == 1)
    } else {
        remain = (num % 2 == 1)
        result = rep(str, (num - remain) / 2)
    return result result (remain ? str  : "")

This function can be used as following:

awk 'BEGIN { s = rep("x", 513) }'

2. Insert a string of specific length at a certain character position (insert 49 x's after 6th char).

gawk --re-interval 'BEGIN{ while(a++<49) s=s "x" }; { sub(/^.{6}/,"&" s) }; 1'

This one-liner works only with Gnu Awk, because it uses the interval expression ".{6}" in the Awk program's body. Interval expressions were not traditionally available in awk, that's why you have to use "--re-interval" option to enable them.

For those that do not know what interval expressions are, they are regular expressions that match a certain number of characters. For example, ".{6}" matches any six characters (the any char is specified by the dot "."). An interval expression "b{2,4}" matches at least two, but not more than four "b" characters. To match words, you have to give them higher precedence - "(foo){4}" matches "foo" repeated four times - "foofoofoofoo".

The one-liner starts the same way as the previous - it creates a 49 character string "s" in the BEGIN block. Next, for each line of the input, it calls sub() function that replaces the first 6 characters with themselves and "s" appended. The "&" in the sub() function means the matched part of regular expression. The '"&" s' means matched part of regex and contents of variable "s". The "1" at the end of whole Awk one-liner prints out the modified line (it's syntactic sugar for just "print" (that itself is syntactic sugar for "print $0")).

The same can be achieved with normal standard Awk:

awk 'BEGIN{ while(a++<49) s=s "x" }; { sub(/^....../,"&" s) }; 1

Here we just match six chars "......" at the beginning of line, and replace them with themselves + contents of variable "s".

It may get troublesome to insert a string at 29th position for example... You'd have to go tapping "." twenty-nine times ".............................". Better use Gnu Awk then and write ".{29}".

Once again, my friend waldner corrected me and pointed to Awk Feature Comparsion chart. The chart suggests that the original one-liner with ".{6}" would also work with POSIX awk, Busybox awk, and Solaris awk.

Array Creation

3. Create an array from string.

split("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec", month, " ")

This is not a one-liner per se but a technique to create an array from a string. The split(Str, Arr, Regex) function is used do that. It splits string Str into fields by regular expression Regex and puts the fields in array Arr. The fields are placed in Arr[1], Arr[2], ..., Arr[N]. The split() function itself returns the number of fields the string was split into.

In this piece of code the Regex is simply space character " ", the array is month and string is "Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec". After the split, month[1] is "Jan", month[2] is "Feb", ..., month[12] is "Dec".

4. Create an array named "mdigit", indexed by strings.

for (i=1; i<=12; i++) mdigit[month[i]] = i

This is another array creation technique and not a real one-liner. This technique creates a reverse lookup array. Remember from the previous "one-liner" that month[1] was "Jan", ..., month[12] was "Dec". Now we want to the reverse lookup and find the number for each month. To do that we create a reverse lookup array "mdigit", such that mdigit["Jan"] = 1, ..., mdigit["Dec"] = 12.

It's really trivial, we loop over month[1], month[2], ..., month[12] and set mdigit[month[i]] to i. This way mdigit["Jan"] = 1, etc.

Selective Printing of Certain Lines

5. Print all lines where 5th field is equal to "abc123".

awk '$5 == "abc123"'

This one-liner uses idiomatic Awk - if the given expression is true, Awk prints out the line. The fifth field is referenced by "$5" and it's checked to be equal to "abc123". If it is, the expression is true and the line gets printed.

Unwinding this idiom, this one-liner is really equal to:

awk '{ if ($5 == "abc123") { print $0 } }'

6. Print any line where field #5 is not equal to "abc123".

awk '$5 != "abc123"'

This is exactly the same as previous one-liner, except it negates the comparison. If the fifth field "$5" is not equal to "abc123", then print it.

Unwinding it, it's equal to:

awk '{ if ($5 != "abc123") { print $0 } }'

Another way is to literally negate the whole previous one-liner:

awk '!($5 == "abc123")'

7. Print all lines whose 7th field matches a regular expression.

awk '$7  ~ /^[a-f]/'

This is also idiomatic Awk. It uses "~" operator to test if the seventh "$7" field matches a regular expression "^[a-f]". This regular expression means "all lines that start with a lower-case letter a, b, c, d, e, or f".

awk '$7 !~ /^[a-f]/'

This one-liner matches negates the previous one and prints all lines that do not start with a lower-case letter a, b, c, d, e, and f.

Another way to write the same is:

awk '$7 ~ /^[^a-f]/'

Here we negated the group of letters [a-f] by adding "^" in the group. That's a regex trick to know.

Awk one-liners explained e-book

I have written my first e-book called "Awk One-Liners Explained". I improved the explanations of the one-liners in this article series, added new one-liners and added three new chapters - introduction to awk one-liners, summary of awk special variables and idiomatic awk. Please take a look:

Have Fun!

Have fun with these Awk oneliners!

If you haven't already, I recommend that you download my Awk cheat-cheet, read the "10 Awk Tips, Tricks and Pitfalls" article, and study the source code of my YouTube Video Downloader, written entirely in Gnu Awk.

This article is part of the article series "Musical Geek Friday."
<- previous article next article ->

Don’t Copy That FloppyI have found some new geek songs to post on my Musical Geek Friday. This week it's a song called Hax That Fsck. It's a hip-hop/rap song about a hacker getting owned by FBI and his adventures in hacking Gibson and CIA.

The song is written by Finnish band called "Leech Axss". This is their second geek song that I have posted on Musical Geek Friday. See "Leech Access is Coming at You".

I had problems with understanding some parts of the song. I put "(???)" in the lyrics where I was not sure what was being said. I'd appreciate if someone helped me figure it out the correct lyrics!

The song is strongly NSFW, be sure to put your headphones on before listening at work.


Download this song: hax that fsck (musical geek friday #17)
Downloaded: 89825 times

Download lyrics: hax that fsck lyrics (musical geek friday #17)
Downloaded: 6740 times

Here is the lyrics (I censored the explicit language, see the 'download lyrics' link above for uncensored version):

Hax that f**k, move that box
Exploit a hole in a FireFox
Haxing is about breaking in, not fame
Whoever you be haxing is all the same

Crack that code, buffer overflow
Alt-F4 makes a lamer go
Hax that Gibson and CIA
Sell your soul to the zero da-aa-ay

As you well know, I net sex with duty creatures (???)
I now have sex with the federal agents
You bruteforce the password of
Knock, knock, it's the feds and the bubba is on

Right to remain silent like my box in ice
Perfect source code, never write it twice
Better go figure it out before you talk to a fed
It's a feeling very close to a blue screen of death

A monkey Steve Jobs intringing with Bill Gates
That ain't gonna tway with the fscking absura blades (????)
Apple in my eye and a panther in my tank
I bought an itunes that are like tri-hand ****

Still in the game but before I went away
Nerdies want my wishes and they want my zero days
But behave you geeks, and I won't go latest patch
The one that cycle is Christina Aguilera snatch

Hax that f**k, move that box
Exploit a hole in the FireFox
Haxing is about breaking in, not fame
Whoever you be haxing is all the same

Crack that code, buffer overflow
Alt-F4 makes a lamer go
Hax that Gibson and CIA
Sell your soul to the zero da-aa-ay

I read b0w and b0g
I sport AMD and fscking RISC
I am not like you, I never disrespect my box
Points to stolen motherf*****g Mozilla leanfox

What's under the hood, I've got future of jobs
Thousand mp3s in the court
Drop it like it's hot, drop it like it's hot, drop it like it's hot
Check my benchmark, you skinny retard

[Hshhha] loves a key generator
If it's not 0day, don't seed it later
I met my share of lamers, I've got some spleen
Zero entropy is like Commander Keen

I've been in hax of yours and to haxifice
I've seen in the eyes which drugsterize (???)
I'm haxing my way out of the paper bag
Authenticating g in my f*****g ... (???)

Hax that f**k, move that box
Exploit a hole in the FireFox
Haxing is about breaking in, not fame
Whoever you be haxing is all the same

Crack that code, buffer overflow
Alt-F4 makes a lamer go
Hack that Gibbson and CIA
Sell your soul to the zero da-aa-ay

Hax that fsck, move that box
Exploit a hole in the FireFox
Haxing is about breaking in, not fame
Whoever you be haxing is all the same

Crack that code, buffer overflow
Alt-F4 makes a lamer go
Hax that Gibson and CIA
Sell your soul to the zero da-aa-ay

Download "Hax That Fsck" Song

Download this song: hax that fsck (musical geek friday #17)
Downloaded: 89825 times

Download lyrics: hax that fsck lyrics (musical geek friday #17)
Downloaded: 6740 times

Click to listen:

Have fun and until next geeky Friday! :)

This article is part of the article series "Vim Plugins You Should Know About."
<- previous article next article ->

Vim Plugins, surround.vimThis is the third post in the article series "Vim Plugins You Should Know About". This time I am going to introduce you to a plugin called "matchit.vim".

If you are intrigued by this topic, I suggest that you subscribe to my posts! For the introduction and first post in this article series, follow this link - Vim Plugins You Should Know About, Part I: surround.vim. Part II is here: repeat.vim.

Matchit extends the existing functionality of "%" key (percent key). I'll first briefly remind you what the original "%" does and then explain how matchit.vim enhances it.

The original "%" key allows you to jump between various pairs of characters and some programming constructs. For example, it jumps between pairs of parenthesis ( )'s, { }'s, [ ]'s. It also jumps between opening and closing tags of C style comments /* and */. And it's smart enough to jump between C preprocessor directives - from #if to #endif and match #elif or #else in between.

Here is an example. Suppose you have this code and you press "%", the cursor jumps between { and } parens:

matchit vim plugin example

Matchit.vim extends this functionality. It's written by Benji Fisher and it adds support to cycle between if, else if, else, endif keywords in various programming languages. Another improvement is the ability to find pairs of HTML tags, such as <p> ... </p>. Another handy mapping is "g%" that does "%" in opposite direction (goes from endif to else to else if to if). The plugin also includes several other mappings like "]%", "[%" and "a%" but I could not figure out how to effectively use them in real life code, so I don't use them at all.

Here is another example. Suppose you are editing this HTML and you quickly want to go to the corresponding closing tag of <body>. Just press "%":

matchit vim html example

Overall it's a great plugin to have in your inventory!

How to install matchit.vim?

Matchit.vim has been included in vim since version 6.0. However there are newer versions of the script available with bug fixes and enhancements.

To get the latest version:

  • 1. Download
  • 2. Extract to ~/.vim (on Unix/Linux) or ~\vimfiles (on Windows).
  • 3. Run :helptags ~/.vim/doc (on Unix/Linux) or :helptags ~/vimfiles/doc (on Windows) to rebuild the tags file (so that you can read :help %, :help g%, etc.)
  • 4. Restart Vim or source matchit.vim with ":so ~/.vim/plugin/matchit.vim" on Unix or ":so ~/vimfiles/plugin/matchit.vim" on Windows).
  • 5. Use '%' to find corresponding

For Python programmers: Turns out the original matchit.vim plugin does not match if / elif / else. Benji extended matchit.vim itself and created "python_matchit.vim". This extension allows us to use the "%" key to cycle through if/elif/else, try/except/catch, for/continue/break, and while/continue/break structures. The script also defines g% to cycle in the opposite direction, and it defines two other motions, [% and ]%, go to the start and end of the current block, respectively.

How to install python_matchit.vim?

Python_matchit.vim is a filetype plugin. It has to be installed in "ftplugin" directory. Follow these steps to get it installed:

  • 1. Download python_matchit.vim.
  • 2. Put it in ~/.vim/ftplugin (on Unix/Linux) or ~\vimfiles\ftplugin (on Windows).
  • 3. Restart Vim or source matchit.vim with ":so ~/.vim/ftplugin/python_matchit.vim" on Unix or ":so ~/vimfiles/ftplugin/python_matchit.vim" on Windows).

The same steps can be taken to install matchit for Ruby.

Have Fun!

Happy matching with matchit.vim!

This article is part of the article series "Unix Utilities You Should Know About."
<- previous article next article ->

Unix UtilitiesHi all. I'm starting yet another article series here. This one is going to be about Unix utilities that you should know about. The articles will discuss one Unix program at a time. I'll try to write a good introduction to the tool and give as many examples as I can think of.

Before I start, I want to clarify one thing - Why am I starting so many article series? The answer is that I want to write about many topics simultaneously and switch between them as I feel inspired.

The first post in this series is going to be about not so well known Unix program called Pipe Viewer or pv for short. Pipe viewer is a terminal-based tool for monitoring the progress of data through a pipeline. It can be inserted into any normal pipeline between two processes to give a visual indication of how quickly data is passing through, how long it has taken, how near to completion it is, and an estimate of how long it will be until completion.

Pipe viewer is written by Andrew Wood, an experienced Unix sysadmin. The homepage of pv utility is here: pv utility.

If you feel like you are interested in this stuff, I suggest that you subscribe to my rss feed to receive my future posts automatically.

How to use pv?

Ok, let's start with some really easy examples and progress to more complicated ones.

Suppose that you had a file "access.log" that is a few gigabytes in size and contains web logs. You want to compress it into a smaller file, let's say a gunzip archive (.gz). The obvious way would be to do:

$ gzip -c access.log > access.log.gz

As the file is so huge (several gigabytes), you have no idea how long to wait. Will it finish soon? Or will it take another 30 mins?

By using pv you can precisely time how long it will take. Take a look at doing the same through pv:

$ pv access.log | gzip > access.log.gz
611MB 0:00:11 [58.3MB/s] [=>      ] 15% ETA 0:00:59

Pipe viewer acts as "cat" here, except it also adds a progress bar. We can see that gzip processed 611MB of data in 11 seconds. It has processed 15% of all data and it will take 59 more seconds to finish.

You may stick several pv processes in between. For example, you can time how fast the data is being read from the disk and how much data is gzip outputting:

$ pv -cN source access.log | gzip | pv -cN gzip > access.log.gz
source:  760MB 0:00:15 [37.4MB/s] [=>     ] 19% ETA 0:01:02
  gzip: 34.5MB 0:00:15 [1.74MB/s] [  <=>  ]

Here we specified the "-N" parameter to pv to create a named stream. The "-c" parameter makes sure the output is not garbaged by one pv process writing over the other.

This example shows that "access.log" file is being read at a speed of 37.4MB/s but gzip is writing data at only 1.74MB/s. We can immediately calculate the compression rate. It's 37.4/1.74 = 21x!

Notice how the gzip does not include how much data is left or how fast it will finish. It's because the pv process after gzip has no idea how much data gzip will produce (it's just outputting compressed data from input stream). The first pv process, however, knows how much data is left, because it's reading it.

Another similar example would be to pack the whole directory of files into a compressed tarball:

$ tar -czf - . | pv > out.tgz
 117MB 0:00:55 [2.7MB/s] [>         ]

In this example pv shows just the output rate of "tar -czf" command. Not very interesting and it does not provide information about how much data is left. We need to provide the total size of data we are tarring to pv, it's done this way:

$ tar -cf - . | pv -s $(du -sb . | awk '{print $1}') | gzip > out.tgz
 253MB 0:00:05 [46.7MB/s] [>     ]  1% ETA 0:04:49

What happens here is we tell tar to create "-c" an archive of all files in current dir "." (recursively) and output the data to stdout "-f -". Next we specify the size "-s" to pv of all files in current dir. The "du -sb . | awk '{print $1}'" returns number of bytes in current dir, and it gets fed as "-s" parameter to pv. Next we gzip the whole content and output the result to out.tgz file. This way "pv" knows how much data is still left to be processed and shows us that it will take yet another 4 mins 49 secs to finish.

Another fine example is copying large amounts of data over network by using help of "nc" utility that I will write about some other time.

Suppose you have two computers A and B. You want to transfer a directory from A to B very quickly. The fastest way is to use tar and nc, and time the operation with pv.

# on computer A, with IP address
$ tar -cf - /path/to/dir | pv | nc -l -p 6666 -q 5
# on computer B
$ nc 6666 | pv | tar -xf -

That's it. All the files in /path/to/dir on computer A will get transferred to computer B, and you'll be able to see how fast the operation is going.

If you want the progress bar, you have to do the "pv -s $(...)" trick from the previous example (only on computer A).

Another funny example is by my blog reader alexandru. He shows how to time how fast the computer reads from /dev/zero:

$ pv /dev/zero > /dev/null
 157GB 0:00:38 [4,17GB/s]

That's about it. I hope you enjoyed my examples and learned something new. I love explaining things and teaching! :)

How to install pv?

If you're on Debian or Debian based system such as Ubuntu do the following:

$ sudo aptitude install pv

If you're on Fedora or Fedora based system such as CentOS do:

$ sudo yum install pv

If you're on Slackware, go to pv homepage, download the pv-version.tar.gz archive and do:

$ tar -zxf pv-version.tar.gz
$ cd pv-version
$ ./configure && sudo make install

If you're a Mac user:

$ sudo port install pv

If you're OpenSolaris user:

$ pfexec pkg install pv

If you're a Windows user on Cygwin:

$ ./configure
$ export DESTDIR=/cygdrive/c/cygwin
$ make
$ make install

The manual of the utility can be found here man pv.

Have fun measuring your pipes with pv, and until next time!

This article is part of the article series "MIT Introduction to Algorithms."
<- previous article next article ->

MIT AlgorithmsThis is the twelfth post in an article series about MIT's lecture course "Introduction to Algorithms." In this post I will review a trilogy of lectures on graph and shortest path algorithms. They are lectures seventeen, eighteen and nineteen. They'll cover Dijkstra's Algorithm, Breadth-First Search Algorithm and Bellman-Ford Algorithm for finding single-source shortest paths as well as Floyd-Warshall Algorithm and Johnson's Algorithm for finding all-pairs shortest paths.

These algorithms require a thorough understanding of graphs. See the previous lecture for a good review of graphs.

Lecture seventeen focuses on the single-source shortest-paths problem: Given a graph G = (V, E), we want to find a shortest path from a given source vertex s ∈ V to each vertex v ∈ V. In this lecture the weights of edges are restricted to be positive which leads it to Dijkstra's algorithm and in a special case when all edges have unit weight to Breadth-first search algorithm.

Lecture eighteen also focuses on the same single-source shortest-paths problem, but allows edges to be negative. In this case a negative-weight cycles may exist and Dijkstra's algorithm would no longer work and would produce incorrect results. Bellman-Ford algorithm therefore is introduced that runs slower than Dijkstra's but detects negative cycles. As a corollary it is shown that Bellman-Ford solves Linear Programming problems with constraints in form xj - xi <= wij.

Lecture nineteen focuses on the all-pairs shortest-paths problem: Find a shortest path from u to v for every pair of vertices u and v. Although this problem can be solved by running a single-source algorithm once from each vertex, it can be solved faster with Floyd-Warshall algorithm or Johnson's algorithm.

Lecture 17: Shortest Paths I: Single-Source Shortest Paths and Dijkstra's Algorithm

Lecture seventeen starts with a small review of paths and shortest paths.

It reminds that given a graph G = (V, E, w), where V is a set of vertices, E is a set of edges and w is weight function that maps edges to real-valued weights, a path p from a vertex u to a vertex v in this graph is a sequence of vertices (v0, v1, ..., vk) such that u = v0, v = vk and (vi-1, vi) ∈ E. The weight w(p) of this path is a sum of weights over all edges = w(v0, v1) + w(v1, v2) + ... + w(vk-1, vk). It also reminds that a shortest path from u to v is the path with minimum weight of all paths from u to v, and that a shortest path in a graph might not exist if it contains a negative weight cycle.

The lecture then notes that shortest paths exhibit the optimal substructure property - a subpath of a shortest path is also a shortest path. The proof of this property is given by cut and paste argument. If you remember from previous two lectures on dynamic programming and greedy algorithms, an optimal substructure property suggests that these two techniques could be applied to solve the problem efficiently. Indeed, applying the greedy idea, Dijkstra's algorithm emerges.

Here is a somewhat precise definition of single-source shortest paths problem with non-negative edge weights: Given a graph G = (V, E), and a starting vertex s ∈ V, find shortest-path weights for all vertices v ∈ V.

Here is the greedy idea of Dijkstra's algorithm:

  • 1. Maintain a set S of vertices whose shortest-path from s are known (s ∈ S initially).
  • 2. At each step add vertex v from the set V-S to the set S. Choose v that has minimal distance from s (be greedy).
  • 3. Update the distance estimates of vertices adjacent to v.

I have also posted a video interview with Edsger Dijkstra - Edsger Dijkstra: Discipline in Thought, please take a look if you want to see how Dijkstra looked like. :)

The lecture continues with an example of running Dijkstra's algorithm on a non-trivial graph. It also introduces to a concept of a shortest path tree - a tree that is formed by edges that were last relaxed in each iteration (hard to explain in English, see lecture at 43:40).

The other half of lecture is devoted to three correctness arguments of Dijkstra's algorithm. The first one proves that relaxation never makes a mistake. The second proves that relaxation always makes the right greedy choice. And the third proves that when algorithm terminates the results are correct.

At the final minutes of lecture, running time of Dijkstra's algorithm is analyzed. Turns out that the running time depends on what data structure is used for maintaining the priority queue of the set V-S (step 2). If we use an array, the running time is O(V2), if we use binary heap, it's O(E·lg(V)) and if we use Fibonacci heap, it's O(E + V·lg(V)).

Finally a special case of weighted graphs is considered when all weights are unit weights. In this case a single-source shortest-paths problem can be solved by a the Breadth-first search (BFS) algorithm that is actually a simpler version of Dijkstra's algorithm with priority queue replaced by a FIFO! The running time of BFS is O(V+E).

You're welcome to watch lecture seventeen:

Topics covered in lecture seventeen:

  • [01:40] Review of paths.
  • [03:15] Edge weight functions.
  • [03:30] Example of a path, its edge weights, and weight of the path.
  • [04:22] Review of shortest-paths.
  • [05:15] Shortest-path weight.
  • [06:30] Negative edge weights.
  • [10:55] Optimal substructure of a shortest path.
  • [11:50] Proof of optimal substructure property: cut and paste.
  • [14:23] Triangle inequality.
  • [15:15] Geometric proof of triangle inequality.
  • [16:30] Single-source shortest paths problem.
  • [18:32] Restricted single-source shortest paths problem: all edge weights positive or zero.
  • [19:35] Greedy idea for ss shortest paths.
  • [26:40] Dijkstra's algorithm.
  • [35:30] Example of Dijkstra's algorithm.
  • [43:40] Shortest path trees.
  • [45:12] Correctness of Dijkstra's algorithm: why relaxation never makes mistake.
  • [53:55] Correctness of Dijkstra's algorithm: why relaxation makes progress.
  • [01:01:00] Correctness of Dijkstra's algorithm: why it gives correct answer when it terminates.
  • [01:15:40] Running time of Dijkstra's algorithm.
  • [01:18:40] Running time depending on using array O(V^2), binary heap O(E·lg(V)) and Fibonacci heap O(E + V·lg(V)) for priority queue.
  • [01:20:00] Unweighted graphs.
  • [01:20:40] Breadth-First Search (BFS) algorithm.
  • [01:23:23] Running time of BFS: O(V+E).

Lecture seventeen notes:

MIT Algorithms Lecture 17 Notes Thumbnail. Page 1 of 2.
Lecture 17, page 1 of 2: paths, shortest paths, negative edge weights, optimal substructure, triangle inequality, single-source shortest paths, dijkstra's algorithm.

MIT Algorithms Lecture 17 Notes Thumbnail. Page 2 of 2.
Lecture 17, page 2 of 2: example of dijkstra's algorithm, correctness of dijkstra's algorithm, running time of dijkstra, unweighted graphs, breadth first search (bfs).

Lecture 18: Shortest Paths II: Bellman-Ford Algorithm

Lecture eighteen begins with recalling that if a graph contains a negative weight cycle, then a shortest path may not exist and gives a geometric illustration of this fact.

Right after this fact, it jumps to Bellman-Ford algorithm. The Bellman-Ford algorithm solves the single-source shortest-paths problem in the general case in which edge weights may be negative. Given a weighted, directed graph G = (V, E) with source s and weight function w: E → R, the Bellman-Ford algorithm produces the shortest paths from s and their weights, if there is no negative weight cycle, and it produces no answer if there is a negative weight cycle.

The algorithm uses relaxation, progressively decreasing an estimate on the weight of a shortest path from the source s to each vertex v ∈ V until it achieves the actual shortest-path weight.

The running time of Bellman-Ford algorithm is O(VE). The lecture also gives a correctness proof of Bellman-Ford algorithm.

The other half of the lecture is devoted to a problem that can be effectively solved by Bellman-Ford. It's called the linear feasibility problem that it is a special case of linear programming (LP) problem, where there is no objective but the constraints are in form xi <= wij. It is noted that Bellman-Ford is actually a simple case of LP problem.

The lecture ends with an application of Bellman-Ford to solving a special case of VLSI layout problem in 1 dimension.

You're welcome to watch lecture eighteen:

Topics covered in lecture eighteen:

  • [00:20] A long, long time ago... in a galaxy far, far away...
  • [00:40] Quick review of previous lecture - Dijkstra's algorithm, non-negative edge weights.
  • [01:40] Description of Bellman-Ford algorithm.
  • [04:30] Bellman-Ford algorithm.
  • [08:50] Running time of Bellman-Ford O(VE).
  • [10:05] Example of Bellmen-Ford algorithm.
  • [18:40] Correctness of Bellman-Ford algorithm:
  • [36:30] Linear programming (LP).
  • [42:48] Efficient algorithms for solving LPs: simplex algorithm (exponential in worst case, but practical), ellipsoid algorithm (polynomial time, impractical), interior point methods (polynomial), random sampling (brand new, discovered at MIT).
  • [45:58] Linear feasibility problem - LP with no objective.
  • [47:30] Difference constraints - constraints in form xj - xi <= wij.
  • [49:50] Example of difference constraints.
  • [51:04] Constraint graph.
  • [54:05] Theorem: Negative weight cycle in constraint means difference constraints are infeasible/unsatisfiable.
  • [54:50] Proof.
  • [59:15] Theorem: If no negative weight cycle then satisfiable.
  • [01:00:23] Proof.
  • [01:08:20] Corollary: Bellman-Ford solves a system of of m difference constraints on n variables in O(mn) time.
  • [01:12:30] VLSI Layout problem solved by Bellman-Ford.

Lecture eighteen notes:

MIT Algorithms Lecture 18 Notes Thumbnail. Page 1 of 2.
Lecture 18, page 1 of 2: bellman-ford algorithm, correctness of bellman-ford, linear programming.

MIT Algorithms Lecture 18 Notes Thumbnail. Page 2 of 2.
Lecture 18, page 2 of 2: feasibility problem, bellman-ford and difference constraints, vlsi layouts.

Lecture 19: Shortest Paths III: All-Pairs Shortest Paths and Floyd-Warshall Algorithm

Lecture nineteen starts with a quick review of lectures seventeen and eighteen. It reminds the running times of various single-source shortest path algorithms, and mentions that in case of a directed acyclic graphs (which was not covered in previous lectures), you can run topological sort and 1 round of Bellman-Ford that makes it find single-source shortest paths in linear time (for graphs) in O(V+E).

The lecture continues all-pairs shortest paths problem, where we want to know the shortest path between every pair of vertices.

A naive approach to this problem is run single-source shortest path from each vertex. For example, on an unweighted graph we'd run BFS algorithm |V| times that would give O(VE) running time. On a non-negative edge weight graph it would be |V| times Dijkstra's algorithm, giving O(VE + V2lg(V)) time. And in general case we'd run Bellman-Ford |V| times that would make the algorithm run in O(V2E) time.

The lecture continues with a precise definition of all-pairs shortest paths problem: given a directed graph, find an NxN matrix (N = |V|), where each entry aij is the shortest path from vertex i to vertex j.

In general case, if the graph has negative edges and it's dense, the best we can do so far is run Bellman-Ford |V| times. Recalling that E = O(V2) in a dense graph, the running time is O(V2E) = O(V4) - hypercubed in number of vertices = slow.

Lecture then proceeds with a dynamic programming algorithm without knowing if it will be faster or not. It's too complicated to explain here, and I recommend watching lecture at 11:54 to understand it. Turns out this dynamic programming algorithm does not give a performance boost and is still O(V4), but it gives some wicked ideas.

The most wicked idea is to connect matrix multiplication with the dynamic programming recurrence and using repeated squaring to beat O(V4). This craziness gives O(V3lgV) time that is an improvement. Please see 23:40 in the lecture for full explanation.

After all this the lecture arrives at Floyd-Warshall algorithm that finds all-pairs shortest paths in O(V3). The algorithm is derived from a recurrence that the shortest path from vertex i to vertex j is minimum of { shortest path from i to j directly or shortest path from i to k and shortest path from k to j }.

Finally the lecture explains Johnson's algorithm that runs in O(VE + V2log(V)) time for sparse graphs. The key idea in this algorithm is to reweigh all edges so that they are all positive, then run Dijkstra from each vertex and finally undo the reweighing.

It turns out, however, that to find the function for reweighing all edges, a set of difference constraints need to be satisfied. It makes us first run Bellman-Ford to solve these constraints.

Reweighing takes O(EV) time, running Dijkstra on each vertex takes O(VE + V2lgV) and undoing reweighing takes O(V2) time. Of these terms O(VE + V2lgV) dominates and defines algorithm's running time (for dense it's still O(V3).

You're welcome to watch lecture nineteen:

Topics covered in lecture nineteen:

  • [01:00] Review of single-source shortest path algorithms.
  • [04:45] All-pairs shortest paths by running single-source shortest path algorithms from each vertex.
  • [05:35] Unweighted edges: |V|xBFS = O(VE).
  • [06:35] Nonnegative edge weights: |V|xDijkstra = O(VE + V2lg(V)).
  • [07:40] General case: |V|xBellman-Ford = O(V2E).
  • [09:10] Formal definition of all-pairs shortest paths problem.
  • [11:08] |V|xBellman-Ford for dense graphs (E = O(V2)) is O(V4).
  • [11:54] Trying to beat O(V4) with dynamic programming.
  • [19:30] Dynamic programming algorithm, still O(V4).
  • [23:40] A better algorithm via wicked analogy with matrix multiplication. Running time: O(V3lg(V)).
  • [37:45] Floyd-Warshall algorithm. Runs in O(V3).
  • [47:35] Transitive closure problem of directed graphs.
  • [53:30] Johnson's algorithm. Runs in O(VE + V2log(V))

Lecture nineteen notes:

MIT Algorithms Lecture 19 Notes Thumbnail. Page 1 of 2.
Lecture 19, page 1 of 2: all-pairs shortest paths, dynamic programming algorithm, matrix multiplication analogy.

MIT Algorithms Lecture 19 Notes Thumbnail. Page 2 of 2.
Lecture 19, page 2 of 2: floyd-warshall algorithm, transitive closure problem, johnson's algorithn.

Have fun with shortest path algorithms! The next post is going to be an introduction to parallel algorithms - things like dynamic multithreading, scheduling and multithreaded algorithms.

PS. This course is taught from the CLRS book (also called "Introduction to Algorithms"). Chapters 24 and 25, called "Single-Source Shortest Paths" and "All-Pairs Shortest Paths" explain everything I wrote about here in much more detail. If these topics excite you, you may want to buy this awesome book: