post 'good coders code, great reuse' to del.icio.us post 'good coders code, great reuse' to digg post 'good coders code, great reuse' to reddit subscribe to 'good coders code, great reuse' posts via feed
good coders code, great reuse

Programming can be fun, so can cryptography; however they should not be combined.

Kreitzberg and Shneiderman

I am now on Twitter! Meet me on Twitter here (my nick is pkrumins.)
Or on Google Buzz and Facebook.

Introduction to Algorithms 11 Nov 2009 01:15 pm
1 Star2 Stars3 Stars4 Stars5 Stars (25 votes, average: 4.68 out of 5)
Loading ... Loading ...

As you all may know, I watched and posted my lecture notes of the whole MIT Introduction to Algorithms course. In this post I want to summarize all the topics that were covered in the lectures and point out some of the most interesting things in them.

Actually, before I wrote this article, I had started writing an article called “The coolest things that I learned from MIT’s Introduction to Algorithms” but quickly did I realize that what I was doing was listing the topics in each article and not really pointing out the coolest things. Therefore I decided to write a summary article first (I had promised to do so), and only then write an article on really the most exciting topics.

Talking about the summary, I watched a total of 23 lectures and it resulted in 14 blog posts. It took nearly a year to publish them here. The first blog post in this series was written in August 2008, and the last in July 2009. Here is a list of all the posts:

I’ll now go through each of the lectures. They require quite a bit of math knowledge to understand. If you are uncertain about your math skills, I’d suggest reading Knuth’s Concrete Mathematics book. It contains absolutely all the necessary math to understand this course.

Lecture 1: Analysis of Algorithms

If you’re a student, or even if you’re not, you must never miss the first lecture of any course, ever! The first lecture tells you what to expect from the course, how it will be taught, what it will cover, who the professor is, what the prerequisites are, and a bunch of other important and interesting things.

In this lecture you also get to know professor Charles E. Leiserson (author of CLRS) and he explains the following topics:

  • Why study algorithms and their performance?
  • What is the analysis of algorithms?
  • What can be more important than the performance of algorithms?
  • The sorting problem.
  • Insertion sort algorithm.
  • Running time analysis of insertion sort.
  • Asymptotic analysis.
  • Worst-case, average-case, best-case running time analysis.
  • Analysis of insertion sort’s worst-case running time.
  • Asymptotic notation - theta notation - Θ.
  • Merge sort algorithm.
  • The recursive nature of merge sort algorithm.
  • Running time recurrence for merge sort.
  • Recursion trees.
  • Running time analysis of merge sort by looking at the recursion tree.
  • General recurrence for divide and conquer algorithms.

I personally found the list of things that can be more important than the performance of the program interesting. These things are modularity, correctness, maintainability, security, functionality, robustness, user-friendliness, programmer’s time, simplicity, extensibility, reliability, scalability.

Follow this link to the full review of lecture one.

Lecture 2: Analysis of Algorithms (continued)

The second lecture is presented by Eric Demaine. He’s the youngest professor in the history of MIT.

Here are the topics that he explains in the second lecture:

  • Asymptotic notation.
  • Big-o notation - O.
  • Set definition of O-notation.
  • Capital-omega notation - Ω.
  • Theta notation - Θ.
  • Small-o notation - o.
  • Small-omega notation - ω.
  • Solving recurrences by substitution method.
  • Solving recurrences by recursion-tree method.
  • Solving recurrences by the Master’s method.
  • Intuitive sketch proof of the Master’s method.

An interesting thing in this lecture is the analogy of (O, Ω, Θ, o, ω) to (≤, ≥, =, <, >).

For example, if we say f(n) = O(n2) then by using the analogy we can think of it as f(n) ≤ c·n2, that is, function f(n) is always smaller than or equal to c·n2, or in other words, it’s bounded above by function c·n2, which is exactly what f(n) = O(n2) means.

Follow this link to the full review of lecture two.

Lecture 3: Divide and Conquer

The third lecture is all about the divide-and-conquer algorithm design method and its applications. The divide and conquer method solves a problem by 1) breaking it into a number of subproblems (divide step), 2) solving each problem recursively (conquer step), 3) combining the solutions (combine step).

Here are the topics explained in the third lecture:

  • The nature of divide and conquer algorithms.
  • An example of divide and conquer - merge sort.
  • Solving for running time of merge sort by Master’s method.
  • Binary search.
  • Powering a number.
  • Fibonacci numbers.
  • Algorithms for computing Fibonacci numbers.
  • Fibonacci by naive recursive algorithm.
  • Fibonacci by bottom-up algorithm.
  • Fibonacci by naive recursive squaring.
  • Fibonacci by matrix recursive squaring.
  • Matrix multiplication
  • Strassen’s algorithm.
  • VLSI (very large scale integration) layout problem.

I was the most impressed by the four algorithms for computing Fibonacci numbers. I actually wrote about one of them in my publication “On the Linear Time Algorithm For Finding Fibonacci Numbers,” which explains how this algorithms is actually quadratic in practice (but linear in theory).

Follow this link to the full review of lecture three.

Lecture 4: Sorting

Lecture four is devoted entirely to the quicksort algorithm. It’s the industry standard algorithm that is used for sorting in most of the computer systems. You just have to know it.

Topics explained in lecture four:

  • Divide and conquer approach to sorting.
  • Quicksort algorithm.
  • The partition routine in the quicksort algorithm.
  • Running time analysis of quicksort.
  • Worst-case analysis of quicksort.
  • Intuitive, best-case analysis of quicksort.
  • Randomized quicksort.
  • Indicator random variables.
  • Running time analysis of randomized quicksort in expectation.

I loved how the idea of randomizing the partition subroutine in quicksort algorithm led to a running time that is independent of element order. The deterministic quicksort could always be fed an input that triggers the worst-case running time O(n2), but the worst-case running time of randomized quicksort is determined only by the output of the random number generator.

I once wrote another post about quicksort called “Three Beautiful Quicksorts” where I summarized what Jon Bentley’s had to say about the experimental analysis of quicksort’s running time and how the current quicksort algorithm looks in the industry libraries (such as c standard library, which provides qsort function).

Follow this link to the full review of lecture four.

Lecture 5: Sorting (continued)

Lecture five continues on sorting and looks at what limits the running time of sorting to O(n·lg(n)). It then breaks out of this limitation and shows several linear time sorting algorithms.

Topics explained in lecture five:

  • How fast can we sort?
  • Comparsion sort model.
  • Decision trees.
  • Comparsion sort algorithms based on decision trees.
  • Lower bound for decision-tree sorting.
  • Sorting in linear time.
  • Counting sort.
  • The concept of stable sorting.
  • Radix sort.
  • Correctness of radix sort.
  • Running time analysis of radix sort.

The most interesting topic here was how any comparison sort algorithm can be translated into a decision tree (and vice versa), which limits how fast we can sort.

Follow this link to the full review of lecture five.

Lecture 6: Order Statistics

Lecture six deals with the order statistics problem - how to find the k-th smallest element among n elements. The naive algorithm is to sort the list of n elements and return the k-th element in the sorted list, but this approach makes it run in O(n·lg(n)) time. This lecture shows how a randomized, linear-time algorithm (in expectation) for this problem can be constructed.

Topics explained in lecture six:

  • Order statistics.
  • Naive order statistics algorithm via sorting.
  • Randomized divide and conquer order statistics algorithm.
  • Expected running time analysis of randomized order statistics algorithm.
  • Worst-case linear-time order-statistics.

An interesting point in this lecture is that the worst-case, deterministic, linear-time algorithm for order statistics isn’t being used in practice because it performs poorly compared to the randomized linear-time algorithm.

Follow this link to the full review of lecture six.

Lecture 7: Hashing

This is the first lecture of two on hashing. It introduces hashing and various collision resolution strategies.

All the topics explained in lecture seven:

  • Symbol table problem.
  • Direct-access table.
  • The concept of hashing.
  • Collisions in hashing.
  • Resolving collisions by chaining.
  • Analysis of worst-case and average-case search time of chaining.
  • Hash functions.
  • Division hash method.
  • Multiplication hash method.
  • Resolving collisions by open addressing.
  • Probing strategies.
  • Linear probing.
  • Double hashing.
  • Analysis of open addressing.

Follow this link to the full review of lecture seven.

Lecture 8: Hashing (continued)

The second lecture on hashing. It addresses the weakness of hashing - for any choice of hash function, there exists a bad set of keys that all hash to the same value. An adversary can take an advantage of this and attack our program. Universal hashing solves this problem. The other topic explained in this lecture is perfect hashing - given n keys, how to construct a hash table of size O(n) where search takes O(1) guaranteed.

All the topics in lecture eight:

  • Weakness of hashing.
  • Universal hashing.
  • Construction of universal hash functions.
  • Perfect hashing.
  • Markov inequality.

Follow this link to the full review of lecture eight.

Lecture 9: Search Trees

This lecture primarily discusses randomly built binary search trees. (It assumes you know what binary trees are.) Similar to universal hashing (see previous lecture), they solve a problem when you need to build a tree from untrusted data. It turns out that the expected height of a randomly built binary search tree is still O(lg(n)), more precisely, it’s expected to be 3·lg(n) at most.

Topics explained in lecture nine:

  • What are good and bad binary search trees?
  • Binary search tree sort.
  • Analysis of binary search tree sort.
  • BST sort relation to quicksort.
  • Randomized BST sort.
  • Randomly built binary search trees.
  • Convex functions, Jensen’s inequality.
  • Expected height of a randomly built BST.

The most surprising idea in this lecture is that the binary search tree sort (introduced in this lecture) does the same element comparsions as quicksort, that is, they produce the same decision tree.

Follow this link to the full review of lecture nine.

Lecture 10: Search Trees (continued)

This is the second lecture on search trees. It discusses self-balancing trees, more specifically, red-black trees. They balance themselves in such a manner that no matter what the input is, their height is always O(lg(n)).

Topics explained in lecture ten:

  • Balanced search trees.
  • Red-black trees.
  • Height of red-black trees.
  • Rotations in binary trees.
  • How to insert an element in a red-black tree?
  • Insert-element algorithm for red-black trees.

Follow this link to the full review of lecture ten.

Lecture 11: Augmenting Data Structures

The eleventh lecture explains how to build new data structures out of existing ones. For example, how to build a data structure that you can update and query quickly for the i-th smallest element. This is the problem of dynamic order statistics and an easy solution is to augment a binary tree, such as a red-black tree. Another example is interval trees - how to quickly find an interval (such as 5-9) that overlaps some other intervals (such as 4-11 and 8-20).

Topics explained in lecture eleven:

  • Dynamic order statistics.
  • Data structure augmentation.
  • Interval trees.
  • Augmenting red-black trees to have them perform as interval trees.
  • Correctness of augmented red-black tree data structure.

Augmenting data structures require a lot of creativity. First you need to find an underlying data structure (the easiest step) and then think of a way to augment it with data to make it do what you want (the hardest step).

Follow this link to the full review of lecture eleven.

Lecture 12: Skip Lists

This lecture explains skip lists, which is a simple, efficient, easily implementable, randomized search structure. It performs as well as a balanced binary search tree but is much easier to implement. Eric Demaine says he implemented it in 40 minutes before the class (10 minutes to implement and 30 to debug).

In this lecture Eric builds this data structure from scratch. He starts with a linked list and builds up to a pair of linked lists, to three linked lists, until it finds the optimal number of linked lists needed to achieve logarithmic search time.

Next he continues to explain how to algorithmically build such a structure and proves that the search in this data structure is indeed quick.

Follow this link to the full review of lecture twelve.

Lecture 13: Amortized Analysis

Amortized analysis is a technique to show that even if several operations in a sequence of operations are costly, the overall performance is still good. A good example is adding elements to a dynamic list (such as a list in Python). Every time the list is full, Python has to allocate more space and this is costly. Amortized analysis can be used to show that the average cost per insert is still O(1), even though Python occasionally has to allocate more space for the list.

Topics explained in lecture thirteen:

  • How large should a hash table be?
  • Dynamic tables.
  • Amortized analysis.
  • Accounting method of amortized analysis.
  • Dynamic table analysis with accounting method.
  • Potential method of amortized analysis.
  • Dynamic table analysis with potential method.

This is one of the most mathematically complicated lectures.

Follow this link to the full review of lecture thirteen.

Lecture 14: Self-Organizing Lists and Competitive Analysis

This lecture concentrates on self-orginizing lists. A self-organizing list is a list that reorders itself to improve the average access time. The goal is to find a reordering that minimizes the total access time. For example, each time an element is accessed, it’s moved to the front of the list, hoping that it might be accessed soon again. This is called move-to-front heuristic.

Competitive analysis can be used to theoretically reason how well such a strategy as moving items to front performs.

Topics explained in lecture fourteen:

  • Self-organizing lists.
  • Online and offline algorithms
  • Worst-case analysis of self-organizing lists.
  • Competitive analysis.
  • Move-to-front heuristic for self-organizing lists.
  • Amortized cost of move-to-front heuristic.

Follow this link to the full review of lecture fourteen.

Lecture 15: Dynamic Programming

This lecture is about the dynamic programming algorithm design technique. It’s a tabular method (involving constructing a table or some part of a table) that leads to a much faster running time of the algorithm.

The lecture focuses on the longest common subsequence problem, first showing the brute force algorithm, then a recursive one, and finally a dynamic programming algorithm. The brute force algorithm is exponential in the length of strings, the recursive one is also exponential, but the dynamic programming solution is O(n·m) where n is the length of one string, and m is the length of the other.

Topics explained in lecture fifteen:

  • The idea of dynamic programming.
  • Longest common subsequence problem (LCS).
  • Brute force algorithm for LCS.
  • Analysis of brute-force algorithm.
  • Simplified algorithm for LCS.
  • Dynamic programming hallmark #1: optimal substructure.
  • Dynamic programming hallmark #2: overlapping subproblems.
  • Recursive algorithm for LCS.
  • Memoization.
  • Dynamic programming algorithm for LCS.

The most interesting thing in this lecture is the two hallmarks that indicate that the problem may be solved with dynamic programming. They are “optimal substructure” and “overlapping subproblems”.

The first one means that an optimal solution to a problem contains the optimal solution to subproblems. For example, if z = LCS(x,y) - z is the solution to the problem LCS(x,y) - then any prefix of z is a solution to LCS of a prefix of x and prefix of y (prefix of z is a solution to subproblems).

The second one means exactly what it says, that the problem contains many overlapping subproblems.

Follow this link to the full review of lecture fifteen.

Lecture 16: Greedy Algorithms

This lecture introduced greedy algorithms via the minimum spanning three problem. The minimum spanning tree problem asks to find a tree that connects all the vertices of a graph with minimum edge weight. It seems at first that dynamic programming solution could solve it effectively, but if analyzed more carefully, it can be noticed that the problem exhibits another powerful property — the best solution to each of the subproblems leads to globally optimal solution. Therefore it’s called greedy, it always chooses the best solution for subproblems without ever thinking about the whole problem in general.

Topics explained in lecture sixteen:

  • Review of graphs.
  • Graph representations.
  • Adjacency matrices.
  • Adjacency lists.
  • Sparse and dense graphs.
  • Hand shaking lemma.
  • Minimum spanning trees (MSTs).
  • Hallmark for greedy algorithms: greedy choice property.
  • Prim’s algorithm for finding MST.
  • Running time analysis of Prim’s algorithm.
  • Idea of Kruskal’s algorithm for MSTs.

Follow this link to the full review of lecture sixteen.

Lecture 17: Shortest Path Algorithms

This lecture starts a trilogy on shortest path algorithm. In this first episode single-source shortest path algorithms are discussed. The problem can be described as following — how to get from one point on a graph to another by traveling the shortest distance (think of a road network). The Dijkstra’s algorithm solves this problem effectively.

Topics explained in lecture seventeen:

  • Paths in graphs.
  • Shortest paths.
  • Path weights.
  • Negative path weights.
  • Single-source shortest path.
  • Dijkstra’s algorithm.
  • Example of Dijkstra’s algorithm.
  • Correctness of Dijkstra’s algorithm.
  • Unweighted graphs.
  • Breadth First Search.

The most interesting thing here is that the Dijkstra’s algorithm for unweighted graphs reduces to breadth first search algorithm which uses a FIFO instead of a priority queue because there is no longer a need to keep track of the shortest distance (all the paths have the same weight).

Follow this link to the full review of lecture seventeen.

Lecture 18: Shortest Path Algorithms (continued)

The second lecture in trilogy on shortest paths deals with single-source shortest paths that may have negative edge weights. Bellman-Ford algorithm solves the shortest path problem for graphs with negative edges.

Topics explained in lecture eighteen:

  • Bellman-Ford algorithm for shortest paths with negative edges.
  • Negative weight cycles.
  • Correctness of Bellman-Ford algorithm.
  • Linear programming.
  • Linear feasibility problem.
  • Difference constraints.
  • Constraint graph.
  • Using Bellman-Ford algorithm to solve a system of difference constraints.
  • Solving VLSI (very large scale integration) layout problem via Bellman-Ford.

Follow this link to the full review of lecture eighteen.

Lecture 19: Shortest Path Algorithms (continued)

The last lecture in trilogy deals with all-pairs shortest paths problem — determine of the shortest distances between every pair of vertices in a given graph.

Topics explained in lecture nineteen:

  • Review of single source shortest path problem.
  • All-pairs shortest paths.
  • Dynamic programming.
  • Idea from matrix multiplication.
  • Floyd-Warshall algorithm for all-pairs shortest paths.
  • Transitive closure of directed graph.
  • Johnson’s algorithm for all-pairs shortest paths.

An interesting point here is how the Floyd-Warshall algorithm that runs in O((number of vertices)3) can be transformed into something similar to Strassen’s algorithm to compute the transitive closure of a graph (now it runs in O((number of vertices)lg7).

Follow this link to the full review of lecture nineteen.

Lecture 20: Parallel Algorithms

This is an introductory lecture to multithreaded algorithm analysis. It explains the terminology used in multithreaded algorithms, such as, work, critical path length, speedup, parallelism, scheduling, and others.

Topics explained in lecture twenty:

  • Dynamic multithreading.
  • Subroutines: spawn and sync.
  • Logical parallelism and actual parallelism.
  • Multithreaded computation.
  • An example of a multithreaded execution on a recursive Fibonacci algorithm.
  • Measuring performance of a multithreaded computation.
  • The concept of speedup.
  • Maximum possible speedup.
  • Linear speedup.
  • Super-linear speedup.
  • Parallelism.
  • Scheduling.
  • Greedy scheduler.
  • Grand and Brent theorem of competitiveness of greedy schedules.
  • *Socrates and Cilkchess chess programs.

Follow this link to the full review of lecture twenty.

Lecture 21: Parallel Algorithms (continued)

The second lecture on parallel algorithms shows how to design and analyze multithreaded matrix multiplication algorithm and multithreaded sorting.

Topics explained in lecture twenty-one:

  • Multithreaded algorithms.
  • Multithreaded matrix multiplication.
  • Performance analysis of the multithreaded matrix multiplication algorithm.
  • Multithreaded sorting.
  • Multithreaded merge-sort algorithm.
  • Parallel-merge subroutine.
  • Analysis of merge-sort with parallel-merge subroutine.

Follow this link to the full review of lecture twenty-one.

Lecture 22: Cache Oblivious Algorithms

Cache-oblivious algorithms take into account something that has been ignored in all the algorithms so far, particularly, the cache. An algorithm that can be transformed into using cache effectively will perform much better than a one that doesn’t. This lecture is all about how to lay out data structures in memory in such a way that memory transfers are minimized.

Topics explained in lecture twenty-two:

  • Modern memory hierarchy.
  • The concept of spatial locality and temporal locality.
  • Two-level memory model.
  • Cache-oblivious algorithms.
  • Blocking of memory.
  • Memory transfers in a simple scanning algorithm.
  • Memory transfers in string-reverse algorithm.
  • Memory analysis of binary search.
  • Cache oblivious order statistics.
  • Cache oblivious matrix multiplication algorithm.

Follow this link to the full review of lecture twenty-two.

Lecture 23: Cache Oblivious Algorithms (continued)

This is the final lecture of the course. It continues on cache oblivious algorithms and shows how to store binary search trees in memory so that memory transfers are minimized when searching in them. It wraps up with cache oblivious sorting.

Topics explained in lecture twenty-three:

  • Static search trees.
  • Memory efficient layout of static binary search trees in memory.
  • Analysis of static search trees.
  • Cache aware sorting.
  • Cache-oblivious sorting.
  • Funnel sort.
  • K-funnel data structure.

This is the most complicated lecture in the whole course. It takes a day to understand the k-funnel data structure.

Follow this link to the full review of lecture twenty-three.

That’s it. This was the final lecture. I hope you find this summary useful.

Upcoming on my blog — review of MIT’s Linear Algebra course.

At first I thought I’d post Linear Algebra to a separate blog section that does not appear in the RSS feed but then I gave it another thought and came to a conclusion that every competent programmer must know the linear algebra and therefore it’s worth putting them in the feed.

You can surely be a good programmer without knowing linear algebra, but if you want to work on great problems and make a difference, then you absolutely have to know it.

Stay tuned!

Comments (22) Comments | Email Post Email 'Summary of all the MIT Introduction to Algorithms lectures' to a friend | Print Post Print 'Summary of all the MIT Introduction to Algorithms lectures' | Permalink Permalink to 'Summary of all the MIT Introduction to Algorithms lectures' | Trackback Trackback to 'Summary of all the MIT Introduction to Algorithms lectures'
(Popularity: 20%) 48,722 Views

Did you like this page? Subscribe to my posts!

I am now on Twitter! Meet me on Twitter here (my nick is pkrumins.)
Or on Google Buzz and Facebook.

Introduction to Algorithms 13 Jul 2009 06:00 am
1 Star2 Stars3 Stars4 Stars5 Stars (9 votes, average: 4.56 out of 5)
Loading ... Loading ...

MIT AlgorithmsThis is a happy and sad moment at the same time - I have finally reached the last two lectures of MIT’s undergraduate algorithms course. These last two lectures are on a fairly new area of algorithm research called “cache oblivious algorithms.”

Cache-oblivious algorithms take into account something that has been ignored in all the lectures so far, particularly, the multilevel memory hierarchy of modern computers. Retrieving items from various levels of memory and cache make up a dominant factor of running time, so for speed it is crucial to minimize these costs. The main idea of cache-oblivious algorithms is to achieve optimal use of caches on all levels of a memory hierarchy without knowledge of their size.

Cache-oblivious algorithms should not be confused with cache-aware algorithms. Cache-aware algorithms and data structures explicitly depend on various hardware configuration parameters, such as the cache size. Cache-oblivious algorithms do not depend on any hardware parameters. An example of cache-aware (not cache-oblivious) data structure is a B-Tree that has the explicit parameter B, the size of a node. The main disadvantage of cache-aware algorithms is that they are based on the knowledge of the memory structure and size, which makes it difficult to move implementations from one architecture to another. Another problem is that it is very difficult, if not impossible, to adapt some of these algorithms to work with multiple levels in the memory hierarchy. Cache-oblivious algorithms solve both problems.

Lecture twenty-two introduces the terminology and notation used in cache-oblivious algorithms, explains the difference between cache-oblivious and cache-aware algorithms, does a simple memory analysis of several simple algorithms and culminates with a cache-oblivious algorithm for matrix multiplication.

The final lecture twenty-three is the most difficult in the whole course and shows cache-oblivious binary search trees and cache-oblivious sorting called funnel sort.

Use this supplementary reading material by professor Demaine to understand the material better: Cache-oblivious algorithms and data structures (.pdf).

Lecture 22: Cache Oblivious Algorithms I

Lecture twenty-two starts with an introduction to the modern memory hierarchy (CPU cache L1, L2, L3, main memory, disk cache, etc.) and with the notation and core concepts used in cache-oblivious algorithms.

A powerful result in cache-oblivious algorithm design is that if an algorithm is efficient on two levels of cache, then it’s efficient on any number of levels. Thus the study of cache-obliviousness can be simplified to two-level memory hierarchy, say the CPU cache and main memory, where the accesses to cache are instant but are orders of magnitude slower to main memory. Therefore the main question cache-oblivious algorithm analysis tries to address is how many memory transfers (MTs) does a problem of size N take. The notation used for this is MT(N). For an algorithm to be efficient, the number of memory transfers should be as small as possible.

Next the lecture analysis the number of memory transfers for basic array scanning and array reverse algorithms. Since array scanning is consequential, N elements can be processed with O(N/B) accesses, where B is the block size - number of elements that are automatically fetched as N-th element is accessed. That is MT(N) = O(N/B) for array scanning. The same bound holds for reversing an array, since it can be viewed two scans - one from the beginning and one from the end.

Next it’s shown that the classical binary search (covered in lecture 3) is not cache efficient, but order statistics problem (covered in lecture 6) is cache efficient.

Finally the lecture describes a cache efficient way to multiply matrices by storing them block-wise in memory.

You’re welcome to watch lecture twenty-two:

Topics covered in lecture twenty-two:

  • [00:10] Introduction and history of cache-oblivious algorithms.
  • [02:00] Modern memory hierarchy in computers: Caches L1, L2, L3, main memory, disk cache.
  • [06:15] Formula for calculating the cost to access a block of memory.
  • [08:18] Amortized cost to access one element in memory.
  • [11:00] Spatial and temporal locality of algorithms.
  • [13:45] Two-level memory model.
  • [16:30] Notation: total cache size M, block size B, number of blocks M/B.
  • [20:40] Notation: MT(N) - number of memory transfers of a problem of size N.
  • [21:45] Cache-aware algorithms.
  • [22:50] Cache-oblivious algorithms.
  • [28:35] Blocking of memory.
  • [32:45] Cache-oblivious scanning algorithm (visitor pattern).
  • [36:20] Cache-oblivious Array-Reverse algorithm.
  • [39:05] Memory transfers in classical binary search algorithm.
  • [43:45] Divide and conquer algorithms.
  • [45:50] Analysis of memory transfers in order statistics algorithm.
  • [01:00:50] Analysis of classical matrix multiplication (with row major, column major memory layout).
  • [01:07:30] Cache oblivious matrix multiplication.

Lecture twenty-two notes:

MIT Algorithms Lecture 22 Notes Thumbnail. Page 1 of 2.
Lecture 22, page 1 of 2: modern memory hierarchy, spacial and temporal locality, two-level memory model, blocking of memory, basic algorithms: parallel scan.
MIT Algorithms Lecture 22 Notes Thumbnail. Page 2 of 2.
Lecture 22, page 2 of 2: basic algorithms: binary search, divide and conquer algorithms, order statistics, matrix multiplication, block algorithms.

Lecture 23: Cache Oblivious Algorithms II

This was probably the most complicated lecture in the whole course. The whole lecture is devoted to two subjects - cache-oblivious search trees and cache-oblivious sorting.

While it’s relatively easy to understand the design of cache-oblivious way of storing search trees in memory, it’s amazingly difficult to understand the cache-efficient sorting. It’s called funnel sort which is basically an n-way merge sort (covered in lecture 1) with special cache-oblivious merging function called k-funnel.

You’re welcome to watch lecture twenty-three:

Topics covered in lecture twenty-three:

  • [01:00] Cache-oblivious static search trees (binary search trees).
  • [09:35] Analysis of static search trees.
  • [18:15] Cache-aware sorting.
  • [19:00] Sorting by repeated insertion in binary tree.
  • [21:40] Sorting by binary merge sort.
  • [31:20] Sorting by N-way mergesort.
  • [36:20] Sorting bound for cache-oblivious sorting algorithms.
  • [38:30] Cache-oblivious sorting.
  • [41:40] Definition of K-Funnel (cache-oblivious merging).
  • [43:35] Funnel sort.
  • [54:05] Construction of K-Funnel.
  • [01:03:10] How to fill buffer in k-funnel.
  • [01:07:30] Analysis of fill buffer.

Lecture twenty-three notes:

MIT Algorithms Lecture 23 Notes Thumbnail. Page 1 of 2.
Lecture 23, page 1 of 2: static search trees, cache aware sorting.
MIT Algorithms Lecture 23 Notes Thumbnail. Page 2 of 2.
Lecture 23, page 2 of 2: cache oblivious sorting, k-funnels, funnel sort, fill-buffer algorithm and analysis.


Have fun with the cache oblivious algorithms! I’ll do a few more posts that will summarize all these lectures and highlight key ideas.

If you loved this, please subscribe to my blog!

Comments (13) Comments | Email Post Email 'MIT’s Introduction to Algorithms, Lectures 22 and 23: Cache Oblivious Algorithms' to a friend | Print Post Print 'MIT’s Introduction to Algorithms, Lectures 22 and 23: Cache Oblivious Algorithms' | Permalink Permalink to 'MIT’s Introduction to Algorithms, Lectures 22 and 23: Cache Oblivious Algorithms' | Trackback Trackback to 'MIT’s Introduction to Algorithms, Lectures 22 and 23: Cache Oblivious Algorithms'
(Popularity: 25%) 31,267 Views

Did you like this page? Subscribe to my posts!

I am now on Twitter! Meet me on Twitter here (my nick is pkrumins.)
Or on Google Buzz and Facebook.

Introduction to Algorithms 07 Mar 2009 02:00 pm
1 Star2 Stars3 Stars4 Stars5 Stars (13 votes, average: 4.38 out of 5)
Loading ... Loading ...

MIT AlgorithmsThis is the thirteenth post in an article series about MIT’s lecture course “Introduction to Algorithms.” In this post I will review lectures twenty and twenty-one on parallel algorithms. These lectures cover the basics of multithreaded programming and multithreaded algorithms.

Lecture twenty begins with a good overview of multithreaded programming paradigm, introduces to various concepts of parallel programming and at the end talks about Cilk programming language.

Lecture twenty-one implements several multithreaded algorithms, such as nxn matrix addition, nxn matrix multiplication, and parallel merge-sort. It then goes into great detail to analyze the running time and parallelism of these algorithms.

Lecture 20: Parallel Algorithms I

The lecture starts with an introduction to a particular model of parallel computation called dynamic multithreading. It’s easiest to understand this model with an example. Here is a multithreaded algorithm that computes Fibonacci numbers:

Fibonacci(n):
1   if n < 2:                     | thread A
2     return n                    | thread A
3   x = spawn Fibonacci(n-1)      | thread A
4   y = spawn Fibonacci(n-2)      | thread B
5   sync                          | thread C
6   return x + y                  | thread C

# this is a really bad algorithm, because it’s
# exponential time.
#
# see lecture three for four different
# classical algorithms that compute Fibonacci numbers

(A thread is defined to be a maximal sequence of instructions not containing the parallel control statements spawn, sync, and return, not something like Java threads or Posix threads.)

I put the most important keywords in bold. They are “spawn” and “sync“.

The keyword spawn is the parallel analog of an ordinary subroutine call. Spawn before the subroutine call in line 3 indicates that the subprocedure Fibonacci(n-1) can execute in parallel with the procedure Fibonacci(n) itself. Unlike an ordinary function call, where the parent is not resumed until after its child returns, in the case of a spawn, the parent can continue to execute in parallel with the child. In this case, the parent goes on to spawn Fibonacci(n-2). In general, the parent can continue to spawn off children, producing a high degree of parallelism.

A procedure cannot safely use the return values of the children it has spawned until it executes a sync statement. If any of its children have not completed when it executes a sync, the procedure suspends and does not resume until all of its children have completed. When all of its children return, execution of the procedure resumes at the point immediately following the sync statement. In the Fibonacci example, the sync statement in line 5 is required before the return statement in line 6 to avoid the anomaly that would occur if x and y were summed before each had been computed.

The spawn and sync keywords specify logical parallelism, not “actual” parallelism. That is, these keywords indicate which code may possibly execute in parallel, but what actually runs in parallel is determined by a scheduler, which maps the dynamically unfolding computation onto the available processors.

The lecture then continues with a discussion on how a parallel instruction stream can be viewed as a directed acyclic graph G = (V, E), where threads make up the set V of vertices, and each spawn and return statement makes up an edge in E. There are also continuation edges in E that exist between statements.

In Fibonacci example there are three threads: lines 1, 2, 3 make up the thread A, line 4 is thread B and lines 5, 6 are thread C.

Here is how a graph of Fibonacci(4) computation looks like:

Multithreading Directed Acyclic Graph

A dag representing multithreaded computation of Fibonacci(4).

Here the threads are show as circles, and each group of threads belonging to the same procedure are surrounded by a rounded rectangle. Downward edges are spawn dependencies, horizontal edges represent continuation dependencies within a procedure, and upward edges are return dependencies. Every computation starts with a single initial thread, and ends with a single final thread.

So to compute Fib(4), first Fib(3) and Fib(2) have to be computed. Fib(4) spawns one thread to compute Fib(3) and another thread to compute Fib(2) and sync’es. In order to compute Fib(3), values of Fib(2) and Fib(1) have to be known, and to compute Fib(2) values of Fib(1) and Fib(0) have to be known. Fib(3) and Fib(2) spawn their threads and sync. This way the computation unwinds.

Next, the lecture introduces some concepts for measuring performance of multithreaded algorithms.

First some time measures:

  • Tp - running time of an algorithm on p processors,
  • T1 - running time of algorithm on 1 processor, and
  • T as critical path length, that is the longest time to execute the algorithm on infinite number of processors.

Based on these measures, the speedup of a computation on p processors is T1/Tp, that is how much faster a p processor machine executes the algorithm than a one processor machine. If the speedup is O(p), then we say that the computation has a linear speedup. If, however, speedup > p, then we call it a super-linear speedup. The maximum possible speedup is called parallelism and is computed by T1/T.

After these concepts the lecture puts forward the concept of scheduling and greedy scheduler.

A scheduler maps computation to p processors.

The greedy scheduler schedules as much as it can at every time step. On a p-processor machine every step can be classified into two types. If there are p or more threads ready to execute, the scheduler executes any p of them (this step is also called a complete step). If there are fewer than p threads, it executes all of them (called incomplete step). The greedy scheduler is an offline scheduler and it’s 2-competitive (see lecture 13 on online/offline algorithms and the meaning of competitive algorithm).

The lecture ends with an introduction to Cilk parallel, multithreaded programming language. It uses a randomized online scheduler. It was used to write two chess programs *Socrates and Cilkchess.

See Charles Leiseron’s handout for a much more comprehensive overview of dynamic multithreading: A Minicourse on Dynamic Multithreaded Algorithms (pdf).

You’re welcome to watch lecture twenty:

Topics covered in lecture twenty:

  • [01:33] Parallel algorithms.
  • [02:33] Dynamic multi-threading.
  • [03:15] A multithreaded algorithm for fibonnaci numbers.
  • [04:50] Explanation of spawn and sync.
  • [06:45] Logical parallelism.
  • [08:00] Multithreaded computation.
  • [12:15] Fibonacci(4) computation graph, and its concepts - init thread, spawn edge, continuation edge, return edge, final thread.
  • [17:45] Edges: spawn, return, continuation.
  • [19:40] Performance measures: running time on p processors Tp, work (serial time) T1, critcial path length (longest path) Tinfinity.
  • [24:15] Lower bounds on Tp (running time on p processors).
  • [29:35] Speedup, linear speedup, superlinear speedup.
  • [32:55] Maximum possible speedup.
  • [36:40] Scheduling.
  • [38:55] Greedy scheduler.
  • [43:40] Theorem by Grant and Brent (bound on Tp for greedy scheduler).
  • [46:20] Proof of Grant, Brent theorem.
  • [56:50] Corollary on greedy scheduler’s linear speedup.
  • [01:02:20] Cilk programming language.
  • [01:06:00] Chess programs: *Socrates, Clikchess.
  • [01:06:30] Analysis of *Socrates running time on 512 processors.

Lecture twenty notes:

MIT Algorithms Lecture 20 Notes Thumbnail. Page 1 of 2.
Lecture 20, page 1 of 2: dynamic multithreading, multithreaded computation, performance measures, speedup, scheduling, greedy scheduler.
MIT Algorithms Lecture 20 Notes Thumbnail. Page 2 of 2.
Lecture 20, page 2 of 2: grant, brent theorem, cilk, socrates, cilkchess.

Lecture 21: Parallel Algorithms II

Lecture twenty-one is all about real multithreaded algorithms and their analysis.

It starts with a hugly important problem of multithreaded matrix multiplication: given two nxn matrices A and B, write a multithreaded algorithm Matrix-Multiply that multiplies them, producing matrix C. The natural approach is to use divide and conquer method from lecture three, and formulate the problem as follows:

Multithreaded Matrix Multiplication

Each nxn matrix can be expressed as 8 multiplications and 4 additions of (n/2)x(n/2) submatrices. So the first thing we need is Matrix-Add algorithm that adds two matrices in parallel.

Here is the parallel Matrix-Add algorithm:

Matrix-Add(C, T, n):
   // Adds matrices C and T in-place, producing C = C + T
   // n is power of 2 (for simplicity).
   if  n == 1:
     C[1, 1] = C[1, 1] + T[1, 1]
   else:
     partition C and T into (n/2)x(n/2) submatrices
     spawn Matrix-Add(C11, T11, n/2)
     spawn Matrix-Add(C12, T12, n/2)
     spawn Matrix-Add(C21, T21, n/2)
     spawn Matrix-Add(C22, T22, n/2)
     sync

The base case for this algorithm is when matrices are just scalars, then it just adds the only element of both matrices. In all other cases, it partitions matrices C and T into (n/2)x(n/2) matrices and spawns four subproblems.

Now we can implement the Matrix-Multiply algorithm, by doing 8 multiplications in parallel and then calling Matrix-Add to do 4 additions.

Here is the parallel Matrix-Multiply algorithm:

Matrix-Multiply(C, A, B, n):
   // Multiplies matrices A and B, storing the result in C.
   // n is power of 2 (for simplicity).
   if  n == 1:
     C[1, 1] = A[1, 1] · B[1, 1]
   else:
     allocate a temporary matrix T[1...n, 1...n]
     partition A, B, C, and T into (n/2)x(n/2) submatrices
     spawn Matrix-Multiply(C12,A11,B12, n/2)
     spawn Matrix-Multiply(C21,A21,B11, n/2)
     spawn Matrix-Multiply(C22,A21,B12, n/2)
     spawn Matrix-Multiply(T11,A12,B21, n/2)
     spawn Matrix-Multiply(T12,A12,B22, n/2)
     spawn Matrix-Multiply(T21,A22,B21, n/2)
     spawn Matrix-Multiply(T22,A22,B22, n/2)
     sync
     Matrix-Add(C, T, n)

The lecture then proceeds to calculating how good the algorithm is, turns out the parallelism (ratio of time of algorithm running on a single processor T1 to running on infinite number of processors Tinf) for doing 1000×1000 matrix multiplication is 107, which the parallel algorithm is fast as hell.

Matrix-Multiply used a temporary matrix which was actually unnecessary. To achieve higher performance, it’s often advantageous for an algorithm to use less space, because more space means more time. Therefore a faster algorithm can be added that multiplies matrices and adds them in-place. See lecture at 33:05 for Matrix-Multiply-Add algorithm. The parallelism of this algorithm for 1000×1000 matrix multiplication is 106, smaller than the previous one, but due to fact that no temporary matrices had to be used, it beats Matrix-Multiply algorithm in practice.

The lecture then proceeds to parallel sorting. The classical sorting algorithms were covered in lectures four and five. This lecture parallelizes the Merge-Sort algorithm.

Here is the parallel merge-sort algorithm:

Parallel-Merge-Sort(A, p, r): // sort A[p...r]
  if p < r:
    q = floor((p+r)/2)
    spawn Parallel-Merge-Sort(A, p, q)
    spawn Parallel-Merge-Sort(A, q+1, r)
    sync
    Merge(A, p, q, r) // merge A[p...q] with A[q+1...r]

Instead of recursively calling Parallel-Merge-Sort for subproblems, we spawn them, thus utilizing the power of parallelism. After doing analysis turns out that due to fact that we use the classical Merge() function, the parallelism is just lg(n). That’s puny!

To solve this problem, a Parallel-Merge algorithm has to be written. The lecture continues with presenting a Parallel-Merge algorithm, and after that does the analysis of the same Parallel-Merge-Sort algorithm but with Parallel-Merge function. Turns out this algorithm has parallelism of n/lg2(n), that is much better. See lecture at 51:00 for Parallel-Merge algorithm.

The best parallel sorting algorithm know to date has the parallelism of n/lg(n).

You’re welcome to watch lecture twenty-one:

Topics covered in lecture twenty-one:

  • [00:35] Multithreaded algorithms.
  • [01:32] Multithreaded matrix multiplication.
  • [02:05] Parallelizing with divide and conquer.
  • [04:30] Algorithm Matrix-Multiply, that computes C = A*B.
  • [10:13] Algorithm Matrix-Add, that computes C = C+T.
  • [12:45] Analysis of running time of Matrix-Multiply and Matrix-Add.
  • [19:40] Analysis of critical path length of Matrix-Multiply and Matrix-Add.
  • [26:10] Parallelism of Matrix-Multiply and Matrix-Add.
  • [33:05] Algorithm Matrix-Multiply-Add, that computes C = C + A*B.
  • [35:50] Analysis of Matrix-Multiply-Add.
  • [42:30] Multithreaded, parallel sorting.
  • [43:30] Parallelized Merge-Sort algorithm.
  • [45:55] Analysis of multithreaded Merge-Sort.
  • [51:00] Parallel-Merge algorithm.
  • [01:00:30] Analysis of Parallel-Merge algorithm.

Lecture twenty-one notes:

MIT Algorithms Lecture 21 Notes Thumbnail. Page 1 of 2.
Lecture 21, page 1 of 2: multithreaded algorithms, matrix multiplication, analysis (critical path length, parallelism).
MIT Algorithms Lecture 21 Notes Thumbnail. Page 2 of 2.
Lecture 21, page 2 of 2: paralel sorting, merge sort, parallel merge, analysis.

Have fun with parallel, multithreaded algorithms! The next post is going to be on cache oblivious algorithms! They are a class of algorithms that exploit all the various caches in modern computer architecture to make them run notably faster.

Comments (11) Comments | Email Post Email 'MIT’s Introduction to Algorithms, Lectures 20 and 21: Parallel Algorithms' to a friend | Print Post Print 'MIT’s Introduction to Algorithms, Lectures 20 and 21: Parallel Algorithms' | Permalink Permalink to 'MIT’s Introduction to Algorithms, Lectures 20 and 21: Parallel Algorithms' | Trackback Trackback to 'MIT’s Introduction to Algorithms, Lectures 20 and 21: Parallel Algorithms'
(Popularity: 41%) 36,733 Views

Did you like this page? Subscribe to my posts!

I am now on Twitter! Meet me on Twitter here (my nick is pkrumins.)
Or on Google Buzz and Facebook.

Introduction to Algorithms 27 Jan 2009 05:00 am
1 Star2 Stars3 Stars4 Stars5 Stars (14 votes, average: 3.71 out of 5)
Loading ... Loading ...

MIT AlgorithmsThis is the twelfth post in an article series about MIT’s lecture course “Introduction to Algorithms.” In this post I will review a trilogy of lectures on graph and shortest path algorithms. They are lectures seventeen, eighteen and nineteen. They’ll cover Dijkstra’s Algorithm, Breadth-First Search Algorithm and Bellman-Ford Algorithm for finding single-source shortest paths as well as Floyd-Warshall Algorithm and Johnson’s Algorithm for finding all-pairs shortest paths.

These algorithms require a thorough understanding of graphs. See the previous lecture for a good review of graphs.

Lecture seventeen focuses on the single-source shortest-paths problem: Given a graph G = (V, E), we want to find a shortest path from a given source vertex s ∈ V to each vertex v ∈ V. In this lecture the weights of edges are restricted to be positive which leads it to Dijkstra’s algorithm and in a special case when all edges have unit weight to Breadth-first search algorithm.

Lecture eighteen also focuses on the same single-source shortest-paths problem, but allows edges to be negative. In this case a negative-weight cycles may exist and Dijkstra’s algorithm would no longer work and would produce incorrect results. Bellman-Ford algorithm therefore is introduced that runs slower than Dijkstra’s but detects negative cycles. As a corollary it is shown that Bellman-Ford solves Linear Programming problems with constraints in form xj - xi <= wij.

Lecture nineteen focuses on the all-pairs shortest-paths problem: Find a shortest path from u to v for every pair of vertices u and v. Although this problem can be solved by running a single-source algorithm once from each vertex, it can be solved faster with Floyd-Warshall algorithm or Johnson’s algorithm.

Lecture 17: Shortest Paths I: Single-Source Shortest Paths and Dijkstra’s Algorithm

Lecture seventeen starts with a small review of paths and shortest paths.

It reminds that given a graph G = (V, E, w), where V is a set of vertices, E is a set of edges and w is weight function that maps edges to real-valued weights, a path p from a vertex u to a vertex v in this graph is a sequence of vertices (v0, v1, …, vk) such that u = v0, v = vk and (vi-1, vi) ∈ E. The weight w(p) of this path is a sum of weights over all edges = w(v0, v1) + w(v1, v2) + … + w(vk-1, vk). It also reminds that a shortest path from u to v is the path with minimum weight of all paths from u to v, and that a shortest path in a graph might not exist if it contains a negative weight cycle.

The lecture then notes that shortest paths exhibit the optimal substructure property - a subpath of a shortest path is also a shortest path. The proof of this property is given by cut and paste argument. If you remember from previous two lectures on dynamic programming and greedy algorithms, an optimal substructure property suggests that these two techniques could be applied to solve the problem efficiently. Indeed, applying the greedy idea, Dijkstra’s algorithm emerges.

Here is a somewhat precise definition of single-source shortest paths problem with non-negative edge weights: Given a graph G = (V, E), and a starting vertex s ∈ V, find shortest-path weights for all vertices v ∈ V.

Here is the greedy idea of Dijkstra’s algorithm:

  • 1. Maintain a set S of vertices whose shortest-path from s are known (s ∈ S initially).
  • 2. At each step add vertex v from the set V-S to the set S. Choose v that has minimal distance from s (be greedy).
  • 3. Update the distance estimates of vertices adjacent to v.

I have also posted a video interview with Edsger Dijkstra - Edsger Dijkstra: Discipline in Thought, please take a look if you want to see how Dijkstra looked like. :)

The lecture continues with an example of running Dijkstra’s algorithm on a non-trivial graph. It also introduces to a concept of a shortest path tree - a tree that is formed by edges that were last relaxed in each iteration (hard to explain in English, see lecture at 43:40).

The other half of lecture is devoted to three correctness arguments of Dijkstra’s algorithm. The first one proves that relaxation never makes a mistake. The second proves that relaxation always makes the right greedy choice. And the third proves that when algorithm terminates the results are correct.

At the final minutes of lecture, running time of Dijkstra’s algorithm is analyzed. Turns out that the running time depends on what data structure is used for maintaining the priority queue of the set V-S (step 2). If we use an array, the running time is O(V2), if we use binary heap, it’s O(E·lg(V)) and if we use Fibonacci heap, it’s O(E + V·lg(V)).

Finally a special case of weighted graphs is considered when all weights are unit weights. In this case a single-source shortest-paths problem can be solved by a the Breadth-first search (BFS) algorithm that is actually a simpler version of Dijkstra’s algorithm with priority queue replaced by a FIFO! The running time of BFS is O(V+E).

You’re welcome to watch lecture seventeen:

Topics covered in lecture seventeen:

  • [01:40] Review of paths.
  • [03:15] Edge weight functions.
  • [03:30] Example of a path, its edge weights, and weight of the path.
  • [04:22] Review of shortest-paths.
  • [05:15] Shortest-path weight.
  • [06:30] Negative edge weights.
  • [10:55] Optimal substructure of a shortest path.
  • [11:50] Proof of optimal substructure property: cut and paste.
  • [14:23] Triangle inequality.
  • [15:15] Geometric proof of triangle inequality.
  • [16:30] Single-source shortest paths problem.
  • [18:32] Restricted single-source shortest paths problem: all edge weights positive or zero.
  • [19:35] Greedy idea for ss shortest paths.
  • [26:40] Dijkstra’s algorithm.
  • [35:30] Example of Dijkstra’s algorithm.
  • [43:40] Shortest path trees.
  • [45:12] Correctness of Dijkstra’s algorithm: why relaxation never makes mistake.
  • [53:55] Correctness of Dijkstra’s algorithm: why relaxation makes progress.
  • [01:01:00] Correctness of Dijkstra’s algorithm: why it gives correct answer when it terminates.
  • [01:15:40] Running time of Dijkstra’s algorithm.
  • [01:18:40] Running time depending on using array O(V^2), binary heap O(E·lg(V)) and Fibonacci heap O(E + V·lg(V)) for priority queue.
  • [01:20:00] Unweighted graphs.
  • [01:20:40] Breadth-First Search (BFS) algorithm.
  • [01:23:23] Running time of BFS: O(V+E).

Lecture seventeen notes:

MIT Algorithms Lecture 17 Notes Thumbnail. Page 1 of 2.
Lecture 17, page 1 of 2: paths, shortest paths, negative edge weights, optimal substructure, triangle inequality, single-source shortest paths, dijkstra’s algorithm.
MIT Algorithms Lecture 17 Notes Thumbnail. Page 2 of 2.
Lecture 17, page 2 of 2: example of dijkstra’s algorithm, correctness of dijkstra’s algorithm, running time of dijkstra, unweighted graphs, breadth first search (bfs).

Lecture 18: Shortest Paths II: Bellman-Ford Algorithm

Lecture eighteen begins with recalling that if a graph contains a negative weight cycle, then a shortest path may not exist and gives a geometric illustration of this fact.

Right after this fact, it jumps to Bellman-Ford algorithm. The Bellman-Ford algorithm solves the single-source shortest-paths problem in the general case in which edge weights may be negative. Given a weighted, directed graph G = (V, E) with source s and weight function w: E → R, the Bellman-Ford algorithm produces the shortest paths from s and their weights, if there is no negative weight cycle, and it produces no answer if there is a negative weight cycle.

The algorithm uses relaxation, progressively decreasing an estimate on the weight of a shortest path from the source s to each vertex v ∈ V until it achieves the actual shortest-path weight.

The running time of Bellman-Ford algorithm is O(VE). The lecture also gives a correctness proof of Bellman-Ford algorithm.

The other half of the lecture is devoted to a problem that can be effectively solved by Bellman-Ford. It’s called the linear feasibility problem that it is a special case of linear programming (LP) problem, where there is no objective but the constraints are in form xi <= wij. It is noted that Bellman-Ford is actually a simple case of LP problem.

The lecture ends with an application of Bellman-Ford to solving a special case of VLSI layout problem in 1 dimension.

You’re welcome to watch lecture eighteen:

Topics covered in lecture eighteen:

  • [00:20] A long, long time ago… in a galaxy far, far away…
  • [00:40] Quick review of previous lecture - Dijkstra’s algorithm, non-negative edge weights.
  • [01:40] Description of Bellman-Ford algorithm.
  • [04:30] Bellman-Ford algorithm.
  • [08:50] Running time of Bellman-Ford O(VE).
  • [10:05] Example of Bellmen-Ford algorithm.
  • [18:40] Correctness of Bellman-Ford algorithm:
  • [36:30] Linear programming (LP).
  • [42:48] Efficient algorithms for solving LPs: simplex algorithm (exponential in worst case, but practical), ellipsoid algorithm (polynomial time, impractical), interior point methods (polynomial), random sampling (brand new, discovered at MIT).
  • [45:58] Linear feasibility problem - LP with no objective.
  • [47:30] Difference constraints - constraints in form xj - xi <= wij.
  • [49:50] Example of difference constraints.
  • [51:04] Constraint graph.
  • [54:05] Theorem: Negative weight cycle in constraint means difference constraints are infeasible/unsatisfiable.
  • [54:50] Proof.
  • [59:15] Theorem: If no negative weight cycle then satisfiable.
  • [01:00:23] Proof.
  • [01:08:20] Corollary: Bellman-Ford solves a system of of m difference constraints on n variables in O(mn) time.
  • [01:12:30] VLSI Layout problem solved by Bellman-Ford.

Lecture eighteen notes:

MIT Algorithms Lecture 18 Notes Thumbnail. Page 1 of 2.
Lecture 18, page 1 of 2: bellman-ford algorithm, correctness of bellman-ford, linear programming.
MIT Algorithms Lecture 18 Notes Thumbnail. Page 2 of 2.
Lecture 18, page 2 of 2: feasibility problem, bellman-ford and difference constraints, vlsi layouts.

Lecture 19: Shortest Paths III: All-Pairs Shortest Paths and Floyd-Warshall Algorithm

Lecture nineteen starts with a quick review of lectures seventeen and eighteen. It reminds the running times of various single-source shortest path algorithms, and mentions that in case of a directed acyclic graphs (which was not covered in previous lectures), you can run topological sort and 1 round of Bellman-Ford that makes it find single-source shortest paths in linear time (for graphs) in O(V+E).

The lecture continues all-pairs shortest paths problem, where we want to know the shortest path between every pair of vertices.

A naive approach to this problem is run single-source shortest path from each vertex. For example, on an unweighted graph we’d run BFS algorithm |V| times that would give O(VE) running time. On a non-negative edge weight graph it would be |V| times Dijkstra’s algorithm, giving O(VE + V2lg(V)) time. And in general case we’d run Bellman-Ford |V| times that would make the algorithm run in O(V2E) time.

The lecture continues with a precise definition of all-pairs shortest paths problem: given a directed graph, find an NxN matrix (N = |V|), where each entry aij is the shortest path from vertex i to vertex j.

In general case, if the graph has negative edges and it’s dense, the best we can do so far is run Bellman-Ford |V| times. Recalling that E = O(V2) in a dense graph, the running time is O(V2E) = O(V4) - hypercubed in number of vertices = slow.

Lecture then proceeds with a dynamic programming algorithm without knowing if it will be faster or not. It’s too complicated to explain here, and I recommend watching lecture at 11:54 to understand it. Turns out this dynamic programming algorithm does not give a performance boost and is still O(V4), but it gives some wicked ideas.

The most wicked idea is to connect matrix multiplication with the dynamic programming recurrence and using repeated squaring to beat O(V4). This craziness gives O(V3lgV) time that is an improvement. Please see 23:40 in the lecture for full explanation.

After all this the lecture arrives at Floyd-Warshall algorithm that finds all-pairs shortest paths in O(V3). The algorithm is derived from a recurrence that the shortest path from vertex i to vertex j is minimum of { shortest path from i to j directly or shortest path from i to k and shortest path from k to j }.

Finally the lecture explains Johnson’s algorithm that runs in O(VE + V2log(V)) time for sparse graphs. The key idea in this algorithm is to reweigh all edges so that they are all positive, then run Dijkstra from each vertex and finally undo the reweighing.

It turns out, however, that to find the function for reweighing all edges, a set of difference constraints need to be satisfied. It makes us first run Bellman-Ford to solve these constraints.

Reweighing takes O(EV) time, running Dijkstra on each vertex takes O(VE + V2lgV) and undoing reweighing takes O(V2) time. Of these terms O(VE + V2lgV) dominates and defines algorithm’s running time (for dense it’s still O(V3).

You’re welcome to watch lecture nineteen:

Topics covered in lecture nineteen:

  • [01:00] Review of single-source shortest path algorithms.
  • [04:45] All-pairs shortest paths by running single-source shortest path algorithms from each vertex.
  • [05:35] Unweighted edges: |V|xBFS = O(VE).
  • [06:35] Nonnegative edge weights: |V|xDijkstra = O(VE + V2lg(V)).
  • [07:40] General case: |V|xBellman-Ford = O(V2E).
  • [09:10] Formal definition of all-pairs shortest paths problem.
  • [11:08] |V|xBellman-Ford for dense graphs (E = O(V2)) is O(V4).
  • [11:54] Trying to beat O(V4) with dynamic programming.
  • [19:30] Dynamic programming algorithm, still O(V4).
  • [23:40] A better algorithm via wicked analogy with matrix multiplication. Running time: O(V3lg(V)).
  • [37:45] Floyd-Warshall algorithm. Runs in O(V3).
  • [47:35] Transitive closure problem of directed graphs.
  • [53:30] Johnson’s algorithm. Runs in O(VE + V2log(V))

Lecture nineteen notes:

MIT Algorithms Lecture 19 Notes Thumbnail. Page 1 of 2.
Lecture 19, page 1 of 2: all-pairs shortest paths, dynamic programming algorithm, matrix multiplication analogy.
MIT Algorithms Lecture 19 Notes Thumbnail. Page 2 of 2.
Lecture 19, page 2 of 2: floyd-warshall algorithm, transitive closure problem, johnson’s algorithn.

Have fun with shortest path algorithms! The next post is going to be an introduction to parallel algorithms - things like dynamic multithreading, scheduling and multithreaded algorithms.

PS. This course is taught from the CLRS book (also called “Introduction to Algorithms”). Chapters 24 and 25, called “Single-Source Shortest Paths” and “All-Pairs Shortest Paths” explain everything I wrote about here in much more detail. If these topics excite you, you may want to buy this awesome book:

Comments (16) Comments | Email Post Email 'MIT’s Introduction to Algorithms, Lectures 17, 18 and 19: Shortest Path Algorithms' to a friend | Print Post Print 'MIT’s Introduction to Algorithms, Lectures 17, 18 and 19: Shortest Path Algorithms' | Permalink Permalink to 'MIT’s Introduction to Algorithms, Lectures 17, 18 and 19: Shortest Path Algorithms' | Trackback Trackback to 'MIT’s Introduction to Algorithms, Lectures 17, 18 and 19: Shortest Path Algorithms'
(Popularity: 35%) 50,476 Views

Did you like this page? Subscribe to my posts!

I am now on Twitter! Meet me on Twitter here (my nick is pkrumins.)
Or on Google Buzz and Facebook.

Introduction to Algorithms 04 Dec 2008 09:30 pm
1 Star2 Stars3 Stars4 Stars5 Stars (12 votes, average: 4.67 out of 5)
Loading ... Loading ...

MIT AlgorithmsThis is the eleventh post in an article series about MIT’s lecture course “Introduction to Algorithms.” In this post I will review lecture sixteen, which introduces the concept of Greedy Algorithms, reviews Graphs and applies the greedy Prim’s Algorithm to the Minimum Spanning Tree (MST) Problem.

The previous lecture introduced dynamic programming. Dynamic programming was used for finding solutions to optimization problems. In such problems there can be many possible solutions. Each solution has a value, and we wish to find a solution with the optimal (minimum or maximum) value. Greedy algorithms are another set of methods for finding optimal solution. A greedy algorithm always makes the choice that looks best at the moment. That is, it makes a locally optimal choice in the hope that this choice will lead to a globally optimal solution. Greedy algorithms do not always yield optimal solutions, but for many problems they do. In this lecture it is shown that a greedy algorithm gives an optimal solution to the MST problem.

Lecture sixteen starts with a review of graphs. It reminds that a graph G is a pair (V, E), where V is a set of vertices (nodes) and E ⊆ VxV is a set of edges. If E contains ordered pairs, the graph is directed (also called a digraph), otherwise it’s undirected. It also gives facts, such as, for all graphs, |E| = O(|V|2), and if a graph is connected then |E| >= |V| - 1. It also mentions handshaking lemma - sum over all vertices of degree of the vertex = 2|E| (Σv∈Vdegree(v) = 2|E|).

The lecture continues with graph representations. Graphs are commonly stored as adjacency matrices or as adjacency lists. Adjacency matrix of a graph G = (V, E) is an nxn matrix A with elements aij = 1, if edge (i, j) ∈ E and aij = 0, if edge (i, j) ∉ E.

Here is an example of adjacency matrix for a digraph G = ({1, 2, 3, 4}, {{1, 2}, {1, 3}, {2, 3}, {4, 3}}):

Adjacency matrix example

You can see that the element aij of matrix A is 1 if there is an edge from i to j, and the element is 0 if there is no edge from i to j (i is row, j is column). The storage space required by this representation is O(|V|2). For dense graphs this is ok, but for sparse it’s not ok.

Adjacency list of a given vertex v ∈ V is the list Adj[v] of vertices adjacent to v. Here is an example of adjacency list for the same graph:

Adjacency list example

Storage required by this representation is O(V+E).

The lecture progresses with the problem of minimum spanning trees (MSTs). It’s stated as following: given a connected, undirected graph G = (V, E) with edge weight function w: E -> R, find a spanning tree that connects all the vertices of minimum weight.

Here is an example of a MST. The MST of the graph is colored in green. The MST connects all the vertices of the graph so that the weight of edges is minimum.

Minimum Spanning Tree

Analyzing the MST problem in more detail, it is appears that MST contains optimal substructure property - if we take a subgraph and look at its MST, that MST will be a part of the MST of the whole graph. This is dynamic programming hallmark #1 (see previous lecture). It also appears that the problem of finding an MST contains overlapping subproblems, which is dynamic programming hallmark #2. Hallmarks #1 and #2 suggest that we could use a dynamic programming algorithm to find a MST of a graph. Surprisingly, there is something more powerful than dynamic programming algorithm for this problem - a greedy algorithm.

There is a hallmark for greedy algorithms:

  • Greedy choice property - a locally optimal choice is globally optimal.

Lecture continues with a theorem and a proof that a greedy local choice for a MST is globally optimal choice. This is the key idea behind Prim’s algorithm for finding a minimum spanning tree of a graph. The idea in the algorithm is to maintain a priority queue Q (keyed on minimum weight of edge connected to a vertex) of vertices that are not yet connected in the MST. At each step the algorithm takes a vertex from Q and goes over all edges, connects the edge with minimum weight to MST and updates keys of neighbor vertices (please see the lecture or lecture notes for the exact algorithm and an example).

The lecture ends with running time analysis of Prim’s algorithm. As it uses a priority queue, the running time depends on implementation of the priority queue used. For example, if we use a dumb array as a priority queue, the running time is O(V2), if we use a binary heap, the running time os O(E·lg(V)), if we use a Fibonacci heap, the running time is O(E + V·lg(V)).

Another well known algorithm for MSTs is Kruskal’s algorithm (runs in O(E·lg(V)), which uses a disjoint-set data structure, that is not covered in this course (but is covered in Chapter 21 of the book this course is taught from).

The best known algorithm to date for finding MSTs is by David R. Karger (MIT), Philip N. Klein (Brown University) and Robert E. Tarjan (Princeton University). This algorithm was found in 1993, it’s randomized in nature and runs in O(E + V) expected time. Here is the publication that presents this algorithm: “A Randomized Linear-Time Algorithm for Finding Minimum Spanning Trees“.

You’re welcome to watch lecture sixteen:

Topics covered in lecture sixteen:

  • [00:10] Review of graphs.
  • [00:50] Digraph - directed graph.
  • [01:50] Undirected graph.
  • [02:40] Bound of edges in a graph |E| = O(V2).
  • [04:10] Connected graph edges |E| >= |V| - 1.
  • [05:50] Graph representations.
  • [06:15] Adjacency matrix graph representation.
  • [08:20] Example of adjacency matrix of a graph.
  • [09:50] Storage required for an adjacency matrix O(V2). Dense representation.
  • [12:40] Adjecency list graph representation.
  • [13:40] Example of adjecancy list of a graph.
  • [15:50] Handshaking lemma (theorem) for undirected graphs.
  • [18:20] Storage required for adj. list representation O(E + V).
  • [23:30] Minimum spanning trees.
  • [24:20] Definition of MST problem.
  • [28:45] Example of a graph and its minimum spanning tree.
  • [34:50] Optimal substructure property of MST (dynamic programming hallmark #1).
  • [38:10] Theorem that a subgraph’s MST is part of whole graph’s MST.
  • [42:40] Cut and paste argument proof.
  • [45:40] Overlapping subproblems of MST (dynamic programming hallmark #2).
  • [48:20] Hallmark for greedy algorithms: The greedy choice property - a locally optimal choice is globally optimal.
  • [50:25] Theorem and proof that a local choice of least weight edge is globally optimal for MSTs.
  • [01:01:10] Prim’s algorithm for minimum spanning trees.
  • [01:01:50] Key idea of Prim’s algorithm.
  • [01:02:55] Pseudocode of Prim’s algorithm.
  • [01:07:40] Example of running Prim’s algorithm.
  • [01:15:03] Analysis of Prim’s algorithm running time.
  • [01:18:20] Running time if we choose array, binary heap and Fibonaci heap.
  • [01:22:40] Kruskal’s algorithm.
  • [01:23:00] The best algorithm for MST to date.

Lecture sixteen notes:

MIT Algorithms Lecture 16 Notes Thumbnail. Page 1 of 2.
Lecture 16, page 1 of 2: graphs, graph representations, adjacency lists, minimum spanning trees, cut and paste argument.
MIT Algorithms Lecture 16 Notes Thumbnail. Page 2 of 2.
Lecture 16, page 2 of 2: greedy choice property, prim’s algorithm, idea of kruskal’s algorithm.

Have fun with greedy programming! The next post will be a trilogy of graph and shortest paths algorithms - Dijkstra’s algorithm, Breadth-first search, Bellman-Ford algorithm, Floyd-Warshall algorithm and Johnson’s algorithm!

PS. This course is taught from the CLRS book (also called “Introduction to Algorithms”). Chapter 16 is called “Greedy Algorithms”. It explains an activity-selection problem, elements of greedy algorithms, Huffman codes, Matroid theory, and task scheduling problem. Graphs are reviewed in Appendix B of the book. If these topics excite you, you may want to buy this awesome book:

Comments (8) Comments | Email Post Email 'MIT’s Introduction to Algorithms, Lecture 16: Greedy Algorithms' to a friend | Print Post Print 'MIT’s Introduction to Algorithms, Lecture 16: Greedy Algorithms' | Permalink Permalink to 'MIT’s Introduction to Algorithms, Lecture 16: Greedy Algorithms' | Trackback Trackback to 'MIT’s Introduction to Algorithms, Lecture 16: Greedy Algorithms'
(Popularity: 29%) 43,048 Views

Did you like this page? Subscribe to my posts!

Page 1 of 3123»