post 'good coders code, great reuse' to del.icio.us post 'good coders code, great reuse' to digg post 'good coders code, great reuse' to reddit subscribe to 'good coders code, great reuse' posts via feed
good coders code, great reuse

There are two ways to write error-free programs; only the third works.

Alan J. Perlis

Love my blog? I'd be thankful for a gift from my geeky wishlist. Thanks!

Introduction to Algorithms 07 Mar 2009 02:00 pm
1 Star2 Stars3 Stars4 Stars5 Stars (11 votes, average: 4.27 out of 5)
Loading ... Loading ...

MIT AlgorithmsThis is the thirteenth post in an article series about MIT’s lecture course “Introduction to Algorithms.” In this post I will review lectures twenty and twenty-one on parallel algorithms. These lectures cover the basics of multithreaded programming and multithreaded algorithms.

Lecture twenty begins with a good overview of multithreaded programming paradigm, introduces to various concepts of parallel programming and at the end talks about Cilk programming language.

Lecture twenty-one implements several multithreaded algorithms, such as nxn matrix addition, nxn matrix multiplication, and parallel merge-sort. It then goes into great detail to analyze the running time and parallelism of these algorithms.

Lecture 20: Parallel Algorithms I

The lecture starts with an introduction to a particular model of parallel computation called dynamic multithreading. It’s easiest to understand this model with an example. Here is a multithreaded algorithm that computes Fibonacci numbers:

Fibonacci(n):
1   if n < 2:                     | thread A
2     return n                    | thread A
3   x = spawn Fibonacci(n-1)      | thread A
4   y = spawn Fibonacci(n-2)      | thread B
5   sync                          | thread C
6   return x + y                  | thread C

# this is a really bad algorithm, because it’s
# exponential time.
#
# see lecture three for four different
# classical algorithms that compute Fibonacci numbers

(A thread is defined to be a maximal sequence of instructions not containing the parallel control statements spawn, sync, and return, not something like Java threads or Posix threads.)

I put the most important keywords in bold. They are “spawn” and “sync“.

The keyword spawn is the parallel analog of an ordinary subroutine call. Spawn before the subroutine call in line 3 indicates that the subprocedure Fibonacci(n-1) can execute in parallel with the procedure Fibonacci(n) itself. Unlike an ordinary function call, where the parent is not resumed until after its child returns, in the case of a spawn, the parent can continue to execute in parallel with the child. In this case, the parent goes on to spawn Fibonacci(n-2). In general, the parent can continue to spawn off children, producing a high degree of parallelism.

A procedure cannot safely use the return values of the children it has spawned until it executes a sync statement. If any of its children have not completed when it executes a sync, the procedure suspends and does not resume until all of its children have completed. When all of its children return, execution of the procedure resumes at the point immediately following the sync statement. In the Fibonacci example, the sync statement in line 5 is required before the return statement in line 6 to avoid the anomaly that would occur if x and y were summed before each had been computed.

The spawn and sync keywords specify logical parallelism, not “actual” parallelism. That is, these keywords indicate which code may possibly execute in parallel, but what actually runs in parallel is determined by a scheduler, which maps the dynamically unfolding computation onto the available processors.

The lecture then continues with a discussion on how a parallel instruction stream can be viewed as a directed acyclic graph G = (V, E), where threads make up the set V of vertices, and each spawn and return statement makes up an edge in E. There are also continuation edges in E that exist between statements.

In Fibonacci example there are three threads: lines 1, 2, 3 make up the thread A, line 4 is thread B and lines 5, 6 are thread C.

Here is how a graph of Fibonacci(4) computation looks like:

Multithreading Directed Acyclic Graph

A dag representing multithreaded computation of Fibonacci(4).

Here the threads are show as circles, and each group of threads belonging to the same procedure are surrounded by a rounded rectangle. Downward edges are spawn dependencies, horizontal edges represent continuation dependencies within a procedure, and upward edges are return dependencies. Every computation starts with a single initial thread, and ends with a single final thread.

So to compute Fib(4), first Fib(3) and Fib(2) have to be computed. Fib(4) spawns one thread to compute Fib(3) and another thread to compute Fib(2) and sync’es. In order to compute Fib(3), values of Fib(2) and Fib(1) have to be known, and to compute Fib(2) values of Fib(1) and Fib(0) have to be known. Fib(3) and Fib(2) spawn their threads and sync. This way the computation unwinds.

Next, the lecture introduces some concepts for measuring performance of multithreaded algorithms.

First some time measures:

  • Tp - running time of an algorithm on p processors,
  • T1 - running time of algorithm on 1 processor, and
  • T as critical path length, that is the longest time to execute the algorithm on infinite number of processors.

Based on these measures, the speedup of a computation on p processors is T1/Tp, that is how much faster a p processor machine executes the algorithm than a one processor machine. If the speedup is O(p), then we say that the computation has a linear speedup. If, however, speedup > p, then we call it a super-linear speedup. The maximum possible speedup is called parallelism and is computed by T1/T.

After these concepts the lecture puts forward the concept of scheduling and greedy scheduler.

A scheduler maps computation to p processors.

The greedy scheduler schedules as much as it can at every time step. On a p-processor machine every step can be classified into two types. If there are p or more threads ready to execute, the scheduler executes any p of them (this step is also called a complete step). If there are fewer than p threads, it executes all of them (called incomplete step). The greedy scheduler is an offline scheduler and it’s 2-competitive (see lecture 13 on online/offline algorithms and the meaning of competitive algorithm).

The lecture ends with an introduction to Cilk parallel, multithreaded programming language. It uses a randomized online scheduler. It was used to write two chess programs *Socrates and Cilkchess.

See Charles Leiseron’s handout for a much more comprehensive overview of dynamic multithreading: A Minicourse on Dynamic Multithreaded Algorithms (pdf).

You’re welcome to watch lecture twenty:

Topics covered in lecture twenty:

  • [01:33] Parallel algorithms.
  • [02:33] Dynamic multi-threading.
  • [03:15] A multithreaded algorithm for fibonnaci numbers.
  • [04:50] Explanation of spawn and sync.
  • [06:45] Logical parallelism.
  • [08:00] Multithreaded computation.
  • [12:15] Fibonacci(4) computation graph, and its concepts - init thread, spawn edge, continuation edge, return edge, final thread.
  • [17:45] Edges: spawn, return, continuation.
  • [19:40] Performance measures: running time on p processors Tp, work (serial time) T1, critcial path length (longest path) Tinfinity.
  • [24:15] Lower bounds on Tp (running time on p processors).
  • [29:35] Speedup, linear speedup, superlinear speedup.
  • [32:55] Maximum possible speedup.
  • [36:40] Scheduling.
  • [38:55] Greedy scheduler.
  • [43:40] Theorem by Grant and Brent (bound on Tp for greedy scheduler).
  • [46:20] Proof of Grant, Brent theorem.
  • [56:50] Corollary on greedy scheduler’s linear speedup.
  • [01:02:20] Cilk programming language.
  • [01:06:00] Chess programs: *Socrates, Clikchess.
  • [01:06:30] Analysis of *Socrates running time on 512 processors.

Lecture twenty notes:

MIT Algorithms Lecture 20 Notes Thumbnail. Page 1 of 2.
Lecture 20, page 1 of 2: dynamic multithreading, multithreaded computation, performance measures, speedup, scheduling, greedy scheduler.
MIT Algorithms Lecture 20 Notes Thumbnail. Page 2 of 2.
Lecture 20, page 2 of 2: grant, brent theorem, cilk, socrates, cilkchess.

Lecture 21: Parallel Algorithms II

Lecture twenty-one is all about real multithreaded algorithms and their analysis.

It starts with a hugly important problem of multithreaded matrix multiplication: given two nxn matrices A and B, write a multithreaded algorithm Matrix-Multiply that multiplies them, producing matrix C. The natural approach is to use divide and conquer method from lecture three, and formulate the problem as follows:

Multithreaded Matrix Multiplication

Each nxn matrix can be expressed as 8 multiplications and 4 additions of (n/2)x(n/2) submatrices. So the first thing we need is Matrix-Add algorithm that adds two matrices in parallel.

Here is the parallel Matrix-Add algorithm:

Matrix-Add(C, T, n):
   // Adds matrices C and T in-place, producing C = C + T
   // n is power of 2 (for simplicity).
   if  n == 1:
     C[1, 1] = C[1, 1] + T[1, 1]
   else:
     partition C and T into (n/2)x(n/2) submatrices
     spawn Matrix-Add(C11, T11, n/2)
     spawn Matrix-Add(C12, T12, n/2)
     spawn Matrix-Add(C21, T21, n/2)
     spawn Matrix-Add(C22, T22, n/2)
     sync

The base case for this algorithm is when matrices are just scalars, then it just adds the only element of both matrices. In all other cases, it partitions matrices C and T into (n/2)x(n/2) matrices and spawns four subproblems.

Now we can implement the Matrix-Multiply algorithm, by doing 8 multiplications in parallel and then calling Matrix-Add to do 4 additions.

Here is the parallel Matrix-Multiply algorithm:

Matrix-Multiply(C, A, B, n):
   // Multiplies matrices A and B, storing the result in C.
   // n is power of 2 (for simplicity).
   if  n == 1:
     C[1, 1] = A[1, 1] · B[1, 1]
   else:
     allocate a temporary matrix T[1...n, 1...n]
     partition A, B, C, and T into (n/2)x(n/2) submatrices
     spawn Matrix-Multiply(C12,A11,B12, n/2)
     spawn Matrix-Multiply(C21,A21,B11, n/2)
     spawn Matrix-Multiply(C22,A21,B12, n/2)
     spawn Matrix-Multiply(T11,A12,B21, n/2)
     spawn Matrix-Multiply(T12,A12,B22, n/2)
     spawn Matrix-Multiply(T21,A22,B21, n/2)
     spawn Matrix-Multiply(T22,A22,B22, n/2)
     sync
     Matrix-Add(C, T, n)

The lecture then proceeds to calculating how good the algorithm is, turns out the parallelism (ratio of time of algorithm running on a single processor T1 to running on infinite number of processors Tinf) for doing 1000×1000 matrix multiplication is 107, which the parallel algorithm is fast as hell.

Matrix-Multiply used a temporary matrix which was actually unnecessary. To achieve higher performance, it’s often advantageous for an algorithm to use less space, because more space means more time. Therefore a faster algorithm can be added that multiplies matrices and adds them in-place. See lecture at 33:05 for Matrix-Multiply-Add algorithm. The parallelism of this algorithm for 1000×1000 matrix multiplication is 106, smaller than the previous one, but due to fact that no temporary matrices had to be used, it beats Matrix-Multiply algorithm in practice.

The lecture then proceeds to parallel sorting. The classical sorting algorithms were covered in lectures four and five. This lecture parallelizes the Merge-Sort algorithm.

Here is the parallel merge-sort algorithm:

Parallel-Merge-Sort(A, p, r): // sort A[p...r]
  if p < r:
    q = floor((p+r)/2)
    spawn Parallel-Merge-Sort(A, p, q)
    spawn Parallel-Merge-Sort(A, q+1, r)
    sync
    Merge(A, p, q, r) // merge A[p...q] with A[q+1...r]

Instead of recursively calling Parallel-Merge-Sort for subproblems, we spawn them, thus utilizing the power of parallelism. After doing analysis turns out that due to fact that we use the classical Merge() function, the parallelism is just lg(n). That’s puny!

To solve this problem, a Parallel-Merge algorithm has to be written. The lecture continues with presenting a Parallel-Merge algorithm, and after that does the analysis of the same Parallel-Merge-Sort algorithm but with Parallel-Merge function. Turns out this algorithm has parallelism of n/lg2(n), that is much better. See lecture at 51:00 for Parallel-Merge algorithm.

The best parallel sorting algorithm know to date has the parallelism of n/lg(n).

You’re welcome to watch lecture twenty-one:

Topics covered in lecture twenty-one:

  • [00:35] Multithreaded algorithms.
  • [01:32] Multithreaded matrix multiplication.
  • [02:05] Parallelizing with divide and conquer.
  • [04:30] Algorithm Matrix-Multiply, that computes C = A*B.
  • [10:13] Algorithm Matrix-Add, that computes C = C+T.
  • [12:45] Analysis of running time of Matrix-Multiply and Matrix-Add.
  • [19:40] Analysis of critical path length of Matrix-Multiply and Matrix-Add.
  • [26:10] Parallelism of Matrix-Multiply and Matrix-Add.
  • [33:05] Algorithm Matrix-Multiply-Add, that computes C = C + A*B.
  • [35:50] Analysis of Matrix-Multiply-Add.
  • [42:30] Multithreaded, parallel sorting.
  • [43:30] Parallelized Merge-Sort algorithm.
  • [45:55] Analysis of multithreaded Merge-Sort.
  • [51:00] Parallel-Merge algorithm.
  • [01:00:30] Analysis of Parallel-Merge algorithm.

Lecture twenty-one notes:

MIT Algorithms Lecture 21 Notes Thumbnail. Page 1 of 2.
Lecture 21, page 1 of 2: multithreaded algorithms, matrix multiplication, analysis (critical path length, parallelism).
MIT Algorithms Lecture 21 Notes Thumbnail. Page 2 of 2.
Lecture 21, page 2 of 2: paralel sorting, merge sort, parallel merge, analysis.

Have fun with parallel, multithreaded algorithms! The next post is going to be on cache oblivious algorithms! They are a class of algorithms that exploit all the various caches in modern computer architecture to make them run notably faster.

Comments (9) Comments | Email Post Email 'MIT’s Introduction to Algorithms, Lectures 20 and 21: Parallel Algorithms' to a friend | Print Post Print 'MIT’s Introduction to Algorithms, Lectures 20 and 21: Parallel Algorithms' | Permalink Permalink to 'MIT’s Introduction to Algorithms, Lectures 20 and 21: Parallel Algorithms' | Trackback Trackback to 'MIT’s Introduction to Algorithms, Lectures 20 and 21: Parallel Algorithms'
(Popularity: 47%) 23,065 Views

Did you like this page? Subscribe to my posts!

Love my blog? I'd be thankful for a gift from my geeky wishlist. Thanks!

Introduction to Algorithms 27 Jan 2009 05:00 am
1 Star2 Stars3 Stars4 Stars5 Stars (12 votes, average: 4 out of 5)
Loading ... Loading ...

MIT AlgorithmsThis is the twelfth post in an article series about MIT’s lecture course “Introduction to Algorithms.” In this post I will review a trilogy of lectures on graph and shortest path algorithms. They are lectures seventeen, eighteen and nineteen. They’ll cover Dijkstra’s Algorithm, Breadth-First Search Algorithm and Bellman-Ford Algorithm for finding single-source shortest paths as well as Floyd-Warshall Algorithm and Johnson’s Algorithm for finding all-pairs shortest paths.

These algorithms require a thorough understanding of graphs. See the previous lecture for a good review of graphs.

Lecture seventeen focuses on the single-source shortest-paths problem: Given a graph G = (V, E), we want to find a shortest path from a given source vertex s ∈ V to each vertex v ∈ V. In this lecture the weights of edges are restricted to be positive which leads it to Dijkstra’s algorithm and in a special case when all edges have unit weight to Breadth-first search algorithm.

Lecture eighteen also focuses on the same single-source shortest-paths problem, but allows edges to be negative. In this case a negative-weight cycles may exist and Dijkstra’s algorithm would no longer work and would produce incorrect results. Bellman-Ford algorithm therefore is introduced that runs slower than Dijkstra’s but detects negative cycles. As a corollary it is shown that Bellman-Ford solves Linear Programming problems with constraints in form xj - xi <= wij.

Lecture nineteen focuses on the all-pairs shortest-paths problem: Find a shortest path from u to v for every pair of vertices u and v. Although this problem can be solved by running a single-source algorithm once from each vertex, it can be solved faster with Floyd-Warshall algorithm or Johnson’s algorithm.

Lecture 17: Shortest Paths I: Single-Source Shortest Paths and Dijkstra’s Algorithm

Lecture seventeen starts with a small review of paths and shortest paths.

It reminds that given a graph G = (V, E, w), where V is a set of vertices, E is a set of edges and w is weight function that maps edges to real-valued weights, a path p from a vertex u to a vertex v in this graph is a sequence of vertices (v0, v1, …, vk) such that u = v0, v = vk and (vi-1, vi) ∈ E. The weight w(p) of this path is a sum of weights over all edges = w(v0, v1) + w(v1, v2) + … + w(vk-1, vk). It also reminds that a shortest path from u to v is the path with minimum weight of all paths from u to v, and that a shortest path in a graph might not exist if it contains a negative weight cycle.

The lecture then notes that shortest paths exhibit the optimal substructure property - a subpath of a shortest path is also a shortest path. The proof of this property is given by cut and paste argument. If you remember from previous two lectures on dynamic programming and greedy algorithms, an optimal substructure property suggests that these two techniques could be applied to solve the problem efficiently. Indeed, applying the greedy idea, Dijkstra’s algorithm emerges.

Here is a somewhat precise definition of single-source shortest paths problem with non-negative edge weights: Given a graph G = (V, E), and a starting vertex s ∈ V, find shortest-path weights for all vertices v ∈ V.

Here is the greedy idea of Dijkstra’s algorithm:

  • 1. Maintain a set S of vertices whose shortest-path from s are known (s ∈ S initially).
  • 2. At each step add vertex v from the set V-S to the set S. Choose v that has minimal distance from s (be greedy).
  • 3. Update the distance estimates of vertices adjacent to v.

I have also posted a video interview with Edsger Dijkstra - Edsger Dijkstra: Discipline in Thought, please take a look if you want to see how Dijkstra looked like. :)

The lecture continues with an example of running Dijkstra’s algorithm on a non-trivial graph. It also introduces to a concept of a shortest path tree - a tree that is formed by edges that were last relaxed in each iteration (hard to explain in English, see lecture at 43:40).

The other half of lecture is devoted to three correctness arguments of Dijkstra’s algorithm. The first one proves that relaxation never makes a mistake. The second proves that relaxation always makes the right greedy choice. And the third proves that when algorithm terminates the results are correct.

At the final minutes of lecture, running time of Dijkstra’s algorithm is analyzed. Turns out that the running time depends on what data structure is used for maintaining the priority queue of the set V-S (step 2). If we use an array, the running time is O(V2), if we use binary heap, it’s O(E·lg(V)) and if we use Fibonacci heap, it’s O(E + V·lg(V)).

Finally a special case of weighted graphs is considered when all weights are unit weights. In this case a single-source shortest-paths problem can be solved by a the Breadth-first search (BFS) algorithm that is actually a simpler version of Dijkstra’s algorithm with priority queue replaced by a FIFO! The running time of BFS is O(V+E).

You’re welcome to watch lecture seventeen:

Topics covered in lecture seventeen:

  • [01:40] Review of paths.
  • [03:15] Edge weight functions.
  • [03:30] Example of a path, its edge weights, and weight of the path.
  • [04:22] Review of shortest-paths.
  • [05:15] Shortest-path weight.
  • [06:30] Negative edge weights.
  • [10:55] Optimal substructure of a shortest path.
  • [11:50] Proof of optimal substructure property: cut and paste.
  • [14:23] Triangle inequality.
  • [15:15] Geometric proof of triangle inequality.
  • [16:30] Single-source shortest paths problem.
  • [18:32] Restricted single-source shortest paths problem: all edge weights positive or zero.
  • [19:35] Greedy idea for ss shortest paths.
  • [26:40] Dijkstra’s algorithm.
  • [35:30] Example of Dijkstra’s algorithm.
  • [43:40] Shortest path trees.
  • [45:12] Correctness of Dijkstra’s algorithm: why relaxation never makes mistake.
  • [53:55] Correctness of Dijkstra’s algorithm: why relaxation makes progress.
  • [01:01:00] Correctness of Dijkstra’s algorithm: why it gives correct answer when it terminates.
  • [01:15:40] Running time of Dijkstra’s algorithm.
  • [01:18:40] Running time depending on using array O(V^2), binary heap O(E·lg(V)) and Fibonacci heap O(E + V·lg(V)) for priority queue.
  • [01:20:00] Unweighted graphs.
  • [01:20:40] Breadth-First Search (BFS) algorithm.
  • [01:23:23] Running time of BFS: O(V+E).

Lecture seventeen notes:

MIT Algorithms Lecture 17 Notes Thumbnail. Page 1 of 2.
Lecture 17, page 1 of 2: paths, shortest paths, negative edge weights, optimal substructure, triangle inequality, single-source shortest paths, dijkstra’s algorithm.
MIT Algorithms Lecture 17 Notes Thumbnail. Page 2 of 2.
Lecture 17, page 2 of 2: example of dijkstra’s algorithm, correctness of dijkstra’s algorithm, running time of dijkstra, unweighted graphs, breadth first search (bfs).

Lecture 18: Shortest Paths II: Bellman-Ford Algorithm

Lecture eighteen begins with recalling that if a graph contains a negative weight cycle, then a shortest path may not exist and gives a geometric illustration of this fact.

Right after this fact, it jumps to Bellman-Ford algorithm. The Bellman-Ford algorithm solves the single-source shortest-paths problem in the general case in which edge weights may be negative. Given a weighted, directed graph G = (V, E) with source s and weight function w: E → R, the Bellman-Ford algorithm produces the shortest paths from s and their weights, if there is no negative weight cycle, and it produces no answer if there is a negative weight cycle.

The algorithm uses relaxation, progressively decreasing an estimate on the weight of a shortest path from the source s to each vertex v ∈ V until it achieves the actual shortest-path weight.

The running time of Bellman-Ford algorithm is O(VE). The lecture also gives a correctness proof of Bellman-Ford algorithm.

The other half of the lecture is devoted to a problem that can be effectively solved by Bellman-Ford. It’s called the linear feasibility problem that it is a special case of linear programming (LP) problem, where there is no objective but the constraints are in form xi <= wij. It is noted that Bellman-Ford is actually a simple case of LP problem.

The lecture ends with an application of Bellman-Ford to solving a special case of VLSI layout problem in 1 dimension.

You’re welcome to watch lecture eighteen:

Topics covered in lecture eighteen:

  • [00:20] A long, long time ago… in a galaxy far, far away…
  • [00:40] Quick review of previous lecture - Dijkstra’s algorithm, non-negative edge weights.
  • [01:40] Description of Bellman-Ford algorithm.
  • [04:30] Bellman-Ford algorithm.
  • [08:50] Running time of Bellman-Ford O(VE).
  • [10:05] Example of Bellmen-Ford algorithm.
  • [18:40] Correctness of Bellman-Ford algorithm:
  • [36:30] Linear programming (LP).
  • [42:48] Efficient algorithms for solving LPs: simplex algorithm (exponential in worst case, but practical), ellipsoid algorithm (polynomial time, impractical), interior point methods (polynomial), random sampling (brand new, discovered at MIT).
  • [45:58] Linear feasibility problem - LP with no objective.
  • [47:30] Difference constraints - constraints in form xj - xi <= wij.
  • [49:50] Example of difference constraints.
  • [51:04] Constraint graph.
  • [54:05] Theorem: Negative weight cycle in constraint means difference constraints are infeasible/unsatisfiable.
  • [54:50] Proof.
  • [59:15] Theorem: If no negative weight cycle then satisfiable.
  • [01:00:23] Proof.
  • [01:08:20] Corollary: Bellman-Ford solves a system of of m difference constraints on n variables in O(mn) time.
  • [01:12:30] VLSI Layout problem solved by Bellman-Ford.

Lecture eighteen notes:

MIT Algorithms Lecture 18 Notes Thumbnail. Page 1 of 2.
Lecture 18, page 1 of 2: bellman-ford algorithm, correctness of bellman-ford, linear programming.
MIT Algorithms Lecture 18 Notes Thumbnail. Page 2 of 2.
Lecture 18, page 2 of 2: feasibility problem, bellman-ford and difference constraints, vlsi layouts.

Lecture 19: Shortest Paths III: All-Pairs Shortest Paths and Floyd-Warshall Algorithm

Lecture nineteen starts with a quick review of lectures seventeen and eighteen. It reminds the running times of various single-source shortest path algorithms, and mentions that in case of a directed acyclic graphs (which was not covered in previous lectures), you can run topological sort and 1 round of Bellman-Ford that makes it find single-source shortest paths in linear time (for graphs) in O(V+E).

The lecture continues all-pairs shortest paths problem, where we want to know the shortest path between every pair of vertices.

A naive approach to this problem is run single-source shortest path from each vertex. For example, on an unweighted graph we’d run BFS algorithm |V| times that would give O(VE) running time. On a non-negative edge weight graph it would be |V| times Dijkstra’s algorithm, giving O(VE + V2lg(V)) time. And in general case we’d run Bellman-Ford |V| times that would make the algorithm run in O(V2E) time.

The lecture continues with a precise definition of all-pairs shortest paths problem: given a directed graph, find an NxN matrix (N = |V|), where each entry aij is the shortest path from vertex i to vertex j.

In general case, if the graph has negative edges and it’s dense, the best we can do so far is run Bellman-Ford |V| times. Recalling that E = O(V2) in a dense graph, the running time is O(V2E) = O(V4) - hypercubed in number of vertices = slow.

Lecture then proceeds with a dynamic programming algorithm without knowing if it will be faster or not. It’s too complicated to explain here, and I recommend watching lecture at 11:54 to understand it. Turns out this dynamic programming algorithm does not give a performance boost and is still O(V4), but it gives some wicked ideas.

The most wicked idea is to connect matrix multiplication with the dynamic programming recurrence and using repeated squaring to beat O(V4). This craziness gives O(V3lgV) time that is an improvement. Please see 23:40 in the lecture for full explanation.

After all this the lecture arrives at Floyd-Warshall algorithm that finds all-pairs shortest paths in O(V3). The algorithm is derived from a recurrence that the shortest path from vertex i to vertex j is minimum of { shortest path from i to j directly or shortest path from i to k and shortest path from k to j }.

Finally the lecture explains Johnson’s algorithm that runs in O(VE + V2log(V)) time for sparse graphs. The key idea in this algorithm is to reweigh all edges so that they are all positive, then run Dijkstra from each vertex and finally undo the reweighing.

It turns out, however, that to find the function for reweighing all edges, a set of difference constraints need to be satisfied. It makes us first run Bellman-Ford to solve these constraints.

Reweighing takes O(EV) time, running Dijkstra on each vertex takes O(VE + V2lgV) and undoing reweighing takes O(V2) time. Of these terms O(VE + V2lgV) dominates and defines algorithm’s running time (for dense it’s still O(V3).

You’re welcome to watch lecture nineteen:

Topics covered in lecture nineteen:

  • [01:00] Review of single-source shortest path algorithms.
  • [04:45] All-pairs shortest paths by running single-source shortest path algorithms from each vertex.
  • [05:35] Unweighted edges: |V|xBFS = O(VE).
  • [06:35] Nonnegative edge weights: |V|xDijkstra = O(VE + V2lg(V)).
  • [07:40] General case: |V|xBellman-Ford = O(V2E).
  • [09:10] Formal definition of all-pairs shortest paths problem.
  • [11:08] |V|xBellman-Ford for dense graphs (E = O(V2)) is O(V4).
  • [11:54] Trying to beat O(V4) with dynamic programming.
  • [19:30] Dynamic programming algorithm, still O(V4).
  • [23:40] A better algorithm via wicked analogy with matrix multiplication. Running time: O(V3lg(V)).
  • [37:45] Floyd-Warshall algorithm. Runs in O(V3).
  • [47:35] Transitive closure problem of directed graphs.
  • [53:30] Johnson’s algorithm. Runs in O(VE + V2log(V))

Lecture nineteen notes:

MIT Algorithms Lecture 19 Notes Thumbnail. Page 1 of 2.
Lecture 19, page 1 of 2: all-pairs shortest paths, dynamic programming algorithm, matrix multiplication analogy.
MIT Algorithms Lecture 19 Notes Thumbnail. Page 2 of 2.
Lecture 19, page 2 of 2: floyd-warshall algorithm, transitive closure problem, johnson’s algorithn.

Have fun with shortest path algorithms! The next post is going to be an introduction to parallel algorithms - things like dynamic multithreading, scheduling and multithreaded algorithms.

PS. This course is taught from the CLRS book (also called “Introduction to Algorithms”). Chapters 24 and 25, called “Single-Source Shortest Paths” and “All-Pairs Shortest Paths” explain everything I wrote about here in much more detail. If these topics excite you, you may want to buy this awesome book:

Comments (9) Comments | Email Post Email 'MIT’s Introduction to Algorithms, Lectures 17, 18 and 19: Shortest Path Algorithms' to a friend | Print Post Print 'MIT’s Introduction to Algorithms, Lectures 17, 18 and 19: Shortest Path Algorithms' | Permalink Permalink to 'MIT’s Introduction to Algorithms, Lectures 17, 18 and 19: Shortest Path Algorithms' | Trackback Trackback to 'MIT’s Introduction to Algorithms, Lectures 17, 18 and 19: Shortest Path Algorithms'
(Popularity: 37%) 33,174 Views

Did you like this page? Subscribe to my posts!

Love my blog? I'd be thankful for a gift from my geeky wishlist. Thanks!

Introduction to Algorithms 04 Dec 2008 09:30 pm
1 Star2 Stars3 Stars4 Stars5 Stars (9 votes, average: 4.56 out of 5)
Loading ... Loading ...

MIT AlgorithmsThis is the eleventh post in an article series about MIT’s lecture course “Introduction to Algorithms.” In this post I will review lecture sixteen, which introduces the concept of Greedy Algorithms, reviews Graphs and applies the greedy Prim’s Algorithm to the Minimum Spanning Tree (MST) Problem.

The previous lecture introduced dynamic programming. Dynamic programming was used for finding solutions to optimization problems. In such problems there can be many possible solutions. Each solution has a value, and we wish to find a solution with the optimal (minimum or maximum) value. Greedy algorithms are another set of methods for finding optimal solution. A greedy algorithm always makes the choice that looks best at the moment. That is, it makes a locally optimal choice in the hope that this choice will lead to a globally optimal solution. Greedy algorithms do not always yield optimal solutions, but for many problems they do. In this lecture it is shown that a greedy algorithm gives an optimal solution to the MST problem.

Lecture sixteen starts with a review of graphs. It reminds that a graph G is a pair (V, E), where V is a set of vertices (nodes) and E ⊆ VxV is a set of edges. If E contains ordered pairs, the graph is directed (also called a digraph), otherwise it’s undirected. It also gives facts, such as, for all graphs, |E| = O(|V|2), and if a graph is connected then |E| >= |V| - 1. It also mentions handshaking lemma - sum over all vertices of degree of the vertex = 2|E| (Σv∈Vdegree(v) = 2|E|).

The lecture continues with graph representations. Graphs are commonly stored as adjacency matrices or as adjacency lists. Adjacency matrix of a graph G = (V, E) is an nxn matrix A with elements aij = 1, if edge (i, j) ∈ E and aij = 0, if edge (i, j) ∉ E.

Here is an example of adjacency matrix for a digraph G = ({1, 2, 3, 4}, {{1, 2}, {1, 3}, {2, 3}, {4, 3}}):

Adjacency matrix example

You can see that the element aij of matrix A is 1 if there is an edge from i to j, and the element is 0 if there is no edge from i to j (i is row, j is column). The storage space required by this representation is O(|V|2). For dense graphs this is ok, but for sparse it’s not ok.

Adjacency list of a given vertex v ∈ V is the list Adj[v] of vertices adjacent to v. Here is an example of adjacency list for the same graph:

Adjacency list example

Storage required by this representation is O(V+E).

The lecture progresses with the problem of minimum spanning trees (MSTs). It’s stated as following: given a connected, undirected graph G = (V, E) with edge weight function w: E -> R, find a spanning tree that connects all the vertices of minimum weight.

Here is an example of a MST. The MST of the graph is colored in green. The MST connects all the vertices of the graph so that the weight of edges is minimum.

Minimum Spanning Tree

Analyzing the MST problem in more detail, it is appears that MST contains optimal substructure property - if we take a subgraph and look at its MST, that MST will be a part of the MST of the whole graph. This is dynamic programming hallmark #1 (see previous lecture). It also appears that the problem of finding an MST contains overlapping subproblems, which is dynamic programming hallmark #2. Hallmarks #1 and #2 suggest that we could use a dynamic programming algorithm to find a MST of a graph. Surprisingly, there is something more powerful than dynamic programming algorithm for this problem - a greedy algorithm.

There is a hallmark for greedy algorithms:

  • Greedy choice property - a locally optimal choice is globally optimal.

Lecture continues with a theorem and a proof that a greedy local choice for a MST is globally optimal choice. This is the key idea behind Prim’s algorithm for finding a minimum spanning tree of a graph. The idea in the algorithm is to maintain a priority queue Q (keyed on minimum weight of edge connected to a vertex) of vertices that are not yet connected in the MST. At each step the algorithm takes a vertex from Q and goes over all edges, connects the edge with minimum weight to MST and updates keys of neighbor vertices (please see the lecture or lecture notes for the exact algorithm and an example).

The lecture ends with running time analysis of Prim’s algorithm. As it uses a priority queue, the running time depends on implementation of the priority queue used. For example, if we use a dumb array as a priority queue, the running time is O(V2), if we use a binary heap, the running time os O(E·lg(V)), if we use a Fibonacci heap, the running time is O(E + V·lg(V)).

Another well known algorithm for MSTs is Kruskal’s algorithm (runs in O(E·lg(V)), which uses a disjoint-set data structure, that is not covered in this course (but is covered in Chapter 21 of the book this course is taught from).

The best known algorithm to date for finding MSTs is by David R. Karger (MIT), Philip N. Klein (Brown University) and Robert E. Tarjan (Princeton University). This algorithm was found in 1993, it’s randomized in nature and runs in O(E + V) expected time. Here is the publication that presents this algorithm: “A Randomized Linear-Time Algorithm for Finding Minimum Spanning Trees“.

You’re welcome to watch lecture sixteen:

Topics covered in lecture sixteen:

  • [00:10] Review of graphs.
  • [00:50] Digraph - directed graph.
  • [01:50] Undirected graph.
  • [02:40] Bound of edges in a graph |E| = O(V2).
  • [04:10] Connected graph edges |E| >= |V| - 1.
  • [05:50] Graph representations.
  • [06:15] Adjacency matrix graph representation.
  • [08:20] Example of adjacency matrix of a graph.
  • [09:50] Storage required for an adjacency matrix O(V2). Dense representation.
  • [12:40] Adjecency list graph representation.
  • [13:40] Example of adjecancy list of a graph.
  • [15:50] Handshaking lemma (theorem) for undirected graphs.
  • [18:20] Storage required for adj. list representation O(E + V).
  • [23:30] Minimum spanning trees.
  • [24:20] Definition of MST problem.
  • [28:45] Example of a graph and its minimum spanning tree.
  • [34:50] Optimal substructure property of MST (dynamic programming hallmark #1).
  • [38:10] Theorem that a subgraph’s MST is part of whole graph’s MST.
  • [42:40] Cut and paste argument proof.
  • [45:40] Overlapping subproblems of MST (dynamic programming hallmark #2).
  • [48:20] Hallmark for greedy algorithms: The greedy choice property - a locally optimal choice is globally optimal.
  • [50:25] Theorem and proof that a local choice of least weight edge is globally optimal for MSTs.
  • [01:01:10] Prim’s algorithm for minimum spanning trees.
  • [01:01:50] Key idea of Prim’s algorithm.
  • [01:02:55] Pseudocode of Prim’s algorithm.
  • [01:07:40] Example of running Prim’s algorithm.
  • [01:15:03] Analysis of Prim’s algorithm running time.
  • [01:18:20] Running time if we choose array, binary heap and Fibonaci heap.
  • [01:22:40] Kruskal’s algorithm.
  • [01:23:00] The best algorithm for MST to date.

Lecture sixteen notes:

MIT Algorithms Lecture 16 Notes Thumbnail. Page 1 of 2.
Lecture 16, page 1 of 2: graphs, graph representations, adjacency lists, minimum spanning trees, cut and paste argument.
MIT Algorithms Lecture 16 Notes Thumbnail. Page 2 of 2.
Lecture 16, page 2 of 2: greedy choice property, prim’s algorithm, idea of kruskal’s algorithm.

Have fun with greedy programming! The next post will be a trilogy of graph and shortest paths algorithms - Dijkstra’s algorithm, Breadth-first search, Bellman-Ford algorithm, Floyd-Warshall algorithm and Johnson’s algorithm!

PS. This course is taught from the CLRS book (also called “Introduction to Algorithms”). Chapter 16 is called “Greedy Algorithms”. It explains an activity-selection problem, elements of greedy algorithms, Huffman codes, Matroid theory, and task scheduling problem. Graphs are reviewed in Appendix B of the book. If these topics excite you, you may want to buy this awesome book:

Comments (6) Comments | Email Post Email 'MIT’s Introduction to Algorithms, Lecture 16: Greedy Algorithms' to a friend | Print Post Print 'MIT’s Introduction to Algorithms, Lecture 16: Greedy Algorithms' | Permalink Permalink to 'MIT’s Introduction to Algorithms, Lecture 16: Greedy Algorithms' | Trackback Trackback to 'MIT’s Introduction to Algorithms, Lecture 16: Greedy Algorithms'
(Popularity: 33%) 29,196 Views

Did you like this page? Subscribe to my posts!

Love my blog? I'd be thankful for a gift from my geeky wishlist. Thanks!

Introduction to Algorithms 27 Nov 2008 06:45 pm
1 Star2 Stars3 Stars4 Stars5 Stars (9 votes, average: 5 out of 5)
Loading ... Loading ...

MIT AlgorithmsThis is the tenth post in an article series about MIT’s lecture course “Introduction to Algorithms.” In this post I will review lecture fifteen, which introduces the concept of Dynamic Programming and applies it to the Longest Common Subsequence problem.

Dynamic programming is a design technique similar to divide and conquer. Divide-and-conquer algorithms partition the problem into independent subproblems, solve the subproblems recursively, and then combine their solutions to solve the original problem. Dynamic programming is applicable when the subproblems are not independent, that is, when subproblems share subsubproblems. A dynamic-programming algorithm solves every subsubproblem just once and then saves its answer in a table, thereby avoiding the work of recomputing the answer every time the subsubproblem is encountered.

Dynamic programming was systematized by Richard E. Bellman. He began the systematic study of dynamic programming in 1955. The word “programming,” both here and in linear programming, refers to the use of a tabular solution method and not to writing computer code.

Dynamic programming is typically applied to optimization problems. In such problems there can be many possible solutions. Each solution has a value, and we wish to find a solution with the optimal (minimum or maximum) value. We call such a solution an optimal solution, as opposed to the optimal solution, since there may be several solutions that achieve the optimal value.

Dynamic programming can be effectively applied to solve the longest common subsequence (LCS) problem. The problem is stated as following: given two sequences (or strings) x and y find a maximum-length common subsequence (substring) of x and y.

For example, given two sequences x = “ABCBDAB” and y = “BDCABA”, the LCS(x, y) = { “BCBA”, “BDAB”, “BCAB” }. As you can see there are several optimal solutions.

Lecture fifteen introduces dynamic programming via this longest common subsequence problem. It first gives a brute-force, exponential time algorithm for solving it. The idea of algorithm is to check every subequence of x[1..m] (m is the length of sequence x) to see if it is also a subsequence of y[1..n] (n is the length of sequence y). Checking takes O(n) time, and there are 2m subsequences of x. The running time thus is exponential O(n·2m). It is no good for large sequences and the lecture continues with a simplification - let’s look at the length of a longest-common subseq and then extend this algorithm to find the LCS itself. The simplified algorithm is recursive in nature and computes the same subproblems. At this moment two dynamic programming hallmarks are stated:

  • 1. Optimal substructure: an optimal solution to a problem contains optimal solutions to subproblems.
  • 2. Overlapping subproblems: a recursive solution contains a “small” number of distinct subproblems repeated many times.

As the subproblems are overlapping, the lecture introduces concept of memoization algorithm (note that it’s not memorization). A better known word for memoization is caching. The subproblems are cached (memoized) so that they are not recomputed over and over again.

The lecture ends with constructing a dynamic programming table for LCS problem and explains how to find a LCS from this table.

You’re welcome to watch lecture fifteen:

Topics covered in lecture fifteen:

  • [00:20] Dynamic programming.
  • [01:47] Longest common subsequence (LCS) problem.
  • [03:55] Example of LCS on sequences “ABCBDAB” and “BDCABA”.
  • [06:55] Brute force algorithm for LCS.
  • [07:50] Analysis of brute force algorithm.
  • [11:40] Simplification of LCS problem.
  • [16:20] Theorem about LCS length.
  • [18:25] Proof of the theorem.
  • [30:40] Dynamic programming hallmark #1: Optimal substructure.
  • [32:25] Example of hallmark #1 on LCS.
  • [34:15] Recursive algorithm for longest common subseq.
  • [36:40] Worst case analysis of the algorithm.
  • [38:10] Recursion tree of algorithm.
  • [42:40] Dynamic programming hallmark #2: Overlapping subproblems.
  • [44:40] Example of hallmark #2 on LCS.
  • [45:50] Memoization algorithm for LCS.
  • [48:45] Time and space analysis of memoized algorithm.
  • [54:30] Dynamic programming algorithm for LCS.
  • [01:01:15] Analysis of dynamic programming algorithm.

Lecture fifteen notes:

MIT Algorithms Lecture 15 Notes Thumbnail. Page 1 of 2.
Lecture 15, page 1 of 2.
MIT Algorithms Lecture 15 Notes Thumbnail. Page 2 of 2.
Lecture 15, page 2 of 2.

Have fun with programming dynamically! The next post will be about graphs, greedy algorithms and minimum spanning trees.

PS. This course is taught from the CLRS book (also called “Introduction to Algorithms”). Chapter 15 is called “Dynamic Programming” and covers the topics in this lecture. It also explains the assembly-line scheduling problem, matrix-chain multiplication problem, elements of dynamic programming and optimal binary search trees.

Comments (20) Comments | Email Post Email 'MIT’s Introduction to Algorithms, Lecture 15: Dynamic Programming' to a friend | Print Post Print 'MIT’s Introduction to Algorithms, Lecture 15: Dynamic Programming' | Permalink Permalink to 'MIT’s Introduction to Algorithms, Lecture 15: Dynamic Programming' | Trackback Trackback to 'MIT’s Introduction to Algorithms, Lecture 15: Dynamic Programming'
(Popularity: 24%) 17,157 Views

Did you like this page? Subscribe to my posts!

Love my blog? I'd be thankful for a gift from my geeky wishlist. Thanks!

Introduction to Algorithms 20 Nov 2008 12:40 pm
1 Star2 Stars3 Stars4 Stars5 Stars (5 votes, average: 3.8 out of 5)
Loading ... Loading ...

MIT AlgorithmsThis is the ninth post in an article series about MIT’s lecture course “Introduction to Algorithms.” In this post I will review lectures thirteen and fourteen. They are on theoretical topics of Amortized Analysis, Competitive Analysis and Self-Organizing Lists.

Amortized analysis is a tool for analyzing algorithms that perform a sequence of similar operations. It can be used to show that the average cost of an operation is small, if one averages over a sequence of operations, even though a single operation within the sequence might be expensive. Amortized analysis differs from average-case analysis in that probability is not involved; an amortized analysis guarantees the average performance of each operation in the worst case.

Lecture thirteen looks at three most common techniques used in amortized analysis - Aggregate Analysis, Accounting Method (also known as Taxation Method) and Potential Method.

Competitive analysis is a method that is used to show how an Online Algorithm compares to an Offline Algorithm. An algorithm is said to be online if it does not know the data it will be executing on beforehand. An offline algorithm may see all of the data in advance.

A self-organizing list is a list that reorders its elements based on some self-organizing heuristic to improve average access time.

Lecture fourteen gives a precise definition of what it means for an algorithm to be competitive and shows a heuristic for a self-organizing list that makes it 4-competitive.

Lecture 13: Amortized Algorithms

Lecture thirteen starts with a question - “How large should a hash table be?” Well, we could make it as large as possible to minimize the search time, but then it might take too much space. We could then try to make it as small as possible, but what if we don’t know the number of elements that will be hashed into it? The solution is to use a dynamic hash table that grows whenever it is about to overflow.

This creates a problem that at the moment of overflow the table must be grown and growing it might take a lot of time. Thus, we need a way to analyze the running time in a long run.

Lecture continues with three methods for this analysis, called amortized arguments - aggregate method, accounting method, and potential method.

In aggregate analysis, we determine an upper bound T(n) on the total cost of a sequence of n operations. The average cost per operation is then T(n)/n. We take the average cost as the amortized cost of each operation, so that all operations have the same amortized cost.

In accounting method we determine an amortized cost of each operation. When there is more than one type of operation, each type of operation may have a different amortized cost. The accounting method overcharges some operations early in the sequence, storing the overcharge as “prepaid credit” on specific objects in the data structure. The credit is used later in the sequence to pay for operations that are charged less than they actually cost.

Potential method is like the accounting method in that we determine the amortized cost of each operation and may overcharge operations early on to compensate for undercharges later. The potential method maintains the credit as the “potential energy” of the data structure as a whole instead of associating the credit with individual objects within the data structure.

Each method is applied to dynamic tables to show that average cost per insert is O(1).

You’re welcome to watch lecture thirteen:

Topics covered in lecture thirteen:

  • [01:05] How large should a hash table be?
  • [04:45] Dynamic tables.
  • [06:15] Algorithm of dynamic hash tables.
  • [07:10] Example of inserting elements in a dynamic array.
  • [09:35] Wrong analysis of time needed for n insert operations.
  • [12:15] Correct analysis of n insert operations using aggregate analysis method.
  • [19:10] Definition of amortized analysis.
  • [21:20] Types of amortized arguments.
  • [23:55] Accounting method (taxation method) amortized analysis.
  • [29:00] Dynamic table analysis with accounting (taxation) method.
  • [40:45] Potential method amortized analysis.
  • [54:15] Table doubling analysis with potential method.
  • [01:13:10] Conclusions about amortized costs, methods and bounds.

Lecture thirteen notes:

MIT Algorithms Lecture 13 Notes Thumbnail. Page 1 of 2.
Lecture 13, page 1 of 2.
MIT Algorithms Lecture 13 Notes Thumbnail. Page 2 of 2.
Lecture 13, page 2 of 2.

Lecture 14: Competitive Analysis and Self-Organizing Lists

Lecture fourteen starts with the notion of self-organizing lists. A self-organizing list is a list that reorders itself to improve the average access time. The goal is to find a reordering that minimizes the total access time.

At this point the lecture steps away from this problem for a moment and defines online algorithms and offline algorithms: Given a sequence S of operations that algorithm must execute, an online algorithm must execute each operation immediately but an offline algorithm may see all of S in advance.

Now the lecture returns to the self-organizing list problem and looks at two heuristics how the list might reorganize itself to minimize the access time. The first way is to keep a count of the number of times each element in the list is accessed and reorder list in the order of decreasing count. The other is move-to-front heuristic (MTF) - after accessing the element, move it to the front of the list.

Next the lecture defines what it means for an algorithm to be competitive. It is said that an online algorithm is α-competitive if there is a constant k such that for any sequence of operations the (running time cost of online algorithm) <= α·(running time cost of an optimal offline algorithm (also called "God's algorithm")) + k.

Lecture continues with a theorem that move-to-front is 4-competitive for self-organizing lists. The proof of this theorem takes almost an hour!

Topics covered in lecture fourteen:

  • [01:00] Definition of self-organizing lists.
  • [03:00] Example of a self-organizing list (illustrates transpose cost and access cost).
  • [05:00] Definition of on-line and off-line algorithms.
  • [08:45] Worst case analysis of self-organizing lists.
  • [11:40] Average case analysis of selforganizing lists.
  • [17:05] Move-to-front heuristic.
  • [20:35] Definition of competitive analysis.
  • [23:35] Theorem: MTF is 4-competitive.
  • [25:50] Proof of this theorem.

Lecture fourteen notes:

MIT Algorithms Lecture 13 Notes Thumbnail. Page 1 of 2.
Lecture 14, page 1 of 2.
MIT Algorithms Lecture 14 Notes Thumbnail. Page 2 of 2.
Lecture 14, page 2 of 2.

Have fun with amortized and competitive analysis! The next post will be about dynamic programming!

PS. This course is taught from the CLRS book (also called “Introduction to Algorithms”):

Comments (3) Comments | Email Post Email 'MIT’s Introduction to Algorithms, Lectures 13 and 14: Amortized Analysis and Self-Organizing Lists' to a friend | Print Post Print 'MIT’s Introduction to Algorithms, Lectures 13 and 14: Amortized Analysis and Self-Organizing Lists' | Permalink Permalink to 'MIT’s Introduction to Algorithms, Lectures 13 and 14: Amortized Analysis and Self-Organizing Lists' | Trackback Trackback to 'MIT’s Introduction to Algorithms, Lectures 13 and 14: Amortized Analysis and Self-Organizing Lists'
(Popularity: 20%) 9,864 Views

Did you like this page? Subscribe to my posts!

Page 1 of 3123»