Summary of MIT Introduction to Algorithms course

As you all may know, I watched and posted my lecture notes of the whole MIT Introduction to Algorithms course. In this post I want to summarize all the topics that were covered in the lectures and point out some of the most interesting things in them.

Actually, before I wrote this article, I had started writing one called "The coolest things that I learned from MIT's Introduction to Algorithms", but I quickly realized that I was just listing the topics in each lecture and not really pointing out the coolest things. Therefore I decided to write a summary article first (I had promised to do so), and only then write an article on the really exciting topics.

As for the summary: I watched a total of 23 lectures, which resulted in 14 blog posts. It took me nearly a year to publish them here. Here is a list of all the posts:

I'll now go through each of the lectures. They require quite a bit of math knowledge to understand. If you are uncertain about your math skills, I'd suggest reading Concrete Mathematics by Graham, Knuth, and Patashnik. It contains absolutely all the necessary math to understand this course.

Lecture 1: Analysis of Algorithms

If you're a student, or even if you're not, you must never miss the first lecture of any course, ever! The first lecture tells you what to expect from the course, how it will be taught, what it will cover, who the professor is, what the prerequisites are, and a bunch of other important and interesting things.

In this lecture you also get to know professor Charles E. Leiserson (a co-author of CLRS), who explains the following topics:

Why study algorithms and their performance?
What is the analysis of algorithms?
What can be more important than the performance of algorithms?
The sorting problem.
Insertion sort algorithm.
Running time analysis of insertion sort.
Asymptotic analysis.
Worst-case, average-case, best-case running time analysis.
Analysis of insertion sort's worst-case running time.
Asymptotic notation - theta notation - Θ.
Merge sort algorithm.
The recursive nature of merge sort algorithm.
Running time recurrence for merge sort.
Recursion trees.
Running time analysis of merge sort by looking at the recursion tree.
General recurrence for divide and conquer algorithms.
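
To make the merge sort items above concrete, here is a minimal Python sketch (my own illustration, not code from the lecture). The function name `merge_sort` is an assumption. It splits the list in half, sorts each half recursively, and merges the results, which is what gives the recurrence T(n) = 2T(n/2) + Θ(n).

```python
def merge_sort(items):
    """Return a new sorted list. Running time follows T(n) = 2T(n/2) + Theta(n)."""
    if len(items) <= 1:
        return list(items)
    mid = len(items) // 2
    left = merge_sort(items[:mid])    # conquer the left half
    right = merge_sort(items[mid:])   # conquer the right half

    # Merge step: repeatedly take the smaller head element.
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged
```

For example, `merge_sort([5, 2, 4, 6, 1, 3])` returns `[1, 2, 3, 4, 5, 6]`.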

I personally found the list of things that can be more important than the performance of the program interesting. These things are modularity, correctness, maintainability, security, functionality, robustness, user-friendliness, programmer's time, simplicity, extensibility, reliability, scalability.

Follow this link to the full review of lecture one.

Lecture 2: Analysis of Algorithms (continued)

The second lecture is presented by Erik Demaine. He's the youngest professor in the history of MIT.

Here are the topics that he explains in the second lecture:

Asymptotic notation.
Big-o notation - O.
Set definition of O-notation.
Capital-omega notation - Ω.
Theta notation - Θ.
Small-o notation - o.
Small-omega notation - ω.
Solving recurrences by substitution method.
Solving recurrences by recursion-tree method.
Solving recurrences by the master method.
Intuitive sketch proof of the master method.

An interesting thing in this lecture is the analogy of (O, Ω, Θ, o, ω) to (≤, ≥, =, <, >).

For example, if we say f(n) = O(n²), then by the analogy we can think of it as f(n) ≤ c·n²; that is, for some constant c > 0 and all sufficiently large n, f(n) is smaller than or equal to c·n². In other words, it's bounded above by the function c·n², which is exactly what f(n) = O(n²) means.
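
Written out formally, the statement looks roughly like this. The concrete function 2n² + 3n and the constants c = 5, n₀ = 1 are my own example, not taken from the lecture.

```latex
f(n) = O(g(n)) \iff \exists\, c > 0,\ n_0 > 0 \ \text{such that}\ 
0 \le f(n) \le c \cdot g(n) \ \text{for all}\ n \ge n_0

% Worked example: for n \ge 1 we have 3n \le 3n^2, so
2n^2 + 3n \;\le\; 2n^2 + 3n^2 \;=\; 5n^2
\quad\Longrightarrow\quad 2n^2 + 3n = O(n^2) \ \text{with}\ c = 5,\ n_0 = 1
```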

Follow this link to the full review of lecture two.

Lecture 3: Divide and Conquer

The third lecture is all about the divide-and-conquer algorithm design method and its applications. The divide and conquer method solves a problem by 1) breaking it into a number of subproblems (divide step), 2) solving each subproblem recursively (conquer step), 3) combining the solutions (combine step).

Here are the topics explained in the third lecture:

The nature of divide and conquer algorithms.
An example of divide and conquer - merge sort.
Solving for running time of merge sort by the master method.
Binary search.
Powering a number.
Fibonacci numbers.
Algorithms for computing Fibonacci numbers.
Fibonacci by naive recursive algorithm.
Fibonacci by bottom-up algorithm.
Fibonacci by naive recursive squaring.
Fibonacci by matrix recursive squaring.
Matrix multiplication.
Strassen's algorithm.
VLSI (very large scale integration) layout problem.

I was most impressed by the four algorithms for computing Fibonacci numbers. I actually wrote about one of them in my publication "On the Linear Time Algorithm For Finding Fibonacci Numbers," which explains how this algorithm is actually quadratic in practice (but linear in theory).
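
As an illustration of the last of the four, here is a small sketch of Fibonacci by matrix recursive squaring. It relies on the identity that the n-th power of [[1, 1], [1, 0]] equals [[F(n+1), F(n)], [F(n), F(n-1)]]. The helper names `mat_mult`, `mat_pow`, and `fib` are my own choices.

```python
def mat_mult(a, b):
    """Multiply two 2x2 matrices given as nested lists."""
    return [
        [a[0][0] * b[0][0] + a[0][1] * b[1][0], a[0][0] * b[0][1] + a[0][1] * b[1][1]],
        [a[1][0] * b[0][0] + a[1][1] * b[1][0], a[1][0] * b[0][1] + a[1][1] * b[1][1]],
    ]


def mat_pow(m, n):
    """Raise a 2x2 matrix to the n-th power using O(lg n) multiplications."""
    if n == 0:
        return [[1, 0], [0, 1]]  # identity matrix
    half = mat_pow(m, n // 2)
    squared = mat_mult(half, half)
    return mat_mult(squared, m) if n % 2 else squared


def fib(n):
    """Return the n-th Fibonacci number, with fib(0) = 0 and fib(1) = 1."""
    return mat_pow([[1, 1], [1, 0]], n)[0][1]
```

For example, `fib(10)` returns `55`. Note that the O(lg n) bound counts matrix multiplications; the numbers themselves grow, so the arithmetic on them is not constant time for large n.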

Follow this link to the full review of lecture three.

Lecture 4: Sorting

Lecture four is devoted entirely to the quicksort algorithm. Quicksort and its variants are the industry standard for general-purpose sorting in many systems and libraries. You just have to know it.

Topics explained in lecture four:

Divide and conquer approach to sorting.
Quicksort algorithm.
The partition routine in the quicksort algorithm.
Running time analysis of quicksort.
Worst-case analysis of quicksort.
Intuitive, best-case analysis of quicksort.
Randomized quicksort.
Indicator random variables.
Running time analysis of randomized quicksort in expectation.

I loved how the idea of randomizing the partition subroutine in the quicksort algorithm led to an expected running time that is independent of the input order. The deterministic quicksort could always be fed an input that triggers the worst-case running time of Θ(n²), but the worst-case running time of randomized quicksort is determined only by the output of the random number generator.
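
A minimal sketch of the idea, assuming an in-place variant where the pivot is picked uniformly at random and swapped to the front before partitioning. The function names and the exact partition scheme are my own choices and may differ from the one shown in the lecture.

```python
import random


def partition(a, lo, hi):
    """Partition a[lo..hi] around a randomly chosen pivot; return its final index."""
    pivot_index = random.randint(lo, hi)        # the randomization step
    a[lo], a[pivot_index] = a[pivot_index], a[lo]
    pivot = a[lo]
    i = lo
    for j in range(lo + 1, hi + 1):
        if a[j] <= pivot:
            i += 1
            a[i], a[j] = a[j], a[i]
    a[lo], a[i] = a[i], a[lo]                   # put the pivot in its final place
    return i


def quicksort(a, lo=0, hi=None):
    """Sort list a in place."""
    if hi is None:
        hi = len(a) - 1
    if lo < hi:
        p = partition(a, lo, hi)
        quicksort(a, lo, p - 1)
        quicksort(a, p + 1, hi)
```

Because the pivot no longer depends on the position of elements in the input, no fixed input can force the quadratic behavior on every run.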

I once wrote another post about quicksort called "Three Beautiful Quicksorts" where I summarized what Jon Bentley had to say about the experimental analysis of quicksort's running time and how the current quicksort algorithm looks in the industry libraries (such as the C standard library, which provides the qsort function).

Follow this link to the full review of lecture four.

Lecture 5: Sorting (continued)

Lecture five continues on sorting and looks at why any comparison-based sort needs Ω(n·lg(n)) comparisons in the worst case. It then breaks out of this limitation and shows several linear-time sorting algorithms that don't rely on comparisons.
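
One of those linear-time algorithms is counting sort. Here is a rough sketch, assuming the keys are integers in the range 0..k. The `key` parameter is my own addition so that the stability property (equal keys keep their input order, which radix sort depends on) can be observed.

```python
def counting_sort(items, k, key=lambda x: x):
    """Stable sort of items whose integer keys lie in 0..k, in O(n + k) time."""
    counts = [0] * (k + 1)
    for item in items:
        counts[key(item)] += 1

    # Prefix sums: counts[v] becomes the number of items with key <= v.
    for v in range(1, k + 1):
        counts[v] += counts[v - 1]

    # Walk the input backwards so that equal keys keep their relative order.
    output = [None] * len(items)
    for item in reversed(items):
        counts[key(item)] -= 1
        output[counts[key(item)]] = item
    return output
```

For example, `counting_sort([4, 1, 3, 4, 3], 4)` returns `[1, 3, 3, 4, 4]`. No two elements are ever compared with each other, which is why the decision-tree lower bound doesn't apply.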

Topics explained in lecture five:

How fast can we sort?
Comparison sort model.
Decision trees.
Comparison sort algorithms based on decision trees.
Lower bound for decision-tree sorting.
Sorting in linear time.
Counting sort.
The concept of stable sorting.
Radix sort.
Correctness of radix sort.
Running time analysis of radix sort.

The most interesting topic here was how any comparison sort algorithm can be translated into a decision tree (and vice versa), which limits how fast we can sort.

Follow this link to the full review of lecture five.

Lecture 6: Order Statistics

Lecture six deals with the order statistics problem - how to find the k-th smallest element among n elements. The naive algorithm is to sort the list of n elements and return the k-th element in the sorted list, but this approach makes it run in O(n·lg(n)) time. This lecture shows how a randomized, linear-time algorithm (in expectation) for this problem can be constructed.

Topics explained in lecture six:

Order statistics.
Naive order statistics algorithm via sorting.
Randomized divide and conquer order statistics algorithm.
Expected running time analysis of randomized order statistics algorithm.
Worst-case linear-time order-statistics.

An interesting point in this lecture is that the worst-case, deterministic, linear-time algorithm for order statistics is rarely used in practice because it performs poorly compared to the randomized linear-time algorithm.
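
A sketch of the randomized approach, written with list comprehensions for readability rather than the in-place partitioning used in the lecture. The name `randomized_select` and the 1-based `k` are my own conventions.

```python
import random


def randomized_select(items, k):
    """Return the k-th smallest element of items (k is 1-based)."""
    if not 1 <= k <= len(items):
        raise ValueError("k must be between 1 and len(items)")
    pivot = random.choice(items)
    smaller = [x for x in items if x < pivot]
    equal = [x for x in items if x == pivot]
    larger = [x for x in items if x > pivot]

    # Recurse only into the part that contains the answer.
    if k <= len(smaller):
        return randomized_select(smaller, k)
    if k <= len(smaller) + len(equal):
        return pivot
    return randomized_select(larger, k - len(smaller) - len(equal))
```

Unlike quicksort, only one side of the partition is explored, which is where the expected linear running time comes from.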

Follow this link to the full review of lecture six.

Lecture 7: Hashing

This is the first lecture of two on hashing. It introduces hashing and various collision resolution strategies.

All the topics explained in lecture seven:

Symbol table problem.
Direct-access table.
The concept of hashing.
Collisions in hashing.
Resolving collisions by chaining.
Analysis of worst-case and average-case search time of chaining.
Hash functions.
Division hash method.
Multiplication hash method.
Resolving collisions by open addressing.
Probing strategies.
Linear probing.
Double hashing.
Analysis of open addressing.

Follow this link to the full review of lecture seven.

Lecture 8: Hashing (continued)

The second lecture on hashing. It addresses the weakness of hashing - for any choice of hash function, there exists a bad set of keys that all hash to the same value. An adversary can take advantage of this and attack our program. Universal hashing solves this problem by choosing the hash function at random from a suitable family. The other topic explained in this lecture is perfect hashing - given n keys, how to construct a hash table of size O(n) where search takes O(1) time in the worst case.
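
To give a feel for universal hashing, here is a sketch that picks a random function from the family h(k) = ((a·k + b) mod p) mod m described in CLRS. This is not necessarily the construction shown in the lecture. The function name `make_universal_hash` is hypothetical, and the sketch assumes keys are non-negative integers smaller than the prime p.

```python
import random


def make_universal_hash(m, p=2_147_483_647):
    """Return a randomly chosen hash function mapping integers in [0, p) to [0, m).

    p must be a prime larger than any key; the default is the prime 2**31 - 1.
    """
    a = random.randrange(1, p)  # a is drawn from 1..p-1
    b = random.randrange(0, p)  # b is drawn from 0..p-1

    def h(key):
        return ((a * key + b) % p) % m

    return h
```

Since `a` and `b` are drawn when the table is created, an adversary who doesn't know them can't prepare a set of keys that is guaranteed to collide.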

All the topics in lecture eight:

Weakness of hashing.
Universal hashing.
Construction of universal hash functions.
Perfect hashing.
Markov inequality.

Follow this link to the full review of lecture eight.

Lecture 9: Search Trees

This lecture primarily discusses randomly built binary search trees. (It assumes you know what binary trees are.) Similar to universal hashing (see the previous lecture), they address the problem of building a tree from untrusted data. It turns out that the expected height of a randomly built binary search tree is still O(lg(n)); more precisely, it's expected to be at most about 3·lg(n).

Topics explained in lecture nine:

What are good and bad binary search trees?
Binary search tree sort.
Analysis of binary search tree sort.
BST sort relation to quicksort.
Randomized BST sort.
Randomly built binary search trees.
Convex functions, Jensen's inequality.
Expected height of a randomly built BST.

The most surprising idea in this lecture is that the binary search tree sort (introduced in this lecture) does the same element comparisons as quicksort, that is, they produce the same decision tree.
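
Here is a small sketch of randomized BST sort: shuffle the input, insert every key into an unbalanced binary search tree, and read the keys back with an in-order walk. The class and function names are my own. The first key inserted plays the role of quicksort's pivot, because every later key is compared against it.

```python
import random


class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


def insert(root, key):
    """Insert key into the (unbalanced) BST rooted at root; return the root."""
    if root is None:
        return Node(key)
    node = root
    while True:
        if key < node.key:
            if node.left is None:
                node.left = Node(key)
                return root
            node = node.left
        else:
            if node.right is None:
                node.right = Node(key)
                return root
            node = node.right


def in_order(root):
    """Return the keys in sorted order using an iterative in-order walk."""
    out, stack, node = [], [], root
    while stack or node:
        while node:
            stack.append(node)
            node = node.left
        node = stack.pop()
        out.append(node.key)
        node = node.right
    return out


def randomized_bst_sort(items):
    """Sort by inserting the keys into a BST in random order."""
    shuffled = list(items)
    random.shuffle(shuffled)
    root = None
    for key in shuffled:
        root = insert(root, key)
    return in_order(root)
```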

Follow this link to the full review of lecture nine.

Lecture 10: Search Trees (continued)

This is the second lecture on search trees. It discusses self-balancing trees, more specifically, red-black trees. They balance themselves in such a manner that no matter what the input is, their height is always O(lg(n)).

Topics explained in lecture ten:

Balanced search trees.
Red-black trees.
Height of red-black trees.
Rotations in binary trees.
How to insert an element in a red-black tree?
Insert-element algorithm for red-black trees.

Follow this link to the full review of lecture ten.

Lecture 11: Augmenting Data Structures

The eleventh lecture explains how to build new data structures out of existing ones. For example, how to build a data structure that you can update and query quickly for the i-th smallest element. This is the problem of dynamic order statistics and an easy solution is to augment a binary tree, such as a red-black tree. Another example is interval trees - how to quickly find, among a set of stored intervals (such as 4-11 and 8-20), one that overlaps a given query interval (such as 5-9).

Topics explained in lecture eleven:

Dynamic order statistics.
Data structure augmentation.
Interval trees.
Augmenting red-black trees to have them perform as interval trees.
Correctness of augmented red-black tree data structure.

Augmenting data structures requires a lot of creativity. First you need to find an underlying data structure (the easiest step) and then think of a way to augment it with data to make it do what you want (the hardest step).

Follow this link to the full review of lecture eleven.

Lecture 12: Skip Lists

This lecture explains the skip list, which is a simple, efficient, easily implementable, randomized search structure. It performs as well as a balanced binary search tree but is much easier to implement. Erik Demaine says he implemented it in 40 minutes before the class (10 minutes to implement and 30 to debug).

In this lecture Erik builds this data structure from scratch. He starts with a linked list and builds up to a pair of linked lists, then to three linked lists, until he finds the optimal number of linked lists needed to achieve logarithmic search time.

Next he continues to explain how to algorithmically build such a structure and proves that the search in this data structure is indeed quick.

Follow this link to the full review of lecture twelve.

Lecture 13: Amortized Analysis

Amortized analysis is a technique to show that even if several operations in a sequence of operations are costly, the overall performance is still good. A good example is adding elements to a dynamic list (such as a list in Python). Every time the list is full, Python has to allocate more space and this is costly. Amortized analysis can be used to show that the amortized cost per insert is still O(1), even though Python occasionally has to allocate more space for the list.
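
The following toy model counts the work done by a table that doubles its capacity whenever it fills up. It is only an illustration of the argument, not how Python's list is actually implemented; the class name `DynamicTable` and the cost model (one unit per element written or copied) are my own assumptions.

```python
class DynamicTable:
    """Toy dynamic table that doubles when full and tracks the total cost."""

    def __init__(self):
        self.capacity = 1
        self.size = 0
        self.slots = [None]
        self.total_cost = 0  # one unit per element written or copied

    def append(self, value):
        if self.size == self.capacity:
            # Table is full: allocate twice the space and copy everything over.
            new_slots = [None] * (2 * self.capacity)
            for i in range(self.size):
                new_slots[i] = self.slots[i]
            self.total_cost += self.size
            self.slots = new_slots
            self.capacity *= 2
        self.slots[self.size] = value
        self.size += 1
        self.total_cost += 1
```

After n appends the copying work adds up to 1 + 2 + 4 + ... which is less than 2n, so the total cost stays below 3n and the amortized cost per append is O(1).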

Topics explained in lecture thirteen:

How large should a hash table be?
Dynamic tables.
Amortized analysis.
Accounting method of amortized analysis.
Dynamic table analysis with accounting method.
Potential method of amortized analysis.
Dynamic table analysis with potential method.

This is one of the most mathematically complicated lectures.

Follow this link to the full review of lecture thirteen.

Lecture 14: Self-Organizing Lists and Competitive Analysis

This lecture concentrates on self-organizing lists. A self-organizing list is a list that reorders itself to improve the average access time. The goal is to find a reordering that minimizes the total access time. For example, each time an element is accessed, it's moved to the front of the list, hoping that it might be accessed again soon. This is called the move-to-front heuristic.

Competitive analysis can be used to reason theoretically about how well an online strategy such as move-to-front performs compared with an optimal offline algorithm.
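
A tiny sketch of the move-to-front heuristic on a Python list. The function name `access_mtf` and the cost model (the cost of an access is the 1-based position of the element) are my own simplifications.

```python
def access_mtf(items, key):
    """Access key in items, move it to the front, and return the access cost.

    The cost is the 1-based position at which key was found.
    """
    position = items.index(key)           # linear scan from the front
    items.insert(0, items.pop(position))  # move the element to the front
    return position + 1
```

For example, with `items = ["a", "b", "c", "d"]`, the call `access_mtf(items, "c")` returns `3` and leaves `items` as `["c", "a", "b", "d"]`, so a repeated access to `"c"` costs only `1`.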

Topics explained in lecture fourteen:

Self-organizing lists.
Online and offline algorithms.
Worst-case analysis of self-organizing lists.
Competitive analysis.
Move-to-front heuristic for self-organizing lists.
Amortized cost of move-to-front heuristic.

Follow this link to the full review of lecture fourteen.

Lecture 15: Dynamic Programming

This lecture is about the dynamic programming algorithm design technique. It's a tabular method (involving constructing a table or some part of a table) that leads to a much faster running time of the algorithm.

The lecture focuses on the longest common subsequence problem, first showing the brute force algorithm, then a recursive one, and finally a dynamic programming algorithm. The brute force algorithm is exponential in the length of strings, the recursive one is also exponential, but the dynamic programming solution is O(n·m) where n is the length of one string, and m is the length of the other.

Topics explained in lecture fifteen:

The idea of dynamic programming.
Longest common subsequence problem (LCS).
Brute force algorithm for LCS.
Analysis of brute-force algorithm.
Simplified algorithm for LCS.
Dynamic programming hallmark #1: optimal substructure.
Dynamic programming hallmark #2: overlapping subproblems.
Recursive algorithm for LCS.
Memoization.
Dynamic programming algorithm for LCS.

The most interesting things in this lecture are the two hallmarks that indicate that a problem may be solved with dynamic programming. They are "optimal substructure" and "overlapping subproblems".

The first one means that an optimal solution to a problem contains optimal solutions to subproblems. For example, if z = LCS(x,y) - z is the solution to the problem LCS(x,y) - then any prefix of z is a solution to LCS of a prefix of x and a prefix of y (a prefix of z is a solution to a subproblem).

The second one means exactly what it says, that the problem contains many overlapping subproblems.
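
Both hallmarks show up in this bottom-up sketch of the LCS length computation. The table `c` and the function name `lcs_length` are my own naming; entry `c[i][j]` holds the LCS length of the first i characters of x and the first j characters of y.

```python
def lcs_length(x, y):
    """Return the length of a longest common subsequence of x and y in O(n*m)."""
    n, m = len(x), len(y)
    # c[i][j] = LCS length of x[:i] and y[:j]; row 0 and column 0 stay 0.
    c = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if x[i - 1] == y[j - 1]:
                # Optimal substructure: extend the LCS of the shorter prefixes.
                c[i][j] = c[i - 1][j - 1] + 1
            else:
                # Overlapping subproblems: both entries are already in the table.
                c[i][j] = max(c[i - 1][j], c[i][j - 1])
    return c[n][m]
```

For example, `lcs_length("ABCBDAB", "BDCABA")` returns `4` (one such subsequence is "BCBA").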

Follow this link to the full review of lecture fifteen.

Lecture 16: Greedy Algorithms

This lecture introduces greedy algorithms via the minimum spanning tree problem. The minimum spanning tree problem asks for a tree that connects all the vertices of a graph with the minimum total edge weight. At first it seems that a dynamic programming solution could solve it effectively, but a more careful analysis shows that the problem exhibits another powerful property -- a locally optimal choice leads to a globally optimal solution. That's why such algorithms are called greedy: they always make the choice that looks best at the moment, without ever thinking about the whole problem in general.

Topics explained in lecture sixteen:

Review of graphs.
Graph representations.
Adjacency matrices.
Adjacency lists.
Sparse and dense graphs.
Handshaking lemma.
Minimum spanning trees (MSTs).
Hallmark for greedy algorithms: greedy choice property.
Prim's algorithm for finding MST.
Running time analysis of Prim's algorithm.
Idea of Kruskal's algorithm for MSTs.
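
As a rough illustration of Prim's algorithm, here is a sketch that uses Python's `heapq` as the priority queue. It uses the "lazy" variant that pushes duplicate entries instead of decreasing keys, which is my own simplification. The graph format (a dict mapping each vertex to a list of `(neighbor, weight)` pairs) is also an assumption, and the graph is assumed to be connected and undirected.

```python
import heapq


def prim_mst_weight(graph, start):
    """Return the total weight of a minimum spanning tree of a connected graph."""
    visited = set()
    heap = [(0, start)]  # (weight of the edge into the tree, vertex)
    total = 0
    while heap:
        weight, u = heapq.heappop(heap)
        if u in visited:
            continue  # stale entry: u was already reached by a lighter edge
        visited.add(u)
        total += weight  # greedy choice: lightest edge leaving the tree
        for v, w in graph[u]:
            if v not in visited:
                heapq.heappush(heap, (w, v))
    return total
```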

Follow this link to the full review of lecture sixteen.

Lecture 17: Shortest Path Algorithms

This lecture starts a trilogy on shortest path algorithms. In this first episode single-source shortest path algorithms are discussed. The problem can be described as follows -- how to get from one point on a graph to another by traveling the shortest distance (think of a road network). Dijkstra's algorithm solves this problem efficiently, provided that no edge has a negative weight.

Topics explained in lecture seventeen:

Paths in graphs.
Shortest paths.
Path weights.
Negative path weights.
Single-source shortest path.
Dijkstra's algorithm.
Example of Dijkstra's algorithm.
Correctness of Dijkstra's algorithm.
Unweighted graphs.
Breadth First Search.

The most interesting thing here is that on unweighted graphs Dijkstra's algorithm reduces to breadth-first search, which uses a FIFO queue instead of a priority queue because there is no longer a need to order vertices by distance (all the edges have the same weight).
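
For reference, a compact sketch of Dijkstra's algorithm with a binary heap. The graph format (dict of `(neighbor, weight)` lists) and the function name are my own assumptions, and all weights are assumed to be non-negative. Replacing the heap with a FIFO queue and setting every weight to 1 would give breadth-first search.

```python
import heapq


def dijkstra(graph, source):
    """Return a dict of shortest distances from source to every reachable vertex."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry, a shorter path to u was already found
        for v, w in graph.get(u, []):
            candidate = d + w
            if candidate < dist.get(v, float("inf")):
                dist[v] = candidate  # relax edge (u, v)
                heapq.heappush(heap, (candidate, v))
    return dist
```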

Follow this link to the full review of lecture seventeen.

Lecture 18: Shortest Path Algorithms (continued)

The second lecture in the trilogy on shortest paths deals with single-source shortest paths in graphs that may have negative edge weights. The Bellman-Ford algorithm solves the shortest path problem for such graphs and reports when a negative-weight cycle makes shortest paths undefined.
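
A minimal sketch of Bellman-Ford, assuming the graph is given as a list of vertices and a list of `(u, v, weight)` edges. Returning `None` when a negative-weight cycle is reachable from the source is my own convention.

```python
def bellman_ford(vertices, edges, source):
    """Return shortest distances from source, or None on a negative-weight cycle."""
    dist = {v: float("inf") for v in vertices}
    dist[source] = 0

    # Relax every edge |V| - 1 times.
    for _ in range(len(vertices) - 1):
        for u, v, w in edges:
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w

    # If any edge can still be relaxed, a negative-weight cycle is reachable.
    for u, v, w in edges:
        if dist[u] + w < dist[v]:
            return None
    return dist
```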

Topics explained in lecture eighteen:

Bellman-Ford algorithm for shortest paths with negative edges.
Negative weight cycles.
Correctness of Bellman-Ford algorithm.
Linear programming.
Linear feasibility problem.
Difference constraints.
Constraint graph.
Using Bellman-Ford algorithm to solve a system of difference constraints.
Solving VLSI (very large scale integration) layout problem via Bellman-Ford.

Follow this link to the full review of lecture eighteen.

Lecture 19: Shortest Path Algorithms (continued)

The last lecture in the trilogy deals with the all-pairs shortest paths problem -- determining the shortest distances between every pair of vertices in a given graph.

Topics explained in lecture nineteen:

Review of single source shortest path problem.
All-pairs shortest paths.
Dynamic programming.
Idea from matrix multiplication.
Floyd-Warshall algorithm for all-pairs shortest paths.
Transitive closure of directed graph.
Johnson's algorithm for all-pairs shortest paths.

An interesting point here is how the Floyd-Warshall algorithm that runs in O((number of vertices)³) can be transformed into something similar to Strassen's algorithm to compute the transitive closure of a graph (now it runs in O((number of vertices)^lg7)).
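
The cubic running time is easy to see in a sketch of Floyd-Warshall: three nested loops over the vertices. The input format (a square matrix with `INF` for missing edges and 0 on the diagonal) is my own assumption, and the graph is assumed to have no negative-weight cycles.

```python
INF = float("inf")


def floyd_warshall(weights):
    """Return the matrix of all-pairs shortest distances for a weight matrix."""
    n = len(weights)
    dist = [row[:] for row in weights]  # copy so the input is left untouched
    for k in range(n):          # allow vertex k as an intermediate stop
        for i in range(n):
            for j in range(n):
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
    return dist
```

For example, `floyd_warshall([[0, 3, INF], [INF, 0, 1], [2, INF, 0]])` returns `[[0, 3, 4], [3, 0, 1], [2, 5, 0]]`.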

Follow this link to the full review of lecture nineteen.

Lecture 20: Parallel Algorithms

This is an introductory lecture to multithreaded algorithm analysis. It explains the terminology used in multithreaded algorithms, such as work, critical path length, speedup, parallelism, scheduling, and others.

Topics explained in lecture twenty:

Dynamic multithreading.
Subroutines: spawn and sync.
Logical parallelism and actual parallelism.
Multithreaded computation.
An example of a multithreaded execution on a recursive Fibonacci algorithm.
Measuring performance of a multithreaded computation.
The concept of speedup.
Maximum possible speedup.
Linear speedup.
Super-linear speedup.
Parallelism.
Scheduling.
Greedy scheduler.
Graham and Brent theorem of competitiveness of greedy schedules.
*Socrates and Cilkchess chess programs.

Follow this link to the full review of lecture twenty.

Lecture 21: Parallel Algorithms (continued)

The second lecture on parallel algorithms shows how to design and analyze a multithreaded matrix multiplication algorithm and multithreaded sorting.

Topics explained in lecture twenty-one:

Multithreaded algorithms.
Multithreaded matrix multiplication.
Performance analysis of the multithreaded matrix multiplication algorithm.
Multithreaded sorting.
Multithreaded merge-sort algorithm.
Parallel-merge subroutine.
Analysis of merge-sort with parallel-merge subroutine.

Follow this link to the full review of lecture twenty-one.

Lecture 22: Cache Oblivious Algorithms

Cache-oblivious algorithms take into account something that has been ignored in all the algorithms so far, namely the cache. An algorithm that uses the cache effectively will perform much better than one that doesn't. This lecture is all about how to lay out data structures in memory in such a way that memory transfers are minimized.

Topics explained in lecture twenty-two:

Modern memory hierarchy.
The concept of spatial locality and temporal locality.
Two-level memory model.
Cache-oblivious algorithms.
Blocking of memory.
Memory transfers in a simple scanning algorithm.
Memory transfers in string-reverse algorithm.
Memory analysis of binary search.
Cache oblivious order statistics.
Cache oblivious matrix multiplication algorithm.

Follow this link to the full review of lecture twenty-two.

Lecture 23: Cache Oblivious Algorithms (continued)

This is the final lecture of the course. It continues on cache oblivious algorithms and shows how to store binary search trees in memory so that memory transfers are minimized when searching in them. It wraps up with cache oblivious sorting.

Topics explained in lecture twenty-three:

Static search trees.
Memory efficient layout of static binary search trees in memory.
Analysis of static search trees.
Cache aware sorting.
Cache-oblivious sorting.
Funnel sort.
K-funnel data structure.

This is the most complicated lecture in the whole course. It takes a day to understand the k-funnel data structure.

Follow this link to the full review of lecture twenty-three.

That's it. This was the final lecture. I hope you find this summary useful.

What's next?

Next, I'll post my notes of MIT's Linear Algebra course. At first I thought I'd post Linear Algebra to a separate blog section that does not appear in the RSS feed, but then I gave it another thought and came to the conclusion that every competent programmer must know linear algebra and therefore it's worth putting the notes in the feed. You can surely be a good programmer without knowing linear algebra, but if you want to work on great problems and make a difference, then you absolutely have to know it.

Stay tuned!

Update: Review of the first lecture is out – Lecture 1: The Geometry of Linear Equations.