MIT's Introduction to Algorithms, Lectures 22 and 23: Cache Oblivious Algorithms

This is a happy and sad moment at the same time - I have finally reached the last two lectures of MIT's undergraduate algorithms course. These last two lectures are on a fairly new area of algorithm research called "cache oblivious algorithms."

Cache-oblivious algorithms take into account something that has been ignored in all the lectures so far, particularly, the multilevel memory hierarchy of modern computers. Retrieving items from various levels of memory and cache make up a dominant factor of running time, so for speed it is crucial to minimize these costs. The main idea of cache-oblivious algorithms is to achieve optimal use of caches on all levels of a memory hierarchy without knowledge of their size.

Cache-oblivious algorithms should not be confused with cache-aware algorithms. Cache-aware algorithms and data structures explicitly depend on various hardware configuration parameters, such as the cache size. Cache-oblivious algorithms do not depend on any hardware parameters. An example of cache-aware (not cache-oblivious) data structure is a B-Tree that has the explicit parameter B, the size of a node. The main disadvantage of cache-aware algorithms is that they are based on the knowledge of the memory structure and size, which makes it difficult to move implementations from one architecture to another. Another problem is that it is very difficult, if not impossible, to adapt some of these algorithms to work with multiple levels in the memory hierarchy. Cache-oblivious algorithms solve both problems.

Lecture twenty-two introduces the terminology and notation used in cache-oblivious algorithms, explains the difference between cache-oblivious and cache-aware algorithms, does a simple memory analysis of several simple algorithms and culminates with a cache-oblivious algorithm for matrix multiplication.

The final lecture twenty-three is the most difficult in the whole course and shows cache-oblivious binary search trees and cache-oblivious sorting called funnel sort.

Use this supplementary reading material by professor Demaine to understand the material better: Cache-oblivious algorithms and data structures (.pdf).

Lecture 22: Cache Oblivious Algorithms I

Lecture twenty-two starts with an introduction to the modern memory hierarchy (CPU cache L1, L2, L3, main memory, disk cache, etc.) and with the notation and core concepts used in cache-oblivious algorithms.

A powerful result in cache-oblivious algorithm design is that if an algorithm is efficient on two levels of cache, then it's efficient on any number of levels. Thus the study of cache-obliviousness can be simplified to two-level memory hierarchy, say the CPU cache and main memory, where the accesses to cache are instant but are orders of magnitude slower to main memory. Therefore the main question cache-oblivious algorithm analysis tries to address is how many memory transfers (MTs) does a problem of size N take. The notation used for this is MT(N). For an algorithm to be efficient, the number of memory transfers should be as small as possible.

Next the lecture analysis the number of memory transfers for basic array scanning and array reverse algorithms. Since array scanning is consequential, N elements can be processed with O(N/B) accesses, where B is the block size - number of elements that are automatically fetched as N-th element is accessed. That is MT(N) = O(N/B) for array scanning. The same bound holds for reversing an array, since it can be viewed two scans - one from the beginning and one from the end.

Next it's shown that the classical binary search (covered in lecture 3) is not cache efficient, but order statistics problem (covered in lecture 6) is cache efficient.

Finally the lecture describes a cache efficient way to multiply matrices by storing them block-wise in memory.

You're welcome to watch lecture twenty-two:

Direct URL: http://video.google.com/videoplay?docid=-1965282303585171223

Topics covered in lecture twenty-two:

[00:10] Introduction and history of cache-oblivious algorithms.
[02:00] Modern memory hierarchy in computers: Caches L1, L2, L3, main memory, disk cache.
[06:15] Formula for calculating the cost to access a block of memory.
[08:18] Amortized cost to access one element in memory.
[11:00] Spatial and temporal locality of algorithms.
[13:45] Two-level memory model.
[16:30] Notation: total cache size M, block size B, number of blocks M/B.
[20:40] Notation: MT(N) - number of memory transfers of a problem of size N.
[21:45] Cache-aware algorithms.
[22:50] Cache-oblivious algorithms.
[28:35] Blocking of memory.
[32:45] Cache-oblivious scanning algorithm (visitor pattern).
[36:20] Cache-oblivious Array-Reverse algorithm.
[39:05] Memory transfers in classical binary search algorithm.
[43:45] Divide and conquer algorithms.
[45:50] Analysis of memory transfers in order statistics algorithm.
[01:00:50] Analysis of classical matrix multiplication (with row major, column major memory layout).
[01:07:30] Cache oblivious matrix multiplication.

Lecture twenty-two notes:

MIT Algorithms Lecture 22 Notes Thumbnail. Page 1 of 2.

Lecture 22, page 1 of 2: modern memory hierarchy, spacial and temporal locality, two-level memory model, blocking of memory, basic algorithms: parallel scan.

MIT Algorithms Lecture 22 Notes Thumbnail. Page 2 of 2.

Lecture 22, page 2 of 2: basic algorithms: binary search, divide and conquer algorithms, order statistics, matrix multiplication, block algorithms.

Lecture 23: Cache Oblivious Algorithms II

This was probably the most complicated lecture in the whole course. The whole lecture is devoted to two subjects - cache-oblivious search trees and cache-oblivious sorting.

While it's relatively easy to understand the design of cache-oblivious way of storing search trees in memory, it's amazingly difficult to understand the cache-efficient sorting. It's called funnel sort which is basically an n-way merge sort (covered in lecture 1) with special cache-oblivious merging function called k-funnel.

You're welcome to watch lecture twenty-three:

Direct URL: http://video.google.com/videoplay?docid=-5349583735068377717

Topics covered in lecture twenty-three:

[01:00] Cache-oblivious static search trees (binary search trees).
[09:35] Analysis of static search trees.
[18:15] Cache-aware sorting.
[19:00] Sorting by repeated insertion in binary tree.
[21:40] Sorting by binary merge sort.
[31:20] Sorting by N-way mergesort.
[36:20] Sorting bound for cache-oblivious sorting algorithms.
[38:30] Cache-oblivious sorting.
[41:40] Definition of K-Funnel (cache-oblivious merging).
[43:35] Funnel sort.
[54:05] Construction of K-Funnel.
[01:03:10] How to fill buffer in k-funnel.
[01:07:30] Analysis of fill buffer.

Lecture twenty-three notes:

MIT Algorithms Lecture 23 Notes Thumbnail. Page 1 of 2.

Lecture 23, page 1 of 2: static search trees, cache aware sorting.

MIT Algorithms Lecture 23 Notes Thumbnail. Page 2 of 2.

Lecture 23, page 2 of 2: cache oblivious sorting, k-funnels, funnel sort, fill-buffer algorithm and analysis.

Have fun with the cache oblivious algorithms! I'll do a few more posts that will summarize all these lectures and highlight key ideas.

If you loved this, please subscribe to my blog!