Follow me on Twitter for my latest adventures!
This is the eighth post in an article series about MIT's lecture course "Introduction to Algorithms." In this post I will review lecture twelve, which introduces an efficient search structure called Skip Lists.
Skip lists are an efficient data structure that can be used in place of balanced trees. Skip lists use probabilistic balancing rather than strictly enforce balancing and as a result the algorithms for insertion and deletion in skip lists are much simpler to implement and significantly faster than the equivalent algorithms for balanced search trees (see lectures 9 and 10 for search trees).
Here is the first publication ever on skip lists by their inventor William Pugh - Skip Lists: A Probabalistic Alternative to Balanced Trees
The lecture begins with Erik Demaine (professor) telling the students that before the lecture he implemented skip lists to see how fast it could be done. It took him 10 minutes to implement linked lists (on which skip lists are based) and took another 30 to implement skip lists themselves (+30 mins debugging ;) ). He says that they are so simple that at no point did he have to think while implementing them. Compare it to red-black trees for example, where you have to think really hard to implement the operations to maintain red-black tree properties (see lecture 10 on red-black trees).
The lecture continues with derivation of skip lists from scratch. First, a single sorted linked list is considered. Searching in a sorted linked list takes O(n). Then another sorted linked list is added. The added linked list is chosen to be a sublist of the first linked list. The elements that both lists share are linked together with another link (see my scanned lecture notes below to see how it looks). The search cost for such structure is O(2·sqrt(n)). Then another and another linked list is added, the search cost goes down to O(3·n1/3), O(4·n1/4), ..., O(k·n1/k). Now, what should k be? Take it to be log(n), we get running time of O(lg(n)·n1/lg(n)), which is O(2·lg(n)) - logarithmic time search structure!
The lecture also presents Search(x) method for finding the element x in a skip list, and Insert(x) method for inserting the element x in a skip list.
The rest of the lecture is allocated to the proof that the search operation in skiplists takes O(lg(n)) time with high probability.
You're welcome to watch lecture twelve:
Topics covered in lecture twelve:
- [00:10] A new dynamic, balanced, randomized and simple search structure - skip lists.
- [01:20] Review of other simple search structures - treaps, red-black trees, b-trees.
- [03:55] Skip lists are a data structure that you will never forget, so simple it is.
- [06:10] All the skip list operations take O(lg(n)) time almost guaranteed (with high probability).
- [07:30] Implementing skip lists from scratch.
- [09:05] Search cost in a single sorted linked list.
- [10:30] Adding a second linked list, search cost analysis.
- [11:16] Trick question: what is this sequence 14, 23, 34, 42, 50, 59, 66, 72, 79, 86, 96, 103, 110, 116, 125? (I won't reveal the answer ;) , you'll have to watch the lecture).
- [14:35] Building skip list out of this sequence.
- [17:07] Search(x) operation algorithm for a skip list.
- [18:40] Example of search operation.
- [21:20] What elements do go in the second linked list?
- [25:30] Search cost of a skip list with two sorted linked lists.
- [27:53] Skip list with three sorted link lists.
- [29:14] Skip list with k sorted linked lists.
- [29:25] What should k be? (It should ideally be lg(n)).
- [32:10] Example of an ideal skip list.
- [38:30] Skip lists roughly maintain their structure subject to updates (insert, delete).
- [39:40] Insert(x) operation algorithm for a skip list.
- [47:00] Building a random skip list.
- [54:00] Rigorous analysis of search cost of skip lists.
- [55:00] Theorem: with high probability, search in n element skiplist costs O(lg(n)).
- [57:10] Definition: with high probability.
- [01:00:00] Boole's inequality.
- [01:06:55] Joke: probability that Search(x) algorithm takes more than O(lg(n)) time is smaller than a meteor striking the earth at the same time that your computer has floating point error at the same time earth explodes :D .
- [01:07:45] Lemma: With high probability number of levels in a skiplist is O(lg(n)).
- [01:13:00] Cool idea - analyze search time backwards.
Lecture twelve notes:
Have fun building skip lists! The next post will be about two really theoretical lectures on amortized analysis, competitive analysis and self-organizing lists.
PS. This course is taught from the CLRS book (also called "Introduction to Algorithms"):
PPS. This was one of the lectures that did not have a corresponding chapter in the book.