Follow me on Twitter for my latest adventures!

**18**Comments October 16, 2008

# MIT's Introduction to Algorithms, Lecture 12: Skip Lists

This is the eighth post in an article series about MIT's lecture course "**Introduction to Algorithms**." In this post I will review lecture twelve, which introduces an efficient search structure called **Skip Lists**.

Skip lists are an efficient data structure that can be used in place of balanced trees. Skip lists use probabilistic balancing rather than strictly enforce balancing and as a result the algorithms for insertion and deletion in skip lists are much simpler to implement and significantly faster than the equivalent algorithms for balanced search trees (see lectures 9 and 10 for search trees).

Here is the first publication ever on skip lists by their inventor William Pugh - Skip Lists: A Probabalistic Alternative to Balanced Trees

The lecture begins with Erik Demaine (professor) telling the students that before the lecture he implemented skip lists to see how fast it could be done. It took him 10 minutes to implement linked lists (on which skip lists are based) and took another 30 to implement skip lists themselves (+30 mins debugging ;) ). He says that they are so simple that at no point did he have to think while implementing them. Compare it to red-black trees for example, where you have to think really hard to implement the operations to maintain red-black tree properties (see lecture 10 on red-black trees).

The lecture continues with derivation of skip lists from scratch. First, a single sorted linked list is considered. Searching in a sorted linked list takes O(n). Then another sorted linked list is added. The added linked list is chosen to be a sublist of the first linked list. The elements that both lists share are linked together with another link (see my scanned lecture notes below to see how it looks). The search cost for such structure is O(2·sqrt(n)). Then another and another linked list is added, the search cost goes down to O(3·n^{1/3}), O(4·n^{1/4}), ..., O(k·n^{1/k}). Now, what should k be? Take it to be log(n), we get running time of O(lg(n)·n^{1/lg(n)}), which is O(2·lg(n)) - logarithmic time search structure!

The lecture also presents Search(x) method for finding the element x in a skip list, and Insert(x) method for inserting the element x in a skip list.

The rest of the lecture is allocated to the proof that the search operation in skiplists takes O(lg(n)) time with high probability.

You're welcome to watch lecture twelve:

Topics covered in lecture twelve:

- [00:10] A new dynamic, balanced, randomized and simple search structure - skip lists.
- [01:20] Review of other simple search structures - treaps, red-black trees, b-trees.
- [03:55] Skip lists are a data structure that you will never forget, so simple it is.
- [06:10] All the skip list operations take O(lg(n)) time almost guaranteed (
*with high probability*). - [07:30] Implementing skip lists from scratch.
- [09:05] Search cost in a single sorted linked list.
- [10:30] Adding a second linked list, search cost analysis.
- [11:16] Trick question: what is this sequence 14, 23, 34, 42, 50, 59, 66, 72, 79, 86, 96, 103, 110, 116, 125? (I won't reveal the answer ;) , you'll have to watch the lecture).
- [14:35] Building skip list out of this sequence.
- [17:07] Search(x) operation algorithm for a skip list.
- [18:40] Example of search operation.
- [21:20] What elements do go in the second linked list?
- [25:30] Search cost of a skip list with two sorted linked lists.
- [27:53] Skip list with three sorted link lists.
- [29:14] Skip list with k sorted linked lists.
- [29:25] What should k be? (It should ideally be lg(n)).
- [32:10] Example of an ideal skip list.
- [38:30] Skip lists roughly maintain their structure subject to updates (insert, delete).
- [39:40] Insert(x) operation algorithm for a skip list.
- [47:00] Building a random skip list.
- [54:00] Rigorous analysis of search cost of skip lists.
- [55:00] Theorem: with high probability, search in n element skiplist costs O(lg(n)).
- [57:10] Definition: with high probability.
- [01:00:00] Boole's inequality.
- [01:06:55]
**Joke**: probability that Search(x) algorithm takes more than O(lg(n)) time is smaller than a meteor striking the earth at the same time that your computer has floating point error at the same time earth explodes :D . - [01:07:45] Lemma: With high probability number of levels in a skiplist is O(lg(n)).
- [01:13:00] Cool idea - analyze search time backwards.

Lecture twelve notes:

Have fun building skip lists! The next post will be about two really theoretical lectures on amortized analysis, competitive analysis and self-organizing lists.

PS. This course is taught from the CLRS book (also called "Introduction to Algorithms"):

PPS. This was one of the lectures that did not have a corresponding chapter in the book.

## Comments

would be nice if anyone actually could do handwriting anymore

Good old skiplists. Cyrus Imapd uses a skiplist database format as one of its storage engines. Very interesting. I got to learn more than I ever wanted to know about the format while debugging it!

The main criticism is lack of cache coherency, and the number of locations you need to update when inserting or removing a node.

Great lecture...anyone catch the reference to The Hitchhiker's Guide?

MiniSuit Starter Kit for Nokia Lumia 520 – Hard Snap-On Rubber Case, Screen Protector, Stylus

http://www.bestcellphonecase.com/nokia/minisuit-starter-kit-for-nokia-lumia-520-hard-snap-on-rubber-case-screen-protector-stylus.html

The trade off, of course, is memory, which is almost n^2....

Thanks for writing this up!

Uau uau uau, apsveicu ar iekļūšanu LtU :D

These are MIT students? They're either incredibly shy, or not as smart as we've been led to believe; lots of the questions during the lecture were no brainers yet nobody is peeping up.

He even alludes to flipping a coin during the introduction, and then when he gets to the question where the obvious answer is "flip a coin", nobody seems to remember.

On the other hand, cool data structure. The "easy to implement" advantage is a nice bonus over balanced trees; are there performance benefits as well or are you just less likely to screw it up?

Also, no love for splay trees?

Actually, the beginning of that last message sounds way more douche-bag than I intended; no offense was meant. But you guys should really not be afraid to speak up! You're there to participate, and you'll miss the college experience when you are gone, so make the most of it.

(From a graduate of one of your competitors!)

Cognitive dissonance. Don Knuth re-uses code?

Oh, I see. The title remains the same but the quotation rotates. This causes the quoted people to seem to endorse a platitude they would never utter.

Ulzha: kas diez ir LtU?

Lietuvas Universitaate?

Ryan: there is some cool kids factor, as in: "is it cool to shout what you know?"

Experienced in classes I attended, where the back rows would always know the answer, but never call out loud. Well, maybe if the prof asked for the third time.

Peteris Ledins, LtU is Lambda the Ultimate.

<latvian>Ps. paldies par komentariem algoritmu lekcijas.</latvian>

plis i take data str n algo course moi uni info scie

Oh, i've read that paper long time ago (it's from early 90s if i'm not wrong). Stunning data structure.

I had not seen the video, so perhaps what follows is on it:

It turns out that not only skip lists are simple to implement, they can can also be "easily" changed to support multiple readers/writers. The structure (with n sized next pointer buckets) and the statistical distribution of bucket size allows for "localized" pointer management, resulting in more granular changes, making many inserts local. Compare this with AVL trees, where local insertions can alter large portions of the tree.

I've read (some time ago) that many high performance databases (Oracle, for instance) have changed to use this data structure for many operations because of it's better multi-threaded properties.

If my memory is not wrong, you can search ACM digital library and many papers on skip lists will pop out in the results, including multiple writers analisys.

Cool blog, BTW.

Well, i went to find the original paper. It was publicated on Communications of the ACM, Volume 33 Issue 6, June 1990, pages 668 thru 676.

If you are an ACM member, it can be downloaded (free for members, paid otherwise) with the link http://portal.acm.org/citation.cfm?id=78973.78977&coll=GUIDE&dl=GUIDE&CFID=44594004&CFTOKEN=83191267

There is another paper "Lock-free linked lists and skip lists" also on ACM, the link is http://portal.acm.org/citation.cfm?id=1011767.1011776&coll=GUIDE&dl=GUIDE&CFID=44594004&CFTOKEN=83191267

The lecture is very cool.

Tough I have one question related to a possible step of efficiency in the Search operation:

When searching X in level

and comparing it against the current node Y in level:When X < Y is not TRUE, why not trying to ask if X == Y.

If so, then we found X and we don't need to go down to level

.This will save log(

) operations.Regards.Can you un-private the video? Cool people like me can't watch it in it's current state.

## Leave a new comment