This article is part of the article series "MIT Introduction to Algorithms."

This is the fourth post in an article series about MIT's lecture course "Introduction to Algorithms." In this post I will review lecture six, which is on the topic of Order Statistics.

The problem of order statistics can be described as follows: given a set of N elements, find the k-th smallest element in it. For example, the minimum of a set is the first order statistic (k = 1) and the maximum is the N-th order statistic (k = N).

Lecture six addresses the problem of selecting the k-th order statistic from a set of N distinct numbers.

The lecture begins with a discussion of the naive algorithm for finding the k-th order statistic -- sort the given set (or array) A of numbers and return the k-th element A[k]. The running time of this algorithm is at best O(n·log(n)), the cost of sorting.
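
To make this concrete, here is a one-line Python sketch of the naive approach (the function name is mine; k is counted from 1, as in the lecture):

def naive_select(A, k):
    # Sort a copy of the array and return the k-th smallest element (1-based k).
    return sorted(A)[k - 1]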

Erik Demaine (professor) presents a randomized divide and conquer algorithm for this problem (my second post covers divide and conquer algorithms), which runs in expected linear time.

The key idea of this algorithm is to call the Randomized-Partition subroutine (from the previous lecture) on the given array and then recurse only on the side of the partition that must contain the k-th smallest element.

Here is the pseudo code of this algorithm:

Randomized-Select(A, p, q, k) // finds k-th smallest element in array A[p..q]
  if p == q
    return A[p]
  r = Randomized-Partition(A, p, q)
  n = r - p + 1
  if k == n
    return A[r]
  if k < n
    return Randomized-Select(A, p, r-1, k)
  else
    return Randomized-Select(A, r+1, q, k-n)
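
For readers who prefer something runnable, here is a rough Python translation of this pseudocode. The randomized_partition helper is my own sketch of the Randomized-Partition subroutine from the previous lecture (a Lomuto-style in-place partition around a randomly chosen pivot), not necessarily the exact version shown in class:

import random

def randomized_partition(A, p, q):
    # Pick a random pivot, move it to A[p], partition A[p..q] in place
    # and return the pivot's final index.
    i = random.randint(p, q)
    A[p], A[i] = A[i], A[p]
    pivot = A[p]
    r = p
    for j in range(p + 1, q + 1):
        if A[j] < pivot:
            r += 1
            A[r], A[j] = A[j], A[r]
    A[p], A[r] = A[r], A[p]
    return r

def randomized_select(A, p, q, k):
    # Find the k-th smallest element in A[p..q]; k is 1-based within the subarray.
    if p == q:
        return A[p]
    r = randomized_partition(A, p, q)
    n = r - p + 1  # size of the lower part, pivot included
    if k == n:
        return A[r]
    if k < n:
        return randomized_select(A, p, r - 1, k)
    return randomized_select(A, r + 1, q, k - n)

# The example from the lecture: the 7th smallest of (6, 10, 13, 5, 8, 3, 2, 11) is 11.
print(randomized_select([6, 10, 13, 5, 8, 3, 2, 11], 0, 7, 7))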

The lecture continues with an intuitive and then a mathematically rigorous analysis of the algorithm. It is concluded that the expected running time of this algorithm is O(n), while the worst case (when Randomized-Partition keeps choosing bad pivots) is O(n²).
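
Roughly, the intuition is this: if every pivot splits off at least a constant fraction of the elements (say a 1/4 : 3/4 split), the remaining work shrinks geometrically, while a consistently extreme pivot removes only one element per step:

  T(n) = T(3n/4) + Θ(n) = Θ(n)      (lucky case: constant-fraction splits)
  T(n) = T(n-1) + Θ(n) = Θ(n²)      (unlucky case: worst possible pivots)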

In the last minutes of the lecture a worst-case linear time order statistics algorithm is presented. This algorithm was invented by five scientists -- M. Blum, R. W. Floyd, V. Pratt, R. Rivest and R. Tarjan. I found the original publication of this algorithm by these gentlemen: Time Bounds for Selection.
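
Very briefly, that algorithm chooses its pivot deterministically as the median of the medians of groups of 5 elements. Here is a simplified Python sketch of this idea (my own version, which copies subarrays instead of partitioning in place; this keeps the linear worst case but uses extra memory):

def select(A, k):
    # Return the k-th smallest element (1-based) of A in worst-case linear time.
    if len(A) <= 5:
        return sorted(A)[k - 1]
    # Split A into groups of 5 and take each group's median.
    medians = [sorted(A[i:i + 5])[len(A[i:i + 5]) // 2]
               for i in range(0, len(A), 5)]
    # Recursively pick the median of the medians as the pivot.
    pivot = select(medians, (len(medians) + 1) // 2)
    # Partition into elements below and above the pivot.
    lower = [x for x in A if x < pivot]
    upper = [x for x in A if x > pivot]
    if k <= len(lower):
        return select(lower, k)
    if k > len(A) - len(upper):
        return select(upper, k - (len(A) - len(upper)))
    return pivot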

You're welcome to watch lecture six:

Lecture six notes:

Lecture 6, page 1 of 2.

Lecture 6, page 2 of 2.

Topics covered in lecture six:

  • [00:30] The problem of order statistics.
  • [01:50] Naive algorithm for finding k-th smallest element.
  • [04:30] Randomized divide and conquer algorithm.
  • [11:20] Example of algorithm run on array (6, 10, 13, 5, 8, 3, 2, 11) to find 7-th smallest element.
  • [15:30] Intuitive analysis of Randomized-Select algorithm.
  • [20:30] Mathematically rigorous analysis of expected running time.
  • [43:55] Worst-case linear time order statistics algorithm.

Have fun! The next post will be all about hashing!

PS. This course is taught from the CLRS book (also called "Introduction to Algorithms").


Comments

giffo Permalink
August 28, 2008, 03:27

Thank you for posting this :)

Huh... Permalink
August 28, 2008, 16:52

The RandomizedPartition algorithm referred to from the previous lecture modifies the source data. This means that in the general case you would need to make a complete copy of the array argument.

Meanwhile, bin sorting can be accomplished in linear time, which suggests an alternative approach for some cases: perform a bin sort on the list, then read off the k-th value from the sorted result (this is only worthwhile when max(A) - min(A) is not significantly larger than length(A)).
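
To illustrate the commenter's suggestion, here is a small Python sketch of selection by counting ("bin") sort; it assumes integer inputs, and the function name is mine:

def select_by_counting(A, k):
    # k-th smallest (1-based) via counting sort; takes O(n + range) time and
    # space, so it only pays off when max(A) - min(A) is close to len(A).
    lo = min(A)
    counts = [0] * (max(A) - lo + 1)
    for x in A:
        counts[x - lo] += 1
    seen = 0
    for offset, c in enumerate(counts):
        seen += c
        if seen >= k:
            return lo + offset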

September 02, 2008, 14:53

Nice, if you'd like to download them and you don't know how to, I've written a small post on my blog:

http://blog.srikanths.net/2007/11/introduction-to-algorithms.html

abc Permalink
September 16, 2008, 19:04

Can anyone answer the below question for me please?

Let X[1..n] and Y[1..n] be two arrays, each containing n numbers already in sorted order. Give an O(lg n) time algorithm to find the n-th and (n+1)-st largest elements of the 2n elements in arrays X and Y. Your algorithmic description must start with a brief overview of your algorithm in English.

anonymous1 Permalink
November 19, 2008, 19:47

Can anyone answer the below question please?!
Thanks in advance.

As seen in the lecture, we divide our input into groups of 5 elements.
1) Is it possible to divide them into groups of 7? Will the result be the same?
2) And if I take groups of 3, why is the complexity O(n·lg n) and not O(n)?
Thanks,
appreciate it.

Rajesh Singh Permalink
August 01, 2009, 07:54

Can anyone provide material on Gibbs sampling used in Bayesian analysis?

June 06, 2010, 11:09

Hi,

great, thx, it helped a lot.
I found a small typo in the pseudo code

9th line is
" return Randomized-Select(A, p, r-q, k)"

but as far as I know it should be
" return Randomized-Select(A, p, r-1, k)"

r-q does not make sense, it should be r-1

thx,
Viktor

June 07, 2010, 01:08

Viktor, you are absolutely right. :) I corrected the mistake now!
