Introduction to Algorithms August 27, 2008

# MIT's Introduction to Algorithms, Lecture 6: Order Statistics


This is the fourth post in an article series about MIT's lecture course "Introduction to Algorithms." In this post I will review lecture six, which is on the topic of Order Statistics.

The problem of order statistics can be described as follows: given a set of N elements, find the k-th smallest element in it. For example, the minimum of a set is the first order statistic (k = 1) and the maximum is the N-th order statistic (k = N).

Lecture six addresses the problem of selecting the k-th order statistic from a set of N distinct numbers.

The lecture begins by discussing the naive algorithm for finding the k-th order statistic: sort the given set (or array) A of numbers and return the k-th element A[k]. The running time of this algorithm is dominated by the sort, so even at best it is O(n·log(n)).
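The naive approach is a one-liner in Python (a sketch of mine, not code from the lecture; note that Python indexing is 0-based, so the k-th smallest sits at index k-1 of the sorted array):

```python
def naive_select(A, k):
    """Return the k-th smallest element of A (k is 1-indexed).

    Dominated by the sort, so the running time is O(n log n).
    """
    return sorted(A)[k - 1]

# The array from the lecture's worked example:
print(naive_select([6, 10, 13, 5, 8, 3, 2, 11], 7))  # 11
```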

Erik Demaine (professor) presents a randomized divide and conquer algorithm for this problem (my second post covers divide and conquer algorithms), which runs in expected linear time.

The key idea of this algorithm is to call Randomized-Partition (from previous lecture) subroutine on the given array and then recursively conquer on the partitioned elements on the left and right of the chosen pivot element.

Here is the pseudo code of this algorithm:

```
Randomized-Select(A, p, q, k)   // finds k-th smallest element in array A[p..q]
    if p == q
        return A[p]
    r = Randomized-Partition(A, p, q)
    n = r - p + 1               // size of the lower partition, pivot included
    if k == n
        return A[r]
    if k < n
        return Randomized-Select(A, p, r-1, k)
    else
        return Randomized-Select(A, r+1, q, k-n)
```
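Here is a runnable Python version of this pseudocode (my own translation, with a Lomuto-style Randomized-Partition as covered in the previous lecture; note that it partitions the array in place):

```python
import random

def randomized_partition(A, p, q):
    """Partition A[p..q] in place around a random pivot; return the pivot's final index."""
    r = random.randint(p, q)
    A[r], A[q] = A[q], A[r]      # move the random pivot to the end
    pivot = A[q]
    i = p - 1
    for j in range(p, q):
        if A[j] <= pivot:
            i += 1
            A[i], A[j] = A[j], A[i]
    A[i + 1], A[q] = A[q], A[i + 1]
    return i + 1

def randomized_select(A, p, q, k):
    """Return the k-th smallest element of A[p..q] (k is 1-indexed)."""
    if p == q:
        return A[p]
    r = randomized_partition(A, p, q)
    n = r - p + 1                # size of the lower partition, pivot included
    if k == n:
        return A[r]
    if k < n:
        return randomized_select(A, p, r - 1, k)
    return randomized_select(A, r + 1, q, k - n)

A = [6, 10, 13, 5, 8, 3, 2, 11]  # the array from the lecture's example
print(randomized_select(A, 0, len(A) - 1, 7))  # 11
```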

The lecture continues with an intuitive and then mathematically rigorous analysis of the algorithm. It is concluded that the expected running time of this algorithm is O(n), while the worst case (when Randomized-Partition repeatedly chooses bad pivots) is O(n²).

In the final minutes of the lecture, a worst-case linear time order statistics algorithm is presented. This algorithm was invented by five scientists -- M. Blum, R. W. Floyd, V. Pratt, R. Rivest and R. Tarjan. I found the original publication of this algorithm by these gentlemen: Time Bounds for Selection.
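The core trick of their algorithm is to pick the pivot deterministically as the "median of medians" of groups of 5 elements. A compact Python sketch of that idea (my simplification, assuming distinct elements as the lecture does; this is not the authors' original formulation):

```python
def select(A, k):
    """k-th smallest element of A (k is 1-indexed), worst-case O(n).

    Assumes the elements of A are distinct.
    """
    if len(A) <= 5:
        return sorted(A)[k - 1]
    # Median of each group of (at most) 5 elements.
    medians = [sorted(A[i:i + 5])[len(A[i:i + 5]) // 2]
               for i in range(0, len(A), 5)]
    # Recursively find the median of the medians to use as the pivot.
    pivot = select(medians, (len(medians) + 1) // 2)
    lo = [x for x in A if x < pivot]
    hi = [x for x in A if x > pivot]
    if k <= len(lo):
        return select(lo, k)
    if k > len(A) - len(hi):
        return select(hi, k - (len(A) - len(hi)))
    return pivot

print(select([6, 10, 13, 5, 8, 3, 2, 11], 7))  # 11
```

The pivot is guaranteed to be larger than roughly 3/10 of the elements and smaller than another 3/10, so each recursive call shrinks the problem by a constant fraction, giving a linear worst case.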

You're welcome to watch lecture six:

Lecture six notes:

Lecture 6, page 1 of 2. Lecture 6, page 2 of 2.

Topics covered in lecture six:

• [00:30] The problem of order statistics.
• [01:50] Naive algorithm for finding k-th smallest element.
• [04:30] Randomized divide and conquer algorithm.
• [11:20] Example of algorithm run on array (6, 10, 13, 5, 8, 3, 2, 11) to find 7-th smallest element.
• [15:30] Intuitive analysis of Randomized-Select algorithm.
• [20:30] Mathematically rigorous analysis of expected running time.
• [43:55] Worst-case linear time order statistics algorithm.

Have fun! The next post will be all about hashing!

PS. This course is taught from the CLRS book (also called "Introduction to Algorithms"):


Thank you for posting this :)

The Randomized-Partition algorithm referred to from the previous lecture modifies the source data. This means that in the general case you would need to make a complete copy of the array argument.

Meanwhile, bin sorting can be accomplished in linear time, which suggests an alternative approach for some cases: perform a bin sort on the list, then find the k-th value from the sorted result (this would be valid when max(A) - min(A) is not significantly larger than length(A)).
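The bin-sort selection the commenter describes could be sketched like this (my own sketch assuming integer keys; `counting_select` is a name I made up). It runs in O(n + max(A) - min(A)) time, which is linear when the value range is comparable to the array length:

```python
def counting_select(A, k):
    """k-th smallest element of an integer array A (k is 1-indexed),
    via counting sort: tally each value, then walk the tallies until
    k elements have been accounted for."""
    lo, hi = min(A), max(A)
    counts = [0] * (hi - lo + 1)
    for x in A:
        counts[x - lo] += 1
    seen = 0
    for offset, c in enumerate(counts):
        seen += c
        if seen >= k:
            return lo + offset

print(counting_select([6, 10, 13, 5, 8, 3, 2, 11], 7))  # 11
```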

Nice, if you'd like to download them and you don't know how to, I've written a small post on my blog:

http://blog.srikanths.net/2007/11/introduction-to-algorithms.html

Can anyone answer the question below for me, please?

Let X[1..n] and Y[1..n] be two arrays, each containing n numbers already in sorted order.
Give an O(lg n) time algorithm to find the n-th and (n+1)-st largest elements of the 2n elements
in arrays X and Y. Your algorithmic description must start with a brief overview of your
algorithm in English.
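One common approach (my own sketch, not from the lecture): since there are 2n elements, the n-th and (n+1)-st largest are just the two middle elements of the combined order, and they can be found by binary-searching on how many elements of X belong to the lower half. The function name is mine:

```python
def two_medians(X, Y):
    """The n-th and (n+1)-st smallest (equivalently, the (n+1)-st and n-th
    largest) of the 2n elements in two sorted length-n arrays, in O(lg n).

    Binary search on i = number of elements taken from X into the lower
    half; the remaining j = n - i lower-half elements come from Y.  The
    split is correct when nothing in either lower part exceeds anything
    in either upper part."""
    n = len(X)
    lo, hi = 0, n
    while lo <= hi:
        i = (lo + hi) // 2
        j = n - i
        left_x  = X[i - 1] if i > 0 else float('-inf')
        right_x = X[i]     if i < n else float('inf')
        left_y  = Y[j - 1] if j > 0 else float('-inf')
        right_y = Y[j]     if j < n else float('inf')
        if left_x <= right_y and left_y <= right_x:
            return max(left_x, left_y), min(right_x, right_y)
        if left_x > right_y:
            hi = i - 1       # took too many from X
        else:
            lo = i + 1       # took too few from X

print(two_medians([1, 3, 5, 7], [2, 4, 6, 8]))  # (4, 5)
```

Each iteration halves the search range over i, giving the required O(lg n) bound.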

As seen in the lesson, we divide our input into groups of 5 elements.
1) Is it possible to divide them into groups of 7? Will the result be the same?
2) And if I take groups of 3, why is the complexity O(n·lg n) and not O(n)?
Thanks, appreciate it.
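For what it's worth, the standard recurrence analysis (my sketch, not from the lecture notes) answers both questions:

```latex
% Groups of 5: the median-of-medians pivot discards at least 3n/10
% elements, so the two recursive calls cover strictly less than n:
T(n) \le T(n/5) + T(7n/10) + O(n), \qquad \tfrac{n}{5} + \tfrac{7n}{10} = \tfrac{9n}{10} < n \;\Rightarrow\; T(n) = O(n)

% Groups of 7 work too: at least 2n/7 elements are discarded:
T(n) \le T(n/7) + T(5n/7) + O(n), \qquad \tfrac{n}{7} + \tfrac{5n}{7} = \tfrac{6n}{7} < n \;\Rightarrow\; T(n) = O(n)

% Groups of 3: the fractions sum to exactly 1, so the O(n) work is
% repeated at every level of the recursion tree:
T(n) \le T(n/3) + T(2n/3) + O(n) \;\Rightarrow\; T(n) = O(n \lg n)
```

So groups of 7 give the same linear result, while with groups of 3 the two subproblems together are as large as the original input and the linear work at each level stacks up to O(n·lg n).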

Can any ony provide the material of Gibbs sampling used in Bayesian analysis.

Hi,

great, thx, it helped a lot.
I found a small typo in the pseudo code.

The 9th line is
" return Randomized-Select(A, p, r-q, k)"

but as far as I know it should be
" return Randomized-Select(A, p, r-1, k)"

r-q does not make sense; it should be r-1.

thx,
Viktor

Viktor, you are absolutely right. :) I corrected the mistake now!