Bit Hacks

I decided to write an article about a thing that is second nature to embedded systems programmers - low level bit hacks. Bit hacks are ingenious little programming tricks that manipulate integers in a smart and efficient manner. Instead of performing some operation (such as counting the 1 bits in an integer) by looping over individual bits, these programming nuggets do the same with one or two carefully chosen bitwise operations.

To get things going I'll assume that you know what the two's complement binary representation of an integer is and also that you know all the the bitwise operations.

I'll use the following notation for bitwise operations in the article:

&   -  bitwise and
|   -  bitwise or
^   -  bitwise xor
~   -  bitwise not
<<  -  bitwise shift left
>>  -  bitwise shift right

The numbers in the article are 8 bit signed integers (though the operations work on arbitrary length signed integers) that are represented as two's complement and they are usually named 'x'. The result is usually 'y'. The individual bits of 'x' are named b7, b6, b5, b4, b3, b3, b2, b1 and b0. The bit b7 is the sign bit (the most significant bit), and b0 is the least significant.

I'll start with the most basic bit hacks and gradually progress to more difficult ones. I'll use examples to explain how each bithack works.

If you are intrigued by this topic I urge you to subscribe to my blog. I can share a secret that there will be the 2nd part of this article where I cover more advanced bit hacks, and I will also release a cheat sheet with all these bit tricks! It's well worth subscribing!

Here we go.

Bit Hack #1. Check if the integer is even or odd.

if ((x & 1) == 0) {
  x is even
}
else {
  x is odd
}

I am pretty sure everyone has seen this trick. The idea here is that an integer is odd if and only if the least significant bit b0 is 1. It follows from the binary representation of 'x', where bit b0 contributes to either 1 or 0. By AND-ing 'x' with 1 we eliminate all the other bits than b0. If the result after this operation is 0, then 'x' was even because bit b0 was 0. Otherwise 'x' was odd.

Let's look at some examples. Let's take integer 43, which is odd. In binary 43 is 00101011. Notice that the least significant bit b0 is 1 (in bold). Now let's AND it with 1:

    00101011
&   00000001   (note: 1 is the same as 00000001)
    --------
    00000001

See how AND-ing erased all the higher order bits b1-b7 but left bit b0 the same it was? The result is thus 1 which tells us that the integer was odd.

Now let's look at -43. Just as a reminder, a quick way to find negative of a given number in two's complement representation is to invert all bits and add one. So -43 is 11010101 in binary. Again notice that the last bit is 1, and the integer is odd. (Note that if we used one's complement it wouldn't be true!)

Now let's take a look at an even integer 98. In binary 98 is 1100010.

    01100010
&   00000001
    --------
    00000000

After AND-ing the result is 0. It means that the bit b0 of original integer 98 was 0. Thus the given integer is even.

Now the negative -98. It's 10011110. Again, bit b0 is 0, after AND-ing, the result is 0, meaning -98 is even, which indeed is true.

Bit Hack #2. Test if the n-th bit is set.

if (x & (1<<n)) {
  n-th bit is set
}
else {
  n-th bit is not set
}

In the previous bit hack we saw that (x & 1) tests if the first bit is set. This bit hack improves this result and tests if n-th bit is set. It does it by shifting that first 1-bit n positions to the left and then doing the same AND operation, which eliminates all bits but n-th.

Here is what happens if you shift 1 several positions to the left:

1         00000001    (same as 1<<0)
1<<1      00000010
1<<2      00000100
1<<3      00001000
1<<4      00010000
1<<5      00100000
1<<6      01000000
1<<7      10000000

Now if we AND 'x' with 1 shifted n positions to the left we effectively eliminate all the bits but n-th bit in 'x'. If the result after AND-ing is 0, then that bit must have been 0, otherwise that bit was set.

Let's look at some examples.

Does 122 have 3rd bit set? The operation we do to find it out is:

122 & (1<<3)

Now, 122 is 01111010 in binary. And (1<<3) is 00001000.

    01111010
&   00001000
    --------
    00001000

We see that the result is not 0, so yes, 122 has the 3rd bit set.

Note: In my article bit numeration starts with 0. So it's 0th bit, 1st bit, ..., 7th bit.

What about -33? Does it have the 5th bit set?

    11011111      (-33 in binary)
&   00100000     (1<<5)
    --------
    00000000

Result is 0, so the 5th bit is not set.

Bit Hack #3. Set the n-th bit.

y = x | (1<<n)

This bit hack combines the same (1<<n) trick of setting n-th bit by shifting with OR operation. The result of OR-ing a variable with a value that has n-th bit set is turning that n-th bit on. It's because OR-ing any value with 0 leaves the value the same; but OR-ing it with 1 changes it to 1 (if it wasn't already). Let's see how that works in action:

Suppose we have value 120, and we wish to turn on the 2nd bit.

    01111000    (120 in binary)
|   00000100    (1<<2)
    --------
    01111100

What about -120 and 6th bit?

    10001000   (-120 in binary)
|   01000000   (1<<6)
    --------
    11001000

Bit Hack #4. Unset the n-th bit.

y = x & ~(1<<n)

The important part of this bithack is the ~(1<<n) trick. It turns on all the bits except n-th.

Here is how it looks:

~1        11111110  (same as ~(1<<0))
~(1<<1)   11111101
~(1<<2)   11111011
~(1<<3)   11110111
~(1<<4)   11101111
~(1<<5)   11011111
~(1<<6)   10111111
~(1<<7)   01111111

The effect of AND-ing variable 'x' with this quantity is eliminating n-th bit. It does not matter if the n-th bit was 0 or 1, AND-ing it with 0 sets it to 0.

Here is an example. Let's unset 4th bit in 127:

    01111111    (127 in binary)
&   11101111    (~(1<<4))
    --------
    01101111

Bit Hack #5. Toggle the n-th bit.

y = x ^ (1<<n)

This bit hack also uses the wonderful "set n-th bit shift hack" but this time it XOR's it with the variable 'x'. The result of XOR-ing something with something else is that if both bits are the same, the result is 0, otherwise it's 1. How does it toggle n-th bit? Well, if n-th bit was 1, then XOR-ing it with 1 changes it to 0; conversely, if it was 0, then XOR-ing with with 1 changes it to 1. See, the bit got flipped.

Here is an example. Suppose you want to toggle 5th bit in value 01110101:

    01110101
^   00100000
    --------
    01010101

What about the same value but 5th bit originally 0?

    01010101
^   00100000
    --------
    01110101

Notice something? XOR-ing the same bit twice returned it to the same value. This nifty XOR property is used in calculating parity in RAID arrays and used in simple cryptography cyphers, but more about that in some other article.

Bit Hack #6. Turn off the rightmost 1-bit.

y = x & (x-1)

Now it finally gets more interesting!!! Bit hacks #1 - #5 were kind of boring to be honest.

This bit hack turns off the rightmost one-bit. For example, given an integer 00101010 (the rightmost 1-bit in bold) it turns it into 00101000. Or given 00010000 it turns it into 0, as there is just a single 1-bit.

Here are more examples:

    01010111    (x)
&   01010110    (x-1)
    --------
    01010110

    01011000    (x)
&   01010111    (x-1)
    --------
    01010000

    10000000    (x = -128)
&   01111111    (x-1 = 127 (with overflow))
    --------
    00000000

    11111111    (x = all bits 1)
&   11111110    (x-1)
    --------
    11111110

    00000000    (x = no rightmost 1-bits)
&   11111111    (x-1)
    --------
    00000000

Why does it work?

If you look at the examples and think for a while, you'll realize that there are two possible scenarios:

1. The value has the rightmost 1 bit. In this case subtracting one from it sets all the lower bits to one and changes that rightmost bit to 0 (so that if you add one now, you get the original value back). This step has masked out the rightmost 1-bit and now AND-ing it with the original value zeroes that rightmost 1-bit out.

2. The value has no rightmost 1 bit (all 0). In this case subtracting one underflows the value (as it's signed) and sets all bits to 1. AND-ing all zeroes with all ones produces 0.

Bit Hack #7. Isolate the rightmost 1-bit.

y = x & (-x)

This bit hack finds the rightmost 1-bit and sets all the other bits to 0. The end result has only that one rightmost 1-bit set. For example, 01010100 (rightmost bit in bold) gets turned into 00000100.

Here are some more examples:

    10111100  (x)
&   01000100  (-x)
    --------
    00000100

    01110000  (x)
&   10010000  (-x)
    --------
    00010000

    00000001  (x)
&   11111111  (-x)
    --------
    00000001

    10000000  (x = -128)
&   10000000  (-x = -128)
    --------
    10000000

    11111111  (x = all bits one)
&   00000001  (-x)
    --------
    00000001

    00000000  (x = all bits 0, no rightmost 1-bit)
&   00000000  (-x)
    --------
    00000000

This bit hack works because of two's complement. In two's complement system -x is the same as ~x+1. Now let's examine the two possible cases:

1. There is a rightmost 1-bit bi. In this case let's pivot on this bit and divide all other bits into two flanks - bits to the right and bits to the left. Remember that all the bits to the right bi-1, bi-2 ... b0 are 0's (because bi was the rightmost 1-bit). And bits to the left are the way they are. Let's call them bi+1, ..., bn.

Now, when we calculate -x, we first do ~x which turns bit bi into 0, bits bi-1 ... b0 into 1s, and inverts bits bi+1, ..., bn, and then we add 1 to this result.

Since bits bi-1 ... b0 are all 1's, adding one makes them carry this one all the way to bit bi, which is the first zero bit.

If we put it all together, the result of calculating -x is that bits bi+1, ..., bn get inverted, bit bi stays the same, and bits bi-1, ..., b0 are all 0's.

Now, AND-ing x with -x makes bits bi+1, ..., bn all 0, leaves bit bi as is, and sets bits bi-1, ..., b0 to 0. Only one bit is left, it's the bit bi - the rightmost 1-bit.

2. There is no rightmost 1-bit. The value is 0. The negative of 0 in two's complement is also 0. 0&0 = 0. No bits get turned on.

We have proved rigorously that this bithack is correct.

Bit Hack #8. Right propagate the rightmost 1-bit.

y = x | (x-1)

This is best understood by an example. Given a value 01010000 it turns it into 01011111. All the 0-bits right to the rightmost 1-bit got turned into ones.

This is not a clean hack, tho, as it produces all 1's if x = 0.

Let's look at more examples:

    10111100  (x)
|   10111011  (x-1)
    --------
    10111111

    01110111  (x)
|   01110110  (x-1)
    --------
    01110111

    00000001  (x)
|   00000000  (x-1)
    --------
    00000001

    10000000  (x = -128)
|   01111111  (x-1 = 127)
    --------
    11111111

    11111111  (x = -1)
|   11111110  (x-1 = -2)
    --------
    11111111

    00000000  (x)
|   11111111  (x-1)
    --------
    11111111

Let's prove it, though not as rigorously as in the previous bithack (as it's too time consuming and this is not a scientific publication). There are two cases again. Let's start with easiest first.

1. There is no rightmost 1-bit. In that case x = 0 and x-1 is -1. -1 in two's complement is 11111111. OR-ing 0 with 11111111 produces the same 11111111. (Not the desired result, but that's the way it is.)

2. There is the rightmost 1-bit bi. Let's divide all the bits in two groups again (like in the previous example). Calculating x-1 modifies only bits to the right, turning bi into 0, and all the lower bits to 1's. Now OR-ing x with x-1 leaves all the higher bits (to the left) the same, leaves bit bi as it was 1, and since lower bits are all low 1's it also turns them on. The result is that the rightmost 1-bit got propagated to lower order bits.

Bit Hack #9. Isolate the rightmost 0-bit.

y = ~x & (x+1)

This bithack does the opposite of #7. It finds the rightmost 0-bit, turns off all bits, and sets this bit to 1 in the result. For example, it finds the zero in bold in this number 10101011, producing 00000100.

More examples:

    10111100  (x)
    --------
    01000011  (~x)
&   10111101  (x+1)
    --------
    00000001

    01110111  (x)
    --------
    10001000  (~x)
&   01111000  (x+1)
    --------
    00001000

    00000001  (x)
    --------
    11111110  (~x)
&   00000010  (x+1)
    --------
    00000010

    10000000  (x = -128)
    --------
    01111111  (~x)
&   10000001  (x+1)
    --------
    00000001

    11111111  (x = no rightmost 0-bit)
    --------
    00000000  (~x)
&   00000000  (x+1)
    --------
    00000000

    00000000  (x)
    --------
    11111111  (~x)
&   00000001  (x+1)
    --------
    00000001

Proof: Suppose there is a rightmost 0-bit. Then ~x turns this rightmost 0 bit into 1 bit. And so does x+1 (because bits more right to the rightmost 0 bit are 1's). Now AND-ing ~x with x+1 evaporates all the bits up to this rightmost 0 bit. This is the highest order bit set in the result. Now what about lower order bits to the right of rightmost 0 bit? They also got evaporated because because x+1 turned them into 0's (they were 1's) and ~x turned them into 0's. They got AND-ed with 0 and evaporated.

Bit Hack #10. Turn on the rightmost 0-bit.

y = x | (x+1)

This hack changes the rightmost 0-bit into 1. For example, given an integer 10100011 it turns it into 10100111.

More examples:

    10111100  (x)
|   10111101  (x+1)
    --------
    10111101

    01110111  (x)
|   01111000  (x+1)
    --------
    01111111

    00000001  (x)
|   00000010  (x+1)
    --------
    00000011

    10000000  (x = -128)
|   10000001  (x+1)
    --------
    10000001

    11111111  (x = no rightmost 0-bit)
|   00000000  (x+1)
    --------
    11111111

    00000000  (x)
|   00000001  (x+1)
    --------
    00000001

Here is the proof as a bunch of true statements. OR-ing x with x+1 does not lose any information. Adding 1 to x fills the first rightmost 0. The result is max{x, x+1}. If x+1 overflows it's x and there were no 0 bits. If it doesn't, it's x+1 which just got rightmost bit filled with 1.

Bonus stuff.

If you decide to play more with these hacks, here are a few utility functions to print binary values of 8 bit signed integers in Perl, Python and C.

Print binary representation in Perl:

sub int_to_bin {
  my $num = shift;
  print unpack "B8", pack "c", $num;
}

Or you can print it from command line right away:

perl -wle 'print unpack "B8", pack "c", shift' <integer>

# For example:
perl -wle 'print unpack "B8", pack "c", shift' 113
01110001

perl -wle 'print unpack "B8", pack "c", shift' -- -128
10000000

Print binary number in Python:

def int_to_bin(num, bits=8):
 r = ''
 while bits:
  r = ('1' if num&1 else '0') + r
  bits = bits - 1
  num = num >> 1
 print r

Print binary representation in C:

void int_to_bin(int num) {
  char str[9] = {0};
  int i;
  for (i=7; i>=0; i--) {
    str[i] = (num&1)?'1':'0';
    num >>= 1;
  }
  printf("%s\n", str);
}

Have fun with these! I'll write about advanced bit hacks some time soon. If you are really intrigued by this topic I encourage you to subscribe to my blog. Thanks! :)

Ps. Let me know in the comments what you think about this article, and let me know if you do not know what two's complement, or the basic binary operations are. If there are a few people who would like me to explain these concepts, I'll be glad to write another article just about these fundamental topics.

Pps. There is a book entirely on bit hacks like these. It's called "Hacker's Delight". It may be worth getting if you are into this stuff:

Google Python Search Library

Here is another quick hack that I wrote a while ago. It complements the xgoogle library that I published in my previous post with an API for Google Sponsored Links search.

Let me quickly explain why this library is useful, and what the Google Sponsored Links are.

For a typical search, Google shows regular web search results on the left side of the page, and "Sponsored Links" in a column on the right side. "Sponsored" means the results are pulled from Googe's advertising network (Adwords).

Here is a screenshot that illustrates the Sponsored Links:

Google Sponsored Links
Google Sponsored Links results for search term "security" are in red.

Okay, now why would I need a library to search the Sponsored results? Suppose that I am an advertiser on Adwords, and I buy some software related keywords like "video software". It is in my interests to know my competitors, their advertisement text, what are they up to, the new players in this niche, and their websites. Without my library it would be practically impossible to keep track of all the competitors. There can literally be hundreds of changes per day. However, with my library it's now piece of cake to keep track of all the dynamics.

How does the library work?

The sponsored links library pulls the results from this URL: http://www.google.com/sponsoredlinks. Here is an example of all the sponsored results for a query "security":

Google Sponsored Links for Security
Sponsored links results for "security".

The library just grabs page after page, calls BeautifulSoup, and extracts the search result elements. Elementary.

How to use the library?

As I mentioned, this library is part of my xgoogle library. Download and extract it first:

Download: xgoogle library (.zip)
Downloaded: 18896 times.
Download url: http://www.catonmat.net/download/xgoogle.zip

Now, the source file that contains the implementation of this library is "xgoogle/sponsoredlinks.py". To use it, do the usual import "from xgoogle.sponsoredlinks import SponsoredLinks, SLError".

SponsoredLinks is the class that provides the API and SLError is exception class that gets thrown in case of errors, so it's a good idea to import both.

The SponsoredLinks has a similar interface as the xgoogle.search (the plain google search module). The constructor of SponsoredLinks takes the keyword you want to search for, and the constructed object has several public methods and properties:

  • method get_results() - gets a page of results, returning a list of SponsoredLink objects. It returns an empty list if there are no more results.
  • property num_results - returns number of search results found.
  • property results_per_page - sets/gets the number of results to get per page (max 100).

The returned SponsoredLink objects have four attributes -- "title", "desc", "url", and "display_url". Here is a picture that illustrates what each attribute stands for:

Coresspondence between sponsored link and result object

The picture does not show the "display_url" attribute as it's the actual link the result links to (href of blue link in the pic).

Here is an example usage of this library. It retrieves first 100 Sponsored Links results for keyword "video software":

from xgoogle.sponsoredlinks import SponsoredLinks, SLError
try:
  sl = SponsoredLinks("video software")
  sl.results_per_page = 100
  results = sl.get_results()
except SLError, e:
  print "Search failed: %s" % e

for result in results:
  print result.title.encode('utf8')
  print result.desc.encode('utf8')
  print result.display_url.encode('utf8')
  print result.url.encode('utf8')
  print

Output:

Photoshop Video Software
Time saving software for video. Work faster in Photoshop. 
www.toolsfortelevision.com
http://www.toolsfortelevision.com

...

That's about it for this time. Use it to find your competitors and outsmart them!

Next time I am going to expand the library for Google Sets search.

Download "xgoogle" library:

Download: xgoogle library (.zip)
Downloaded: 18896 times.
Download url: http://www.catonmat.net/download/xgoogle.zip

Have fun!

Google Python Search Library

Here is a quick hack that I wrote. It's a Python library to search Google without using their API. It's quick and dirty, just the way I love it.

Why didn't I use Google's provided REST API? Because it says "you can only get up to 8 results in a single call and you can't go beyond the first 32 results". Seriously, what am I gonna do with just 32 results?

I wrote it because I want to do various Google hacks automatically, monitor popularity of some keywords and sites, and to use it for various other reasons.

One of my next post is going to extend on this library and build a tool that perfects your English. I have been using Google for a while to find the correct use of various English idioms, phrases, and grammar. For example, "i am programmer" vs. "i am a programmer". The first one is missing an indefinite article "a", but the second is correct. Googling for these terms reveal that the first has 6,230 results, but the second has 136,000 results, so I pretty much trust that the 2nd is more correct than the first.

Subscribe to my posts via catonmat's rss, if you are intrigued and would love to receive my posts automatically!

How to use the library?

First download the xgoogle library, and extract it somewhere.

Download: xgoogle library (.zip)
Downloaded: 18896 times.
Download url: http://www.catonmat.net/download/xgoogle.zip

At the moment it contains just the code for Google search, but in the future I will add other searches (google sets, google suggest, etc).

To use the search, from "xgoogle.search" import "GoogleSearch" and, optionally, "SearchError".

GoogleSearch is the class you will use to do Google searches. SearchError is an exception class that GoogleSearch throws in case of various errors.

Pass the keyword you want to search as the first parameter to GoogleSearch's constructor. The constructed object has several public methods and properties:

  • method get_results() - gets a page of results, returning a list of SearchResult objects. It returns an empty list if there are no more results.
  • property num_results - returns number of search results found.
  • property results_per_page - sets/gets the number of results to get per page. Possible values are 10, 25, 50, 100.
  • property page - sets/gets the search page.

As I said, get_results() method returns a SearchResult object. It has three attributes -- "title", "desc", and "url". They are Unicode strings, so do a proper encoding before outputting them.

Here is a screenshot that illustrates the "title", "desc", and "url" attributes:

Google Search Result, url, title, description
Google search result for "catonmat".

Here is an example program of doing a Google search. It takes the first argument, does a search on it, and prints the results:

from xgoogle.search import GoogleSearch, SearchError
try:
  gs = GoogleSearch("quick and dirty")
  gs.results_per_page = 50
  results = gs.get_results()
  for res in results:
    print res.title.encode("utf8")
    print res.desc.encode("utf8")
    print res.url.encode("utf8")
    print
except SearchError, e:
  print "Search failed: %s" % e

This code fragment sets up a search for "quick and dirty" and specifies that a result page should have 50 results. Then it calls get_results() to get a page of results. Finally it prints the title, description and url of each search result.

Here is the output from running this program:

Quick-and-dirty - Wikipedia, the free encyclopedia
Quick-and-dirty is a term used in reference to anything that is an easy way to implement a kludge. Its usage is popular among programmers, ...
http://en.wikipedia.org/wiki/Quick-and-dirty

Grammar Girl's Quick and Dirty Tips for Better Writing - Wikipedia ...
"Grammar Girl's Quick and Dirty Tips for Better Writing" is an educational podcast that was launched in July 2006 and the title of a print book that was ...Writing - 39k - 
http://en.wikipedia.org/wiki/Grammar_Girl%27s_Quick_and_Dirty_Tips_for_Better_Writing

Quick & Dirty Tips :: Grammar  Girl
Quick & Dirty Tips(tm) and related trademarks appearing on this website are the property of Mignon Fogarty, Inc. and Holtzbrinck Publishers Holdings, LLC. ...
http://grammar.quickanddirtytips.com/
[...]

Quick and Dirty search on Google
Compare these results to the output above.

You could also have specified which search page to start the search from. For example, the following code will get 25 results per page and start the search at 2nd page.

gs = GoogleSearch("quick and dirty")
  gs.results_per_page = 25
  gs.page = 2
  results = gs.get_results()

You can also quickly write a scraper to get all the results for a given search term:

from xgoogle.search import GoogleSearch, SearchError
try:
  gs = GoogleSearch("quantum mechanics")
  gs.results_per_page = 100
  results = []
  while True:
    tmp = gs.get_results()
    if not tmp: # no more results were found
      break
    results.extend(tmp)
  # ... do something with all the results ...
except SearchError, e:
  print "Search failed: %s" % e

You can use this library to constantly monitor how your website is ranking for a given search term. Suppose your website has a domain "catonmat.net" and the search term you want to find your position for is "python videos".

Here is a code that outputs your ranking: (it looks through first 100 results, if you need more, put a loop there)

import re
from urlparse import urlparse
from xgoogle.search import GoogleSearch, SearchError

target_domain = "catonmat.net"
target_keyword = "python videos"

def mk_nice_domain(domain):
    """
    convert domain into a nicer one (eg. www3.google.com into google.com)
    """
    domain = re.sub("^www(\d+)?\.", "", domain)
    # add more here
    return domain

gs = GoogleSearch(target_keyword)
gs.results_per_page = 100
results = gs.get_results()
for idx, res in enumerate(results):
  parsed = urlparse(res.url)
  domain = mk_nice_domain(parsed.netloc)
  if domain == target_domain:
    print "Ranking position %d for keyword '%s' on domain %s" % (idx+1, target_keyword, target_domain)

Output of this program:

Ranking position 6 for keyword python videos on domain catonmat.net
Ranking position 7 for keyword python videos on domain catonmat.net

Here is a much wicked example. It uses the GeoIP Python module to find all 10 websites for keyword "wicked code" that are physically hosting in California or New York in USA. Make sure you download GeoCityLite database from "http://www.maxmind.com/download/geoip/database/GeoLiteCity.dat.gz" and extract it to "/usr/local/geo_ip".

import GeoIP
from urlparse import urlparse
from xgoogle.search import GoogleSearch, SearchError

class Geo(object):
  GEO_PATH = "/usr/local/geo_ip/GeoLiteCity.dat"

  def __init__(self):
    self.geo = GeoIP.open(Geo.GEO_PATH, GeoIP.GEOIP_STANDARD)

  def detect_by_host(self, host):
    try:
      gir = self.geo.record_by_name(host)
      return {'country': gir['country_code'].lower(),
              'region': gir['region'].lower()}
    except Exception, e:
      return {'country': 'none', 'region': 'none'}

dst_country = 'us'
dst_states = ['ca', 'ny']
dst_keyword = "wicked code"
num_results = 10
final_results = []
geo = Geo()

gs = GoogleSearch(dst_keyword)
gs.results_per_page = 100

seen_websites = []
while len(final_results) < num_results:
  results = gs.get_results()
  domains = [urlparse(r.url).netloc for r in results]
  for d in domains:
    geo_loc = geo.detect_by_host(d)
    if (geo_loc['country'] == dst_country and
                 geo_loc['region'] in dst_states and
                 d not in seen_websites):
      final_results.append((d, geo_loc['region']))
      seen_websites.append(d)
      if len(final_results) == num_results:
        break

print "Found %d websites:" % len(final_results)
for w in final_results:
    print "%s (state: %s)" % w

Here is the output of running it:

Found 10 websites:
www.wickedcode.com (state: ca)
www.retailmenot.com (state: ca)
www.simplyhired.com (state: ca)
archdipesh.blogspot.com (state: ca)
wagnerblog.com (state: ca)
answers.yahoo.com (state: ca)
devsnippets.com (state: ca)
friendfeed.com (state: ca)
www.thedacs.com (state: ny)
www.tipsdotnet.com (state: ca)

You may modify these examples the way you wish. I'd love to hear some comments about what you can come up with!

And just for fun, here are some other simple uses:

You can make your own Google Fight:

import sys
from xgoogle.search import GoogleSearch, SearchError

args = sys.argv[1:]
if len(args) < 2:
 print 'Usage: google_fight.py "keyword 1" "keyword 2"'
 sys.exit(1)

try:
  n0 = GoogleSearch('"%s"' % args[0]).num_results
  n1 = GoogleSearch('"%s"' % args[1]).num_results
except SearchError, e:
  print "Google search failed: %s" % e
  sys.exit(1)

if n0 > n1:
  print "%s wins with %d results! (%s had %d)" % (args[0], n0, args[1], n1)
elif n1 > n0:
  print "%s wins with %d results! (%s had %d)" % (args[1], n1, args[0], n0)
else:
  print "It's a tie! Both keywords have %d results!" % n1

Download: google_fight.py
Downloaded: 3346 times.
Download url: http://www.catonmat.net/download/google_fight.py

Here is an example usage of google_fight.py:

$ ./google_fight.py google microsoft
google wins with 2680000000 results! (microsoft had 664000000)

$ ./google_fight.py "linux ubuntu" "linux gentoo"
linux ubuntu wins with 4300000 results! (linux gentoo had 863000)

After I wrote this, I generalized this Google Fight to take N keywords, and made their passing to program easier by allowing them to be separated by a comma.

import sys
from operator import itemgetter
from xgoogle.search import GoogleSearch, SearchError

args = sys.argv[1:]
if not args:
  print "Usage: google_fight.py keyword one, keyword two, ..."
  sys.exit(1)

keywords = [k.strip() for k in ' '.join(args).split(',')]
try:
  results = [(k, GoogleSearch('"%s"' % k).num_results) for k in keywords]
except SearchError, e:
  print "Google search failed: %s" % e
  sys.exit(1)

results.sort(key=itemgetter(1), reverse=True)
for res in results:
    print "%s: %d" % res

Download: google_fight2.py
Downloaded: 2893 times.
Download url: http://www.catonmat.net/download/google_fight2.py

Here is an example usage of google_fight2.py:

$ ./google_fight2.py earth atmospehere, sun atmosphere, moon atmosphere, jupiter atmosphere
earth atmospehere: 685000
jupiter atmosphere: 31400
sun atmosphere: 24900
moon atmosphere: 8130

I am going to expand on this library and add search for Google Sets, Google Sponsored Links, Google Suggest, and perhaps some other Google searches. Then I'm going to build various tools on them, like a sponsored links competitor finder, use Google Suggest together with Google Sets to find various phrases in English, and apply them to tens of other my ideas.

Download "xgoogle" library and examples:

Download: xgoogle library (.zip)
Downloaded: 18896 times.
Download url: http://www.catonmat.net/download/xgoogle.zip

Download: google_fight.py
Downloaded: 3346 times.
Download url: http://www.catonmat.net/download/google_fight.py

Download: google_fight2.py
Downloaded: 2893 times.
Download url: http://www.catonmat.net/download/google_fight2.py

This article is part of the article series "MIT Introduction to Algorithms."
<- previous article next article ->
MIT Algorithms

This is the thirteenth post in an article series about MIT's lecture course "Introduction to Algorithms." In this post I will review lectures twenty and twenty-one on parallel algorithms. These lectures cover the basics of multithreaded programming and multithreaded algorithms.

Lecture twenty begins with a good overview of multithreaded programming paradigm, introduces to various concepts of parallel programming and at the end talks about the Cilk programming language.

Lecture twenty-one implements several multithreaded algorithms, such as nxn matrix addition, nxn matrix multiplication, and parallel merge-sort. It then goes into great detail to analyze the running time and parallelism of these algorithms.

Lecture 20: Parallel Algorithms I

The lecture starts with an introduction to a particular model of parallel computation called dynamic multithreading. It's easiest to understand this model with an example. Here is a multithreaded algorithm that computes Fibonacci numbers:

Fibonacci(n):
1   if n < 2:                     | thread A
2     return n                    | thread A
3   x = spawn Fibonacci(n-1)      | thread A
4   y = spawn Fibonacci(n-2)      | thread B
5   sync                          | thread C
6   return x + y                  | thread C

# this is a really bad algorithm, because it's
# exponential time.
#
# see lecture three for four different
# classical algorithms that compute Fibonacci numbers

(A thread is defined to be a maximal sequence of instructions not containing the parallel control statements spawn, sync, and return, not something like Java threads or Posix threads.)

I put the most important keywords in bold. They are "spawn" and "sync".

The keyword spawn is the parallel analog of an ordinary subroutine call. Spawn before the subroutine call in line 3 indicates that the subprocedure Fibonacci(n-1) can execute in parallel with the procedure Fibonacci(n) itself. Unlike an ordinary function call, where the parent is not resumed until after its child returns, in the case of a spawn, the parent can continue to execute in parallel with the child. In this case, the parent goes on to spawn Fibonacci(n-2). In general, the parent can continue to spawn off children, producing a high degree of parallelism.

A procedure cannot safely use the return values of the children it has spawned until it executes a sync statement. If any of its children have not completed when it executes a sync, the procedure suspends and does not resume until all of its children have completed. When all of its children return, execution of the procedure resumes at the point immediately following the sync statement. In the Fibonacci example, the sync statement in line 5 is required before the return statement in line 6 to avoid the anomaly that would occur if x and y were summed before each had been computed.

The spawn and sync keywords specify logical parallelism, not "actual" parallelism. That is, these keywords indicate which code may possibly execute in parallel, but what actually runs in parallel is determined by a scheduler, which maps the dynamically unfolding computation onto the available processors.

The lecture then continues with a discussion on how a parallel instruction stream can be viewed as a directed acyclic graph G = (V, E), where threads make up the set V of vertices, and each spawn and return statement makes up an edge in E. There are also continuation edges in E that exist between statements.

In Fibonacci example there are three threads: lines 1, 2, 3 make up the thread A, line 4 is thread B and lines 5, 6 are thread C.

Here is how a graph of Fibonacci(4) computation looks like:

Multithreading Directed Acyclic Graph

A dag representing multithreaded computation of Fibonacci(4).

Here the threads are show as circles, and each group of threads belonging to the same procedure are surrounded by a rounded rectangle. Downward edges are spawn dependencies, horizontal edges represent continuation dependencies within a procedure, and upward edges are return dependencies. Every computation starts with a single initial thread, and ends with a single final thread.

So to compute Fib(4), first Fib(3) and Fib(2) have to be computed. Fib(4) spawns one thread to compute Fib(3) and another thread to compute Fib(2) and sync'es. In order to compute Fib(3), values of Fib(2) and Fib(1) have to be known, and to compute Fib(2) values of Fib(1) and Fib(0) have to be known. Fib(3) and Fib(2) spawn their threads and sync. This way the computation unwinds.

Next, the lecture introduces some concepts for measuring performance of multithreaded algorithms.

First some time measures:

  • Tp - running time of an algorithm on p processors,
  • T1 - running time of algorithm on 1 processor, and
  • T as critical path length, that is the longest time to execute the algorithm on infinite number of processors.

Based on these measures, the speedup of a computation on p processors is T1/Tp, that is how much faster a p processor machine executes the algorithm than a one processor machine. If the speedup is O(p), then we say that the computation has a linear speedup. If, however, speedup > p, then we call it a super-linear speedup. The maximum possible speedup is called parallelism and is computed by T1/T.

After these concepts the lecture puts forward the concept of scheduling and greedy scheduler.

A scheduler maps computation to p processors.

The greedy scheduler schedules as much as it can at every time step. On a p-processor machine every step can be classified into two types. If there are p or more threads ready to execute, the scheduler executes any p of them (this step is also called a complete step). If there are fewer than p threads, it executes all of them (called incomplete step). The greedy scheduler is an offline scheduler and it's 2-competitive (see lecture 13 on online/offline algorithms and the meaning of competitive algorithm).

The lecture ends with an introduction to Cilk parallel, multithreaded programming language. It uses a randomized online scheduler. It was used to write two chess programs *Socrates and Cilkchess.

See Charles Leiseron's handout for a much more comprehensive overview of dynamic multithreading: A Minicourse on Dynamic Multithreaded Algorithms (pdf).

You're welcome to watch lecture twenty:

Topics covered in lecture twenty:

  • [01:33] Parallel algorithms.
  • [02:33] Dynamic multi-threading.
  • [03:15] A multithreaded algorithm for fibonnaci numbers.
  • [04:50] Explanation of spawn and sync.
  • [06:45] Logical parallelism.
  • [08:00] Multithreaded computation.
  • [12:15] Fibonacci(4) computation graph, and its concepts - init thread, spawn edge, continuation edge, return edge, final thread.
  • [17:45] Edges: spawn, return, continuation.
  • [19:40] Performance measures: running time on p processors Tp, work (serial time) T1, critcial path length (longest path) Tinfinity.
  • [24:15] Lower bounds on Tp (running time on p processors).
  • [29:35] Speedup, linear speedup, superlinear speedup.
  • [32:55] Maximum possible speedup.
  • [36:40] Scheduling.
  • [38:55] Greedy scheduler.
  • [43:40] Theorem by Grant and Brent (bound on Tp for greedy scheduler).
  • [46:20] Proof of Grant, Brent theorem.
  • [56:50] Corollary on greedy scheduler's linear speedup.
  • [01:02:20] Cilk programming language.
  • [01:06:00] Chess programs: *Socrates, Clikchess.
  • [01:06:30] Analysis of *Socrates running time on 512 processors.

Lecture twenty notes:

MIT Algorithms Lecture 20 Notes Thumbnail. Page 1 of 2.
Lecture 20, page 1 of 2: dynamic multithreading, multithreaded computation, performance measures, speedup, scheduling, greedy scheduler.

MIT Algorithms Lecture 20 Notes Thumbnail. Page 2 of 2.
Lecture 20, page 2 of 2: grant, brent theorem, cilk, socrates, cilkchess.

Lecture 21: Parallel Algorithms II

Lecture twenty-one is all about real multithreaded algorithms and their analysis.

It starts with a hugly important problem of multithreaded matrix multiplication: given two nxn matrices A and B, write a multithreaded algorithm Matrix-Multiply that multiplies them, producing matrix C. The natural approach is to use divide and conquer method from lecture three, and formulate the problem as follows:

Multithreaded Matrix Multiplication

Each nxn matrix can be expressed as 8 multiplications and 4 additions of (n/2)x(n/2) submatrices. So the first thing we need is Matrix-Add algorithm that adds two matrices in parallel.

Here is the parallel Matrix-Add algorithm:

Matrix-Add(C, T, n):
   // Adds matrices C and T in-place, producing C = C + T
   // n is power of 2 (for simplicity).
   if  n == 1:
     C[1, 1] = C[1, 1] + T[1, 1] 
   else:
     partition C and T into (n/2)x(n/2) submatrices
     spawn Matrix-Add(C11, T11, n/2) 
     spawn Matrix-Add(C12, T12, n/2) 
     spawn Matrix-Add(C21, T21, n/2) 
     spawn Matrix-Add(C22, T22, n/2) 
     sync 

The base case for this algorithm is when matrices are just scalars, then it just adds the only element of both matrices. In all other cases, it partitions matrices C and T into (n/2)x(n/2) matrices and spawns four subproblems.

Now we can implement the Matrix-Multiply algorithm, by doing 8 multiplications in parallel and then calling Matrix-Add to do 4 additions.

Here is the parallel Matrix-Multiply algorithm:

Matrix-Multiply(C, A, B, n):
   // Multiplies matrices A and B, storing the result in C.
   // n is power of 2 (for simplicity).
   if  n == 1:
     C[1, 1] = A[1, 1] · B[1, 1] 
   else:
     allocate a temporary matrix T[1...n, 1...n] 
     partition A, B, C, and T into (n/2)x(n/2) submatrices 
     spawn Matrix-Multiply(C12,A11,B12, n/2) 
     spawn Matrix-Multiply(C21,A21,B11, n/2) 
     spawn Matrix-Multiply(C22,A21,B12, n/2) 
     spawn Matrix-Multiply(T11,A12,B21, n/2) 
     spawn Matrix-Multiply(T12,A12,B22, n/2) 
     spawn Matrix-Multiply(T21,A22,B21, n/2) 
     spawn Matrix-Multiply(T22,A22,B22, n/2) 
     sync 
     Matrix-Add(C, T, n)

The lecture then proceeds to calculating how good the algorithm is, turns out the parallelism (ratio of time of algorithm running on a single processor T1 to running on infinite number of processors Tinf) for doing 1000x1000 matrix multiplication is 107, which the parallel algorithm is fast as hell.

Matrix-Multiply used a temporary matrix which was actually unnecessary. To achieve higher performance, it's often advantageous for an algorithm to use less space, because more space means more time. Therefore a faster algorithm can be added that multiplies matrices and adds them in-place. See lecture at 33:05 for Matrix-Multiply-Add algorithm. The parallelism of this algorithm for 1000x1000 matrix multiplication is 106, smaller than the previous one, but due to fact that no temporary matrices had to be used, it beats Matrix-Multiply algorithm in practice.

The lecture then proceeds to parallel sorting. The classical sorting algorithms were covered in lectures four and five. This lecture parallelizes the Merge-Sort algorithm.

Here is the parallel merge-sort algorithm:

Parallel-Merge-Sort(A, p, r): // sort A[p...r]
  if p < r:
    q = floor((p+r)/2)
    spawn Parallel-Merge-Sort(A, p, q)
    spawn Parallel-Merge-Sort(A, q+1, r)
    sync
    Merge(A, p, q, r) // merge A[p...q] with A[q+1...r]

Instead of recursively calling Parallel-Merge-Sort for subproblems, we spawn them, thus utilizing the power of parallelism. After doing analysis turns out that due to fact that we use the classical Merge() function, the parallelism is just lg(n). That's puny!

To solve this problem, a Parallel-Merge algorithm has to be written. The lecture continues with presenting a Parallel-Merge algorithm, and after that does the analysis of the same Parallel-Merge-Sort algorithm but with Parallel-Merge function. Turns out this algorithm has parallelism of n/lg2(n), that is much better. See lecture at 51:00 for Parallel-Merge algorithm.

The best parallel sorting algorithm know to date has the parallelism of n/lg(n).

You're welcome to watch lecture twenty-one:

Topics covered in lecture twenty-one:

  • [00:35] Multithreaded algorithms.
  • [01:32] Multithreaded matrix multiplication.
  • [02:05] Parallelizing with divide and conquer.
  • [04:30] Algorithm Matrix-Multiply, that computes C = A*B.
  • [10:13] Algorithm Matrix-Add, that computes C = C+T.
  • [12:45] Analysis of running time of Matrix-Multiply and Matrix-Add.
  • [19:40] Analysis of critical path length of Matrix-Multiply and Matrix-Add.
  • [26:10] Parallelism of Matrix-Multiply and Matrix-Add.
  • [33:05] Algorithm Matrix-Multiply-Add, that computes C = C + A*B.
  • [35:50] Analysis of Matrix-Multiply-Add.
  • [42:30] Multithreaded, parallel sorting.
  • [43:30] Parallelized Merge-Sort algorithm.
  • [45:55] Analysis of multithreaded Merge-Sort.
  • [51:00] Parallel-Merge algorithm.
  • [01:00:30] Analysis of Parallel-Merge algorithm.

Lecture twenty-one notes:

MIT Algorithms Lecture 21 Notes Thumbnail. Page 1 of 2.
Lecture 21, page 1 of 2: multithreaded algorithms, matrix multiplication, analysis (critical path length, parallelism).

MIT Algorithms Lecture 21 Notes Thumbnail. Page 2 of 2.
Lecture 21, page 2 of 2: paralel sorting, merge sort, parallel merge, analysis.

Have fun with parallel, multithreaded algorithms! The next post is going to be on cache oblivious algorithms! They are a class of algorithms that exploit all the various caches in modern computer architecture to make them run notably faster.

JavaScript: The Good Parts

I found a really nice video lecture on JavaScript. I'm a JavaScript journeyman myself so I decided to review it to learn something new. I have written about JavaScript in the past -- the previous post on JavaScript was "Learning JavaScript through Video Lectures".

This lecture is given by JavaScript guru Doug Crockford. He's the author of JSON, JSMin JavaScript minifier and JSLint.

The talk is based on Douglas's recent book "JavaScript: The Good Parts". It's excellent.

The lecture begins with a brief intro of why JavaScript is the way it is and how it came to be so. An interesting point Doug makes is that JavaScript is basically Scheme, except with Java syntax. He talks about the bad and good parts of JavaScript and gives a few examples of common JavaScript problems, their solutions and patterns. Douglas also talks about how he discovered JSON. After the talk there is a 13 minute Q and A.

You're welcome to watch the video lecture. It's 1 hour long and it will definitely make your understanding of JavaScript better:

Here are some interesting points from the lecture:

[09:57] JavaScript was influenced by Scheme, Java and Perl. From Scheme it borrowed lambda functions and loose typing. From Java it took most of the syntax, and from Perl some of its regular expressions. And finally, it derived the idea of prototypal inheritance and dynamic objects from Self. See my previous post on JavaScript for explanation of prototypal inheritance.

[11:38] JavaScript bad parts:

  • Global variables. Global variables make it harder to run independent subprograms in the same program. If the subprograms happen to have global variables that share the same names, then they will interfere with each other and likely fail, usually in difficult to diagnose ways.
  • Newlines get converted into semicolons. JavaScript has a mechanism that tries to correct faulty programs by automatically inserting semicolons. It sometimes inserts semicolons in places where they are not welcome. For example:
    return
    {
        status: true
    };
    
    returns undefined because JavaScript inserts ';' after return.Correct way:
    return {
        status: true
    };
    

  • Operator typeof is not very helpful. For example, "typeof null" is "object", "typeof [1,2,3]" is also "object".
  • Operator + adds numbers and concatenates. The + operator can add or concatenate. Which one it does depends on the types of the parameters. If either operand is an empty string, it produces the other operand converted to a string. If both operands are numbers, it produces the sum. Otherwise, it converts both operands to strings and concatenates them. This complicated behavior is a common source of bugs. If you intend + to add, make sure that both operands are numbers.
  • Operators == and != do type coercion. Use of the === and !== operators is always preferred.
  • Too many falsy values. 0, Nan, '' (empty string), false, null, undefined are all false.

[17:25] for ... in operator mixes inherited functions with the desired data members (it does deep reflection). Use object.hasOwnProperty to filter out names that belong to object itself:

for (name in object) {
    if (object.hasOwnProperty(name)) {
        ...
    }
}

[22:00] Javascript good parts:

  • Lambda. Enough said.
  • Dynamic objects. Just add a property to an object, or remove it, no need for classes and derivation to create a similar object.
  • Loose Typing. No need for type declarations.
  • Object Literals. {} for object literals, [] for array literals and // for regexp literals.

[23:00] Two schools of inheritance - classical and prototypal. Prototypal inheritance means that objects can inherit properties directly from other objects. The language is class-free.

[24:35] Realization of prototypal inheritance in JavaScript:

if (typeof Object.create !== 'function') {
    Object.create = function (o) {
        function F() {}
        F.prototype = o;
        return new F();
    };
}

Now a newObject can be created by inheriting from oldObject:

newObject = Object.create(oldObject);

[26:05] Example of global variables and why they are bad and how to solve the problem by using closures.

var my_thing = function () {
    var names = ["zero", "one", ... ];
    return function(n) {
        return names[n];
    };
}();

[29:00] There are four ways to create a new object in JavaScript:

  • Object literal -- var object = { }.
  • New -- var object = new Object()
  • Object.create -- var object = Object.create(old_object).
  • Call another constructor (use different inheritance model).

[42:42] JSLint. JSLint defines a professional subset of JavaScript. JSLint will hurt your feelings.

[52:00] Q/A: Does strict mode change the behavior or does it take things out? -- Can't have "with" in strict mode, changes the way eval() works, changes error handling.

[53:00] Q/A: What's going on with DOM? -- Ajax libraries fix DOM, these changes should be propagated back into DOM API.

[55:30] Q/A: How do you spell lambda in JavaScript? -- function.

[55:54] Q/A: How to solve global variable problem? -- Each of compilation unit should be isolated, but there should be a way how they can introduce (link) to each other.

[56:30] Q/A: How do JavaScript objects differ from hash tables, they seem the same to me?

[57:23] Q/A: What's wrong with HTML 5 and web apps? -- They are doing too much and there are way too many of them.

[59:10] Q/A: How come JSON and JavaScript have almost the same syntax? -- Doug admits he forgot to include Unicode Line Separator (LS) and Paragraph Separator (PS) control codes as whitespace chars.

[01:00:32] Q/A: Why does JSON require quotes around the property names? -- Three reasons: 1. Wanted to align with Python where quotes are required, 2. Wanted to make the grammar of standard simpler, 3. JavaScript has stupid reserved word policy.

[01:02:40] Q/A: Are there any prospects for adding concurrency to the language? -- Definitely no threads. Could be some kind of a messaging model.

If you liked this talk, I recommend that you get Doug's book:

Happy javascripting! ;)