Holy smokes! It has now been two years since I started this blog. It seems almost like yesterday when I posted the "A Year of Blogging" article. And now it's two! With this post I'd like to celebrate the 2nd birthday and share various interesting statistics that I managed to gather.
During this year (July 20, 2008 to July 26, 2009) I wrote 55 posts, which received around 1000 comments. According to StatCounter and Google Analytics my blog was visited by 1,050,000 unique people who viewed 1,700,000 pages. Wow, 1 million visitors! That's very impressive!
Here is a Google Analytics graph of monthly page views for the last year (click for a larger version):
In the last three months I did not manage to write much and you can see how that reflected on the page views. A good lesson to be learned is to be persistent and keep writing articles consistently.
Here is the same graph with two years of data, showing a complete picture of my blog's growth:
I like this seemingly linear growth. I hope it continues the same way the next year!
Here are the top 5 referring sites that my visitors came from:
 reddit (292,147 visitors).
 stumbleupon (69,575 visitors).
 hacker news (47,595 visitors).
 delicious (27,109 visitors).
 dzone (15,898 visitors).
And here are the top 5 referring blogs:
 Scott Klarr's blog (7,848 visitors).
 My Free Science Online blog (6,284 visitors).
 Eric Wendelin's blog (1,693 visitors).
 Andy Lester's PerlBuzz blog (1,356 visitors).
 Jurgen Appelo's blog (1,001 visitors)
I found that just a handful of blogs had linked to me during this year. The main reason, I suspect, is that I do not link out much myself... It's something to improve upon.
If you remember, I ended the last year's post with the following words (I had only 1000 subscribers at that time):
I am setting myself a goal of reaching 5000 subscribers by the end of the next year of blogging (July 2009)! I know that this is very ambitious goal but I am ready to take the challenge!
I can proudly say that I reached my ambitious goal! My blog now has almost 7000 subscribers! If you have not yet subscribed, click here to do it!
Here is the RSS subscriber graph for the whole two years:
Several months ago I approximated the subscriber data with an exponent function and it produced a good fit. Probably if I had continued writing articles at the same pace I did three months ago, I'd have over 10,000 subscribers now.
Anyway, let's now turn to the top 10 most viewed posts:
 1. My job interview experience at Google (144,400 views).
 2. A Unix Utility You Should Know About: Pipe Viewer (124,570 views)
 3. Code Reuse in Google Chrome Browser (115,750 views).
 4. Famous Awk One-Liners Explained, Part I (87,721 views).
 5. MIT Introduction to Algorithms, Part I: Analysis of Algorithms (79,536 views).
 6. A Unix Utility You Should Know About: Netcat (51,354 views).
 7. Famous Sed One-Liners Explained, Part I (49,068 views).
 8. Vim Plugins You Should Know About, Part I: surround.vim (41,388 views).
 9. 10 Awk Tips, Tricks and Pitfalls (29,689 views).
 10. Low Level Bit Hacks You Absolutely Must Know (22,916 views).
The article that I liked the most myself, but which didn't make it to the top ten, was "Set Operations in Unix Shell". I just love the Unix stuff I did there.
I am also very proud of the following three article series that I wrote:
 1. Review of MIT's Introduction to Algorithms course (14 parts).
 2. Famous Awk One-Liners Explained (4 parts: 1, 2, 3, 4).
 3. Famous Sed One-Liners Explained (3 parts: 1, 2, 3).
Finally, here is a list of ideas that I have thought of for the third year of blogging:
 Publish three ebooks on Awk One-Liners, Sed One-Liners and Perl One-Liners.
 Launch mathematics, physics and general science blog.
 Write about mathematical foundations of cryptography and try to implement various cryptosystems and cryptography protocols.
 Publish my review of MIT's Linear Algebra course (in math blog, so the main topic of catonmat stays computing).
 Publish my review of MIT's Physics courses on Mechanics, Electromagnetism, and Waves (in physics blog).
 Publish my notes on how I learned the C++ language.
 Write more about computer security and ethical hacking.
 Write several book reviews.
 Create a bunch of various fun utilities and programs.
 Create at least one useful web project.
 Add a knowledge database to catonmat, create software to allow easy publishing to it.
 If time allows, publish reviews of important computer science publications.
I'll document everything here as I go, so if you are interested in these topics stay with me by subscribing to my rss feed!
And to make things more challenging again, I am setting a new goal for the next year of blogging. The goal is to reach 20,000 subscribers by July 2010!
Hope to see you all on my blog again! Now it's time for this delicious cake:
MIT's Introduction to Algorithms, Lectures 22 and 23: Cache Oblivious Algorithms
This is a happy and sad moment at the same time - I have finally reached the last two lectures of MIT's undergraduate algorithms course. These last two lectures are on a fairly new area of algorithm research called "cache-oblivious algorithms."
Cache-oblivious algorithms take into account something that has been ignored in all the lectures so far, namely the multilevel memory hierarchy of modern computers. Retrieving items from the various levels of memory and cache makes up a dominant factor of the running time, so for speed it is crucial to minimize these costs. The main idea of cache-oblivious algorithms is to achieve optimal use of caches on all levels of a memory hierarchy without knowledge of their size.
Cache-oblivious algorithms should not be confused with cache-aware algorithms. Cache-aware algorithms and data structures explicitly depend on various hardware configuration parameters, such as the cache size. Cache-oblivious algorithms do not depend on any hardware parameters. An example of a cache-aware (not cache-oblivious) data structure is a B-Tree, which has the explicit parameter B, the size of a node. The main disadvantage of cache-aware algorithms is that they are based on knowledge of the memory structure and size, which makes it difficult to move implementations from one architecture to another. Another problem is that it is very difficult, if not impossible, to adapt some of these algorithms to work with multiple levels in the memory hierarchy. Cache-oblivious algorithms solve both problems.
Lecture twenty-two introduces the terminology and notation used in cache-oblivious algorithms, explains the difference between cache-oblivious and cache-aware algorithms, does a simple memory analysis of several simple algorithms and culminates with a cache-oblivious algorithm for matrix multiplication.
The final lecture, twenty-three, is the most difficult in the whole course. It shows cache-oblivious binary search trees and a cache-oblivious sorting algorithm called funnel sort.
Use this supplementary reading material by professor Demaine to understand the material better: Cache-oblivious algorithms and data structures (.pdf).
Lecture 22: Cache Oblivious Algorithms I
Lecture twenty-two starts with an introduction to the modern memory hierarchy (CPU cache L1, L2, L3, main memory, disk cache, etc.) and with the notation and core concepts used in cache-oblivious algorithms.
A powerful result in cache-oblivious algorithm design is that if an algorithm is efficient on two levels of cache, then it's efficient on any number of levels. Thus the study of cache-obliviousness can be simplified to a two-level memory hierarchy, say the CPU cache and main memory, where accesses to the cache are treated as instant but accesses to main memory are orders of magnitude slower. Therefore the main question cache-oblivious algorithm analysis tries to address is how many memory transfers (MTs) a problem of size N takes. The notation used for this is MT(N). For an algorithm to be efficient, the number of memory transfers should be as small as possible.
Next the lecture analyzes the number of memory transfers for basic array scanning and array reverse algorithms. Since array scanning is sequential, N elements can be processed with O(N/B) accesses, where B is the block size - the number of elements that are fetched together whenever a single element is accessed. That is, MT(N) = O(N/B) for array scanning. The same bound holds for reversing an array, since it can be viewed as two scans - one from the beginning and one from the end.
Next it's shown that the classical binary search (covered in lecture 3) is not cache efficient, but the order statistics algorithm (covered in lecture 6) is cache efficient.
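To make the MT(N) counts for scanning, reversing and binary search a bit more concrete, here is a minimal Python 3 sketch that counts block transfers for a sequence of array accesses. It is only an illustration and not code from the lecture: the function names are made up for this example, the cache is modeled with LRU replacement (an assumption; the lecture's model is more idealized), and the values of N, B and the number of cache blocks are arbitrary.

```python
from collections import OrderedDict

def count_transfers(accesses, block_size, cache_blocks):
    """Count how many blocks must be loaded for a sequence of element indices.

    The cache holds `cache_blocks` blocks of `block_size` elements each and
    evicts the least recently used block (an assumption of this sketch).
    """
    cache = OrderedDict()
    transfers = 0
    for index in accesses:
        block = index // block_size
        if block in cache:
            cache.move_to_end(block)       # mark block as recently used
        else:
            transfers += 1                 # block has to be fetched
            cache[block] = True
            if len(cache) > cache_blocks:
                cache.popitem(last=False)  # evict least recently used block
    return transfers

def scan_accesses(n):
    """Indices touched by a left-to-right scan."""
    return range(n)

def reverse_accesses(n):
    """Indices touched when reversing in place: swap a[i] with a[n-1-i]."""
    for i in range(n // 2):
        yield i
        yield n - 1 - i

def binary_search_accesses(n, target):
    """Indices probed by classical binary search in the sorted array 0..n-1."""
    lo, hi = 0, n - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        yield mid
        if mid == target:
            return
        if mid < target:
            lo = mid + 1
        else:
            hi = mid - 1

print(count_transfers(scan_accesses(1000), 10, 4))     # N/B blocks
print(count_transfers(reverse_accesses(1000), 10, 4))  # also N/B blocks
print(count_transfers(binary_search_accesses(65536, 0), 16, 4))
```

In this model the scan and the reverse both load each block exactly once, while binary search pays roughly one transfer per probe until the search interval shrinks to a single block.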
Finally the lecture describes a cache efficient way to multiply matrices by storing them blockwise in memory.
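As a rough illustration of the divide-and-conquer idea behind cache-oblivious matrix multiplication, here is a Python 3 sketch that multiplies square matrices by recursing on quadrants. It is a sketch under assumptions, not the lecture's code: the names are hypothetical, the matrix size is assumed to be a power of two, and Python lists of lists do not give control over the block-wise memory layout that the lecture relies on, so this only shows the recursion order.

```python
def matmul_recursive(a, b, c, ai, aj, bi, bj, ci, cj, n):
    """Add the product of two n x n sub-matrices of a and b into c.

    (ai, aj), (bi, bj) and (ci, cj) are the top-left corners of the
    sub-matrices; n is assumed to be a power of two.
    """
    if n == 1:
        c[ci][cj] += a[ai][aj] * b[bi][bj]
        return
    h = n // 2
    # C[i][j] += A[i][k] * B[k][j] over the 2 x 2 grid of quadrants.
    for di in (0, h):
        for dj in (0, h):
            for dk in (0, h):
                matmul_recursive(a, b, c,
                                 ai + di, aj + dk,
                                 bi + dk, bj + dj,
                                 ci + di, cj + dj, h)

def matmul(a, b):
    """Multiply two square matrices whose size is a power of two."""
    n = len(a)
    c = [[0] * n for _ in range(n)]
    matmul_recursive(a, b, c, 0, 0, 0, 0, 0, 0, n)
    return c

print(matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
```

The point of the recursion is that at some depth the sub-problems become small enough to fit in cache, whatever the cache size happens to be.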
You're welcome to watch lecture twenty-two:
Topics covered in lecture twenty-two:
 [00:10] Introduction and history of cache-oblivious algorithms.
 [02:00] Modern memory hierarchy in computers: Caches L1, L2, L3, main memory, disk cache.
 [06:15] Formula for calculating the cost to access a block of memory.
 [08:18] Amortized cost to access one element in memory.
 [11:00] Spatial and temporal locality of algorithms.
 [13:45] Two-level memory model.
 [16:30] Notation: total cache size M, block size B, number of blocks M/B.
 [20:40] Notation: MT(N) - number of memory transfers of a problem of size N.
 [21:45] Cache-aware algorithms.
 [22:50] Cache-oblivious algorithms.
 [28:35] Blocking of memory.
 [32:45] Cache-oblivious scanning algorithm (visitor pattern).
 [36:20] Cache-oblivious ArrayReverse algorithm.
 [39:05] Memory transfers in classical binary search algorithm.
 [43:45] Divide and conquer algorithms.
 [45:50] Analysis of memory transfers in order statistics algorithm.
 [01:00:50] Analysis of classical matrix multiplication (with row major, column major memory layout).
 [01:07:30] Cache-oblivious matrix multiplication.
Lecture twenty-two notes:
Lecture 23: Cache Oblivious Algorithms II
This was probably the most complicated lecture in the whole course. The whole lecture is devoted to two subjects - cache-oblivious search trees and cache-oblivious sorting.
While it's relatively easy to understand the design of the cache-oblivious way of storing search trees in memory, it's amazingly difficult to understand the cache-efficient sorting. It's called funnel sort, which is basically an n-way merge sort (covered in lecture 1) with a special cache-oblivious merging function called k-funnel.
You're welcome to watch lecture twenty-three:
Topics covered in lecture twenty-three:
 [01:00] Cache-oblivious static search trees (binary search trees).
 [09:35] Analysis of static search trees.
 [18:15] Cache-aware sorting.
 [19:00] Sorting by repeated insertion in binary tree.
 [21:40] Sorting by binary merge sort.
 [31:20] Sorting by N-way mergesort.
 [36:20] Sorting bound for cache-oblivious sorting algorithms.
 [38:30] Cache-oblivious sorting.
 [41:40] Definition of K-Funnel (cache-oblivious merging).
 [43:35] Funnel sort.
 [54:05] Construction of K-Funnel.
 [01:03:10] How to fill buffer in k-funnel.
 [01:07:30] Analysis of fill buffer.
Lecture twenty-three notes:


Have fun with the cache oblivious algorithms! I'll do a few more posts that will summarize all these lectures and highlight key ideas.
If you loved this, please subscribe to my blog!
On the Linear Time Algorithm For Finding Fibonacci Numbers
In this article I'd like to show how the theory does not always match the practice. I am sure you all know the linear time algorithm for finding Fibonacci numbers. The analysis says that the running time of this algorithm is O(n). But is it still O(n) if we actually run it? If not, what is wrong?
Let's start with the simplest linear time implementation of the Fibonacci number generating algorithm in Python:
def LinearFibonacci(n):
    fn = f1 = f2 = 1
    for x in xrange(2, n):
        fn = f1 + f2
        f2, f1 = f1, fn
    return fn

The theory says that this algorithm should run in O(n) - given the nth Fibonacci number to find, the algorithm does a single loop up to n.
Now let's verify if this algorithm is really linear in practice. If it's linear then the plot of n vs. running time of LinearFibonacci(n) should be a line. I plotted these values for n up to 200,000 and here is the plot that I got:
Note: Each data point was averaged over 10 calculations.
Oh no! This does not look linear at all! It looks quadratic! I fitted the data with a quadratic function and it fit nearly perfectly. Do you know why the seemingly linear algorithm went quadratic?
The answer is that the theoretical analysis assumed that all the operations in the algorithm executed in constant time. But this is not the case when we run the algorithm on a real machine! As the Fibonacci numbers get larger, each addition operation for calculating the next Fibonacci number, "fn = f1 + f2", runs in time proportional to the length of the previous Fibonacci number. It's because these huge numbers no longer fit in the basic units of computation in the CPU, so a big integer library is required. The addition of two numbers of length O(n) in a big integer library takes O(n) time.
I'll show you that the running time of the real-life linear Fibonacci algorithm really is O(n^2) by taking into account this hidden cost of the bigint library.
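So at each iteration i we have a hidden cost of O(number of digits of f_{i}) = O(digits(f_{i})). Let's sum these hidden costs for the whole loop up to n:
$$\sum_{i=1}^{n} O(\mathrm{digits}(f_{i}))$$
Now let's find the number of digits in the nth Fibonacci number. To do that let's use the well-known Binet's formula, which tells us that the nth Fibonacci number f_{n} can be expressed as:
$$f_{n} = \frac{\varphi^{n} - (1-\varphi)^{n}}{\sqrt{5}}, \qquad \varphi = \frac{1+\sqrt{5}}{2}$$
It is also well-known that the number of digits in a number is the integer part of log_{10}(number) + 1. Since |1 - \varphi| < 1, the second term in Binet's formula vanishes as n grows. Thus the number of digits in the nth Fibonacci number is:
$$\mathrm{digits}(f_{n}) = \lfloor \log_{10} f_{n} \rfloor + 1 \approx n \log_{10} \varphi - \log_{10} \sqrt{5} + 1 = O(n)$$
Thus if we now sum all the hidden costs for finding the nth Fibonacci number we get:
$$\sum_{i=1}^{n} O(\mathrm{digits}(f_{i})) = \sum_{i=1}^{n} O(i) = O\left(\frac{n(n+1)}{2}\right) = O(n^{2})$$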
There we have it. The running time of this "linear" algorithm is actually quadratic if we take into consideration that each addition operation runs proportionally to the length of addends.
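The hidden cost can also be checked without a stopwatch. The following Python 3 sketch (the function name is made up for this example) runs the same loop and adds up the bit lengths of the numbers being added, on the assumption that the work done by a big integer addition is proportional to the operand length. It counts bits rather than decimal digits, which only changes the constant factor.

```python
def hidden_addition_cost(n):
    """Return (f_n, total operand bits added) for the linear Fibonacci loop."""
    total_bits = 0
    fn = f1 = f2 = 1
    for _ in range(2, n):
        total_bits += f1.bit_length()  # length of the larger operand
        fn = f1 + f2
        f2, f1 = f1, fn
    return fn, total_bits

for n in (1000, 2000, 4000):
    _, cost = hidden_addition_cost(n)
    print(n, cost)
```

If the total were linear in n, doubling n would double the cost. Because the operand length itself grows linearly with the index, doubling n roughly quadruples the total, which is the quadratic behaviour seen in the plot.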
Next time I'll show you that if the addition operation runs in constant time, then the algorithm is truly linear; and later I will do a similar analysis of the logarithmic time algorithm for finding Fibonacci numbers that uses this awesome matrix identity:
$$\begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix}^{n} = \begin{pmatrix} f_{n+1} & f_{n} \\ f_{n} & f_{n-1} \end{pmatrix}$$
Don't forget to subscribe if you are interested! It's well worth every byte!
I decided to write an article about a thing that is second nature to embedded systems programmers - low level bit hacks. Bit hacks are ingenious little programming tricks that manipulate integers in a smart and efficient manner. Instead of performing some operation (such as counting the 1 bits in an integer) by looping over individual bits, these programming nuggets do the same with one or two carefully chosen bitwise operations.
To get things going I'll assume that you know what the two's complement binary representation of an integer is and also that you know all the bitwise operations.
I'll use the following notation for bitwise operations in the article:
 &   -  bitwise and
 |   -  bitwise or
 ^   -  bitwise xor
 ~   -  bitwise not
 <<  -  bitwise shift left
 >>  -  bitwise shift right
The numbers in the article are 8 bit signed integers (though the operations work on arbitrary length signed integers) that are represented as two's complement and they are usually named 'x'. The result is usually 'y'. The individual bits of 'x' are named b7, b6, b5, b4, b3, b2, b1 and b0. The bit b7 is the sign bit (the most significant bit), and b0 is the least significant.
I'll start with the most basic bit hacks and gradually progress to more difficult ones. I'll use examples to explain how each bithack works.
If you are intrigued by this topic I urge you to subscribe to my blog. I can share a secret that there will be the 2nd part of this article where I cover more advanced bit hacks, and I will also release a cheat sheet with all these bit tricks! It's well worth subscribing!
Here we go.
Bit Hack #1. Check if the integer is even or odd.
if ((x & 1) == 0) {
  x is even
}
else {
  x is odd
}

I am pretty sure everyone has seen this trick. The idea here is that an integer is odd if and only if the least significant bit b0 is 1. It follows from the binary representation of 'x', where bit b0 contributes either 1 or 0. By ANDing 'x' with 1 we eliminate all bits other than b0. If the result after this operation is 0, then 'x' was even because bit b0 was 0. Otherwise 'x' was odd.
Let's look at some examples. Let's take integer 43, which is odd. In binary 43 is 00101011. Notice that the least significant bit b0 is 1. Now let's AND it with 1:

    00101011
&   00000001   (note: 1 is the same as 00000001)
    --------
    00000001

See how ANDing erased all the higher order bits b1-b7 but left bit b0 the same as it was? The result is thus 1 which tells us that the integer was odd.
Now let's look at -43. Just as a reminder, a quick way to find the negative of a given number in two's complement representation is to invert all bits and add one. So -43 is 11010101 in binary. Again notice that the last bit is 1, and the integer is odd. (Note that if we used one's complement it wouldn't be true!)
Now let's take a look at an even integer 98. In binary 98 is 1100010.

    01100010
&   00000001
    --------
    00000000

After ANDing the result is 0. It means that the bit b0 of the original integer 98 was 0. Thus the given integer is even.
Now the negative, -98. It's 10011110. Again, bit b0 is 0, so after ANDing the result is 0, meaning -98 is even, which indeed is true.
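Here is a small Python 3 sketch of this check on the four numbers used above. The helper bin8 is a name made up for this example; Python integers are not limited to 8 bits, so it masks the value to its low 8 bits to show the same two's complement patterns as in the text.

```python
def bin8(x):
    """Show the low 8 bits of x as a two's complement bit string."""
    return format(x & 0xFF, "08b")

def is_even(x):
    """An integer is even exactly when its lowest bit b0 is 0."""
    return (x & 1) == 0

for value in (43, -43, 98, -98):
    print(value, bin8(value), "even" if is_even(value) else "odd")
```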
Bit Hack #2. Test if the nth bit is set.
if (x & (1<<n)) {
  nth bit is set
}
else {
  nth bit is not set
}

In the previous bit hack we saw that (x & 1) tests if the first bit is set. This bit hack generalizes this result and tests if the nth bit is set. It does it by shifting that first 1-bit n positions to the left and then doing the same AND operation, which eliminates all bits but the nth.
Here is what happens if you shift 1 several positions to the left:

1         00000001    (same as 1<<0)
1<<1      00000010
1<<2      00000100
1<<3      00001000
1<<4      00010000
1<<5      00100000
1<<6      01000000
1<<7      10000000

Now if we AND 'x' with 1 shifted n positions to the left we effectively eliminate all the bits but the nth bit in 'x'. If the result after ANDing is 0, then that bit must have been 0, otherwise that bit was set.
Let's look at some examples.
Does 122 have the 3rd bit set? The operation we do to find it out is:

122 & (1<<3)

Now, 122 is 01111010 in binary. And (1<<3) is 00001000.

    01111010
&   00001000
    --------
    00001000

We see that the result is not 0, so yes, 122 has the 3rd bit set.
Note: In my article bit numbering starts with 0. So it's 0th bit, 1st bit, ..., 7th bit.
What about -33? Does it have the 5th bit set?

    11011111      (-33 in binary)
&   00100000     (1<<5)
    --------
    00000000

Result is 0, so the 5th bit is not set.
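The same test can be sketched in Python 3 as follows. The function names are made up for this example, and bin8 only masks to 8 bits for display, since Python integers are unbounded.

```python
def bin8(x):
    """Show the low 8 bits of x as a two's complement bit string."""
    return format(x & 0xFF, "08b")

def is_bit_set(x, n):
    """Return True if bit n of x is 1 (bits are numbered from 0)."""
    return (x & (1 << n)) != 0

print(bin8(122), is_bit_set(122, 3))
print(bin8(-33), is_bit_set(-33, 5))
```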
Bit Hack #3. Set the nth bit.
y = x | (1<<n)

This bit hack combines the same (1<<n) trick of setting the nth bit by shifting with the OR operation. The result of ORing a variable with a value that has the nth bit set is turning that nth bit on. It's because ORing any value with 0 leaves the value the same; but ORing it with 1 changes it to 1 (if it wasn't already). Let's see how that works in action:
Suppose we have value 120, and we wish to turn on the 2nd bit.

    01111000    (120 in binary)
|   00000100    (1<<2)
    --------
    01111100

What about -120 and the 6th bit?

    10001000   (-120 in binary)
|   01000000   (1<<6)
    --------
    11001000
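A short Python 3 sketch of setting a bit, using the two values from the examples above (set_bit and bin8 are names chosen for this illustration):

```python
def bin8(x):
    """Show the low 8 bits of x as a two's complement bit string."""
    return format(x & 0xFF, "08b")

def set_bit(x, n):
    """Return x with bit n turned on."""
    return x | (1 << n)

print(bin8(120), "->", bin8(set_bit(120, 2)))
print(bin8(-120), "->", bin8(set_bit(-120, 6)))
```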
Bit Hack #4. Unset the nth bit.
y = x & ~(1<<n)
The important part of this bithack is the ~(1<<n) trick. It turns on all the bits except the nth.
Here is how it looks:

~1        11111110  (same as ~(1<<0))
~(1<<1)   11111101
~(1<<2)   11111011
~(1<<3)   11110111
~(1<<4)   11101111
~(1<<5)   11011111
~(1<<6)   10111111
~(1<<7)   01111111

The effect of ANDing variable 'x' with this quantity is eliminating the nth bit. It does not matter if the nth bit was 0 or 1, ANDing it with 0 sets it to 0.
Here is an example. Let's unset the 4th bit in 127:

    01111111    (127 in binary)
&   11101111    (~(1<<4))
    --------
    01101111
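In Python 3 the same operation might look like this (clear_bit and bin8 are hypothetical helper names; the mask in bin8 is only there to display 8 bits):

```python
def bin8(x):
    """Show the low 8 bits of x as a two's complement bit string."""
    return format(x & 0xFF, "08b")

def clear_bit(x, n):
    """Return x with bit n turned off."""
    return x & ~(1 << n)

print(bin8(~(1 << 4)))                         # the mask with only bit 4 off
print(bin8(127), "->", bin8(clear_bit(127, 4)))
```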
Bit Hack #5. Toggle the nth bit.
y = x ^ (1<<n)
This bit hack also uses the wonderful "set nth bit shift hack" but this time it XORs it with the variable 'x'. The result of XORing two bits is 0 if both bits are the same, otherwise it's 1. How does it toggle the nth bit? Well, if the nth bit was 1, then XORing it with 1 changes it to 0; conversely, if it was 0, then XORing it with 1 changes it to 1. See, the bit got flipped.
Here is an example. Suppose you want to toggle the 5th bit in value 01110101:

    01110101
^   00100000
    --------
    01010101

What about the same value but with the 5th bit originally 0?

    01010101
^   00100000
    --------
    01110101

Notice something? XORing the same bit twice returned it to the same value. This nifty XOR property is used in calculating parity in RAID arrays and in simple cryptography ciphers, but more about that in some other article.
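A minimal Python 3 sketch of the toggle, including the "toggle twice gets you back" property (toggle_bit and bin8 are names made up for this example):

```python
def bin8(x):
    """Show the low 8 bits of x as a two's complement bit string."""
    return format(x & 0xFF, "08b")

def toggle_bit(x, n):
    """Return x with bit n flipped."""
    return x ^ (1 << n)

x = 0b01110101
once = toggle_bit(x, 5)
twice = toggle_bit(once, 5)
print(bin8(x), "->", bin8(once), "->", bin8(twice))
```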
Bit Hack #6. Turn off the rightmost 1-bit.
y = x & (x-1)

Now it finally gets more interesting!!! Bit hacks #1 - #5 were kind of boring to be honest.
This bit hack turns off the rightmost one-bit. For example, given an integer 00101010 (the rightmost 1-bit is b1) it turns it into 00101000. Or given 00010000 it turns it into 0, as there is just a single 1-bit.
Here are more examples:

    01010111    (x)
&   01010110    (x-1)
    --------
    01010110

    01011000    (x)
&   01010111    (x-1)
    --------
    01010000

    10000000    (x = -128)
&   01111111    (x-1 = 127 (with overflow))
    --------
    00000000

    11111111    (x = all bits 1)
&   11111110    (x-1)
    --------
    11111110

    00000000    (x = no rightmost 1-bits)
&   11111111    (x-1)
    --------
    00000000

Why does it work?
If you look at the examples and think for a while, you'll realize that there are two possible scenarios:
1. The value has a rightmost 1-bit. In this case subtracting one from it sets all the lower bits to one and changes that rightmost 1-bit to 0 (so that if you add one now, you get the original value back). This step has masked out the rightmost 1-bit, and ANDing the result with the original value zeroes that rightmost 1-bit out.
2. The value has no rightmost 1-bit (all 0). In this case subtracting one gives -1, which in two's complement has all bits set to 1. ANDing all zeroes with all ones produces 0.
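Here is a Python 3 sketch of this hack, together with one common use of it: counting the 1-bits by clearing them one at a time. The counting loop is a well-known application and not something derived above; all names are made up for this example, and values are masked to 8 bits to match the article's integers.

```python
def bin8(x):
    """Show the low 8 bits of x as a two's complement bit string."""
    return format(x & 0xFF, "08b")

def turn_off_rightmost_one(x):
    """Return x with its rightmost 1-bit cleared."""
    return x & (x - 1)

def count_ones(x):
    """Count the 1-bits in the low 8 bits of x by clearing them one by one."""
    x &= 0xFF
    count = 0
    while x:
        x &= x - 1   # each step removes exactly one 1-bit
        count += 1
    return count

for value in (0b00101010, 0b00010000, -128, 0):
    print(bin8(value), "->", bin8(turn_off_rightmost_one(value)))
print(count_ones(0b01010111))
```

The loop in count_ones runs once per 1-bit rather than once per bit position.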
Bit Hack #7. Isolate the rightmost 1-bit.
y = x & (-x)

This bit hack finds the rightmost 1-bit and sets all the other bits to 0. The end result has only that one rightmost 1-bit set. For example, 01010100 (the rightmost 1-bit is b2) gets turned into 00000100.
Here are some more examples:

    10111100  (x)
&   01000100  (-x)
    --------
    00000100

    01110000  (x)
&   10010000  (-x)
    --------
    00010000

    00000001  (x)
&   11111111  (-x)
    --------
    00000001

    10000000  (x = -128)
&   10000000  (-x = -128)
    --------
    10000000

    11111111  (x = all bits one)
&   00000001  (-x)
    --------
    00000001

    00000000  (x = all bits 0, no rightmost 1-bit)
&   00000000  (-x)
    --------
    00000000

This bit hack works because of two's complement. In the two's complement system -x is the same as ~x+1. Now let's examine the two possible cases:
1. There is a rightmost 1-bit b_{i}. In this case let's pivot on this bit and divide all other bits into two flanks - bits to the right and bits to the left. Remember that all the bits to the right b_{i-1}, b_{i-2} ... b_{0} are 0's (because b_{i} was the rightmost 1-bit). And bits to the left are the way they are. Let's call them b_{i+1}, ..., b_{n}.
Now, when we calculate -x, we first do ~x which turns bit b_{i} into 0, bits b_{i-1} ... b_{0} into 1s, and inverts bits b_{i+1}, ..., b_{n}, and then we add 1 to this result.
Since bits b_{i-1} ... b_{0} are all 1's, adding one makes them carry this one all the way to bit b_{i}, which is the first zero bit. The carry turns bits b_{i-1} ... b_{0} back into 0's and sets bit b_{i} to 1.
If we put it all together, the result of calculating -x is that bits b_{i+1}, ..., b_{n} get inverted, bit b_{i} stays the same, and bits b_{i-1}, ..., b_{0} are all 0's.
Now, ANDing x with -x makes bits b_{i+1}, ..., b_{n} all 0, leaves bit b_{i} as is, and sets bits b_{i-1}, ..., b_{0} to 0. Only one bit is left, it's the bit b_{i} - the rightmost 1-bit.
2. There is no rightmost 1-bit. The value is 0. The negative of 0 in two's complement is also 0. 0&0 = 0. No bits get turned on.
We have proved rigorously that this bithack is correct.
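The examples and the identity -x = ~x+1 can be checked with a few lines of Python 3. The function and helper names are made up for this sketch; Python integers behave like two's complement numbers of unlimited width, so bin8 masks to 8 bits for display.

```python
def bin8(x):
    """Show the low 8 bits of x as a two's complement bit string."""
    return format(x & 0xFF, "08b")

def isolate_rightmost_one(x):
    """Return a value with only the rightmost 1-bit of x set (0 if x is 0)."""
    return x & (-x)

# The two's complement identity the proof relies on.
assert all(-x == ~x + 1 for x in range(-128, 128))

for value in (0b10111100, 0b01110000, 1, -128, -1, 0):
    print(bin8(value), "->", bin8(isolate_rightmost_one(value)))
```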
Bit Hack #8. Right propagate the rightmost 1-bit.
y = x | (x-1)

This is best understood by an example. Given a value 01010000 it turns it into 01011111. All the 0-bits to the right of the rightmost 1-bit got turned into ones.
This is not a clean hack, though, as it produces all 1's if x = 0.
Let's look at more examples:

    10111100  (x)
|   10111011  (x-1)
    --------
    10111111

    01110111  (x)
|   01110110  (x-1)
    --------
    01110111

    00000001  (x)
|   00000000  (x-1)
    --------
    00000001

    10000000  (x = -128)
|   01111111  (x-1 = 127)
    --------
    11111111

    11111111  (x = -1)
|   11111110  (x-1 = -2)
    --------
    11111111

    00000000  (x)
|   11111111  (x-1)
    --------
    11111111

Let's prove it, though not as rigorously as in the previous bithack (as it's too time consuming and this is not a scientific publication). There are two cases again. Let's start with the easiest first.
1. There is no rightmost 1-bit. In that case x = 0 and x-1 is -1. -1 in two's complement is 11111111. ORing 0 with 11111111 produces the same 11111111. (Not the desired result, but that's the way it is.)
2. There is a rightmost 1-bit b_{i}. Let's divide all the bits in two groups again (like in the previous example). Calculating x-1 modifies only bit b_{i} and the bits to its right, turning b_{i} into 0 and all the lower bits into 1's. Now ORing x with x-1 leaves all the higher bits (to the left) the same, leaves bit b_{i} as it was (1), and since the lower bits of x-1 are all 1's, it turns them on as well. The result is that the rightmost 1-bit got propagated to the lower order bits.
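A Python 3 sketch of this hack, including the x = 0 edge case mentioned above (names are made up for this example; bin8 masks to 8 bits for display):

```python
def bin8(x):
    """Show the low 8 bits of x as a two's complement bit string."""
    return format(x & 0xFF, "08b")

def propagate_rightmost_one(x):
    """Turn on all the bits to the right of the rightmost 1-bit of x."""
    return x | (x - 1)

for value in (0b01010000, 0b10111100, 1, -128, 0):
    print(bin8(value), "->", bin8(propagate_rightmost_one(value)))
```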
Bit Hack #9. Isolate the rightmost 0-bit.
y = ~x & (x+1)

This bithack does the opposite of #7. It finds the rightmost 0-bit, turns off all bits, and sets this bit to 1 in the result. For example, it finds the rightmost zero (bit b2) in the number 10101011, producing 00000100.
More examples:

    10111100  (x)
    --------
    01000011  (~x)
&   10111101  (x+1)
    --------
    00000001

    01110111  (x)
    --------
    10001000  (~x)
&   01111000  (x+1)
    --------
    00001000

    00000001  (x)
    --------
    11111110  (~x)
&   00000010  (x+1)
    --------
    00000010

    10000000  (x = -128)
    --------
    01111111  (~x)
&   10000001  (x+1)
    --------
    00000001

    11111111  (x = no rightmost 0-bit)
    --------
    00000000  (~x)
&   00000000  (x+1)
    --------
    00000000

    00000000  (x)
    --------
    11111111  (~x)
&   00000001  (x+1)
    --------
    00000001

Proof: Suppose there is a rightmost 0-bit. Then ~x turns this rightmost 0-bit into a 1-bit. And so does x+1 (because the bits to the right of the rightmost 0-bit are all 1's, so the carry propagates up to it). Now ANDing ~x with x+1 clears all the bits to the left of this rightmost 0-bit, because x+1 leaves them unchanged while ~x inverts them. So this is the highest order bit set in the result. Now what about the lower order bits to the right of the rightmost 0-bit? They also got evaporated, because x+1 turned them into 0's (they were 1's) and ~x turned them into 0's as well. They got ANDed with 0 and evaporated.
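Here is the same hack as a Python 3 sketch. The names are made up for this example; since Python integers have no fixed width, the result is shown through bin8, which keeps only the low 8 bits (this matters for the all-ones case, which has no 0-bit within 8 bits).

```python
def bin8(x):
    """Show the low 8 bits of x as a two's complement bit string."""
    return format(x & 0xFF, "08b")

def isolate_rightmost_zero(x):
    """Return a value with a single 1 at the position of x's rightmost 0-bit."""
    return ~x & (x + 1)

for value in (0b10101011, 0b01110111, 1, -128, -1, 0):
    print(bin8(value), "->", bin8(isolate_rightmost_zero(value)))
```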
Bit Hack #10. Turn on the rightmost 0-bit.
y = x | (x+1)

This hack changes the rightmost 0-bit into 1. For example, given an integer 10100011 it turns it into 10100111.
More examples:

    10111100  (x)
|   10111101  (x+1)
    --------
    10111101

    01110111  (x)
|   01111000  (x+1)
    --------
    01111111

    00000001  (x)
|   00000010  (x+1)
    --------
    00000011

    10000000  (x = -128)
|   10000001  (x+1)
    --------
    10000001

    11111111  (x = no rightmost 0-bit)
|   00000000  (x+1)
    --------
    11111111

    00000000  (x)
|   00000001  (x+1)
    --------
    00000001

Here is the proof as a bunch of true statements. ORing x with x+1 does not lose any information, as every 1-bit of x stays 1. Adding 1 to x fills the rightmost 0-bit and clears the 1-bits to the right of it; ORing with x puts those 1-bits back. So the result is x with its rightmost 0-bit turned on. If x+1 overflows, x was all 1's, there were no 0-bits, and the result is x itself.
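And a last Python 3 sketch for this hack, run on the values from the examples (the names are made up; bin8 masks to 8 bits for display only):

```python
def bin8(x):
    """Show the low 8 bits of x as a two's complement bit string."""
    return format(x & 0xFF, "08b")

def turn_on_rightmost_zero(x):
    """Return x with its rightmost 0-bit set to 1."""
    return x | (x + 1)

for value in (0b10100011, 0b01110111, 1, -128, -1, 0):
    print(bin8(value), "->", bin8(turn_on_rightmost_zero(value)))
```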
Bonus stuff.
If you decide to play more with these hacks, here are a few utility functions to print binary values of 8 bit signed integers in Perl, Python and C.
Print binary representation in Perl:

sub int_to_bin {
  my $num = shift;
  print unpack "B8", pack "c", $num;
}

Or you can print it from command line right away:

perl -wle 'print unpack "B8", pack "c", shift' <integer>

# For example:
perl -wle 'print unpack "B8", pack "c", shift' 113
01110001

perl -wle 'print unpack "B8", pack "c", shift' -- -128
10000000

Print binary number in Python:

def int_to_bin(num, bits=8):
    r = ''
    while bits:
        # prepend the current lowest bit, then move on to the next one
        r = ('1' if num&1 else '0') + r
        bits = bits - 1
        num = num >> 1
    print r

Print binary representation in C:

#include <stdio.h>

void int_to_bin(int num) {
  char str[9] = {0};
  int i;
  /* fill the string from the least significant bit backwards */
  for (i=7; i>=0; i--) {
    str[i] = (num&1)?'1':'0';
    num >>= 1;
  }
  printf("%s\n", str);
}
Have fun with these! I'll write about advanced bit hacks some time soon. If you are really intrigued by this topic I encourage you to subscribe to my blog. Thanks! :)
Ps. Let me know in the comments what you think about this article, and let me know if you do not know what two's complement, or the basic binary operations are. If there are a few people who would like me to explain these concepts, I'll be glad to write another article just about these fundamental topics.
Pps. There is a book entirely on bit hacks like these. It's called "Hacker's Delight". It may be worth getting if you are into this stuff:
Here is another quick hack that I wrote a while ago. It complements the xgoogle library that I published in my previous post with an API for Google Sponsored Links search.
Let me quickly explain why this library is useful, and what the Google Sponsored Links are.
For a typical search, Google shows regular web search results on the left side of the page, and "Sponsored Links" in a column on the right side. "Sponsored" means the results are pulled from Google's advertising network (Adwords).
Here is a screenshot that illustrates the Sponsored Links:
Google Sponsored Links results for search term "security" are in red.
Okay, now why would I need a library to search the Sponsored results? Suppose that I am an advertiser on Adwords, and I buy some software related keywords like "video software". It is in my interest to know my competitors, their advertisement text, what they are up to, the new players in this niche, and their websites. Without my library it would be practically impossible to keep track of all the competitors. There can literally be hundreds of changes per day. However, with my library it's now a piece of cake to keep track of all the dynamics.
How does the library work?
The sponsored links library pulls the results from this URL: http://www.google.com/sponsoredlinks. Here is an example of all the sponsored results for a query "security":
The library just grabs page after page, calls BeautifulSoup, and extracts the search result elements. Elementary.
How to use the library?
As I mentioned, this library is part of my xgoogle library. Download and extract it first:
Download: xgoogle library (.zip)
Downloaded: 22019 times.
Download url: http://www.catonmat.net/download/xgoogle.zip
Now, the source file that contains the implementation of this library is "xgoogle/sponsoredlinks.py". To use it, do the usual import "from xgoogle.sponsoredlinks import SponsoredLinks, SLError".
SponsoredLinks is the class that provides the API and SLError is the exception class that gets thrown in case of errors, so it's a good idea to import both.
SponsoredLinks has an interface similar to xgoogle.search (the plain google search module). The constructor of SponsoredLinks takes the keyword you want to search for, and the constructed object has several public methods and properties:
 method get_results() - gets a page of results, returning a list of SponsoredLink objects. It returns an empty list if there are no more results.
 property num_results - returns number of search results found.
 property results_per_page - sets/gets the number of results to get per page (max 100).
The returned SponsoredLink objects have four attributes - "title", "desc", "url", and "display_url". Here is a picture that illustrates what each attribute stands for:
The picture does not show the "url" attribute, as it's the actual link the result links to (the href of the blue link in the pic).
Here is an example usage of this library. It retrieves the first 100 Sponsored Links results for the keyword "video software":

from xgoogle.sponsoredlinks import SponsoredLinks, SLError

try:
    sl = SponsoredLinks("video software")
    sl.results_per_page = 100
    results = sl.get_results()
except SLError, e:
    print "Search failed: %s" % e
else:
    # print the results only if the search succeeded
    for result in results:
        print result.title.encode('utf8')
        print result.desc.encode('utf8')
        print result.display_url.encode('utf8')
        print result.url.encode('utf8')
        print

Output:

Photoshop Video Software
Time saving software for video. Work faster in Photoshop.
www.toolsfortelevision.com
http://www.toolsfortelevision.com

...
That's about it for this time. Use it to find your competitors and outsmart them!
Next time I am going to expand the library for Google Sets search.
Have fun!