Asynchronous DNS

Once upon a time, I had to quickly resolve thousands of DNS names. My first solution was to call gethostbyname repeatedly, once for each host. This turned out to be extremely slow: I could only resolve 200 hosts a minute. I talked with someone, and he suggested doing it asynchronously. I looked around and found adns, an asynchronous DNS library. Since I was writing the code in Python, I looked around some more and found Python bindings for adns. I tried adns and - wow - I could resolve 20000 hosts a minute!

In this post I want to share both the slow code and the fast asynchronous code. The slow code is only useful if you need to resolve just a few domains; the asynchronous code is much more useful. I packaged it as a Python module so that you can reuse it. It's called "async_dns.py", and an example of how to use it is included at the bottom of the post.

Here is the slow code that uses gethostbyname. The only reusable part of this code is the "resolve_slow" function, which takes a list of hosts, resolves them, and returns a dictionary of { host: ip } pairs.

To measure its speed, I made it resolve the hosts "www.domain0.com", "www.domain1.com", ..., "www.domain999.com" and print out how long the whole process took.

#!/usr/bin/python

import socket
from time import time

def resolve_slow(hosts):
    """
    Given a list of hosts, resolves them and returns a dictionary
    containing {'host': 'ip'}.
    If resolution for a host failed, 'ip' is None.
    """
    resolved_hosts = {}
    for host in hosts:
        try:
            host_info = socket.gethostbyname(host)
            resolved_hosts[host] = host_info
        except socket.gaierror:
            resolved_hosts[host] = None
    return resolved_hosts

if __name__ == "__main__":
    host_format = "www.domain%d.com"
    number_of_hosts = 1000

    hosts = [host_format % i for i in range(number_of_hosts)]

    start = time()
    resolved_hosts = resolve_slow(hosts)
    end = time()

    print("It took %.2f seconds to resolve %d hosts." % (end-start, number_of_hosts))

And here is the fast code that uses adns. I created a class, "AsyncResolver", that you can reuse by importing it from this module. Just like "resolve_slow" from the previous example, it takes a list of hosts to resolve and returns a dictionary of { host: ip } pairs.

If you run this code, it will print out how long it took to resolve 20000 hosts.

#!/usr/bin/python

import adns
from time import time

class AsyncResolver(object):
    def __init__(self, hosts, intensity=100):
        """
        hosts: a list of hosts to resolve
        intensity: how many hosts to resolve at once
        """
        self.hosts = hosts
        self.intensity = intensity
        self.adns = adns.init()

    def resolve(self):
        """ Resolves hosts and returns a dictionary of { 'host': 'ip' }. """
        resolved_hosts = {}
        active_queries = {}
        host_queue = self.hosts[:]

        def collect_results():
            for query in self.adns.completed():
                answer = query.check()
                host = active_queries[query]
                del active_queries[query]
                if answer[0] == 0:
                    ip = answer[3][0]
                    resolved_hosts[host] = ip
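                elif answer[0] == 101:
                    # adns status 101 ("prohibited CNAME"): the name is an
                    # alias, so resolve the CNAME target in answer[1] instead
                    query = self.adns.submit(answer[1], adns.rr.A)
                    active_queries[query] = host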
                else:
                    resolved_hosts[host] = None

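        def finished_resolving():
            # Done once every host has been submitted and every query
            # (including CNAME follow-ups) has been answered. This also
            # terminates when the host list contains duplicate names.
            return not host_queue and not active_queries

        while not finished_resolving():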
            while host_queue and len(active_queries) < self.intensity:
                host = host_queue.pop()
                query = self.adns.submit(host, adns.rr.A)
                active_queries[query] = host
            collect_results()

        return resolved_hosts

if __name__ == "__main__":
    host_format = "www.host%d.com"
    number_of_hosts = 20000

    hosts = [host_format % i for i in range(number_of_hosts)]

    ar = AsyncResolver(hosts, intensity=500)
    start = time()
    resolved_hosts = ar.resolve()
    end = time()

    print("It took %.2f seconds to resolve %d hosts." % (end-start, number_of_hosts))
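If adns or its Python bindings aren't available, you can get a rough substitute by overlapping many blocking gethostbyname calls in a thread pool. This uses a different technique from adns, and it is my own sketch rather than part of async_dns.py. The name `resolve_threaded` and its `workers` and `lookup` parameters are hypothetical. `workers` plays roughly the role of `intensity`, and `lookup` can be swapped out so the function can be tried without network access. It assumes Python 3 and uses only the standard library.

```python
import socket
from concurrent.futures import ThreadPoolExecutor


def resolve_threaded(hosts, workers=100, lookup=socket.gethostbyname):
    """
    Hypothetical thread-based alternative to AsyncResolver.

    Returns a dict of {host: ip}; ip is None if the lookup failed.
    At most `workers` lookups run at the same time.
    """
    def safe_lookup(host):
        try:
            return lookup(host)
        except (socket.gaierror, socket.herror, UnicodeError):
            # Unknown host, resolver error, or a malformed name
            return None

    # Drop duplicate names but keep the original order
    unique_hosts = list(dict.fromkeys(hosts))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in the same order as unique_hosts
        ips = pool.map(safe_lookup, unique_hosts)
        return dict(zip(unique_hosts, ips))
```

Each thread still blocks on its own lookup, so this approach spends threads to buy concurrency. adns, by contrast, keeps many queries in flight from a single thread.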

I wrote it in a manner that makes it reusable in other programs. Here is an example of how to reuse this code:

from async_dns import AsyncResolver

ar = AsyncResolver(["www.google.com", "www.reddit.com", "www.nonexistz.net"])
resolved = ar.resolve()

for host, ip in resolved.items():
    if ip is None:
        print("%s could not be resolved." % host)
    else:
        print("%s resolved to %s" % (host, ip))

Output:

www.nonexistz.net could not be resolved.
www.reddit.com resolved to 159.148.86.207
www.google.com resolved to 74.125.39.99

Download "async_dns.py":

Download: async_dns.py
Download url: http://www.catonmat.net/download/async_dns.py

I hope someone finds this useful!

Bit Hacks

Part of being a great programmer is having your own personal code library. By a personal code library, I mean a repository of code that you know intimately and can reuse quickly. If you are a C programmer, you don't want to reimplement linked lists, trees, various utility functions, macros, and algorithms every time you write a new program. Rather, you want to take them from your repository, adjust them, and incorporate them into your code.

A good example is the linked list implementation in the Linux kernel. Every kernel developer knows it and uses it when necessary; nobody would reimplement it. Another example is all the code written by djb. It's so good that people have taken it and turned it into the libdjb code library.

With this article I'd like to start a new topic on this blog, where I share code from my personal code library. I'll start with a C header file that I created recently, based on my "Bit Hacks You Should Know About" article.

This header file is called "bithacks.h" and it contains various macros for bit manipulations. I also wrote tests for all the macros in the "bithacks-test.c" program.

The most beautiful part of "bithacks.h" is the "B8" macro, which lets you write something like " x = B8(10101010) " and turns it into " x = 170 " (because 10101010 in binary is 170 in decimal). I have not yet added B16 and B32 macros, but I will add them when I publish the article on advanced bit hacks. The credit for the B8 idea goes to Tom Torfs, who was the first to write it.

The "bithacks.h" header provides the following macros:

  • B8(x) - turns x written in binary into decimal,
  • B_EVEN(x) - tests if x is even (bithack #1),
  • B_ODD(x) - tests if x is odd (inverse of bithack #1),
  • B_IS_SET(x, n) - tests if n-th bit is set in x (bithack #2),
  • B_SET(x, n) - sets n-th bit in x (bithack #3),
  • B_UNSET(x, n) - unsets n-th bit in x (bithack #4),
  • B_TOGGLE(x, n) - toggles n-th bit in x (bithack #5),
  • B_TURNOFF_1(x) - turns off the right-most 1-bit in x (bithack #6),
  • B_ISOLATE_1(x) - isolates the right-most 1-bit in x (bithack #7),
  • B_PROPAGATE_1(x) - propagates the right-most 1-bit in x (bithack #8),
  • B_ISOLATE_0(x) - isolates the right-most 0-bit in x (bithack #9),
  • B_TURNON_0(x) - turns on the right-most 0-bit in x (bithack #10).
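Here is a rough Python translation that lets you see what these macros compute without a C compiler. It is my own sketch and not part of bithacks.h. Python integers are unbounded, so each result is masked with 0xFF to behave like an 8-bit unsigned char. Also, unlike the macros, these functions return a new value instead of modifying x in place. The lower-case function names are hypothetical.

```python
MASK = 0xFF  # emulate assigning the result back to an unsigned char


def b_even(x):
    return (x & 1) == 0


def b_odd(x):
    return not b_even(x)


def b_is_set(x, n):
    return 1 if x & (1 << n) else 0


def b_set(x, n):
    return (x | (1 << n)) & MASK


def b_unset(x, n):
    return (x & ~(1 << n)) & MASK


def b_toggle(x, n):
    return (x ^ (1 << n)) & MASK


def b_turnoff_1(x):
    # x - 1 flips the right-most 1-bit and every 0-bit below it
    return (x & (x - 1)) & MASK


def b_isolate_1(x):
    # -x is the two's complement of x, which keeps only the right-most 1-bit in common with x
    return (x & -x) & MASK


def b_propagate_1(x):
    return (x | (x - 1)) & MASK


def b_isolate_0(x):
    return (~x & (x + 1)) & MASK


def b_turnon_0(x):
    return (x | (x + 1)) & MASK
```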

Please see "bithacks-test.c" for many examples of these macros.

For those who don't want to download bithacks.h, here are its contents:

/* 
** bithacks.h - bit hacks macros. v1.0
**
** Released under the MIT license.
*/

#ifndef BITHACKS_H
#define BITHACKS_H

#define HEXIFY(X) 0x##X##LU

#define B8IFY(Y) (((Y&0x0000000FLU)?1:0)  + \
                  ((Y&0x000000F0LU)?2:0)  + \
                  ((Y&0x00000F00LU)?4:0)  + \
                  ((Y&0x0000F000LU)?8:0)  + \
                  ((Y&0x000F0000LU)?16:0) + \
                  ((Y&0x00F00000LU)?32:0) + \
                  ((Y&0x0F000000LU)?64:0) + \
                  ((Y&0xF0000000LU)?128:0))

#define B8(Z) ((unsigned char)B8IFY(HEXIFY(Z)))

/* test if x is even */
#define B_EVEN(x)        (((x)&1)==0)

/* test if x is odd */
#define B_ODD(x)         (!B_EVEN((x)))

/* test if n-th bit in x is set */
#define B_IS_SET(x, n)   (((x) & (1<<(n)))?1:0)

/* set n-th bit in x */
#define B_SET(x, n)      ((x) |= (1<<(n)))

/* unset n-th bit in x */
#define B_UNSET(x, n)    ((x) &= ~(1<<(n)))

/* toggle n-th bit in x */
#define B_TOGGLE(x, n)   ((x) ^= (1<<(n)))

/* turn off right-most 1-bit in x */
#define B_TURNOFF_1(x)   ((x) &= ((x)-1))

/* isolate right-most 1-bit in x */
#define B_ISOLATE_1(x)   ((x) &= (-(x)))

/* right-propagate right-most 1-bit in x */
#define B_PROPAGATE_1(x) ((x) |= ((x)-1))

/* isolate right-most 0-bit in x */
#define B_ISOLATE_0(x)   ((x) = ~(x) & ((x)+1))

/* turn on right-most 0-bit in x */
#define B_TURNON_0(x)    ((x) |= ((x)+1))

/*
** more bit hacks coming as soon as I post
** an article on advanced bit hacks
*/

#endif
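The B8 trick works in two steps. First, HEXIFY pastes "0x" in front of the binary digits, so the compiler reads them as a hexadecimal literal in which each digit occupies its own 4-bit nibble. Second, B8IFY adds 2^i whenever nibble i is non-zero. The Python emulation below illustrates those two steps. The function name `b8` is hypothetical, and it takes the digits as a string because Python has no token pasting. The input check is my addition; the C macro does not validate its argument.

```python
def b8(digits):
    """Emulate B8(digits): read binary digits as hex, then fold each nibble into one bit."""
    # Extra safety check that the C macro does not perform
    if len(digits) > 8 or set(digits) - {"0", "1"}:
        raise ValueError("B8 expects at most 8 binary digits")
    y = int(digits, 16)  # HEXIFY: "10101010" is read as 0x10101010
    value = 0
    for i in range(8):
        # B8IFY: nibble i (bits 4*i .. 4*i+3) holds the i-th binary digit
        if y & (0xF << (4 * i)):
            value += 1 << i
    return value
```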

And here are all the tests:

/* 
** bithacks-test.c - tests for bithacks.h
**
** Released under the MIT license.
*/

#include <stdio.h>
#include <stdlib.h>

#include "bithacks.h"

int error_count;

#define TEST_OK(exp, what) do { \
    if ((exp)!=(what)) { \
        error_count++; \
        printf("Test '%s' at line %d failed.\n", #exp, __LINE__); \
    } } while(0)

#define TEST_END do { \
    if (error_count) { \
        printf("Testing failed: %d failed tests.\n", error_count); \
    } else { \
        printf("All tests OK.\n"); \
    } } while (0)

void test_B8()
{
    /* test B8 */
    TEST_OK(B8(0), 0);
    TEST_OK(B8(1), 1);
    TEST_OK(B8(11), 3);
    TEST_OK(B8(111), 7);
    TEST_OK(B8(1111), 15);
    TEST_OK(B8(11111), 31);
    TEST_OK(B8(111111), 63);
    TEST_OK(B8(1111111), 127);
    TEST_OK(B8(00000000), 0);
    TEST_OK(B8(11111111), 255);
    TEST_OK(B8(1010), 10);
    TEST_OK(B8(10101010), 170);
    TEST_OK(B8(01010101), 85);
}

void test_B_EVEN()
{
    /* test B_EVEN */
    TEST_OK(B_EVEN(B8(0)), 1);
    TEST_OK(B_EVEN(B8(00000000)), 1);
    TEST_OK(B_EVEN(B8(1)), 0);
    TEST_OK(B_EVEN(B8(11111111)), 0);
    TEST_OK(B_EVEN(B8(10101010)), 1);
    TEST_OK(B_EVEN(B8(01010101)), 0);
    TEST_OK(B_EVEN(44), 1);
    TEST_OK(B_EVEN(131), 0);
}

void test_B_ODD()
{
    /* test B_ODD */
    TEST_OK(B_ODD(B8(0)), 0);
    TEST_OK(B_ODD(B8(00000000)), 0);
    TEST_OK(B_ODD(B8(1)), 1);
    TEST_OK(B_ODD(B8(11111111)), 1);
    TEST_OK(B_ODD(B8(10101010)), 0);
    TEST_OK(B_ODD(B8(01010101)), 1);
    TEST_OK(B_ODD(44), 0);
    TEST_OK(B_ODD(131), 1);
}

void test_B_IS_SET()
{
    /* test B_IS_SET */
    TEST_OK(B_IS_SET(B8(0), 0), 0);
    TEST_OK(B_IS_SET(B8(00000000), 0), 0);
    TEST_OK(B_IS_SET(B8(1), 0), 1);
    TEST_OK(B_IS_SET(B8(11111111), 0), 1);
    TEST_OK(B_IS_SET(B8(11111111), 1), 1);
    TEST_OK(B_IS_SET(B8(11111111), 2), 1);
    TEST_OK(B_IS_SET(B8(11111111), 3), 1);
    TEST_OK(B_IS_SET(B8(11111111), 4), 1);
    TEST_OK(B_IS_SET(B8(11111111), 5), 1);
    TEST_OK(B_IS_SET(B8(11111111), 6), 1);
    TEST_OK(B_IS_SET(B8(11111111), 7), 1);
    TEST_OK(B_IS_SET(B8(11110000), 0), 0);
    TEST_OK(B_IS_SET(B8(11110000), 1), 0);
    TEST_OK(B_IS_SET(B8(11110000), 2), 0);
    TEST_OK(B_IS_SET(B8(11110000), 3), 0);
    TEST_OK(B_IS_SET(B8(11110000), 4), 1);
    TEST_OK(B_IS_SET(B8(11110000), 5), 1);
    TEST_OK(B_IS_SET(B8(11110000), 6), 1);
    TEST_OK(B_IS_SET(B8(11110000), 7), 1);
    TEST_OK(B_IS_SET(B8(00001111), 0), 1);
    TEST_OK(B_IS_SET(B8(00001111), 1), 1);
    TEST_OK(B_IS_SET(B8(00001111), 2), 1);
    TEST_OK(B_IS_SET(B8(00001111), 3), 1);
    TEST_OK(B_IS_SET(B8(00001111), 4), 0);
    TEST_OK(B_IS_SET(B8(00001111), 5), 0);
    TEST_OK(B_IS_SET(B8(00001111), 6), 0);
    TEST_OK(B_IS_SET(B8(00001111), 7), 0);
    TEST_OK(B_IS_SET(B8(10101010), 0), 0);
    TEST_OK(B_IS_SET(B8(10101010), 1), 1);
    TEST_OK(B_IS_SET(B8(10101010), 2), 0);
    TEST_OK(B_IS_SET(B8(10101010), 3), 1);
    TEST_OK(B_IS_SET(B8(10101010), 4), 0);
    TEST_OK(B_IS_SET(B8(10101010), 5), 1);
    TEST_OK(B_IS_SET(B8(10101010), 6), 0);
    TEST_OK(B_IS_SET(B8(10101010), 7), 1);
    TEST_OK(B_IS_SET(B8(01010101), 0), 1);
    TEST_OK(B_IS_SET(B8(01010101), 1), 0);
    TEST_OK(B_IS_SET(B8(01010101), 2), 1);
    TEST_OK(B_IS_SET(B8(01010101), 3), 0);
    TEST_OK(B_IS_SET(B8(01010101), 4), 1);
    TEST_OK(B_IS_SET(B8(01010101), 5), 0);
    TEST_OK(B_IS_SET(B8(01010101), 6), 1);
    TEST_OK(B_IS_SET(B8(01010101), 7), 0);
}

void test_B_SET()
{
    /* test B_SET */
    unsigned char x;

    x = B8(00000000);
    TEST_OK(B_SET(x, 0), B8(00000001));
    TEST_OK(B_SET(x, 1), B8(00000011));
    TEST_OK(B_SET(x, 2), B8(00000111));
    TEST_OK(B_SET(x, 3), B8(00001111));
    TEST_OK(B_SET(x, 4), B8(00011111));
    TEST_OK(B_SET(x, 5), B8(00111111));
    TEST_OK(B_SET(x, 6), B8(01111111));
    TEST_OK(B_SET(x, 7), B8(11111111));

    x = B8(11111111);
    TEST_OK(B_SET(x, 0), B8(11111111));
    TEST_OK(B_SET(x, 1), B8(11111111));
    TEST_OK(B_SET(x, 2), B8(11111111));
    TEST_OK(B_SET(x, 3), B8(11111111));
    TEST_OK(B_SET(x, 4), B8(11111111));
    TEST_OK(B_SET(x, 5), B8(11111111));
    TEST_OK(B_SET(x, 6), B8(11111111));
    TEST_OK(B_SET(x, 7), B8(11111111));
}

void test_B_UNSET()
{
    unsigned char x;
   
    x = B8(11111111);
    TEST_OK(B_UNSET(x, 0), B8(11111110));
    TEST_OK(B_UNSET(x, 1), B8(11111100));
    TEST_OK(B_UNSET(x, 2), B8(11111000));
    TEST_OK(B_UNSET(x, 3), B8(11110000));
    TEST_OK(B_UNSET(x, 4), B8(11100000));
    TEST_OK(B_UNSET(x, 5), B8(11000000));
    TEST_OK(B_UNSET(x, 6), B8(10000000));
    TEST_OK(B_UNSET(x, 7), B8(00000000));

    x = B8(00000000);
    TEST_OK(B_UNSET(x, 0), B8(00000000));
    TEST_OK(B_UNSET(x, 1), B8(00000000));
    TEST_OK(B_UNSET(x, 2), B8(00000000));
    TEST_OK(B_UNSET(x, 3), B8(00000000));
    TEST_OK(B_UNSET(x, 4), B8(00000000));
    TEST_OK(B_UNSET(x, 5), B8(00000000));
    TEST_OK(B_UNSET(x, 6), B8(00000000));
    TEST_OK(B_UNSET(x, 7), B8(00000000));
}

void test_B_TOGGLE()
{
    unsigned char x = B8(11111111);
    TEST_OK(B_TOGGLE(x, 0), B8(11111110));
    TEST_OK(B_TOGGLE(x, 0), B8(11111111));
    TEST_OK(B_TOGGLE(x, 1), B8(11111101));
    TEST_OK(B_TOGGLE(x, 1), B8(11111111));
    TEST_OK(B_TOGGLE(x, 2), B8(11111011));
    TEST_OK(B_TOGGLE(x, 2), B8(11111111));
    TEST_OK(B_TOGGLE(x, 3), B8(11110111));
    TEST_OK(B_TOGGLE(x, 3), B8(11111111));
    TEST_OK(B_TOGGLE(x, 4), B8(11101111));
    TEST_OK(B_TOGGLE(x, 4), B8(11111111));
    TEST_OK(B_TOGGLE(x, 5), B8(11011111));
    TEST_OK(B_TOGGLE(x, 5), B8(11111111));
    TEST_OK(B_TOGGLE(x, 6), B8(10111111));
    TEST_OK(B_TOGGLE(x, 6), B8(11111111));
    TEST_OK(B_TOGGLE(x, 7), B8(01111111));
    TEST_OK(B_TOGGLE(x, 7), B8(11111111));
}

void test_B_TURNOFF_1()
{
    unsigned char x;

    x = B8(11111111);
    TEST_OK(B_TURNOFF_1(x), B8(11111110));
    TEST_OK(B_TURNOFF_1(x), B8(11111100));
    TEST_OK(B_TURNOFF_1(x), B8(11111000));
    TEST_OK(B_TURNOFF_1(x), B8(11110000));
    TEST_OK(B_TURNOFF_1(x), B8(11100000));
    TEST_OK(B_TURNOFF_1(x), B8(11000000));
    TEST_OK(B_TURNOFF_1(x), B8(10000000));
    TEST_OK(B_TURNOFF_1(x), B8(00000000));
    TEST_OK(B_TURNOFF_1(x), B8(00000000));

    x = B8(10101010);
    TEST_OK(B_TURNOFF_1(x), B8(10101000));
    TEST_OK(B_TURNOFF_1(x), B8(10100000));
    TEST_OK(B_TURNOFF_1(x), B8(10000000));
    TEST_OK(B_TURNOFF_1(x), B8(00000000));
    TEST_OK(B_TURNOFF_1(x), B8(00000000));

    x = B8(01010101);
    TEST_OK(B_TURNOFF_1(x), B8(01010100));
    TEST_OK(B_TURNOFF_1(x), B8(01010000));
    TEST_OK(B_TURNOFF_1(x), B8(01000000));
    TEST_OK(B_TURNOFF_1(x), B8(00000000));
    TEST_OK(B_TURNOFF_1(x), B8(00000000));
}

void test_B_ISOLATE_1()
{
    unsigned char x;

    x = B8(11111111);
    TEST_OK(B_ISOLATE_1(x), B8(00000001));
    TEST_OK(B_ISOLATE_1(x), B8(00000001));

    x = B8(11111110);
    TEST_OK(B_ISOLATE_1(x), B8(00000010));
    TEST_OK(B_ISOLATE_1(x), B8(00000010));

    x = B8(11111100);
    TEST_OK(B_ISOLATE_1(x), B8(00000100));
    TEST_OK(B_ISOLATE_1(x), B8(00000100));

    x = B8(11111000);
    TEST_OK(B_ISOLATE_1(x), B8(00001000));
    TEST_OK(B_ISOLATE_1(x), B8(00001000));

    x = B8(11110000);
    TEST_OK(B_ISOLATE_1(x), B8(00010000));
    TEST_OK(B_ISOLATE_1(x), B8(00010000));

    x = B8(11100000);
    TEST_OK(B_ISOLATE_1(x), B8(00100000));
    TEST_OK(B_ISOLATE_1(x), B8(00100000));

    x = B8(11000000);
    TEST_OK(B_ISOLATE_1(x), B8(01000000));
    TEST_OK(B_ISOLATE_1(x), B8(01000000));

    x = B8(10000000);
    TEST_OK(B_ISOLATE_1(x), B8(10000000));
    TEST_OK(B_ISOLATE_1(x), B8(10000000));

    x = B8(00000000);
    TEST_OK(B_ISOLATE_1(x), B8(00000000));

    x = B8(10000000);
    TEST_OK(B_ISOLATE_1(x), B8(10000000));

    x = B8(10001001);
    TEST_OK(B_ISOLATE_1(x), B8(00000001));

    x = B8(10001000);
    TEST_OK(B_ISOLATE_1(x), B8(00001000));
}

void test_B_PROPAGATE_1()
{
    unsigned char x;

    x = B8(00000000);
    TEST_OK(B_PROPAGATE_1(x), B8(11111111));
    TEST_OK(B_PROPAGATE_1(x), B8(11111111));

    x = B8(10000000);
    TEST_OK(B_PROPAGATE_1(x), B8(11111111));

    x = B8(11000000);
    TEST_OK(B_PROPAGATE_1(x), B8(11111111));

    x = B8(11100000);
    TEST_OK(B_PROPAGATE_1(x), B8(11111111));

    x = B8(11110000);
    TEST_OK(B_PROPAGATE_1(x), B8(11111111));

    x = B8(11111000);
    TEST_OK(B_PROPAGATE_1(x), B8(11111111));

    x = B8(11111100);
    TEST_OK(B_PROPAGATE_1(x), B8(11111111));

    x = B8(11111110);
    TEST_OK(B_PROPAGATE_1(x), B8(11111111));

    x = B8(11111111);
    TEST_OK(B_PROPAGATE_1(x), B8(11111111));

    x = B8(00100000);
    TEST_OK(B_PROPAGATE_1(x), B8(00111111));
    TEST_OK(B_PROPAGATE_1(x), B8(00111111));

    x = B8(10101000);
    TEST_OK(B_PROPAGATE_1(x), B8(10101111));
    TEST_OK(B_PROPAGATE_1(x), B8(10101111));

    x = B8(10101010);
    TEST_OK(B_PROPAGATE_1(x), B8(10101011));
    TEST_OK(B_PROPAGATE_1(x), B8(10101011));
}

void test_B_ISOLATE_0()
{
    unsigned char x;

    x = B8(00000000);
    TEST_OK(B_ISOLATE_0(x), B8(00000001));
    TEST_OK(B_ISOLATE_0(x), B8(00000010));
    TEST_OK(B_ISOLATE_0(x), B8(00000001));

    x = B8(00000011);
    TEST_OK(B_ISOLATE_0(x), B8(00000100));
    TEST_OK(B_ISOLATE_0(x), B8(00000001));

    x = B8(00000111);
    TEST_OK(B_ISOLATE_0(x), B8(00001000));
    TEST_OK(B_ISOLATE_0(x), B8(00000001));

    x = B8(00001111);
    TEST_OK(B_ISOLATE_0(x), B8(00010000));
    TEST_OK(B_ISOLATE_0(x), B8(00000001));

    x = B8(00011111);
    TEST_OK(B_ISOLATE_0(x), B8(00100000));
    TEST_OK(B_ISOLATE_0(x), B8(00000001));

    x = B8(00111111);
    TEST_OK(B_ISOLATE_0(x), B8(01000000));
    TEST_OK(B_ISOLATE_0(x), B8(00000001));

    x = B8(01111111);
    TEST_OK(B_ISOLATE_0(x), B8(10000000));
    TEST_OK(B_ISOLATE_0(x), B8(00000001));

    x = B8(11111111);
    TEST_OK(B_ISOLATE_0(x), B8(00000000));

    x = B8(01010101);
    TEST_OK(B_ISOLATE_0(x), B8(00000010));

    x = B8(01010111);
    TEST_OK(B_ISOLATE_0(x), B8(00001000));

    x = B8(01011111);
    TEST_OK(B_ISOLATE_0(x), B8(00100000));

    x = B8(01111111);
    TEST_OK(B_ISOLATE_0(x), B8(10000000));
}

void test_B_TURNON_0()
{
    unsigned char x;

    x = B8(00000000);
    TEST_OK(B_TURNON_0(x), B8(00000001));
    TEST_OK(B_TURNON_0(x), B8(00000011));
    TEST_OK(B_TURNON_0(x), B8(00000111));
    TEST_OK(B_TURNON_0(x), B8(00001111));
    TEST_OK(B_TURNON_0(x), B8(00011111));
    TEST_OK(B_TURNON_0(x), B8(00111111));
    TEST_OK(B_TURNON_0(x), B8(01111111));
    TEST_OK(B_TURNON_0(x), B8(11111111));
    TEST_OK(B_TURNON_0(x), B8(11111111));

    x = B8(10101010);
    TEST_OK(B_TURNON_0(x), B8(10101011));
    TEST_OK(B_TURNON_0(x), B8(10101111));
    TEST_OK(B_TURNON_0(x), B8(10111111));
    TEST_OK(B_TURNON_0(x), B8(11111111));

    x = B8(10000000);
    TEST_OK(B_TURNON_0(x), B8(10000001));
    TEST_OK(B_TURNON_0(x), B8(10000011));
    TEST_OK(B_TURNON_0(x), B8(10000111));
    TEST_OK(B_TURNON_0(x), B8(10001111));
    TEST_OK(B_TURNON_0(x), B8(10011111));
    TEST_OK(B_TURNON_0(x), B8(10111111));
    TEST_OK(B_TURNON_0(x), B8(11111111));
}

int main()
{
    test_B8();
    test_B_EVEN();
    test_B_ODD();
    test_B_IS_SET();
    test_B_SET();
    test_B_UNSET();
    test_B_TOGGLE();
    test_B_TURNOFF_1();
    test_B_ISOLATE_1();
    test_B_PROPAGATE_1();
    test_B_ISOLATE_0();
    test_B_TURNON_0();

    TEST_END;

    return error_count ? EXIT_FAILURE : EXIT_SUCCESS;
}

Download "bithacks.h" header file:

Download: bithacks.h
Download url: http://www.catonmat.net/download/bithacks.h

Download: bithacks-test.c
Download url: http://www.catonmat.net/download/bithacks-test.c

The next post on this topic will be about advanced bit hacks, and I will extend bithacks.h with them.

Have fun!

Google Python Search Library

As promised in my previous post on the xgoogle library, I have added a module that gets results from Google Sets.

Google Sets automatically creates groups of related items from a few example items. For example, if you feed it "red, green, blue", it will predict other colors such as "yellow, black, white, brown", etc.

One of the most fascinating applications of this library is predicting domain names. Most sysadmins have a coherent naming policy for their systems. For example, a sysadmin at a university might name the machines "psychology.university.edu", "art.university.edu", "geography.university.edu", etc. If we feed the names "psychology, art, geography" to Google Sets, it would come up with more names such as "history, mathematics, biology", and others. We can then do DNS lookups to find out whether such machines really exist. This is a pretty powerful reconnaissance method.
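Here is a rough sketch of that reconnaissance step. The helper names `candidate_hosts` and `existing_hosts` are hypothetical. The word list is assumed to come from something like GoogleSets(['psychology', 'art', 'geography']).get_results(). Lookups here are sequential, so for thousands of candidates you would swap in an asynchronous resolver instead.

```python
import re
import socket


def candidate_hosts(words, domain):
    """Turn predicted words into hostnames, e.g. 'history' -> 'history.university.edu'."""
    labels = []
    for word in words:
        # Lower-case, turn spaces into hyphens, drop characters not allowed in a DNS label
        label = re.sub(r"[^a-z0-9-]", "", word.strip().lower().replace(" ", "-")).strip("-")
        if label and label not in labels:
            labels.append(label)
    return ["%s.%s" % (label, domain) for label in labels]


def existing_hosts(hosts, lookup=socket.gethostbyname):
    """Return {host: ip} for the candidates that actually resolve."""
    found = {}
    for host in hosts:
        try:
            found[host] = lookup(host)
        except (socket.gaierror, socket.herror, UnicodeError):
            pass  # no such machine (or an invalid name): skip it
    return found
```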

There are many other interesting applications. Black-hat SEOs may use it to stuff their pages with related keywords and thus rank for more words on search engines. Linguists can use it for various natural language processing problems. Various word-guessing games can be created.

But my personal goal in writing this library was to use it for my English language perfection and correction tool, which I will release in one of the next posts about this project. I wrote more about this idea in the introductory post on the xgoogle library; please see that post for more info.

The new module is called "googlesets". To use it, import "GoogleSets" and create an object of this type, passing the constructor the list of example items to base the prediction on. Then call the "get_results()" method to get the list of predicted items. It returns a list of Unicode strings, so make sure to use a proper encoding when outputting them.

Here is an example usage of the new module. It finds items related to programming languages "python" and "perl":

from xgoogle.googlesets import GoogleSets
gs = GoogleSets(['python', 'perl'])
items = gs.get_results()
for item in items:
  print item.encode('utf8')

Output:

python
perl
php
ruby
java
javascript
c++
c
cgi
tcl
c#

The output matches that of Google Sets itself:

[Screenshot: Google Sets predicted items from Perl and Python]

See the readme.txt file in the xgoogle archive for more examples.

Download "xgoogle" library:

Download: xgoogle library (.zip)
Download url: http://www.catonmat.net/download/xgoogle.zip

Have fun and let me know if you find this library useful in any way in your own projects.

This article is part of the article series "Vim Plugins You Should Know About."

Vim Plugins: snipmate.vim

This is the fourth post in the article series "Vim Plugins You Should Know About". This time I am going to introduce you to a plugin called "snipmate.vim".

If you are intrigued by this topic, I suggest that you subscribe to my posts! For the introduction and the first post in this series, see Vim Plugins You Should Know About, Part I: surround.vim.

Snipmate.vim is probably the best snippets plugin for Vim. A snippet is a piece of often-typed text or a programming construct that you can insert into your document by typing a trigger followed by <tab>. The plugin was written by Michael Sanders, who says he modeled it after TextMate's snippets.

Here is an example usage of snipmate.vim. If you are a C programmer, one of the most frequently used loop forms is "for (i=0; i<n; i++) { ... }". Without snippets, you'd have to type this out every time. Even though it takes just another second, these seconds add up to minutes throughout the day, and minutes add up to hours over longer periods of time. Why waste your time this way? With snippets, you can type just "for<tab>", and snipmate will insert the whole construct into your source code automatically! If "i" or "n" aren't the variable names you wanted, you can use <tab> and <shift-tab> to jump to the next or previous placeholder in the loop and rename them!
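To illustrate the placeholder idea, here is a toy snippet expander in Python. It is not snipmate's implementation. It assumes TextMate-style notation, where ${1:i} marks tab stop 1 with the default text "i" and a bare $1 mirrors whatever tab stop 1 ends up containing. The function name `expand` and the example template are hypothetical, and nested placeholders are not handled.

```python
import re

# ${N:default} -> tab stop N with its default text (no nesting supported)
PLACEHOLDER = re.compile(r"\$\{(\d+):([^}]*)\}")
# $N -> a mirror that repeats the final value of tab stop N
MIRROR = re.compile(r"\$(\d+)")


def expand(template, values=None):
    """Fill each tab stop with values[N] if given, else its default, then update mirrors."""
    values = dict(values or {})
    for number, default in PLACEHOLDER.findall(template):
        values.setdefault(int(number), default)  # keep user-typed values
    text = PLACEHOLDER.sub(lambda m: values[int(m.group(1))], template)
    return MIRROR.sub(lambda m: values.get(int(m.group(1)), ""), text)


FOR_LOOP = "for (${1:i} = 0; $1 < ${2:n}; $1++) {\n\t${3:/* code */}\n}"
```

Calling expand(FOR_LOOP) gives the loop with "i" and "n". Calling expand(FOR_LOOP, {1: "j"}) simulates renaming the first placeholder, and every mirror of it changes too.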

Michael also created an introductory video for his plugin in which he demonstrates how to use it.

How to install snipmate.vim?

To get the latest version:

  1. Download snipmate.zip.
  2. Extract snipmate.zip to ~/.vim (on Unix/Linux) or ~\vimfiles (on Windows).
  3. Run :helptags ~/.vim/doc (on Unix/Linux) or :helptags ~/vimfiles/doc (on Windows) to rebuild the tags file, so that you can read :help snipmate.
  4. Restart Vim.

The plugin comes with predefined snippets for more than a dozen languages (C, C++, HTML, Java, JavaScript, Objective C, Perl, PHP, Python, Ruby, Tcl, Shell, Mako templates, LaTeX, VimScript). Be sure to check out the snippet files in the "snippets" directory under your ~/.vim or ~\vimfiles directory.

If you need to define your own snippets (which you most likely will), create a new file named "language-foo.snippets" in the "snippets" directory. For example, to define your own snippets for the C language, you'd create a file called "c-foo.snippets" and put your snippets in it.

To learn about snipmate snippet syntax, type ":help snipmate" and locate the syntax section in the help file.

Have Fun!

Have fun with this time-saving plugin!

This article is part of the article series "Perl One-Liners Explained."

Perl One Liners

This is the second part of a seven-part article on famous Perl one-liners. In this part I will create various one-liners for line numbering. See part one for an introduction to the series.

Famous Perl one-liners is my attempt to create a "perl1line.txt" similar to "awk1line.txt" and "sed1line.txt", which have been so popular among Awk and Sed programmers.

The article on famous Perl one-liners will consist of at least seven parts:

The one-liners will make heavy use of Perl special variables. A few years ago I compiled all the Perl special variables into a single file and called it the Perl special variable cheat-sheet. Even though it's mostly copied from perldoc perlvar, it's still handy to have in front of you. Print it!

Awesome news: I have written an e-book based on this article series. Check it out:

And here are today's one-liners:

Line Numbering

9. Number all lines in a file.

perl -pe '$_ = "$. $_"'

As I explained in the first one-liner, "-p" causes Perl to assume a loop around the program (specified by "-e") that reads each line of input into the " $_ " variable, executes the program and then prints the " $_ " variable.

In this one-liner I simply modify " $_ " and prepend the " $. " variable to it. The special variable " $. " contains the current line number of input.

The result is that each line gets its line number prepended.

10. Number only non-empty lines in a file.

perl -pe '$_ = ++$a." $_" if /./'

Here we employ the "action if condition" statement, which executes "action" only if "condition" is true. In this case the condition is the regular expression "/./", which matches any character except newline (that is, it matches a non-empty line). The action is " $_ = ++$a." $_" ", which increments the variable " $a " by one and prepends it to the current line. As we didn't use the strict pragma, $a was created automatically.

The result is that at each non-empty line " $a " gets incremented by one and prepended to that line. And at each empty line nothing gets modified and the empty line gets printed as is.
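For readers more at home in Python, here is a rough equivalent of one-liner #10. The function name is hypothetical, and lines are assumed to keep their trailing newlines, as they do in Perl. Python's "." also refuses to match a newline, so re.search(r".", line) plays the role of /./.

```python
import re


def number_nonempty(lines):
    """Python rendering of: perl -pe '$_ = ++$a." $_" if /./'"""
    counter = 0  # plays the role of $a
    out = []
    for line in lines:
        if re.search(r".", line):  # at least one non-newline character
            counter += 1
            line = "%d %s" % (counter, line)
        out.append(line)
    return out
```

To run it as a filter, you could write sys.stdout.writelines(number_nonempty(sys.stdin)).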

11. Number and print only non-empty lines in a file (drop empty lines).

perl -ne 'print ++$a." $_" if /./'

This one-liner uses the "-n" command-line argument, which reads each line into the " $_ " variable and then executes the program specified by "-e". Unlike "-p", it does not print the line after executing the code in "-e", so we have to call "print" explicitly to get it printed.

The one-liner calls "print" only on lines that have at least one character in them. And exactly like in the previous one-liner, it increments the line number in variable " $a " by one for each non-empty line.

The empty lines simply get ignored and never get printed.

12. Number all lines but print line numbers only for non-empty lines.

perl -pe '$_ = "$. $_" if /./'

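This one-liner is similar to one-liner #10. Here I modify the " $_ " variable, which holds the entire line, only if the line has at least one character. All other (empty) lines get printed without line numbers. Unlike #10, it uses " $. ", so empty lines are still counted and the printed numbers match the lines' actual positions in the file.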
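The difference from #10 is easiest to see side by side. Here is a rough Python rendering of #12 (hypothetical function name). In it, enumerate() counts every line, the way " $. " does, instead of a counter that only advances on non-empty lines.

```python
import re


def number_all_print_nonempty(lines):
    """Python rendering of: perl -pe '$_ = "$. $_" if /./'"""
    out = []
    for lineno, line in enumerate(lines, start=1):  # lineno plays the role of $.
        if re.search(r".", line):
            line = "%d %s" % (lineno, line)
        out.append(line)
    return out
```

For the input "a", "", "b", this gives the numbers 1 and 3, where #10 would give 1 and 2.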

13. Number only lines that match a pattern, print others unmodified.

perl -pe '$_ = ++$a." $_" if /regex/'

Here we again use the "action if condition" statement, but this time the condition is a pattern (regular expression) "/regex/". The action is the same as in one-liner #10, so see #10 for the explanation.

14. Number and print only lines that match a pattern.

perl -ne 'print ++$a." $_" if /regex/'

This one-liner is almost exactly like #11. The only difference is that it numbers and prints only the lines that match "/regex/".

15. Number all lines, but print line numbers only for lines that match a pattern.

perl -pe '$_ = "$. $_" if /regex/'

This one-liner is similar to the previous one-liner and to one-liner #12. Here a line gets its line number prepended if it matches /regex/; otherwise, it just gets printed without a line number.

16. Number all lines in a file using a custom format (emulate cat -n).

perl -ne 'printf "%-5d %s", $., $_'

This one-liner uses the formatted print function "printf" to print the line number together with the line. In this particular example, the line numbers are left-aligned in a 5-character field.

Some other nice format strings are "%5d", which right-aligns line numbers in a 5-character field, and "%05d", which zero-fills and right-justifies them.
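Python's % operator accepts the same %-5d, %5d and %05d conversions, so it is a quick way to compare the three layouts. The helper name `cat_n` is hypothetical.

```python
def cat_n(lines, fmt="%-5d %s"):
    """Prefix each line with its number using a printf-style format, as in #16."""
    return [fmt % (lineno, line) for lineno, line in enumerate(lines, start=1)]


for fmt in ("%-5d %s", "%5d %s", "%05d %s"):
    print("".join(cat_n(["first\n", "second\n"], fmt)), end="")
```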

My Perl printf cheat sheet, which lists all the possible format specifiers, might come in handy here.

17. Print the total number of lines in a file (emulate wc -l).

perl -lne 'END { print $. }'

This one-liner uses the "END" block, a feature Perl probably borrowed from the Awk language. The END block gets executed after the program has finished. In this case the program is the hidden loop over the input that was created by the "-n" argument. After the loop is done, the special variable " $. " contains the number of lines in the input, and the END block prints it. The " -l " argument sets the output record separator for "print" to a newline (so that we don't have to print "$.\n").

Another way to do the same is:

perl -le 'print $n=()=<>'

This is a tricky one, but it's easy to understand if you know about Perl contexts. In this one-liner the " ()=<> " part is a list assignment, which makes the <> operator (the diamond operator) evaluate in list context, so it reads the whole file into a list. That list assignment is then itself assigned to " $n ", which puts it in scalar context. A list assignment in scalar context returns the number of elements on its right-hand side. Thus the " $n=()=<> " construction evaluates to the number of lines in the input, that is, the number of lines in the file. The print statement prints this number out, and the " -l " argument makes sure a newline gets added after it.

This is the same as writing the following, except longer:

perl -le 'print scalar(()=<>)'

And a completely obvious version:

perl -le 'print scalar(@foo=<>)'

Yet another way to do it:

perl -ne '}{print $.'

This one-liner uses the eskimo operator "}{" in conjunction with the "-n" command-line argument. As I explained in one-liner #11, the "-n" argument forces Perl to assume a " while(<>) { } " loop around the program. The eskimo operator closes that loop early and opens a new block, so the program turns out to be:

while (<>) {
}{                    # eskimo operator here
    print $.;
}

It's easy to see that this program just loops over all the input and after it's done doing so, it prints the " $. ", which is the number of lines in the input.

18. Print the number of non-empty lines in a file.

perl -le 'print scalar(grep{/./}<>)'

This one-liner uses the "grep" function, which is similar to the grep Unix command. Given a list of values, " grep {condition} " returns only those values that match the condition. In this case the condition is a regular expression that matches at least one character, so the input gets filtered and "grep{/./}" returns all the non-empty lines. To get the number of lines, we evaluate grep in scalar context, where it returns the number of times the condition was true, and print the result.
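In Python terms, the one-liner first builds the full list of matching lines and then takes its size. The sketch below mirrors that two-step shape (hypothetical function name). Like grep over <> in list context, it holds all the matches in memory at once.

```python
import re


def count_nonempty(lines):
    """Rough Python counterpart of: perl -le 'print scalar(grep{/./}<>)'"""
    # Like grep{/./}<>: collect every line that has at least one non-newline character
    matches = [line for line in lines if re.search(r".", line)]
    # Like grep in scalar context: the number of matches
    return len(matches)
```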

A golfer's version of this one-liner replaces "scalar()" with " ~~ " (double bitwise negation), which also forces scalar context, so it can be shortened:

perl -le 'print ~~grep{/./}<>'

This can be made even shorter:

perl -le 'print~~grep/./,<>'

19. Print the number of empty lines in a file.

perl -lne '$a++ if /^$/; END {print $a+0}'

Here I use the variable $a to count how many empty lines I have encountered. Once I have finished looping over all the lines, I print the value of $a in the END block. I use the " $a+0 " construction to make sure " 0 " gets output if no lines were empty.

I could have also modified the previous one-liner:

perl -le 'print scalar(grep{/^$/}<>)'

Or written it with " ~~ ":

perl -le 'print ~~grep{/^$/}<>'

These last two versions are not as efficient, because they read the whole file into memory, whereas the first one processes it line by line.

20. Print the number of lines in a file that match a pattern (emulate grep -c).

perl -lne '$a++ if /regex/; END {print $a+0}'

This one-liner is basically the same as the previous one, except that it increments the line counter $a by one whenever a line matches the regular expression /regex/.
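Both #19 and #20 follow a streaming pattern: keep a counter, test each line as it arrives, and print the total at the end. Here is a rough Python sketch of that pattern (hypothetical function name). Iterating over a file object yields one line at a time, so memory use stays flat, which is the same advantage the END-block versions have over the grep versions. Python's "^$" behaves like Perl's /^$/ here: it matches a line that consists of just "\n".

```python
import re


def count_matching(stream, pattern):
    """Count lines in `stream` that match `pattern`, reading one line at a time."""
    regex = re.compile(pattern)
    count = 0  # starts at 0, like printing $a+0 when nothing matched
    for line in stream:
        if regex.search(line):
            count += 1
    return count
```

For example, count_matching(open("file.txt"), r"^$") would count empty lines, and any other pattern gives a grep -c style count. The file name here is only an illustration.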

Perl one-liners explained e-book

I've now written the "Perl One-Liners Explained" e-book based on this article series. I went through all the one-liners, improved explanations, fixed mistakes and typos, added a bunch of new one-liners, added an introduction to Perl one-liners and a new chapter on Perl's special variables. Please take a look:

Have Fun!

Have fun with these one-liners. These were really easy this time. The next part is going to be about various calculations.

Can you think of other numbering operations that I did not include here?