When people talk about polymorphism in C++ they usually mean the thing of using a derived class through the base class pointer or reference, which is called subtype polymorphism. But they often forget that there are all kinds of other polymorphisms in C++, such as parametric polymorphism, ad-hoc polymorphism and coercion polymorphism.

These polymorphisms also go by different names in C++,

  • Subtype polymorphism is also known as runtime polymorphism.
  • Parametric polymorphism is also known as compile-time polymorphism.
  • Ad-hoc polymorphism is also known as overloading.
  • Coercion is also known as (implicit or explicit) casting.

In this article I'll illustrate all the polymorphisms through examples in C++ language and also give insight on why they have various other names.

Subtype Polymorphism (Runtime Polymorphism)

Subtype polymorphism is what everyone understands when they say "polymorphism" in C++. It's the ability to use derived classes through base class pointers and references.

Here is an example. Suppose you have various cats like these felines,

Polymorphic Cats
Polymorphic cats on a mat by James Halliday.

Since they are all of Felidae biological family, and they all should be able to meow, they can be represented as classes inheriting from Felid base class and overriding the meow pure virtual function,

// file cats.h

class Felid {
public:
 virtual void meow() = 0;
};

class Cat : public Felid {
public:
 void meow() { std::cout << "Meowing like a regular cat! meow!\n"; }
};

class Tiger : public Felid {
public:
 void meow() { std::cout << "Meowing like a tiger! MREOWWW!\n"; }
};

class Ocelot : public Felid {
public:
 void meow() { std::cout << "Meowing like an ocelot! mews!\n"; }
};

Now the main program can use Cat, Tiger and Ocelot interchangeably through Felid (base class) pointer,

#include <iostream>
#include "cats.h"

void do_meowing(Felid *cat) {
 cat->meow();
}

int main() {
 Cat cat;
 Tiger tiger;
 Ocelot ocelot;

 do_meowing(&cat);
 do_meowing(&tiger);
 do_meowing(&ocelot);
}

Here the main program passes pointers to cat, tiger and ocelot to do_meowing function that expects a pointer to Felid. Since they are all Felids, the program calls the right meow function for each felid and the output is:

Meowing like a regular cat! meow!
Meowing like a tiger! MREOWWW!
Meowing like an ocelot! mews!

Subtype polymorphism is also called runtime polymorphism for a good reason. The resolution of polymorphic function calls happens at runtime through an indirection via the virtual table. Another way of explaining this is that compiler does not locate the address of the function to be called at compile-time, instead when the program is run, the function is called by dereferencing the right pointer in the virtual table.

In type theory it's also known as inclusion polymorphism.

Parametric Polymorphism (Compile-Time Polymorphism)

Parametric polymorphism provides a means to execute the same code for any type. In C++ parametric polymorphism is implemented via templates.

One of the simplest examples is a generic max function that finds maximum of two of its arguments,

#include <iostream>
#include <string>

template <class T>
T max(T a, T b) {
 return a > b ? a : b;
}

int main() {
 std::cout << ::max(9, 5) << std::endl;     // 9

 std::string foo("foo"), bar("bar");
 std::cout << ::max(foo, bar) << std::endl; // "foo"
}

Here the max function is polymorphic on type T. Note, however, that it doesn't work on pointer types because comparing pointers compares the memory locations and not the contents. To get it working for pointers you'd have to specialize the template for pointer types and that would no longer be parametric polymorphism but would be ad-hoc polymorphism.

Since parametric polymorphism happens at compile time, it's also called compile-time polymorphism.

Ad-hoc Polymorphism (Overloading)

Ad-hoc polymorphism allows functions with the same name act differently for each type. For example, given two ints and the + operator, it adds them together. Given two std::strings it concatenates them together. This is called overloading.

Here is a concrete example that implements function add for ints and strings,

#include <iostream>
#include <string>

int add(int a, int b) {
 return a + b;
}

std::string add(const char *a, const char *b) {
 std::string result(a);
 result += b;
 return result;
}

int main() {
 std::cout << add(5, 9) << std::endl;
 std::cout << add("hello ", "world") << std::endl;
}

Ad-hoc polymorphism also appears in C++ if you specialize templates. Returning to the previous example about max function, here is how you'd write a max for two char *,

template <>
const char *max(const char *a, const char *b) {
 return strcmp(a, b) > 0 ? a : b;
}

Now you can call ::max("foo", "bar") to find maximum of strings "foo" and "bar".

Coercion Polymorphism (Casting)

Coercion happens when an object or a primitive is cast into another object type or primitive type. For example,

float b = 6; // int gets promoted (cast) to float implicitly
int a = 9.99 // float gets demoted to int implicitly

Explicit casting happens when you use C's type-casting expressions, such as (unsigned int *) or (int) or C++'s static_cast, const_cast, reinterpret_cast, or dynamic_cast.

Coercion also happens if the constructor of a class isn't explicit, for example,

#include <iostream>

class A {
 int foo;
public:
 A(int ffoo) : foo(ffoo) {}
 void giggidy() { std::cout << foo << std::endl; }
};

void moo(A a) {
 a.giggidy();
}

int main() {
 moo(55);     // prints 55
}

If you made the constructor of A explicit, that would no longer be possible. It's always a good idea to make your constructors explicit to avoid accidental conversions.

Also if a class defines conversion operator for type T, then it can be used anywhere where type T is expected.

For example,

class CrazyInt {
 int v;
public:
 CrazyInt(int i) : v(i) {}
 operator int() const { return v; } // conversion from CrazyInt to int
};

The CrazyInt defines a conversion operator to type int. Now if we had a function, let's say, print_int that took int as an argument, we could also pass it an object of type CrazyInt,

#include <iostream>

void print_int(int a) {
 std::cout << a << std::endl;
}

int main() {
 CrazyInt b = 55;
 print_int(999);    // prints 999
 print_int(b);      // prints 55
}

Subtype polymorphism that I discussed earlier is actually also coercion polymorphism because the derived class gets converted into base class type.

Have Fun!

Have fun with all the new knowledge about polymorphism!

This article is part of the article series "CommandLineFu One-Liners Explained."
<- previous article next article ->

CommandLineFu ExplainedHey everyone, this is the fourth article in the series on the most popular commandlinefu one-liners explained.

Here are the first three parts:

And here are today's one-liners:

31. Quickly access ASCII table.

$ man 7 ascii

Ever forgot a keycode for some ASCII character or escape code? Look no further, man ascii contains the 7-bit ASCII table. Take a look at it online.

Linux man pages are full of gems like these. One day I actually went through all the man pages to find the most interesting ones. An article about them is upcoming but before I get it published, here are a few interesting ones:

And finally the section 7 man pages that are most packed with wild and cool info,

  • man 7 bootparam - a tutorial and reference of Linux kernel boot parameters.
  • man 7 charsets - a Linux programmer's view of character sets and internationalization.
  • man 7 glob - how pathname globbing works.
  • man 7 hier - description of the Linux file system hierarchy.
  • man 7 operator - C language operator precedence and associativity table.
  • man 7 regex - basic and extended regular expression overview.
  • man 7 suffixes - lists common Linux file suffixes and the associated file types.
  • man 7 time - overview of time and timers.
  • man 7 units - kilo, kibi, mega, mebi, giga, gibi, ... - decimal (SI) and binary system of units.
  • man 7 utf8 - description of UTF-8 encoding.
  • man 7 url - description of URIs, URLs and URNs.

There are a lot more interesting man pages but these stood out from the rest.

32. Simple timer.

$ time read

This one-liner can be used a simple timer. For example, if you wish to time something, you can execute it when the event starts and press the return key when the event ends. It will output the time the event took to finish.

Here is how this one-liner works. First the time command times any command that is supplied to it. In this case the command supplied to it is read that reads a line from the standard input. As soon as you press enter, read finishes and time reports how long it took.

If you get annoyed by having to press enter, you can specify that read should return after having read 1 character,

$ time read -N 1

Now you can press any key to stop the timer.

If you wish to run a timer for a specific number of seconds, you can add -t flag for timeout,

$ time read -t 60

This would stop the timer after 60 seconds.

33. Shutdown a Windows machine.

$ net rpc shutdown -I IP_ADDRESS -U username%password

Everyone knows the net command, right? We all used to net use \\ip\ipc$ *, right? :)

Anyway, the net command comes as part of Samba, which allows you, for example, to use Linux in a Windows workgroup.

This particular one-liner executes the shutdown command on a Windows computer located at IP_ADDRESS with as the user username with password password.

To reboot a machine use the -r switch to net rpc:

$ net rpc shutdown -r -I IP_ADDRESS -U username%password

If you're on an unsecured network, don't forget about the good old nmblookup and smbclient tools that come with Samba.

34. Execute a command independently from the current shell.

$ (cd /tmp && ls)

This one-liner illustrates subshells. Here the commands cd /tmp and ls are executed but they do not affect the current shell. If you had done just cd /tmp && ls, your current shell would have changed directory to /tmp but in this one-liner it happens in a subshell and your current shell is not affected.

Surely, this is only a toy example. If you wanted to know what's in /tmp, you'd do just ls /tmp.

Actually, talking about cd, be aware of pushd and popd commands. They allow you to maintain a stack of directories you want to return to later. For example,

/long/path/is/long$ pushd .
/long/path/is/long$ cd /usr
/usr$ popd 
/long/path/is/long$

Or even shorter, passing the directory you're gonna cd to directly to pushd,

/long/path/is/long$ pushd /usr
/usr$ popd 
/long/path/is/long$

Another cool trick is to use cd - to return to the previous directory. Here is an example,

/home/pkrumins$ cd /tmp
/tmp$ cd -
/home/pkrumins$

35. Tunnel your SSH connection via intermediate host.

$ ssh -t reachable_host ssh unreachable_host

This one-liner creates an ssh connection to unreachable_host via reachable_host. It does it by executing the ssh unreachable_host on reachable_host. The -t forces ssh to allocate a pseudo-tty, which is necessary for working interactively in the second ssh to unreachable_host.

This one-liner can be generalized. You can tunnel through arbitrary number of ssh servers:

$ ssh -t host1 ssh -t host2 ssh -t host3 ssh -t host4 ...

Now catch me if you can. ;)

36. Clear the terminal screen.

$ CTRL+l

Pressing CTRL+l (that's small L) clears the screen leaving the current line at the top of the screen.

If you wish to clear just some line, you can use argumented version of CTRL+l - first press ESC, then the line you want to clear, let's say 21 (21st line), and then press the same CTRL+l. That will clear the 21st line on the screen without erasing the whole screen.

$ ESC 21 CTRL+l

This command outputs a special "clear-screen" sequence to the terminal. The same can be achieved by tput command,

$ tput clear

Another way to clear the terminal (usually when the screen gets garbled) is to use the reset command,

$ reset

37. Hear when the machine comes back online.

$ ping -a IP

Ever had a situation when you need to know when the system comes up after a reboot? Up until now you probably launched ping and either followed the timeouts until the system came back, or left it running and occasionally checked its output to see if the host is up. But that is unnecessary, you can make ping -a audible! As soon as the host at IP is back, ping will beep!

38. List 10 most often used commands.

$ history | awk '{a[$2]++}END{for(i in a){print a[i] " " i}}' | sort -rn | head

The person who wrote it has the Unix mindset right. He's combining several shell commands to get the result he/she wants.

First, history outputs all the commands the person has executed. Next, awk counts how many times the second column $2 appears in the output. Once history has output all the commands and awk has counted them, awk loops over all the commands and outputs the count a[i] separated by space, followed by the command itself. Then sort takes this input and sorts numerically -n and reverses the output -r, so that most frequent commands were on top. Finally head outputs the first 10 most frequent history commands.

If you want to see more than 10 commands (or less), change head to head -20 for 20 commands or head -5 for 5 commands.

39. Check gmail for new mail.

$ curl -u you@gmail.com --silent "https://mail.google.com/mail/feed/atom" |
  perl -ne \
  '
    print "Subject: $1 " if /<title>(.+?)<\/title>/ && $title++;
    print "(from $1)\n" if /<email>(.+?)<\/email>/;
  '

Gmail is cool because they offer an Atom feed for the new mail. This one-liner instructs curl to retrieve the feed and authenticate as you@gmail.com. You'll be prompted a password after you execute the command. Next it feeds the output to perl. Perl extracts the title (subject) of each email and the sender's email. These two items are printed to stdout.

Here is a the output when I run the command,

Subject: i heard you liked windows! (from gates@microsoft.com)
Subject: got root? (from bofh@underground.org)

40. Watch Star-Wars via telnet.

$ telnet towel.blinkenlights.nl

Needs no explaining. Just telnet to the host to watch ASCII Star-Wars.

And here is another one,

$ telnet towel.blinkenlights.nl 666

Connecting on port 666 will spit out BOFH excuses.

That's it for today.

I hope you enjoyed the 4th part of the article. Tune in next time for the 5th part.

Oh, and I'd love if you followed me on Twitter!

It's interesting how the term "functor" means completely different things in various programming languages. Take C++ for example. Everyone who has mastered C++ knows that you call a class that implements operator() a functor. Now take Standard ML. In ML functors are mappings from structures to structures. Now Haskell. In Haskell functors are just homomorphisms over containers. And in Prolog functor means the atom at the start of a structure. They all are different. Let's take a closer look at each one.

Functors in C++

Functors in C++ are short for "function objects." Function objects are instances of C++ classes that have the operator() defined. If you define operator() on C++ classes you get objects that act like functions but can also store state. Here is an example,

#include <iostream>
#include <string>

class SimpleFunctor {
    std::string name_;
public:
    SimpleFunctor(const char *name) : name_(name) {}
    void operator()() { std::cout << "Oh, hello, " << name_ << endl; }
};

int main() {
    SimpleFunctor sf("catonmat");
    sf();  // prints "Oh, hello, catonmat"
}

Notice how I was able to call sf() in the main function, even though sf was an object? That's because I defined operator() in SimpleFunctor's class.

Most often functors in C++ are used as predicates, fake closures or comparison functions in STL algorithms. Here is another example. Suppose you have a list of integers and you wish to find the sum of all even ones, and the sum of all odd ones. Perfect job for a functor and for_each algorithm.

#include <algorithm>
#include <iostream>
#include <list>

class EvenOddFunctor {
    int even_;
    int odd_;
public:
    EvenOddFunctor() : even_(0), odd_(0) {}
    void operator()(int x) {
        if (x%2 == 0) even_ += x;
        else odd_ += x;
    }
    int even_sum() const { return even_; }
    int odd_sum() const { return odd_; }
};

int main() {
    EvenOddFunctor evenodd;
    
    int my_list[] = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };
    evenodd = std::for_each(my_list,
                  my_list+sizeof(my_list)/sizeof(my_list[0]),
                  evenodd);

    std::cout << "Sum of evens: " << evenodd.even_sum() << "\n";
    std::cout << "Sum of odds: " << evenodd.odd_sum() << std::endl;

    // output:
    // Sum of evens: 30
    // Sum of odds: 25
}

Here an instance of an EvenOddFunctor gets passed to for_each algorithm. The for_each algorithm iterates over each element in my_list and calls the functor. After it's done, it returns a copy of evenodd functor that contains the sum of evens and odds.

Functors in Standard ML

Vaguely talking in object-oriented terms, functors in ML are generic implementations of interfaces. In ML terms, functors are part of ML module system and they allow to compose structures.

Here is an example, suppose you want to write a plugin system and you wish all the plugins to implement the required interface, which, for simplicity, includes only the perform function. In ML you can first define a signature for plugins,

signature Plugin =
sig
    val perform : unit -> unit
end;

Now that we have defined an interface (signature) for plugins, we can implement two Plugins, let's say LoudPlugin and SilentPlugin. The implementation is done via structures,

structure LoudPlugin :> Plugin =
struct
    fun perform () = print "DOING JOB LOUDLY!\n"
end;

And the SilentPlugin,

structure SilentPlugin :> Plugin =
struct
    fun perform () = print "doing job silently\n"
end;

Now we get to functors. Remember functors in ML take structures as their arguments, so we can write one that takes Plugin as its argument,

functor Performer(P : Plugin) =
struct
    fun job () = P.perform ()
end;

This functor takes Plugin as P argument, and uses it in the job function, that calls P plugin's perform function.

Now let's use the Performer functor. Remember that a functor returns a structure,

structure LoudPerformer = Performer(LoudPlugin);
structure SilentPerformer = Performer(SilentPlugin);

LoudPerformer.job ();
SilentPerformer.job ();

This outputs,

DOING JOB LOUDLY!
doing job silently

This is probably the simplest possible example of Standard ML functors.

Functors in Haskell

Functors in Haskell is what real functors are supposed to be. Haskell functors resemble mathematical functors from category theory. In category theory a functor F is a mapping between two categories such that the structure of the category being mapped over is preserved, in other words, it's a homomorphism between two categories.

In Haskell this definition is implemented as a simple type class,

class Functor f where
  fmap :: (a -> b) -> f a -> f b

Looking back at ML example, a type class in Haskell is like a signature, except it's defined on types. It defines what operations a type has to implement to be an instance of this class. In this case, however, the Functor is not defined over types but over type constructors f. It says, a Functor is something that implements the fmap function that takes a function from type a to type b, and a value of type f a (a type constructed from type constructor f applied to type a) and returns a value of type f b.

To understand what it does, think of fmap as a function that applies an operation to each element in some kind of a container.

The simplest example of functors is regular lists and the map function that maps a function to each element in the list.

Prelude> map (+1) [1,2,3,4,5]
[2,3,4,5,6]

In this simple example, the Functor's fmap function is just map and type constructor f is [] - the list type constructor. Therefore the Functor instance for lists is defined as

instance Functor [] where
  fmap = map

Let's see if it really is true by using fmap instead of map in the previous example,

Prelude> fmap (+1) [1,2,3,4,5]
[2,3,4,5,6]

But notice that Functor definition does not say anything about preserving the structure! Therefore any sensible functor must satisfy the functor laws, which are part of the definition of the mathematical functor, implicitly. There are two rules on fmap:

fmap id = id
fmap (g . h) = fmap g . fmap h

The first rule says that mapping the identity function over every element in a container has no effect. The second rule says that a composition of two functions over every item in a container is the same as first mapping one function, and then mapping the other.

Another example of Functors that illustrate them the most vividly is operations over trees. Think of a tree as a container, then fmap maps a function over tree values, while preserving the tree's structure.

Let's define a Tree first,

data Tree a = Node (Tree a) (Tree a)
            | Leaf a
              deriving Show

This definition says that a Tree of type a is either a Node of two Trees (left and right branches) or a Leaf. The deriving Show expression allows us to inspect the Tree via show function.

Now we can define a Functor over Trees,

instance Functor Tree where
    fmap g (Leaf v) = Leaf (g v)
    fmap g (Node l r) = Node (fmap g l) (fmap g r)

This definition says, that fmap of function g over a Leaf with value v is just a Leaf of g applied to v. And fmap of g over a Node with left l and right r branches is just a Node of fmap applied to the values of left and right branches.

Now let's illustrate how fmap works on trees. Let's construct a tree with String leaves and map the length function over them to find out the length of each leaf.

Prelude> let tree = (Node (Node (Leaf "hello") (Leaf "foo")) (Leaf "baar"))
Prelude> fmap length tree
Node (Node (Leaf 5) (Leaf 3)) (Leaf 4)

Here I constructed the following tree,

           *
          / \
         /   \
        *  "baar"
       / \
      /   \
     /     \
    /       \
 "hello"  "foo"

And mapped length function over it, producing,

           *
          / \
         /   \
        *     4
       / \     
      /   \
     /     \
    /       \
   5         3

Another way of saying what fmap does is that is lifts a function from the "normal world" into the "f world."

In fact Functor is the most fundamental type class in Haskell because Monads, Applicatives and Arrows are all Functors. As I like to say it, Haskell starts where the functors start.

If you wish to learn more about Haskell type classes, start with the excellent article Typeclassopedia (starts at page 17).

Functors in Prolog

Finally, functors in Prolog. Functors in Prolog are the simplest of all. They refer to two things. The first is the atom at the start of the structure. Here is an example, given an expression,

?- likes(mary, pizza)

the functor is the first atom - likes.

The second is built-in predicate called functor. It returns the arity and the functor of a structure. For example,

?- functor(likes(mary, pizza), Functor, Arity).
Functor = likes
Arity = 2

That's it for functors in Prolog.

Conclusion

This article demonstrated how a simple term like "functor" can refer to completely different things in various programming languages. Therefore when you hear a the term "functor", it's important to know the context it's being mentioned in.

I thought I'd do a shorter article on catonmat this time. It goes hand in hand with my upcoming article series on "100% technical guide to anonymity" and it's much easier to write larger articles in smaller pieces. Then I can edit them together and produce the final article.

This article will be interesting for those who didn't know it already -- you can turn any Linux computer into a SOCKS5 (and SOCKS4) proxy in just one command:

ssh -N -D 0.0.0.0:1080 localhost

And it doesn't require root privileges. The ssh command starts up dynamic -D port forwarding on port 1080 and talks to the clients via SOCSK5 or SOCKS4 protocols, just like a regular SOCKS5 proxy would! The -N option makes sure ssh stays idle and doesn't execute any commands on localhost.

If you also wish the command to go into background as a daemon, then add -f option:

ssh -f -N -D 0.0.0.0:1080 localhost

To use it, just make your software use SOCKS5 proxy on your Linux computer's IP, port 1080, and you're done, all your requests now get proxied.

Access control can be implemented via iptables. For example, to allow only people from the ip 1.2.3.4 to use the SOCKS5 proxy, add the following iptables rules:

iptables -A INPUT --src 1.2.3.4 -p tcp --dport 1080 -j ACCEPT
iptables -A INPUT -p tcp --dport 1080 -j REJECT

The first rule says, allow anyone from 1.2.3.4 to connect to port 1080, and the other rule says, deny everyone else from connecting to port 1080.

Surely, executing iptables requires root privileges. If you don't have root privileges, and you don't want to leave your proxy open (and you really don't want to do that), you'll have to use some kind of a simple TCP proxy wrapper to do access control.

Here, I wrote one in Perl. It's called tcp-proxy.pl and it uses IO::Socket::INET to abstract sockets, and IO::Select to do connection multiplexing.

#!/usr/bin/perl
#

use warnings;
use strict;

use IO::Socket::INET;
use IO::Select;

my @allowed_ips = ('1.2.3.4', '5.6.7.8', '127.0.0.1', '192.168.1.2');
my $ioset = IO::Select->new;
my %socket_map;

my $debug = 1;

sub new_conn {
    my ($host, $port) = @_;
    return IO::Socket::INET->new(
        PeerAddr => $host,
        PeerPort => $port
    ) || die "Unable to connect to $host:$port: $!";
}

sub new_server {
    my ($host, $port) = @_;
    my $server = IO::Socket::INET->new(
        LocalAddr => $host,
        LocalPort => $port,
        ReuseAddr => 1,
        Listen    => 100
    ) || die "Unable to listen on $host:$port: $!";
}

sub new_connection {
    my $server = shift;
    my $client = $server->accept;
    my $client_ip = client_ip($client);

    unless (client_allowed($client)) {
        print "Connection from $client_ip denied.\n" if $debug;
        $client->close;
        return;
    }
    print "Connection from $client_ip accepted.\n" if $debug;

    my $remote = new_conn('localhost', 55555);
    $ioset->add($client);
    $ioset->add($remote);

    $socket_map{$client} = $remote;
    $socket_map{$remote} = $client;
}

sub close_connection {
    my $client = shift;
    my $client_ip = client_ip($client);
    my $remote = $socket_map{$client};
    
    $ioset->remove($client);
    $ioset->remove($remote);

    delete $socket_map{$client};
    delete $socket_map{$remote};

    $client->close;
    $remote->close;

    print "Connection from $client_ip closed.\n" if $debug;
}

sub client_ip {
    my $client = shift;
    return inet_ntoa($client->sockaddr);
}

sub client_allowed {
    my $client = shift;
    my $client_ip = client_ip($client);
    return grep { $_ eq $client_ip } @allowed_ips;
}

print "Starting a server on 0.0.0.0:1080\n";
my $server = new_server('0.0.0.0', 1080);
$ioset->add($server);

while (1) {
    for my $socket ($ioset->can_read) {
        if ($socket == $server) {
            new_connection($server);
        }
        else {
            next unless exists $socket_map{$socket};
            my $remote = $socket_map{$socket};
            my $buffer;
            my $read = $socket->sysread($buffer, 4096);
            if ($read) {
                $remote->syswrite($buffer);
            }
            else {
                close_connection($socket);
            }
        }
    }
}

To use it, you'll have to make a change to the previous configuration. Instead of running ssh SOCKS5 proxy on 0.0.0.0:1080, you'll need to run it on localhost:55555,

ssh -f -N -D 55555 localhost

After that, run the tcp-proxy.pl,

perl tcp-proxy.pl &

The TCP proxy will start listening on 0.0.0.0:1080 and will redirect only the allowed IPs in @allowed_ips list to localhost:55555.

Another possibility is to use another computer instead of your own as exit node. What I mean is you can do the following:

ssh -f -N -D 1080 other_computer.com

This will set up a SOCKS5 proxy on localhost:1080 but when you use it, ssh will automatically tunnel your requests (encrypted) via other_computer.com. This way you can hide what you're doing on the Internet from anyone who might be sniffing your link. They will see that you're doing something but the traffic will be encrypted so they won't be able to tell what you're doing.

That's it. You're now the proxy king!

Download tcp-proxy.pl

Download link: tcp proxy (tcp-proxy.pl)
Download URL: http://www.catonmat.net/download/tcp-proxy.pl
Downloaded: 5964 times

I also pushed the tcp-proxy.pl to GitHub: tcp-proxy.pl on GitHub. This project is also pretty nifty to generalize and make a program that redirects between any number of hosts:ports, not just two.

PS. I will probably also write "A definitive guide to ssh port forwarding" some time in the future because it's an interesting but little understood topic.

var http = require('http');

http.createServer(function(request, response) {
  var proxy = http.createClient(80, request.headers['host'])
  var proxy_request = proxy.request(request.method, request.url, request.headers);
  proxy_request.addListener('response', function (proxy_response) {
    proxy_response.addListener('data', function(chunk) {
      response.write(chunk, 'binary');
    });
    proxy_response.addListener('end', function() {
      response.end();
    });
    response.writeHead(proxy_response.statusCode, proxy_response.headers);
  });
  request.addListener('data', function(chunk) {
    proxy_request.write(chunk, 'binary');
  });
  request.addListener('end', function() {
    proxy_request.end();
  });
}).listen(8080);

This is just amazing. In 20 lines of node.js code and 10 minutes of time I was able to write a HTTP proxy. And it scales well, too. It's not a blocking HTTP proxy, it's event driven and asynchronous, meaning hundreds of people can use simultaneously and it will work well.

To get the proxy running all you have to do is download node.js, compile it, and run the proxy program via the node program:

$ ./configure --prefix=/home/pkrumins/installs/nodejs-0.1.92
$ make
$ make install

$ PATH=$PATH:/home/pkrumins/installs/nodejs-0.1.92/bin

$ node proxy.js

And from here you can take this proxy wherever your imagination takes. For example, you can start by adding logging:

var http = require('http');
var sys  = require('sys');

http.createServer(function(request, response) {
  sys.log(request.connection.remoteAddress + ": " + request.method + " " + request.url);
  var proxy = http.createClient(80, request.headers['host'])
  var proxy_request = proxy.request(request.method, request.url, request.headers);
  proxy_request.addListener('response', function (proxy_response) {
    proxy_response.addListener('data', function(chunk) {
      response.write(chunk, 'binary');
    });
    proxy_response.addListener('end', function() {
      response.end();
    });
    response.writeHead(proxy_response.statusCode, proxy_response.headers);
  });
  request.addListener('data', function(chunk) {
    proxy_request.write(chunk, 'binary');
  });
  request.addListener('end', function() {
    proxy_request.end();
  });
}).listen(8080);

Next, you can add a regex-based host blacklist in 15 additional lines:

var http = require('http');
var sys  = require('sys');
var fs   = require('fs');

var blacklist = [];

fs.watchFile('./blacklist', function(c,p) { update_blacklist(); });

function update_blacklist() {
  sys.log("Updating blacklist.");
  blacklist = fs.readFileSync('./blacklist').split('\n')
              .filter(function(rx) { return rx.length })
              .map(function(rx) { return RegExp(rx) });
}

http.createServer(function(request, response) {
  for (i in blacklist) {
    if (blacklist[i].test(request.url)) {
      sys.log("Denied: " + request.method + " " + request.url);
      response.end();
      return;
    }
  }

  sys.log(request.connection.remoteAddress + ": " + request.method + " " + request.url);
  var proxy = http.createClient(80, request.headers['host'])
  var proxy_request = proxy.request(request.method, request.url, request.headers);
  proxy_request.addListener('response', function(proxy_response) {
    proxy_response.addListener('data', function(chunk) {
      response.write(chunk, 'binary');
    });
    proxy_response.addListener('end', function() {
      response.end();
    });
    response.writeHead(proxy_response.statusCode, proxy_response.headers);
  });
  request.addListener('data', function(chunk) {
    proxy_request.write(chunk, 'binary);
  });
  request.addListener('end', function() {
    proxy_request.end();
  });
}).listen(8080);

update_blacklist();

Now to block proxy users from using Facebook, just echo facebook.com to blacklist file:

$ echo 'facebook.com' >> blacklist

The proxy server will automatically notice the changes to the file and update the blacklist.

Surely, a proxy server without IP control is no proxy server, so let's add that as well:

var http = require('http');
var sys  = require('sys');
var fs   = require('fs');

var blacklist = [];
var iplist    = [];

fs.watchFile('./blacklist', function(c,p) { update_blacklist(); });
fs.watchFile('./iplist', function(c,p) { update_iplist(); });

function update_blacklist() {
  sys.log("Updating blacklist.");
  blacklist = fs.readFileSync('./blacklist').split('\n')
              .filter(function(rx) { return rx.length })
              .map(function(rx) { return RegExp(rx) });
}

function update_iplist() {
  sys.log("Updating iplist.");
  iplist = fs.readFileSync('./iplist').split('\n')
           .filter(function(ip) { return ip.length });
}

http.createServer(function(request, response) {
  var allowed_ip = false;
  for (i in iplist) {
    if (iplist[i] == request.connection.remoteAddress) {
      allowed_ip = true;
      break;
    }
  }

  if (!allowed_ip) {
    sys.log("IP " + request.connection.remoteAddress + " is not allowed");
    response.end();
    return;
  }

  for (i in blacklist) {
    if (blacklist[i].test(request.url)) {
      sys.log("Denied: " + request.method + " " + request.url);
      response.end();
      return;
    }
  }

  sys.log(request.connection.remoteAddress + ": " + request.method + " " + request.url);
  var proxy = http.createClient(80, request.headers['host'])
  var proxy_request = proxy.request(request.method, request.url, request.headers);
  proxy_request.addListener('response', function(proxy_response) {
    proxy_response.addListener('data', function(chunk) {
      response.write(chunk, 'binary');
    });
    proxy_response.addListener('end', function() {
      response.end();
    });
    response.writeHead(proxy_response.statusCode, proxy_response.headers);
  });
  request.addListener('data', function(chunk) {
    proxy_request.write(chunk, 'binary');
  });
  request.addListener('end', function() {
    proxy_request.end();
  });
}).listen(8080);

update_blacklist();
update_iplist();

By default the proxy server will not allow any connections, so add all the IPs you want the proxy to be accessible from to iplist file:

$ echo '1.2.3.4' >> iplist

Finally, let's refactor the code a little:

var http = require('http');
var sys  = require('sys');
var fs   = require('fs');

var blacklist = [];
var iplist    = [];

fs.watchFile('./blacklist', function(c,p) { update_blacklist(); });
fs.watchFile('./iplist', function(c,p) { update_iplist(); });

function update_blacklist() {
  sys.log("Updating blacklist.");
  blacklist = fs.readFileSync('./blacklist').split('\n')
              .filter(function(rx) { return rx.length })
              .map(function(rx) { return RegExp(rx) });
}

function update_iplist() {
  sys.log("Updating iplist.");
  iplist = fs.readFileSync('./iplist').split('\n')
           .filter(function(rx) { return rx.length });
}

function ip_allowed(ip) {
  for (i in iplist) {
    if (iplist[i] == ip) {
      return true;
    }
  }
  return false;
}

function host_allowed(host) {
  for (i in blacklist) {
    if (blacklist[i].test(host)) {
      return false;
    }
  }
  return true;
}

function deny(response, msg) {
  response.writeHead(401);
  response.write(msg);
  response.end();
}

http.createServer(function(request, response) {
  var ip = request.connection.remoteAddress;
  if (!ip_allowed(ip)) {
    msg = "IP " + ip + " is not allowed to use this proxy";
    deny(response, msg);
    sys.log(msg);
    return;
  }

  if (!host_allowed(request.url)) {
    msg = "Host " + request.url + " has been denied by proxy configuration";
    deny(response, msg);
    sys.log(msg);
    return;
  }

  sys.log(ip + ": " + request.method + " " + request.url);
  var proxy = http.createClient(80, request.headers['host'])
  var proxy_request = proxy.request(request.method, request.url, request.headers);
  proxy_request.addListener('response', function(proxy_response) {
    proxy_response.addListener('data', function(chunk) {
      response.write(chunk, 'binary');
    });
    proxy_response.addListener('end', function() {
      response.end();
    });
    response.writeHead(proxy_response.statusCode, proxy_response.headers);
  });
  request.addListener('data', function(chunk) {
    proxy_request.write(chunk, 'binary);
  });
  request.addListener('end', function() {
    proxy_request.end();
  });
}).listen(8080);

update_blacklist();
update_iplist();

Again, it's amazing how fast you can write server software in node.js and JavaScript. It would have probably taken me a day to write the same in C. I love how fast you can prototype the software nowadays.

Download proxy.js

Download link: proxy server written in node.js
Download URL: http://www.catonmat.net/download/proxy.js
Downloaded: 13764 times

I am gonna build this proxy up, so I also put it on GitHub: proxy.js on GitHub

Happy proxying!