post 'good coders code, great reuse' to del.icio.us post 'good coders code, great reuse' to digg post 'good coders code, great reuse' to reddit subscribe to 'good coders code, great reuse' posts via feed
good coders code, great reuse

Comparing to another activity is useful if it helps you formulate questions, it's dangerous when you use it to justify answers.

Martin Fowler

I am now on Twitter! Meet me on Twitter here (my nick is pkrumins.)
Or on Google Buzz and Facebook.

Hacker's Approach 17 Nov 2009 10:50 am
1 Star2 Stars3 Stars4 Stars5 Stars (7 votes, average: 4.71 out of 5)
Loading ... Loading ...

Feedburner used to have a really nice RSS subscriber growth graph. I loved it. But then one day they were acquired by Google and they changed their nice chart to an interactive flash thing that was slow and looked just awful.

Here is how awesome the graph used to look like:


This graph was taken from my “one year of blogging” post.

And how it looks now:

Crappy Feedburner Stats
This graph was taken from Feedburner stats dashboard today.
Choose “Show stats for” -> “all time” to generate this graph.

This critter takes 35MB of RAM, responds in 4 seconds and worst of all, looks very, very ugly. I don’t know why would anyone replace a nice 6.5KB image with a 35MB monster.

I don’t want to see this ugliness anymore, therefore I’ll create a Perl program that generates the awesome graph they used to have. I’ll write my thought process in creating this program in this post. Here it goes.

First I need to get the data somehow. I remember they had some kind of an API to get the data. A quick Google search for feedburner api returns this link Feedburner Awareness API. Ya, that’s it. This is the documentation of their API.

Accessing the following URL gets me subscriber count data from July 1, 2007 to November 17, 2009:

http://feedburner.google.com/api/awareness/1.0/GetFeedData?uri=catonmat&dates=2007-07-01,2009-11-17

Excellent, now I can write the Perl program. It will need to parse the XML data, draw the chart and save the image to a file.

Hmm, how should I invoke my program? Ok, here is how:

$ generate_feedburner_graph.pl <feed name> [<start date> [<end date>]]

# if end date is not specified, it’s set to today.
# if start date is not specified, it’s set to first
# day when the feed had at least one subscriber.

This program will use LibGD to generate the image. It will save it to a file called feed_name-start_date-end_date.png.

Now I need to find the colors used in the awesome feedburner graph. For this purpose I’ll use ColorZilla Firefox plugin. The green one is #95CF9C, the background is #F2F8FC, the light grid is #CCCECE, the one that separates the green area from background is #687279, and the x-y axis are #808080.

Alright, now I have everything I need to create the program.

… Several hours later …

Done!

One thing I forgot to mention is that you will need DejaVuSans TrueType font to run this program (it uses it to draw text). Download it and put the DejaVuSans.ttf in the same directory as the program.

#!/usr/bin/perl
#
# Feedburner graph generator
# Version 1.0
#

use warnings;
use strict;

use WWW::Mechanize;
use List::Util 'max';
use XML::Simple;
use POSIX;
use GD;

# This is the API URL that returns XML data with feed statistics by day.
my $feedburner_url = "http://feedburner.google.com/api/awareness/1.0/GetFeedData?uri=%s&dates=%s,%s";

# This function prints the usage and terminates the program.
sub usage {
    printf "Usage: %s <feed name> [<start date> [<end date>]]\n", $0;
    print "Parameters:\n";
    print "<feed name>  - your feed name, for example 'catonmat'\n";
    print "<start date> - start date (YYYY-MM-DD)\n";
    print "<end date>   - end date (YYYY-MM-DD), today if not specified\n";
    exit(1);
}

# This function checks if DejaVuSans font is present, if not
# it prints the instructions on how to download and terminates the program.
sub check_dejavu_sans {
    unless (-e 'DejaVuSans.ttf') {
        print "Please download DejaVu fonts and put DejaVuSans.ttf file in\n";
        print "the same directory as this program.\n";
        print "http://dejavu-fonts.org/wiki/index.php?title=Download\n";
        exit(1);
    }
}

# Given year, month, day from `struct tm` (man 3 mktime),
# it constructs a YYYY-MM-DD string.
sub format_date {
    my ($y, $m, $d) = @_;
    return sprintf("%04d-%02d-%02d", $y+1900, $m+1, $d);
}

# Given the `struct tm` (man 3 mktime) as a 9-list (perldoc -f localtime),
# it constructs a YYYY-MM-DD string.
sub yyyymmdd_from_9list {
    my ($y, $m, $d) = @_[5,4,3];
    return format_date $y, $m, $d;
}

# This function returns a YYYY-MM-DD string for today.
sub today {
    return yyyymmdd_from_9list localtime
}

# This function constructs the 9-list (perldoc -f localtime) for a
# date that was $months_ago months ago.
sub months_ago {
    my $months_ago = shift;
    my @date = @_;
    $date[4] -= $months_ago;
    return localtime mktime @date;
}

# Given feed data from feedburner's api (array of hashrefs), it finds
# the first date that had non-zero circulation.
# If no such date exists, it returns undef.
sub find_first_nonzero {
    my @feed_data = @_;
    return if $feed_data[0]->{circulation} != 0;
    my $prev_item;
    for my $item (@feed_data) {
        return $prev_item if $item->{circulation};
        $prev_item = $item;
    }
    return
}

# Given feed's name, this function finds the first date the
# feed had some subscribers, i.e., feed's start date.
sub find_start_date {
    my $feed = shift;
    print "Finding feed's start date...\n";
    my @ago = months_ago 6, localtime;
    my $end_date = today();
    while (1) {
        my $start_date = format_date @ago[5,4,3];

        print "Trying $start_date as start date...\n";
        my @feed_data = get_feed_data($feed, $start_date, $end_date);
        my $non_zero = find_first_nonzero(@feed_data);
        if ($non_zero) {
            print "Found $non_zero->{date} as start date!\n";
            return $non_zero->{date};
        }

        $end_date = yyyymmdd_from_9list @ago;
        @ago = months_ago 6, @ago;
    }
}

# This function returns an array of hashrefs of feeds data.
# Each hash contains 'reach', 'hits', 'date', and 'circulation' keys.
sub get_feed_data {
    my $raw_feed_data = get_raw_feed_data(@_);
    my $feed_data = XML::Simple->new->XMLin($raw_feed_data);
    if ($feed_data->{stat} ne "ok") {
        die $feed_data->{err}{msg}
    }
    return @{$feed_data->{'feed'}{'entry'}};
}

# This function formats the $feedburner_url and uses WWW::Mechanize
# to get the feed data via feedburner's API.
sub get_raw_feed_data {
    my ($feed, $start_date, $end_date) = @_;
    my $url = sprintf($feedburner_url, $feed, $start_date, $end_date);
    return WWW::Mechanize->new(agent => 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1.5) Gecko/20091102 Firefox/3.5.5')->get($url)->content;
}

# This function drops feed items when they can't fit in graph's width.
sub drop_data {
    my ($width, @data) = @_;
    my $len = $#data;
    my $delta = @data - $width;
    my @drop = map { int($len / $delta * $_) } 1..$delta;
    splice @data, $_, 1 for reverse @drop;
    return @data;
}

# This function duplicates feed items when there are not enough items
# to fill the whole graph.
sub dupe_data {
    my ($width, @data) = @_;
    my $len = $#data;
    my $delta = $width - @data;
    my @dupe = map { int($len / $delta * $_) } 1..$delta;
    splice @data, $_, 0, $data[$_] for reverse @dupe;
    return @data;
}

# This function draws the outline of the graph box where the green
# lines are drawn.
sub draw_outline {
    my ($gd, $grid, $xy, $bg) = @_;
    $gd->rectangle(40, 4, 482, 100, $grid);
    $gd->filledRectangle(41, 5, 481, 99, $bg);
    $gd->line(40, 4, 40, 100, $xy);
    $gd->line(38, 100, 482, 100, $xy);
}

# This function draws the grid lines.
sub draw_grid {
    my ($gd, $xy, $grid) = @_;

    # horizontal
    $gd->line(41, 26, 482, 26, $grid);
    $gd->line(38, 26, 40, 26, $xy);
    $gd->line(41, 63, 482, 63, $grid);
    $gd->line(38, 63, 40, 63, $xy);

    # vertical
    for (my $x = 77; $x <= 442; $x += 73) {
        $gd->line($x, 4, $x, 99, $grid);
        $gd->line($x, 100, $x, 102, $xy);
    }
}

# This function saves the $gd image to a file named
# "feed_name-start_date-end_date.png"
sub save_image {
    my ($gd, $feed, $start_date, $end_date, @data) = @_;

    my $filename = "$feed-$start_date-$end_date.png";
    $filename =~ s|/|_|g;

    open my $fh, '>', $filename or die $!;
    print $fh $gd->png;
    close $fh;

    print "Done. Image written to $filename\n";
}

# This function draws the date thingies on the x axis.
sub draw_date {
    my ($gd, $item, $text_color, $x) = @_;
    my @mons = qw/Nul Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec/;

    my ($y, $m, $d) = $item->{date} =~ /(\d{4})-(\d{2})-(\d{2})/;
    $m = $mons[$m];
    my $text1 = sprintf("%s-", $m);
    my $text2 = sprintf("%s-%d", $m, $y);

    my @bounds = GD::Image->stringTTF($text_color, './DejaVuSans.ttf', 7, 0, 0, 0, $text1);
    my $offset = $bounds[4];

    $gd->stringTTF($text_color, './DejaVuSans.ttf', 7, 0, $x-$offset+2, 114, $text2);
}

# This function draws the feed usage numbers on the y axis.
sub draw_count {
    my ($gd, $count, $text_color, $y) = @_;

    my $text = int $count;

    my @bounds = GD::Image->stringTTF($text_color, './DejaVuSans.ttf', 7, 0, 0, 0, $text);
    my $width = $bounds[4] - $bounds[6];

    $gd->stringTTF($text_color, './DejaVuSans.ttf', 7, 0, 34-$width, $y+4, $text);
}

# This function creates the GD image and draws everything.
sub draw_feedburner_image {
    my ($feed, $start_date, $end_date, @data) = @_;

    print "Creating the awesome feedburner image.\n";

    my $gd = GD::Image->new(490, 120, 1);
    my $white  = $gd->colorAllocate(0xff, 0xff, 0xff);
    my $green  = $gd->colorAllocate(0x95, 0xcf, 0x9c);
    my $bg     = $gd->colorAllocate(0xf2, 0xf8, 0xfc);
    my $grid   = $gd->colorAllocate(0xcc, 0xce, 0xce);
    my $xy     = $gd->colorAllocate(0x80, 0x80, 0x80);
    my $alphagrid = $gd->colorAllocateAlpha(0xcc, 0xce, 0xce, 0x30);
    my $border = $gd->colorAllocate(0x68, 0x72, 0x79);
    my $text   = $gd->colorAllocate(0, 0 , 0);

    $gd->alphaBlending(1);
    $gd->filledRectangle(0, 0, 489, 119, $white);
    $gd->setAntiAliased($border);

    draw_outline($gd, $grid, $xy, $bg);

    my $t_height = 90;
    my $t_width = 441;
    my $max_circulation = max map { $_->{circulation} } @data;

    my $compress_factor = @data/$t_width;

    if ($compress_factor > 1) {
        @data = drop_data($t_width, @data);
    }
    elsif ($compress_factor < 1) {
        @data = dupe_data($t_width, @data);
    }

    my ($prev_x1, $prev_y1);
    my $x = 41;
    my %x_markers = (77 => 1, 150 => 1, 223 => 1, 296 => 1, 369 => 1, 442 => 1);
    for my $item (@data) {
        my $height = int($item->{circulation}/$max_circulation * $t_height);
        my ($x1, $y1, $x2, $y2) = ($x, 99, $x, 99-$height);
        $gd->line($x1, $y1, $x2, $y2, $green);
        if ($prev_x1) {
            $gd->line($prev_x1, $prev_y1, $x2, $y2, gdAntiAliased);
        }
        ($prev_x1, $prev_y1) = ($x1, $y2);

        if (exists $x_markers{$x}) {
            draw_date($gd, $item, $text, $x)
        }

        $x++;
    }

    draw_grid($gd, $xy, $alphagrid);
    draw_count($gd, 0,  $text, 100);
    draw_count($gd, $max_circulation * 74/90, $text, 26);
    draw_count($gd, $max_circulation * 37/90, $text, 63);
    save_image($gd, $feed, $start_date, $end_date);
}

# The main function, does everything.
sub main {
    check_dejavu_sans;

    my $feed = shift || usage();
    my $start_date = shift || find_start_date($feed);
    my $end_date = shift || today();

    unless ($start_date =~ /^\d{4}-\d{2}-\d{2}$/) {
        die "Invalid start date. Format: YYYY-MM-DD."
    }
    unless ($end_date =~ /^\d{4}-\d{2}-\d{2}$/) {
        die "Invalid end date. Format: YYYY-MM-DD."
    }

    print "Getting feed data for $feed from $start_date to $end_date\n";
    my @feed_data = get_feed_data($feed, $start_date, $end_date);

    draw_feedburner_image($feed, $start_date, $end_date, @feed_data);
}

main @ARGV;

Download: generate_feedburner_graph.perl

Let’s test run it.

$ generate_feedburner_graph.pl catonmat
Finding feed's start date...
Trying 2009-05-17 as start date...
Trying 2008-11-17 as start date...
Trying 2008-05-17 as start date...
Trying 2007-11-17 as start date...
Trying 2007-05-17 as start date...
Found 2007-07-15 as start date!
Getting feed data for catonmat from 2007-07-15 to 2009-11-17
Creating the awesome feedburner image.
Done. Image written to catonmat-2007-07-15-2009-11-17.png

Here is the result:

catonmat feed statistics from 2007-07-15 to 2009-11-17
catonmat.net feed statistics from 2007-07-15 to 2009-11-17.

This looks divine. I love it!!!

As I was writing this I had the coolest idea to make a set of tools for probloggers. I added this idea to my idea-list and will try to make it happen. This tool could be the first in problogger tool suite!

Download “generate_feedburner_graph.pl“:

Download: generate_feedburner_graph.perl
Downloaded: 415 times.
Download url: http://www.catonmat.net/download/generate_feedburner_graph.perl

And finally, help me reach 10,000 subscribers! If you haven’t yet subscribed, subscribe to my blog!

Comments (7) Comments | Email Post Email 'Feedburner Graphs Suck, or How to Generate Nice Graphs for Feedburner' to a friend | Print Post Print 'Feedburner Graphs Suck, or How to Generate Nice Graphs for Feedburner' | Permalink Permalink to 'Feedburner Graphs Suck, or How to Generate Nice Graphs for Feedburner' | Trackback Trackback to 'Feedburner Graphs Suck, or How to Generate Nice Graphs for Feedburner'
(Popularity: 8%) 5,260 Views

Did you like this page? Subscribe to my posts!

I am now on Twitter! Meet me on Twitter here (my nick is pkrumins.)
Or on Google Buzz and Facebook.

Hacker's Approach 07 Jan 2009 01:10 pm
1 Star2 Stars3 Stars4 Stars5 Stars (4 votes, average: 5 out of 5)
Loading ... Loading ...

Save Time by Speeding Up VideosThis is going to be a small, technical tutorial on how to save a lot of time by watching videos at higher playback rates.

I first read about this idea from my most favorite personal development blog at Steve Pavlina.com. In his post “Overclock Your Audio Learning” he says that he occasionally listens to audios at 4.1x. At this speed 4 hour video/audio can be listened in less 1 hour!

I personally found it impossible to understand anything at 4x speed. My optimal listening speed is 1.65x - 2.1x.

To speed up the videos you will first need to download and install AviSynth. AviSynth is kind of a video programming language with which you can do all kinds of manipulations to videos programmatically. If you are on Windows, then during the installation make sure to associate .avs file extension with Windows Media Player and not Notepad.

Next, create this AviSynth script, and place it in the same directory as your video. Name the script as “speedup.avs” or something similar. Make sure the extension is “.avs” if you are on Windows!

file = "file_name_of_video.avi"
speedup = 1.65
pitch = 100

DirectShowSource(file)

audio_rate = last.audiorate
video_rate = last.framerate

AssumeSampleRate(int(audio_rate*speedup))
AssumeFPS(video_rate*speedup)
TimeStretch(pitch = pitch)

There are three variables that you can change in this simple script. The first is “file“. It should be the filename of the video you are about to watch. The next is “speedup“. It’s the new playback rate, you may set it to any value you wish. For example, if you set it to 2.0, then the video will play twice as fast as it normally would. And the last parameter to change is the “pitch“. You may change it to something lower than 100 when the video plays at higher speeds to make the speaker sound lower. I usually keep “speedup” at 1.65 and “pitch” at 75.

Once you have made your own configuration, just double click the .avs on Windows to play it at the new playback speed, or play it through mplayer on Linux!

Update: My blog readers bobb and crb suggested two new techniques on how to watch videos faster. Bobb suggested just to use mplayer. Turns out it already has an option to play videos faster!

mplayer -speed 1.65 file.avi

# use keys [ ], and { } to control the playback speed
# use backspace to reset video speed to normal.

Crb suggested to use MySpeed™ Plug-In for YouTube to speed up video on YouTube in real time.

Here are a few examples that I crafted at speeds 1x, 1.4x, 1.65x, 2x, 2.25x, 2.5x, 2.75x and 3x. They are from Alan Kay’s keynote speech “The computer revolution hasn’t happend yet” at OOPSLA Conference in 1997.

After a few minutes of listening at higher speeds, try going back to listen at 1x. It will seem incredibly slow because your mind will have adapted to the faster input rate.

Alan Kay’s talk at a normal speed. Total time: 5 mins.

Direct URL: http://www.youtube.com/watch?v=MaRODiPR-ZU
Alan Kay’s talk at 1.4x speed. Total time: 3:34 mins.

Direct URL: http://www.youtube.com/watch?v=Oc_3yk22gn8
Alan Kay’s talk at 1.65x speed. Total time: 3 mins.

Direct URL: http://www.youtube.com/watch?v=lrAq86Qk_rU
Alan Kay’s talk at 2x speed. Total time: 2:30 mins.

Direct URL: http://www.youtube.com/watch?v=WlwEq4HXB3Y
Alan Kay’s talk at 2.25x speed. Total time: 2:13 mins.

Direct URL: http://www.youtube.com/watch?v=b0K5JMjtP3w
Alan Kay’s talk at 2.5x speed. Total time: 2 mins.

Direct URL: http://www.youtube.com/watch?v=YBt01TFNIA0
Alan Kay’s talk at 2.75x speed. Total time: 1:49 mins.

Direct URL: http://www.youtube.com/watch?v=BNCqZp0XOVw
Alan Kay’s talk at 3x speed. Total time: 1:40 mins.

Direct URL: http://www.youtube.com/watch?v=8DsuyPNOqGw

Do you have any other techniques to speed up videos? I am also curious at what speeds do you feel the most comfortable watching the videos?

It would also be cool to create a hack that modifies youtube and google video players to make them play videos faster natively.

Ps. Did you know that I was a big fan of video lectures? I have been collecting math, physics, computer science video lectures for almost 3 years now and posting them to my other Free Science Online Blog.

Comments (36) Comments | Email Post Email 'How to Save Time by Watching Videos at Higher Playback Speeds' to a friend | Print Post Print 'How to Save Time by Watching Videos at Higher Playback Speeds' | Permalink Permalink to 'How to Save Time by Watching Videos at Higher Playback Speeds' | Trackback Trackback to 'How to Save Time by Watching Videos at Higher Playback Speeds'
(Popularity: 16%) 16,435 Views

Did you like this page? Subscribe to my posts!

I am now on Twitter! Meet me on Twitter here (my nick is pkrumins.)
Or on Google Buzz and Facebook.

Hacker's Approach 05 Aug 2008 10:25 pm
1 Star2 Stars3 Stars4 Stars5 Stars (8 votes, average: 5 out of 5)
Loading ... Loading ...

traffic shaping with iptablesA few years ago I worked as a Linux system administrator at a small (few hundred users) Internet service provider. Among all the regular system administrator duties, I also had the privilege to write various software and tools for Linux. One of my tasks was to write a tool to record how much traffic each of the clients was using.

The network for this provider was laid out in a very simple way. The gateway to the Internet was a single Linux box, which was a router, a firewall and performed traffic shaping. Now it had to be extended to do traffic accounting as well.

isp network diagram
Simplified network diagram, all that matters is that the gateway is a Linux box.

At that time I had already mastered IPTables and I had noticed that when listing the existing rules, iptables would display packet count and total byte count for each rule. I thought, yeah, why not use this for accounting? So I did. I created an empty rule (which gets passed through the firewall) for each IP address of users and a script which extracted the byte count.

Here is a detailed explanation of how I did it exactly.

First, let’s see what iptables shows us when we have just booted up.

# iptables -L -n -v -x
Chain INPUT (policy ACCEPT 0 packets, 0 bytes)
    pkts      bytes target     prot opt in     out     source               destination

Chain FORWARD (policy ACCEPT 0 packets, 0 bytes)
    pkts      bytes target     prot opt in     out     source               destination

Chain OUTPUT (policy ACCEPT 0 packets, 0 bytes)
    pkts      bytes target     prot opt in     out     source               destination

At this moment we have no rules added. That’s alright, let’s just get familiar with the output we will be interested in when we have some rules. Notice the ‘pkts‘ and ‘bytes‘ columns. The ‘pkts’ stands for packets and displays the total number of packets matched by the rule. The ‘bytes’ stands for total number of bytes matched by the rule. Notice also three so called “chains” - INPUT, FORWARD and OUTPUT. The INPUT chain is for packets destinated to the Linux box itself, OUTPUT chain is for packets leaving the Linux box (generated by programs running on the Linux box) and FORWARD is for packets passing through the box.

You might also be interested in the command line arguments that I used:

  • -L lists all the rules.
  • -n does not resolve the ip addresses.
  • -v lists the packet and byte count.
  • -x displays the byte count (otherwise it gets abbreviated to 200K, 3M, etc).

A more serious firewall might have the FORWARD chain filled up with various entries already. Not to mess with them, let’s create a new traffic accounting chain called TRAFFIC_ACCT:

# iptables -N TRAFFIC_ACCT
# iptables -L -n -v -x
Chain INPUT (policy ACCEPT 0 packets, 0 bytes)
    pkts      bytes target     prot opt in     out     source               destination

Chain FORWARD (policy ACCEPT 0 packets, 0 bytes)
    pkts      bytes target     prot opt in     out     source               destination

Chain OUTPUT (policy ACCEPT 0 packets, 0 bytes)
    pkts      bytes target     prot opt in     out     source               destination

Chain TRAFFIC_ACCT (0 references)
    pkts      bytes target     prot opt in     out     source               destination

Now let’s redirect all the traffic going through the machine to match the rules in the TRAFFIC_ACCT chain:

# iptables -I FORWARD -j TRAFFIC_ACCT
# iptables -L -n -v -x
Chain INPUT (policy ACCEPT 0 packets, 0 bytes)
    pkts      bytes target     prot opt in     out     source               destination

Chain FORWARD (policy ACCEPT 0 packets, 0 bytes)
    pkts      bytes target     prot opt in     out     source               destination
       0        0 TRAFFIC_ACCT  all  —  *      *       0.0.0.0/0            0.0.0.0/0

Chain OUTPUT (policy ACCEPT 0 packets, 0 bytes)
    pkts      bytes target     prot opt in     out     source               destination

Chain TRAFFIC_ACCT (0 references)
    pkts      bytes target     prot opt in     out     source               destination

A side note: if you had a Linux desktop computer, then you could insert the same rule in the INPUT chain (iptables -I INPUT -j TRAFFIC_ACCT) as all the packets would be destinated for your computer.

IPTables command argument -L can actually take the name of a chain to list the rules from. From now on we will only be interested in rules of TRAFFIC_ACCT chain:

# iptables -L -n -v -x
Chain TRAFFIC_ACCT (1 references)
    pkts      bytes target     prot opt in     out     source               destination

Now to illustrate the main idea, we can play with the rules. For example, let’s do the breakdown of traffic by tcp, udp and icmp protocols. To do that we insert three rules in the TRAFFIC_ACCT chain - one to match tcp protocol, one to match udp protocol and the last one to match icmp protocol.

# iptables -A TRAFFIC_ACCT -p tcp
# iptables -A TRAFFIC_ACCT -p udp
# iptables -A TRAFFIC_ACCT -p icmp

After some time has passed, let’s look at what we have:

# iptables -L TRAFFIC_ACCT -n -v -x
Chain TRAFFIC_ACCT (1 references)
    pkts      bytes target     prot opt in     out     source               destination
    4356  2151124            tcp  --  *      *       0.0.0.0/0            0.0.0.0/0
     119    15964            udp  --  *      *       0.0.0.0/0            0.0.0.0/0
       3      168            icmp --  *      *       0.0.0.0/0            0.0.0.0/0

We see that 4356 tcp packets totaling 2151124 bytes (2 megabytes) have passed through the firewall, 119 udp packets and 3 icmp packets!

You can zero out the counters with -Z iptables command:

# iptables -Z TRAFFIC_ACCT
# iptables -L TRAFFIC_ACCT -n -v -x
Chain TRAFFIC_ACCT (1 references)
    pkts      bytes target     prot opt in     out     source               destination
       0        0            tcp  —  *      *       0.0.0.0/0            0.0.0.0/0
       0        0            udp  —  *      *       0.0.0.0/0            0.0.0.0/0
       0        0            icmp —  *      *       0.0.0.0/0            0.0.0.0/0

You can remove all the rules from TRAFFIC_ACCT chain with -F iptables command:

# iptables -F TRAFFIC_ACCT
# iptables -L TRAFFIC_ACCT -n -v -x
Chain TRAFFIC_ACCT (1 references)
    pkts      bytes target     prot opt in     out     source               destination

Another fun example you can do is count how many actual connections have been made:

# iptables -A TRAFFIC_ACCT -p tcp –syn
# iptables -L -n -v -x
Chain TRAFFIC_ACCT (1 references)
    pkts      bytes target     prot opt in     out     source               destination
       5      276            tcp  —  *      *       0.0.0.0/0            0.0.0.0/0           tcp flags:0×16/0×02

Shows us that 5 tcp packets which start the connections have been sent. Pretty neat, isn’t it?

What I did when I was working as a sysadmin, was add user IP addresses to the TRAFFIC_ACCT chain. Then, I periodically listed and recorded traffic, and zero’ed it out.

You can even create two chains TRAFFIC_ACCT_IN and TRAFFIC_ACCT_OUT to match incoming and outgoing traffic.

# iptables -N TRAFFIC_ACCT_IN
# iptables -N TRAFFIC_ACCT_OUT
# iptables -I FORWARD -i eth0 -j TRAFFIC_ACCT_IN
# iptables -I FORWARD -o eth0 -j TRAFFIC_ACCT_OUT
# iptables -L -n -v -x
Chain INPUT (policy ACCEPT 0 packets, 0 bytes)
    pkts      bytes target     prot opt in     out     source               destination

Chain FORWARD (policy ACCEPT 0 packets, 0 bytes)
    pkts      bytes target     prot opt in     out     source               destination
       0        0 TRAFFIC_ACCT_OUT  all  --  *      eth0    0.0.0.0/0            0.0.0.0/0
       0        0 TRAFFIC_ACCT_IN  all  --  eth0   *       0.0.0.0/0            0.0.0.0/0

Chain OUTPUT (policy ACCEPT 0 packets, 0 bytes)
    pkts      bytes target     prot opt in     out     source               destination

Chain TRAFFIC_ACCT_IN (1 references)
    pkts      bytes target     prot opt in     out     source               destination

Chain TRAFFIC_ACCT_OUT (1 references)
    pkts      bytes target     prot opt in     out     source               destination

For example, to record incoming and outgoing traffic usage of IP addresses 192.168.1.2 and 192.168.1.3 you would do:

# iptables -A TRAFFIC_ACCT_IN --dst 192.168.1.2
# iptables -A TRAFFIC_ACCT_IN --dst 192.168.1.3
# iptables -A TRAFFIC_ACCT_OUT --src 192.168.1.2
# iptables -A TRAFFIC_ACCT_OUT --src 192.168.1.2

And to list the rules:

# iptables -L TRAFFIC_ACCT_IN -n -v -x
Chain TRAFFIC_ACCT_IN (1 references)
    pkts      bytes target     prot opt in     out     source               destination
     368   362120            all  --  *      *       0.0.0.0/0            192.168.1.2
      61     9186            all  --  *      *       0.0.0.0/0            192.168.1.3

# iptables -L TRAFFIC_ACCT_OUT -n -v -x
Chain TRAFFIC_ACCT_OUT (1 references)
    pkts      bytes target     prot opt in     out     source               destination
     373    22687            all  --  *      *       192.168.1.2          0.0.0.0/0
     101    44711            all  --  *      *       192.168.1.3          0.0.0.0/0

That concludes it. You see that it is trivial to do accurate traffic accounting on a Linux machine. In a future post, I might publish a program which displays the traffic in a nice, visual, manner.

At the moment you can output the traffic in a nice manner with this combination of iptables and awk commands:

# iptables -L TRAFFIC_ACCT_IN -n -v -x | awk '$1 ~ /^[0-9]+$/ { printf "IP: %s, %d bytes\n", $8, $2 }'
IP: 192.168.1.2, 1437631 bytes
IP: 192.168.1.3, 449554 bytes

# iptables -L TRAFFIC_ACCT_OUT -n -v -x | awk '$1 ~ /^[0-9]+$/ { printf "IP: %s, %d bytes\n", $7, $2 }'
IP: 192.168.1.2, 88202 bytes
IP: 192.168.1.3, 244848 bytes

I first learned IPTables from this tutorial. It’s probably the best tutorial one can find on the subject.

If you did not understand any parts of the article, please let me know in the comments. I will update the post and explain those parts.

Comments (8) Comments | Email Post Email 'Traffic Accounting with Linux IPTables' to a friend | Print Post Print 'Traffic Accounting with Linux IPTables' | Permalink Permalink to 'Traffic Accounting with Linux IPTables' | Trackback Trackback to 'Traffic Accounting with Linux IPTables'
(Popularity: 11%) 15,265 Views

Did you like this page? Subscribe to my posts!

I am now on Twitter! Meet me on Twitter here (my nick is pkrumins.)
Or on Google Buzz and Facebook.

Hacker's Approach 06 Jun 2008 08:30 pm
1 Star2 Stars3 Stars4 Stars5 Stars (17 votes, average: 4.18 out of 5)
Loading ... Loading ...

prime number problem of google treasure hunt 2008I just found out about Google’s Treasure Hunt 2008. They say that “it’s a puzzle contest designed to test yer problem-solving skills in computer science, networking, and low-level UNIX trivia.”

Apparently I have missed the first three puzzles (first, second and third) but I’ll give the fourth puzzle a shot.

The fourth problem is about prime numbers and it’s formulated as following:

Find the smallest number that can be expressed as
the sum of 7 consecutive prime numbers,
the sum of 17 consecutive prime numbers,
the sum of 41 consecutive prime numbers,
the sum of 541 consecutive prime numbers,
and is itself a prime number.

For example, 41 is the smallest prime number that can be expressed as
the sum of 3 consecutive primes (11 + 13 + 17 = 41) and
the sum of 6 consecutive primes (2 + 3 + 5 + 7 + 11 + 13 = 41).

The Solution

I’ll write how I got the solution as I go, so I’ll mix the past and present tenses in this post. Sometimes I’ll write what I am going to do and sometimes I’ll write what I just did.

I have no desire to generate lists of prime numbers myself as it has been done infinitely many times already. I’ll just use a publicly available list of prime numbers! Here is a list of first fifty million primes.

I’ll use my Unix-fu to find the solution.

First I noticed that the primes are zipped and split into chunks of million primes per file. The file names are like “primes1.zip”, … “primes50.zip”.

A quick loop from 1 to 50 and wget gets all these files to my hard drive:

$ for i in $(seq 50); do wget "http://primes.utm.edu/lists/small/millions/primes$i.zip"; done

Next, I unzip all these files, and remove those zips to save space:

$ for i in $(seq 50); do unzip "primes$i.zip" && rm -f "primes$i.zip"; done

After doing that and looking at what I got, I realized that they were in some strange format, 8 primes per line, space padded and with some text on the first two lines. Here is an example how the first five lines look in primes1.txt file:

                 The First 1,000,000 Primes (from primes.utm.edu)

         2         3         5         7        11        13        17        19
        23        29        31        37        41        43        47        53
        59        61        67        71        73        79        83        89

I want all my primes to be in one file and one prime per line so I can extract N-th prime by looking at that line.
I used the following command to merge all the files into a single file:

for i in $(seq 50); do (awk 'BEGIN { OFS="\n" } NR > 2 {print $1,$2,$3,$4,$5,$6,$7,$8}' primes$i.txt >> primes.txt) && rm -f primes$i.txt; done

A quick verification that I did not lose any primes:

$ wc -l primes.txt
50000000 primes.txt

Now I’ll create four files which contain sums of 7, 17, 41 and 541 consecutive primes, not exceeding the biggest prime in primes.txt file. I did that with the following AWK one-liner:

$ last=$(tail -1 primes.txt)
$ for N in 7 17 41 541
  do
   awk 'BEGIN { prev[0] = 0 } NR < '$N' {prev[NR] = $1; sum += $1 } NR >= '$N' { psum += prev[NR-'$N']; delete prev[NR-'$N']; prev[NR] = $1; sum += $1; if (sum - psum > '$last') { exit } printf "%d\n", sum - psum }' primes.txt > primes$N.txt
  done

The command created primes7.txt, primes17.txt, primes 41.txt and primes541.txt files. These files contain sums of prime numbers and just some of them are primes.

The solution, if it exists in the given data set, is the intersect of all these files. If there are multiple items in the intersect, the smallest should be chosen and checked if it really was a prime.

$ sort -nm primes541.txt primes41.txt | uniq -d | sort -nm primes17.txt - | uniq -d | sort -nm primes7.txt - | uniq -d
7830239
$ grep -m1 7830239 primes.txt
7830239

We have found the solution! It’s 7830239!

I submitted the answer and after a few minutes it was confirmed to be correct! Awesome!

Your question: [7, 17, 41, 541]
Your answer: 7830239
Time received: 2008-06-06 23:33:26.268414 UTC

Correct answer: 7830239
Your answer was: Correct

Now leave a comment and tell me how you solved this problem!

Comments (40) Comments | Email Post Email 'Solving Google Treasure Hunt Puzzle 4: Prime Numbers' to a friend | Print Post Print 'Solving Google Treasure Hunt Puzzle 4: Prime Numbers' | Permalink Permalink to 'Solving Google Treasure Hunt Puzzle 4: Prime Numbers' | Trackback Trackback to 'Solving Google Treasure Hunt Puzzle 4: Prime Numbers'
(Popularity: 18%) 23,798 Views

Did you like this page? Subscribe to my posts!

I am now on Twitter! Meet me on Twitter here (my nick is pkrumins.)
Or on Google Buzz and Facebook.

ProgrammingHacker's Approach 02 Sep 2007 09:00 pm
1 Star2 Stars3 Stars4 Stars5 Stars (15 votes, average: 4.47 out of 5)
Loading ... Loading ...

digpicz: diggs missing picture sectionRemember, I launched Reddit Media: intelligent fun online last week (read how it was made)?

I have been getting emails that it would be a wise idea to launch a Digg media website. Yeah! Why not?
Since Digg already has a video section there is not much point in duplicating it. The new site could just be digg for pictures.

Update 2008.07.30: I received this PDF, which said that I was abusing Digg’s trademarks! So I closed the site. You may visit http://digg.picurls.com to see how it looked like. I also zipped up the contents of the site and you may download the whole site digpicz-2008-07-30.zip!

I don’t want to use word ‘digg’ in the domain name because people warned me that the trademark owner could take the domain away from me. I’ll just go with a single letter g as “dig” and pictures, to make it shorter picz. So the domain I bought is digpicz.com.

Update: The site has been launched, visit digpicz.com: digg’s missing picture section. Time taken to launch the site: ~7 hours.

visit-digpicz-now

Reusing the Reddit Media Generator Suite

I released full source code of the reddit media website (reddit media website generator suite (.zip)). It can now be totally reused with minor modifications to suit the digg for pictures website.

Only the following modifications need to be made:

  • A new extractor (data miner) has to be written which goes through all the stories on digg and finds ones with pic/pics/images/etc. words in titles or descriptions (In reddit generator suite it was the reddit_extractor.pl program (in /scripts directory in .zip file)). Digg, as opposite to Reddit, provides a public API to access its stories. I will use this API to go through all the stories and create the initial database of pictures and further monitor digg’s front page. This program will be called digg_extractor.pl
  • SQLite database structure has to be changed to include a link to Digg’s story, story’s description, a link to the user’s avatar.
  • The generate_feed function in static HTML page generator (page_gen.pl) has to be updated to create a digpicz rss feed.
  • HTML template files in /templates directory (in the .zip file) need to be updated to give the site more digg-like look.

That’s it! A few hours of work and we have a digg for pictures website running!

Digpicz Technical Design

Let’s create the data miner first. As I mentioned it’s called digg_extractor.pl, and it is a Perl script which uses Digg public API.

Continue reading 'Designing Digg Picture Website in a Matter of Hours' Continue reading ‘Designing Digg Picture Website in a Matter of Hours’

Comments (70) Comments | Email Post Email 'Designing Digg Picture Website in a Matter of Hours' to a friend | Print Post Print 'Designing Digg Picture Website in a Matter of Hours' | Permalink Permalink to 'Designing Digg Picture Website in a Matter of Hours' | Trackback Trackback to 'Designing Digg Picture Website in a Matter of Hours'
(Popularity: 27%) 44,096 Views

Did you like this page? Subscribe to my posts!

Page 1 of 212»