traffic shaping with iptables

A few years ago I worked as a Linux system administrator at a small (few hundred users) Internet service provider. Among the regular system administrator duties, I also had the privilege of writing various software and tools for Linux. One of my tasks was to write a tool to record how much traffic each of the clients was using.

The network for this provider was laid out in a very simple way. The gateway to the Internet was a single Linux box which acted as a router and a firewall, and also performed traffic shaping. Now it had to be extended to do traffic accounting as well.

isp network diagram
Simplified network diagram; all that matters is that the gateway is a Linux box.

At that time I had already mastered IPTables, and I had noticed that when listing the existing rules, iptables would display the packet count and the total byte count for each rule. I thought, yeah, why not use this for accounting? So I did. I created an empty rule (one that matches packets but takes no action, so they just pass on through the firewall) for each user's IP address, and a script which extracted the byte counts.

Here is a detailed explanation of exactly how I did it.

First, let's see what iptables shows us when we have just booted up.

# iptables -L -n -v -x
Chain INPUT (policy ACCEPT 0 packets, 0 bytes)
    pkts      bytes target     prot opt in     out     source               destination

Chain FORWARD (policy ACCEPT 0 packets, 0 bytes)
    pkts      bytes target     prot opt in     out     source               destination

Chain OUTPUT (policy ACCEPT 0 packets, 0 bytes)
    pkts      bytes target     prot opt in     out     source               destination

At this moment we have no rules added. That's alright; let's just get familiar with the output we will be interested in once we have some rules. Notice the 'pkts' and 'bytes' columns. The 'pkts' column displays the total number of packets matched by the rule, and the 'bytes' column displays the total number of bytes matched by the rule. Notice also the three so-called "chains" - INPUT, FORWARD and OUTPUT. The INPUT chain is for packets destined to the Linux box itself, the OUTPUT chain is for packets leaving the Linux box (generated by programs running on it), and the FORWARD chain is for packets passing through the box.

You might also be interested in the command line arguments that I used:

  • -L lists all the rules.
  • -n prints IP addresses and ports numerically, without resolving names.
  • -v produces verbose output, which includes the packet and byte counters.
  • -x displays the exact byte count (otherwise it gets abbreviated to 200K, 3M, etc).

A more serious firewall might already have the FORWARD chain filled up with various entries. To avoid messing with them, let's create a new traffic accounting chain called TRAFFIC_ACCT:

# iptables -N TRAFFIC_ACCT
# iptables -L -n -v -x
Chain INPUT (policy ACCEPT 0 packets, 0 bytes)
    pkts      bytes target     prot opt in     out     source               destination

Chain FORWARD (policy ACCEPT 0 packets, 0 bytes)
    pkts      bytes target     prot opt in     out     source               destination

Chain OUTPUT (policy ACCEPT 0 packets, 0 bytes)
    pkts      bytes target     prot opt in     out     source               destination

Chain TRAFFIC_ACCT (0 references)
    pkts      bytes target     prot opt in     out     source               destination

Now let's send all the traffic passing through the machine to the TRAFFIC_ACCT chain, so that it gets matched against the rules there:

# iptables -I FORWARD -j TRAFFIC_ACCT
# iptables -L -n -v -x
Chain INPUT (policy ACCEPT 0 packets, 0 bytes)
    pkts      bytes target     prot opt in     out     source               destination

Chain FORWARD (policy ACCEPT 0 packets, 0 bytes)
    pkts      bytes target     prot opt in     out     source               destination
       0        0 TRAFFIC_ACCT  all  --  *      *       0.0.0.0/0            0.0.0.0/0

Chain OUTPUT (policy ACCEPT 0 packets, 0 bytes)
    pkts      bytes target     prot opt in     out     source               destination

Chain TRAFFIC_ACCT (0 references)
    pkts      bytes target     prot opt in     out     source               destination

A side note: if you had a Linux desktop computer, you could insert the same rule in the INPUT chain (iptables -I INPUT -j TRAFFIC_ACCT), as all the packets would be destined for your computer.

The iptables -L argument can actually take the name of a chain to list the rules from. From now on we will only be interested in the rules of the TRAFFIC_ACCT chain:

# iptables -L TRAFFIC_ACCT -n -v -x
Chain TRAFFIC_ACCT (1 references)
    pkts      bytes target     prot opt in     out     source               destination

Now, to illustrate the main idea, we can play with the rules. For example, let's break the traffic down by the tcp, udp and icmp protocols. To do that we append three rules to the TRAFFIC_ACCT chain - one to match the tcp protocol, one to match udp, and one to match icmp.

# iptables -A TRAFFIC_ACCT -p tcp
# iptables -A TRAFFIC_ACCT -p udp
# iptables -A TRAFFIC_ACCT -p icmp

After some time has passed, let's look at what we have:

# iptables -L TRAFFIC_ACCT -n -v -x
Chain TRAFFIC_ACCT (1 references)
    pkts      bytes target     prot opt in     out     source               destination
    4356  2151124            tcp  --  *      *       0.0.0.0/0            0.0.0.0/0
     119    15964            udp  --  *      *       0.0.0.0/0            0.0.0.0/0
       3      168            icmp --  *      *       0.0.0.0/0            0.0.0.0/0

We see that 4356 tcp packets totaling 2151124 bytes (about 2 megabytes) have passed through the firewall, along with 119 udp packets and 3 icmp packets!

You can zero out the counters with the -Z iptables command:

# iptables -Z TRAFFIC_ACCT
# iptables -L TRAFFIC_ACCT -n -v -x
Chain TRAFFIC_ACCT (1 references)
    pkts      bytes target     prot opt in     out     source               destination
       0        0            tcp  --  *      *       0.0.0.0/0            0.0.0.0/0
       0        0            udp  --  *      *       0.0.0.0/0            0.0.0.0/0
       0        0            icmp --  *      *       0.0.0.0/0            0.0.0.0/0

You can remove all the rules from the TRAFFIC_ACCT chain with the -F iptables command:

# iptables -F TRAFFIC_ACCT
# iptables -L TRAFFIC_ACCT -n -v -x
Chain TRAFFIC_ACCT (1 references)
    pkts      bytes target     prot opt in     out     source               destination

Another fun thing you can do is count how many actual connections have been made:

# iptables -A TRAFFIC_ACCT -p tcp --syn
# iptables -L TRAFFIC_ACCT -n -v -x
Chain TRAFFIC_ACCT (1 references)
    pkts      bytes target     prot opt in     out     source               destination
       5      276            tcp  --  *      *       0.0.0.0/0            0.0.0.0/0           tcp flags:0x16/0x02

This shows us that 5 tcp packets which initiate connections (SYN packets) have been sent. Pretty neat, isn't it?

What I did when I was working as a sysadmin was add a rule for each user's IP address to the TRAFFIC_ACCT chain. Then I periodically listed the counters, recorded the traffic, and zeroed them out (a sketch of such a script appears near the end of this article).

You can even create two chains, TRAFFIC_ACCT_IN and TRAFFIC_ACCT_OUT, to account for incoming and outgoing traffic separately:

# iptables -N TRAFFIC_ACCT_IN
# iptables -N TRAFFIC_ACCT_OUT
# iptables -I FORWARD -i eth0 -j TRAFFIC_ACCT_IN
# iptables -I FORWARD -o eth0 -j TRAFFIC_ACCT_OUT
# iptables -L -n -v -x
Chain INPUT (policy ACCEPT 0 packets, 0 bytes)
    pkts      bytes target     prot opt in     out     source               destination

Chain FORWARD (policy ACCEPT 0 packets, 0 bytes)
    pkts      bytes target     prot opt in     out     source               destination
       0        0 TRAFFIC_ACCT_OUT  all  --  *      eth0    0.0.0.0/0            0.0.0.0/0
       0        0 TRAFFIC_ACCT_IN  all  --  eth0   *       0.0.0.0/0            0.0.0.0/0

Chain OUTPUT (policy ACCEPT 0 packets, 0 bytes)
    pkts      bytes target     prot opt in     out     source               destination

Chain TRAFFIC_ACCT_IN (1 references)
    pkts      bytes target     prot opt in     out     source               destination

Chain TRAFFIC_ACCT_OUT (1 references)
    pkts      bytes target     prot opt in     out     source               destination

For example, to record the incoming and outgoing traffic of IP addresses 192.168.1.2 and 192.168.1.3, you would do:

# iptables -A TRAFFIC_ACCT_IN --dst 192.168.1.2
# iptables -A TRAFFIC_ACCT_IN --dst 192.168.1.3
# iptables -A TRAFFIC_ACCT_OUT --src 192.168.1.2
# iptables -A TRAFFIC_ACCT_OUT --src 192.168.1.3
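
With a few hundred clients you would not type these rules in by hand. A small shell loop can generate them; this is only a sketch, and the users.txt file (one client IP address per line) is a made-up name:

while read ip; do
    iptables -A TRAFFIC_ACCT_IN  --dst $ip
    iptables -A TRAFFIC_ACCT_OUT --src $ip
done < users.txt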

And to list the rules:

# iptables -L TRAFFIC_ACCT_IN -n -v -x
Chain TRAFFIC_ACCT_IN (1 references)
    pkts      bytes target     prot opt in     out     source               destination
     368   362120            all  --  *      *       0.0.0.0/0            192.168.1.2
      61     9186            all  --  *      *       0.0.0.0/0            192.168.1.3

# iptables -L TRAFFIC_ACCT_OUT -n -v -x
Chain TRAFFIC_ACCT_OUT (1 references)
    pkts      bytes target     prot opt in     out     source               destination
     373    22687            all  --  *      *       192.168.1.2          0.0.0.0/0
     101    44711            all  --  *      *       192.168.1.3          0.0.0.0/0

That concludes it. As you can see, it is trivial to do accurate traffic accounting on a Linux machine. In a future post, I might publish a program which displays the traffic in a nice, visual manner.

In the meantime, you can output the traffic in a nice manner with this combination of iptables and awk commands:

# iptables -L TRAFFIC_ACCT_IN -n -v -x | awk '$1 ~ /^[0-9]+$/ { printf "IP: %s, %d bytes\n", $8, $2 }'
IP: 192.168.1.2, 1437631 bytes
IP: 192.168.1.3, 449554 bytes

# iptables -L TRAFFIC_ACCT_OUT -n -v -x | awk '$1 ~ /^[0-9]+$/ { printf "IP: %s, %d bytes\n", $7, $2 }'
IP: 192.168.1.2, 88202 bytes
IP: 192.168.1.3, 244848 bytes
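
Putting it all together, here is a sketch of the kind of script that could run from cron to record and reset the counters periodically. The log file path is a made-up example, and the awk field numbers match the rule layout shown above (destination is field 8 in the IN chain, source is field 7 in the OUT chain):

#!/bin/sh
# log per-IP byte counts with a timestamp, then reset the counters
LOG=/var/log/traffic_acct.log
NOW=$(date +%s)
iptables -L TRAFFIC_ACCT_IN -n -v -x \
    | awk -v t=$NOW '$1 ~ /^[0-9]+$/ { print t, "in", $8, $2 }' >> $LOG
iptables -L TRAFFIC_ACCT_OUT -n -v -x \
    | awk -v t=$NOW '$1 ~ /^[0-9]+$/ { print t, "out", $7, $2 }' >> $LOG
# a few packets may slip in between listing and zeroing;
# for accounting purposes that error is negligible
iptables -Z TRAFFIC_ACCT_IN
iptables -Z TRAFFIC_ACCT_OUT

Run from root's crontab every five minutes (the script path here is hypothetical), it produces a log that is easy to sum up per IP address:

*/5 * * * * /usr/local/sbin/traffic-snapshot.sh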

I first learned IPTables from this tutorial. It's probably the best tutorial one can find on the subject.

If you did not understand any parts of the article, please let me know in the comments. I will update the post and explain those parts.

python yesterday, today, tomorrow

This is the third post in an article series about Python video lectures. The previous two posts covered learning the basics of Python and learning Python design patterns.

This video lecture is given by Google's "Über Tech Lead" Alex Martelli. In this video he talks about the most important language changes in each of the Python versions 2.2, 2.3, 2.4 and 2.5.

I am actually using an older version of Python, version 2.3.4. This lecture gave me good insight into what new features to expect when I upgrade to a newer version of Python.

Here it is:

Interesting information from lecture:

  • [01:30] There are many versions of Python - Jython, IronPython, pypy and CPython.
  • [03:02] Python 2.2 was a backwards-compatible revolution. It introduced new-style objects, descriptors, iterators and generators, nested scopes, and a lot of new modules in the standard library.
  • [04:12] New rule for introducing extra features: 2.N.* has no extra features with respect to 2.N.
  • [04:32] Python 2.2 highlights: metaclasses, closures, generators and iterators.
  • [05:35] Python 2.3 was a stable version of Python with no changes to the language.
  • [06:05] Python 2.3 had a lot of optimizations, tweaks and fixes, such as import-from-zip, Karatsuba multiplication algorithm, and new stdlib modules - bz2, csv, datetime, heapq, itertools, logging, optparse, textwrap, timeit, and many others.
  • [08:50] Python 2.3 highlights: zip imports, sum builtin, enumerate builtin, extended slices, universal newlines.
  • [09:50] Python 2.4 added two new language features - generator expressions and decorators. New builtins were added - sorted, reversed, set and frozenset. New modules - collections, cookielib, decimal, subprocess.
  • [13:00] Examples of generator expressions and decorators (see also the sketch after this list).
  • [13:37] Example of sorted() and reversed() builtins.
  • [16:40] Python 2.5 was also an evolution of the language. It came with full support for RAII (the with statement), introduced two new builtins - any and all - unified exceptions, and added a ternary operator. New modules - ctypes, xml.etree, functools, hashlib, sqlite3, wsgiref, and others.
  • [18:40] Python 2.5 optimizations.
  • [23:25] RAII - Resource Acquisition Is Initialization.
  • [25:30] Examples of RAII.
  • [31:05] Python RAII is better than C++'s. Python's RAII can distinguish exception exits from normal ones.
  • [33:29] Example of writing your own context manager.
  • [36:30] Example of writing a RAII ready type with contextlib.
  • [38:05] Following Python's Zen, "Flat is better than nested", use contextlib.nested for multiple resources.
  • [40:40] Generator enhancements - yield can appear inside a try clause, and yield is now an expression (almost co-routines!).
  • [44:50] Python 2.5 absolute/relative imports.
  • [47:00] Joke - "If you exceed 200 dots when using relative imports, you have a serious psychological problem".
  • [47:45] Python 2.5 try/except/else/finally.
  • [48:55] Python 2.5 if/else ternary operator.
  • [49:35] Python 2.5 exceptions are new style.
  • [51:15] Python 2.5 any and all builtins.
  • [54:00] collections.defaultdict subclasses dict and overrides __missing__.
  • [56:55] ctypes is probably the most dangerous addition to Python. One mistake and you crash.
  • [01:01:30] hashlib replaces the md5 and sha modules, and adds sha-(224|256|384|512). It uses OpenSSL as an accelerator (if available).
  • [01:02:29] The lecture got cut here, but the presentation still had two slides on sqlite3 and wsgiref!
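
To make a few of these bullet points concrete, here is a small sketch of my own (not code from the lecture) showing a Python 2.4 decorator and generator expression, a hand-written Python 2.5 context manager, and collections.defaultdict:

from __future__ import with_statement  # needed in Python 2.5 itself
import time
from collections import defaultdict

# Python 2.4: a decorator that traces calls to the function it wraps.
def traced(func):
    def wrapper(*args):
        print "calling %s%r" % (func.__name__, args)
        return func(*args)
    return wrapper

@traced
def square(x):
    return x * x

# Python 2.4: a generator expression - values are produced lazily, no list is built.
total = sum(square(n) for n in range(5))

# Python 2.5: a hand-written context manager (RAII, Python style).
# __exit__ runs on both normal and exception exits, and can tell them apart.
class timer(object):
    def __enter__(self):
        self.start = time.time()
        return self
    def __exit__(self, exc_type, exc_value, traceback):
        print "took %.3f seconds" % (time.time() - self.start)
        return False  # don't swallow exceptions

with timer():
    sum(n * n for n in xrange(1000000))

# Python 2.5: defaultdict fills in missing keys through __missing__.
counts = defaultdict(int)
for word in "the quick brown fox jumps over the lazy dog".split():
    counts[word] += 1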

Here is the timeline of Python versions. I hope Alex doesn't mind that I took it from his presentation. :)

python timeline of versions 2.2, 2.3 and 2.5

If you don't know what new-style objects are about, see these two tutorials:

Have fun writing better code in Python!

every operating system sucks

This article is part of the article series "Musical Geek Friday."

It's Musical Geek Friday again! This week, a song about how Every OS Sucks!

The song is written and performed by a Canadian comedy group called Three Dead Trolls in a Baggie. The Trolls currently consist of comedians Wes Borg, Joe Bird and Paul Mather.

The cast of the Trolls has changed over the years. At one point, a woman named Kathleen was in the group. The members of the group are in their forties by now.

I could not find more information about this band, so I ask you, my readers, to share anything you know about them in the comments of this post!

The song is about the history of computers. It says that before the days of operating systems, computers worked nicely and did not suck, but now, thanks to all the various operating systems, they all suck!

Here it is! The Every OS Sucks song:

[audio:http://www.catonmat.net/download/three_dead_trolls_in_a_baggie-every_os_sucks.mp3]

Download this song: every os sucks.mp3 (musical geek friday #12)
Downloaded: 19964 times

Download lyrics: every os sucks lyrics (musical geek friday #12)
Downloaded: 2231 times

They also made a music video for this song, scroll to the bottom of this post for the video!

The lyrics of the song are quite lengthy:

You see, I come from a time in the nineteen-hundred-and-seventies when
computers were used for two things - to either go to the moon, or play
Pong... nothing in between. Y'see, you didn't need a fancy operating
system to play Pong, and the men who went to the moon -- God Bless 'em --
did it with no mouse, and a plain text-only black-and-white screen,
and 32 kilobytes of RAM.

But then 'round 'bout the late 70's, home computers started to do a
little more than play Pong... very little more. Like computers started
to play non-Pong-like games, and balance checkbooks, and why... you
could play Zaxxon on your Apple II, or... write a book! All with a
computer that had 32 kilobytes of RAM! It was good enough to go to
the moon, it was good enough for you.

It was a golden time. A time before Windows, a time before mouses, a
time before the internet and bloatware, and a time...
before every OS sucked.

*sigh*

[singing]

Well, way back in the olden times,
my computer worked for me.
I'd laugh and play, all night and day,
on Zork I, II and III.

The Amiga, VIC-20 and the Sinclair II,
The TRS 80 and the Apple II,
they did what they were supposed to do,
wasn't much... but it was enough.

But then Xerox made a prototype,
Steve Jobs came on the scene,
read "Of Mice and Menus," Windows, Icons
a trash, and a bitmap screen.

Well Stevie said to Xerox,
"Boys, turn your heads and cough."
And when no-one was looking,
he ripped their interfaces off.

Stole every feature that he had seen,
put it in a cute box with a tiny little screen,
Mac OS 1 ran that machine,
only cost five thousand bucks.

But it was slow, it was buggy,
so they wrote it again,
And now they're up to OS 10,
they'll charge you for the Beta, then charge you again,
but the Mac OS still sucks.

Every OS wastes your time,
from the desktop to the lap,
Everything since Apple Dos,
Just a bunch of crap.

From Microsoft, to Macintosh,
to Lin-- line-- lin-- lie... nux,
Every computer crashes,
'cause every OS sucks.

Well then Microsoft jumped in the game,
copied Apple's interface, with an OS named,
"Windows 3.1" - it was twice as lame,
but the stock price rose and rose.

Then Windows 95, then 98,
man solitaire never ran so great,
and every single version came out late,
but I guess that's the way it goes.

But that bloatware'll crash and delete your work,
NT, ME, man, none of 'em work.
Bill Gates may be richer than Captain Kirk,
but the Windows OS blows!
And sucks!
At the same time!

I'd trade it in, yeah right... for what?
It's top of the line from the Compuhut.
The fridge, stove and toaster, never crash on me,
I should be able to get online, without a PHD.

My phone doesn't take a week to boot it,
my TV doesn't crash when I mute it,
I miss ASCII text, and my floppy drive,
I wish VIC-20 was still alive...

But it ain't the hardware, man.

It's just that every OS sucks... and blows.

Now there's lih-nux or lie-nux,
I don't know how you say it,
or how you install it, or use it, or play it,
or where you download it, or what programs run,
but lih-nux, or lie-nux, don't look like much fun.

However you say it, it's getting great press,
though how it survives is anyone's guess,
If you ask me, it's a great big mess,
for elitist, nerdy shmucks.

"It's free!" they say, if you can get it to run,
the Geeks say, "Hey, that's half the fun!"
Yeah, but I got a girlfriend, and things to get done,
the Linux OS SUCKS.
(I'm sorry to say it, but it does.)

Every OS wastes your time,
from the desktop to the lap,
Everything since the abacus,
Just a bunch of crap.

From Microsoft, to Macintosh,
to lin-- line-- lin-- lie... nux.
Every computer crashes,
'cause every OS sucks.

Every computer crashes... 'cause every OS sucks!

Here is Three Dead Trolls in a Baggie performing the song:

Download "Every OS Sucks" Song

Download this song: every os sucks.mp3 (musical geek friday #12)
Downloaded: 19964 times

Download lyrics: every os sucks lyrics (musical geek friday #12)
Downloaded: 2231 times

Click to listen:
[audio:http://www.catonmat.net/download/three_dead_trolls_in_a_baggie-every_os_sucks.mp3]

Have fun and until next geeky Friday! :)

mysql performance tuning

In this post I'll cover a lecture on MySQL performance tuning.

This lecture is given by Jay Pipes. Jay works at MySQL and has written a book on MySQL called Pro MySQL, which covers intermediate and advanced features of the database. He also has an interesting blog, to which I have long been subscribed - Jay Pipes blog.

In this lecture Mr. Pipes talks about the core concepts of profiling and benchmarking, the most common sources of performance problems, indexing, schema design, coding guidelines, and a little about server parameter tuning.

Here is his talk at Google:

The most interesting performance tuning tips from the video:

  • [02:20] Don't benchmark without a goal. Have a goal like "improve performance by 20%". Otherwise you'll waste a lot of time tuning milliseconds out of your application.
  • [02:50] Change just one thing at a time and re-run the benchmarks.
  • [03:40] Disable the query cache by setting the cache size to 0 when running MySQL benchmarks.
  • [05:22] The best tool for profiling MySQL queries is the EXPLAIN command. Understand it!
  • [06:40] Log slow queries and use mysqldumpslow to parse the log. The server also has an option (--log-queries-not-using-indexes) to log any query that does not use an index on a table.
  • [07:40] Jeremy Zawodny wrote the mytop utility for monitoring the threads and overall performance of MySQL servers.
  • [08:55 && 11:30] Repeated queries on an un-indexed field will kill your application faster than anything else.
  • [09:30] Don't de-normalize just because you think it will be faster. Start with a normalized database schema.
  • [10:15] Server parameter tweaking is not a catch-all. Tuning server parameters can help but it's very specific to certain situations.
  • [12:05] If you use MyISAM storage engine, exploit covering indexes.
  • [12:50] Ensure good selectivity on index fields.
  • [14:45] On multi-column indexes, pay attention to order of fields within the index definition.
  • [15:40] Be aware that as your database grows, the distribution of data in the indexed fields can change, deteriorating the usefulness of an index. As your data grows, always examine whether the indexes you originally created are still relevant to the data.
  • [17:02] Example of a common index problem, where an index is created on multiple fields.
  • [20:30] Use the smallest data types possible. Don't use bigint, when int will do. Or, don't use char(200), when a varchar or smaller char() would do. Using the right type will fit more records in memory or index key block, meaning fewer reads, resulting in faster performance.
  • [21:30] Consider horizontally splitting many-columned tables if they contain a lot of NULLs or rarely used columns.
  • [23:35] Get rid of surrogate keys (with example).
  • [24:05 && 33:20] Be an SQL programmer who thinks in sets, not procedural programming paradigms.
  • [24:35] InnoDB can't optimize SELECT COUNT(*) queries. Use counter tables (see the sketch after this list)! That's how to scale InnoDB.
  • [27:20] Always try to isolate index fields on one side of condition in a query (with example).
  • [28:20] Avoid using CURRENT_DATE() as it invalidates the cache.
  • [29:34] Example of using calculated fields when searching on top level domain. Idea - put a reversed TLD in the table.
  • [33:20] Avoid correlated subqueries. Think in sets not loops! Here is a great article on visualizing SQL joins.
  • [34:50] Example of using derived tables to avoid correlated subqueries.
  • [36:25] Be aware of global and per-thread server variables.
  • [37:50] Enable query cache if your application is doing a lot more reads than writes!
  • [38:50] MySQL uses MyISAM for internal data storage.
  • [40:00] MySQL loves RAM!
  • [40:35] Q and A.
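
To illustrate the counter table idea from [24:35], here is a minimal sketch of my own using Python and the MySQLdb module. The connection parameters and the table and column names are all made up, and the snippet assumes a 'stories' InnoDB table already exists:

import MySQLdb

db = MySQLdb.connect(host="localhost", user="app", passwd="secret", db="appdb")
cur = db.cursor()

# a one-row summary table, seeded from the current row count
cur.execute("""CREATE TABLE IF NOT EXISTS story_count
               (total INT UNSIGNED NOT NULL) ENGINE=InnoDB""")
cur.execute("SELECT COUNT(*) FROM story_count")
if cur.fetchone()[0] == 0:
    cur.execute("INSERT INTO story_count SELECT COUNT(*) FROM stories")
db.commit()

def add_story(title):
    # the insert and the counter update happen in one transaction,
    # so the counter never drifts away from the real row count
    cur.execute("INSERT INTO stories (title) VALUES (%s)", (title,))
    cur.execute("UPDATE story_count SET total = total + 1")
    db.commit()

def story_total():
    # an O(1) read instead of a full InnoDB scan for SELECT COUNT(*)
    cur.execute("SELECT total FROM story_count")
    return cur.fetchone()[0]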

Jay recently published slides from his Join-Fu talk. Go get them!

I enjoyed this talk a lot. I am an intermediate MySQL user and had not read his book before. It was really informative!

If you want to learn more about MySQL and don't yet have his book, why not get it?

reddit hacker top

Over the last fortnight I have released two top-like applications for following Reddit and Hacker News from the console. I called them Reddit Top and Hacker Top. I received a few emails and comments asking me to explain how the applications were made, so I'll explain it in this article.

A few months ago, while I was creating the Reddit River website, I noticed that Python's standard library included a curses module. Having worked with curses and the Curses Development Kit in C, I decided to refresh my curses skills, this time in Python.

Coding the application started with creating two separate Python modules for retrieving stories from Hacker News and Reddit.

If you look at the source code of Hacker Top and Reddit Top programs, you'll notice two Python modules called "pyhackerstories.py" and "pyredditstories.py".

Both of these modules follow the same interface and provide a get_stories() function. The core functionality of this function can be easily understood from this code fragment:

stories = []
for i in range(pages):
    content = _get_page(url)
    entries = _extract_stories(content)
    stories.extend(entries)
    url = _get_next_page(content)
    if not url:
        break

The function iterates over the given number of Reddit or Hacker News pages and creates a list of objects (of type Story) containing information about stories on each page.

This function takes two optional parameters, 'pages' and 'new' (pyredditstories.py also takes an optional 'subreddit' parameter). The 'pages' parameter controls how many pages of stories to scrape, and 'new' selects new stories instead of popular ones.

Here is an example of using the pyhackerstories.py module to get the titles and scores of the five most popular stories on Hacker News:

>>> from pyhackerstories import get_stories
>>> stories = get_stories()
>>> for story in stories[:5]:
...     print "%-3d - %s" % (story.score, story.title)
...
30  - Rutgers Graduate Student Finds New Prime-Generating Formula
59  - Xobni VP Engineering leaves for own startup
69  - The Pooled-Risk Company Management Company
14  - Jeff Bonforte, CEO of Xobni, explains why Gabor left
52  - Google's Wikipedia clone Knol launches.

Each Story object contains the following properties:

  • position - story position
  • id - identifier used by Hacker News or Reddit to identify the story
  • title - title of the story
  • url - web address of the story
  • user - username of the user who submitted the story
  • score - number of upvotes the story has received
  • human_time - time the story was submitted
  • unix_time - unix time the story was submitted
  • comments - number of comments the story has received

Here is another example, retrieving the most active Reddit users on the Programming Subreddit (based on 5 pages of stories):

>>> from pyredditstories import get_stories
>>> stories = get_stories(subreddit='programming', pages=5)
>>> userdict = {}
>>> for story in stories:
...     userdict[story.user] = userdict.setdefault(story.user, 0) + 1
>>> users = [(userdict[u], u) for u in userdict]
>>> users.sort()
>>> users.reverse()
>>> for user in users[:5]:
...  print "%s: %d" % (user[1], user[0])
...
<a href="http://www.reddit.com/user/gst/">gst</a>: 19
<a href="http://www.reddit.com/user/dons/">dons</a>: 6
<a href="http://www.reddit.com/user/llimllib/">llimllib</a>: 4
<a href="http://www.reddit.com/user/synthespian/">synthespian</a>: 3
<a href="http://www.reddit.com/user/gthank/">gthank</a>: 3

The pyhackerstories.py and pyredditstories.py Python modules can also be used as standalone applications.

Executing pyredditstories.py with the '--help' command line argument shows this:

$ ./pyredditstories.py  --help
usage: pyredditstories.py [options]

options:
  -h, --help   show this help message and exit
  -sSUBREDDIT  Subreddit to retrieve stories from. Default:
               front_page.
  -pPAGES      How many pages of stories to output. Default: 1.
  -n           Retrieve new stories. Default: nope.

Here is an example of executing pyredditstories.py to get the five most popular programming stories:

$ ./pyredditstories.py -s programming | grep '^title' | head -5

title: "Turns out my nephew is really good with computers, so we're going to give him the job!"
title: Sun Microsystems funding Haskell on multicore OpenSPARC!
title: I love parser combinators [Haskell]
title: 2D collision detection for SVG - demo of intersection routines with SVG/Javascript
title: Priorities: Solaris vs Linux

That's about enough information about these modules to create all kinds of wonderful things! Just play around a little! :)

Download pyhackerstories.py: pyhackerstories.py
Download pyredditstories.py: pyredditstories.py

Now I'll briefly explain how the curses user interface was made. I'll cover only the ideas and will not go deeply into code.

hacker top, detailed mode

I started with reading the Curses Programming with Python howto. This howto explained how to do all the basic curses operations in Python - how to get into curses mode, create windows, print colorful text, and handle user input.

The user interface had two requirements: it had to stay responsive while new stories were being retrieved from the sites, and it had to be independent of the website, meaning that it should be easy to display data from a different website with minor (or no) modifications to the interface code.

The first requirement was satisfied by creating a separate thread (see the Retriever class) which periodically called get_stories() and passed the stories to the interface via a queue.
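
Here is a minimal sketch of that pattern (my simplified version, not the actual Retriever code): a daemon thread fetches stories in the background and hands them to the interface through a Queue, so the curses loop never blocks on the network:

import time
import Queue
import threading

from pyhackerstories import get_stories  # pyredditstories works the same way

class Retriever(threading.Thread):
    def __init__(self, out_queue, interval=180):  # refresh interval is made up
        threading.Thread.__init__(self)
        self.out_queue = out_queue
        self.interval = interval
        self.setDaemon(True)  # dies together with the main program

    def run(self):
        while True:
            try:
                self.out_queue.put(get_stories())
            except Exception, e:
                self.out_queue.put(e)  # let the interface report the error
            time.sleep(self.interval)

story_queue = Queue.Queue()
Retriever(story_queue).start()

The curses main loop then polls the queue between keystrokes with story_queue.get_nowait(), catching Queue.Empty when nothing new has arrived yet.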

The second requirement was already satisfied when I created the pyredditstories.py and pyhackerstories.py modules. As I mentioned, these modules provided the same interface for retrieving stories and could be used almost interchangeably.

That's basically it! If you have any specific questions, feel free to ask them in the comments!

PS. I am thinking of releasing Dzone Top and Digg Top, and then merging all these Top programs into a single social news console program. Are there any volunteers who would like to help me?

Download Hacker Top and Reddit Top Programs

Hacker Top

Download link: hacker top program (5454 downloads)

Reddit Top

Download link: reddit top program (3993 downloads)

These programs are released under GNU General Public License.