reddit hacker top

Over the last fortnight I released two top-like applications for following Reddit and Hacker News from the console, called Reddit Top and Hacker Top. I received a few emails and comments asking how the applications were made, so I'll explain that in this article.

A few months ago, while I was creating the Reddit River website, I noticed that Python's standard library included a curses module. Having worked with curses and Curses Development Kit in C, I decided to refresh my curses skills, this time in Python.

Coding the application started with creating two separate Python modules for retrieving stories from Hacker News and Reddit.

If you look at the source code of the Hacker Top and Reddit Top programs, you'll notice two Python modules called "pyhackerstories.py" and "pyredditstories.py".

Both modules follow the same interface and provide a get_stories() function. The core of this function can be understood from this code fragment:

stories = []
for i in range(pages):
    content = _get_page(url)
    entries = _extract_stories(content)
    stories.extend(entries)
    url = _get_next_page(content)
    if not url:
        break

The function iterates over the given number of Reddit or Hacker News pages and creates a list of objects (of type Story) containing information about stories on each page.

This function takes two optional parameters, 'pages' and 'new' (pyredditstories.py also takes an optional 'subreddit' parameter). They control how many pages to scrape and whether to scrape new stories rather than the most popular ones.
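To make the loop above concrete, here is a minimal, self-contained sketch in Python 3. The in-memory FAKE_SITE table and the bodies of the three helper functions are stand-ins invented for illustration; the real modules fetch and parse HTML, and the sketch leaves out the 'new' parameter.

```python
# Hypothetical in-memory "site": each URL maps to (story titles, next URL).
FAKE_SITE = {
    "page1": (["story A", "story B"], "page2"),
    "page2": (["story C"], "page3"),
    "page3": (["story D"], None),
}


def _get_page(url):
    # Stand-in for an HTTP request; returns the "content" of a page.
    return FAKE_SITE[url]


def _extract_stories(content):
    # Stand-in for HTML parsing; returns the stories found on the page.
    return list(content[0])


def _get_next_page(content):
    # Returns the URL of the next page, or None on the last page.
    return content[1]


def get_stories(pages=1, url="page1"):
    stories = []
    for _ in range(pages):
        content = _get_page(url)
        stories.extend(_extract_stories(content))
        url = _get_next_page(content)
        if not url:
            break  # no further page to follow
    return stories
```

Asking for more pages than exist is harmless: the loop stops as soon as there is no next page.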

Here is an example of using the pyhackerstories.py module to get the titles and scores of the five most popular stories on Hacker News:

>>> from pyhackerstories import get_stories
>>> stories = get_stories()
>>> for story in stories[:5]:
...     print "%-3d - %s" % (story.score, story.title)
...
30  - Rutgers Graduate Student Finds New Prime-Generating Formula
59  - Xobni VP Engineering leaves for own startup
69  - The Pooled-Risk Company Management Company
14  - Jeff Bonforte, CEO of Xobni, explains why Gabor left
52  - Google's Wikipedia clone Knol launches.

Each Story object contains the following properties:

  • position - story position
  • id - identifier used by Hacker News or Reddit to identify the story
  • title - title of the story
  • url - web address of the story
  • user - username of the user who submitted the story
  • score - number of upvotes the story has received
  • human_time - time the story was submitted
  • unix_time - unix time the story was submitted
  • comments - number of comments the story has received
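A rough Python 3 sketch of an object with these properties could look like the following. The field types and the example values are assumptions made for illustration; the Story class in the actual modules may be defined differently.

```python
from dataclasses import dataclass


@dataclass
class Story:
    position: int     # story position on the page
    id: str           # site-specific identifier (string type is an assumption)
    title: str
    url: str
    user: str         # username of the submitter
    score: int        # number of upvotes
    human_time: str   # e.g. "2 hours ago" (format is an assumption)
    unix_time: int    # submission time as a Unix timestamp
    comments: int     # number of comments


# Made-up example values, not real site data.
example = Story(
    position=1,
    id="12345",
    title="An example story",
    url="http://example.com/",
    user="someone",
    score=42,
    human_time="2 hours ago",
    unix_time=1216900000,
    comments=7,
)
print("%-3d - %s" % (example.score, example.title))
```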

Here is another example, finding the most active Reddit users on the programming subreddit (based on 5 pages of stories):

>>> from pyredditstories import get_stories
>>> stories = get_stories(subreddit='programming', pages=5)
>>> userdict = {}
>>> for story in stories:
...     userdict[story.user] = userdict.setdefault(story.user, 0) + 1
...
>>> users = [(userdict[u], u) for u in userdict]
>>> users.sort()
>>> users.reverse()
>>> for user in users[:5]:
...     print "%s: %d" % (user[1], user[0])
...
gst: 19
dons: 6
llimllib: 4
synthespian: 3
gthank: 3
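The same counting can be sketched more compactly in Python 3 with collections.Counter. The FakeStory tuple and the sample usernames below are made up so that the snippet runs without network access; with the real module you would pass in the list returned by get_stories().

```python
from collections import Counter, namedtuple

# Hypothetical stand-in for Story objects; only the 'user' field is needed.
FakeStory = namedtuple("FakeStory", "user")


def most_active_users(stories, n=5):
    # Count submissions per user and return the n most frequent (user, count) pairs.
    counts = Counter(story.user for story in stories)
    return counts.most_common(n)


sample = [FakeStory(u) for u in ["ann", "bob", "ann", "cat", "ann", "bob"]]
for user, count in most_active_users(sample, 2):
    print("%s: %d" % (user, count))
```

One difference from the session above: when two users have the same count, Counter keeps them in first-seen order, while the sort-and-reverse approach orders ties by username in descending order.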

The pyhackerstories.py and pyredditstories.py Python modules can also be used as standalone applications.

Running pyredditstories.py with the '--help' command line argument prints its usage:

$ ./pyredditstories.py  --help
usage: pyredditstories.py [options]

options:
  -h, --help   show this help message and exit
  -sSUBREDDIT  Subreddit to retrieve stories from. Default:
               front_page.
  -pPAGES      How many pages of stories to output. Default: 1.
  -n           Retrieve new stories. Default: nope.
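For readers who want to build a similar command line front end, options like these can be declared with Python 3's argparse module. This is only a sketch and not the module's actual parsing code; the dest names (subreddit, pages, new) are my own choice, picked to mirror the get_stories() parameters.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(prog="pyredditstories.py")
    parser.add_argument("-s", dest="subreddit", default="front_page",
                        help="Subreddit to retrieve stories from. Default: front_page.")
    parser.add_argument("-p", dest="pages", type=int, default=1,
                        help="How many pages of stories to output. Default: 1.")
    parser.add_argument("-n", dest="new", action="store_true",
                        help="Retrieve new stories. Default: nope.")
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch is easy to try.
args = build_parser().parse_args(["-s", "programming", "-p", "3"])
print(args.subreddit, args.pages, args.new)
```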

Here is an example of running pyredditstories.py to get the five most popular programming stories:

$ ./pyredditstories.py -s programming | grep '^title' | head -5

title: "Turns out my nephew is really good with computers, so we're going to give him the job!"
title: Sun Microsystems funding Haskell on multicore OpenSPARC!
title: I love parser combinators [Haskell]
title: 2D collision detection for SVG - demo of intersection routines with SVG/Javascript
title: Priorities: Solaris vs Linux

That's about enough information about these modules to create all kinds of wonderful things! Just play around a little! :)

Download pyhackerstories.py: pyhackerstories.py
Download pyredditstories.py: pyredditstories.py

Now I'll briefly explain how the curses user interface was made. I'll cover only the ideas and will not go deeply into code.

hacker top, detailed mode

I started by reading the Curses Programming with Python HOWTO. It explains how to do all the basic curses operations in Python: how to get into curses mode, create windows, print colorful text, and handle user input.
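To give a feel for those basics, here is a small Python 3 sketch that draws a colored header and a few lines of text, then waits for 'q'. It is not taken from Hacker Top or Reddit Top; the sample titles and the fit_line helper are made up for illustration.

```python
import curses

# Made-up titles, used only to have something to draw.
SAMPLE_TITLES = ["First example story", "Second example story"]


def fit_line(text, width):
    """Truncate text to at most `width` columns, marking cuts with '...'."""
    if width <= 0:
        return ""
    if len(text) <= width:
        return text
    if width <= 3:
        return text[:width]
    return text[:width - 3] + "..."


def main(stdscr):
    # curses.wrapper() has already entered curses mode and enabled colors if possible.
    if curses.has_colors():
        curses.init_pair(1, curses.COLOR_YELLOW, curses.COLOR_BLACK)
        header_attr = curses.color_pair(1) | curses.A_BOLD
    else:
        header_attr = curses.A_BOLD
    height, width = stdscr.getmaxyx()
    stdscr.clear()
    # Stay one column short of the edge; writing into the last cell can raise an error.
    stdscr.addstr(0, 0, fit_line("Example Top (press q to quit)", width - 1), header_attr)
    for row, title in enumerate(SAMPLE_TITLES, start=1):
        if row >= height:
            break
        stdscr.addstr(row, 0, fit_line(title, width - 1))
    stdscr.refresh()
    while stdscr.getch() != ord("q"):
        pass  # ignore every key except 'q'


# To try it in a real terminal, call: curses.wrapper(main)
```

curses.wrapper() takes care of restoring the terminal even if main() raises an exception, which saves a lot of pain while experimenting.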

The user interface had two requirements: it had to stay responsive while new stories were being retrieved from the sites, and it had to be independent of the website, meaning that it should be easy to display data from a different website with minor (or no) modifications to the interface code.

The first requirement was satisfied by creating a separate thread (see the Retriever class) which periodically called get_stories() and passed the stories to the interface via a queue.
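The thread-plus-queue idea can be sketched in Python 3 roughly as follows. This is not the Retriever class from the programs themselves; the constructor arguments, the ("stories", ...) / ("error", ...) message tuples, and the poll() helper are assumptions made for the sketch.

```python
import queue
import threading


class Retriever(threading.Thread):
    """Calls `fetch` periodically and puts the results on `out_queue`."""

    def __init__(self, fetch, out_queue, interval=60.0):
        super().__init__(daemon=True)
        self.fetch = fetch              # e.g. a module's get_stories function
        self.out_queue = out_queue
        self.interval = interval        # seconds between retrievals
        self._stop_event = threading.Event()

    def run(self):
        while not self._stop_event.is_set():
            try:
                self.out_queue.put(("stories", self.fetch()))
            except Exception as exc:
                # Report failures to the interface instead of killing the thread.
                self.out_queue.put(("error", exc))
            # Sleep until the next round, waking up early if stop() is called.
            self._stop_event.wait(self.interval)

    def stop(self):
        self._stop_event.set()


def poll(out_queue):
    """Non-blocking check the interface loop can call between key presses."""
    try:
        return out_queue.get_nowait()
    except queue.Empty:
        return None
```

The interface loop never blocks on the network: it only calls poll() and redraws when a new message has arrived. Because the thread takes the fetch function as an argument, the same class works with either module's get_stories().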

The second requirement was already satisfied when I created the pyredditstories.py and pyhackerstories.py modules. As I mentioned, these modules provided the same interface for retrieving stories and could be used almost interchangeably.

That's basically it! If you have any specific questions, feel free to ask them in the comments!

P.S. I am thinking of releasing DZone Top and Digg Top, and then merging all these Top programs into a single social news console program. Are there any volunteers who would like to help me?

Download Hacker Top and Reddit Top Programs

Hacker Top

Download link: hacker top program (5611 downloads)

Reddit Top

Download link: reddit top program (4128 downloads)

These programs are released under the GNU General Public License.

Comments

July 24, 2008, 10:23

Yes, I am interested in the project :D

bsergean
July 24, 2008, 17:35

It looks like you don't need it, but you can have a look at the cplay code, a Python curses app, for interesting pieces of code.

bsergean
July 24, 2008, 17:36

I haven't tried your program, but I remember that cplay handled several "tabs", so you could have one for each website: Digg, Reddit, and so on.

alex dante
July 24, 2008, 18:47

I highly recommend the Urwid module, which lets you target both curses and the web: https://excess.org/urwid/

Maybe not so important for a console-based app, but hey, you get a web version for free :)

July 24, 2008, 22:13

ilSignorCarlo, I just sent you an email :)

bsergean, thanks for the suggestion, I am going to look at how it was coded right now!

alex, thanks, I looked at the screenshots, it looks interesting :)

July 25, 2008, 01:31

These programs are pretty cool. I think an integrated app with tabs for each website would be nicer than multiple apps, or perhaps a command line switch for which site. Also, integration with a browser (console or a call to firefox) when you press the number of a submission would rock.

July 25, 2008, 10:25

So you scraped some pages, put the data into a bunch of objects, and then displayed it with ncurses. Do we really need an explanation of how every trivial program was coded?

Jay
August 23, 2008, 09:02

@Elerra: Give the guy a break, just because this might not interest you, doesn't mean you have to be a dick about it.

It's really awesome that Peteris publishes his projects and explains how he went about solving a problem. Even if I built a similar app before, it's always interesting to see how someone else did it.

May 27, 2009, 07:42

Hey, very nice coding dude, this is exactly what I have been looking for all day long. I couldn't figure out how to code something useful like this myself.

How long have you been coding to work this out on your own? I'm fairly new, a few months in and into the basics.

thanks..
