digpicz: digg's missing picture section

Remember how I launched Reddit Media: intelligent fun online last week (read how it was made)?

I have been getting emails saying it would be a wise idea to launch a Digg media website. Yeah! Why not?
Since Digg already has a video section, there is not much point in duplicating it. The new site could just be digg for pictures.

Update 2008.07.30: I received this PDF, which said that I was abusing Digg's trademarks! So I closed the site. You may visit http://digg.picurls.com to see what it looked like. I also zipped up the contents of the site, and you may download the whole thing as digpicz-2008-07-30.zip!

I don't want to use the word 'digg' in the domain name because people warned me that the trademark owner could take the domain away from me. I'll just go with a single 'g' for "dig", and "picz" to keep "pictures" short. So the domain I bought is digpicz.com.

Update: The site has been launched, visit digpicz.com: digg's missing picture section. Time taken to launch the site: ~7 hours.


Reusing the Reddit Media Generator Suite

I released the full source code of the reddit media website (reddit media website generator suite (.zip)). It can be reused almost entirely, with only minor modifications, for the digg for pictures website.

Only the following modifications need to be made:

  • A new extractor (data miner) has to be written which goes through all the stories on Digg and finds the ones with pic/pics/images/etc. words in their titles or descriptions (in the reddit generator suite this was the reddit_extractor.pl program, in the /scripts directory of the .zip file). Digg, unlike Reddit, provides a public API to access its stories. I will use this API to go through all the stories, create the initial database of pictures, and then keep monitoring Digg's front page. This program will be called digg_extractor.pl.
  • The SQLite database structure has to be changed to include a link to the Digg story, the story's description, and a link to the user's avatar.
  • The generate_feed function in the static HTML page generator (page_gen.pl) has to be updated to create a digpicz RSS feed.
  • The HTML template files in the /templates directory (in the .zip file) need to be updated to give the site a more Digg-like look.

That's it! A few hours of work and we have a digg for pictures website running!
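
For illustration, the title/description filtering from the first bullet might be sketched like this. This is a hypothetical sketch, not the code from digg_extractor.pl; in particular, the keyword list is my guess at what "pic/pics/images/etc." expands to:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical keyword list; the real extractor may use a different set.
my $pic_words = qr/\b(?:pic|pics|picture|pictures|image|images|photo|photos|comic|comics)\b/i;

# A story qualifies if the keywords appear in its title or description.
sub is_picture_story {
    my ($title, $desc) = @_;
    return ($title =~ $pic_words) || ($desc =~ $pic_words) ? 1 : 0;
}

# The sample story shown later in this post matches:
print is_picture_story("13 Dumbest Drivers in the World [PICS]", "")
    ? "match\n" : "no match\n";
```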

Digpicz Technical Design

Let's create the data miner first. As I mentioned, it's called digg_extractor.pl, and it is a Perl script which uses Digg's public API.

First, we need to get familiar with the Digg API. Skimming over the Basic API Concepts page gives us the few important points we need.

Next, to make our data miner get the stories, let's look at the Summary of API Features. It mentions the List Stories endpoint, which "Fetches a list of stories from Digg." This is exactly what we want!

We are interested only in stories which made it to the front page; the API documentation tells us we should issue a GET /stories/popular request to http://services.digg.com.

I typed the address in my web browser and got a nice XML response with the 10 latest stories.


The documentation also lists count and offset arguments, which control the number of stories to retrieve and the offset into the complete story list.

So the general algorithm is clear: start at offset=0, loop until we have gone through all the stories, parse each bucket of stories, and extract the ones with pics in them.
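
That loop might be sketched as follows. This is a hypothetical outline, not the real digg_extractor.pl: the URL layout follows the GET /stories/popular request above, the actual network fetch is commented out so the sketch runs offline, and the real API may also have required extra parameters (such as an application key) that I'm omitting here:

```perl
#!/usr/bin/perl
use strict;
use warnings;

use constant ITEMS_PER_REQUEST => 15;   # stories per Digg page
use constant API_URL => 'http://services.digg.com/stories/popular';

# Build the request URL for a given offset into the story list.
sub request_url {
    my $offset = shift;
    return API_URL . '?count=' . ITEMS_PER_REQUEST . "&offset=$offset";
}

my $max_requests = shift @ARGV || 2;    # optional argument: how many requests
for my $i (0 .. $max_requests - 1) {
    my $url = request_url($i * ITEMS_PER_REQUEST);
    print "$url\n";
    # my $xml = LWP::Simple::get($url);   # fetch the bucket (needs LWP::Simple)
    # last unless defined $xml;           # stop when the API has no more stories
    # ... parse $xml and extract the picture stories ...
}
```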

We want to use the simplest Perl library possible to parse the XML. There is a great one on CPAN which is perfect for this job: XML::Simple. It provides an XMLin function which, given an XML string, returns a reference to a parsed hash data structure. Easy as 3.141592!
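
Here is a minimal XML::Simple demonstration on a hand-made snippet shaped roughly like a Digg response. The element and attribute names here are illustrative only; the real response carries more attributes per story:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Simple;

# A made-up response fragment for demonstration purposes.
my $xml = <<'END_XML';
<stories count="1" offset="0">
  <story title="Some Funny Pics" link="http://example.com/funny-pics">
    <description>A bunch of funny pictures.</description>
    <user name="someuser" icon="http://example.com/someuser.jpg"/>
  </story>
</stories>
END_XML

# ForceArray makes <story> always come back as an array reference, even
# when there is only one story; KeyAttr => [] disables XML::Simple's
# default folding of arrays by "name"/"id" attributes.
my $data = XMLin($xml, ForceArray => ['story'], KeyAttr => []);

for my $story (@{ $data->{story} }) {
    print "title: $story->{title}\n";
    print "user:  $story->{user}{name}\n";
}
```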

This script prints out the picture stories which made it to the front page in a human-readable format. Each story is printed as a paragraph:

title: story title
type: story type
desc: story description
url: story url
digg_url: url to original story on digg
category: digg category of the story
short_category: short digg category name
user: name of the user who posted the story
user_pic: url to user pic
date: date story appeared on digg YYYY-MM-DD HH:MM:SS
<new line>

The script has one constant, ITEMS_PER_REQUEST, which defines how many stories (items) to get per API request. Currently it's set to 15, which is the number of stories per Digg page.

The script takes an optional argument which specifies how many requests to make. On each request, the story offset is advanced by ITEMS_PER_REQUEST. Specifying no argument goes through all the stories that have ever appeared on Digg.

For example, to print out the picture posts currently on the front page of Digg, we could use the command:

./digg_extractor.pl 1

Here is a sample of real output of this command:

$ ./digg_extractor.pl 1
title: 13 Dumbest Drivers in the World [PICS]
type: pictures
desc: Think of this like an even funnier Darwin awards, but for dumbass driving (and with images).
url: http://wtfzup.com/2007/09/02/unlucky-13-dumbest-drivers-in-the-world/
digg_url: http://digg.com/offbeat_news/13_Dumbest_Drivers_in_the_World_PICS
category: Offbeat News
short_category: offbeat_news
user: suxmonkey
user_pic: http://digg.com/userimages/s/u/x/suxmonkey/large6009.jpg
date: 2007-09-02 14:00:06

This output is then fed into the db_inserter.pl script, which inserts the data into the SQLite database.

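To make that step concrete, here is a sketch of the kind of insert db_inserter.pl performs. The table and column names here are hypothetical (the real schema comes from the reddit media generator suite); the sketch needs the DBI and DBD::SQLite modules and uses an in-memory database so it can run standalone:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use DBI;

# In-memory SQLite database so the sketch leaves no files behind.
my $dbh = DBI->connect('dbi:SQLite:dbname=:memory:', '', '',
                       { RaiseError => 1, AutoCommit => 1 });

# Hypothetical schema for illustration only.
$dbh->do(<<'SQL');
CREATE TABLE stories (
    title    TEXT,
    url      TEXT,
    digg_url TEXT,
    category TEXT,
    user     TEXT,
    date     TEXT
)
SQL

# Insert the sample story from the extractor output above.
my $sth = $dbh->prepare(
    'INSERT INTO stories (title, url, digg_url, category, user, date)
     VALUES (?, ?, ?, ?, ?, ?)'
);
$sth->execute(
    '13 Dumbest Drivers in the World [PICS]',
    'http://wtfzup.com/2007/09/02/unlucky-13-dumbest-drivers-in-the-world/',
    'http://digg.com/offbeat_news/13_Dumbest_Drivers_in_the_World_PICS',
    'Offbeat News',
    'suxmonkey',
    '2007-09-02 14:00:06',
);

my ($count) = $dbh->selectrow_array('SELECT COUNT(*) FROM stories');
print "stories in database: $count\n";
```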
Then page_gen.pl is run, which generates the static HTML contents.
Please refer to the original post about the reddit media website generator for more details.

Summing it up, only one new script had to be written, and some minor changes had to be made to the existing scripts, to generate the new website.

Here is the new script, digg_extractor.pl:
digg extractor (perl script, digg picture website generator)

Click http://digg.picurls.com to visit the site!

Here are all the scripts packed together with basic documentation:

Download Digg's Picture Website Generator Scripts

All the scripts in a single .zip:
Download link: digg picture website generator suite (.zip)
Downloaded: 3311 times

For newcomers, Digg is a democratic social news website where the users decide its contents.

From their FAQ:

What is Digg?

Digg is a place for people to discover and share content from anywhere on the web. From the biggest online destinations to the most obscure blog, Digg surfaces the best stuff as voted on by our users. You won’t find editors at Digg — we’re here to provide a place where people can collectively determine the value of content and we’re changing the way people consume information online.

How do we do this? Everything on Digg — from news to videos to images to Podcasts — is submitted by our community (that would be you). Once something is submitted, other people see it and Digg what they like best. If your submission rocks and receives enough Diggs, it is promoted to the front page for the millions of our visitors to see.


September 03, 2007, 05:01

Well done Peter! That is great coding.

September 03, 2007, 05:43

Awesome work and congratulations! You might want to add something like 'This site is not affiliated with Digg.com' at the bottom (kind of like Duggtrends has), just to be safe, but man, this is a great thing. Well designed, functional and easy on the eyes! Great job!

September 03, 2007, 07:37

Nice work. Perl makes me sad, though it does beat curl. You are definitely a coder from the look of your site. It's very funny to me as a coder, because my designer friends always make fun of my design (because I'm a coder).

Keep it up.

jimm Permalink
September 03, 2007, 08:52

I think I just found my new favorite coder blog!

September 03, 2007, 10:04

Very nicely done. I was thinking of making something like this, but using slashdot's miner was pure genius.


September 03, 2007, 16:26

Have you not heard of http://www.picli.com or something?

September 03, 2007, 17:04

@Jane: Picli is something totally different... and looks ugly.

September 03, 2007, 18:06

About digpicz...

You should set BODY background color; not everyone uses white. Also visit W3C validator service :)

September 04, 2007, 09:00

This site is getting even more traffic for the Digg pictures that made the homepage

The Digg pictures and Videos should be combined

September 04, 2007, 10:27

This is outstandingly great. Can we partner and put something together for www.mediarati.com, as well as for another domain of mine known as www.digthings.com?

ronald Permalink
September 05, 2007, 05:46

picli is far from ugly, check out their latest version that went live today

JLearn Permalink
September 05, 2007, 11:15

Nice one Peter. First Redditmedia, now this!! What next? ;-)

September 05, 2007, 13:11

Just came here from Digg. It's impressive how you manage to create fully fledged looking sites from minimal amounts of code. I don't know much Perl myself, but I don't think it could be done as easily in PHP!

I wouldn't agree with the motto that great coders reuse code in general, because from my experience a lot of coders reuse code so much that when the finer points of the code require work, they're usually at a loss!

Your site makes a good argument for it though ;)

September 06, 2007, 17:06

JLearn, the next is a popurls like website for pictures!

It will be done differently than redditmedia and digurlz. I am now writing "The Making of picurls, Part I (of II)".

I hope to finish it by Sunday evening. :)

September 10, 2007, 14:17

Nice informative blog; I agree with most of it, in fact all of it, but web design has never been so easy and cheap with all the latest media.

October 13, 2007, 00:17

Hey, I didn't use your script but I did make a new interface for Digg's picture section at Blocr, check it out.

October 13, 2007, 06:41

Iwo, great job! I love the interface. I see that you use mootools. I'll consider adding some nice effects to my next projects using them as well :)

October 13, 2007, 19:21

Thanks man. I can't wait to check out picurls. I love the fact that you are offering your code and details to everyone, that's the way to go... Too bad I never used perl before...

Idetrorce Permalink
December 15, 2007, 13:40

very interesting, but I don't agree with you

January 11, 2008, 05:51

OK, but I would also hope we get a similar one for videos, unless you mean they're included here.

February 04, 2008, 14:33

thanks a lot

February 05, 2008, 02:05

thanx Peter

jphantom Permalink
February 05, 2008, 22:50

I saw that you use HttpWatch Professional; do you know of a similar open-source Linux program?

February 05, 2008, 23:17

jphantom, yes, there are a few open source products.

If you use the Firefox browser you can get the Live HTTP Headers extension, but it is a little hard to use.

You can also use Achilles, which is a man-in-the-middle proxy, and also WebScarab.

I find HttpWatch the best piece of software for this job. Even though it's commercial, I am ready to pay for it if it reduces the time I spend analyzing traffic from minutes to seconds!

March 15, 2008, 21:43

Thanks for all the comments.

March 20, 2008, 11:52

Thanks for your work!
It has helped me very much!

March 23, 2008, 06:31

Does anyone know if there is any other information about this subject in other languages?

May 25, 2008, 04:19

Hi webmaster!

dan Permalink
June 11, 2008, 18:45

Hi, I'm just wondering: how do you make a second page on a website and link it to your homepage?

Thanks, dan

July 26, 2008, 07:38

Very nice article! Thanks for this!

September 08, 2008, 23:17

Good project thanx

stephanazs Permalink
September 20, 2008, 16:49

Interesting facts. I have bookmarked this site.

RaiulBaztepo Permalink
March 28, 2009, 22:05

Very Interesting post! Thank you for such interesting resource!
PS: Sorry for my bad english, I'v just started to learn this language ;)
See you!
Your, Raiul Baztepo

April 07, 2009, 23:27

Hi !! ^_^
My name is Piter Kokoniz. Only want to tell that your blog is really cool
And want to ask you: will you continue to post in this blog in future?
Sorry for my bad english:)
