Remember that I launched Reddit Media: intelligent fun online last week (read how it was made)?
I have been getting emails suggesting that it would be a wise idea to launch a Digg media website. Yeah! Why not?
Since Digg already has a video section, there is not much point in duplicating it. The new site could just be Digg for pictures.
Update 2008.07.30: I received this PDF, which said that I was abusing Digg’s trademarks! So I closed the site. You may visit http://digg.picurls.com to see what it looked like. I also zipped up the contents of the site, and you may download the whole site: digpicz-2008-07-30.zip!
I don’t want to use the word ‘digg’ in the domain name because people warned me that the trademark owner could take the domain away from me. I’ll go with a single letter g, as in “dig”, and shorten “pictures” to “picz”. So the domain I bought is digpicz.com.
Update: The site has been launched, visit digpicz.com: digg’s missing picture section. Time taken to launch the site: ~7 hours.
Reusing the Reddit Media Generator Suite
I released the full source code of the reddit media website (reddit media website generator suite (.zip)). It can be reused almost entirely, with minor modifications, for the digg for pictures website.
Only the following modifications need to be made:
- A new extractor (data miner) has to be written which goes through all the stories on digg and finds the ones with pic/pics/images/etc. words in their titles or descriptions. (In the reddit generator suite this was the reddit_extractor.pl program, in the /scripts directory of the .zip file.) Digg, unlike Reddit, provides a public API to access its stories. I will use this API to go through all the stories, create the initial database of pictures, and then keep monitoring digg’s front page. This program will be called digg_extractor.pl.
- The SQLite database structure has to be changed to include a link to the Digg story, the story’s description, and a link to the user’s avatar.
- The generate_feed function in the static HTML page generator (page_gen.pl) has to be updated to create a digpicz RSS feed.
- The HTML template files in the /templates directory (in the .zip file) need to be updated to give the site a more digg-like look.
That’s it! A few hours of work and we have a digg for pictures website running!
Digpicz Technical Design
Let’s create the data miner first. As I mentioned, it’s called digg_extractor.pl, and it is a Perl script which uses Digg’s public API.
First, we need to get familiar with the Digg API. Skimming over the Basic API Concepts page, we find just a few important points:
- All requests must include an Application Key which is any valid absolute URI (used just for statistics).
- Returned data can be in XML, JSON or Serialized PHP data formats.
Next, to make our data miner get the stories, let’s look at the Summary of API Features. It mentions the List Stories endpoint, which “Fetches a list of stories from Digg.” This is exactly what we want!
We are interested only in stories which made it to the front page. The API documentation tells us we should issue a GET /stories/popular request to http://services.digg.com.
I typed the following address into my web browser and got a nice XML response with the 10 latest stories:
http://services.digg.com/stories/popular?appkey=http%3A%2F%2Fdigpicz.com
The documentation also lists count and offset arguments, which control the number of stories to retrieve and the offset into the complete story list.
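To make the request parameters concrete, here is a minimal sketch of how such a URL could be assembled. It is written in Python for illustration rather than the Perl of digg_extractor.pl, and the function name popular_stories_url is hypothetical; only the host, the /stories/popular path, and the appkey, count and offset arguments come from the API description above.

```python
from urllib.parse import urlencode

API_BASE = "http://services.digg.com"  # host named in the post
APP_KEY = "http://digpicz.com"         # any valid absolute URI serves as the key


def popular_stories_url(count=15, offset=0):
    """Build a GET /stories/popular URL (hypothetical helper name)."""
    # urlencode percent-encodes the appkey URI, turning ':' and '/' into %3A and %2F.
    query = urlencode({"appkey": APP_KEY, "count": count, "offset": offset})
    return f"{API_BASE}/stories/popular?{query}"


print(popular_stories_url(count=15, offset=30))
# prints http://services.digg.com/stories/popular?appkey=http%3A%2F%2Fdigpicz.com&count=15&offset=30
```

Note that the appkey ends up encoded exactly as in the address typed into the browser above.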
So the general algorithm is clear: start at offset=0, loop until we have gone through all the stories, parse each batch of stories, and extract the stories with pics in them.
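That loop can be sketched roughly as follows. This is a Python illustration, not the actual Perl script: the keyword list is an assumption (the post only says "pic/pics/images/etc."), the function names are hypothetical, and fetch_page stands in for whatever performs the real API request so that the example runs without network access.

```python
import re

# Assumed keyword list; the post does not spell out the exact words.
PICTURE_WORDS = re.compile(
    r"\b(pic|pics|picture|pictures|image|images|photo|photos)\b", re.IGNORECASE
)


def is_picture_story(story):
    """True if the title or description contains a picture-related word."""
    text = f"{story.get('title', '')} {story.get('description', '')}"
    return bool(PICTURE_WORDS.search(text))


def iter_picture_stories(fetch_page, items_per_request=15, max_requests=None):
    """Walk the story list page by page and yield only picture stories.

    fetch_page(offset, count) must return a list of story dicts;
    an empty list means there are no more stories.
    """
    offset = 0
    requests_made = 0
    while max_requests is None or requests_made < max_requests:
        stories = fetch_page(offset, items_per_request)
        if not stories:
            break
        for story in stories:
            if is_picture_story(story):
                yield story
        offset += items_per_request
        requests_made += 1


# Stand-in data source so the sketch runs offline.
FAKE_STORIES = [
    {"title": "13 Dumbest Drivers in the World [PICS]", "description": ""},
    {"title": "Epic keynote announced", "description": "No media here."},
    {"title": "Sunset over the bay", "description": "A few images from last night."},
]


def fake_fetch(offset, count):
    return FAKE_STORIES[offset:offset + count]


for story in iter_picture_stories(fake_fetch, items_per_request=2):
    print(story["title"])
```

Matching on word boundaries keeps words such as "Epic" from being mistaken for "pic"; whether the real script is that strict is not stated in the post.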
We want to use the simplest possible Perl library to parse XML. There is a great one on CPAN which is perfect for this job. It’s called XML::Simple. It provides an XMLin function which, given an XML string, returns a reference to a parsed hash data structure. Easy as 3.141592!
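For readers who do not use Perl, a rough Python analogue of that parsing step is sketched below using the standard xml.etree.ElementTree module instead of XML::Simple. The sample XML is made up for illustration: its element and attribute names (story, link, href, user, topic and so on) are an assumption about the response layout, not a copy of Digg's documentation, and parse_stories is a hypothetical name.

```python
import xml.etree.ElementTree as ET

# Invented sample; the real response layout may differ.
SAMPLE_XML = """<stories count="1" offset="0">
  <story link="http://example.com/pics" href="http://digg.com/offbeat_news/example">
    <title>Example story [PICS]</title>
    <description>An example description.</description>
    <user name="someuser" icon="http://digg.com/userimages/someuser/large.jpg" />
    <topic name="Offbeat News" short_name="offbeat_news" />
  </story>
</stories>"""


def _attr(node, name):
    """Return an attribute value, or '' if the element or attribute is missing."""
    return node.get(name, "") if node is not None else ""


def parse_stories(xml_text):
    """Turn the XML response into a list of plain dicts, one per story."""
    root = ET.fromstring(xml_text)
    stories = []
    for node in root.findall("story"):
        user = node.find("user")
        topic = node.find("topic")
        stories.append({
            "title": node.findtext("title", default=""),
            "desc": node.findtext("description", default=""),
            "url": _attr(node, "link"),
            "digg_url": _attr(node, "href"),
            "category": _attr(topic, "name"),
            "short_category": _attr(topic, "short_name"),
            "user": _attr(user, "name"),
            "user_pic": _attr(user, "icon"),
        })
    return stories


for story in parse_stories(SAMPLE_XML):
    print(story["title"], "->", story["url"])
```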
This script prints out picture stories which made it to the front page in a human-readable format. Each story is printed as a paragraph:
title: story title
type: story type
desc: story description
url: story url
digg_url: url to original story on digg
category: digg category of the story
short_category: short digg category name
user: name of the user who posted the story
user_pic: url to user pic
date: date story appeared on digg YYYY-MM-DD HH:MM:SS
<new line>
The script has one constant, ITEMS_PER_REQUEST, which defines how many stories (items) to get per API request. Currently it’s set to 15, which is the number of stories on one Digg page.
The script takes an optional argument which specifies how many requests to make. On each request, the story offset is advanced by ITEMS_PER_REQUEST. Specifying no argument goes through all the stories which have appeared on Digg.
For example, to print out the picture posts which are currently on the front page of Digg, we could use the command:
./digg_extractor.pl 1
Here is a sample of real output of this command:
$ ./digg_extractor.pl 1
title: 13 Dumbest Drivers in the World [PICS]
type: pictures
desc: Think of this like an even funnier Darwin awards, but for dumbass driving (and with images).
url: http://wtfzup.com/2007/09/02/unlucky-13-dumbest-drivers-in-the-world/
digg_url: http://digg.com/offbeat_news/13_Dumbest_Drivers_in_the_World_PICS
category: Offbeat News
short_category: offbeat_news
user: suxmonkey
user_pic: http://digg.com/userimages/s/u/x/suxmonkey/large6009.jpg
date: 2007-09-02 14:00:06
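A consumer of this output has to turn each paragraph back into a record. Here is a small Python sketch of that step, assuming one "key: value" field per line and a blank line between stories; parse_records is a hypothetical name, and the real downstream script is written in Perl.

```python
def parse_records(text):
    """Split paragraph-formatted extractor output into a list of dicts."""
    records = []
    for paragraph in text.strip().split("\n\n"):
        record = {}
        for line in paragraph.splitlines():
            # Split on the first ': ' only, so URLs and timestamps stay intact.
            key, sep, value = line.partition(": ")
            if sep:
                record[key.strip()] = value.strip()
        if record:
            records.append(record)
    return records


SAMPLE_OUTPUT = """title: 13 Dumbest Drivers in the World [PICS]
type: pictures
url: http://wtfzup.com/2007/09/02/unlucky-13-dumbest-drivers-in-the-world/
user: suxmonkey
date: 2007-09-02 14:00:06
"""

for record in parse_records(SAMPLE_OUTPUT):
    print(record["user"], record["date"])
# prints suxmonkey 2007-09-02 14:00:06
```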
This output is then fed into the db_inserter.pl script, which inserts the data into the SQLite database.
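As an illustration of the insertion step, the sketch below stores one such record in SQLite using Python's standard sqlite3 module. The table name, column names and the choice to skip duplicates by digg_url are all assumptions made for this example; the actual schema lives in the downloadable suite and may look different.

```python
import sqlite3

# Hypothetical schema covering the fields the extractor prints.
SCHEMA = """
CREATE TABLE IF NOT EXISTS stories (
    id          INTEGER PRIMARY KEY,
    title       TEXT NOT NULL,
    url         TEXT NOT NULL,
    digg_url    TEXT UNIQUE,
    description TEXT,
    user        TEXT,
    user_pic    TEXT,
    date        TEXT
)
"""


def insert_story(conn, story):
    """Insert one story dict; return False if its digg_url is already stored."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO stories "
        "(title, url, digg_url, description, user, user_pic, date) "
        "VALUES (:title, :url, :digg_url, :desc, :user, :user_pic, :date)",
        story,
    )
    conn.commit()
    return cur.rowcount == 1


conn = sqlite3.connect(":memory:")  # in-memory database for the demo
conn.execute(SCHEMA)

story = {
    "title": "13 Dumbest Drivers in the World [PICS]",
    "url": "http://wtfzup.com/2007/09/02/unlucky-13-dumbest-drivers-in-the-world/",
    "digg_url": "http://digg.com/offbeat_news/13_Dumbest_Drivers_in_the_World_PICS",
    "desc": "Think of this like an even funnier Darwin awards.",
    "user": "suxmonkey",
    "user_pic": "http://digg.com/userimages/s/u/x/suxmonkey/large6009.jpg",
    "date": "2007-09-02 14:00:06",
}

insert_story(conn, story)
insert_story(conn, story)  # second call is ignored because digg_url is unique
print(conn.execute("SELECT COUNT(*) FROM stories").fetchone()[0])
```

Making digg_url unique is one simple way to let the extractor be re-run against the front page without storing the same story twice.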
Then page_gen.pl is run, which generates the static HTML contents.
Please refer to the original post of the reddit media website generator to find more details.
Summing it up, only one new script had to be written, and some minor changes had to be made to the existing scripts, to generate the new website.
Here is this new script digg_extractor.pl:
digg extractor (perl script, digg picture website generator) (1031)
Click http://digg.picurls.com to visit the site!
Here are all the scripts packed together with basic documentation:
All the scripts in a single .zip:
Download link: digg picture website generator suite (.zip)
Downloaded: 1570 times
For newcomers, digg is a democratic social news website where users decide its contents.
From their faq:
What is Digg?
Digg is a place for people to discover and share content from anywhere on the web. From the biggest online destinations to the most obscure blog, Digg surfaces the best stuff as voted on by our users. You won’t find editors at Digg — we’re here to provide a place where people can collectively determine the value of content and we’re changing the way people consume information online.
How do we do this? Everything on Digg — from news to videos to images to Podcasts — is submitted by our community (that would be you). Once something is submitted, other people see it and Digg what they like best. If your submission rocks and receives enough Diggs, it is promoted to the front page for the millions of our visitors to see.



September 3rd, 2007 at 5:01 am
Well done Peteris! That is great coding.
September 3rd, 2007 at 5:43 am
Awesome work and congratulations! You might want to add something like ‘This site is not affiliated with Digg.com’ at the bottom (kind of like Duggtrends has) just to be safe, but man, this is a great thing. Well designed, functional and easy on the eyes! Great job!
September 3rd, 2007 at 7:37 am
Nice work. Perl makes me sad though. It does beat curl though. You are definitely a coder from the look of your site. It’s very funny to me as a coder because my designer friends always make fun of my design (because I’m a coder).
Keep it up.
September 3rd, 2007 at 8:52 am
I think i just found my new favorite coder blog!
September 3rd, 2007 at 10:04 am
Very nicely done. I was thinking of making something like this, but using slashdot’s miner was pure genius
Anirudh
September 3rd, 2007 at 1:49 pm
[…] It took Peteris only 7 hours to create the page - read up here for the details. […]
September 3rd, 2007 at 2:32 pm
[…] new missing pics section has explained how he built the site, using Digg’s API. Check it out here. As a bonus, he is going to release the full source code for everyone. Hopefully Digg.com does take […]
September 3rd, 2007 at 4:26 pm
Have you not heard of http://www.picli.com or something?
September 3rd, 2007 at 5:04 pm
@Jane: Picli is something totally different.. and looks ugly
September 3rd, 2007 at 6:06 pm
About digpicz…
You should set BODY background color; not everyone uses white. Also visit W3C validator service
September 4th, 2007 at 9:00 am
This site is getting even more traffic for the Digg pictures that made the homepage
The Digg pictures and Videos should be combined
September 4th, 2007 at 10:27 am
This is outstandingly great, can we partner and put something together for www.mediarati.com.
as well as for another domain of mine known as www.digthings.com.
September 4th, 2007 at 1:48 pm
[…] of how the site was made can be found here. Digpicz pulls its data from the Digg API so it should be fairly accurate and thorough, although as […]
September 4th, 2007 at 8:01 pm
[…] that have pic/pics/images/etc. words in titles or descriptions and puts them on his page. He has a post on his site about what it took to make it work and if you read thru it you will see that it took a good amount […]
September 4th, 2007 at 11:37 pm
[…] get more details on how Peteris build the site, see his blog. addthis_url = […]
September 5th, 2007 at 2:16 am
[…] details of how the site was created are available here. DigPicz pulls its data from Digg’s API, which should allow for results that are fairly […]
September 5th, 2007 at 5:46 am
—- picli is far from ugly, check out their latest version that went live today
September 5th, 2007 at 11:15 am
Nice one Peteris. First Redditmedia, now this!! What next?
September 5th, 2007 at 1:11 pm
Just came here from Digg. It’s impressive how you manage to create fully fledged looking sites from minimal amounts of code. I don’t know much perl myself though but I don’t think it could be done as easily in PHP!
I wouldn’t agree with the motto that great coders reuse code in general as from my experience, alot of coders reuse code so much that if the finer points of the code require work, they’re usually at a loss!
Your site makes a good argument for it though
September 6th, 2007 at 12:14 pm
[…] the lack of the highly requested pictures section at Digg. Nothing spectacular about that, yet the writeup on how the author did it is interesting (I always find it interesting to see how someone else solves something) Spread the […]
September 6th, 2007 at 5:06 pm
JLearn, the next is a popurls like website for pictures!
It will be done differently than redditmedia and digurlz. I am now writing “The Making of picurls, Part I (of II)”.
I hope to finish it by Sunday evening.
September 10th, 2007 at 8:40 am
[…] 5. catonmat.net - Designing Digg Picture Website in a Matter of Hours […]
September 10th, 2007 at 2:17 pm
Nice informative blog, agree on most of the part infact all , but web desing is never been so eazy and cheaply with all latest media.
http://www.websites.design.com.au/
September 15th, 2007 at 11:45 pm
[…] I was creating the second site, digpicz, it struck me - why not create a single site similar to popurls which aggregates posts to […]
October 13th, 2007 at 12:17 am
Hey, I didn’t use your script but I did make a new interface for Digg’s picture section at Blocr, check it out.
October 13th, 2007 at 6:41 am
Iwo, great job! I love the interface. I see that you use mootools. I’ll consider adding some nice effects to my next projects using them as well
October 13th, 2007 at 7:21 pm
Thanks man. I can’t wait to check out picurls. I love the fact that you are offering your code and details to everyone, that’s the way to go… Too bad I never used perl before…
October 29th, 2007 at 3:49 pm
Hello, very nice site, keep up good job!
Admin good, very good.
November 10th, 2007 at 4:27 pm
[…] read more | digg story […]
November 27th, 2007 at 11:26 am
[…] how long does it take to create a pictures section when it took this guy 7 hours [http://www.catonmat.net/blog/designing-digg-picture-website/]? And even if they wanted to make a really good pictures section with great integration, how about […]
November 28th, 2007 at 9:06 am
[…] end of the year.” But seriously, how long does it take to create a pictures section when it took this guy 7 hours? And even if they wanted to make a really good pictures section with great integration, how […]
December 15th, 2007 at 1:40 pm
very interesting, but I don’t agree with you
Idetrorce
December 17th, 2007 at 4:43 am
[…] Dude makes a Digg pics page in 7 hours - takes Kevin Rose 12 months […]
January 11th, 2008 at 5:51 am
Ok, but would also hope we get a similar one for videos, unless you mean included here.
February 2nd, 2008 at 1:20 pm
chat tahankk you
February 4th, 2008 at 2:11 pm
Thanks great post…
February 4th, 2008 at 2:12 pm
nice thanksss
February 4th, 2008 at 2:13 pm
Dude makes a Digg pics page in 7 hours - takes Kevin Rose 12 month
February 4th, 2008 at 2:13 pm
greattttttttttt THANKS !!!
February 4th, 2008 at 2:14 pm
thanks dude
February 4th, 2008 at 2:33 pm
thanks a lot
February 5th, 2008 at 1:31 am
thanks
February 5th, 2008 at 2:05 am
thanx Peteris
February 5th, 2008 at 7:04 am
I want to buy 100 MB of hosting. Everyone recommends http://www.esintihosting.com, where it costs 24 YTL, but some sell it for 10 YTL. Which one should I buy from? Could you give me some advice?
February 5th, 2008 at 10:40 pm
Tanks !
February 5th, 2008 at 10:50 pm
I saw that you use HttpWatch Professional, do you know of a similar opensource linux program?
February 5th, 2008 at 11:17 pm
jphantom, yes, there are a few open source products.
If you use FireFox browser you can get LiveHeaders extension, but it is a little hard to use.
You can also use Achilles, which is a Man in the Middle proxy:
http://www.mavensecurity.com/achilles
And also webscarab:
http://dawes.za.net/rogan/webscarab/
I find HttpWatch the best piece of software for this job (even though it’s commercial, but I am ready to pay for it, if it reduces time I spent analysing traffic from minutes to seconds!)
March 15th, 2008 at 9:43 pm
thanks for all comments.
March 20th, 2008 at 11:52 am
Snx for you job!
It has very much helped me!
March 23rd, 2008 at 6:31 am
does anyone knows if there is any other information about this subject in other languages?
May 25th, 2008 at 4:19 am
Hi webmaster!
June 11th, 2008 at 6:45 pm
hi im just wondering:
Q. how do you make a second page
on a website and link it
to your homepage?
thanks dan
July 22nd, 2008 at 8:24 am
[…] Designing Digg Picture Website in a Matter of Hours (43′962 views) […]
July 22nd, 2008 at 9:52 am
[…] http://www.catonmat.net/blog/designing-digg-picture-website/ […]
July 26th, 2008 at 7:38 am
Very nice article! Thanks for this!
doumo_arigatou
July 30th, 2008 at 3:23 am
[…] look at it: download digpicz.com (17 mb). Digpicz.com was released open-source from the beginning: read how it was designed and download source code. If you loved digpicz, you might like picurls.com, which is popurls for […]